Ouray Viney ouray at viney.ca
Mon Jan 28 14:05:31 PST 2008
Hi All:

I have spent most of the day parsing through the documentation, but have
been unsuccessful in solving the problem.

Environment setup:

1 master (host=3Dpgmaster, db=3Dssm)
1 slave (host=3Dpgslave1, db=3Dssm_slave)

Commands run on the master:

Note:  I will not include the db/user/pgplsql commands

1. run the schema script

 [root at caotta000381 scripts]# psql -U ssm -d ssm -f ~/sessionmanager-
postgresql-create.sql

2. initialize the cluster (run on master node)

[root at caotta000381 ~]# slonik_init_cluster

# INIT CLUSTER
cluster name =3D ssm_cluster;
 node 1 admin conninfo=3D'host=3Dpgmaster dbname=3Dssm user=3Dpostgres port=
=3D5432';
 node 2 admin conninfo=3D'host=3Dpgslave1 dbname=3Dssm_slave user=3Dpostgres
port=3D5432';
  init cluster (id =3D 1, comment =3D 'Node 1 - ssm at pgmaster');

# STORE NODE
  store node (id =3D 2, event node =3D 1, comment =3D 'Node 2 - ssm_slave at p=
gslave1
');
  echo 'Set up replication nodes';

# STORE PATH
  echo 'Next: configure paths for each node/origin';
  store path (server =3D 1, client =3D 2, conninfo =3D 'host=3Dpgmaster dbn=
ame=3Dssm
user=3Dpostgres port=3D5432');
  store path (server =3D 2, client =3D 1, conninfo =3D 'host=3Dpgslave1
dbname=3Dssm_slave user=3Dpostgres port=3D5432');
  echo 'Replication nodes prepared';
  echo 'Please start a slon replication daemon for each node';
[root at caotta000381 ~]# slonik_init_cluster | slonik
<stdin>:10: Set up replication nodes
<stdin>:13: Next: configure paths for each node/origin
<stdin>:16: Replication nodes prepared
<stdin>:17: Please start a slon replication daemon for each node

3. Start slon process

[root at caotta000381 ~]# slon_start 1
Invoke slon for node 1 - /usr/bin/slon -s 1000 -d2 ssm_cluster
'host=3Dpgmaster dbname=3Dssm user=3Dpostgres port=3D5432' 2>&1 >
/var/log/slony/slony1/node1/ssm-2008-01-28_13:47:29.log &
Slon successfully started for cluster ssm_cluster, node node1
PID [12806]
Start the watchdog process as well...
[root at caotta000381 ~]# slon_start 2
Invoke slon for node 2 - /usr/bin/slon -s 1000 -d2 ssm_cluster
'host=3Dpgslave1 dbname=3Dssm_slave user=3Dpostgres port=3D5432' 2>&1 >
/var/log/slony/slony1/node2/ssm_slave-2008-01-28_13:47:38.log &
Slon successfully started for cluster ssm_cluster, node node2
PID [12849]
Start the watchdog process as well...
[root at caotta000381 ~]#

4. build skeleton for slon_tools.conf

build the skeleton script for slon_tools.conf

slonik_build_env -node pgmaster:ssm:postgres -node
pgslave1:ssm_slave:postgres

[root at caotta000381 pgsql]# slonik_build_env -node pgmaster:ssm:postgres
-node pgslave1:ssm_slave:postgres
&add_node(host =3D> 'pgmaster', dbname =3D> 'ssm', port =3D>5432,
        user=3D>'postgres', password=3D>'', node=3D>1 );
&add_node(host =3D> 'pgslave1', dbname =3D> 'ssm_slave', port =3D>5432,
        user=3D>'postgres', password=3D>'', node=3D>2 , parent=3D>1);
@KEYEDTABLES=3D(
        "public.managednumberrouteset",
        "public.mediationstrategy",
        "public.nonmanagednumbestfitrouteset",
        "public.normalization",
        "public.normalizationtype",
        "public.numberforwarding",
        "public.outboundrouteorder",
        "public.outboundrouteset",
        "public.route",
        "public.routetype",
        "public.routetype_normalization",
);
@SEQUENCES=3D(
        "public.nonmanagednumbestfitrouteset_outboundroutelogicid_seq",
        "public.normalization_normalizationid_seq",
);

5.  Add that data to the slon_tools.conf

6. Create the replication sets

[root at caotta000381 etc]# slonik_create_set 1
cluster name =3D ssm_cluster;
 node 1 admin conninfo=3D'host=3Dpgmaster dbname=3Dssm user=3Dpostgres port=
=3D5432';
 node 2 admin conninfo=3D'host=3Dpgslave1 dbname=3Dssm_slave user=3Dpostgres
port=3D5432';

# TABLE ADD KEY

# CREATE SET
  try {
    create set (id =3D 1, origin =3D 1, comment =3D 'Set 1 for ssm_cluster'=
);
  } on error {
    echo 'Could not create subscription set 1 for ssm_cluster!';
    exit -1;
  }

# SET ADD TABLE
  echo 'Subscription set 1 created';
  echo 'Adding tables to the subscription set';
  set add table (set id =3D 1, origin =3D 1, id =3D 1,
                 full qualified name =3D 'public.nonmanagednumbestfitroutes=
et',
key=3D'public.normalization',
                 comment =3D 'Table public.nonmanagednumbestfitrouteset with
candidate primary key public.normalization');
  echo 'Add candidate primary keyed table
public.nonmanagednumbestfitrouteset';
  set add table (set id =3D 1, origin =3D 1, id =3D 2,
                 full qualified name =3D 'public.outboundrouteorder', key=
=3D'
public.outboundrouteset',
                 comment =3D 'Table public.outboundrouteorder with candidate
primary key public.outboundrouteset');
  echo 'Add candidate primary keyed table public.outboundrouteorder';
  set add table (set id =3D 1, origin =3D 1, id =3D 3,
                 full qualified name =3D 'public.normalizationtype', key=3D'
public.numberforwarding',
                 comment =3D 'Table public.normalizationtype with candidate
primary key public.numberforwarding');
  echo 'Add candidate primary keyed table public.normalizationtype';
  set add table (set id =3D 1, origin =3D 1, id =3D 4,
                 full qualified name =3D 'public.route', key=3D'public.rout=
etype
',
                 comment =3D 'Table public.route with candidate primary key
public.routetype');
  echo 'Add candidate primary keyed table public.route';
  set add table (set id =3D 1, origin =3D 1, id =3D 5,
                 full qualified name =3D 'public.managednumberrouteset', ke=
y=3D'
public.mediationstrategy',
                 comment =3D 'Table public.managednumberrouteset with
candidate primary key public.mediationstrategy');
  echo 'Add candidate primary keyed table public.managednumberrouteset';
  set add table (set id =3D 1, origin =3D 1, id =3D 6,
                 full qualified name =3D 'public.routetype_normalization',
key=3D'',
                 comment =3D 'Table public.routetype_normalization with
candidate primary key ');
  echo 'Add candidate primary keyed table public.routetype_normalization';

# SET ADD SEQUENCE
  echo 'Adding sequences to the subscription set';
  set add sequence (set id =3D 1, origin =3D 1, id =3D 1,
                    full qualified name =3D '
public.nonmanagednumbestfitrouteset_outboundroutelogicid_seq',
                    comment =3D 'Sequence
public.nonmanagednumbestfitrouteset_outboundroutelogicid_seq');
  echo 'Add sequence
public.nonmanagednumbestfitrouteset_outboundroutelogicid_seq';
  set add sequence (set id =3D 1, origin =3D 1, id =3D 2,
                    full qualified name =3D '
public.normalization_normalizationid_seq',
                    comment =3D 'Sequence
public.normalization_normalizationid_seq');
  echo 'Add sequence public.normalization_normalizationid_seq';
  echo 'All tables added';

  [root at caotta000381 etc]# slonik_create_set 1 | slonik
<stdin>:16: Subscription set 1 created
<stdin>:17: Adding tables to the subscription set
<stdin>:18: PGRES_FATAL_ERROR select "_ssm_cluster".determineIdxnameUnique('
public.nonmanagednumbestfitrouteset', 'public.normalization');  - ERROR:
Slony-I: table "public"."nonmanagednumbestfitrouteset" has no unique index
public.normalization

Note:  I ignored the error and moved on.  Perhaps this was a bad thing?

7. Subscribed the nodes to set

[root at caotta000381 etc]# slonik_subscribe_set 1 2
cluster name =3D ssm_cluster;
 node 1 admin conninfo=3D'host=3Dpgmaster dbname=3Dssm user=3Dpostgres port=
=3D5432';
 node 2 admin conninfo=3D'host=3Dpgslave1 dbname=3Dssm_slave user=3Dpostgres
port=3D5432';
  try {
    subscribe set (id =3D 1, provider =3D 1, receiver =3D 2, forward =3D ye=
s);
  }
  on error {
    exit 1;
  }
  echo 'Subscribed nodes to set 1';

  [root at caotta000381 etc]# slonik_subscribe_set 1 2 | slonik
<stdin>:10: Subscribed nodes to set 1

8. When using the altperl scripts to setup the cluster, it isn't clear what
tasks need to be performed on the slave node.  From what I read, the only
thing that needs to be done on the slave nodes it what I read here:
http://linuxfinances.info/info/addthings.html section: (12.4. Adding A Node
To Replication).

9. Since I was a little confused I ran the following on the slave node
assuming that I was doing the right thing (in accordance with the above
reference):

I ran this on the slave node:

cluster name =3D ssm_cluster;
 node 1 admin conninfo=3D'host=3Dpgmaster dbname=3Dssm user=3Dpostgres port=
=3D5432';
 node 2 admin conninfo=3D'host=3Dpgslave1 dbname=3Dssmslave user=3Dpostgres
port=3D5432';
  try {
    subscribe set (id =3D 1, provider =3D 1, receiver =3D 2, forward =3D ye=
s);
  }
  on error {
    exit 1;
  }
  echo 'Subscribed nodes to set 1';

10. I inserted some data into the master db.

11. Checked the slon logs and didn't see any inserts getting replicated, so
I knew something was wrong.

12. I ran the check_slony_cluster.sh on both nodes:

Slave:
-bash-3.00$ ./check_slony_cluster.sh ssm_cluster ssm_slave pgslave1
ERROR - 1 of 1 nodes not in sync

Master:
-bash-3.00$ ./check_slony_cluster.sh ssm_cluster ssm pgmaster
OK - 1 nodes in sync

Running yet another replication test script:

-bash-3.00$ ./psql_replication_check.pl -h pgslave1 -d ssm_slave -p 5432 -U
postgres
ERROR:  relation "replication_status" does not exist
-bash-3.00$ ./psql_replication_check.pl -h pgmaster -d ssm -p 5432 -U ssm
ERROR:  relation "replication_status" does not exist

At this point I have no idea were the problem lies.  I can guess my way
through based on the commands that I have run.  But at this point I am not
seeing any replication nor any sign after manipulating the data that any
events were triggering replication.

Here is a snippet of the slon log for the slave node:

<snippet>
2008-01-28 16:57:05 EST DEBUG2 remoteWorkerThread_2: forward confirm 1,1140
received by 2
2008-01-28 16:57:13 EST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1141
2008-01-28 16:57:13 EST DEBUG2 localListenThread: Received event 1,1141 SYNC
2008-01-28 16:57:16 EST DEBUG2 remoteListenThread_2: queue event 2,457 SYNC
2008-01-28 16:57:16 EST DEBUG2 remoteWorkerThread_2: Received event 2,457
SYNC
2008-01-28 16:57:16 EST DEBUG2 remoteWorkerThread_2: SYNC 457 processing
2008-01-28 16:57:16 EST DEBUG2 remoteWorkerThread_2: no sets need syncing
for this event
2008-01-28 16:57:16 EST DEBUG2 remoteWorkerThread_2: forward confirm 1,1141
received by 2
2008-01-28 16:57:23 EST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1142
2008-01-28 16:57:23 EST DEBUG2 localListenThread: Received event 1,1142 SYNC
2008-01-28 16:57:27 EST DEBUG2 remoteListenThread_2: queue event 2,458 SYNC
2008-01-28 16:57:27 EST DEBUG2 remoteWorkerThread_2: Received event 2,458
SYNC
2008-01-28 16:57:27 EST DEBUG2 remoteWorkerThread_2: SYNC 458 processing
2008-01-28 16:57:27 EST DEBUG2 remoteWorkerThread_2: no sets need syncing
for this event
2008-01-28 16:57:27 EST DEBUG2 remoteWorkerThread_2: forward confirm 1,1142
received by 2
2008-01-28 16:57:31 EST DEBUG2 remoteListenThread_2: queue event 2,459 SYNC
2008-01-28 16:57:31 EST DEBUG2 remoteWorkerThread_2: Received event 2,459
SYNC
2008-01-28 16:57:31 EST DEBUG2 remoteWorkerThread_2: SYNC 459 processing
2008-01-28 16:57:31 EST DEBUG2 remoteWorkerThread_2: no sets need syncing
for this event
</snippet>

I also did a simple query on the table that I had added data to on both
nodes:

master:
=3D=3D=3D=3D=3D=3D=3D=3D=3D
ssm=3D# select * from outboundrouteset;
   outboundroutesetid   |     comments
------------------------+------------------
 ATP_CMA_ROUTESET_ID_1  | ATP_TEST_COMMENT
 ATP_CMB_ROUTESET_ID_1  | ATP_TEST_COMMENT
 ATP_PSTN_ROUTESET_ID_1 | ATP_TEST_COMMENT
(3 rows)

slave:
=3D=3D=3D=3D=3D=3D=3D=3D=3D
ssm_slave=3D# select * from outboundrouteset;
 outboundroutesetid | comments
--------------------+----------
(0 rows)

>From what I can tell, no sign of any replication occurring, which I would
expect to look something like this:

<snippet>
2007-10-31 06:57:04 EDT DEBUG2 remoteHelperThread_1_1: inserts=3D530
updates=3D530 deletes=3D0
2007-10-31 06:57:04 EDT DEBUG2 remoteHelperThread_1_1: inserts=3D211
updates=3D211 deletes=3D0
2007-10-31 06:57:05 EDT DEBUG2 remoteHelperThread_1_1: inserts=3D197
updates=3D197 deletes=3D0
2007-10-31 06:57:06 EDT DEBUG2 remoteHelperThread_1_1: inserts=3D255
updates=3D255 deletes=3D0
2007-10-31 06:57:07 EDT DEBUG2 remoteHelperThread_1_1: inserts=3D116
updates=3D116 deletes=3D0
</snippet>

Any help in diagnosing replication failures would be greatly appreciated.

Kind Rgds,

Ouray


-- =

Ouray Viney
https://www.viney.ca
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.slony.info/pipermail/slony1-general/attachments/20080128/=
160aecba/attachment-0001.htm


More information about the Slony1-general mailing list