Melvin Davidson mdavidson
Fri Dec 23 18:18:48 PST 2005
slony1-1.1.2
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-49)

sl_subscribe is not being updated correctly after a "FAILOVER"

I have the following config:

node   1 admin conninfo='dbname=control  host=main.comp.com   port=5450 user=postgres';
node 101 admin conninfo='dbname=masterdb host=main.comp.com   port=5480 user=postgres';
node 151 admin conninfo='dbname=masterdb host=slavea.comp.com port=5450 user=postgres';
node 201 admin conninfo='dbname=masterdb host=slaveb.comp.com port=5480 user=postgres';
node 251 admin conninfo='dbname=masterdb host=slavec.comp.com port=5480 user=postgres';

node 1 exists only as a controller and is not subscribed to any node;
node 101 is the initial master
node 151 subscribes to node 101
node 201 subscribes to node 101
node 251 subscribes to node 201

PostgreSQL (via pg_ctl) and slon are stopped on node 101 to simulate a system failure.

Before failover I have

sub_set | sub_provider | sub_receiver | sub_forward | sub_active
      1 |          101 |          151 | t           | t
      1 |          101 |          201 | t           | t
      1 |          201 |          251 | t           | t

However after

failover (id = 101, backup node = 201);

I have
sub_set | sub_provider | sub_receiver | sub_forward | sub_active
      1 |          201 |          251 | t           | t
      1 |          151 |          201 | t           | t
      1 |          201 |          151 | t           | t

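on all nodes, which is obviously wrong: node 201, the new origin, is still listed as a receiver with node 151 as its provider, so nodes 151 and 201 subscribe to each other.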
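
To make the discrepancy concrete, here is a minimal Python sketch (not part of Slony-I) that compares the rows above with what I would expect. It assumes that a failover promotes the backup node to origin and moves the failed node's other direct subscribers to the backup. The function names expected_after_failover and problems are made up for this illustration, and the rows are copied from the two listings above.

```python
# Rows are (sub_set, sub_provider, sub_receiver), copied from the post.
before = [(1, 101, 151), (1, 101, 201), (1, 201, 251)]
observed = [(1, 201, 251), (1, 151, 201), (1, 201, 151)]


def expected_after_failover(rows, failed, backup):
    """Assumed outcome of a failover (an assumption, not Slony-I code)."""
    result = set()
    for sub_set, provider, receiver in rows:
        if receiver == failed:
            continue  # the failed node no longer receives anything
        if provider == failed:
            if receiver == backup:
                continue  # the backup is promoted, so it stops subscribing
            provider = backup  # other direct subscribers move to the backup
        result.add((sub_set, provider, receiver))
    return result


def problems(rows, origin):
    """Return readable descriptions of inconsistencies in the rows."""
    found = []
    pairs = {(provider, receiver) for _, provider, receiver in rows}
    for _, provider, receiver in rows:
        if receiver == origin:
            found.append(f"origin {origin} is listed as a receiver of {provider}")
    for provider, receiver in sorted(pairs):
        # Report each mutual subscription once, from the lower node id.
        if provider < receiver and (receiver, provider) in pairs:
            found.append(f"nodes {provider} and {receiver} subscribe to each other")
    return found


expected = expected_after_failover(before, failed=101, backup=201)
unexpected = sorted(set(observed) - expected)
issues = problems(observed, origin=201)

print("expected rows:  ", sorted(expected))
print("unexpected rows:", unexpected)
for issue in issues:
    print("problem:", issue)
```

Under that assumption, the only row that should not be there is the one with provider 151 and receiver 201.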

I have tried correcting the problem by manually deleting the incorrect provider entry and then cleaning sl_confirm, sl_event, sl_seqlog and sl_setsync on all nodes with:

delete from sl_confirm;
delete from sl_event;
delete from sl_seqlog;
delete from sl_setsync;

after which slon can be restarted, but Slony still treats the new provider node as a subscriber, as evidenced by:

slaveb=# insert into activation_code_prefix
slaveb-# (code_prefix, product_id)
slaveb-# values
slaveb-# ('XX99', 300);
ERROR:  Slony-I: Table activation_code_prefix is replicated and cannot be modified on a subscriber node

In plain language, this is very, very bad. :(

A fix or workaround would be greatly appreciated.

TIA,
Melvin Davidson

