Sebastien Lardiere sebastien at lardiere.net
Wed Feb 27 01:53:11 PST 2008
Hello,

I'm testing a setup with Pg 8.3.0 and Slony 1.2.13 on RHEL4.

I've set up 3 boxes and run Slony on them. Everything works (replication, move set, drop node, store node, ...) until I try a failover.

The script is:

#!/bin/sh
slonik <<_EOF_
cluster name = foo_repl;
node 1 admin conninfo = 'dbname=bar host=10.99.29.38 user=postgres';
node 2 admin conninfo = 'dbname=bar host=10.99.29.49 user=postgres';
node 3 admin conninfo = 'dbname=bar host=10.99.29.120 user=postgres';

#failover (id = 2, backup node = 1);

_EOF_

And the output is:

[root@foo01 ~]# ./slonik_failover.sh
<stdin>:7: NOTICE:  failedNode: set 1 has other direct receivers - change providers only
<stdin>:7: PGRES_FATAL_ERROR select "_qsr_repl".failedNode(2, 1);  - ERROR:  null value in column "li_provider" violates not-null constraint
CONTEXT:  SQL statement "INSERT INTO "_qsr_repl".sl_listen (li_origin, li_provider, li_receiver) select distinct set_origin, sub_provider,  $1  from "_qsr_repl".sl_set, "_qsr_repl".sl_subscribe where set_origin =  $2  and sub_set = set_id and sub_receiver =  $3  and sub_active"
PL/pgSQL function "rebuildlistenentries" line 75 at SQL statement
SQL statement "SELECT  "_qsr_repl".RebuildListenEntries()"
PL/pgSQL function "failednode" line 155 at PERFORM

I debugged this PL/pgSQL function and saw that Slony tries to use node 2 (the old master in the failover), which I think is incorrect. Nothing is done, and if I reactivate the old master, replication works.

I tried to switch over with move set, which works, but a new failover from the new master fails with the same error.

I am unable to reproduce this bug on Debian boxes with the same versions of Slony and PostgreSQL: there it works!

WTF ?

--
Sébastien 


