<span style='color:000000;'>I'm trying to run a failover on a three-node cluster (for testing purposes), and it doesn't seem to work no matter how I try it.<br />
<br />
I've tried running the following in slonik:<br />
<br />
node 1 admin conninfo = 'dbname=$dbname host=$host1 port=$port user=$user ';<br />
node 2 admin conninfo = 'dbname=$dbname host=$host2 port=$port user=$user ';<br />
node 3 admin conninfo = 'dbname=$dbname host=$host3 port=$port user=$user ';<br />
<br />
echo 'Failing over...';<br />
<br />
failover (id = 1, backup node = 2);<br />
echo 'Dropping node 1...';<br />
drop node (id = 1, event node = 2);<br />
echo 'Failover complete';<br />
<br />
I have also tried (as per the "Failover With Complex Node Set" instructions) running subscribe set to update the subscription info for the other nodes before failing over to node 2, but the subscribe set command fails with "could not connect to server: Connection refused", even though none of the nodes named in the subscribe set command is the master node. So I went back to just running failover and letting the failover function take care of resubscribing the nodes.<br />
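<br />
For reference, here is a sketch of the kind of subscribe set command those instructions describe. It assumes the replicated set has id 1 and that node 3 should be repointed to receive it from node 2; the ids and forward = yes are illustrative assumptions, not a copy of my actual script:<br />
<br />
# hypothetical: repoint node 3 to receive set 1 from node 2 before failing over<br />
subscribe set (id = 1, provider = 2, receiver = 3, forward = yes);<br />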
<br />
The results have been ... well, sort of random. It does occasionally report a successful run, but even then, node 3 usually has some incorrect information about the new structure of the cluster. The most common occurrence (and the only one I have logs for), though, is that I receive the following output from the above slonik commands:<br />
<br />
<stdin>:13: Failing over...<br />
INFO: calling failedNode(1,2) on node 1<br />
<stdin>:15: NOTICE: failedNode: set 1 has other direct receivers - change providers only<br />
<stdin>:15: PGRES_FATAL_ERROR select "_vmprod".failedNode(1, 2); - ERROR: null value in column "li_provider" violates not-null constraint<br />
CONTEXT: SQL statement "insert into "_vmprod".sl_listen (li_origin, li_provider, li_receiver) select distinct set_origin, sub_provider, $1 from "_vmprod".sl_set, "_vmprod".sl_subscribe where set_origin = $2 and sub_set = set_id and sub_receiver = $3 and sub_active"<br />
PL/pgSQL function "rebuildlistenentries" line 75 at SQL statement<br />
SQL statement "SELECT "_vmprod".RebuildListenEntries()"<br />
PL/pgSQL function "failednode" line 177 at PERFORM<br />
<br />
What would cause that sort of an error? And why won't subscribe set work when the master database is down (especially since the documentation for a failover with a complex node set says to run subscribe set before running the failover and dropping the master)?<br />
<br />
This is on a CentOS 5 box, Linux kernel 2.6.18, using PostgreSQL 8.3.9 and Slony-I 1.2.20 built from source.</span>