Julian Scarfe julian
Tue Nov 14 12:07:50 PST 2006
> I changed the behaviour of the watchdog process. If the child terminates 
> with exit code zero, it will restart it immediately instead of waiting for 
> 10 seconds.

Thanks, I'll try it with that patch applied.  That should minimise the
time it takes to recover.

> In any case, the multi-set move procedure should lock all sets first, then 
> attempt to move them.

I had previously used a lock-then-move for each set in turn.  I've corrected 
that (lock all 10 then move all 10) but the problem persists as before. 
Should I be using a "wait for event" between each move?  When I tried that 
with lock-then-move it made no difference.
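For reference, a rough sketch of the "lock all 10, then move all 10" ordering. It generates the slonik commands with a small Python helper; the helper name, the set IDs 1 to 10 and the node numbers (old origin 1, new origin 2) are hypothetical. The cluster name, admin conninfo preamble and any "wait for event" lines are left out.

```python
def build_switchover_script(set_ids, old_origin, new_origin):
    """Hypothetical helper: emit slonik commands that lock every set
    before moving any of them (lock-all-then-move-all ordering)."""
    lines = []
    # Phase 1: lock all sets on the current origin node.
    for set_id in set_ids:
        lines.append(f"lock set (id = {set_id}, origin = {old_origin});")
    # Phase 2: only after every lock is issued, move each set to the new origin.
    for set_id in set_ids:
        lines.append(
            f"move set (id = {set_id}, old origin = {old_origin}, "
            f"new origin = {new_origin});"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed layout: sets 1..10, moving from node 1 to node 2.
    print(build_switchover_script(range(1, 11), old_origin=1, new_origin=2))
```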

> As for the duplicate key errors, can it be that you have sequences 
> replicated and that those actually are in a different set than the table 
> they belong to?

No, I don't think so.  But what's weird is that the INSERTs that are failing 
on node 3 are all rows that are timestamped with the original insertion time 
on node 1 and they are from between 10 and 20 minutes before the switchover. 
So in some sense they are "old" events and I presume that they have already 
been processed on node 3 because the content of node 3 remains a perfect 
replica of nodes 1 and 2 (at least after slon has restarted 10 times).

Hope that helps.

Julian 