Knut Ingvald Dietzel knut.ingvald.dietzel at redpill-linpro.com
Wed Sep 5 02:46:40 PDT 2012
On Fri, Aug 31, 2012 at 08:59:56AM -0400, Steve Singer wrote:
> On 12-08-31 04:16 AM, Knut Ingvald Dietzel wrote:
[cut]
> >  From what I have been able to find out so far, slonik should wait for
> > the slon engine to restart, and then call failedNode2() on the node with
> > the highest SYNC.  However, the log above suggests failedNode2() is
> > called twice; the second call fails to get the lock, and the
> > failover of node 1 to node 4 fails.
> >
> > Firstly, is my interpretation roughly correct?
> 
> When Slonik (<=2.1.x) performs a failover, it generates a 'fake'
> FAILOVER event with ev_origin=$failed_node and the highest
> sequence number it can see from that failed node.  It pushes this
> event into sl_event on one of the remaining nodes.  In the test case
> you describe, it sounds like the slon is still running on the failed
> node.  Slony <=2.1.x has numerous race conditions around failover;
> one I've seen is where a 'real' SYNC event, e.g. 1,1234, that
> escaped from the failed node can conflict with the faked FAILOVER
> event 1,1234.
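A toy illustration of the collision described above. It is not Slony code: it assumes only that an event is identified by its (ev_origin, ev_seqno) pair, so a faked FAILOVER 1,1234 and a real SYNC 1,1234 cannot both be stored. The class EventLog, the exception DuplicateEvent, and the function fake_failover are hypothetical names for this sketch, and the sequence number is simply passed in rather than computed the way slonik does it.

```python
# Hypothetical model of the failover race; NOT Slony's implementation.
# Assumption: events are keyed by (ev_origin, ev_seqno), so a second,
# different event with the same pair is rejected.

class DuplicateEvent(Exception):
    """Raised when an (origin, seqno) pair is already in the log."""


class EventLog:
    """Toy stand-in for sl_event on one surviving node."""

    def __init__(self):
        self.events = {}  # (ev_origin, ev_seqno) -> ev_type

    def insert(self, origin, seqno, ev_type):
        key = (origin, seqno)
        if key in self.events:
            raise DuplicateEvent(
                f"event {origin},{seqno} already stored as "
                f"{self.events[key]}, cannot add {ev_type}"
            )
        self.events[key] = ev_type


def fake_failover(log, failed_node, seqno):
    """Slonik-style fake FAILOVER: ev_origin is the failed node and the
    sequence number is whatever slonik picked (supplied by the caller)."""
    log.insert(failed_node, seqno, "FAILOVER")


if __name__ == "__main__":
    log = EventLog()
    # Slonik injects its faked FAILOVER event as 1,1234 ...
    fake_failover(log, failed_node=1, seqno=1234)
    # ... but the slon on node 1 is still running, and a real SYNC 1,1234
    # that escaped from it now reaches the same node.
    try:
        log.insert(1, 1234, "SYNC")
    except DuplicateEvent as exc:
        print(exc)
```

Whichever of the two events arrives first wins, and the other is rejected. That is why a slon still running on the "failed" node makes the <=2.1.x failover fragile.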

Hi, Steve.

Thanks for the insight, and your explanation sounds very reasonable.

> I rewrote a lot of the failover logic in 2.2 to try to address many
> of these issues.  It should do a much better job of waiting for the
> slons to restart, etc.  2.2 is still in beta and I wouldn't recommend
> it for production use yet but I encourage you to look at it to see
> if it addresses your issues.

That's very good to hear. We'll look into possibilities of testing the
2.2b version.

Again, thanks!


-- 
Best regards,
Knut Ingvald Dietzel


More information about the Slony1-general mailing list