Tue Feb 16 13:30:58 PST 2010
- Previous message: [Slony1-general] drop node not working correctly
- Next message: [Slony1-general] drop node not working correctly
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Andy Dale <andy.dale at gmail.com> writes:
> However, I have found that if I put a sleep (even 1 second works in my
> environment), the DROP_NODE command succeeds, and everything proceeds
> happily.
>
> I can also confirm that adding a sleep period between the FAILOVER and
> DROP NODE commands seems to work (for 10 seconds in my case)

I had a chat with Jan about this on Friday; we both agreed that there
seemed to be something wrong with the idea of treating the failover as
though it were issued by the failed node.  After all, if that node is,
momentarily, to be treated as "shunned," it doesn't make much sense to have
*any* events coming out of it.  I'd tend to think that the *new* origin
ought to be a good source for that event.

Jan thought there might have been some reason why the event *was* submitted
on behalf of the failed node.  That reason may no longer hold; it'll take a
bit of research to determine that.

I suppose the thing to think about is what could break if the event were
submitted as being from the new origin.  A road to think down...

- Suppose the event is treated as coming from the new origin
- Suppose there is a subscriber that is somewhat behind
- Can that subscriber:
  a) get confused
  b) outright lose data (say, because sl_log_* gets trimmed too early)
  as a result of a FAILOVER that takes place under these circumstances?

No-development-required answer: don't drop the failed node until all the
other nodes are aware that they shouldn't be using it anymore.

Possible change to the DROP NODE command: perhaps DROP NODE should check
against *all* nodes that the node being dropped isn't considered a
provider, and fail if it is.

One of the problems with Slony-I, in retrospect, is that it tries a bit too
hard to be asynchronous, and that makes it rather hard to debug issues
surrounding configuration changes to the cluster.  Perhaps configuration
changes should be pretty much synchronous, checking state against all the
nodes, and griping if *any* of them disagree.
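For concreteness, the workaround Andy describes might look like the
following slonik sketch.  The cluster name, conninfo strings, node IDs,
and the 10-second pause are placeholders, not a tested recipe:

```
cluster name = mycluster;
node 2 admin conninfo = 'dbname=mydb host=backuphost user=slony';
node 3 admin conninfo = 'dbname=mydb host=subhost user=slony';

# Fail over from the failed origin (node 1) to the backup (node 2).
failover (id = 1, backup node = 2);

# Workaround: pause so the FAILOVER event has time to propagate to
# the remaining nodes before the failed node is dropped.
sleep (seconds = 10);

drop node (id = 1, event node = 2);
```

The sleep is, of course, a race-condition band-aid rather than a fix; the
"don't drop until all nodes know" answer above is the principled version of
the same idea.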
For instance, in this case, if DROP NODE were changed to verify on *all*
nodes that the node is no longer in use, that would protect from the
problem observed in this discussion thread.
--
let name="cbbrowne" and tld="ca.afilias.info" in name ^ "@" ^ tld;;
Christopher Browne
"Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock phasers
on the Heffalump, Piglet, meet me in transporter room three"
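[Archive note: a sketch of the check proposed above.  Each node's view of
who provides its subscriptions lives in sl_subscribe; a verifying DROP NODE
could run something like this query on every node before proceeding.  The
namespace "_mycluster" and node id 1 are hypothetical:]

```
-- Run on each node in the cluster before dropping node 1.
-- A nonempty result means this node still treats node 1 as a
-- provider, so DROP NODE should fail rather than proceed.
select sub_set, sub_receiver
  from _mycluster.sl_subscribe
 where sub_provider = 1;
```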