Fri Dec 10 13:54:48 PST 2010
- Previous message: [Slony1-hackers] automatic WAIT FOR proposal
- Next message: [Slony1-hackers] automatic WAIT FOR proposal
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Steve Singer <ssinger at ca.afilias.info> writes:

> The Problem
> ----------------
> An informal survey of the slony mailing list shows that almost no users
> understand how WAIT FOR should be used.

I daresay that, as a not-unsophisticated user of Slony, I don't always
get it right.  I think there's only one person who has successful
"intuition" on WAIT FOR behaviour, and I don't think it takes much
guessing as to who that is!

> Proposal
> ------------
> The goal is to make slonik handle the proper waiting between events.
> If, based on the previous set of slonik commands, the next command
> needs to wait until a previous command is confirmed by another node,
> then slonik should be smart enough to figure this out and to wait.

...

> What would solve this?
>
> 1) If we had a global ordering on events, maybe assigned by a cluster
> coordinator, node (c) would be able to process the events in the
> right order.
>
> 2) When an event is created on node (b), if we store the fact that it
> has already seen/confirmed event (a),1234 from node (a), we could
> transmit this pre-condition as part of the event, so node (c) knows
> it can't process the event from (b) until it has seen 1234 from (a).
> This way node (c) will process things in the right order, but we can
> submit events to (b) - which is up to date - without having to wait
> for the busy node (c) to get caught up.
>
> 3) We could disallow or discourage the use of multiple event nodes
> and require all slonik command events to originate on a single
> cluster node (other than STORE PATH and maybe SUBSCRIBE SET), and
> provide facilities for dealing with cases where that event node
> fails or is split.
>
> 4) We really do require that the cluster be caught up before using a
> different event node.  This is where we automatically do the WAIT
> FOR ALL.
>
> The approach proposed here is to go with (4), where before switching
> event nodes slonik will WAIT FOR all nodes to confirm the last event.

Related to #2...
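To make #2 concrete, the pre-condition idea can be modeled in a few
lines (illustrative Python, not Slony code; the event and node
identifiers are made up):

```python
# Illustrative model of proposal #2 (not Slony code): each event may
# carry pre-conditions of the form (origin, seq), meaning "hold me
# back until you have confirmed event `seq` from node `origin`."

from collections import deque

def process_events(incoming, confirmed):
    """incoming: list of (origin, seq, preconditions) in arrival order.
    confirmed: dict mapping origin -> highest seq confirmed locally.
    Returns the events actually processed, in processing order; events
    whose pre-conditions are never satisfied are simply held back."""
    processed, held = [], deque()
    queue = deque(incoming)
    while queue:
        origin, seq, pre = queue.popleft()
        if all(confirmed.get(o, 0) >= s for o, s in pre):
            confirmed[origin] = max(confirmed.get(origin, 0), seq)
            processed.append((origin, seq))
            # processing an event may release previously-held events
            queue.extend(held)
            held.clear()
        else:
            held.append((origin, seq, pre))
    return processed
```

Under this model, node (c) accepts node (b)'s event immediately and
merely defers applying it until (a),1234 has been confirmed locally -
no up-front WAIT FOR by slonik required.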
We might introduce a new event that tries to coordinate between nodes -
in effect, a "WAIT FOR EVENT" event.  So, we submit, against node #1,
WAIT_FOR_EVENT (2,355).  The intent of this event is that processing of
the stream of events for node #1 holds back until it has received event
#355 from node #2.  That doesn't mandate waiting for *EVERY* node, just
one node.  Multiple WAIT FOR EVENT requests could get you a "wait on
all."  Note that this is on the slon side, not so much the slonik
side...

> 1) STORE PATH - the event node is dictated by how you are setting up
> the path.  Furthermore, if the backwards path isn't yet set up, the
> node won't receive the confirm message.

There's an argument to be made that STORE PATH perhaps should be going
directly to nodes, and doesn't need to be involved in event
propagation.  It's pretty cool to propagate STORE PATH requests
everywhere, but it's not hugely necessary.

...[erm, rethinking]...

The conninfo field only ever matters on the node where it is used.  But
computation of listen paths requires that all nodes have the [from,to]
data.  So there's a partial truth there.  conninfo isn't necessary, but
[from,to] is...

> 2) SUBSCRIBE SET (in 2.0.5+) always gets submitted at the origin.  So
> if you are subscribing multiple sets, slonik will switch event nodes.
> This means that subscribing to multiple sets (with different set
> origins) in parallel will be harder (you will need to disable
> automatic wait-for or use different slonik invocations).  You can
> still do parallel subscribes to the same set because the subscribe
> set always goes to the origin in 2.0.5+, not the provider or the
> receiver.

I have always been a little uncomfortable about this change, and this
underlines that discomfort.  But that doesn't mean I'm right...

> 3) STORE/DROP LISTEN goes to specific nodes based on the arguments,
> but you shouldn't need STORE/DROP LISTEN commands anyway in 1.2 or
> 2.0.

Right.

> 4) CREATE/DROP SET must go to the set origin.
> If you're creating sets, the cluster probably needs to be caught up.

And if these events are lost - due to a FAILOVER partition or such - if
they were only in the partition of the cluster that was lost, it
doesn't matter...

> 5) ADD TABLE/ADD SEQUENCE - must go to the origin.  Again, if you're
> manipulating sets, you must stick to a single set origin or have
> your cluster be caught up.
> 6) MOVE TABLE goes to the origin - but the docs already warn you
> about trying this if your cluster isn't caught up (with respect to
> this set).
> 8) MOVE SET - doing this with a behind cluster is already a bad idea.
> 9) FAILOVER - see the multi-node failover discussion.

There's a mix of needful semantics here.  For instance, SET ADD
TABLE/SEQUENCE only forcibly needs to propagate alongside the
successful propagation of subscriptions to those sets.  That's
different from the propagation needs for other events.

It seems to me that we might want to classify the "propagation needs";
if there are good names for the different classifications, then we're
likely really onto something.  Good names aren't arriving to me on
Friday afternoon :-).

> STORE PATH
> -----------
> A WAIT FOR ALL nodes won't work unless all of the paths are stored.
> When I say 'all', I mean there must exist a route from every node to
> every other node.  The routes don't need to be direct.  There are
> certain common usage patterns that shouldn't be excluded.  It would
> be good if slonik could detect missing paths before 'changing
> things', because otherwise users might be left with a half-complete
> script.

I'd classify this two ways:

a) When bootstrapping a cluster, WAIT FOR ALL can't work if there
aren't enough paths yet.  I'm not sure it makes sense to go to the
extent of computing spanning trees or such to validate this.  If we
try to validate at every point, then you can't have a sequence of...

Set up all nodes...

   INIT CLUSTER
   STORE NODE
   STORE NODE
   STORE NODE

Then, set up paths...
   STORE PATH
   STORE PATH
   STORE PATH
   STORE PATH

It seems like a logical idea to construct a cluster by setting up all
nodes, then to set up communications between them.  It doesn't thrill
me if we make that impossible.

> The easy answer is: don't write scripts that can leave your cluster
> in an indeterminate state.  What we should do if someone tries is an
> open question.  We could a) check that all code paths (cross product)
> leave the cluster consistent/complete, b) assume the try blocks
> always finish successfully, or c) not do the parse tree analysis
> described above for the entire script at parse time, but instead do
> it for each block before entering that block.
>
> I am leaning towards c.

If we're going down a "prevent indeterminate states" road, then it
seems to me there needs to be a presentation of a would-be algebra of
cluster states so we can talk about this analytically.  I think having
that algebra is a prerequisite to deciding between any of those
alternatives.

> - How do we want to handle TRY blocks?  See discussion above.

WAIT FOR and TRY are right well incompatible with each other, unless
we determine, within the algebra, that there is some subset of
commands - ones making state changes that we consider don't need to be
guarded by WAIT FOR - that are permissible in a TRY block.

-- 
select 'cbbrowne' || '@' || 'afilias.info';
Christopher Browne
"Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"
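P.S.  On the spanning-tree remark above: the check slonik would need
before trusting WAIT FOR ALL is just graph reachability over the
declared [from,to] pairs, which is cheap.  A sketch (illustrative
Python, not slonik or slon code; node ids are made up):

```python
# Hypothetical helper: decide whether WAIT FOR ALL can possibly work,
# i.e. whether a route exists from every node to every other node over
# the declared paths.  Paths are directed [from, to] pairs; routes
# don't need to be direct.  `nodes` must list every path endpoint.

def fully_routable(nodes, paths):
    """nodes: iterable of node ids; paths: iterable of (from, to)
    pairs.  True iff every node can reach every other node, possibly
    indirectly."""
    adj = {n: set() for n in nodes}
    for f, t in paths:
        adj[f].add(t)

    def reachable(start):
        # depth-first search over the directed path graph
        seen, stack = {start}, [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return all(reachable(n) == set(adj) for n in adj)
```

Running this over the parse tree's accumulated STORE PATH set before
each WAIT FOR ALL would let slonik complain about missing paths up
front, without forbidding the "all nodes first, then all paths"
construction style.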