Christopher Browne cbbrowne at ca.afilias.info
Wed Nov 17 14:06:08 PST 2010
A thing that several of us have been ruminating over for a while is the
problem that people get confused about how to submit Slonik scripts when
some of the actions involved require waits.

For instance, if it takes 20 minutes for SUBSCRIBE SET to complete, it's
pretty likely that you want to wait for it to finish before proceeding
with other configuration that depends on it.

That is already supported today, after a fashion - you 'merely' need to
sprinkle your Slonik script with WAIT FOR EVENT requests.

But the word 'merely' seems unfair; it is rarely particularly obvious
what semantics are appropriate.  (It is frequently not obvious to me,
and I have touched a lot of the Slony codebase!)
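
For concreteness, the sort of "sprinkling" involved today looks
something like the following (node and set numbers invented purely for
illustration):

  # Subscribe node 2 to set 1, then block until node 2 has confirmed
  # the subscription event from node 1 before issuing anything that
  # depends on it.
  subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
  wait for event (origin = 1, confirmed = 2, wait on = 1);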

The "obvious" thought that has occurred is to have Slonik commands
automatically wait for the appropriate events.  In effect, we'd go
through each Slonik command, and have it automatically call
slonik_wait_event() (found in src/slonik/slonik.c), or some refactoring
thereof.

A few questions and issues occur to me...

1.  Does this seem like a worthwhile exercise?  (Alternatively...  Are
there other Much Bigger Issues that should be looked at first?)

2.  This obviously requires going through and determining a reasonable
default "wait policy" for each of the Slonik commands.

It's not a huge list:
  <http://slony.info/documentation/2.0/cmds.html>

Presumably we could analyze this in a tabular fashion:
create table wait_policy (command text, origin text, confirmed text,
wait_on text);

foo@localhost->  select * from wait_policy ;
    command    |   origin   | confirmed  |  wait_on
---------------+------------+------------+------------
 echo          | none       | none       | none
 exit          | none       | none       | none
 init cluster  | none       | none       | none
 store node    | none       | none       | none
 drop node     | event node | event node | event node
 subscribe set | provider   | receiver   | receiver
(6 rows)

Which obviously needs to get extended a bit :-).

Some of these may need some more functionality - SUBSCRIBE SET generates
a pair of events, so it may be necessary to wait for a subsequent event,
and perhaps to request synthesizing a SYNC and then wait for *that*.
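
For instance, one pattern that might fall out of that (with made-up
node numbers again) would be something like:

  subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
  # Generate a fresh SYNC on the origin; the subscriber won't confirm
  # it until the initial data copy has finished, so waiting for it
  # amounts to waiting for the subscription to be usable.
  sync (id = 1);
  wait for event (origin = 1, confirmed = 2, wait on = 1);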

Complications are a given, but it seems reasonable to list everything,
handle the simple items simply, and list (and handle) the complications.

3.  How to integrate this in?

It seems to me we'd take whatever falls out of #2, and make sure there
is a wait_for_whatever_needs_waiting() function (and possibly helpers)
so that the implementation can frequently amount to adding a single line
of code to the functions in src/slonik/slonik.c that implement the
Slonik commands.

As already mentioned, there may be some complications, and hence some
extra functions to help handle the "uglier" scenarios.

4.  Do we want overrides?

Perhaps some might want the ability to revert to today's functionality,
so one can run a "fire and forget" series of SUBSCRIBE SET requests.

I think this could be handled by adding an extra option to Slonik
commands, to suppress waiting:
   NOWAIT

Thus...

   subscribe set (id=1,provider=3,receiver=4,forward=yes,nowait);
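
That would make a "fire and forget" batch look something like this
(nowait being, of course, entirely hypothetical at this point):

   # Hypothetical: queue up several subscriptions without blocking on
   # any of them individually...
   subscribe set (id=1, provider=3, receiver=4, forward=yes, nowait);
   subscribe set (id=1, provider=3, receiver=5, forward=yes, nowait);
   subscribe set (id=1, provider=3, receiver=6, forward=yes, nowait);
   # ...and then, if desired, wait once at the end for things to
   # settle.
   wait for event (origin = all, confirmed = all, wait on = 3);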

5.  One might go further down the path of #4, and have a series of
options:

  -- Default behaviour:
  drop set(id=3, origin=4);
  -- Also default behaviour, which makes this option pretty worthless
  drop set(id=3, origin=4, wait policy=default);
  -- More complex, and possibly frivolous
  drop set(id=3, origin=4, wait origin=3, wait confirm=4, wait on=1,
           timeout=300);

6.  Do we perhaps need a way to force aborting a script if the WAIT
times out?

   This presumably means Still More Options added to Slonik commands.
   That's not notably horrible - the logic for processing the abort need
   only be written once.

7.  I think this invalidates TRY { } ON ERROR { } ON SUCCESS { }
    handling, for the most part.

    At the very least, if we're waiting for things to succeed on a
    remote node, it invalidates the notion that we're performing the
    contents of the TRY block as a single transaction on the initial
    node.

    It's not particularly obvious how TRY requests get grouped into a
    single "transaction" anyway.  Perhaps this points to something
    invalid/broken about TRY.
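
    To illustrate the tension, a typical TRY block today looks
    something like this (set/table names and IDs invented):

       try {
         create set (id = 2, origin = 1, comment = 'second set');
         set add table (set id = 2, origin = 1, id = 10,
                        fully qualified name = 'public.foo');
       }
       on error {
         echo 'could not build set 2';
         exit 1;
       }
       on success {
         echo 'set 2 built';
       }

    If CREATE SET were to silently wait for remote confirmations
    before returning, the idea that both statements execute as a
    single transaction on node 1 no longer holds; indeed, waiting
    inside an uncommitted transaction would presumably hang forever,
    since the event can't propagate until the transaction commits.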

8.  This is possibly not totally friendly towards tools like pgAdmin,
    which, I think, presently take the approach that all you need to do
    to configure Slony clusters is to call the stored functions.

    But perhaps I'm tilting at a windmill here.
-- 
let name="cbbrowne" and tld="afilias.info" in name ^ "@" ^ tld;;
Christopher Browne
"Bother,"  said Pooh,  "Eeyore, ready  two photon  torpedoes  and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"

