Christopher Browne cbbrowne at ca.afilias.info
Fri Nov 19 08:50:12 PST 2010
Stuart Bishop <stuart at stuartbishop.net> writes:
> On Thu, Nov 18, 2010 at 5:06 AM, Christopher Browne
> <cbbrowne at ca.afilias.info> wrote:
>> A thing that several of us have been ruminating over for a while is the
>> problem that people get confused about how, when you submit Slonik
>> scripts, you may have some actions that require waits.
>>
>> For instance if it takes 20 minutes for SUBSCRIBE SET to complete, it's
>> pretty likely that you want to wait for that to be complete before
>> proceeding with other configuration that depends on it.
>>
>> That is already supported today, after a fashion - you 'merely' need to
>> sprinkle your Slonik script with WAIT FOR EVENT requests.
>>
>> But the word 'merely' seems unfair; it is rarely particularly obvious
>> what semantics are appropriate.  (It is frequently not obvious to me,
>> and I have touched a lot of the Slony codebase!)
>>
>> The "obvious" thought which has occurred is to have Slonik commands
>> automatically wait for the appropriate events.  In effect, we'd go thru
>> each Slonik command, and have it automatically call slonik_wait_event()
>> (found in src/slonik/slonik.c), or some refactoring thereof.
>>
>> A few questions and issues occur to me...
>>
>> 1.  Does this seem like a worthwhile exercise?  (Alternatively...  Are
>> there other Much Bigger Issues that should be looked at first?)
>
> I'd love it. I've gotten into the habit of sync/wait after nearly
> every statement to avoid shooting myself in the foot.
>
>> Some of these may need some more functionality - SUBSCRIBE SET generates
>> a pair of events so that it may be necessary to wait for a subsequent
>> event, and perhaps to request synthesizing a SYNC, and waiting for
>> *that*.
>
> Which is exactly why I do sync/wait. I never know what node I should
> be waiting for confirmation from for a particular statement, so just
> shove a sync through the system and wait for my entire cluster to
> process it.

I can't offer a straight answer on that, which is to say that you're
onto something :-).
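
As a rough illustration of the sync/wait habit described above, here is
a minimal Python sketch that appends a SYNC and a WAIT FOR EVENT after
each statement of a Slonik script.  The function name add_sync_wait, the
choice of node 1 as the event node, and the sample SUBSCRIBE SET
statement are assumptions made for illustration; none of this is part of
Slonik itself.

```python
def add_sync_wait(statements, event_node=1):
    """Return Slonik script text with a SYNC/WAIT pair after each statement.

    `statements` is a list of Slonik commands; a trailing semicolon on
    each one is optional.  `event_node` is the node asked to generate the
    SYNC (an assumption: use a node your admin conninfo can reach).
    """
    lines = []
    for stmt in statements:
        # Normalise the statement so it ends with exactly one semicolon.
        lines.append(stmt.rstrip().rstrip(";") + ";")
        # Push a SYNC through the cluster from event_node ...
        lines.append(f"sync (id = {event_node});")
        # ... and block until every node has confirmed it.
        lines.append(
            f"wait for event (origin = {event_node}, confirmed = all, "
            f"wait on = {event_node});"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(add_sync_wait([
        "subscribe set (id = 1, provider = 1, receiver = 2, forward = yes)",
    ]))
```

The output still needs the usual preamble (cluster name and admin
conninfo lines) before it can be fed to slonik.  It is the blunt
"wait for everybody" approach, not the per-command semantics being
discussed in this thread.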

>> 4.  Do we want overrides?
>>
>> Perhaps some might want the ability to revert to today's functionality,
>> so one can run a "fire and forget" series of SUBSCRIBE SET requests.
>
> How do you know what statements you can "fire and forget"? Is this
> something guaranteed to not change between Slony releases? It's
> something that has never been clear to me and I just don't risk it any
> more.

Nobody goes in looking to introduce incompatibilities, but if we
guarantee "it won't change between releases," that might guarantee we
can't make improvements, so I expect that's a non-starter.

Again, methinks you're onto something...

>> 6.  Do we possibly need for there to be a way to force aborting a script
>> if the WAIT times out?
>
> I'd be more interested in knowing why something is blocked than having
> my scripts abort midway, leaving my system in an indeterminate state.

Yep, that's a larger issue that lurks there.  Probably this needs to be
thought about a bit more before deciding on solutions.

>> 7.  I think this invalidates TRY { } ON ERROR { } ON SUCCESS { }
>>    handling, for the most part.
>>
>>    At the very least, if we're waiting for things to succeed on a
>>    remote node, it invalidates the notion that we're performing the
>>    contents of the TRY block as a single transaction on the initial
>>    node.
>>
>>    It's not particularly obvious how TRY requests get grouped into a
>>    single "transaction" anyways.  Perhaps this points at there being
>>    something invalid/broken about TRY.
>
> Try seems broken to me. If it only works in limited circumstances then
> attempting to use non-transactional commands inside it should fail
> (whatever they are - how do you know?). We might be able to get things
> working in a properly transactional manner if we used two-phase
> commit.

I think every year I've talked about Slony at PGCon or similar, there
has been a brief discussion about 2PC, which hasn't headed towards "we
want to use it."

I wonder if we could set it up to fail if one requests unsuitable
commands in the block.  That in effect requires clearly documenting the
delineation between suitable and unsuitable commands (to the point of
expressing it as part of the compiler grammar!).

[Adding this...]

>> 8.  This is possibly not totally friendly towards tools like pgAdmin,
>>    which, I think, presently take the approach that all you need do
>>    to configure Slony clusters is to call the stored functions.
>
> I suspect tools that talk directly to the stored functions do so due
> to limitations in slonik, which you are attempting to address. I've
> been considering dropping slonik and going direct to stored SQL
> myself, but that too doesn't seem clear to me (what statements need to
> be run on what node in what order waiting for what confirmations).

No, we were hoping that a GUI might emerge via people coding to talk to
the stored functions.  That's a feature, not a bug, or at least, that's
what was hoped...

There are a few operations (FAILOVER comes particularly to mind) where
it's necessary to talk to several nodes, so that it's problematic to
just "fire a stored function."

Operations that are rather "thicker" than just a wrapper around a stored
function are:

- EXECUTE SCRIPT
   - Splits DDL into a series of statements

- WAIT FOR EVENT
   - Checks multiple nodes

- STORE NODE
   - Pulls configuration from source node to copy to new node

- FAILOVER
   - Has rather a lot of logic!

FYI, I'm *very* pleased to see discussion continue on this thread; the
valuable thing is for ideas to fall out of it, and they surely are doing
so.  I don't know what will get done out of it, but ideas are needed in
order to have a "so what next?"
-- 
(reverse (concatenate 'string "ofni.sailifa.ac" "@" "enworbbc"))
Christopher Browne
"Bother,"  said Pooh,  "Eeyore, ready  two photon  torpedoes  and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"


More information about the Slony1-general mailing list