Jaime Casanova jcasanov at systemguards.com.ec
Mon Apr 26 13:55:21 PDT 2010
On Mon, Apr 26, 2010 at 3:46 PM, Steve Singer <ssinger at ca.afilias.info> wrote:
> Jan Wieck wrote:
>
>>
>> That copy_set() failed due to the catalog inconsistency. What Jaime
>> tried then was an UNSUBSCRIBE SET, which slonik issued against the half
>> subscribed node 2, deleting the sl_subscribe row. The code in copy_set()
>> doesn't use the parameters from the event, but expects the in-memory
>> runtime configuration data to know the data provider for the set. Since
>> the sl_subscribe row is gone now, that information is missing and the -1
>> is the default value for a set the node isn't subscribed to.
>>
>> I don't know exactly what the right fix for this bug is. My first gut
>> feeling is to ignore the ENABLE_SUBSCRIPTION and generate another
>> UNSUBSCRIBE_SET event just to clear out any sl_subscribe row existing in
>> the cluster. Since I am in Toronto right now, I can discuss this with
>> Steve Singer tomorrow morning.
>
> The approaches that come to mind are:
>
> 1) When slon processes an ENABLE_SUBSCRIPTION but is unable to find the
> sl_subscribe row, log a warning and either continue on or do some cleanup
> if required to ensure the set really is unsubscribed. There might be
> other commands we want to do this for as well (move set? merge set?)
>

the cleanup part seems safer than just continuing... actually, that's
what i thought would happen, but instead i see it continually trying to
complete the subscription and failing again and again... maybe a max
number of retries before giving up and cleaning up?
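
A rough sketch of the retry cap idea, in Python only for illustration.
Every name here (enable_subscription, copy_set, cleanup, MAX_RETRIES) is
a hypothetical stand-in, not slon's actual code, and the cap of 3 is an
arbitrary assumption:

```python
# Hypothetical sketch: none of these names are slon's real functions.
MAX_RETRIES = 3  # assumed cap; no specific number has been proposed


def enable_subscription(copy_set, cleanup, max_retries=MAX_RETRIES):
    """Try copy_set() up to max_retries times, then clean up and give up.

    Returns True if the copy succeeded, False if we gave up and cleaned up.
    """
    for attempt in range(1, max_retries + 1):
        try:
            copy_set()
            return True
        except RuntimeError as exc:
            # Log the failure and fall through to the next attempt.
            print(f"WARN: copy_set attempt {attempt}/{max_retries} failed: {exc}")
    # Every attempt failed: remove the half-finished subscription
    # instead of retrying forever.
    cleanup()
    return False
```

The point is only that the loop is bounded and that cleanup runs once,
after the last failed attempt, rather than never.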

> 2) Modify things so that the UNSUBSCRIBE action won't get processed on
> the subscriber if there is an inactive subscription that has been stored
> but not yet enabled.
>

+1
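
As i read option 2, the subscriber would check the state of the
subscription before acting on the unsubscribe. A minimal sketch,
assuming a hypothetical Subscription record with an "active" flag
(illustrative only, not the real catalog layout or slon code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscription:
    # Hypothetical in-memory view of one subscription.
    set_id: int
    receiver: int
    active: bool  # False means stored but not yet enabled


def should_process_unsubscribe(sub: Optional[Subscription]) -> bool:
    """Return True only when the subscriber may act on an UNSUBSCRIBE."""
    if sub is None:
        # Assumption: with no subscription recorded there is nothing to do.
        return False
    # An inactive (stored, not yet enabled) subscription is left alone.
    return sub.active
```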

> 3) Modify the flow of unsubscribes so they get inserted into the event
> queue of the origin and are processed in order.  Problems with this
> include that if your set is already subscribed but your origin loses
> communication with the subscriber (or if your subscriber is really far
> behind say because every row in a large table was updated) your
> unsubscribe request won't be processed until the subscriber is caught up
> (which serves little point because you are unsubscribing).
>

agreed, that doesn't seem very useful...

> I'm inclined to say that 2 is the correct solution.  However, if you do
> a subscribe and your copy set fails (as happened to Jaime) there is no
> easy way to not subscribe.

why is that?

> I think we also need a way of safely
> removing some commands from the slony command queue (a 2.1 feature maybe).
>

mmm... a safe way of shooting yourself in the foot? yeah i like it ;)

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157