Steve Singer ssinger at ca.afilias.info
Mon Apr 26 13:46:07 PDT 2010
Jan Wieck wrote:

> 
> That copy_set() failed due to the catalog inconsistency. What Jaime 
> tried then was an UNSUBSCRIBE SET, which slonik issued against the half 
> subscribed node 2, deleting the sl_subscribe row. The code in copy_set() 
> doesn't use the parameters from the event, but expects the in-memory 
> runtime configuration data to know the data provider for the set. Since 
> the sl_subscribe row is gone now, that information is missing, and -1 
> is the default value for a set the node isn't subscribed to.
> 
> I don't know exactly what the right fix for this bug is. My first gut 
> feeling is to ignore the ENABLE_SUBSCRIPTION and generate another 
> UNSUBSCRIBE_SET event just to clear out any sl_subscribe row existing in 
> the cluster. Since I am in Toronto right now, I can discuss this with 
> Steve Singer tomorrow morning.

The approaches that come to mind are:

1) When slon processes an ENABLE_SUBSCRIPTION but is unable to find the 
sl_subscribe row, log a warning and either continue on or do whatever 
cleanup is required to ensure the set really is unsubscribed. There 
might be other commands we want to do this for as well (move set? 
merge set?)

2) Modify things so that the UNSUBSCRIBE action won't get processed on 
the subscriber if there is an inactive subscription that has been stored 
but not yet enabled.

3) Modify the flow of unsubscribes so they get inserted into the event 
queue of the origin and are processed in order.  The problem with this 
is that if your set is already subscribed but your origin loses 
communication with the subscriber (or your subscriber is really far 
behind, say because every row in a large table was updated), your 
unsubscribe request won't be processed until the subscriber has caught 
up (which serves little purpose, since you are unsubscribing anyway).
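
To make approach 1 a bit more concrete, here is a toy model in Python. 
It is not Slony code (slon itself is written in C), and the names 
NO_PROVIDER, runtime_config and handle_enable_subscription are made up 
for illustration. The only details taken from the discussion above are 
that the provider is looked up in the in-memory configuration and that 
-1 means "not subscribed".

```python
import logging

log = logging.getLogger("slon_sketch")

NO_PROVIDER = -1  # default provider for a set the node is not subscribed to


def handle_enable_subscription(runtime_config, set_id, copy_set):
    """Toy handler for ENABLE_SUBSCRIPTION (approach 1).

    runtime_config: hypothetical dict mapping set id -> provider node id,
                    standing in for the in-memory runtime configuration.
    copy_set:       hypothetical callable (set_id, provider) doing the copy.
    Returns True if the copy ran, False if the event was skipped.
    """
    provider = runtime_config.get(set_id, NO_PROVIDER)
    if provider == NO_PROVIDER:
        # The subscription is gone, for example because an UNSUBSCRIBE SET
        # was processed first. Warn and skip instead of failing in the copy.
        log.warning(
            "ENABLE_SUBSCRIPTION for set %d: no subscription found, skipping",
            set_id,
        )
        # Cleanup step: make sure no stale entry for the set is left behind.
        runtime_config.pop(set_id, None)
        return False
    copy_set(set_id, provider)
    return True
```

The point of the sketch is only the guard: the handler checks for the 
missing provider before attempting the copy, rather than passing -1 on.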

I'm inclined to say that 2 is the correct solution.  However, if you do 
a subscribe and your copy set fails (as happened to Jaime), there is 
then no easy way to back out of the subscription.  I think we also need 
a way of safely removing some commands from the slony command queue (a 
2.1 feature maybe).
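
A rough sketch of approach 2 and its drawback, again as a toy Python 
model rather than actual Slony code. The Subscription class, the 
handle_unsubscribe function and its return strings are all hypothetical; 
the only assumption carried over from the discussion is that a 
subscription can be stored in an inactive state before it is enabled.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    provider: int
    active: bool  # False: stored but not yet enabled


def handle_unsubscribe(subscriptions, set_id):
    """Toy handler for UNSUBSCRIBE on the subscriber (approach 2).

    subscriptions: hypothetical dict mapping set id -> Subscription.
    Returns "removed", "deferred" or "not_subscribed".
    """
    sub = subscriptions.get(set_id)
    if sub is None:
        return "not_subscribed"
    if not sub.active:
        # A pending ENABLE_SUBSCRIPTION still needs this entry, so the
        # unsubscribe is not processed yet.
        return "deferred"
    del subscriptions[set_id]
    return "removed"
```

The "deferred" branch is where the drawback shows up: if the copy keeps 
failing, the subscription never becomes active and the unsubscribe never 
goes through, which is why some way of removing queued commands would 
still be needed.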

> 
> Thank you Jaime. Your patience on this matter helped to track down a 
> very nasty bug that apparently had been lingering in the system for a 
> long time.
> 
> 
> Jan
> 


-- 
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142

