Steve Singer ssinger at ca.afilias.info
Fri Oct 19 06:46:18 PDT 2012
On 12-10-17 09:38 PM, Joe Conway wrote:

>
> But the fact that failover seems so fragile is troubling. If it fails to
> failover so often in our "migration" tests, why should we think that it
> won't fail to failover when we really need it? Is failover fragile
> because we need to STONITH before doing the failover? Would that prevent
> these race conditions?

Yes, failover is troubling; it has been on the slony concern list for over 
two years.  I am hopeful that the changes in 2.2 address much of this, 
but we will see.

We uncovered many different race conditions and issues when we looked 
into failover.  Some get solved by a STONITH but others don't. I have 
seen real world failovers not work properly, but the clusters were 
repairable with manual intervention.  The issues I have seen affect the 
slony configuration, not user data, so as a last resort slony can be 
reinstalled.

My impression is that failover tends to work pretty well in clusters 
with a single set and only one provider.  More complicated cluster 
configurations tend to expose a wider range of issues.  In the work I 
did for 2.2 I determined that some types of cluster configurations are 
incompatible with a safe failover.  In 2.2 we introduced a view 
sl_failover_targets to help with this.





>
>> If you're going to move forward with Jan's idea of provisioning a box with
>> both slony 2.1.0 and slony 2.1.2 (I am not convinced that the failover
>> bug you hit is fixed in 2.1.2/ is #260 ) you will need to put two
>> versions of slony on the same machine.  A 2.2 feature we added puts the
>> slony version number inside of the slony_funcs.so filename to make this
>> work nicely.  We have back-ported this to the 2.1 branch here
>> https://github.com/ssinger/slony1-engine/tree/REL_2_1_STABLE_multiversion
>>
>> I've built 2.1.1 and 2.1.2 versions from the multiversion branch, these
>> should install on the same system as your existing 2.1.0 binaries. I
>> also have RPM spec files that allow installing multiple slony versions
>> at the same time. Your policies probably also prevent you from deploying
>> code on a new machine from a random github branch, so this might not be
>> much help.
>
> I'm not sure how much that helps but I appreciate the pointers as we may
> ultimately need them :-)
>
> Joe
>
>



More information about the Slony1-hackers mailing list