[Slony1-general] Long timeout during slony fail over

Tue Feb 5 02:25:36 PST 2008

Hi

I am using the Slony replication system for a PostgreSQL database to keep the database in sync between two machines: the active (master) host and the backup (slave) host. If the active host goes down for any reason, the backup host takes control and becomes the active host. During this switch-over from backup to active, the failover script is executed on the backup machine. The script attempts to contact the peer host (which is now down) and gives up only after about 20 seconds. I would like to reduce this timeout to speed up the switch-over.

I haven't been able to find a way to configure this 20-second timeout in Slony. Instead, I was advised to decrease the operating system's number of TCP SYN retries (net.ipv4.tcp_syn_retries in /etc/sysctl.conf). Reducing this value to 0 led to a timeout of ~5 seconds, a value of 1 led to ~10 seconds, and so on, so each retry seems to add about 5 seconds. However, I feel that this parameter may have broader implications for the system, and I would like to know whether there is a way to control the timeout within Slony itself.
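
For reference, the setting described above would look roughly like this in /etc/sysctl.conf. This is only a sketch of the workaround, not a recommended value: the parameter name and file are the ones mentioned above, the value 0 is the one from the first test, and the setting applies to every outgoing TCP connection on the host, not only to Slony.

```
# /etc/sysctl.conf
# Number of times the kernel retransmits the initial SYN for an
# outgoing TCP connection before giving up (system-wide setting).
# 0 is the value from the first test above; it is not a recommendation.
net.ipv4.tcp_syn_retries = 0
```

Running `sysctl -p` as root reloads the file so the change takes effect without a reboot.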

Regards,

J 
