3. Slon daemons

The programs that actually perform Slony-I replication are the slon daemons.

You need to run one slon instance for each node in a Slony-I cluster, whether you consider that node a "master" or a "slave" (a minimal per-node configuration sketch follows the list below). On Windows™, things are slightly different when running as a service: one slon service is installed, and a separate configuration file is registered for each node to be serviced by that machine; the main service then manages the individual slons itself. Since a MOVE SET or FAILOVER can switch the roles of nodes, slon needs to be able to function for both providers and subscribers. It is not essential that these daemons run on any particular host, but there are some principles worth considering:

  • Each slon needs to be able to communicate quickly with the database whose "node controller" it is. Therefore, if a Slony-I cluster runs across some form of Wide Area Network, each slon process should run on or near the database it is controlling. If you break this rule, no particular disaster should ensue, but the added latency in monitoring events on the slon's "own node" will cause it to replicate somewhat less promptly.

  • The very fastest results would be achieved by having each slon run on the database server that it is servicing. If it runs somewhere within a fast local network, performance will not be noticeably degraded.

  • It is an attractive idea to run many of the slon processes for a cluster on one machine, as this makes it easy to monitor them, via both log files and process tables, from one location. It also eliminates the need to log in to several hosts in order to look at log files or to restart slon instances.
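
For instance, a two-node cluster could be serviced by one slon per node, each reading its own runtime configuration file and running on or near the database it manages. This is purely an illustrative sketch; the cluster name, file names, and connection strings below are hypothetical:

    # node1.conf
    cluster_name = 'movies'
    conn_info = 'host=db1 dbname=movies user=slony'

    # node2.conf
    cluster_name = 'movies'
    conn_info = 'host=db2 dbname=movies user=slony'

Each file is then handed to its own slon instance, e.g. slon -f node1.conf on (or near) the first database host and slon -f node2.conf on (or near) the second.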

Warning

Do not run a slon that is responsible for servicing a particular node across a WAN link if at all possible. Any problem with that link can kill the slon's connection whilst leaving "zombied" database connections on the node that (typically) will not die off for around two hours. This prevents starting up another slon, as described in the FAQ under multiple slon connections.

Historically, slon processes have been fairly fragile, dying if they encountered just about any significant error. This behaviour mandated running some form of "watchdog" to make sure that if one slon fell over, it was replaced by another.

There are two "watchdog" scripts currently available in the Slony-I source tree:

  • tools/altperl/slon_watchdog - an "early" version that basically wraps a loop around the invocation of slon, restarting it any time it falls over (a minimal sketch of this approach follows the list)

  • tools/altperl/slon_watchdog2 - a somewhat more intelligent version that periodically polls the database, checking to see if a SYNC has taken place recently. We have had VPN connections that occasionally fall over without signalling the application, so that the slon stops working, but doesn't actually die; this polling addresses that issue.
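
The restart-loop approach of the first script can be boiled down to a few lines. The following is a minimal sketch in Python rather than the Perl of the actual altperl scripts; the slon binary path, configuration file, and log location are hypothetical:

    #!/usr/bin/env python
    # Minimal restart-loop watchdog in the spirit of tools/altperl/slon_watchdog.
    # Paths below are hypothetical; adjust them for your installation.
    import subprocess
    import time

    SLON = "/usr/local/slony1/bin/slon"     # hypothetical slon binary
    CONF = "/etc/slony1/node1.conf"         # hypothetical per-node configuration file
    LOG  = "/var/log/slony1/node1.log"      # hypothetical log destination

    while True:
        with open(LOG, "a") as log:
            # Run slon in the foreground and block until it exits for any reason.
            subprocess.call([SLON, "-f", CONF], stdout=log, stderr=subprocess.STDOUT)
        # slon fell over; pause briefly, then start a fresh instance.
        time.sleep(10)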

The slon_watchdog2 script is usually the preferable one to run. At one point it was unwise to run it whilst subscribing a very large replication set, where the COPY SET (the main event that processes a SUBSCRIBE SET request) could be expected to take many hours: the script would conclude that, since no SYNC had taken place in two hours, something was broken and slon needed restarting, thereby restarting the COPY SET event. More recently, the script has been changed to detect a COPY SET in progress.
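
The heart of that polling check is a query against the node's event log. The sketch below is illustrative only, and again in Python rather than Perl; it assumes a cluster named movies (so the Slony-I catalogs live in the _movies schema), node 1 as the watched node, a hypothetical connection string, and the two-hour threshold mentioned above. A real watchdog would restart slon when the check fails, and would suspend the check while a long COPY SET is underway:

    #!/usr/bin/env python
    # Rough staleness check in the spirit of tools/altperl/slon_watchdog2: ask the
    # node how long ago its most recent SYNC event originated.  The cluster name,
    # node id, connection string, and threshold are all hypothetical.
    import psycopg2

    CONN_INFO = "host=db1 dbname=movies user=slony"
    CLUSTER   = "movies"       # Slony-I catalogs live in schema "_" + cluster name
    NODE_ID   = 1              # the node this watchdog is responsible for
    THRESHOLD = 2 * 60 * 60    # seconds without a SYNC before slon is presumed stuck

    def seconds_since_last_sync():
        conn = psycopg2.connect(CONN_INFO)
        try:
            cur = conn.cursor()
            query = ("SELECT EXTRACT(EPOCH FROM (now() - max(ev_timestamp))) "
                     'FROM "_{0}".sl_event '
                     "WHERE ev_origin = %s AND ev_type = 'SYNC'").format(CLUSTER)
            cur.execute(query, (NODE_ID,))
            (age,) = cur.fetchone()
            return age             # None if the node has never generated a SYNC
        finally:
            conn.close()

    if __name__ == "__main__":
        age = seconds_since_last_sync()
        if age is None or age > THRESHOLD:
            # A real watchdog would restart slon here -- but, per the caveat above,
            # the check must be relaxed while a large COPY SET is in progress, or
            # the subscription will be restarted over and over again.
            print("slon looks stuck; last SYNC was %s seconds ago" % age)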

In Slony-I version 1.2, the structure of slon has been revised fairly substantially to make it much less fragile. The main process should only die off if you expressly signal it, asking that it be killed.

A new approach is available in the Section 21.4 script which uses slon configuration files and which may be invoked as part of your system startup process.