6. Slony-I Maintenance

Slony-I actually does a lot of its necessary maintenance itself, in a "cleanup" thread:

  • Deletes old data from various tables in the Slony-I cluster's namespace, notably entries in sl_log_1, sl_log_2, and sl_seqlog.

  • Vacuums certain tables used by Slony-I. As of 1.0.5, this includes pg_listener; in earlier versions, you must vacuum that table frequently yourself, otherwise you will find replication slowing down, because Slony-I raises plenty of events, which leaves that table full of dead tuples.

    In some versions (1.1, for sure; possibly 1.0.5) there is the option of not bothering to vacuum any of these tables if you are using something like pg_autovacuum to handle vacuuming of these tables. Unfortunately, it has been quite possible for pg_autovacuum to not vacuum quite frequently enough, so you may prefer to use the internal vacuums. Vacuuming pg_listener "too often" isn't nearly as hazardous as not vacuuming it frequently enough.

    Unfortunately, if you have long-running transactions, vacuums cannot clear out dead tuples that are newer than the eldest transaction that is still running. This will most notably lead to pg_listener growing large and will slow replication.

  • The Duplicate Key Violation bug has helped track down a number of rather obscure PostgreSQL race conditions, so that in modern versions of Slony-I and PostgreSQL, there should be little to worry about.

  • As of version 1.2, "log switching" functionality is in place; every so often (by default, once per week, though you may induce it by calling the stored function logswitch_start(), as shown in the example after this list), Slony-I switches between storing data in sl_log_1 and sl_log_2 so that the "elder" log table can be TRUNCATEd.

    That means that these tables are completely cleared out on a regular basis, so you will not suffer from them growing to a significant size under heavy load and then being unable to shrink back down.

    In version 2.0, DELETE is no longer used to clear out data in sl_log_1 and sl_log_2; instead, the log switch logic is induced frequently, every time the cleanup loop does not find a switch in progress, and these tables are purely cleared out via TRUNCATE. This eliminates the need to vacuum these tables.
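
If you want to induce a log switch by hand (for instance, ahead of a maintenance window), you can call the stored function directly. The following is only a sketch; it assumes the cluster is named MyCluster, so that its namespace is "_MyCluster", as in the autovacuum example below.

select "_MyCluster".logswitch_start();
-- The cleanup thread then completes the switch and TRUNCATEs the retired
-- log table once everything in it has been replicated and confirmed.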

6.1. Interaction with PostgreSQL autovacuum

Recent versions of PostgreSQL support an "autovacuum" process which notices when tables are modified, thereby creating dead tuples, and vacuums those tables, "on demand." It has been observed that this can interact somewhat negatively with Slony-I's own vacuuming policies on its own tables.

Slony-I requests vacuums on its tables immediately after completing the transactions that are expected to clean out old data, which should be the ideal time to do so. Autovacuum, however, may notice the changes a bit earlier and attempt its vacuum while those transactions are still incomplete, rendering the work pretty useless. It therefore seems preferable to configure autovacuum to avoid vacuuming the Slony-I-managed configuration tables.

The following query (change the cluster name to match your local configuration) will identify the tables that autovacuum should be configured not to process:

mycluster=# select oid, relname from pg_class where relnamespace = (select oid from pg_namespace where nspname = '_' || 'MyCluster') and relhasindex;
  oid  |   relname    
-------+--------------
 17946 | sl_nodelock
 17963 | sl_setsync
 17994 | sl_trigger
 17980 | sl_table
 18003 | sl_sequence
 17937 | sl_node
 18034 | sl_listen
 18017 | sl_path
 18048 | sl_subscribe
 17951 | sl_set
 18062 | sl_event
 18069 | sl_confirm
 18074 | sl_seqlog
 18078 | sl_log_1
 18085 | sl_log_2
(15 rows)

The following command will populate pg_catalog.pg_autovacuum with suitable configuration information:

INSERT INTO pg_catalog.pg_autovacuum
       (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit,
        freeze_min_age, freeze_max_age)
SELECT oid, 'f', -1, -1, -1, -1, -1, -1, -1, -1
  FROM pg_catalog.pg_class
 WHERE relnamespace = (SELECT oid FROM pg_namespace
                        WHERE nspname = '_' || 'MyCluster')
   AND relhasindex;
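
To double-check which tables ended up disabled, something along the following lines can be run afterward (this verification query is merely a suggestion, not part of Slony-I itself):

select vacrelid::regclass as table_name, enabled
  from pg_catalog.pg_autovacuum
 order by vacrelid;
-- each Slony-I table listed earlier should show enabled = 'f'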

6.2. Watchdogs: Keeping Slons Running

There are a couple of "watchdog" scripts available that monitor things, and restart the slon processes should they happen to die for some reason, such as a network "glitch" that causes loss of connectivity.

You might want to run them...

The "best new way" of managing slon processes is via the combination of Section 21.2, which creates a configuration file for each node in a cluster, and Section 21.4, which uses those configuration files.

This approach is preferable to elder "watchdog" systems in that you can very precisely "nail down," in each config file, the exact desired configuration for each node, and not need to be concerned with what options the watchdog script may or may not give you. This is particularly important if you are using log shipping, where forgetting the -a option could ruin your log shipped node, and thereby your whole day.

6.3. Parallel to Watchdog: generate_syncs.sh

A new script for Slony-I 1.1 is generate_syncs.sh, which addresses the following kind of situation.

Suppose you have a possibly-flaky server on which the slon daemon might not run all the time; you might return from a weekend away only to discover the following situation.

On Friday night, something went "bump" and while the database came back up, none of the slon daemons survived. Your online application then saw nearly three days worth of reasonably heavy transaction load.

When you restart slon on Monday, it hasn't done a SYNC on the master since Friday, so that the next "SYNC set" comprises all of the updates between Friday and Monday. Yuck.

If you run generate_syncs.sh as a cron job every 20 minutes, it will force in a periodic SYNC on the origin, which means that between Friday and Monday, the numerous updates are split into more than 100 syncs, which can be applied incrementally, making the cleanup a lot less unpleasant.

Note that if SYNCs are running regularly, this script won't bother doing anything.

6.4. Testing Slony-I State

In the tools directory, you will find Section 5.1 scripts called test_slony_state.pl and test_slony_state-dbi.pl. One uses the Perl/DBI interface; the other uses the Pg interface.

Both do essentially the same thing, namely to connect to a Slony-I node (you can pick any one), and from that, determine all the nodes in the cluster. They then run a series of queries (read only, so this should be quite safe to run) which examine various Slony-I tables, looking for a variety of sorts of conditions suggestive of problems, including:

  • Bloating of tables like pg_listener, sl_log_1, sl_log_2, sl_seqlog

  • Listen paths

  • Analysis of Event propagation

  • Analysis of Event confirmation propagation

    If communications are somewhat broken, replication may still happen, but confirmations may not make it back, which prevents nodes from clearing out old events and old replication data.

Running this once an hour or once a day can help you detect symptoms of problems early, before they lead to performance degradation.
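
Between runs of those scripts, you can also do a rough manual spot check of the tables most prone to bloat. This is only a sketch; note that relpages and reltuples are refreshed only by VACUUM and ANALYZE, so treat the numbers as approximate.

select relname, relpages, reltuples
  from pg_class
 where relname in ('pg_listener', 'sl_log_1', 'sl_log_2', 'sl_seqlog')
 order by relpages desc;
-- If pg_listener or the log tables keep growing, look for long-running
-- transactions (on PostgreSQL 8.3 and later, pg_stat_activity.xact_start
-- shows when each backend's current transaction began).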

6.5. Replication Test Scripts

In the directory tools may be found four scripts that may be used to do monitoring of Slony-I instances:

  • test_slony_replication is a Perl script to which you can pass connection information to get to a Slony-I node. It then queries sl_path and other information on that node in order to determine the shape of the requested replication set.

    It then injects some test queries into a test table called slony_test, which is defined as follows and which needs to be added to the set of tables being replicated (a manual version of this sort of probe is sketched after this list):

    CREATE TABLE slony_test (
        description text,
        mod_date timestamp with time zone,
        "_Slony-I_testcluster_rowID" bigint DEFAULT nextval('"_testcluster".sl_rowid_seq'::text) NOT NULL
    );

    The last column in that table was added by Slony-I because the table lacks a primary key of its own...

    This script generates a line of output for each Slony-I node that is active for the requested replication set in a file called cluster.fact.log.

    There is an additional finalquery option that allows you to pass in an application-specific SQL query that can determine something about the state of your application.

  • log.pm is a Perl module that manages logging for the Perl scripts.

  • run_rep_tests.sh is a "wrapper" script that runs test_slony_replication.

    If you have several Slony-I clusters, you might set up configuration in this file to connect to all those clusters.

  • nagios_slony_test is a script that was constructed to query the log files so that you might run the replication tests every so often (we run them every 6 minutes), and then a system monitoring tool such as Nagios can be set up to use this script to query the state indicated in those logs.

    It seemed rather more efficient to have a cron job run the tests and have Nagios check the results rather than having Nagios run the tests directly. The tests can exercise the whole Slony-I cluster at once rather than Nagios invoking updates over and over again.
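
For illustration only, here is a manual version of the kind of probe test_slony_replication performs, using the slony_test table defined above; this is a sketch, not what the script literally runs. On the origin:

insert into slony_test (description, mod_date)
    values ('manual replication probe', now());

Then, on a subscriber, check that the row arrives and how far it lags:

select description, now() - mod_date as age
  from slony_test
 order by mod_date desc
 limit 1;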

6.6. Other Replication Tests

The methodology of the previous section is designed with a view to minimizing the cost of submitting replication test queries; on a busy cluster supporting hundreds of users, the cost of running a few extra queries is likely to be pretty irrelevant, while the setup cost of configuring the test tables and data injectors is fairly high, so a simpler approach may be attractive.

Three other methods for analyzing the state of replication have stood out:

  • For an application-oriented test, it has been useful to set up a view on some frequently updated table that pulls application-specific information.

    For instance, one might look at some statistic about the most recently created application object, or about the most recent application transaction:

    create view replication_test as select now() - txn_time as age, object_name from transaction_table order by txn_time desc limit 1;

    create view replication_test as select now() - created_on as age, object_name from object_table order by id desc limit 1;

    There is a downside: This approach requires that you have regular activity going through the system that will lead to there being new transactions on a regular basis. If something breaks down with your application, you may start getting spurious warnings about replication being behind, despite the fact that replication is working fine.

  • The Slony-I-defined view sl_status provides information as to how up to date different nodes are. Its contents are only really interesting on origin nodes, as the events generated on other nodes are generally ignorable. (Sample queries against both sl_status and the application-level views above appear after this list.)

  • See also the Section 5.3 discussion.
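
The following sketches show how such checks might be run. The replication_test view is the application-level example defined above, and the sl_status column names shown here are those used by recent Slony-I releases; adjust them if your version differs.

-- on a subscriber: how stale is the newest replicated application row?
select age, object_name from replication_test;

-- on an origin node: how far behind is each subscriber?
select st_origin, st_received, st_lag_num_events, st_lag_time
  from "_MyCluster".sl_status;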

6.7. Log Files

slon daemons generate more-or-less verbose log files, depending on what debugging level is turned on. You might variously wish to:

  • Use a log rotator like Apache rotatelogs to have a sequence of log files so that no one of them gets too big;

  • Purge out old log files, periodically.

6.8. mkservice

6.8.1. slon-mkservice.sh

Create a slon service directory for use with svscan from daemontools. This uses multilog in a pretty basic way, which seems to be standard for daemontools / multilog setups. If you want clever logging, see logrep below. Currently this script has very limited error handling capabilities.

For non-interactive use, set the following environment variables: BASEDIR, SYSUSR, PASSFILE, DBUSER, HOST, PORT, DATABASE, CLUSTER, and SLON_BINARY. If any of these are not set, the script asks for configuration information interactively.

  • BASEDIR where you want the service directory structure for the slon to be created. This should not be the /var/service directory.

  • SYSUSR the unix user under which the slon (and multilog) process should run.

  • PASSFILE location of the .pgpass file to be used. (default ~sysusr/.pgpass)

  • DBUSER the postgres user the slon should connect as (default slony)

  • HOST what database server to connect to (default localhost)

  • PORT what port to connect to (default 5432)

  • DATABASE which database to connect to (default dbuser)

  • CLUSTER the name of your Slony-I cluster (default database)

  • SLON_BINARY the full path name of the slon binary (default: the path found by which slon)

6.8.2. logrep-mkservice.sh

This uses tail -F to pull data from log files allowing you to use multilog filters (by setting the CRITERIA) to create special purpose log files. The goal is to provide a way to monitor log files in near realtime for "interesting" data without either hacking up the initial log file or wasting CPU/IO by re-scanning the same log repeatedly.

For non-interactive use, set the following environment variables: BASEDIR, SYSUSR, SOURCE, EXTENSION, and CRITERIA. If any of these are not set, the script asks for configuration information interactively.

  • BASEDIR where you want the service directory structure for the logrep to be created. This should not be the /var/service directory.

  • SYSUSR unix user under which the service should run.

  • SOURCE name of the service with the log you want to follow.

  • EXTENSION a tag to differentiate this logrep from others using the same source.

  • CRITERIA the multilog filter you want to use.

A trivial example of this would be to provide a log file of all slon ERROR messages, which could be used to trigger a Nagios alarm:

EXTENSION='ERRORS'
CRITERIA="'-*' '+* * ERROR*'"

(Reset the monitor by rotating the log using svc -a $svc_dir.)

A more interesting application is a subscription progress log:

EXTENSION='COPY'
CRITERIA="'-*' '+* * ERROR*' '+* * WARN*' '+* * CONFIG enableSubscription*' '+* * DEBUG2 remoteWorkerThread_* prepare to copy table*' '+* * DEBUG2 remoteWorkerThread_* all tables for set * found on subscriber*' '+* * DEBUG2 remoteWorkerThread_* copy*' '+* * DEBUG2 remoteWorkerThread_* Begin COPY of table*' '+* * DEBUG2 remoteWorkerThread_* * bytes copied for table*' '+* * DEBUG2 remoteWorkerThread_* * seconds to*' '+* * DEBUG2 remoteWorkerThread_* set last_value of sequence*' '+* * DEBUG2 remoteWorkerThread_* copy_set*'"

If you have a subscription log, then it's easy to determine whether a given slon is in the process of handling copies or other subscription activity. If the log isn't empty and doesn't end with a "CONFIG enableSubscription: sub_set:1" line (or whatever set number you have subscribed to), then the slon is currently in the middle of its initial copies.

If you happen to be monitoring the mtime of your primary slony logs to determine whether your slon has gone brain-dead, checking this log is a good way to avoid mistakenly clobbering it in the middle of a subscribe. As a bonus, recall that since the slons are running under svscan, you only need to kill the slon (via the svc interface) and let svscan start it up again later. I've also found the COPY logs handy for following subscribe activity interactively.