Log Shipping - Slony-I with Files

4.4. Log Shipping - Slony-I with Files

Slony-I has the ability to serialize the updates to go out into log files that can be kept in a spool directory.

The spool files could then be transferred via whatever means was desired to a "slave system," whether that be via FTP, rsync, or perhaps even by pushing them onto a 1GB "USB key" to be sent to the destination by clipping it to the ankle of some sort of "avian transport" system.

There are plenty of neat things you can do with a data stream in this form, including:

  • Replicating to nodes that aren't securable

  • Replicating to destinations where it is not possible to set up bidirection communications

  • Supporting a different form of PITR (Point In Time Recovery) that filters out read-only transactions and updates to tables that are not of interest.

  • If some disaster strikes, you can look at the logs of queries in detail

    This makes log shipping potentially useful even though you might not intend to actually create a log-shipped node.

  • This is a really slick scheme for building load for doing tests

  • We have a data "escrow" system that would become incredibly cheaper given log shipping

  • You may apply triggers on the "disconnected node " to do additional processing on the data

    For instance, you might take a fairly "stateful" database and turn it into a "temporal" one by use of triggers that implement the techniques described in [Developing Time-Oriented Database Applications in SQL ] by Richard T. Snodgrass.

4.4.1. Where are the "spool files" for a subscription set generated?
4.4.2. What takes place when a SLONIK FAILOVER/ SLONIK MOVE SET takes place?
4.4.3. What if we run out of "spool space"?
4.4.4. How do we set up a subscription?
4.4.5. What are the limitations of log shipping?

4.4.1. Where are the "spool files" for a subscription set generated?

Any slon subscriber node can generate them by adding the -a option.

Note: Notice that this implies that in order to use log shipping, you must have at least one subscriber node.

4.4.2. What takes place when a SLONIK FAILOVER/ SLONIK MOVE SET takes place?

Nothing special. So long as the archiving node remains a subscriber, it will continue to generate logs.

If the archiving node becomes the origin, on the other hand, it will continue to generate logs.

4.4.3. What if we run out of "spool space"?

The node will stop accepting SYNCs until this problem is alleviated. The database being subscribed to will also fall behind.

4.4.4. How do we set up a subscription?

The script in tools called slony1_dump.sh is a shell script that dumps the "present" state of the subscriber node.

You need to start the slon for the subscriber node with logging turned on. At any point after that, you can run slony1_dump.sh, which will pull the state of that subscriber as of some SYNC event. Once the dump completes, all the SYNC logs generated from the time that dump started may be added to the dump in order to get a "log shipping subscriber."

4.4.5. What are the limitations of log shipping?

In the initial release, there are rather a lot of limitations. As releases progress, hopefully some of these limitations may be alleviated/eliminated.

The log shipping functionality amounts to "sniffing" the data applied at a particular subscriber node. As a result, you must have at least one "regular" node; you cannot have a cluster that consists solely of an origin and a set of "log shipping nodes.".

The "log shipping node" tracks the entirety of the traffic going to a subscriber. You cannot separate things out if there are multiple replication sets.

The "log shipping node" presently only fully tracks SYNC events. This should be sufficient to cope with some changes in cluster configuration, but not others.

A number of event types are handled in such a way that log shipping copes with them:

  • SYNC events are, of course, handled.

  • DDL_SCRIPT is handled.

  • UNSUBSCRIBE_SET

    This event, much like SUBSCRIBE_SET is not handled by the log shipping code. But its effect is, namely that SYNC events on the subscriber node will no longer contain updates to the set.

    Similarly, SET_DROP_TABLE, SET_DROP_SEQUENCE, SET_MOVE_TABLE, SET_MOVE_SEQUENCE, DROP_SET, MERGE_SET, SUBSCRIBE_SET will be handled "apropriately".

  • The various events involved in node configuration are irrelevant to log shipping: STORE_NODE, ENABLE_NODE, DROP_NODE, STORE_PATH, DROP_PATH, STORE_LISTEN, DROP_LISTEN

  • Events involved in describing how particular sets are to be initially configured are similarly irrelevant: STORE_SET, SET_ADD_TABLE, SET_ADD_SEQUENCE, STORE_TRIGGER, DROP_TRIGGER,

It would be nice to be able to turn a "log shipped" node into a fully communicating Slony-I node that you could failover to. This would be quite useful if you were trying to construct a cluster of (say) 6 nodes; you could start by creating one subscriber, and then use log shipping to populate the other 4 in parallel.

This usage is not supported, but presumably one could take an application outage and promote the log-shipping node to a normal slony node with the OMIT COPY option of SUBSCRIBE SET.

4.4.1. Usage Hints

Note: Here are some more-or-less disorganized notes about how you might want to use log shipping...

  • You don't want to blindly apply SYNC files because any given SYNC file may not be the right one. If it's wrong, then the result will be that the call to setsyncTracking_offline() will fail, and your psql session will ABORT, and then run through the remainder of that SYNC file looking for a COMMIT or ROLLBACK so that it can try to move on to the next transaction.

    But we know that the entire remainder of the file will fail! It is futile to go through the parsing effort of reading the remainder of the file.

    Better idea:

    • The table, on the log shipped node, tracks which log it most recently applied in table sl_archive_tracking.

      Thus, you may predict the ID number of the next file by taking the latest counter from this table and adding 1.

    • There is still variation as to the filename, depending on what the overall set of nodes in the cluster are. All nodes periodically generate SYNC events, even if they are not an origin node, and the log shipping system does generate logs for such events.

      As a result, when searching for the next file, it is necessary to search for files in a manner similar to the following:

      ARCHIVEDIR=/var/spool/slony/archivelogs/node4
      SLONYCLUSTER=mycluster
      PGDATABASE=logshipdb
      PGHOST=logshiphost
      NEXTQUERY="select at_counter+1 from \"_${SLONYCLUSTER}\".sl_archive_tracking;"
      nextseq=`psql -d ${PGDATABASE} -h ${PGHOST} -A -t -c "${NEXTQUERY}"
      filespec=`printf "slony1_log_*_%20d.sql"
      for file in `find $ARCHIVEDIR -name "${filespec}"; do
         psql -d ${PGDATABASE} -h ${PGHOST} -f ${file}
      done

4.4.2. find-triggers-to-deactivate.sh

It was once pointed out ( Bugzilla bug #19) that the dump of a schema may include triggers and rules that you may not wish to have running on the log shipped node.

The tool tools/find-triggers-to-deactivate.sh was created to assist with this task. It may be run against the node that is to be used as a schema source, and it will list the rules and triggers present on that node that may, in turn need to be deactivated.

It includes logtrigger and denyaccess triggers which will may be left out of the extracted schema, but it is still worth the Gentle Administrator verifying that such triggers are kept out of the log shipped replica.

4.4.3. slony_logshipper Tool

As of version 1.2.12, Slony-I has a tool designed to help apply logs, called slony_logshipper. It is run with three sorts of parameters:

  • Options, chosen from the following:

    • h

      display this help text and exit

    • v

      display program version and exit

    • q

      quiet mode

    • l

      cause running daemon to reopen its logfile

    • r

      cause running daemon to resume after error

    • t

      cause running daemon to enter smart shutdown mode

    • T

      cause running daemon to enter immediate shutdown mode

    • c

      destroy existing semaphore set and message queue (use with caution)

    • f

      stay in foreground (don't daemonize)

    • w

      enter smart shutdown mode immediately

  • A specified log shipper configuration file

    This configuration file consists of the following specifications:

    • logfile = './offline_logs/logshipper.log';

      Where the log shipper will leave messages.

    • cluster name = 'T1';

      Cluster name

    • destination database = 'dbname=slony_test3';

      Optional conninfo for the destination database. If given, the log shipper will connect to this database, and apply logs to it.

    • archive dir = './offline_logs';

      The archive directory is required when running in "database-connected" mode to have a place to scan for missing (unapplied) archives.

    • destination dir = './offline_result';

      If specified, the log shipper will write the results of data massaging into result logfiles in this directory.

    • max archives = 3600;

      This fights eventual resource leakage; the daemon will enter "smart shutdown" mode automatically after processing this many archives.

    • ignore table "public"."history";

      One may filter out single tables from log shipped replication

    • ignore namespace "public";

      One may filter out entire namespaces from log shipped replication

    • rename namespace "public"."history" to "site_001"."history";

      One may rename specific tables.

    • rename namespace "public" to "site_001";

      One may rename entire namespaces.

    • post processing command = 'gzip -9 $inarchive';

      Pre- and post-processing commands are executed via system(3).

    An "@" as the first character causes the exit code to be ignored. Otherwise, a nonzero exit code is treated as an error and causes processing to abort.

    Pre- and post-processing commands have two further special variables defined:

    • $inarchive - indicating incoming archive filename

    • $outnarchive - indicating outgoing archive filename

  • error command = ' ( echo "archive=$inarchive" echo "error messages:" echo "$errortext" ) | mail -s "Slony log shipping failed" postgres@localhost ';

    The error command indicates a command to execute upon encountering an error. All logging since the last successful completion of an archive is available in the $errortext variable.

    In the example shown, this sends an email to the DBAs upon encountering an error.

  • Archive File Names

    Each filename is added to the SystemV Message queue for processing by a slony_logshipper process.