SLONIK FAILOVER

Name

FAILOVER -- Fail a broken replication set over to a backup node

Synopsis

FAILOVER (options);

Description

The FAILOVER command causes the backup node to take over all sets that currently originate on the failed node. slonik will contact all other direct subscribers of the failed node to determine which node has the highest sync status for each set. If another node has a higher sync status than the backup node, replication is first redirected so that the backup node replicates against that other node and catches up; only then does the backup node assume the origin role and allow update activity.

After successful failover, all former direct subscribers of the failed node become direct subscribers of the backup node. The failed node is abandoned, and can and should be removed from the configuration with SLONIK DROP NODE.

ID = ival

ID of the failed node

BACKUP NODE = ival

Node ID of the node that will take over all sets originating on the failed node

This uses schemadocfailednode(p_backup_node integer, p_failed_node integer).

Example

FAILOVER (
   ID = 1,
   BACKUP NODE = 2
);
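
As the Description notes, the failed node should be removed from the configuration once failover has succeeded. A minimal sketch of that follow-up, reusing the node IDs from the example above (failed node 1, backup node 2) and assuming the usual slonik preamble (CLUSTER NAME and ADMIN CONNINFO lines) is already in place:

FAILOVER (
   ID = 1,
   BACKUP NODE = 2
);
DROP NODE (
   ID = 1,
   EVENT NODE = 2
);

Here EVENT NODE names a surviving node on which the drop event is generated; node 2 is only an illustrative choice. Depending on the cluster, it may be prudent to confirm that the failover has completed on all remaining nodes before issuing DROP NODE.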
    

Locking Behaviour

Exclusive locks on each replicated table will be taken out on the new origin node as replication triggers are changed. If the new origin was not completely up to date, and replication data must be drawn from some other node that is more up to date, the new origin will not become usable until those updates are complete.

Dangerous/Unintuitive Behaviour

This command will abandon the status of the failed node. The failed node cannot rejoin the cluster without being rebuilt from scratch as a slave. If at all possible, you would likely prefer to use SLONIK MOVE SET instead, as that does not abandon the failed node.
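
For comparison, a controlled switchover with MOVE SET might look like the following sketch. The set ID (1) and node IDs (old origin 1, new origin 2) are illustrative assumptions, and the approach presumes that the current origin is still reachable, since the set must be locked there first:

LOCK SET (
   ID = 1,
   ORIGIN = 1
);
MOVE SET (
   ID = 1,
   OLD ORIGIN = 1,
   NEW ORIGIN = 2
);

Unlike FAILOVER, this leaves node 1 in the cluster, so it does not need to be rebuilt.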

If a cluster has many nodes and the failover involves dropping additional nodes (for example, when every node at a site, the origin as well as its subscribers, must be treated as failed), the actions must be sequenced carefully.

Slonik Event Confirmation Behaviour

Slonik submits the FAILOVER_EVENT without waiting, but it then waits until the most-ahead node has received confirmations of the FAILOVER_EVENT from all nodes before completing.

Version Information

This command was introduced in Slony-I 1.0.

In version 2.0, the default BACKUP NODE value of 1 was removed, so it is mandatory to provide a value for this parameter.