Peter Geoghegan peter at 2ndquadrant.com
Wed May 25 05:03:24 PDT 2011
On 25 May 2011 12:43, Ger Timmens <Ger.Timmens at adyen.com> wrote:

>    Alternatively, this might occur because the slon for this node
> has been broken for a long time, and there are an enormous number of
> entries in sl_event on this or other nodes for the node to work
> through, and it is taking more than slon_conf_remote_listen_timeout
> seconds to run the query. In older versions of Slony-I, that
> configuration parameter did not exist; the timeout was fixed at 300
> seconds. In newer versions, you might increase that timeout in the
> slon config file to a larger value so that it can continue to
> completion. And then investigate why nobody was monitoring things
> such that replication broke for such a long time...

If this is the case, then you can raise the listen timeout in the slon
config file to something in the hundreds of seconds.
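
As a rough sketch of what that looks like in the slon config file: I am
assuming the runtime option is spelled remote_listen_timeout (the quoted
text refers to it as slon_conf_remote_listen_timeout), and the value of
600 is purely illustrative. Check both against the documentation for
your Slony-I version before relying on them.

```
# slon.conf fragment (illustrative value, not a recommendation)
# Assumed option name: remote_listen_timeout, in seconds.
# Older versions had this fixed at 300 seconds.
remote_listen_timeout=600
```

The slon process has to be restarted to pick up the change.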

> Replication seems to continue fine after this error.
> Is it safe to continue?
> Or should we start from scratch?
> If so, what do we have to do to prevent this error from happening again?

In general, Slony will not allow slaves to enter an inconsistent state,
so if replication has resumed it should be safe to continue rather than
starting from scratch.

Take a look at the "test_slony_state" Perl script, which checks various
parts of the configuration and verifies that things are running
correctly:

http://slony.info/documentation/2.0/monitoring.html

This should form part of your monitoring setup. It is common to
automatically run the script at regular intervals.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

