[Slony1-general] CESTERROR remoteListenThread_1: timeout (300 s) for event selection

Wed May 25 04:43:34 PDT 2011

We are replicating a 500 GB database from PostgreSQL 8.3 to
PostgreSQL 9.0 using slony1-2.0.6.

We got the following error in our slon logs during the copy set of
one of the bigger tables:

CESTERROR  remoteListenThread_1: timeout (300 s) for event selection

The documentation says:

    ERROR: remoteListenThread_%d: timeout for event selection

    This means that the listener thread (src/slon/remote_listener.c)
    timed out when trying to determine what events were outstanding
    for it.

    This could occur because network connections broke, in which
    case restarting the slon might help.

    Alternatively, this might occur because the slon for this node
    has been broken for a long time, and there are an enormous number
    of entries in sl_event on this or other nodes for the node to work
    through, and it is taking more than slon_conf_remote_listen_timeout
    seconds to run the query. In older versions of Slony-I, that
    configuration parameter did not exist; the timeout was fixed at 300
    seconds. In newer versions, you might increase that timeout in the
    slon config file to a larger value so that it can continue to
    completion. And then investigate why nobody was monitoring things
    such that replication broke for such a long time...
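
For reference, raising the timeout as the documentation suggests would be a one-line change in the slon config file. This is only a sketch: it assumes the option is spelled remote_listen_timeout in the config file (the documentation text above refers to it as slon_conf_remote_listen_timeout), and 600 is just an illustrative value, not a recommendation.

```
# slon config file for the node whose listener timed out
# (assumed option name; value is in seconds, illustrative only)
remote_listen_timeout=600
```

The slon would presumably need to be restarted to pick up the new value.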

Replication seems to continue fine after this error.
Is it safe to continue?
Or should we start from scratch?
If so, what do we have to do to prevent this error from happening again?
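
One way to tell whether the timeout was a one-off during the copy or keeps recurring is to scan the slon log for the message. The following is a minimal sketch in Python, assuming the message text has exactly the form shown in the log line above; the function name find_listen_timeouts is hypothetical and not part of Slony-I.

```python
import re

# Matches the message text from the log line quoted above, e.g.
# "remoteListenThread_1: timeout (300 s) for event selection".
# The number after "remoteListenThread_" is captured as the thread number.
TIMEOUT_RE = re.compile(
    r"remoteListenThread_(\d+): timeout \((\d+) s\) for event selection"
)


def find_listen_timeouts(lines):
    """Return (line_number, thread_number, timeout_seconds) per matching line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((lineno, int(match.group(1)), int(match.group(2))))
    return hits
```

Passing an open log file (or any iterable of lines) to find_listen_timeouts gives one tuple per occurrence, so a single hit around the time of the big table copy looks different from a steady stream of them.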

Ger