[Slony1-general] Slony Watchdog failed starting up the child process

Tue Jul 23 12:34:30 PDT 2013

On 07/23/13 15:21, Rose Nancy wrote:
> Hi Chris,
> 
> On 13-07-23 03:07 PM, Christopher Browne wrote:
>> My intuition from seeing it say "FATAL" is that that's indicating 
>> "death of process," and that there's not much coming back from it.
>>
>> This behaviour is pretty consistent with what happens with a Postgres 
>> postmaster; if the attempt to start up fails due to seeming already to 
>> have a postmaster, it doesn't retry, pg_ctl immediately gives up.
> 
> In my case the slons are running in a separate server which I rebooted.
>>
>> By the way, is this possibly because of a zombied old connection that 
>> got disconnected due to firewall glitch or such?  If so, you should 
>> probably see about lowering the TCP keepalive parameters both in the 
>> slon.conf file and in postgresql.conf
> You're right, the duplicated key error was caused by a zombie old 
> connection that got disconnected due to the slony server reboot.

Which is something that TCP keepalives should clean out after a while.
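
For anyone wondering what "after a while" means in practice, here is a
minimal sketch in Python of the socket-level knobs that the keepalive
settings ultimately map to. It is not slon or Postgres code; the function
names and the example values are made up for illustration, and the
per-socket tuning constants are only available on some platforms (Linux
has them).

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for sock. The values are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so set only those that exist.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


def dead_peer_detection_seconds(idle, interval, count):
    """Approximate time until a silently dead peer is noticed."""
    return idle + interval * count
```

With the usual Linux defaults (7200 s idle, 75 s interval, 9 probes) the
estimate comes to a bit over two hours, which is why lowering the values
matters if a zombie connection should be cleaned out quickly.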

My recollection of the discussion that led to the current behavior is
that we had several options.

A) Have it retry no matter what. In the case of a zombie that doesn't
time out, that doesn't help, but at least it creates a constant stream
of error messages.

B) Treat the first connection attempt differently, so that the system
startup script or a later manual invocation will signal a problem.

C) Kick the existing (assumed to be dead) slon out (by killing its
backend). But if you really have two slons that are alive and possibly
running on different systems, they will keep stabbing each other in the
back and never get anything accomplished.

We chose B) at that time.
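
To make the distinction in B) concrete, here is a rough sketch in Python.
It is not the actual slon code (slon is written in C); NodeLockHeld,
connect_with_policy_b and all parameters are hypothetical names, and
reading "differently" as "fail at once on the first attempt, retry on
later reconnects" is an assumption based on the FATAL seen at startup in
this thread.

```python
import time


class NodeLockHeld(Exception):
    """Hypothetical error: another session appears to hold this node's lock."""


def connect_with_policy_b(connect, first_attempt, retries=3, delay=10.0,
                          sleep=time.sleep):
    """Try once if this is the first attempt, otherwise up to `retries` times."""
    attempts = 1 if first_attempt else max(1, retries)
    for remaining in range(attempts - 1, -1, -1):
        try:
            return connect()
        except NodeLockHeld:
            if remaining == 0:
                # Out of attempts: let the caller (startup script, watchdog)
                # see the failure instead of looping forever.
                raise
            sleep(delay)
```

A startup script would call this with first_attempt=True and treat the
exception as fatal, so the problem is signalled right away. A reconnect
after the process has already been running would pass first_attempt=False
and get a bounded number of retries.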

I am open to new ideas.

Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin