[Slony1-general] proper procedure for re-starting slony after replication slave reboots

Thu Feb 21 05:43:27 PST 2008

On Wed, Feb 20, 2008 at 07:13:37PM -0500, Geoffrey wrote:

> thing I didn't mention is the actual configuration.  Two boxes connected 
> to a single data silo.  It's a hot/hot configuration. Separate 
> postmaster for each database.  Half the postmasters run on one server, 
> the other half on the other.  If/when one fails, the other picks up the 
> postmaster processes. 

How do you guarantee that the first is actually dead before the other "picks
up the postmaster processes"?  I'm assuming what you mean is something like
this:

server1 <-------> disk <-------> server2

Server1 and server2 are both attached to the disk at the same time.  When
server1 "goes away", server2 fires up a postgres instance on the same data
area server1 was using, goes into recovery mode, and takes over the hostname
and IP of server1.  

In order for this to work, you have to be absolutely certain that server1 is
dead and disconnected from the shared disk before server2 starts the
postgres process on the same data area.  Without that, you are sure to have
database corruption of some kind.  That is, the data from server1 MUST BE
FLUSHED and on the platters before server2 starts using the same data area. 
So it might not be enough to be sure server1 is dead.  You have to be sure
the disk's cache is flushed too, or you could have a mess.
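
To make the ordering concrete, here is a rough Python sketch of the takeover
sequence on server2.  It is only an illustration under assumptions: the names
take_over, fence_peer, storage_flushed, start_postgres and UnsafeTakeover are
all made up for this example, and the three hooks stand in for whatever your
cluster software and storage hardware actually provide.  The only point is
that postgres never starts unless both checks positively succeed.

```python
from typing import Callable


class UnsafeTakeover(Exception):
    """Raised when it is not safe to start postgres on the shared data area."""


def take_over(
    fence_peer: Callable[[], bool],
    storage_flushed: Callable[[], bool],
    start_postgres: Callable[[], None],
) -> None:
    """Start postgres on the shared data area only after both safety checks.

    All three arguments are hypothetical hooks supplied by the caller.
    """
    # Step 1: server1 must be confirmed dead and disconnected from the disk.
    # The hook should return True only on positive confirmation; a missed
    # heartbeat alone does not count.
    if not fence_peer():
        raise UnsafeTakeover("server1 is not confirmed dead and disconnected")

    # Step 2: everything server1 wrote must be flushed to the platters,
    # not sitting in a disk or controller cache.
    if not storage_flushed():
        raise UnsafeTakeover("disk cache is not confirmed flushed")

    # Step 3: only now start postgres, which goes into recovery mode
    # on the data area server1 was using.
    start_postgres()
```

If either hook cannot say "yes" for certain, the sketch raises and leaves the
data area alone, which is the behaviour you want: a delayed takeover is
recoverable, two writers on one data area is not.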

A