Tue Feb 19 07:30:44 PST 2008
- Previous message: [Slony1-general] proper procedure for re-starting slony after replication slave reboots
- Next message: [Slony1-general] proper procedure for re-starting slony after replication slave reboots
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Andrew Sullivan wrote:
> On Tue, Feb 19, 2008 at 08:46:49AM -0500, Geoffrey wrote:
>> It's now obvious that there's a problem with slony for this one
>> database. For example, on one table, the primary node has grown by
>> about 6000 records, yet the slave is still sitting at the same value
>> from yesterday.
>>
>> Should I restart the daemons? Do I need to start over? I don't see
>> anything in the logs that tells me there's a problem.
>
> I would restart the daemons and see if that fixes it, yes (although I'd
> expect an error). But I wonder whether your slave is properly configured.
> Are you sure you have fsync enabled and write caching turned off? Was this
> a controlled reboot, or did it reboot itself? If you have lost cached data,
> then your replica could indeed be missing data: Slony's only going to be as
> reliable as the underlying PostgreSQL installation.

We never actually rebooted the box; we stopped and started the database by stopping the daemons, restarting the postmaster, and then starting the daemons again.

What concerns me now is that I'm finding inconsistencies in other databases where the postmaster has been up and running since replication was initiated. Now that I'm starting to research this, I see that a number of databases have inconsistencies. We are replicating 12 databases.

While I was checking the count(*) of one table in another database, there was a difference of a couple hundred records between the primary and the slave. I saw the primary node's count go up by one and the slave's did the same, so it appears the difference is going to stay constant.

When comparing the differences in the data, it's quite irregular. That is, it does not appear to be a chunk of data from a specific time frame that was missed; it's from different days. The postmasters for this database on both the primary and the slave have been running since before replication was started, so that is apparently not the issue at all.
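(For anyone following along, the ad-hoc count(*) comparison described above can be sketched as a small script. This is only a hypothetical illustration: the table names and numbers are made up, and in practice the counts would be collected by running "SELECT count(*) FROM <table>" against each node with psql.)

```python
# Hypothetical sketch: compare per-table row counts gathered from the
# primary and the slave. The data below is illustrative, not real.

def diverging_tables(primary_counts, slave_counts):
    """Return {table: (primary, slave)} for tables whose counts differ."""
    diffs = {}
    for table, p_count in primary_counts.items():
        s_count = slave_counts.get(table)
        if s_count != p_count:
            diffs[table] = (p_count, s_count)
    return diffs

# Illustrative counts, as if collected via psql on each node:
primary = {"orders": 106000, "customers": 4200, "invoices": 9150}
slave   = {"orders": 100000, "customers": 4200, "invoices": 8950}

for table, (p, s) in sorted(diverging_tables(primary, slave).items()):
    print(f"{table}: primary={p} slave={s} (delta {p - s})")
```

Re-running a comparison like this a few minutes apart would show whether the delta is constant (a one-time gap, as described above) or growing (replication actively falling behind).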
I don't understand how this is even possible.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin