Christopher Browne cbbrowne
Tue Feb 8 17:55:16 PST 2005
Vivek Khera wrote:

>
> On Feb 8, 2005, at 10:42 AM, Christopher Browne wrote:
>
>> 1.  You should leave the slon running against the origin node all the 
>> time, including when it is undergoing the heavy processing.
>>
>> If you shut it off, you'll discover that all of the changes during 
>> that 30 minute period are treated as one really big SYNC, and things 
>> will behave badly when you turn on the "subscriber" slon as it tries 
>> to grab all the data at once.
>>
>
> I will second this motion.  I did something to my DB last week that 
> caused the server to run out of file handles globally, causing an 
> unclean termination of all connections followed by a recovery on the 
> origin.  I neglected to restart the slon talking to the origin for 
> about 13 hours (can someone say "need better monitoring scripts"?).  
> It took about 5 days to catch up, and the DB was dog slow the whole 
> time.  So much so that we were having trouble doing work.
>
> [ ... ]

You're a good candidate then for looking at generate_syncs.sh; stick a 
cron job in that runs it every 10 minutes and you'll be WAY better off 
if something like that happens again.
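
A rough sketch of what I mean, assuming generate_syncs.sh lives under /usr/local/slony1/tools and you want its output logged under /var/log/slony1 (both paths are made up for illustration; use whatever your install has).  I've left out any arguments or environment settings the script needs; check the header of the script for those.

```shell
#!/bin/sh
# Hypothetical locations; adjust to match your installation.
SCRIPT="${SCRIPT:-/usr/local/slony1/tools/generate_syncs.sh}"
LOG="${LOG:-/var/log/slony1/generate_syncs.log}"

# Build a crontab entry that runs the given script every 10 minutes
# and appends stdout and stderr to the given log file.
cron_line() {
    printf '*/10 * * * * %s >> %s 2>&1\n' "$1" "$2"
}

# Print the entry so it can be reviewed before it is installed.
cron_line "$SCRIPT" "$LOG"

# To append it to the current user's crontab, something like:
#   ( crontab -l 2>/dev/null; cron_line "$SCRIPT" "$LOG" ) | crontab -
```

Run it as the user that normally runs slon against the origin, since the forced SYNCs have to be generated there.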

>> 2.  You should try to keep the slons running most of the time so that 
>> the systems are largely kept in sync and so that there is not a large 
>> buildup of rows in sl_log_1 and sl_seqlog.
>>
>> If you shut off replication for days at a time, those tables will 
>> often build up, and performance will be questionable.
>>
>
> In the end, after I was all caught up, I had to vacuum full the 
> sl_log_1 table (and reindex for good measure) on the subscriber.  
> There were millions of dead tuples.  Curiously, I had to kill the 
> subscriber slon to release them to the vacuum, even though it wasn't 
> in a transaction.  Either that or it was a coincidence that the second 
> vacuum got to reclaim the rows :-)
>
> Performance is again excellent, if not better than before.  Perhaps a 
> vacuum full of the slony tables after the initial copy is a good idea?
>

VACUUM FULL of the tables where???  On the origin?  Or on the subscriber?

I'd be comfortable enough doing it on a subscriber that is still 
catching up and therefore isn't being used for production purposes...
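
On such a subscriber, the cleanup Vivek describes might look roughly like the following.  This is only a sketch: the cluster name "mycluster" and database name "mydb" are placeholders, and the one thing it relies on is that Slony-I keeps its tables in a schema named after the cluster with a leading underscore.  By default it just prints the statements; set RUN=1 to feed them to psql.

```shell
#!/bin/sh
# Placeholder names; substitute your own cluster and subscriber database.
CLUSTER="${CLUSTER:-mycluster}"
DB="${DB:-mydb}"
SCHEMA="_${CLUSTER}"   # Slony-I schema is "_" followed by the cluster name

# Emit the maintenance statements for sl_log_1.
maintenance_sql() {
    printf 'VACUUM FULL VERBOSE "%s".sl_log_1;\n' "$SCHEMA"
    printf 'REINDEX TABLE "%s".sl_log_1;\n' "$SCHEMA"
}

# Print by default; with RUN=1, pipe the statements into psql.
if [ "${RUN:-0}" = "1" ]; then
    maintenance_sql | psql -d "$DB"
else
    maintenance_sql
fi
```

Bear in mind that VACUUM FULL takes an exclusive lock on the table, which is another reason I'd only do this on a node that isn't serving production traffic.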

If you'd been running generate_syncs.sh, it would have come up to date 
in a much "friendlier" fashion, and not needed the vacuum nearly so 
badly.  It probably would have been able to do enough cleanup while 
getting up to date that the free space map (FSM) wouldn't have been 
blown out...

