[Slony1-general] pg_dump and replication lag in 2.0.7

Thu Sep 8 08:43:41 PDT 2011

>From: Steve Singer <ssinger at ca.afilias.info>
>
>Yes, this is expected behaviour.  The only workaround is to skip the 
>slony schema in pg_dump.
>
>

Ok, cool, I'll do that then.
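
For the archives, I assume that means something along these lines. This is only a sketch: the cluster and database names are placeholders, and it relies on the usual Slony convention of keeping its tables in a schema named after the cluster with a leading underscore.

```shell
# Placeholder names, not my real setup.
CLUSTER="my_replication_cluster"
DBNAME="mydb"

# Slony normally keeps its tables in a schema called "_" + cluster name.
SLONY_SCHEMA="_${CLUSTER}"

# --exclude-schema (-N) leaves the whole schema out of the dump, so
# pg_dump should have no reason to read sl_event at all.
DUMP_CMD="pg_dump --exclude-schema=${SLONY_SCHEMA} --file=${DBNAME}.sql ${DBNAME}"

# Dry run: print the command so it can be checked before use.
echo "$DUMP_CMD"

# Once it looks right, run it for real:
# $DUMP_CMD
```

The obvious trade-off is that a dump taken this way contains no Slony configuration, so it is only good for restoring the data, not the replication setup.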

>

>The issue isn't the cleanup event as much as it is the SYNC event.
>When slony generates a SYNC event it gets an exclusive lock on sl_event.
>It can't get this exclusive lock if a pg_dump is running that has read 
>sl_event.  Changing the cleanupEvent() code won't change this.
>
>Slony 1.2 and 2.0 both behave this way; this shouldn't be new behaviour 
>with 2.0.
>

Hmm, my nagios graphs disagree (or for some reason were oblivious). We check syncs every 10 minutes, and they only started showing warnings the night after we moved to 2.0. The check basically just looks at sl_status and flags up a problem if st_lag_num_events or st_lag_time goes over a threshold, via the following query:

    SELECT st_origin, st_received, st_lag_num_events, round(extract(epoch from st_lag_time)) 
    FROM "<my_replication_cluster>".sl_status;
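
The threshold side of the check amounts to roughly the following. It is a simplified sketch rather than the real plugin: the two limits are made-up values, and check_lag is just a name I've picked for illustration.

```shell
# Made-up limits for illustration only.
MAX_LAG_EVENTS=50      # limit for st_lag_num_events
MAX_LAG_SECONDS=600    # limit for st_lag_time, in seconds

# check_lag EVENTS SECONDS
# Prints a Nagios-style status line and returns 0 (OK) or 1 (WARNING).
check_lag() {
    if [ "$1" -gt "$MAX_LAG_EVENTS" ] || [ "$2" -gt "$MAX_LAG_SECONDS" ]; then
        echo "WARNING: lag is $1 events, $2 seconds"
        return 1
    fi
    echo "OK: lag is $1 events, $2 seconds"
    return 0
}

# Example with hand-fed numbers. In the real check, the two values
# come from the last two columns of the sl_status query.
check_lag 3 20
```

With the limits above, the example call reports OK; anything over either limit comes back as a warning.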

A graph for the weeks leading up to and after the upgrade is attached.  I upgraded on the night of the 25th/26th and, ignoring any other downtime where I was obviously fiddling with things, you can see the syncs going out after that date.  As you can imagine, I'm massively embarrassed that it took me 3 months to notice it happening.

>In Slony 2.1.0 the SYNC event gets the lock on sl_event_lock, a table 
>that stores no data and is only used for locking.  You could then 
>exclude just that table from pg_dump but I don't see a lot of value in 
>dumping sl_event anyway.
>
>

Great, thanks.
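
If we do end up on 2.1.0 and want to keep the rest of the Slony schema in the dump, I'd guess the table-level exclusion looks something like this. It is untested, the cluster and database names are placeholders, and it assumes the schema is named after the cluster with a leading underscore.

```shell
# Placeholder names, not my real setup.
CLUSTER="my_replication_cluster"
DBNAME="mydb"

# Schema-qualified name of the lock-only table described for 2.1.0.
LOCK_TABLE="_${CLUSTER}.sl_event_lock"

# --exclude-table (-T) skips only this one table; everything else,
# including the other Slony tables, is still dumped.
DUMP_CMD="pg_dump --exclude-table=${LOCK_TABLE} --file=${DBNAME}.sql ${DBNAME}"

# Dry run: print the command so it can be checked before use.
echo "$DUMP_CMD"
```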

Glyn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: syncs.png
Type: image/png
Size: 38425 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-general/attachments/20110908/639e4208/attachment-0001.png