Steve Singer ssinger at ca.afilias.info
Fri Nov 12 12:50:56 PST 2010
On 10-11-12 01:56 PM, Aleksey Tsalolikhin wrote:
> Any pointers for troubleshooting slony replication lag?  We're running
> slony-I 1.2.20, and replicating a 23 GB database across WAN (between
> nearby cities).
>
> Yesterday afternoon, slony lag started growing, and has been slowly
> but steadily inching up... the lag is now 2 hours 48 minutes...
>
> I don't see any "troubleshooting" documentation on the slony web site
> and hope someone can help me out...
>


http://www.slony.info/documentation/1.2/loganalysis.html is worth reviewing.



I would check to see whether it is one of the following three situations:

1) A query was executed on the master that touched many (millions?) 
rows in a table.  Something like an "UPDATE foo SET count=1;" on a table 
might be fast on the master, but slony will replicate it one row at a 
time.  Over a WAN the round-trip latency on that can be pretty painful.
If this is the case you either have to wait until it finishes or give 
up, tear down replication, and start again.  I can't think of a way to 
determine how far along it is, since nothing becomes visible until the 
transaction completes and commits.

2) Something has happened so that the replication (insert, update, 
delete) is failing on the slave.  You should see errors in the slon logs 
if this is the case.

3) You have a high volume database and a long running transaction.  The 
long running transaction will have prevented the log switch from 
happening (slony can't truncate sl_log until the long running 
transaction finishes and replicates).  Your sl_log_1 has grown very big 
(and will keep growing, since the truncate + log switch can't happen 
until the transaction finishes).  Since sl_log is so big, each SYNC 
(both generating and replicating) takes a long time.  If this is the 
case you should see slony making progress (confirming events), just at 
a slow rate.
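
To put rough numbers on situations 1 and 3, here is a minimal Python 
sketch.  It is only an illustration, not anything slony ships: the 
function names, the Sample structure, and all of the figures are made up 
for the example.  The first helper assumes, as a worst case, one network 
round trip per replicated row, which is a simplification.  The second 
assumes you have noted the newest event number generated on the master 
and the newest one confirmed by the slave at two points in time (for 
instance from the sl_status view, if your version provides it) and just 
compares the two rates.

```python
from typing import NamedTuple


def estimated_replay_seconds(rows: int, round_trip_ms: float) -> float:
    """Situation 1: rough time to replay `rows` single-row changes.

    Assumes one network round trip per row, which is a simplification.
    """
    if rows < 0 or round_trip_ms < 0:
        raise ValueError("rows and round_trip_ms must be non-negative")
    return rows * round_trip_ms / 1000.0


class Sample(NamedTuple):
    """One observation of replication state (hypothetical structure)."""
    at_seconds: float    # wall-clock time of the observation
    last_generated: int  # newest event number generated on the master
    last_confirmed: int  # newest event number confirmed by the slave


def classify_progress(first: Sample, second: Sample) -> str:
    """Situation 3: compare the confirm rate with the generation rate."""
    elapsed = second.at_seconds - first.at_seconds
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    confirmed_rate = (second.last_confirmed - first.last_confirmed) / elapsed
    generated_rate = (second.last_generated - first.last_generated) / elapsed
    if confirmed_rate <= 0:
        # Nothing confirmed at all: more like situation 1 or 2 than 3.
        return "stalled"
    if confirmed_rate < generated_rate:
        # Events are confirmed, but slower than they are created.
        return "progressing, falling behind"
    return "progressing, not falling behind"


# Hypothetical figures: 5 million rows, 20 ms round trip.
hours = estimated_replay_seconds(5_000_000, 20) / 3600
print(f"worst-case replay time: {hours:.1f} hours")

# Hypothetical samples taken 10 minutes (600 seconds) apart.
print(classify_progress(Sample(0, 1000, 400), Sample(600, 1300, 460)))
```

With these made-up figures the first estimate comes to roughly 27.8 
hours, and the second call reports "progressing, falling behind", which 
is the pattern described in situation 3.  A result of "stalled" would 
point more towards situations 1 or 2.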



> I've tried restarting the slony slave, and the slony master and slave,
> and even rebooting the slave server...  the lag is still growing.
> I've confirmed the slave can get to the master's DB, and the master
> can get to the slave's DB.
>
> I've checked latency in our network monitoring system, between the two
> sites, and things are humming along.
>
> What else do I look at?
>
> Oh, yeah, and I initiated a slony log switch yesterday, that didn't help...
>
> What else should I try besides dropping the replication set and re-creating it?
>
> Best,
> -at


