Steven Graham steven at maxpointinteractive.com
Fri Jan 9 14:46:19 PST 2009
All,

I've been using Slony 2.0 (RC2 and RC1... now on 2.0 REL) for a while now, 
replicating a single table. Things were going okay, then replication 
started to slow down and my slave DB got out of sync. I tweaked some 
parameters on the master and slave to get the slave "caught up" with 
replication, and things are okay for now.

One thing I noticed on the master node is that I keep getting the 'log 
switch to sl_log_2 still in progress - sl_log_1 not truncated' message, 
and my sl_log_2 table is around 12GB (sl_log_1 is small). I'm seeing 
around a 30-40 second delay to first row in the log files (on the 
slave), which was 0.001 seconds when I first started replicating several 
months ago. This growth in time makes sense, since the table keeps 
getting larger. From what I can tell, the logs should flip back and 
forth between sl_log_1 and sl_log_2, truncating the old log to keep these 
tables at a minimum size, so it shouldn't take 30 seconds to get 
all the SYNC events.
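
For reference, here's the query I've been using to watch the log switch 
state (a minimal sketch; I'm assuming the cluster schema is named 
"_mycluster" -- substitute your own cluster name, and the status values 
are my reading of the docs, so treat them as such):

    -- Log switch state, if I'm reading the schema right:
    --   0 = sl_log_1 active, 1 = sl_log_2 active,
    --   2 = switch to sl_log_1 in progress, 3 = switch to sl_log_2 in progress
    SELECT last_value FROM "_mycluster".sl_log_status;

    -- Rough row counts to see which log table is accumulating data
    SELECT 'sl_log_1' AS log, count(*) FROM "_mycluster".sl_log_1
    UNION ALL
    SELECT 'sl_log_2', count(*) FROM "_mycluster".sl_log_2;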

My question is this: what are the steps needed to unwedge this log 
switch scenario? I can't see any obvious functions to call to force it. 
From what I can tell, the two copies of the table are in sync, and 
sl_status shows a lag of 1-2 events.
If I'm understanding the system correctly, the log switch code is waiting 
for the transactions that remain in the 'old' log to be taken care of 
before it truncates and completes the switch to sl_log_2. Is there something 
that can tell the slon daemons to take a look at these dangling 
transactions, or a function to flush (delete) them? Or is starting 
completely over the only solution?
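
In case it helps, here's roughly what I've tried so far from psql (again 
assuming the cluster schema is "_mycluster"; logswitch_finish() looks 
like what slon's cleanup thread calls, if I'm reading the source right):

    -- Ask Slony to try completing the pending log switch; my understanding
    -- is it returns -1 while the old log still can't be truncated.
    SELECT "_mycluster".logswitch_finish();

    -- The lag I mentioned above, from the sl_status view
    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM "_mycluster".sl_status;

    -- Long-running transactions on the master that might be holding
    -- back the truncate of the old log table (8.3 column names)
    SELECT procpid, xact_start, current_query
      FROM pg_stat_activity
     ORDER BY xact_start;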

I've looked through the docs and other archives on how to troubleshoot 
this and didn't find anything useful. I've REINDEX'd the tables and 
vacuumed them just for good measure, and I still see the same amount of 
time to perform the sync.
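
For the record, the maintenance I ran was along these lines (same 
assumed cluster schema name):

    REINDEX TABLE "_mycluster".sl_log_1;
    REINDEX TABLE "_mycluster".sl_log_2;
    VACUUM ANALYZE "_mycluster".sl_log_1;
    VACUUM ANALYZE "_mycluster".sl_log_2;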

There is also a chance that there could be things hanging around from 
issues with the earlier release candidates; this is the second 
time the issue has happened. The first time, I just dropped the 
slave node and resynced to a new DB. I'm willing to set everything up 
from scratch again; I just want to know what to do if I run into this 
situation again after starting over.

Software setup:
Ubuntu Linux 7.10
Master: Postgres 8.3.3
Slave: Postgres 8.3.1
Slony: 2.0.0

Thanks in advance.
-Steve

