Brian Fehrle brianf at consistentstate.com
Tue May 25 10:27:26 PDT 2010
Hi all,
    I'm having some trouble determining why replication isn't happening 
on a replicated table. I have a two-node Slony cluster. One table in the 
Slony replication set has 72332 records on the master but only 71225 
records on the slave. It has been this way for at least a few hours 
(possibly longer; that is just when we first noticed it). The table was 
added to the replication set several weeks ago, so it's not stalled 
mid-publish. The slon daemons are running, and their logs report no 
abnormalities. I restarted the slon daemons to see if that would clear 
anything up, but nothing changed.
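
To see which rows account for the difference, one option is to dump the 
table's key column from each node and compare the two lists. The 
following is only a minimal sketch: the function name missing_on_slave, 
the table name mytable, and the key column id are hypothetical 
placeholders, and it assumes the key values contain no newlines.

```shell
# Print keys that appear in the master key list but not in the slave key list.
# Both arguments are files holding one key value per line.
missing_on_slave() {
    master_keys=$1
    slave_keys=$2
    master_sorted=$(mktemp)
    slave_sorted=$(mktemp)
    # comm needs both inputs sorted with the same collation, so force LC_ALL=C
    LC_ALL=C sort -u "$master_keys" > "$master_sorted"
    LC_ALL=C sort -u "$slave_keys" > "$slave_sorted"
    # -2 hides lines only in the slave list, -3 hides lines common to both
    LC_ALL=C comm -23 "$master_sorted" "$slave_sorted"
    rm -f "$master_sorted" "$slave_sorted"
}

# Hypothetical usage (mytable and id are placeholders, not names from this cluster):
# psql -At -h "$MASTERHOST" -p "$MASTERPORT" -U "$REPUSER" -d "$MASTERDBNAME" \
#      -c "SELECT id FROM mytable" > master_keys.txt
# psql -At -h "$SLAVEHOST" -p "$SLAVEPORT" -U "$REPUSER" -d "$SLAVEDBNAME" \
#      -c "SELECT id FROM mytable" > slave_keys.txt
# missing_on_slave master_keys.txt slave_keys.txt
```

Rows inserted on the master within the current lag window may show up as 
missing, so the comparison is most meaningful for rows older than a few 
minutes.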

Looking at sl_status, the lag events never go above 1, and the lag time 
never goes above a couple of minutes.
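
For reference, the lag figures can be read with a query along these 
lines. This is a sketch, assuming the Slony schema is named after the 
cluster with a leading underscore ("_SLONY" here, quoted because the 
cluster name is upper case); lag_query is a hypothetical helper name.

```shell
# Build a query against the sl_status view for the given cluster name.
# The double quotes around the schema preserve an upper-case cluster name.
lag_query() {
    cluster=$1
    printf 'SELECT st_origin, st_received, st_lag_num_events, st_lag_time FROM "_%s".sl_status;\n' "$cluster"
}

# Hypothetical usage, run against the master with the same connection
# variables the slon daemons use:
# psql -h "$MASTERHOST" -p "$MASTERPORT" -U "$REPUSER" -d "$MASTERDBNAME" \
#      -c "$(lag_query SLONY)"
```

A consistently low st_lag_num_events suggests the slave is confirming 
events promptly, so the missing rows are probably not just a backlog 
waiting to be applied.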

The best explanations I can think of are: either something is putting 
replication of this particular table on "hold" so the remaining rows 
never reach the slave, without any alert in the slon logs; or something 
went wrong, replication for that table is out of sync, and I need to 
drop the table from the set, add it back again, and let it sync up 
(which is not an ideal solution).

Any tips on where I should look to see what may be going on?

Thanks in advance.

       - Brian Fehrle

Data that may be important:

Commands that start the slon daemons:
/usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node1.pid \
    -s 60000 -t 300000 SLONY \
    "dbname=$MASTERDBNAME port=$MASTERPORT host=$MASTERHOST user=$REPUSER" \
    > /usr/local/pgsql/log/slon.node1.log 2>&1 &
/usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node2.pid \
    -s 60000 -a /usr/local/pgsql/slon_logs -t 300000 -x "log_parsing_script" SLONY \
    "dbname=$SLAVEDBNAME port=$SLAVEPORT host=$SLAVEHOST user=$REPUSER" \
    > /usr/local/pgsql/log/slon.node2.log 2>&1 &

slony version 1.2.20
master PostgreSQL version 8.4.1
slave PostgreSQL version 8.4.2



