dba at richyen.com
Tue Aug 31 17:47:45 PDT 2010
Hi everyone,

I'm looking for an explanation of what happens on the subscriber's end
of a replication set.  I currently have 4 nodes (1 thru 4), and node 4
also has the "-a <dir>" flag turned on for log shipping (but I think this
is irrelevant).

Occasionally, test_slony_state_dbi.pl reports that one of the
subscribers has really old events, or that a subscriber is lagging behind
the provider, so I decided to harvest some data.  I wrote a cronjob that
fetches the average Slony lag as seen from node 4 (I could've picked any
of them, but chose this one because its load was lowest).

Basically, I ran this command from the shell every hour:  `psql -tc
"select avg(st_lag_time) from _slony_schema.sl_status" mydb postgres`
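
For illustration, a minimal sketch of what such a cron wrapper might look
like.  This is not the actual script I used: the log path, the PSQL
override, and the function name log_lag are all assumptions made up for
the example; only the query and the psql arguments come from the command
above.

```shell
#!/bin/sh
# Hypothetical hourly lag logger; names and paths are illustrative only.
# PSQL can be overridden (e.g. for a dry run); it defaults to the psql client.
PSQL="${PSQL:-psql}"
# Assumed log location; pick whatever suits your setup.
LOGFILE="${LOGFILE:-/tmp/slony_lag.log}"

log_lag() {
    # -t: tuples only, -A: unaligned output, -c: run a single command
    lag=$("$PSQL" -tAc "select avg(st_lag_time) from _slony_schema.sl_status" mydb postgres)
    # Prefix each sample with a timestamp so drops to zero can be dated.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$lag" >> "$LOGFILE"
}
```

Calling log_lag once per hour from cron would append one timestamped
sample per run to the log file.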

I logged the output to a file (http://pgsql.privatepaste.com/e4ce8f8f67),
and it shows that the other nodes average more than 70 days' worth of lag
at times (see the period from May 10 to Jul 09).

There are 3 other replication clusters I tracked, and one of them even
went up to 153 days before dropping right back down to zero.

Could someone explain why this happens or, perhaps more importantly, what
causes the lag to drop from the high 70s of days down to 0?  Is it
sl_log_{1,2} rotation?

Sorry, I might be able to find the answer by scouring the logs, but I'm
hoping to find a quick answer here.

Using Slony 2.0.3, PostgreSQL 8.4.2 on CentOS (kernel 2.6.18) on all nodes.

Much appreciated!
--Richard

