Jeff threshar at torgo.978.org
Thu May 27 12:36:40 PDT 2010
Just ran into an interesting problem where I had a slave that was
lagging for no apparent reason.
After some digging I fired up pgspy, watched the timing of queries,
and noticed forwardConfirm was consistently taking a long time (1-2
seconds).  The queries it issued seemed to be zippy, leaving just the
NOTIFY as the suspect.

Sure enough:
XXXX=# select * from pgstattuple('pg_listener');
-[ RECORD 1 ]------+----------
table_len          | 312877056
tuple_count        | 3
tuple_len          | 288
tuple_percent      | 0
dead_tuple_count   | 44
dead_tuple_len     | 4224
dead_tuple_percent | 0
free_space         | 299606604
free_percent       | 95.76

We've got a bit of the ol' bloat there: about 313 MB on disk for 3
live tuples.  But wait, I'm running 8.4.2, shouldn't autovacuum (AV)
be taking care of that?
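
For anyone who wants to put numbers on the bloat, here is a minimal
Python sketch that recomputes the percentages from the raw byte counts
in the pgstattuple output above. The function name bloat_summary is
made up for illustration; only the four figures come from the output.

```python
def bloat_summary(table_len, tuple_len, dead_tuple_len, free_space):
    """Return live, dead and free space as percentages of table_len (bytes)."""
    if table_len <= 0:
        raise ValueError("table_len must be positive")

    def pct(n):
        return round(100.0 * n / table_len, 2)

    return {
        "live_percent": pct(tuple_len),
        "dead_percent": pct(dead_tuple_len),
        "free_percent": pct(free_space),
    }


# Byte counts copied from the pgstattuple('pg_listener') output above.
stats = bloat_summary(
    table_len=312877056,
    tuple_len=288,
    dead_tuple_len=4224,
    free_space=299606604,
)
print(stats)
```

Nearly all of the file is free space, so every scan of pg_listener has
to read through hundreds of megabytes of mostly empty pages to find a
handful of rows.
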
Looking at pg_stat_all_tables I see

last_vacuum      |
last_autovacuum  | 2010-05-19 13:21:37.349991-04
last_analyze     | 2010-05-27 14:16:39.887117-04
last_autoanalyze | 2010-05-19 13:21:37.349991-04

Well shoot, that was last week!  Someone on IRC suggested autovac
cancellation, so I looked at my logs and sure enough, pg_listener and
autovac were NOT getting along (along with some other tables as
well).  After building a list of the most commonly aborted autovacs I
checked those out and all had recent autovac records in
pg_stat_all_tables.
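
Building that list of aborted autovacs can be scripted. The sketch
below is one way to do it in Python, assuming the server log pairs an
"ERROR:  canceling autovacuum task" line with a following
"CONTEXT:  automatic vacuum of table ..." line. The sample lines,
their timestamp prefix and the table public.some_table are invented
for illustration; a real log_line_prefix will look different.

```python
import re
from collections import Counter

CANCEL_RE = re.compile(r"ERROR:\s+canceling autovacuum task")
CONTEXT_RE = re.compile(
    r'CONTEXT:\s+automatic (?:vacuum|analyze) of table "([^"]+)"'
)


def count_cancelled_autovacs(lines):
    """Count cancelled autovacuum tasks per table from log lines."""
    counts = Counter()
    pending = False  # True after a cancel ERROR, until its CONTEXT line
    for line in lines:
        if CANCEL_RE.search(line):
            pending = True
            continue
        match = CONTEXT_RE.search(line)
        if pending and match:
            counts[match.group(1)] += 1
            pending = False
    return counts


# Hypothetical sample input, not taken from the real logs.
SAMPLE_LOG = [
    '2010-05-27 12:00:01 EDT ERROR:  canceling autovacuum task',
    '2010-05-27 12:00:01 EDT CONTEXT:  automatic vacuum of table "XXXX.pg_catalog.pg_listener"',
    '2010-05-27 12:05:13 EDT ERROR:  canceling autovacuum task',
    '2010-05-27 12:05:13 EDT CONTEXT:  automatic vacuum of table "XXXX.public.some_table"',
    '2010-05-27 12:09:44 EDT ERROR:  canceling autovacuum task',
    '2010-05-27 12:09:44 EDT CONTEXT:  automatic vacuum of table "XXXX.pg_catalog.pg_listener"',
]

counts = count_cancelled_autovacs(SAMPLE_LOG)
for table, n in counts.most_common():
    print(n, table)
```

On a busy server, lines from other backends can land between the ERROR
and its CONTEXT line; the sketch tolerates that by waiting for the next
matching CONTEXT line, which is a simplification rather than a
guarantee of correct pairing.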

A VACUUM FULL of pg_listener and we're back in business: lag
immediately dropped to 0.

Not sure what I can do to cope with the continued cancellations - I
would have thought autovac would have tried again shortly, but
apparently I was wrong.  Time to whip up yet another Nagios alarm :(
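
As a starting point for such an alarm, here is a minimal Python sketch
of the decision logic only. It assumes the caller has already fetched
last_autovacuum for the table from pg_stat_all_tables; the function
name and the one-hour and six-hour thresholds are placeholders, not
recommendations. The exit codes 0, 1 and 2 are the usual Nagios plugin
codes for OK, WARNING and CRITICAL.

```python
from datetime import datetime, timedelta

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def check_autovacuum_age(last_autovacuum, now,
                         warn=timedelta(hours=1),
                         crit=timedelta(hours=6)):
    """Return (exit_code, message) based on the age of the last autovacuum."""
    if last_autovacuum is None:
        return CRITICAL, "CRITICAL: no autovacuum recorded"
    # Drop fractional seconds so the message stays readable.
    age = timedelta(seconds=int((now - last_autovacuum).total_seconds()))
    if age >= crit:
        return CRITICAL, "CRITICAL: last autovacuum %s ago" % age
    if age >= warn:
        return WARNING, "WARNING: last autovacuum %s ago" % age
    return OK, "OK: last autovacuum %s ago" % age


# Timestamps from the pg_stat_all_tables output above (same UTC offset,
# so naive datetimes are enough for the subtraction).
last_autovacuum = datetime(2010, 5, 19, 13, 21, 37)
now = datetime(2010, 5, 27, 14, 16, 39)

code, message = check_autovacuum_age(last_autovacuum, now)
print(message)
# A real plugin would end with sys.exit(code).
```

A check like this only looks at last_autovacuum; a table that is
vacuumed by hand or from cron would also need last_vacuum taken into
account.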

Mostly just putting this out there for others who may have run into  
this.

BTW, this is on a patched Slony 1.2.17 and PostgreSQL 8.4.2.

--
Jeff Trout <jeff at jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

More information about the Slony1-general mailing list