[Slony1-general] Replication node suddenly lagging, CPU bound postmaster

Wed Aug 20 06:41:40 PDT 2008

Hi Andrew,

On Wed, Aug 20, 2008 at 08:46:28AM -0400, Andrew Sullivan wrote:
> On Wed, Aug 20, 2008 at 12:15:20PM +0200, Benjamin Pineau wrote:
> 
> > Strangely, the replication on this node seems CPU bound by the postmaster
> > process doing the actual inserts/updates for slon (this postmaster process
> > is stuck at 99% CPU usage since the beginning of the problem).
> 
> Is your vacuum regimen correct?  I've seen that cause this sort of
> problem.

Cron does run an "ANALYZE verbose" at midnight every day except Sunday.
On Sunday it launches "vacuumdb -a -z -e -v" instead (so a "simple"
vacuum). I think this maintenance vacuum should have completed by the
time the problem appeared (more than 12 hours later), but I don't know
for sure. And I'm not sure whether my setup is correct or not (being
a pg newbie ;).

And this node's postgresql.conf setup is as follows:
vacuum_cost_delay = 0			# 0-1000 milliseconds
vacuum_cost_page_hit = 1		# 0-10000 credits
vacuum_cost_page_miss = 10		# 0-10000 credits
vacuum_cost_page_dirty = 20		# 0-10000 credits
vacuum_cost_limit = 200			# 0-10000 credits
autovacuum = on				# enable autovacuum subprocess
autovacuum_naptime = 600		# time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 10000	# min # of tuple updates before
autovacuum_analyze_threshold = 5000	# min # of tuple updates before 
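
To make those two thresholds concrete, here is a rough sketch of the
documented autovacuum rule (trigger = base threshold + scale factor *
number of tuples in the table). The base thresholds are the ones from the
config above; the scale factors are NOT in the excerpt, so the values
below are assumptions, and the table sizes are purely hypothetical.

```python
# Rough sketch: after how many changed tuples does autovacuum act on a table?
# Documented rule: trigger = base_threshold + scale_factor * reltuples.

VACUUM_THRESHOLD = 10000      # autovacuum_vacuum_threshold (from the config)
ANALYZE_THRESHOLD = 5000      # autovacuum_analyze_threshold (from the config)
ASSUMED_VACUUM_SCALE = 0.2    # ASSUMPTION: check autovacuum_vacuum_scale_factor
ASSUMED_ANALYZE_SCALE = 0.1   # ASSUMPTION: check autovacuum_analyze_scale_factor


def autovacuum_trigger(base_threshold, scale_factor, reltuples):
    """Return the number of changed tuples that makes a table eligible."""
    return base_threshold + scale_factor * reltuples


if __name__ == "__main__":
    # Hypothetical table sizes, only to show how the trigger point scales.
    for reltuples in (1000, 100000, 10000000):
        vac = autovacuum_trigger(VACUUM_THRESHOLD, ASSUMED_VACUUM_SCALE, reltuples)
        ana = autovacuum_trigger(ANALYZE_THRESHOLD, ASSUMED_ANALYZE_SCALE, reltuples)
        print("%9d rows: vacuum after %.0f changes, analyze after %.0f"
              % (reltuples, vac, ana))
```

With base thresholds this high, small tables need at least 10000 changed
tuples before autovacuum touches them at all, whatever the scale factor is.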

I don't see any pending locks (apart from the currently running
replication's inserts). Wouldn't an intrusive, full vacuum leave such 
locks visible in pg_locks+pg_stat_activity? 
Or do you rather mean that launching a full vacuum may help?
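
For what it's worth, the check I mean is roughly the following, sketched
as a small Python helper. It is only an illustration: the helper name is
made up, it takes any DB-API cursor (e.g. from psycopg2, which is an
assumption about the client library), and the column names are those of
the 8.x catalogs (procpid, current_query; PostgreSQL 9.2 and later call
them pid and query).

```python
# Sketch only: list every lock together with the query of the backend
# holding or requesting it, with ungranted (waiting) locks listed first.
# Column names follow the 8.x catalogs; adjust them for newer releases.
LOCKS_WITH_QUERIES_SQL = """
SELECT l.pid, l.relation::regclass AS relation, l.mode, l.granted,
       a.current_query
  FROM pg_locks l
  LEFT JOIN pg_stat_activity a ON a.procpid = l.pid
 ORDER BY l.granted, l.pid
"""


def locks_with_queries(cursor):
    """Run the query on a DB-API cursor and return the rows as a list."""
    cursor.execute(LOCKS_WITH_QUERIES_SQL)
    return list(cursor.fetchall())
```

Rows with granted = false are the waiters; a long-running maintenance
command would appear as a granted lock next to its current_query.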

Thank you for the tip, it does ring a bell (i.e. since I had an "almost
disk full" situation just before the replication problem, pg may
have launched an emergency autovacuum of some sort? I need to explore).