[Slony1-general] Slon not catching up

Wed Mar 22 08:11:41 PST 2006

On Wed, 2006-03-22 at 01:32, Ujwal S. Setlur wrote:
> --- Rod Taylor <pg at rbt.ca> wrote:
> 
> > On Tue, 2006-03-21 at 08:35 -0800, Ujwal S. Setlur wrote:
> > > I tried tuning a bunch of things, but the subscriber
> > > never did catch up, so I restarted replication from
> > > scratch. Makes me a little nervous...
> > 
> > Chris,
> > 
> > I don't recall whether Slony would scale up the group size
> > automatically to a large transaction boundary if a normal group
> > was smaller than the largest transaction size in the time period
> > it is covering.
> > 
> > Ujwal,
> > 
> > If not, try bumping up the group size to several thousand
> > (-g 10000 is standard operation here); it can catch it up
> > in a hurry.
> > 
> 
> I tried using 10,000 for "g", and it did not help. I
> did restart the replication from scratch with a "g" of
> 100, and it caught up overnight, and it seems to
> be going OK.
> 
> Regarding Vivek's comment, yes, I do need to put some
> monitoring in place. This is still a Development
> system, and I am trying to figure out where I need to
> pay some special attention.
> 

I've been using this query to monitor my sets:

set search_path=_setnamegoeshere;

select 
	con_origin as "Origin", 
	con_received as "Receiver", 
	date_trunc('minutes', min(now()-con_timestamp)) as 
		"Age of Latest Sync" 
from sl_confirm 
group by 
	con_origin, 
	con_received 
order by 
	con_origin, 
	con_received;

It's simple, but every time we've had any issues, it's found them.
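
For alerting rather than eyeballing, the same query can be narrowed so
that it returns rows only when a confirmation is overdue. This is just
a sketch: the 10-minute threshold is an assumed value, not a
recommendation, and would need tuning to the normal sync interval of
the cluster.

```sql
-- Same schema placeholder as above; replace with the real cluster schema.
set search_path=_setnamegoeshere;

select
	con_origin as "Origin",
	con_received as "Receiver",
	date_trunc('minutes', min(now()-con_timestamp)) as
		"Age of Latest Sync"
from sl_confirm
group by
	con_origin,
	con_received
-- Assumed threshold: report only pairs whose most recent confirmation
-- is more than 10 minutes old.
having min(now()-con_timestamp) > interval '10 minutes'
order by
	con_origin,
	con_received;
```

An empty result then means every origin/receiver pair has confirmed
within the threshold, so a cron job could mail the output only when it
is non-empty.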

Would welcome any suggestions for a better query to check the health of
the set.