[Slony1-general] Slony lag times

Wed Aug 1 12:54:54 PDT 2007

Laurent Raufaste wrote:
> Hi,
>
> We are using Slony on a production environment and are very pleased by 
> it.
>
> Our cluster is made of 1 master, 4 slaves that need to be replicated
> fast, and 2 slaves for which the replication speed isn't a problem.
>
> Here's our issue: In the sl_status view I notice that the st_lag_time is
> always between 1 and many seconds: it goes up to 10 seconds regularly,
> and approximately once a day, one of the slaves reaches 1 min,
> for example while vacuuming.
>
If the problem is essentially that the master is overloaded, then no
configuration change is likely to help.  There is no guarantee that
subscribers will be right up to date.
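
To check whether lag follows load on the master, it can help to look at
how many events each subscriber is behind, not only at st_lag_time.  A
minimal sketch, assuming a cluster named "mycluster" and a database
named "mydb" (both placeholders, not from the original post); the
sl_status view lives in the cluster schema, "_" plus the cluster name,
and is normally queried on the origin.  The script only prints the psql
command so it can be reviewed before being run:

```shell
#!/bin/sh
# Placeholder names (assumptions): adjust to your own cluster and database.
CLUSTER="mycluster"
DBNAME="mydb"

# Slony-I keeps its tables and views in a schema named "_<clustername>".
SCHEMA="_${CLUSTER}"

# One row per subscriber: events not yet confirmed, and the time lag.
QUERY="SELECT st_received, st_lag_num_events, st_lag_time FROM ${SCHEMA}.sl_status ORDER BY st_lag_time DESC;"

# Print the command instead of running it.
echo "psql -d $DBNAME -c \"$QUERY\""
```

If st_lag_num_events climbs on every subscriber at the same moment, that
points at the master rather than at any one subscriber.
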
> I tried playing with the following options:
>     -s <milliseconds>     SYNC check interval (default 10000)
>     -t <milliseconds>     SYNC interval timeout (default 60000)
>     -o <milliseconds>     desired subscriber SYNC processing time
>     -g <num>              maximum SYNC group size (default 6)
>
> Now on the master I have:
> -s 1000 -g 50
> On the fast slaves I have:
> -s 1000
> And on the slow slaves:
> -s 10000 -g 10
>
> I tried lowering the SYNC check interval to 500ms with no real effect,
> and the master is already loaded enough anyway ;)
There seem to be some misconceptions in how you're configuring it...

- On the origin node, the "-g" option is pretty much irrelevant.  -g
affects how *subscribers* group SYNCs together; since the origin does
not apply any SYNCs coming from subscribers, there is no grouping to do
there.

- On the other hand, the "-s" (and "-t") options have a meaningful
effect mainly on an origin node, as those are the options that control
how often SYNCs are generated.  (SYNCs get generated on non-origin
nodes too, but those SYNCs don't lead to any replication work being
done, so they're very uninteresting.)  Thus, setting "-s 1000" versus
"-s 10000" on subscriber nodes is pretty much irrelevant.
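
Put together, the two points above suggest roughly the following split
of options.  This is only a sketch: the cluster name and conninfo
strings are placeholders, the values are taken from the ones quoted in
the question, and the script prints the slon commands rather than
starting anything.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the original post).
CLUSTER="mycluster"
ORIGIN_CONNINFO="dbname=mydb host=origin-host user=slony"
SUBSCRIBER_CONNINFO="dbname=mydb host=subscriber-host user=slony"

# Origin: -s controls how often SYNCs are generated, so tune it here.
# -g is left at its default, since the origin applies no SYNCs.
ORIGIN_OPTS="-s 1000"

# Subscriber: -g controls how many SYNCs are grouped together.
# -s is left at its default, since SYNCs generated here drive no work.
SUBSCRIBER_OPTS="-g 10"

# Print the commands instead of running them.
echo "slon $ORIGIN_OPTS $CLUSTER \"$ORIGIN_CONNINFO\""
echo "slon $SUBSCRIBER_OPTS $CLUSTER \"$SUBSCRIBER_CONNINFO\""
```
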
> Is there an effective way to shorten the replication lag time?
>
> A Slony noob.
>
Speed is an "emergent property," resulting from how much work is being
thrown at the system (e.g., whatever you are doing that overloads it)
and how much hardware there is to do replication work.

Tuning the DBMS, the OS, the hardware, and such will have some effect.
In the long run, though, the main way to shorten lag is to get faster
network and disk hardware...