Steve Singer ssinger at ca.afilias.info
Tue Nov 23 09:49:35 PST 2010
On 10-11-23 11:53 AM, Christopher Browne wrote:
>
> I'm not sure that we gain much by splitting the logs into a bunch of
> pieces for that case.
>
> It's still the same huge backlog, and until it gets worked down, it's
> bloated, period.

My concern is with the performance impact of the bloat on the master (and 
any other slaves), not with the slave being populated.

Right now sl_log_1 can grow to gigabytes on the master.  Each time the 
sync thread on the master selects from sl_log_1 to generate a SYNC, it 
queries a table that is gigabytes in size.  Similarly, when any other 
slaves need to select from sl_log_1 against the master, they have to 
query that same gigabytes-large table.
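To make the cost concrete, here is an illustrative sketch (not Slony's actual code; the column name and function are assumptions) of what a per-SYNC selection looks like: the window of rows a SYNC needs may be tiny, but with a single log table the scan runs against a structure whose size is the entire backlog.

```python
# Illustrative sketch, not Slony's implementation.  "actionseq" is an
# assumed stand-in for Slony's per-row action sequence column.
def rows_for_sync(log_table, last_seq, new_seq):
    """Select the log rows one SYNC covers: everything after the previous
    SYNC's position up to the new one.  With one big table, this scan
    walks a table sized by the whole backlog even when the window
    (new_seq - last_seq) is small."""
    return [row for row in log_table
            if last_seq < row["actionseq"] <= new_seq]

# A backlog of 1,000,000 rows; the current SYNC only needs the last 10.
log = [{"actionseq": i} for i in range(1, 1_000_001)]
recent = rows_for_sync(log, last_seq=999_990, new_seq=1_000_000)
assert len(recent) == 10
```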

With a growable pool of sl_log tables, my thinking is that once all 
transactions using sl_log_x have committed, we can do a final sync 
against sl_log_x and the sync thread no longer needs to access it. 
Similarly, other subscribers querying the master could be made smart 
enough not to look at sl_log segments for which they have already 
received all of the data.


This way the bloat caused by backlog won't affect the performance of 
normal (up-to-date) syncs.
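The segment-skipping idea could be sketched roughly as follows. This is only a model of the bookkeeping, not proposed code; the class, function, and attribute names are all made up for illustration.

```python
# Rough model of a growable pool of log segments (names are assumptions).
class LogSegment:
    def __init__(self, name):
        self.name = name
        self.rows = []          # list of (actionseq, data) pairs
        self.closed = False     # True once every writer has committed
                                # and a final sync has been generated

    @property
    def max_seq(self):
        # Highest action sequence stored in this segment.
        return max((seq for seq, _ in self.rows), default=0)

def segments_to_scan(pool, confirmed_seq):
    """A subscriber that has confirmed everything up to confirmed_seq can
    skip any closed segment whose newest row it already holds; only open
    segments and closed segments with newer data need to be read."""
    return [s for s in pool
            if not s.closed or s.max_seq > confirmed_seq]

pool = [LogSegment("sl_log_1"), LogSegment("sl_log_2"), LogSegment("sl_log_3")]
pool[0].rows = [(1, "a"), (2, "b")]; pool[0].closed = True  # fully drained
pool[1].rows = [(3, "c"), (4, "d")]; pool[1].closed = True  # closed, newer data
pool[2].rows = [(5, "e")]                                   # still open

# An up-to-date subscriber (confirmed through seq 4) only touches the
# open segment, regardless of how bloated the old segments are.
assert [s.name for s in segments_to_scan(pool, 4)] == ["sl_log_3"]
# A lagging subscriber (confirmed through seq 2) still reads the rest.
assert [s.name for s in segments_to_scan(pool, 2)] == ["sl_log_2", "sl_log_3"]
```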



>
> A different suggestion, that doesn't involve any changes to Slony...
>
> Initially, it might be a good idea to set up the new subscriber with
> FORWARD=no...
>
>     subscribe set (id=1, provider=1, receiver=2,forward=no);
>
> That means that log data won't get captured in sl_log_(1|2) on the
> subscriber for a while, while the subscription is catching up.  Once
> it's reasonably caught up, you submit:
>
>     subscribe set (id=1, provider=1, receiver=2,forward=yes);
>
> which turns that logging on, so that the new node becomes a failover
> target and a legitimate target to feed other subscriptions.
>
> While node #2 is catching up, it's a crummy candidate for a failover
> target, so while this strategy *altogether* loses the ability for it to
> be a failover target while catching up, remember that it was moving from
> 18-ish hours behind towards caught up, which represented a crummy
> failover target.  I don't think something hugely useful is being lost,
> here.



More information about the Slony1-general mailing list