[Slony1-hackers] Revisions to SYNC grouping handling

Tue Dec 4 10:54:30 PST 2007

Christopher Browne <cbbrowne at ca.afilias.info> writes:
> It seems warranted to do some cleaning up of the handling of SYNC
> grouping in -HEAD, as what is there now has a confusion of
> embarrassment of riches in quantity of possible policy...
>
> There are a bunch of grouping-related options at present that seem to
> interact somewhat confusingly...
>
> - sync_group_maxsize - default 20, min 0, max 10000
> - desired_sync_time  - default 60000ms, min 0ms (which shuts this off), max 6000000ms
>
> It then interacts in a somewhat complex fashion, where we tend to
> start with processing 1 SYNC, then doubling and adding 1, with
> [various attempts to set upper limits].
>
> In some testing one of my colleagues was doing recently, it seemed to
> be interacting in a fashion that made it appear as though the
> configuration parameters were not all being considered.
>
> I'm going to throw this out to the -hackers list, for comment
> *without* yet tossing out the further thoughts that I have; I'll
> comment further on that next week.

JP Fletcher reported the following issue: slon was apparently
ignoring a sync_group_maxsize setting of 100, and running "way long."

  http://www.slony.info/bugzilla/show_bug.cgi?id=23

That seemed a bit surprising, except that there are a bunch of
parameters involved, some of them hard-coded in the source.

I'm going to start merely by describing what's there, without trying
to define what ought to be.

1.  The logic, as of v1.2, starts by trying to double the number of
SYNCs grouped together each time:

per code:
   max_sync = ((last_sync_group_size * 200) / 100) + 1;

The reason for this logic is that we gain some economies of scale by
combining SYNCs together, and so should presumably attempt, fairly
quickly, to increase the number of SYNCs towards a maximum.  Doubling
the number processed each time is a pretty quick way to do this.

There could be merit to parameterizing this, to change the rate of
increase from "x2" to "x5" or "x10" or "x1.5".
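
To make the growth rate concrete, here is a minimal sketch of that
calculation in C.  The function name next_max_sync and the growth_pct
argument are made up for illustration (neither is in slon); passing 200
reproduces the hard-coded fragment above, and other values show what a
tunable rate might look like.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of slon.  growth_pct is an illustrative
 * parameter standing in for the hard-coded "200" in the fragment above
 * (200 means "x2", 150 means "x1.5", 500 means "x5").
 */
int next_max_sync(int last_sync_group_size, int growth_pct)
{
    /* Assumption: sizes are never negative and growth never shrinks. */
    assert(last_sync_group_size >= 0);
    assert(growth_pct >= 100);

    /* Integer arithmetic, as in the quoted fragment; the "+ 1"
     * guarantees progress even when the previous size was 0. */
    return ((last_sync_group_size * growth_pct) / 100) + 1;
}
```

Starting from a group of 1 SYNC, a growth_pct of 200 gives 3, 7, 15,
31, and so on; a growth_pct of 150 gives 2, 4, 7, 11.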

2.  There is logic to try to estimate an "ideal" number of SYNCs to
group together based on how long they took last time.

per code:
   ideal_sync = (last_sync_group_size * desired_sync_time) / last_sync_length;

This is an attempt to have a "set of SYNCs" processed (and
committed) every "desired_sync_time" ms.

It seems not-outrageous to say: "We'd like to group SYNCs so that we
get processing done every 5 minutes (e.g. - every 300000ms)", with,
thus, a COMMIT roughly every 5 minutes, if we're doing the huge amount
of work required to catch up after replication has fallen well behind.

Stopping every 5 minutes to do a COMMIT would seem likely to not
impose a huge extra burden over (say) doing a commit every 10 minutes.

Unfortunately, there's no statistical reason to believe that SYNCs
are, in any sense, uniform in size, so it is entirely plausible for
this approach to be *totally* wishful thinking.
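
As a worked example of the estimate in 2., here is a small sketch in C.
The function name, the 64-bit types, and the handling of a zero-length
previous group are my assumptions, not what slon does.  The 64-bit
arithmetic is there because, at the documented maxima (10000 SYNCs and
6000000ms), the product is 6x10^10, which does not fit in a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of slon.  Scales the previous group size
 * by the ratio of desired time to the time the previous group took.
 */
int64_t ideal_sync(int64_t last_sync_group_size,
                   int64_t desired_sync_time_ms,
                   int64_t last_sync_length_ms)
{
    assert(last_sync_group_size >= 0);
    assert(desired_sync_time_ms >= 0);

    /* Assumption: with no usable timing for the previous group, keep
     * the previous size rather than dividing by zero. */
    if (last_sync_length_ms <= 0)
        return last_sync_group_size;

    /* Same formula as the quoted fragment, in 64-bit integer arithmetic. */
    return (last_sync_group_size * desired_sync_time_ms) / last_sync_length_ms;
}
```

If the last group of 10 SYNCs took 20000ms and desired_sync_time is
60000ms, the estimate is 30 SYNCs; if those 10 SYNCs took 120000ms, it
drops to 5.  Both figures assume the next SYNCs cost about the same as
the last ones, which is exactly the uniformity caveat above.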

3.  On the other hand, if it turns out that we'll be doing a Seq Scan
across sl_log_1/sl_log_2, then there may be little sense in processing
anything much less than "the whole of sl_log_1 (or sl_log_2)".

If you're doing a Seq Scan each time you do *any* SYNC processing,
then the cost of pulling sl_log_n data is such that you might as well
process the Whole Thang, and do Seq Scan fewer times.

I believe that this is no longer an issue in Slony-I CVS HEAD, due to
the following having been committed on July 3, 2007:

<http://lists.slony.info/pipermail/slony1-commit/2007-July/001854.html>

4.  Log shipping introduces a desire to strictly minimize the number
of SYNCs grouped together, ideally to 1, in order that you might
substitute files from one node into another.

These are four competing ways of determining how many SYNCs might be
grouped together.

Right now, in 1.2 and -HEAD, we use a combination of 1. and 2., but
apparently with some bug present that allows the maximum grouping to
exceed the sync_group_maxsize parameter.
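
For discussion purposes, here is one way 1. and 2. could be combined so
that sync_group_maxsize is always honoured.  This is a sketch under my
own assumptions, not the slon source and not a diagnosis of the bug:
the function name is invented, sync_group_maxsize is assumed to be at
least 1, and a desired_sync_time of 0 is taken to switch off the
time-based estimate (as noted in the parameter list).

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the slon source.  Takes the smaller of the
 * "double plus one" limit and the time-based estimate, then applies
 * sync_group_maxsize last so that nothing can exceed it.
 */
int choose_group_size(int last_sync_group_size,
                      int last_sync_length_ms,
                      int desired_sync_time_ms,
                      int sync_group_maxsize)
{
    long long size;

    assert(last_sync_group_size >= 0);
    assert(sync_group_maxsize >= 1);   /* assumption: 0 is not handled here */

    /* Policy 1: roughly double the previous group size. */
    size = ((long long) last_sync_group_size * 200) / 100 + 1;

    /* Policy 2: time-based estimate, only when enabled and when the
     * previous group has a usable duration. */
    if (desired_sync_time_ms > 0 && last_sync_length_ms > 0) {
        long long ideal = ((long long) last_sync_group_size *
                           desired_sync_time_ms) / last_sync_length_ms;
        if (ideal < size)
            size = ideal;
    }

    /* Hard ceiling applied after everything else. */
    if (size > sync_group_maxsize)
        size = sync_group_maxsize;

    /* Always process at least one SYNC. */
    if (size < 1)
        size = 1;

    return (int) size;
}
```

With sync_group_maxsize at 100, a previous group of 80 SYNCs that took
1000ms would otherwise grow to 161; applying the ceiling as the final
step holds it at 100.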

It might *possibly* be of some value to allow user control of how
quickly grouping accelerates (e.g. - a value to use in lieu of the
value "200" in the code fragment above).  Or perhaps we have too many
knobs already.  Opinions will be listened to.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/x.html
The nice thing about Windows is  - It does not just crash, it displays
a dialog box and lets you press 'OK' first.  (Arno Schaefer's .sig)