Darcy Buskermolen
Wed Dec 15 22:52:58 PST 2004
On December 15, 2004 02:29 pm, Christopher Browne wrote:
> After having gone through a fair bit of non-fun in tuning watchdog
> processes to be more delicate, I have noticed one place where I'd like
> to tune slon behaviour a bit.
>
> There's the "-g" option, which sets sync_group_maxsize, which
> indicates the maximum number of SYNCs that will get grouped together.
>
> I'd really like to handle that more conservatively in a couple of
> ways:
>
>  1.  I'd like to have a freshly-started slon process work its way up
>      to sync_group_maxsize, as opposed to starting there.
>
>      Reasoning:
>
>      Suppose we have a slon that "goes bump in the night" for a while,
>      leaving its node 800 syncs behind its provider.  And the "-g"
>      option is trying to group together 50 syncs at a time.
>
>      It is possible that what 'went bump' had something to do with the
>      last slon run, which got 30 syncs in, and then fell over.  And
>      when we try to do 50 syncs, it'll fall over for the very same
>      reason.
>
>      In such a case, perhaps sync #30 is an unusually atrociously
>      large one, that size having to do with either slon or a
>      postmaster running out of memory and falling over.
>
>      In any of these cases, it would be nice if we started by doing
>      just 1 SYNC, and working our way up to 50 gradually.
>
>      Thus, in remote_worker.c, instead of
>
>      while (sync_group_size < sync_group_maxsize
>             && node->message_head != NULL) {
>         stuff...
>      }
>
>      we'd use...
>
>      /* Define and initialize variable; static, so it carries
>         over from one sync group to the next */
>      static int our_group_max_size = 1;
>
>      while (sync_group_size < our_group_max_size
>             && node->message_head != NULL) {
>         stuff...
>      }
>      /* Grow the cap by one after each group, up to the -g limit */
>      if (our_group_max_size < sync_group_maxsize) {
>         our_group_max_size++;
>      }
>
>      This has the effect that if there's one Really Big SYNC that is
>      taking something (a postmaster process somewhere runs out of
>      memory?) down, it'll get the best chance possible of getting
>      through those Really Big SYNCs without falling into the rut of
>      falling over, over and over.
>
>      Coding this seems pretty trivial; I'd thought I'd solicit
>      comments, if anyone has further thoughts.

Yeah, I like this one, though I wonder if it might be more effective to have 
this slow start be somewhat dependent on the number of sync events waiting to 
be processed in sl_log.  (Further thought on this is needed.)
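Something along these lines, perhaps (purely a sketch; count_pending_syncs() 
is a made-up helper that would count the SYNC events still waiting for this 
node, e.g. out of sl_event):

   /* Hypothetical sketch: scale the ramp-up step by the backlog.
    * count_pending_syncs() is an assumed helper returning the number
    * of SYNC events still waiting to be processed for this node. */
   static int our_group_max_size = 1;

   int backlog = count_pending_syncs(node);
   /* Climb faster when far behind; one step at a time otherwise */
   int step = (backlog > 10 * sync_group_maxsize) ? 5 : 1;

   while (sync_group_size < our_group_max_size
          && node->message_head != NULL) {
      stuff...
   }
   if (our_group_max_size < sync_group_maxsize) {
      our_group_max_size += step;
      if (our_group_max_size > sync_group_maxsize)
         our_group_max_size = sync_group_maxsize;
   }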


>
>  2.  Further, it would be rather nice to be able to say:
>
>      "Hey!  We have just finished our 4th sync in this group, and it
>      turns out that #4 contained 280,000 updates, meaning that that
>      SYNC took 2 minutes to take effect.  This is more than plenty to
>      get efficiencies by grouping work together.  No sense in going on
>      to 50 in the group. Let's stop and COMMIT now."
>
>      I'm not so sure that is possible/practical.  It looks as though
>      once you decide how many SYNCs you're grouping together, that
>      commits you up front to a query that might possibly pull more
>      data than you wanted to bargain for :-(.

Hmm, interesting concept.  I think I can see a way to make this happen, though 
how useful it is is a different question.
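For instance, something like this might work if the SYNCs in a group were 
applied one at a time (again just a sketch; group_time_limit is a made-up 
knob, in seconds, and time() comes from <time.h>):

   /* Hypothetical sketch: cap the group by elapsed time as well as by
    * count, re-checking the clock after each SYNC is applied.
    * group_time_limit is an assumed config knob (seconds). */
   static int group_time_limit = 60;
   time_t group_started = time(NULL);

   while (sync_group_size < sync_group_maxsize
          && node->message_head != NULL
          && (time(NULL) - group_started) < group_time_limit)
   {
      /* apply the next SYNC, then loop back and check the clock */
      stuff...
   }
   /* COMMIT here; whatever is left waits for the next group */

Of course, that only helps if the grouped SYNCs really are applied one at a 
time; if the group turns into a single up-front query, as Christopher 
suspects, the clock can't be consulted mid-group.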

Expect another post on this topic later tonight, once I've had a chance to 
digest my thoughts.


-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

