Christopher Browne cbbrowne
Thu Dec 8 19:13:27 PST 2005
Jan Wieck wrote:

> On 12/8/2005 10:51 AM, CVS User Account wrote:
>
>> Log Message:
>> -----------
>> Feature Request #1280 - Reduce generation of LISTEN/NOTIFY entries to
>> cut pg_listener bloat
>>
>> Attached is a patch to the slon code which should dramatically reduce
>> the amount of pg_listener bloat.
>>
>> The notion is that when the slon is working well, processing events with
>> great regularity, we switch over to do polling, and UNLISTEN to events
>> being handled via pg_listener, which entirely eliminates the
>> tendency for pg_listener to bloat up.
>>
>> By now, people probably know my tendency to prefer adaptive
>> self-configuration, where possible  :-) .
>>
>> We have two states enumerated:
>>
>> enum pstate_enum {POLL=1, LISTEN};
>> static enum pstate_enum pstate;
>>
>> Polling times are managed via poll_sleep...
>> static int poll_sleep;
>>
>> - Any time the event loop finds events, the sleep time gets set to 0 so
>> that the next iteration starts immediately.
>>
>> - Any time the event loop doesn't find events, we double poll_sleep and
>> add sync_interval (the -s option value)
>
>
> But this is the subscribers sync_interval, not the one on the origin
> or data provider. Normally a subscriber doesn't need a very high sync
> interval. The sync_interval controls how often a node checks the
> log_actionseq to see if there have been any changes, so it is in fact a
> polling interval of the origin's slon against the master DB. Consider
> this case:
>
> The origin is running with -s1000, the subscriber -s10000. This means
> that on a busy site, the origin will generate a SYNC every second. If
> your subscriber just got an event via NOTIFY, it sets the sleep time
> to 0 seconds. So it polls immediately after receiving this event,
> doesn't find another one, adds its own sync_interval to the value and
> will poll again in 10 seconds.
>
> I'd say this is a little too sleepy ;-)

I'm pleased to see comments :-).

My goal here is to try to head *rapidly* back to the point where we
start using NOTIFY again.  That would cut the cost of polling (which is
costly in that it throws queries at the DB)...

It seems to me that an implication here is that I'd prefer to use the
same -s/-t parameters for both master and slaves.  (Which, by a stroke
of coincidence, is what happens in production :-).)

That has the merit that if we switch roles (MOVE SET), we don't need to
reconfigure the slons :-).

On the subscriber, the -s parameter isn't the interesting one.  It never
sees a local update, so it is really primarily driven by the -t
parameter.  That's actually why I'm relatively uninterested in having
the -s parameter vary between nodes.

What happens on our site is that the origin and subscriber both run with
-s1000 and -t60000.

So if the subscriber gets an event via NOTIFY, it sets sleep time to
0s.  If it then *doesn't* find anything queued, it'll sleep for 1s. 
Then 3, then 7, then 15, then 31 (all of the form 2^n - 1 seconds
:-)), and then it heads back into "NOTIFY" land.  Of course, in each case,
if it got any work, the time would get dropped back to 0.  The "worst
case" scenario takes place if things suddenly get very busy immediately
after beginning the 31s poll.  Which means that there hadn't been any
work at all in the last 26s...

It seems to me that a bit of fiddling with -s/-t values can improve the
behaviour without touching the code at all.  It is, of course, an
opportune time to touch the code :-)


More information about the Slony1-general mailing list