[Slony1-bugs] [Bug 27] pg_listener is switching to/from polling mode too much

Fri Jan 4 15:03:31 PST 2008

http://www.slony.info/bugzilla/show_bug.cgi?id=27

Christopher Browne <cbbrowne at ca.afilias.info> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|                            |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #1 from Christopher Browne <cbbrowne at ca.afilias.info>  2008-01-04 15:03:31 ---
Having observed a node in the "throes" of replication, it isn't evident to me
that this is a big problem.

- When a node is busy replicating, that is, when there's work to do almost
immediately, it doesn't switch back and forth to LISTEN/UNLISTEN.  That's
definitely correct behaviour, and the behaviour we wanted to see.

Where I'm seeing it jump back into "LISTEN" mode from "polling" mode (i.e.
from SLON_POLLSTATE_POLL to SLON_POLLSTATE_LISTEN), it looks to me as though
there is a configuration issue.

The logic that controls this jump is as follows:

        if (ntuples > 0) {    /* i.e. found events to process */
                poll_sleep = 0;
                poll_state = SLON_POLLSTATE_POLL;
        } else {
                poll_sleep = poll_sleep * 2 + sync_interval;
                if (poll_sleep > sync_interval_timeout) {
                        poll_sleep = sync_interval_timeout;
                        poll_state = SLON_POLLSTATE_LISTEN;
                }
        }

The "else" clause is the portion of interest...

Each consecutive time that no events are found, the sleep time gets doubled,
and the config parameter "sync_interval" is added to it.  When the resulting
sleep time exceeds "sync_interval_timeout", it is capped at that value and we
jump back into the "LISTEN" state.

It seems to me, in this case, that the value of sync_interval_timeout is likely
configured to be too low.  Note that the default values of sync_interval and
sync_interval_timeout are 2000ms and 10000ms, respectively.  Given those
defaults, any time the remote listener loop doesn't find new events in a period
of roughly 10s, it will jump into "LISTEN" mode.  That would presumably happen
with the following sequence...

- poll_sleep started at 0ms
- we found ntuples = 0, and so bumped poll_sleep up to 2000ms (0 * 2 + 2000)
- we then slept 2000ms, found no new work
- we then bumped poll_sleep up to 6000ms (2000 * 2 + 2000)
- we then slept 6000ms, finding no new work
- the next value, 14000ms, exceeds sync_interval_timeout, so poll_sleep is
  capped at 10000ms and poll_state gets changed to SLON_POLLSTATE_LISTEN
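
To check a sequence like this by hand, the quoted logic can be lifted into a
small standalone harness.  This is only a sketch: struct poller, poll_update()
and trace_idle() are names made up for illustration, the enum values are local
stand-ins for the real constants, and the actual slon source may initialise or
update poll_sleep in places not shown in the snippet, so the figures should be
treated as illustrative.

```c
#include <stdio.h>

/* Local stand-ins for the constants named in the quoted snippet;
 * the numeric values are arbitrary, not taken from the slon source. */
enum poll_state { SLON_POLLSTATE_POLL, SLON_POLLSTATE_LISTEN };

/* Hypothetical container for the two variables the snippet updates. */
struct poller {
    int poll_sleep;              /* milliseconds */
    enum poll_state poll_state;
};

/* One pass of the quoted logic, with the config values passed in. */
static void poll_update(struct poller *p, int ntuples,
                        int sync_interval, int sync_interval_timeout)
{
    if (ntuples > 0) {           /* found events to process */
        p->poll_sleep = 0;
        p->poll_state = SLON_POLLSTATE_POLL;
    } else {                     /* nothing found: back off */
        p->poll_sleep = p->poll_sleep * 2 + sync_interval;
        if (p->poll_sleep > sync_interval_timeout) {
            p->poll_sleep = sync_interval_timeout;
            p->poll_state = SLON_POLLSTATE_LISTEN;
        }
    }
}

/* Feed consecutive empty polls, starting from poll_sleep = 0, until the
 * state flips to LISTEN.  Prints each step and returns the number of
 * empty polls needed, or -1 if sync_interval is not positive (the
 * backoff would never grow). */
static int trace_idle(int sync_interval, int sync_interval_timeout)
{
    struct poller p = { 0, SLON_POLLSTATE_POLL };
    int polls = 0;

    if (sync_interval <= 0)
        return -1;

    while (p.poll_state != SLON_POLLSTATE_LISTEN) {
        poll_update(&p, 0, sync_interval, sync_interval_timeout);
        polls++;
        printf("empty poll %d: poll_sleep = %d ms, state = %s\n",
               polls, p.poll_sleep,
               p.poll_state == SLON_POLLSTATE_LISTEN ? "LISTEN" : "POLL");
    }
    return polls;
}
```

Calling trace_idle(2000, 10000) from a main() returns 3: taken literally from
poll_sleep = 0, the snippet steps through 2000ms and 6000ms while staying in
POLL, then caps at 10000ms and switches to LISTEN on the third empty poll.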

Before changing the logic, I'd be inclined to increase sync_interval_timeout to
20000 or 30000 (i.e. 20s or 30s).
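
To get a rough feel for what a larger timeout buys, the same backoff rule can
be used to estimate how long a node sits idle in polling mode before it
switches.  The helper below is a hypothetical sketch, not slon code: it assumes
poll_sleep starts at 0, that the loop sleeps for poll_sleep milliseconds after
every empty poll while still in polling mode, and that nothing else touches
poll_sleep.

```c
/* Total milliseconds slept in polling mode, over consecutive empty polls,
 * before the quoted rule would switch to LISTEN.
 *
 * Assumptions (not confirmed against the slon source): poll_sleep starts
 * at 0 and each computed poll_sleep is slept in full before the next poll.
 * Values are expected to be well below LONG_MAX / 2 so the doubling
 * cannot overflow.  Returns -1 if sync_interval is not positive. */
static long idle_ms_before_listen(long sync_interval,
                                  long sync_interval_timeout)
{
    long poll_sleep = 0;
    long slept = 0;

    if (sync_interval <= 0)
        return -1;

    for (;;) {
        poll_sleep = poll_sleep * 2 + sync_interval;
        if (poll_sleep > sync_interval_timeout)
            return slept;        /* this is the poll that flips to LISTEN */
        slept += poll_sleep;
    }
}
```

Under those assumptions, with sync_interval = 2000 the function returns 8000
for a timeout of 10000, 22000 for 20000, and 52000 for 30000, so each bump in
sync_interval_timeout keeps an idle node in polling mode for noticeably longer.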

I'm adding notes to this effect to the FAQ.
