[Slony1-general] Long running TX causing out of memory errors...

Thu Sep 15 04:29:36 PDT 2005

Jim Archer wrote:

>
>> It's the slon worker process that dies by exhausting its memory. It
>> seems to want to read the entire xid's worth of log entries into memory
>> while applying them in background, and since reading is a lot faster
>
>
> What happens to the data on the origin when this happens?  Is there
> any corruption?  How would we recover from this?  Do we have to
> rebuild the nodes or just restart the slon?

Origin seems fine; everything just rolls back & aborts.

No corruption that I have seen.

Recovery is impossible without changes to slon (or deleting the big log
entry, but that's got its own problems).

Restarting slon does not help (except in some cases where it was
restoring a group of events and could apparently get through some of
them each time).

I have only taken a basic look at the slon code, but it seems the basic
process is:

thread type A: read all pending logs into memory (from all source nodes?)
thread type B: process in-memory logs to (single?) target DB.

(not sure if it does all source DBs at one time or in sequence).

What probably needs to happen is to have a pair of thresholds: an 'empty'
threshold and a 'full' threshold. When the type A thread reaches the
'full' threshold (in terms of current loaded log queue length), it
pauses (maybe finishes source TX, not sure). When the type B thread
reaches the 'empty' threshold, it wakes up thread A. So, in my case,
these might be set to empty=50 and full=100. Alternatively, the same
basic approach could be applied to buffer sizes rather than queue length:
when the total allocated (used?) buffer size exceeds the 'full' amount,
thread A pauses; when it drops below the 'empty' amount, thread A resumes.
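
To make the two-threshold idea concrete, here is a rough sketch in Python.
It is not slon code, and every name in it (WatermarkQueue, run_demo, the
reader/applier functions) is made up for illustration. It assumes a single
reader thread (type A) and a single applier thread (type B), and it counts
queue entries rather than bytes. It also uses None as an end-of-stream
marker, so real log rows must not be None.

```python
import threading
from collections import deque


class WatermarkQueue:
    """Hypothetical queue: the producer pauses at `full` entries and
    is released only once the consumer has drained it to `empty`."""

    def __init__(self, empty, full):
        if not 0 <= empty < full:
            raise ValueError("need 0 <= empty < full")
        self.empty = empty
        self.full = full
        self._items = deque()
        self._cond = threading.Condition()
        self._paused = False   # True while the reader is held back
        self._closed = False
        self.max_len = 0       # largest queue length seen, for inspection

    def put(self, item):
        """Called by the reader (thread type A)."""
        with self._cond:
            if len(self._items) >= self.full:
                self._paused = True
            while self._paused:
                self._cond.wait()
            self._items.append(item)
            self.max_len = max(self.max_len, len(self._items))
            self._cond.notify_all()

    def get(self):
        """Called by the applier (thread type B); None means end of stream."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            if not self._items:
                return None
            item = self._items.popleft()
            # Wake the reader only after draining down to the low mark.
            if self._paused and len(self._items) <= self.empty:
                self._paused = False
            self._cond.notify_all()
            return item

    def close(self):
        """Signal that the reader has no more entries to add."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()


def run_demo(n_rows=1000, empty=50, full=100):
    """Push n_rows fake log entries through the queue with two threads."""
    q = WatermarkQueue(empty, full)
    applied = []

    def reader():
        for i in range(n_rows):
            q.put(i)
        q.close()

    def applier():
        while True:
            row = q.get()
            if row is None:
                break
            applied.append(row)

    threads = [threading.Thread(target=reader),
               threading.Thread(target=applier)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied, q.max_len
```

With empty=50 and full=100 the reader never holds more than 100 entries in
memory, however large the source transaction is. The gap between the two
thresholds keeps the reader from waking up after every single applied row.
The buffer-size variant would track a running byte total in place of
len(self._items).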