Jan Wieck JanWieck at Yahoo.com
Thu Dec 1 19:40:12 PST 2005
On 12/1/2005 10:58 AM, Chris Browne wrote:

> We were discussing on IRC a notion for "saving memory" (recall the
> various posts where people have seen slon processes bloat to ginormous
> size due to there being enormous tuples); I thought I'd sum up some of
> the thoughts (the really good ones being Jan's :-)) as well as adding
> a bit of "aftermath thoughts."
> 
> The trouble is that, at present, how things work is we set up a cursor thus:
> (presented in pseudocode to just give the idea...)
> 
> declare LOG cursor for select * from sl_log_1 where (plenty of
> parameters!!!);
> 
> loop until done
>   fetch 100 from LOG;
>   for each entry
>     Apply to subscriber
>   done
> done

This is in fact grossly simplified. In reality there is a ring buffer 
holding at most 5000 log statements per data provider. One helper thread 
per data provider does the fetches to fill that buffer, while another 
thread, the remote_worker, applies the rows to the subscriber.

The reason for this is that the remote_worker is slon's bottleneck. 
There is only one session that applies all updates serially (and no, this 
cannot easily be changed). The ring buffer filled by the helper threads 
ensures that the data keeps coming and that the backend applying the 
log doesn't twiddle its thumbs while slon is fetching data from the remote side.

Granted, having a buffer that can hold up to 50 x 100 fetched log lines 
ahead of the current apply process is a bit generous.
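
In pthread terms, the producer/consumer part boils down to something like
the following. This is only a minimal sketch with made-up names (log_ring,
ring_put, ring_get), not the actual slon source:

/*
 * One fetch helper fills the ring per data provider, the remote_worker
 * drains it. The fixed ring size bounds how far the fetch can run
 * ahead of the apply.
 */
#include <pthread.h>
#include <stdbool.h>

#define RING_SIZE 5000                  /* max buffered log statements */

typedef struct
{
    char       *log_cmddata;            /* SQL to apply on the subscriber */
} log_row;

typedef struct
{
    log_row     slots[RING_SIZE];
    int         head, tail, count;
    bool        done;                   /* provider finished this SYNC */
    pthread_mutex_t lock;
    pthread_cond_t  not_full, not_empty;
} log_ring;

/* fetch helper: add one row, blocking while the ring is full */
static void
ring_put(log_ring *r, log_row row)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_SIZE)
        pthread_cond_wait(&r->not_full, &r->lock);
    r->slots[r->head] = row;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* fetch helper: no more rows for this SYNC */
static void
ring_done(log_ring *r)
{
    pthread_mutex_lock(&r->lock);
    r->done = true;
    pthread_cond_broadcast(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* remote_worker: take one row, return false when the SYNC is drained */
static bool
ring_get(log_ring *r, log_row *row)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == 0 && !r->done)
        pthread_cond_wait(&r->not_empty, &r->lock);
    if (r->count == 0)
    {
        pthread_mutex_unlock(&r->lock);
        return false;
    }
    *row = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return true;
}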

> 
> The trouble with this is that if you have 100 entries in sequence that
> are 5MB in size, because you have some table storing enormous
> BLOB-like stuff, all 100 have to be loaded into memory several times.
> 
> 100 x 5MB = 500MB, which probably becomes over 1GB by the time you're
> through.
> 
> The alternative discussed:
> 
> declare LOG cursor for select [various fields],
>   case when octet_length(log_cmddata) < LIMIT then log_cmddata else NULL end
>   from sl_log_1 where (same plenty of parameters as before);
> 
> loop until done
>   fetch RECCOUNT from LOG;
>   for each entry
>     if log_cmddata is not null then
>       Apply to subscriber
>     else
>       select log_cmddata into temporary memory buffer and apply to subscriber
>     endif
>   done
> done
> 
> octet_length() evidently can draw the size of the column out of the
> TOAST table without needing to actually read in the data.

The size is stored either in the varlena header (VARHDR) or, for toasted 
values, in the toast reference kept in the main tuple. Thus, octet_length() 
does not access the toast table at all.
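
In libpq terms, a rough sketch of that fetch strategy could look like the
following. The column names are the ones from sl_log_1; the 8k cutoff, the
single-row re-fetch by log_actionseq alone and the missing error handling
are simplifications for illustration, not the real remote_worker code:

/*
 * Fetch small rows inline, re-fetch over-sized ones individually.
 * Illustrative only: error checking and the real sl_log_1 where-clause
 * are omitted, and the single-row re-fetch key is simplified.
 */
#include <libpq-fe.h>
#include <stdio.h>

#define ROW_LIMIT "8192"                /* inline rows below this many bytes */

void apply_to_subscriber(const char *cmddata);      /* assumed elsewhere */

void
copy_log(PGconn *provider)
{
    PGresult   *res;

    PQclear(PQexec(provider, "begin;"));
    PQclear(PQexec(provider,
        "declare LOG cursor for "
        "select log_actionseq, "
        "    case when octet_length(log_cmddata) < " ROW_LIMIT
        "         then log_cmddata else null end as log_cmddata "
        "from sl_log_1"));              /* where-clause omitted here */

    for (;;)
    {
        int         ntuples, i;

        res = PQexec(provider, "fetch 100 from LOG;");
        ntuples = PQntuples(res);
        if (ntuples == 0)
        {
            PQclear(res);
            break;
        }

        for (i = 0; i < ntuples; i++)
        {
            if (!PQgetisnull(res, i, 1))
            {
                /* small row: the data came back with the fetch */
                apply_to_subscriber(PQgetvalue(res, i, 1));
            }
            else
            {
                /* big row: pull just this one tuple's data separately */
                char        query[256];
                PGresult   *one;

                snprintf(query, sizeof(query),
                         "select log_cmddata from sl_log_1 "
                         "where log_actionseq = %s;",
                         PQgetvalue(res, i, 0));
                one = PQexec(provider, query);
                apply_to_subscriber(PQgetvalue(one, 0, 0));
                PQclear(one);
            }
        }
        PQclear(res);
    }

    PQclear(PQexec(provider, "close LOG;"));
    PQclear(PQexec(provider, "commit;"));
}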

The approach I am currently trying to implement looks a bit different, 
mainly more complex. I am thinking of a smaller ring buffer of, let's say, 
500 log rows of size ROW_LIMIT, plus a config value MAX_BUFFER which must 
be larger than 500 * ROW_LIMIT. If the helper encounters large rows, it 
will allocate extra memory until MAX_BUFFER is exceeded (yes, one 
allocation over, to ensure that a single row larger than MAX_BUFFER - 
500 * ROW_LIMIT does not stop replication altogether). The apply thread 
will free those extra allocations as soon as it is done with them, so the 
helper thread only has to wait until the current memory consumption drops 
below MAX_BUFFER again.

This way we still keep the apply backend as busy as possible, instead of 
stopping to fetch altogether at the first sight of a row larger than 
ROW_LIMIT.
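
The accounting part of that could look roughly like this (a sketch only;
max_buffer and the other names below are placeholders, and the ring
handling itself stays as described above):

/*
 * Budget accounting for over-sized rows. The fetch helper charges every
 * allocation larger than ROW_LIMIT against a shared budget and waits
 * while the budget is used up; the apply thread credits it back once a
 * row has been applied and freed.
 */
#include <pthread.h>
#include <stddef.h>

static size_t   max_buffer = 10 * 1024 * 1024;  /* the MAX_BUFFER config */
static size_t   cur_oversize = 0;               /* over-sized bytes in flight */
static pthread_mutex_t oversize_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  oversize_freed = PTHREAD_COND_INITIALIZER;

/* fetch helper: about to allocate nbytes for a row larger than ROW_LIMIT */
static void
oversize_charge(size_t nbytes)
{
    pthread_mutex_lock(&oversize_lock);
    /*
     * Allow going one allocation over the budget, so a single row
     * bigger than the whole budget cannot stall replication forever.
     */
    while (cur_oversize >= max_buffer)
        pthread_cond_wait(&oversize_freed, &oversize_lock);
    cur_oversize += nbytes;
    pthread_mutex_unlock(&oversize_lock);
}

/* remote_worker: the row is applied and freed, give the memory back */
static void
oversize_release(size_t nbytes)
{
    pthread_mutex_lock(&oversize_lock);
    cur_oversize -= nbytes;
    pthread_cond_broadcast(&oversize_freed);
    pthread_mutex_unlock(&oversize_lock);
}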


Jan

> 
> Memory usage is then based on three things:
>  1.  LIMIT - what is the maximum size of a record that will be drawn in by the fetch?
>  2.  RECCOUNT - number of records drawn in...
>  3.  Well, when the size overflows LIMIT, you'll have ONE large record in memory at
>      a time.  No way around that...
> 
> The practical upshot of that is that memory consumption is [LIMIT x RECCOUNT].
> 
> 5000 bytes x 1000 records = 5MB, which might still need doubling, but
> which is, regardless, rather less than 1GB :-)
> 
> A thought came to me, this morning as followup.  The above approach
> has 2 "free parameters," LIMIT and RECCOUNT, which need to get
> configured.
> 
> A thought is to have the slon control this via one input parameter,
> MAXMEM, which would be the desired multiple of the two values.
> 
> Then we start by setting RECCOUNT to 100, LIMIT to MAXMEM / RECCOUNT,
> and can then adapt, after each FETCH has been processed.
> 
> Then some rules can automatically modify those limits...  Here's
> something arbitrarily made up that is probably not outrageous :-).
> 
> -> If we discover that "too many" records are being drawn in as
>    singletons, say, > 10%, then we need to increase LIMIT
> 
>    LIMIT = LIMIT * 1.1
>    if LIMIT > MAXMEM / 20 then  -- Surely we don't want a fetch < 20
>       LIMIT = MAXMEM / 20
>    endif
>    RECCOUNT = MAXMEM / LIMIT
>    
> 
> -> If we discover that "slightly too many" records are being drawn in as
>    singletons, say, > 5%, then we need to increase LIMIT
> 
>    LIMIT = LIMIT * 1.05
>    if LIMIT > MAXMEM / 20 then  -- Surely we don't want a fetch < 20
>       LIMIT = MAXMEM / 20
>    endif
>    RECCOUNT = MAXMEM / LIMIT
>    
> -> If we discover that < 1% of records (a lot too few) are drawn in as singletons,
>    then reduce LIMIT
> 
>    LIMIT = LIMIT / 1.1
>    if LIMIT < 500 then
>      LIMIT = 500
>    end if
>    RECCOUNT = MAXMEM / LIMIT
> 
> -> If we discover that < 3% of records (somewhat too few) are drawn in as singletons,
>    then reduce LIMIT
> 
>    LIMIT = LIMIT / 1.05
>    if LIMIT < 500 then
>      LIMIT = 500
>    end if
>    RECCOUNT = MAXMEM / LIMIT
> 
> That'll tune towards some form of "reasonableness."  I don't think
> "optimality" is the right word; the goal is to have LIMIT set high
> enough that you normally process *nearly* all of the records without
> needing the extra bit of "singleton" logic.


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

