Jan Wieck JanWieck at Yahoo.com
Mon Jul 2 12:49:07 PDT 2007
On 7/2/2007 2:03 PM, Jan Wieck wrote:
> On 7/2/2007 1:45 PM, Marko Kreen wrote:
>> On 7/2/07, Jan Wieck <JanWieck at yahoo.com> wrote:
>>> The stuff I am currently (very slowly) working on is that very problem.
>>> Any long running transaction causes the minxid in the SYNCs to be
>>> stuck at that very xid during the entire runtime of the LRT. The problem
>>> with that is that the log selection in the slon worker uses an index
>>> scan whose only index scankey candidates are the minxid of one and the
>>> maxxid of another snapshot. That is the range of rows returned by the
>>> scan itself. Since the minxid is stuck, it will select larger and larger
>>> groups of log tuples only to filter out most of them on a higher level
>>> in the query via xxid_le_snapshot().
>> 
>> How the LRT problem is avoided in PGQ:
>> 
>> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/skytools/skytools/sql/pgq/functions/pgq.batch_event_sql.sql?rev=1.2&content-type=text/x-cvsweb-markup
>> 
>> The basic idea is that there are only a few LRTs, so it's reasonable
>> to pick up the bottom half of the range by event txid, one-by-one.
> 
> Hmmm, that is an interesting idea. And it is (in contrast to what I've 
> been playing with) node insensitive, since it doesn't need info only 
> available on the event origin, like CLOG. Thanks.

Not only is it interesting, but it is astonishingly simple to adopt into 
our code. I want to do some more testing before I commit this change, 
but the really interesting thing here is that it is only a 3 line change 
in the remote_worker.c file, which could easily be backported into 1.2.
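
To illustrate the idea (this is only a rough sketch in Python, not the 
remote_worker.c change and not the PGQ function itself): treat a snapshot 
as (xmin, xmax, in-progress xids), and compare a single range scan starting 
at the stuck minxid with a split selection that picks the few transactions 
that were in progress at the first snapshot one-by-one and range scans only 
from the first snapshot's maxxid upward. The names Snapshot, naive_select 
and split_select are made up for this example, and the log is assumed to be 
a plain list of xids.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a SYNC snapshot: xmin, xmax, in-progress xids."""
    xmin: int
    xmax: int
    xip: FrozenSet[int]

    def sees(self, xid: int) -> bool:
        # A transaction is visible if it finished before the snapshot was taken.
        if xid < self.xmin:
            return True
        return xid < self.xmax and xid not in self.xip


def naive_select(log_xids: List[int], s1: Snapshot, s2: Snapshot) -> Tuple[List[int], int]:
    """One range scan from s1.xmin to s2.xmax, filtered afterwards."""
    scanned = [x for x in log_xids if s1.xmin <= x < s2.xmax]
    selected = [x for x in scanned if s2.sees(x) and not s1.sees(x)]
    return selected, len(scanned)


def split_select(log_xids: List[int], s1: Snapshot, s2: Snapshot) -> Tuple[List[int], int]:
    """Pick the bottom half by xid one-by-one, range scan only the top half."""
    # Bottom half: below s1.xmax, only xids in s1.xip were invisible to s1.
    picked = {x for x in s1.xip if s2.sees(x)}
    lower = [x for x in log_xids if x in picked]  # stands in for equality lookups
    # Top half: everything from s1.xmax upward was invisible to s1.
    upper_scan = [x for x in log_xids if s1.xmax <= x < s2.xmax]
    upper = [x for x in upper_scan if s2.sees(x)]
    return lower + upper, len(lower) + len(upper_scan)


if __name__ == "__main__":
    # xid 100 is the long running transaction, in progress in both snapshots.
    s1 = Snapshot(xmin=100, xmax=5000, xip=frozenset({100, 4998}))
    s2 = Snapshot(xmin=100, xmax=5010, xip=frozenset({100, 5005}))
    log = list(range(90, 5010))  # one log row per xid, for simplicity
    rows_a, touched_a = naive_select(log, s1, s2)
    rows_b, touched_b = split_select(log, s1, s2)
    print("same rows:", rows_a == rows_b)
    print("rows touched, naive vs split:", touched_a, touched_b)
```

Both functions return the same rows. The difference is in how many log rows 
have to be touched to find them, which is where the stuck minxid hurts.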

I had created a really pathetic test case here by SIGSTOP'ing the slon 
while doing the copy_set() for a day, so it had a backlog of some 90000 
events. About a third of the way into that backlog, it had degraded to a 
60+ second delay for the first log row and, due to the dynamic in the 
group size, was doing that on a single event basis. That same database is 
now moving through the backlog in batches of 5-8 minutes each, has a 
<1 second delay for the first log row and does those groups in 50-70 seconds.

This looks very promising.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

