Rod Taylor
Thu Nov 11 15:22:01 PST 2004
On Thu, 2004-11-11 at 08:28 -0500, Jan Wieck wrote:
> On 11/10/2004 9:28 PM, Rod Taylor wrote:
> 
> > On Wed, 2004-11-10 at 09:25 -0500, Rod Taylor wrote:
> >> The declare'd statement which finds the 2000 or so statements within a
> >> "snapshot" seems to take a little over a minute to FETCH the first 100
> >> entries. Statement attached.
> > 
> > After bumping the snapshots to 100, it managed to get through the
> > contents of the 48-hour transaction within about 8 hours -- the
> > 48-hour transaction was running while Slony was doing the COPY step.
> > 
> > I think Slony could do significantly better if it retrieved all syncs
> > which have the same minxid in one shot -- even if this goes far beyond
> > the maximum sync group size.
> 
> The reason for not going beyond the maximum sync group size is to avoid 
> doing the whole work again if anything goes wrong. It is bad enough that 
> the copy_set needs to be done in one single, humungous transaction.
> 
> What you seem to be experiencing here are several of the nice performance 
> improvements that PostgreSQL received after 7.2 ... just in reverse.

I disagree. I fixed most of those within Slony. The issue now (common
to all versions of PostgreSQL and Slony) is that at one point the
log_xid range being requested by the LOG cursor covered 22 million
tuples (17 million XIDs), which the *_snapshot functions then trim back
down:

    and (log_xid < '1715803287' and _test1_xxid_lt_snapshot(log_xid,
        '1715764209:1715803287:''1715764209'',''1715803088'',''1715785290'''))
    and (log_xid >= '1698717542' and _test1_xxid_ge_snapshot(log_xid,
        '1698717542:1705890743:''1705890707'',''1705821719'',''1705889897'',''1705890741'',''1705859044'',''1705890086'',''1705890344'',''1705885231'',''1698717542'''))

And one from about mid-way through looked like:

    and (log_xid < '1705890743' and _test1_xxid_lt_snapshot(log_xid,
        '1698717542:1705890743:''1705890707'',''1705821719'',''1705889897'',''1705890741'',''1705859044'',''1705890086'',''1705890344'',''1705885231'',''1698717542'''))
    and (log_xid >= '1698717542' and _test1_xxid_ge_snapshot(log_xid,
        '1698717542:1705866099:''1705866060'',''1705866038'',''1705821719'',''1705859044'',''1705866092'',''1705865763'',''1705866097'',''1705858396'',''1698717542'''))

Fragments like these repeat hundreds of times. When the log_xid range
is large, you're going to sift through a ton of data no matter what
version of PostgreSQL is used.
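
For context, those predicates hang off the remote worker's LOG cursor,
which looks roughly like the sketch below (reconstructed by hand, so
take the column list and clauses as approximate rather than as the
literal statement Slony generates):

    declare LOG cursor for
        select log_origin, log_xid, log_tableid, log_actionseq,
               log_cmdtype, log_cmddata
        from "_test1".sl_log_1
        where log_origin = 1
          and (log_xid < '...' and _test1_xxid_lt_snapshot(log_xid, '...'))
          and (log_xid >= '...' and _test1_xxid_ge_snapshot(log_xid, '...'))
        order by log_actionseq;

    fetch 100 from LOG;

Each FETCH of 100 rows has to wade through whatever part of that xid
range the snapshot functions filter out, which is where the minute-plus
per FETCH goes.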

If the 1698717542 transaction had run twice as long, Slony would have
been falling behind.

> > - Have Slony track the last minXid for a sync group.
> > - If the new minXid for the current sync group is the same as the
> > previous one, see if there are other syncs with the same min. If there
> > are, add them to the current group and process them all as a single
> > extra-large group.
> 
> Have you thought about the side effects of such a strategy? For example, 
> the minxid of all transactions that start while a pg_dump is running 
> will be the same. I am not sure you really want to apply hours' worth of 
> replication data in one step.

I don't have a problem with that. I would much prefer to apply (or at
least have the option of applying) hours' worth of data in a single
step that takes hours, rather than sitting there for days while it
tries to catch up in bits and pieces.
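
To make that concrete, here's a sketch of how the extra syncs could be
picked up (illustrative only; it assumes the ev_minxid column on
sl_event, and :last_seqno / :prev_minxid stand in for state the worker
already tracks):

    select ev_seqno
    from "_test1".sl_event
    where ev_origin = 1
      and ev_type = 'SYNC'
      and ev_seqno > :last_seqno      -- last sync already applied
      and ev_minxid = :prev_minxid    -- same min as the previous group
    order by ev_seqno;

Anything that comes back would get appended to the current group, even
when that pushes it past the configured sync group size limit.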
