Christopher Browne cbbrowne at mail.libertyrms.com
Fri Jun 29 15:06:59 PDT 2007
Jeff Davis <pgsql at j-davis.com> writes:
> On Thu, 2007-06-28 at 18:17 -0400, Christopher Browne wrote:
>> Are there more items we should try to add?
>
> Is there a good way to make it more efficient to do large
> deletes/updates?
>
> For instance, if the number of tuples that need to be deleted for a
> transaction exceeds a certain amount, could we use a different process
> for the delete on the subscriber so that it doesn't do millions of
> single-tuple deletes?
>
> I don't know exactly how that would work, perhaps by using a temporary
> table on the subscriber and doing a single "DELETE FROM foo WHERE id IN
> (SELECT id FROM foo_delete_tmp)" or something?

Ah, yes.
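
Presumably something along those lines; a rough sketch (the temp table
and column names here are just for illustration) might be:

create temp table foo_delete_tmp (id integer) on commit drop;
-- bulk-load the keys gathered from sl_log_1, e.g. via COPY
copy foo_delete_tmp (id) from stdin;
-- then one set-oriented delete rather than millions of single-tuple ones
delete from foo where id in (select id from foo_delete_tmp);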

It would be nontrivial to improve this in the case of a multi-column
key, but if the candidate primary key consists of a single column, it
ought to be possible to detect the pattern and group the deletes.

Here's a sample, of sorts...

slonyregress1@[local]:5834=# select * from _slony_regress1.sl_log_1 where log_cmdtype = 'D' order by log_actionseq limit 10;
 log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata 
------------+---------+-------------+---------------+-------------+-------------
          1 | 3939096 |           2 |          9277 | D           | id='1'
          1 | 3939096 |           1 |          9278 | D           | id='1'
          1 | 3939096 |           2 |          9279 | D           | id='2'
          1 | 3939096 |           1 |          9280 | D           | id='2'
          1 | 3939096 |           2 |          9281 | D           | id='3'
          1 | 3939096 |           2 |          9282 | D           | id='4'
          1 | 3939096 |           2 |          9283 | D           | id='6'
          1 | 3939096 |           2 |          9284 | D           | id='9'
          1 | 3939096 |           2 |          9285 | D           | id='13'
          1 | 3939096 |           2 |          9286 | D           | id='18'
(10 rows)

This would normally get turned into:

delete from table2 where id = '1';
delete from table1 where id = '1';
delete from table2 where id = '2';
delete from table1 where id = '2';
... and so forth ...

It should, in principle, be possible to group together requests for one table, so that we could have the following:

delete from table2 where
   id = '3'
    or
   id = '4'
    or
   id = '6'
    or
   id = '9'
    or
   id = '13'
    or
   id = '18';

where this groups together the consecutive entries for table #2.

This actually still works for multiple-column PKs as long as we put
parentheses around each "log_cmddata" entry.  The win isn't in turning
this into a single query so huge it challenges the parser; it is in
processing several entries in one query, so I'd be inclined to start a
new DELETE request any time a query accumulates 100 tuples.
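
For a two-column PK, say (id, node) (the "node" column is made up for
illustration), the grouped form would simply wrap each log_cmddata
entry in parentheses:

delete from table2 where
   (id = '3' and node = '1')
    or
   (id = '4' and node = '1')
    or
   (id = '6' and node = '2');

starting a fresh DELETE once 100 entries have accumulated, as above.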

Looking at the update case...

slonyregress1@[local]:5834=# select * from _slony_regress1.sl_log_1 where log_cmdtype = 'U' order by log_actionseq limit 10; 
 log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype |        log_cmddata         
------------+---------+-------------+---------------+-------------+----------------------------
          1 | 3943148 |           2 |         18633 | U           | data='foo' where id='8340'
          1 | 3943148 |           2 |         18634 | U           | data='foo' where id='8341'
          1 | 3943148 |           2 |         18635 | U           | data='foo' where id='8342'
          1 | 3943148 |           2 |         18636 | U           | data='foo' where id='8343'
          1 | 3943148 |           2 |         18637 | U           | data='foo' where id='8344'
          1 | 3943148 |           2 |         18638 | U           | data='foo' where id='8345'
          1 | 3943148 |           2 |         18639 | U           | data='foo' where id='8346'
          1 | 3943148 |           2 |         18640 | U           | data='foo' where id='8347'
          1 | 3943148 |           2 |         18641 | U           | data='foo' where id='8348'
          1 | 3943148 |           2 |         18642 | U           | data='foo' where id='8349'
(10 rows)

It would take some parsing of the log_cmddata to do this; nonetheless,
I think it ought to be possible to compress this into a considerably
smaller number of queries.  Again, limiting each query to at most 100
tuples would still seem like enough of an improvement to call it a
"win."
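
For instance, all of the entries in the sample above happen to share
the same SET clause, so (assuming the parsing works out) a run like
that could collapse into something like:

update table2 set data='foo'
 where id='8340' or id='8341' or id='8342' ... or id='8349';

Entries whose SET clauses differ would presumably still need to be
applied individually, so this mostly helps the (common) repetitive
bulk update case.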

I'll take a look...
-- 
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/rdbms.html
"When we write programs that "learn", it turns out that we do and they
don't." -- Alan J. Perlis

