Christopher Browne cbbrowne
Mon Jul 31 11:18:45 PDT 2006
Jan Wieck wrote:
> On 7/31/2006 8:05 AM, Sean Kirkpatrick wrote:
>>> Supposing we had a single-column primary key, it ought to be
>>> possible for the slon to recognize:
>>>
>>>  - I'm issuing "delete from foo where id = x;"
>>>
>>>  - Hmm.  The next statement is "delete from foo where id = y;"
>>>
>>>  - That could be folded into the statement:
>>>      delete from foo where id in (x, y);
>>>
>>>  - If we keep deferring, we might discover that the next statement is
>>>      delete from foo where id = z, which could turn that into...
>>>
>>>      delete from foo where id in (x,y,z);
>>
>> This sounds like a viable way to reduce the number of statements to
>> propagate.
>
> It would only work reliably for immediately contiguous delete
> statements that are not mixed with any other insert, update or delete
> and that fall into one and the same SYNC chunk. There are insert,
> update, delete sequences that could otherwise become impossible to
> replicate.
>
> For a simple example, imagine an application does this, not
> necessarily in one transaction, but with all these statements
> belonging to one SYNC chunk:
>
> delete ... where id = 2;
> insert ... (id) values (3);
> delete ... where id = 3;
> insert ... (id) values (2);
>
> In this case you cannot group all deletes together. If you execute
> them before the inserts, you will end up with an unwanted (and wrong)
> id=3. If you try to do them after, you never get there because the
> insert with id=2 will cause a duplicate key error. And you can't do
> them both before and after, or you end up without the required id=2.
>
100% agreed.

That also happens to be a case where there won't be a difference in
behaviour or performance between what happened on the master and what
will happen on subscribers.
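
Jan's counterexample can be checked with a small Python model that
treats the table as nothing more than a set of primary-key values. The
`replay` and `split` helpers and the `DuplicateKey` exception are
hypothetical, and the model assumes a row with id=2 already exists when
the SYNC chunk begins. It replays the four statements in their original
order, with all deletes moved first, with all deletes moved last, and
with the deletes run both before and after the inserts.

```python
class DuplicateKey(Exception):
    """Raised when an insert collides with an existing primary key."""


def replay(ops, initial):
    """Apply (op, key) pairs, in order, to a set of primary-key values."""
    keys = set(initial)
    for op, key in ops:
        if op == "delete":
            keys.discard(key)            # deleting a missing row is a no-op
        elif op == "insert":
            if key in keys:
                raise DuplicateKey(key)  # primary-key violation
            keys.add(key)
        else:
            raise ValueError(f"unknown op {op!r}")
    return keys


def split(ops):
    """Separate deletes from everything else, keeping relative order."""
    deletes = [o for o in ops if o[0] == "delete"]
    inserts = [o for o in ops if o[0] != "delete"]
    return deletes, inserts


# Jan's sequence; assumes id=2 exists before the SYNC chunk starts.
ops = [("delete", 2), ("insert", 3), ("delete", 3), ("insert", 2)]
start = {2}
deletes, inserts = split(ops)

print(replay(ops, start))                          # {2}  (what the master has)
print(replay(deletes + inserts, start))            # {2, 3}  (unwanted id=3)
try:
    replay(inserts + deletes, start)
except DuplicateKey as e:
    print("duplicate key", e.args[0])              # duplicate key 2
print(replay(deletes + inserts + deletes, start))  # set()  (id=2 is gone)
```

Only the original order reproduces the master's final state of a single
row with id=2, which is why deletes cannot be hoisted across the inserts
that sit between them.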

The scenario where this would provide some benefit is one where the
original query deleted a large number of tuples at once.

Thus, you had, on the master node:

  delete ... where id between 2 and 2500;

which is logged, and replayed on subscribers, as one delete per row:

  delete ... where id = 2;
  delete ... where id = 3;
  delete ... where id = 7;
  delete ... where id = 5;
  delete ... where id = 4;
  delete ... where id = 10;
[and so forth, staggering through up to 2499 possible ID values]

Even when interspersed with other activity, it seems likely that a lot
of these deletes will run in sequence together.
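
A minimal Python sketch of the folding idea, restricted as Jan suggests
to immediately contiguous deletes within one SYNC chunk. The `Delete`
and `Other` types and `fold_deletes` are hypothetical: they model the
chunk as already-parsed single-column-key deletes (key column assumed to
be named `id`) plus opaque other statements, not slon's actual log
format, and the real change would of course have to be made in slon's C
code. Any insert or update ends the current run, so nothing is ever
reordered.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Delete:
    table: str
    key: int   # hypothetical: single-column primary key named "id"


@dataclass
class Other:
    sql: str   # any insert or update, passed through verbatim


def fold_deletes(chunk: List[Union[Delete, Other]]) -> List[str]:
    """Fold runs of immediately contiguous deletes on the same table into
    one IN-list delete; any other operation ends the current run."""
    out: List[str] = []
    run_table = None
    run_keys: List[int] = []

    def flush() -> None:
        nonlocal run_table, run_keys
        if len(run_keys) == 1:
            out.append(f"delete from {run_table} where id = {run_keys[0]};")
        elif run_keys:
            ids = ", ".join(str(k) for k in run_keys)
            out.append(f"delete from {run_table} where id in ({ids});")
        run_table, run_keys = None, []

    for op in chunk:
        if isinstance(op, Delete) and op.table == run_table:
            run_keys.append(op.key)           # extend the current run
        elif isinstance(op, Delete):
            flush()                           # different table: new run
            run_table, run_keys = op.table, [op.key]
        else:
            flush()                           # inserts/updates stay in place
            out.append(op.sql)
    flush()
    return out


chunk = [Delete("foo", 2), Delete("foo", 3), Delete("foo", 7),
         Other("insert into foo (id) values (9);"),
         Delete("foo", 5), Delete("foo", 4)]
for stmt in fold_deletes(chunk):
    print(stmt)
# delete from foo where id in (2, 3, 7);
# insert into foo (id) values (9);
# delete from foo where id in (5, 4);
```

Jan's four-statement sequence passes through this unchanged, since every
delete is separated from the next by an insert, while an uninterrupted
stream like the mass deletion above collapses into a single statement.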

> Considering that deletes have the smallest footprint of all replicated
> operations, I don't think the effort necessary to find possible
> sequences of delete statements that can be grouped together without
> any risk is worth it.

I tend to agree, since this introduces quite a bit of complexity, and a
non-zero amount of risk, into C server code that, by nature, is not
trivial to modify in the fairly extensive way this would require.
