Jeff Davis pgsql at j-davis.com
Mon Jul 9 12:18:17 PDT 2007
On Mon, 2007-07-09 at 08:46 -0400, Jan Wieck wrote:
> > Why does this not apply to UPDATEs as well? 
> 
> I can see how multiple log rows for DELETE might be combined into one 
> subscriber side DELETE statement with a WHERE clause using IN (BETWEEN 
> would be difficult in the case of multi-column primary keys).
> 
> How you intend to do the same for INSERT or UPDATE is unclear to me. As 
> said before, there is no SQL statement logging available and even if it 
> where, due to MVCC it won't do us any good because Slony does not 
> replicate single transactions in their exact commit order. This 
> visibility problem is one of the tricky details that killed pgcluster 1. 
> I won't let it sneak into Slony.

I am _not_ suggesting statement replication in Slony, and that was not
the direction I intended to take this discussion (statement replication
is really more the domain of pgpool, which is interesting but a
completely different design).

As I understand it, the subscriber reads the log from the provider and
turns the log into individual statements on the subscriber.

Instead, the subscriber could read the log on the provider into a temp
table on the subscriber, then the subscriber could do a "DELETE FROM foo
WHERE ... IN (select ... from temp_table)" or an "UPDATE ... SET ...
FROM temp_table WHERE ...". This should not be any different than
running individual statements within a single transaction, but is more
efficient in some cases (bulk deletes/updates).

In the quick test I just ran (attached ruby script), it took 160s to
COPY the 5M updates into a temp table, and then run one UPDATE that
would update every record in the table by joining with the temp table
(the COPY was a few seconds, so most of the 160s was the single UPDATE
statement). That's way faster than the 15 minutes it took to update one
record at a time.

Regards,
	Jeff Davis



More information about the Slony1-general mailing list