[Slony1-general] Re: conflict between slony and pg_bulkload causing postmaster crash

Mon Sep 17 11:32:11 PDT 2007

On Mon, Sep 17, 2007 at 10:58:56AM -0700, Jason L. Buberel wrote:
> Found the archive entry, and I think you did a very good job of 
> convincing me to NOT use your script :)

Well, the point of all that was to emphasise what the docs also say,
which is that you basically can't do bulk loading coherently on a
table that is being replicated and is active. 

If you can be sure that you can re-load the table everywhere (i.e.
you can remove all the data everywhere, fix whatever the problem was,
and then reload everywhere), then it's a suitable tool (in principle
-- no warranty, &c.). 

This general caution is true for any bulk loading on an asynchronous
replication system like Slony -- if the goal is (1) not to send all
the data across the wire "retail" and (2) not to lock the entire
cluster at one time, then you're going to run a risk that nodes
will be out of sync with one another.  The script as it stands
attempts to make that as safe as possible, but it's never going to be
a risk-free operation.

> Given that introduction, would any sane person even consider it?

Given that I wrote it in order to facilitate bulk loading in our
production systems for a particularly sensitive set of data (i.e. if
I was wrong, up to 10% of the Internet could have gone dark), I
am quite sure that it will solve the subset of problems it identifies
as its target.  I just wanted to make sure that others think really
hard before using it.  In particular, it is very important to
emphasise that this cannot be used _regularly_ on active tables. 
This is not the solution for a daily bulk load into an active table,
is all.  The perhaps alarmist disclaimers are there partly because
we've had plenty of people on this list already saying, "Slony breaks
when I do X; that's a bug," in the face of a manual which explicitly
says, "Never do X."  I didn't want to add another vector for such
complaints.

I will note, too, that our DBAs concluded that a more conventional
"load; add to replication and subscribe" sequence worked well enough
for them.  While I was working on this problem, apparently someone
found the cause of a bottleneck in the network, and fixed it.  It was
that bottleneck I was coding around.  Therefore, I have no evidence
that this script has ever been used in production.  It worked
repeatedly in our QA environment, however.

A
-- 
Andrew Sullivan  | ajs at crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
		--J.D. Baldwin