[Slony1-general] STILL can't migrate a node.

Fri Feb 22 21:20:14 PST 2008

A little more info on this problem...

Craig James wrote:
> I'm trying to migrate a node for the second time, and no luck.  Last 
> time I tried it, it just got stuck, and due to lack of time, I didn't 
> investigate.
> 
> This time I watched -- it got stuck again, doing some sort of huge 
> SELECT statement.  I was under the impression that migrating a node was 
> a fairly simple operation that should happen in a short time (less than 
> a minute?) even for large databases.
> 
> I waited 10 minutes, during which the entire system was completely 
> locked up (no other process could access the database), and our web site 
> was offline.  I finally had to kill all of the slon daemons and kill 
> Postgres to get our site back on the air, then run the node-unlock 
> command to get Slony back in shape.
> 
> This system appears to otherwise be working well.  I can insert, update 
> and delete records, and they're copied to the slave node immediately.
> 
> What's up?  Am I just too impatient?

I tried it again after vacuuming the Slony tables that are subject to bloat.  This time I shut everything off, started moving the set's origin from node 1 (the master) to node 2, and waited 35 minutes, but the SELECT never finished.  vmstat showed massive I/O and CPU activity the whole time.

Again, after I killed Postgres, restarted it, and unlocked the node, Slony went back to performing perfectly.

Any help would be appreciated ... I have to do this before about noon Saturday in order to complete the rest of the weekend's tasks by Sunday evening.

Here is the script I'm using:

slonik <<_EOF_

cluster name = db_cluster;
node 1 admin conninfo = 'dbname=db host=server1 user=postgres';
node 2 admin conninfo = 'dbname=db host=server2 user=postgres';

lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 1, old origin = 1, new origin = 2);

_EOF_

Thanks,
Craig