Wed Apr 5 16:15:45 PDT 2006
- Previous message: [Slony1-general] about triggers in slave db
- Next message: [Slony1-general] Graceful switchover gone wrong.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Oh dear - why is nothing ever easy? :(

I've just tried to gracefully swap master/slave roles with little success...

cayenne:~# /usr/lib/postgresql/8.1/bin/slonik < slonswap.txt

... which produced no output for 5 minutes, at which point I CTRL-C'd it. The script is:

cluster name = replication;
node 1 admin conninfo='host=194.24.250.137 dbname=laterooms user=XXX port=5432 password=XXX';
node 2 admin conninfo='host=194.24.250.143 dbname=laterooms user=XXX port=5432 password=XXX';

lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);

i.e. a copy+paste directly from slony_115/failover.html.

The log for node 1 shows:

2006-04-05 23:49:32 BST CONFIG moveSet: set_id=1 old_origin=1 new_origin=2
2006-04-05 23:49:32 BST DEBUG1 remoteWorkerThread_2: helper thread for provider 2 created
2006-04-05 23:49:32 BST CONFIG storeListen: li_origin=2 li_receiver=1 li_provider=2
2006-04-05 23:49:35 BST DEBUG1 remoteWorkerThread_2: connected to data provider 2 on 'host=194.24.250.143 dbname=laterooms user=XXX port=5432 password=XXX'

During this 'dead time' I tried to execute a simple UPDATE on node 1, and was told "ERROR: Slony-I: Table pbx_ext_state is replicated and cannot be modified on a subscriber node" - great, that's exactly what I'd expect. Unfortunately, I was told the same thing when I executed the same query on node 2!

At this point I panicked and executed 'uninstall node (id=1)' to clean out the current machine's Slony config so I could at least bring our website back online again. The next line in node 1's log after the above is:

2006-04-05 23:56:55 BST FATAL syncThread: "start transaction;set transaction isolation level serializable;select last_value from "_replication".sl_action_seq;" - ERROR: schema "_replication" does not exist

which, unsurprisingly, is when I uninstalled node 1 :)

I also saw a process on node 1 during the 'dead time' marked as 'idle in transaction'.

I've not touched node 2 - what can the various sl_ tables tell me about why this process froze, and what should I be looking for if/when it happens when I try it again tomorrow? :( <sigh>

Maybe I should just run away and join the circus...

Cheers,
Gavin
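For reference, a minimal sketch of the sort of queries that could answer the "what can the sl_ tables tell me" question, run against the untouched node 2. The schema name "_replication" comes from the script above; sl_event, sl_confirm and sl_set are the standard Slony-I catalog tables, and the pg_stat_activity query assumes the PostgreSQL 8.1 column names - treat all of this as a sketch rather than a verified recipe.

-- Highest event each node has generated, and what each node has confirmed
-- (run on node 2, which still has its Slony-I schema intact):
SELECT ev_origin, max(ev_seqno) AS last_event
  FROM "_replication".sl_event
 GROUP BY ev_origin;

SELECT con_origin, con_received, max(con_seqno) AS last_confirmed
  FROM "_replication".sl_confirm
 GROUP BY con_origin, con_received;

-- Did the MOVE SET ever change the set's origin?
SELECT set_id, set_origin
  FROM "_replication".sl_set;

-- On the retry, look for sessions stuck 'idle in transaction' that could
-- keep LOCK SET from acquiring its locks (PostgreSQL 8.1 column names):
SELECT procpid, usename, current_query
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction';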
More information about the Slony1-general mailing list