Thu Sep 27 15:56:41 PDT 2007
- Previous message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
- Next message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
On Thu, 27 Sep 2007, Christopher Browne wrote:

> Jeff Frost wrote:
>> For extra background on this bug, see the thread here:
>> http://lists.slony.info/pipermail/slony1-general/2007-September/006687.html
>>
>> This is on Slony1-1.2.10, PostgreSQL-8.1.9:
>>
>> I've run into a situation on a client cluster composed of a master and 2
>> slave nodes where a deadlock on one of the slaves happens quite regularly
>> during any EXECUTE SCRIPT commands. It seems that if slony loses the
>> deadlock, some of the tables are left in a "not altered for replication"
>> state, and this breaks replication.
>
> I am now mighty confused as to how this could happen.
>
> I headed down a pretty fruitless rabbit trail, trying to duplicate the
> condition by trying to induce a deadlock.
>
> I had not much luck trying to do that; the tendency was for deadlocks to
> occur against the queries with the lower locking levels.
>
> So I decided to instrument a slon and try to induce it to get into the
> described "no longer altered for replication" state.
>
> I added a nice "exit(-1);" at exactly the point at which you found the
> induction of the deadlock. The slon fell over at the end of processing
> the DDL script, as expected.
>
> And when it fell over, the transaction was (implicitly) rolled back, so
> that the condition of the subscriber was recovered to consistency.
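The implicit-rollback behaviour Christopher describes above — DDL executed inside a transaction simply vanishes if the session dies before COMMIT — can be sketched in a few lines. PostgreSQL (which Slony runs against) has transactional DDL; the sketch below uses SQLite, which also has transactional DDL, only so that it is self-contained. The table and column names are borrowed from the thread for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database file for the demo.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

conn = sqlite3.connect(path, isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE billing_discount (discount_code TEXT)")

# Run DDL inside an open transaction, then abandon the session without
# committing -- analogous to the instrumented slon calling exit(-1).
conn.execute("BEGIN")
conn.execute("ALTER TABLE billing_discount ADD COLUMN use_term TEXT")
conn.close()  # closing with the transaction open rolls it back

# A fresh session sees the original schema: the ALTER was rolled back.
conn2 = sqlite3.connect(path)
cols = [row[1] for row in conn2.execute("PRAGMA table_info(billing_discount)")]
print(cols)  # the new column is gone
conn2.close()
```

This is exactly why the subscriber "recovered to consistency" in Christopher's test, and why the reported half-altered state is so surprising.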
>
> 2007-09-27 20:21:16 UTC DEBUG2 remoteWorkerThread_1: Received event 1,30 DDL_SCRIPT
> 2007-09-27 20:21:16 UTC INFO   Checking local node id
> 2007-09-27 20:21:16 UTC INFO   Found local node id
> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300
> 2007-09-27 20:21:16 UTC DEBUG4 remoteWorkerThread_3: update provider configuration
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL request with 10 statements
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 0: [create sequence ncolseq;]
> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress3 host=localhost user=cbbrowne port=5834" is 80300
> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300
> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 1: [alter table table4 add column newcol timestamptz;]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 2: [alter table table4 alter column newcol set default now();]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 3: [alter table table4 rename column id1 to col1;]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 4: [alter table table4 rename column id2 to col2;]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 5: [ALTER TABLE billing_discount ADD column "use_term" character(1);]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 6: [ALTER TABLE billing_discount ALTER column use_term set default 'n';]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 7: [ALTER TABLE billing_discount add constraint use_term_cons check (use_term in ('y','n'));]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 8: [alter table table1 rename column id to col1;]
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 9: []
> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_EMPTY_QUERY
> 2007-09-27 20:21:16 UTC DEBUG2 slon: child terminated status: 65280; pid: 24126, current worker pid: 24126
> 2007-09-27 20:21:16 UTC DEBUG1 slon: restart of worker in 10 seconds
> 2007-09-27 20:21:20 UTC DEBUG1 slon: shutdown requested
> 2007-09-27 20:21:20 UTC DEBUG2 slon: notify worker process to shutdown
> 2007-09-27 20:21:20 UTC DEBUG1 slon: done
> 2007-09-27 20:21:20 UTC DEBUG2 slon: exit(0)
>
> So, for instance, statement #7 was going to alter table billing_discount
> to add a CHECK constraint:
>
> slonyregress2@[local]:5834=# \d billing_discount
>                         Table "public.billing_discount"
>        Column        |           Type           |                             Modifiers
> ---------------------+--------------------------+--------------------------------------------------------------------
>  discount_code       | character(2)             | not null
>  billing_object_type | character varying(10)    | not null
>  billing_action_type | character varying(10)    | not null
>  discount_amount     | numeric(7,2)             | not null
>  start_date          | timestamp with time zone | not null
>  end_date            | timestamp with time zone | not null
>  billing_discount_id | integer                  | not null default nextval(('billing_discount_seq'::text)::regclass)
>  registrar_id        | integer                  |
>  tld_id              | integer                  |
>  zone_id             | integer                  |
> Indexes:
>     "billing_discount_pkey" PRIMARY KEY, btree (billing_discount_id)
> Triggers:
>     _slony_regress1_denyaccess_5 BEFORE INSERT OR DELETE OR UPDATE ON billing_discount FOR EACH ROW EXECUTE PROCEDURE _slony_regress1.denyaccess('_slony_regress1')
>
> slonyregress2@[local]:5834=#
>
> As we can see, that constraint isn't there.
>
> This heads me back to the "wait, that's impossible!" expectation that I
> started with.
>
> When the ERROR message was submitted to the slon log, that means that the
> transaction rolls back, and everything recovers to where it was.
>
> That's what I see happening here; I can't quite fathom why it's not
> happening there.

I guess it's fun to be unique. :-)

And your introduction of the failure mode was on the slave and not the
master, correct?  Were you testing with 1.2.11 and not 1.2.10?  Could I
possibly be misinterpreting the log messages?

BTW, thanks for looking into this.

-- 
Jeff Frost, Owner 	<jeff at frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954