Fri Sep 28 07:20:25 PDT 2007
- Previous message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jeff Frost wrote:
> On Thu, 27 Sep 2007, Christopher Browne wrote:
>
>> Jeff Frost wrote:
>>> For extra background on this bug, see the thread here:
>>> http://lists.slony.info/pipermail/slony1-general/2007-September/006687.html
>>>
>>> This is on Slony1-1.2.10, PostgreSQL-8.1.9:
>>>
>>> I've run into a situation on a client cluster composed of a master
>>> and 2 slave nodes where a deadlock on one of the slaves happens
>>> quite regularly during any EXECUTE SCRIPT command. It seems that if
>>> slony loses the deadlock, some of the tables are left in a
>>> not-altered-for-replication state, and this breaks replication.
>>
>> I am now mighty confused as to how this could happen.
>>
>> I headed down a pretty void rabbit trail, trying to duplicate the
>> condition by trying to induce a deadlock.
>>
>> I had not much luck doing that; the tendency was for deadlocks
>> to occur against the queries with the lower locking levels.
>>
>> So I decided to instrument a slon and try to induce it to get into
>> the described "no longer altered for replication" state. I added a
>> nice "exit(-1);" at exactly the point at which you found the
>> induction of the deadlock. The slon fell over at the end of
>> processing the DDL script, as expected.
>>
>> And when it fell over, the transaction was (implicitly) rolled back,
>> so that the condition of the subscriber was recovered to consistency.
>>
>> 2007-09-27 20:21:16 UTC DEBUG2 remoteWorkerThread_1: Received event 1,30 DDL_SCRIPT
>> 2007-09-27 20:21:16 UTC INFO Checking local node id
>> 2007-09-27 20:21:16 UTC INFO Found local node id
>> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300
>> 2007-09-27 20:21:16 UTC DEBUG4 remoteWorkerThread_3: update provider configuration
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL request with 10 statements
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 0: [create sequence ncolseq;]
>> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress3 host=localhost user=cbbrowne port=5834" is 80300
>> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300
>> 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 1: [alter table table4 add column newcol timestamptz;]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 2: [alter table table4 alter column newcol set default now();]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 3: [alter table table4 rename column id1 to col1;]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 4: [alter table table4 rename column id2 to col2;]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 5: [ALTER TABLE billing_discount ADD column "use_term" character(1);]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 6: [ALTER TABLE billing_discount ALTER column use_term set default 'n';]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 7: [ALTER TABLE billing_discount add constraint use_term_cons check (use_term in ('y','n'));]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 8: [alter table table1 rename column id to col1;]
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK
>> 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 9: []
>> 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_EMPTY_QUERY
>> 2007-09-27 20:21:16 UTC DEBUG2 slon: child terminated status: 65280; pid: 24126, current worker pid: 24126
>> 2007-09-27 20:21:16 UTC DEBUG1 slon: restart of worker in 10 seconds
>> 2007-09-27 20:21:20 UTC DEBUG1 slon: shutdown requested
>> 2007-09-27 20:21:20 UTC DEBUG2 slon: notify worker process to shutdown
>> 2007-09-27 20:21:20 UTC DEBUG1 slon: done
>> 2007-09-27 20:21:20 UTC DEBUG2 slon: exit(0)
>>
>> So, for instance, statement 7 was going to alter table
>> billing_discount to add a CHECK constraint; as we can see below,
>> that constraint isn't there.
>>
>> slonyregress2@[local]:5834=# \d billing_discount
>>                          Table "public.billing_discount"
>>        Column        |           Type           | Modifiers
>> ---------------------+--------------------------+--------------------------------------------------------------------
>>  discount_code       | character(2)             | not null
>>  billing_object_type | character varying(10)    | not null
>>  billing_action_type | character varying(10)    | not null
>>  discount_amount     | numeric(7,2)             | not null
>>  start_date          | timestamp with time zone | not null
>>  end_date            | timestamp with time zone | not null
>>  billing_discount_id | integer                  | not null default nextval(('billing_discount_seq'::text)::regclass)
>>  registrar_id        | integer                  |
>>  tld_id              | integer                  |
>>  zone_id             | integer                  |
>> Indexes:
>>     "billing_discount_pkey" PRIMARY KEY, btree (billing_discount_id)
>> Triggers:
>>     _slony_regress1_denyaccess_5 BEFORE INSERT OR DELETE OR UPDATE ON billing_discount FOR EACH ROW EXECUTE PROCEDURE _slony_regress1.denyaccess('_slony_regress1')
>>
>> slonyregress2@[local]:5834=#
>>
>> This heads me back to the "wait, that's impossible!" expectation that
>> I started with.
>>
>> When the ERROR message was written to the slon log, that means that
>> the transaction rolled back, and everything recovered to where it was.
>>
>> That's what I see happen here; I can't quite fathom why that's not
>> happening there.
>
> I guess it's fun to be unique. :-)
>
> And your introduction of the failure modes was on the slave and not
> the master, correct? Were you testing with 1.2.11 and not 1.2.10?
> Could I possibly be misinterpreting the log messages?
>
> BTW, thanks for looking into this.

I personally get these conditions with Slony 1.2.10, PG 8.2.3 or 8.2.4, and a pgpool for web clients to connect to the slave. Slony does not connect through pgpool, and pgpool is not installed on the master.

Regards,
SAS
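For readers trying to reproduce the scenario, a minimal slonik sketch of the kind of EXECUTE SCRIPT invocation under discussion; the cluster name, conninfo strings, and file path here are hypothetical stand-ins, not taken from the report above:

```
# hypothetical cluster/node names and paths -- adjust for your setup
cluster name = slony_regress1;
node 1 admin conninfo = 'dbname=slonyregress1';
node 2 admin conninfo = 'dbname=slonyregress2';

# runs the DDL file in a single transaction on the origin and,
# via DDL_SCRIPT events, on each subscriber
execute script (
    set id = 1,
    filename = '/tmp/ddl_change.sql',
    event node = 1
);
```

If the subscriber-side transaction applying the script fails (deadlock or otherwise), it should roll back as a unit, which is why the partially-altered state described in this thread is surprising.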
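One way to check whether the constraint actually made it onto a subscriber, beyond eyeballing \d output, is to query the system catalog directly (table and constraint names taken from the log above; this is a plain PostgreSQL catalog query, nothing Slony-specific):

```sql
-- Returns a row only if the use_term_cons CHECK constraint
-- exists on the subscriber's copy of billing_discount.
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'public.billing_discount'::regclass
  AND conname = 'use_term_cons';
```

An empty result on the subscriber, while the origin returns the constraint, would confirm the inconsistent state described here.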