Wed Sep 26 08:25:28 PDT 2007
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
For extra background on this bug, see the thread here: http://lists.slony.info/pipermail/slony1-general/2007-September/006687.html This is on Slony1-1.2.10, PostgreSQL-8.1.9: I've run into a situation on a client cluster composed of a master and 2 slave nodes where a deadlock on one of the slaves happens quite regularly during any EXECUTE SCRIT commands. It seems if slony loses the deadlock, some of the tables are left in a not altered for replication state and this breaks replication. This is the latest SQL that caused the problem (note there is not a COMMIT in the sql): -------- CREATE TABLE orders.amazon_items ( id serial NOT NULL, order_id integer NOT NULL, item_id integer NOT NULL, amazon_item_id character varying(14) NOT NULL, CONSTRAINT amazon_items_pkey PRIMARY KEY (id), CONSTRAINT amazon_items_order_id_fkey FOREIGN KEY (order_id) REFERENCES orders.orders (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT amazon_items_item_id_fkey FOREIGN KEY (item_id) REFERENCES orders.items (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH OIDS; ALTER TABLE orders.amazon_items OWNER TO thenerds; -------- It was called by the following slonik script: -------- #!/usr/bin/slonik include </nerds/preamble.slonik>; EXECUTE SCRIPT ( SET ID = 1, FILENAME = '/nerds/thenerds.sql', EVENT NODE = 1 ); -------- and caused the following deadlock to occur: 15:27:54 sql1 slon[12252]: [39-1] 2007-09-18 15:27:54 EDT ERROR remoteWorkerThread_1: "select "_nerdcluster".ddlScript _complete_int(1, -1); " PGRES_FATAL_ERROR Sep 18 15:27:54 sql1 slon[12252]: [39-2] ERROR: deadlock detected Sep 18 15:27:54 sql1 slon[12252]: [39-3] DETAIL: Process 12263 waits for AccessExclusiveLock on relation 121589880 of databas e 121589046; blocked by process 12096. Sep 18 15:27:54 sql1 slon[12252]: [39-4] Process 12096 waits for AccessShareLock on relation 121589817 of database 121589046; blocked by process 12263. Which then left some of the tables on that slave in a bad state breaking replication: 2007-09-18 15:56:06 EDT ERROR remoteWorkerThread_1: "select "_nerdcluster".ddlScript_prepare_int(1, -1); " PGRES_FATAL_ERROR ERROR: Slony-I: alterTableRestore(): Table "public"."carts" is not in altered state CONTEXT: SQL statement "SELECT "_nerdcluster".alterTableRestore( $1 )" PL/pgSQL function "ddlscript_prepare_int" line 46 at perform Note that it's just an AccessShareLock that's killing us. Looks like that's caused by a select query which does searches. Our application does not produce any extraneous locking, it simply does SELECTS on that server. Interestingly, before we started using the slave for queries, the deadlocks would happen on the master when doing DDL changes, but this never caused the tables on the master to get into a bad state. You could just re-run your EXECUTE SCRIPT and it would usually work fine the second time. -- Jeff Frost, Owner <jeff at frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-bugs mailing list