Tue Jun 2 13:49:26 PDT 2009
- Previous message: [Slony1-general] Initial replication of sequences is failing
- Next message: [Slony1-general] Initial replication of sequences is failing
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks for the idea, Jeff. I inspected the log more closely and found what I think is the _real_ problem (unexpected chunk number)...

2009-05-28 23:49:31 CDT DEBUG2 remoteWorkerThread_1: Begin COPY of table "finch"."blastres_qry"
2009-05-28 23:49:31 CDT DEBUG2 remoteWorkerThread_1: nodeon73 is 0
NOTICE: truncate of "finch"."blastres_qry" failed - doing delete
2009-05-28 23:49:31 CDT DEBUG2 remoteListenThread_1: LISTEN
2009-05-28 23:49:33 CDT DEBUG2 remoteListenThread_1: queue event 1,4141 SYNC
2009-05-28 23:49:33 CDT DEBUG2 remoteListenThread_1: UNLISTEN
2009-05-28 23:49:34 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 2412
2009-05-28 23:49:38 CDT DEBUG2 localListenThread: Received event 2,2412 SYNC
2009-05-28 23:49:44 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 2413
2009-05-28 23:49:44 CDT DEBUG2 localListenThread: Received event 2,2413 SYNC
2009-05-28 23:49:45 CDT DEBUG2 remoteListenThread_1: queue event 1,4142 SYNC
2009-05-28 23:49:53 CDT DEBUG2 remoteListenThread_1: LISTEN
2009-05-28 23:49:54 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 2414
2009-05-28 23:49:56 CDT DEBUG2 localListenThread: Received event 2,2414 SYNC
2009-05-28 23:50:00 CDT ERROR remoteWorkerThread_1: copy to stdout on provider - PGRES_FATAL_ERROR ERROR: unexpected chunk number 8618 (expected 426) for toast value 34675334
2009-05-28 23:50:00 CDT WARN remoteWorkerThread_1: data copy for set 1 failed - sleep 30 seconds
2009-05-28 23:50:00 CDT DEBUG2 remoteListenThread_1: queue event 1,4143 SYNC
2009-05-28 23:50:00 CDT DEBUG2 remoteListenThread_1: UNLISTEN
WARNING: there is no transaction in progress

Any ideas on what could cause such an error?

Sean

Jeff Frost wrote:
> Jeff Frost wrote:
>> Andrew Sullivan wrote:
>>> On Tue, Jun 02, 2009 at 10:30:45AM -0500, Sean Staats wrote:
>>>
>>>> I created a new replication cluster.
>>>> It turns out that starting the
>>>> table IDs at id=1 and the sequence IDs at id=1001 didn't make any
>>>> difference as slony gave me the same error (sequence ID 1001 has
>>>> already been assigned.) Increasing the log verbosity to 4 doesn't
>>>> produce any more useful debugging information. Time for another
>>>> approach.
>>>>
>>>> Would it make sense to create 2 different sets - one to replicate the
>>>> tables and one to replicate the sequences? Is there a downside to
>>>> this kind of workaround?
>>>>
>>>
>>> It'd be better to figure out what the duplication is caused by. Have
>>> a look in the _slony tables and check to see what's in there. Where's
>>> the collision?
>>>
>> I've seen this issue recently when the initial sync fails. If you
>> scroll further back in your logs, do you have a failure for the
>> initial copy_set? When this happens to me, it seems that slony
>> leaves the slave DB in a half-replicated state, but reattempts the
>> initial sync, finds that the sequences are already in the
>> _cluster.sl_sequence table, then errors out. This requires dropping
>> the node and starting over. This is with version 1.2.16. I recall
>> previous versions being able to recover from a failed initial sync
>> without intervention, but my memory could be mistaken.
> In fact, here's how it looks in my logs:
>
> Jun 2 13:09:36 localhost slon[1867]: [274-1] 2009-06-02 13:09:36 PDT
> ERROR remoteWorkerThread_1: "select
> "_engage_cluster".tableHasSerialKey('"archive"."invitation"');"
> Jun 2 13:09:36 localhost slon[1867]: [274-2] could not receive data
> from server: Connection timed out
> Jun 2 13:09:36 localhost slon[1867]: [275-1] 2009-06-02 13:09:36 PDT
> WARN remoteWorkerThread_1: data copy for set 1 failed - sleep 30 seconds
> Jun 2 13:09:36 localhost postgres[1880]: [26-1] NOTICE: there is no
> transaction in progress
> Jun 2 13:10:06 localhost slon[1867]: [276-1] 2009-06-02 13:10:06 PDT
> DEBUG1 copy_set 1
> Jun 2 13:10:06 localhost slon[1867]: [277-1] 2009-06-02 13:10:06 PDT
> DEBUG1 remoteWorkerThread_1: connected to provider DB
> Jun 2 13:10:09 localhost slon[1867]: [278-1] 2009-06-02 13:10:09 PDT
> ERROR remoteWorkerThread_1: "select
> "_engage_cluster".setAddSequence_int(1, 4,
> Jun 2 13:10:09 localhost slon[1867]: [278-2]
> '"public"."tracking_sequence"', 'public.tracking_sequence sequence')"
> PGRES_FATAL_ERROR ERROR: Slony-I: setAddSequence_int():
> Jun 2 13:10:09 localhost slon[1867]: [278-3] sequence ID 4 has
> already been assigned
> Jun 2 13:10:09 localhost slon[1867]: [279-1] 2009-06-02 13:10:09 PDT
> WARN remoteWorkerThread_1: data copy for set 1 failed - sleep 60 seconds
>
> The DB in question is 144GB and it's being replicated over a
> relatively slow link. It seems to do about 1GB/hr, but never gets
> past 10GB. It always dies at that same point.
> --
> Jeff Frost, Owner <jeff at frostconsultingllc.com>
> Frost Consulting, LLC http://www.frostconsultingllc.com/
> Phone: 916-647-6411 FAX: 916-405-4032
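[A note for anyone hitting the same thing: the "unexpected chunk number N (expected M) for toast value X" message is raised by PostgreSQL itself, not Slony, and usually points at corruption in the provider table's TOAST storage, so the COPY will fail at the same spot every time. One rough way to locate the damaged tuple is to read the table row by row, since selecting a row forces its TOASTed values through detoasting. A sketch only, assuming finch.blastres_qry has an integer primary key named "id" (a guess; substitute your real key) and that the PG* environment variables point at the provider database:]

```shell
# Scan finch.blastres_qry one row at a time; the first id that errors
# marks the tuple with the broken TOAST chunks.
# Assumes: an integer primary key "id" (hypothetical) and connection
# settings in PGHOST/PGDATABASE/PGUSER etc.
for id in $(psql -Atc 'SELECT id FROM finch.blastres_qry ORDER BY id'); do
    psql -Atc "SELECT * FROM finch.blastres_qry WHERE id = $id" \
        >/dev/null 2>&1 || echo "detoast failed at id=$id"
done
```

[Once the bad row is isolated, the usual options are to restore that row from a backup or delete it; corruption like this is often a symptom of bad hardware or memory, so checking the provider box is worthwhile too.]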