Jeff Frost jeff at frostconsultingllc.com
Tue Jun 2 13:17:19 PDT 2009
Jeff Frost wrote:
> Andrew Sullivan wrote:
>> On Tue, Jun 02, 2009 at 10:30:45AM -0500, Sean Staats wrote:
>>
>>> I created a new replication cluster.  It turns out that starting the
>>> table IDs at id=1 and the sequence IDs at id=1001 didn't make any
>>> difference as slony gave me the same error (sequence ID 1001 has already
>>> been assigned.)  Increasing the log verbosity to 4 doesn't produce any
>>> more useful debugging information.  Time for another approach.
>>>
>>> Would it make sense to create 2 different sets - one to replicate the
>>> tables and one to replicate the sequences?  Is there a downside to this
>>> kind of workaround?
>>
>> It'd be better to figure out what the duplication is caused by.  Have
>> a look in the _slony tables and check to see what's in there.  Where's
>> the collision?
>>
>>
> I've seen this issue recently when the initial sync fails.  If you
> scroll further back in your logs, do you have a failure for the initial
> copy_set?  When this happens to me, it seems that slony leaves the
> slave DB in a half-replicated state, then reattempts the initial sync,
> finds that the sequences are already in the _cluster.sl_sequence
> table, and errors out.  This requires dropping the node and starting
> over.  This is with version 1.2.16.  I recall previous versions being
> able to recover from a failed initial sync without intervention, but
> my memory could be mistaken.
In fact, here's how it looks in my logs:

Jun  2 13:09:36 localhost slon[1867]: [274-1] 2009-06-02 13:09:36 PDT
ERROR  remoteWorkerThread_1: "select
"_engage_cluster".tableHasSerialKey('"archive"."invitation"');"
Jun  2 13:09:36 localhost slon[1867]: [274-2]  could not receive data
from server: Connection timed out
Jun  2 13:09:36 localhost slon[1867]: [275-1] 2009-06-02 13:09:36 PDT
WARN   remoteWorkerThread_1: data copy for set 1 failed - sleep 30 seconds
Jun  2 13:09:36 localhost postgres[1880]: [26-1] NOTICE:  there is no
transaction in progress
Jun  2 13:10:06 localhost slon[1867]: [276-1] 2009-06-02 13:10:06 PDT
DEBUG1 copy_set 1
Jun  2 13:10:06 localhost slon[1867]: [277-1] 2009-06-02 13:10:06 PDT
DEBUG1 remoteWorkerThread_1: connected to provider DB
Jun  2 13:10:09 localhost slon[1867]: [278-1] 2009-06-02 13:10:09 PDT
ERROR  remoteWorkerThread_1: "select
"_engage_cluster".setAddSequence_int(1, 4,
Jun  2 13:10:09 localhost slon[1867]: [278-2]
'"public"."tracking_sequence"', 'public.tracking_sequence sequence')"
PGRES_FATAL_ERROR ERROR:  Slony-I: setAddSequence_int():
Jun  2 13:10:09 localhost slon[1867]: [278-3]  sequence ID 4 has already
been assigned
Jun  2 13:10:09 localhost slon[1867]: [279-1] 2009-06-02 13:10:09 PDT
WARN   remoteWorkerThread_1: data copy for set 1 failed - sleep 60 seconds

The DB in question is 144GB and it's being replicated over a relatively
slow link.  It seems to do about 1GB/hr, but never gets past 10GB.  It
always dies at that same point.

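If it helps anyone hitting the same thing, here's roughly what I run on
the slave to see what the failed copy_set left behind (cluster name
taken from the logs above; a leftover row here is what makes the retry
fail with "sequence ID ... has already been assigned"):

```sql
-- List the sequences already registered on this node.  If rows survive
-- from a failed copy_set, the next setAddSequence_int() for the same
-- ID will error out exactly as in the log above.
SELECT seq_id, seq_comment
  FROM _engage_cluster.sl_sequence
 ORDER BY seq_id;
```

From there the only fix I've found is a slonik `drop node (id = 2,
event node = 1);` (node ids here are just examples, use your own) and
then re-subscribing from scratch.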
--
Jeff Frost, Owner 	<jeff at frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 916-647-6411	FAX: 916-405-4032


More information about the Slony1-general mailing list