[Slony1-general] Child terminated after queue events

Tue Aug 4 16:40:34 PDT 2009

Hi list!

I've been testing slony in the few past days, with some successful.

I'm using slony1-1.2.16 and two postgres databases (8.3 and 8.2),
replicating two databases. Both of them are on the same machine, but one
listens on 127.0.0.1 and the other at 127.0.0.2, and using different
ports for each one (5432 and 5433).

One database is really big (about 40G), and the other one is just a
hundred of megabytes.

I run an instance of slony for each database, and for each cluster to
replicate. A total of 4 instances of slony. Here is an example of a pair
of them:

# slon -d 1000 big_cluster dbname=big1 user=postgres host=127.0.0.1
port=5432

# slon -d 1000 big_cluster dbname=big1 user=postgres host=127.0.0.2
port=5433

The problem I see, is that at the smaller database, viewing the logs of
the connection to the database at 127.0.0.1 is this:

2009-08-04 18:30:02 CDT DEBUG2 remoteWorkerThread_1: current local
log_status is 0
2009-08-04 18:30:02 CDT DEBUG3 remoteWorkerThread_1: activate helper 1
2009-08-04 18:30:02 CDT DEBUG4 remoteWorkerThread_1: waiting for log
data
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: got work to do
2009-08-04 18:30:02 CDT DEBUG2 remoteWorkerThread_1_1: current remote
log_status = 0
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: allocate line
buffers
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: fetch from cursor
2009-08-04 18:30:02 CDT DEBUG2 remoteHelperThread_1_1: 0.001 seconds
delay for first row
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: fetched 0 log
rows
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: return 10 unused
line buffers
2009-08-04 18:30:02 CDT DEBUG2 remoteHelperThread_1_1: 0.001 seconds
until close cursor
2009-08-04 18:30:02 CDT DEBUG2 remoteHelperThread_1_1: inserts=0
updates=0 deletes=0
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: change helper
thread status
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: send DONE/ERROR
line to worker
2009-08-04 18:30:02 CDT DEBUG3 remoteHelperThread_1_1: waiting for
workgroup to finish
2009-08-04 18:30:02 CDT DEBUG3 remoteWorkerThread_1: helper 1 finished
2009-08-04 18:30:02 CDT DEBUG4 remoteWorkerThread_1: returning lines to
pool
2009-08-04 18:30:02 CDT DEBUG3 remoteWorkerThread_1: all helpers done.
2009-08-04 18:30:02 CDT DEBUG4 remoteWorkerThread_1: changing helper 1
to IDLE
2009-08-04 18:30:02 CDT DEBUG4 remoteWorkerThread_1: cleanup
2009-08-04 18:30:02 CDT DEBUG4 remoteHelperThread_1_1: waiting for work
2009-08-04 18:30:02 CDT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq
value: 1000000000000000

Which means that it's already replicated and no more work to do has
arrived.

But when I see the output of slon replicating the big database at
127.0.0.1, this is the result:

2009-08-04 18:32:21 CDT DEBUG2 remoteListenThread_1: queue event 1,3330
SYNC
2009-08-04 18:32:21 CDT DEBUG2 remoteListenThread_1: queue event 1,3331
SYNC
2009-08-04 18:32:21 CDT DEBUG2 remoteListenThread_1: queue event 1,3332
SYNC
2009-08-04 18:32:21 CDT DEBUG2 remoteListenThread_1: queue event 1,3333
SYNC
2009-08-04 18:32:21 CDT DEBUG2 remoteWorkerThread_1: syncing set 1 with
1 table(s) from provider 1
2009-08-04 18:32:21 CDT DEBUG2 slon: child terminated status: 11; pid:
24351, current worker pid: 24351
2009-08-04 18:32:21 CDT DEBUG1 slon: restart of worker in 10 seconds

The queue event is always the last seen at the same cluster connection
to 127.0.0.2, so it never 'cuts' on the same event, it's last_event+1.

Can you give me same clue or light to this error? is it common?

Thanks!

Cheers

Martin