Thu Jun 14 08:25:05 PDT 2007
- Previous message: [Slony1-general] Readying 1.2.10
- Next message: [Slony1-general] Readying 1.2.10
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Steven Singer <ssinger_pg at sympatico.ca> writes: > On Fri, 8 Jun 2007, Christopher Browne wrote: > > I've been trying 1.2.10 from CVS against 8.2 on a Linux PowerPC(mac) and > > I've been occasional failures of the testddl unit test with > > <stdin>:5: sleep a couple seconds > <stdin>:9: timeout exceeded while waiting for event confirmation > > Which appears to be from a:wait for event (origin=1, confirmed=2); > > Most of the time testddl passes fine but it failed at least once when > I tried running it. I am still trying to reproduce the failure. I'm not entirely sure why this happens; it has occasionally happened to me too. In the slon logs, the slon for the subscriber node has fallen into the following "pit": When node #2 tries to process the subscription... 2007-06-14 15:06:21 UTC DEBUG2 remoteWorkerThread_1: Received event 1,17 ENABLE_SUBSCRIPTION 2007-06-14 15:06:21 UTC WARN remoteWorkerThread_1: copy set: data provider 1 only on sync -1 - sleep 5 seconds This is normally the result of node 2's slon not properly listening to node #1. If you restart the slon for node #2 (e.g. - kill the slon, run it again, by hand), it picks itself up out of the "pit" and subscribes and runs fine. I figure there must be some race condition surrounding the events that is taking place here... I just ran the DDL test three times in a row; twice, it failed with the above problem; the third time it worked. I grabbed the logs from the three iterations, and chopped them off immediately after the ENABLE_SUBSCRIPTION message, which is clearly the point at which the behaviour is diverging. I then chopped off timestamps to make them look temporally similar, and did a sort + diff to see what the differences look like. Here's one of the diffs; if we can identify what it is that isn't being processed properly in the case where it failed (which will involve the entries marked ">"), then maybe there's a bug to be stomped :-). I'm seeing one oddity, that in the "succeeding" slon, the events are associated with remoteWorkerThread_1, whereas on the other, they're on remoteListenThread_3. slony-regress.ks8357/slon_log.2.sorted-diff :::::::::::::: 20d19 < DEBUG1 copy_set 1 39,41c38 < DEBUG2 remoteListenThread_1: queue event 1,16 SUBSCRIBE_SET < DEBUG2 remoteListenThread_1: queue event 1,17 ENABLE_SUBSCRIPTION < DEBUG2 remoteListenThread_1: queue event 1,18 SYNC --- > DEBUG2 remoteListenThread_1: queue event 1,19 SYNC 44,46d40 < DEBUG2 remoteListenThread_1: queue event 3,1 STORE_PATH < DEBUG2 remoteListenThread_1: queue event 3,2 STORE_PATH < DEBUG2 remoteListenThread_1: queue event 3,3 SYNC 48a43,48 > DEBUG2 remoteListenThread_3: queue event 1,16 SUBSCRIBE_SET > DEBUG2 remoteListenThread_3: queue event 1,17 ENABLE_SUBSCRIPTION > DEBUG2 remoteListenThread_3: queue event 1,18 SYNC > DEBUG2 remoteListenThread_3: queue event 3,1 STORE_PATH > DEBUG2 remoteListenThread_3: queue event 3,2 STORE_PATH > DEBUG2 remoteListenThread_3: queue event 3,3 SYNC 50a51 > DEBUG2 remoteListenThread_3: stop listening for event origin 1 63,64d63 < DEBUG2 remoteWorkerThread_1: forward confirm 1,9 received by 3 < DEBUG2 remoteWorkerThread_1: forward confirm 2,0 received by 3 68d66 < DEBUG2 remoteWorkerThread_1: forward confirm 3,3 received by 1 74a73,76 > DEBUG2 remoteWorkerThread_3: forward confirm 1,18 received by 3 > DEBUG2 remoteWorkerThread_3: forward confirm 1,9 received by 3 > DEBUG2 remoteWorkerThread_3: forward confirm 2,0 received by 1 > DEBUG2 remoteWorkerThread_3: forward confirm 2,0 received by 3 75a78,79 > DEBUG2 remoteWorkerThread_3: forward confirm 3,0 received by 1 > DEBUG2 remoteWorkerThread_3: forward confirm 3,3 received by 1 82,83c86,87 < DEBUG2 slon: watchdog ready - pid = 8471 < DEBUG2 slon: worker process created - pid = 8473 --- > DEBUG2 slon: watchdog ready - pid = 8279 > DEBUG2 slon: worker process created - pid = 8280 84a89 > WARN remoteWorkerThread_1: copy set: data provider 1 only on sync -1 - sleep 5 seconds Jan just IM'ed me that he just committed a change that evidently affects this, so this may already be fixed :-). -- output = reverse("ofni.sailifa.ac" "@" "enworbbc") <http://dba2.int.libertyrms.com/> Christopher Browne (416) 673-4124 (land)
- Previous message: [Slony1-general] Readying 1.2.10
- Next message: [Slony1-general] Readying 1.2.10
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list