bugzilla-daemon at main.slony.info
Wed Jan 30 06:55:38 PST 2013
http://www.slony.info/bugzilla/show_bug.cgi?id=285

--- Comment #1 from Steve Singer <ssinger at ca.afilias.info> 2013-01-30 06:55:39 PST ---
I was able to reproduce the problem by running the MoveSet test for a while,
and observed the cluster in this state:

A MOVE SET from 1=>2 had just completed

test3=# select * FROM _disorder_replica.sl_setsync ;
 ssy_setid | ssy_origin | ssy_seqno  |    ssy_snapshot    | ssy_action_list 
-----------+------------+------------+--------------------+-----------------
         1 |          2 | 5000000878 | 20106069:20106069: | 
(1 row)


test1=# select * FROM _disorder_replica.sl_setsync ;
 ssy_setid | ssy_origin | ssy_seqno  |    ssy_snapshot    | ssy_action_list 
-----------+------------+------------+--------------------+-----------------
         1 |          2 | 5000000740 | 20106781:20106781: | 
(1 row)

(Nodes 4,5 showed a similar sl_setsync as test1).


test2=# select ev_seqno,ev_type,ev_origin from _disorder_replica.sl_event where
ev_origin=2 order by ev_seqno desc limit 4;
  ev_seqno  | ev_type | ev_origin 
------------+---------+-----------
 5000000740 | SYNC    |         2
 5000000739 | SYNC    |         2
 5000000738 | SYNC    |         2
 5000000737 | SYNC    |         2
(4 rows)


Node 3 has an entry in sl_setsync (origin 2, seqno 5000000878) for a SYNC that
has not yet happened: the latest SYNC from node 2 is 5000000740.

However, the last SYNC from node 1 was:

test1=# select ev_seqno,ev_type,ev_origin from _disorder_replica.sl_event where
ev_origin=1 order by ev_seqno desc limit 4;
  ev_seqno  | ev_type | ev_origin 
------------+---------+-----------
 5000000884 | SYNC    |         1
 5000000883 | SYNC    |         1
 5000000882 | SYNC    |         1
 5000000881 | SYNC    |         1
(4 rows)


What I think is happening is this:

slon 3 - remoteWorkerThread_1 processes the MOVE SET 1->2
slon 3 - remoteWorkerThread_1 STARTS processing SYNC 884
slon 3 - remoteWorkerThread_2 processes the ACCEPT_SET; this changes sl_setsync
slon 3 - remoteWorkerThread_1 reaches the UPDATE of sl_setsync at the bottom of
sync_event().

The UPDATE to sl_setsync does not have ssy_origin as part of the WHERE clause.
Because we are running in READ COMMITTED mode, the DELETE+INSERT of the row in
sl_setsync becomes visible to the UPDATE in sync_event(), even though that
sync_event() started before the ACCEPT_SET was processed.
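
The interleaving above can be replayed in a small, single-process sketch. This
is only a model, not Slony code: it uses Python's sqlite3 module instead of
PostgreSQL, a cut-down sl_setsync table, and a hypothetical helper named
replay(). The statements run one after another to mimic each statement under
READ COMMITTED seeing the latest committed row. The UPDATE text is a simplified
stand-in for the one in sync_event(), the starting seqno 5000000877 is made up,
and the other seqno values are chosen to match the sl_setsync rows shown above.
Filtering on ssy_origin is shown as one possible guard, not as the fix that
was adopted.

```python
import sqlite3


def replay(filter_on_origin: bool):
    """Replay the suspected interleaving on node 3; return the final row."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sl_setsync ("
        " ssy_setid INTEGER PRIMARY KEY,"
        " ssy_origin INTEGER,"
        " ssy_seqno INTEGER)"
    )
    # Assumed state before the MOVE SET: set 1 still originates on node 1.
    db.execute("INSERT INTO sl_setsync VALUES (1, 1, 5000000877)")
    db.commit()

    # remoteWorkerThread_1 starts applying a SYNC from node 1 and remembers
    # which origin and seqno it is working on.
    sync_origin, sync_seqno = 1, 5000000878

    # remoteWorkerThread_2 processes ACCEPT_SET: the row is replaced by one
    # that names node 2 as the origin.
    db.execute("DELETE FROM sl_setsync WHERE ssy_setid = 1")
    db.execute("INSERT INTO sl_setsync VALUES (1, 2, 5000000740)")
    db.commit()

    # remoteWorkerThread_1 reaches the UPDATE at the end of its SYNC.
    sql = "UPDATE sl_setsync SET ssy_seqno = ? WHERE ssy_setid = 1"
    params = [sync_seqno]
    if filter_on_origin:
        # Possible guard: only touch the row if it still belongs to the
        # origin whose SYNC is being applied.
        sql += " AND ssy_origin = ?"
        params.append(sync_origin)
    db.execute(sql, params)
    db.commit()

    row = db.execute(
        "SELECT ssy_setid, ssy_origin, ssy_seqno FROM sl_setsync"
    ).fetchone()
    db.close()
    return row


unguarded = replay(filter_on_origin=False)
guarded = replay(filter_on_origin=True)
print("without ssy_origin filter:", unguarded)
print("with ssy_origin filter:   ", guarded)
```

Without the filter, the model ends with origin 2 paired with a seqno taken from
node 1's event stream, which is the shape of the row seen on test3. With the
filter, the UPDATE matches no row and the ACCEPT_SET row is left intact.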
