[Slony1-general] Issue when adding node to replication

Thu Sep 27 12:58:00 PDT 2012

Follow up:

I executed this on the master:
mydatabase=# select * from _slony.sl_event where ev_origin not in 
(select no_id from _slony.sl_node);
  ev_origin |  ev_seqno  |         ev_timestamp          |    
ev_snapshot     | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | 
ev_data5 | ev_data6 | ev_data7 | ev_data8
-----------+------------+-------------------------------+--------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
          3 | 5000290161 | 2012-09-27 09:48:03.749424-04 | 
40580084:40580084: | SYNC    |          |          |          |          
|          |          |          |
(1 row)

There is a row in sl_event that shouldn't be there, because it's 
referencing a node that nolonger exists. I need to add this node back to 
replication, but I don't want to run into the same issue as before. I 
ran a cleanupEvent('10 minute') and it did nothing (even did it with 0 
minutes).

Will this row eventually go away? will it cause issue if we attempt to 
add a new node to replication with node = 3? How can I safely clean this up?

thanks,
- Brian F

On 09/27/2012 01:28 PM, Brian Fehrle wrote:
> On 09/27/2012 01:26 PM, Jan Wieck wrote:
>> On 9/27/2012 2:34 PM, Brian Fehrle wrote:
>>> Hi all,
>>>
>>> PostgreSQL v 9.1.5 - 9.1.6
>>> Slony version 2.1.0
>>>
>>> I'm having an issue that's occurred twice now. I have 4 node slony
>>> cluster, and one of the operations is to drop a node from replication,
>>> do maintenance on it, then add it back to replication.
>>>
>>> Node 1 = master
>>> Node 2 = slave
>>> Node 3 = slave  ->  dropped then readded
>>> Node 4 = slave
>> First, why is the node actually dropped and readded so fast, instead
>> of just doing the maintenance while it falls behind, then let it catch
>> up?
>>
> We have several cases where it makes sense, such as re-installing the OS
> or in todays case, we replaced the physical machine with a new one.
>
>> You apparently have a full blown path network from everyone to
>> everyone. This is not good under normal circumstances since the
>> automatic listen generation will cause every node to listen on every
>> other node for events, from non-origins. Way too many useless database
>> connections.
>    From my understanding, without this set-up, all events must then be
> passed through the master node to relay it. So master node = 1, slave =
> 2 and 3, 3 must communicate with 2, and without direct access it will
> relay through the master. Is this understanding wrong?
>
>> What seems to happen here are some race conditions. The node is
>> dropped and when it is added back again, some third node still didn't
>> process the DROP NODE and when node 4 looks for events from node 3, it
>> finds old ones somewhere else (like on 1 or 2). When node 3 then comes
>> around to use those event IDs again, you get the dupkey error.
>>
>> What you could do if you really need to drop/readd it, use an explicit
>> WAIT FOR EVENT for the DROP NODE to make sure all traces of that node
>> are gone from the whole cluster.
>>
> Ok, I'll look into implementing that. Another thought was to issue a
> cleanupEvent() on each of the nodes still attached to replication after
> I do the dump.
>
> Thanks
> - Brian F
>> Jan
>>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.slony.info/pipermail/slony1-general/attachments/20120927/100d8e0c/attachment.htm