<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Follow up:<br>

    <br>

    I executed this on the master:<br>

    <pre wrap="">mydatabase=# select * from _slony.sl_event where ev_origin not in (select no_id from _slony.sl_node);
 ev_origin |  ev_seqno  |         ev_timestamp          |    ev_snapshot     | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
-----------+------------+-------------------------------+--------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
         3 | 5000290161 | 2012-09-27 09:48:03.749424-04 | 40580084:40580084: | SYNC    |          |          |          |          |          |          |          |
(1 row)
</pre>

    <br>

    There is a row in sl_event that shouldn't be there, because it

    references a node that no longer exists. I need to add this node

    back to replication, but I don't want to run into the same issue as

    before. I ran cleanupEvent('10 minute') and it did nothing (I also

    tried it with 0 minutes).<br>

    <br>

    Will this row eventually go away? Will it cause issues if we attempt

    to add a new node to replication with node ID 3? How can I safely

    clean this up?<br>
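
    <br>

    Before re-adding node 3, a read-only check along these lines could be
    run on each remaining node (1, 2 and 4) to see whether anything still
    references node 3. This is only a sketch: it assumes the cluster schema
    is _slony, as in the query above, and uses the Slony-I catalog tables
    sl_event, sl_confirm, sl_path and sl_listen; the column names should be
    checked against the installed 2.1 schema before relying on it.<br>

    <pre wrap="">-- Read-only diagnostics; nothing here modifies the cluster.
-- Assumes cluster schema _slony and that node 3 is the dropped node.

-- Events whose origin is no longer a registered node
SELECT ev_origin, count(*) AS events,
       min(ev_seqno) AS min_seqno, max(ev_seqno) AS max_seqno
  FROM _slony.sl_event
 WHERE ev_origin NOT IN (SELECT no_id FROM _slony.sl_node)
 GROUP BY ev_origin;

-- Confirmations sent to or from node 3
SELECT con_origin, con_received, max(con_seqno) AS max_seqno
  FROM _slony.sl_confirm
 WHERE con_origin = 3 OR con_received = 3
 GROUP BY con_origin, con_received;

-- Paths and listen entries that still involve node 3
SELECT * FROM _slony.sl_path
 WHERE pa_server = 3 OR pa_client = 3;

SELECT * FROM _slony.sl_listen
 WHERE li_origin = 3 OR li_provider = 3 OR li_receiver = 3;
</pre>

    Empty results from all four queries on every remaining node would
    suggest that no leftover node 3 state is in place to collide with a
    re-added node using the same ID; a non-empty result shows where it is.<br>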

    <br>

    thanks,<br>

    - Brian F<br>

    <br>

    On 09/27/2012 01:28 PM, Brian Fehrle wrote:

    <blockquote cite="mid:5064A8F0.9080801@consistentstate.com"

      type="cite">

      <pre wrap="">On 09/27/2012 01:26 PM, Jan Wieck wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">On 9/27/2012 2:34 PM, Brian Fehrle wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">Hi all,

PostgreSQL v 9.1.5 - 9.1.6

Slony version 2.1.0

I'm having an issue that's occurred twice now. I have a 4-node Slony

cluster, and one of the operations is to drop a node from replication,

do maintenance on it, then add it back to replication.

Node 1 = master

Node 2 = slave

Node 3 = slave  -&gt; dropped then readded

Node 4 = slave

</pre>

        </blockquote>

        <pre wrap="">

First, why is the node actually dropped and readded so fast, instead 

of just doing the maintenance while it falls behind and then letting it 

catch up?

</pre>

      </blockquote>

      <pre wrap="">We have several cases where it makes sense, such as re-installing the OS 

or, in today's case, replacing the physical machine with a new one.

</pre>

      <blockquote type="cite">

        <pre wrap="">You apparently have a full-blown path network from everyone to 

everyone. This is not good under normal circumstances, since the 

automatic listen generation will cause every node to listen on every 

other node for events from non-origins. Way too many useless database 

connections.

</pre>

      </blockquote>

      <pre wrap="">My understanding is that without this setup, all events must be 

relayed through the master node. So with master node = 1 and slaves = 

2 and 3, if 3 must communicate with 2 and has no direct path, it will 

relay through the master. Is this understanding wrong?

</pre>

      <blockquote type="cite">

        <pre wrap="">

What seems to be happening here is a race condition. The node is 

dropped, and when it is added back again, some third node still hasn't 

processed the DROP NODE. When node 4 looks for events from node 3, it 

finds old ones somewhere else (like on 1 or 2). When node 3 then comes 

around to use those event IDs again, you get the duplicate key error.

If you really need to drop and re-add it, what you could do is use an 

explicit WAIT FOR EVENT for the DROP NODE to make sure all traces of 

that node are gone from the whole cluster.

</pre>

      </blockquote>

      <pre wrap="">Ok, I'll look into implementing that. Another thought was to issue a 

cleanupEvent() on each of the nodes still attached to replication after 

I do the dump.

Thanks

- Brian F

</pre>

      <blockquote type="cite">

        <pre wrap="">

Jan

</pre>

      </blockquote>

      <pre wrap="">

_______________________________________________

Slony1-general mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Slony1-general@lists.slony.info">Slony1-general@lists.slony.info</a>

<a class="moz-txt-link-freetext" href="http://lists.slony.info/mailman/listinfo/slony1-general">http://lists.slony.info/mailman/listinfo/slony1-general</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>