That is correct. On the error node, the slon for the master node never gets the STORE_PATH event and cannot start the remoteListener and remoteWorker threads. <br><br>Below is the event table on the error node.<br><i>[root@slony-r1s1-001 ~]# psql -U postgres system.db -c &quot;select * from _slony.sl_event where ev_seqno &gt; 5000000078&quot;; <br>
</i><table border="1" cellpadding="2" cellspacing="0">
<tr><th>ev_origin</th><th>ev_seqno</th><th>ev_timestamp</th><th>ev_snapshot</th><th>ev_type</th><th>ev_data1</th><th>ev_data2</th><th>ev_data3</th><th>ev_data4</th><th>ev_data5</th><th>ev_data6</th><th>ev_data7</th><th>ev_data8</th></tr>
<tr><td>1</td><td>5000000079</td><td>2010-10-29 16:17:23.546814</td><td>888:888:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000080</td><td>2010-10-29 09:49:06.107569</td><td>982:982:</td><td>STORE_NODE</td><td>2</td><td>slave</td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000081</td><td>2010-10-29 09:49:06.107569</td><td>982:982:</td><td>ENABLE_NODE</td><td>2</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000082</td><td>2010-10-29 09:49:06.433519</td><td>983:983:</td><td>STORE_PATH</td><td>2</td><td>1</td><td>host=192.168.11.12 dbname=system.db user=postgres port=5432</td><td>10</td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000083</td><td>2010-10-29 09:49:10.311715</td><td>988:988:</td><td>SUBSCRIBE_SET</td><td>1</td><td>1</td><td>2</td><td>t</td><td>f</td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000084</td><td>2010-10-29 09:49:10.311715</td><td>988:988:</td><td>ENABLE_SUBSCRIPTION</td><td>1</td><td>1</td><td>2</td><td>t</td><td>f</td><td></td><td></td><td></td></tr>
</table>
(6 rows)<br><br>The event table on the normal node has similar records, shown below. The only difference is that it also contains a long run of SYNC events that keep being generated. <br><i>[root@140-r1s1-001 ~]# psql -U postgres system.db -c &quot;select * from _slony.sl_event where ev_seqno &gt; 5000000078 order by ev_seqno&quot;;<br>
</i>LOG:  duration: 5.093 ms  statement: select * from _slony.sl_event where ev_seqno &gt; 5000000078 order by ev_seqno<br><table border="1" cellpadding="2" cellspacing="0">
<tr><th>ev_origin</th><th>ev_seqno</th><th>ev_timestamp</th><th>ev_snapshot</th><th>ev_type</th><th>ev_data1</th><th>ev_data2</th><th>ev_data3</th><th>ev_data4</th><th>ev_data5</th><th>ev_data6</th><th>ev_data7</th><th>ev_data8</th></tr>
<tr><td>1</td><td>5000000082</td><td>2010-10-29 13:55:57.312714</td><td>892:892:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000083</td><td>2010-10-29 06:32:52.628858</td><td>997:997:</td><td>STORE_NODE</td><td>2</td><td>slave</td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000084</td><td>2010-10-29 06:32:52.628858</td><td>997:997:</td><td>ENABLE_NODE</td><td>2</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000085</td><td>2010-10-29 06:32:53.07135</td><td>998:998:</td><td>STORE_PATH</td><td>2</td><td>1</td><td>host=192.168.11.12 dbname=system.db user=postgres port=5432</td><td>10</td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000086</td><td>2010-10-29 06:32:57.676823</td><td>1003:1003:</td><td>SUBSCRIBE_SET</td><td>1</td><td>1</td><td>2</td><td>t</td><td>f</td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000087</td><td>2010-10-29 06:32:57.676823</td><td>1003:1003:</td><td>ENABLE_SUBSCRIPTION</td><td>1</td><td>1</td><td>2</td><td>t</td><td>f</td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000088</td><td>2010-10-29 13:58:27.852526</td><td>1085:1085:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000089</td><td>2010-10-29 13:58:37.868268</td><td>1087:1087:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td colspan="13">...................</td></tr>
<tr><td>1</td><td>5000000526</td><td>2010-10-29 15:11:31.902246</td><td>1987:1987:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000527</td><td>2010-10-29 15:11:41.905465</td><td>1989:1989:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000528</td><td>2010-10-29 15:11:51.912475</td><td>1991:1991:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>1</td><td>5000000529</td><td>2010-10-29 15:12:01.913758</td><td>1993:1993:</td><td>SYNC</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
</table>
(448 rows)<br><br><br><div class="gmail_quote">On Fri, Oct 29, 2010 at 10:39 PM, Steve Singer <span dir="ltr">&lt;<a href="mailto:ssinger@ca.afilias.info">ssinger@ca.afilias.info</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im">On 10-10-29 10:24 AM, Jason Chen wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi Steve,<br>
<br>
 &gt;If you turn up the logging level to debug, what does slon report in<br>
the log in cases where it doesn&#39;t work?  It must be logging some stuff<br>
even if it then stops/hangs.<br>
<br>
I have attached the master node logs for the normal configuration and for the<br>
error configuration. Can you take a look and see whether anything looks abnormal?<br>
<br>
 &gt;slon uses the connection settings from the service config to connect<br>
to its &#39;local&#39; database, the one it generates the syncs on.  It sounds like<br>
slon isn&#39;t able to talk to this database.<br>
<br>
Is there any log we can check for that? PostgreSQL itself runs fine on the master<br>
node.<br>
<br>
Please let me know if you need any other information.<br>
<br>
Thanks,<br>
Jason<br>
<br>
</blockquote>
<br></div>
The log from the error case stops after a few minutes. Does slon just stop writing to the file?<br>
<br>
As you can see in the normal log file, the localListener thread sees the STORE_NODE event and then, a bit further down, the STORE_PATH event.<br>
<br>
2010-10-29 15:01:38 UTCDEBUG2 localListenThread: Received event 1,5000000179 STORE_NODE<br>
2010-10-29 15:01:38 UTCCONFIG storeNode: no_id=2 no_comment=&#39;slave&#39;<br>
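To compare the two runs, it may help to pull every event the localListener reports out of each log and look at the last one it reached. The following is a minimal Python sketch, not part of slon. The regular expression is an assumption written against the single DEBUG2 line quoted above, and it assumes other event types (STORE_PATH, SYNC, ...) are logged in the same form.<br>

```python
import re

# Pattern written against the one sample line quoted above, e.g.
# "2010-10-29 15:01:38 UTCDEBUG2 localListenThread: Received event 1,5000000179 STORE_NODE".
# Assumption: other event types (STORE_PATH, SYNC, ...) are logged the same way.
RECEIVED_RE = re.compile(r"localListenThread: Received event (\d+),(\d+) (\w+)")


def received_events(log_text: str) -> list[tuple[int, int, str]]:
    """Return (origin, seqno, event_type) for each event the localListener logged."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in RECEIVED_RE.finditer(log_text)
    ]


def summarize(log_text: str) -> dict:
    """Report the last received event and whether STORE_NODE / STORE_PATH appeared."""
    events = received_events(log_text)
    types = {ev_type for _, _, ev_type in events}
    return {
        "last_event": events[-1] if events else None,
        "saw_store_node": "STORE_NODE" in types,
        "saw_store_path": "STORE_PATH" in types,
    }


if __name__ == "__main__":
    sample = (
        "2010-10-29 15:01:38 UTCDEBUG2 localListenThread: Received event 1,5000000179 STORE_NODE\n"
        "2010-10-29 15:01:38 UTCCONFIG storeNode: no_id=2 no_comment='slave'\n"
    )
    print(summarize(sample))
```

Run against the error-case log, the last_event entry would show how far the localListener got before the log stops.<br>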
<br>
<br>
Once it processes that STORE_NODE event, it starts the remoteWorker and remoteListener threads, which do the actual work.
<br>
In the error case, if you query the sl_event table on the master, you should see the STORE_NODE and STORE_PATH events (this is worth confirming). The question is why slon is not getting to these events. What event numbers are assigned to them?<br>
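One way to answer the question of which event numbers the configuration events carry is to fetch (ev_origin, ev_seqno, ev_type) from sl_event on the master and group the rows by type. The following is a minimal Python sketch that assumes the rows have already been fetched with psql or any PostgreSQL driver. The query string is illustrative only, built from the _slony schema and the column names that appear in the sl_event output in this thread, and the sketch does not execute it.<br>

```python
from collections import defaultdict

# Illustrative query only (schema and columns taken from the sl_event output in
# this thread); the function below works on rows that were already fetched.
QUERY = (
    "select ev_origin, ev_seqno, ev_type from _slony.sl_event "
    "where ev_type in ('STORE_NODE', 'STORE_PATH') order by ev_seqno"
)


def config_event_ids(rows, wanted=("STORE_NODE", "STORE_PATH")):
    """Map each wanted ev_type to the (ev_origin, ev_seqno) ids that carry it."""
    found = defaultdict(list)
    for origin, seqno, ev_type in rows:
        if ev_type in wanted:
            found[ev_type].append((int(origin), int(seqno)))
    return dict(found)


if __name__ == "__main__":
    # The six rows from the error node's sl_event table, reduced to three columns.
    error_node_rows = [
        (1, 5000000079, "SYNC"),
        (1, 5000000080, "STORE_NODE"),
        (1, 5000000081, "ENABLE_NODE"),
        (1, 5000000082, "STORE_PATH"),
        (1, 5000000083, "SUBSCRIBE_SET"),
        (1, 5000000084, "ENABLE_SUBSCRIPTION"),
    ]
    print(config_event_ids(error_node_rows))
    # {'STORE_NODE': [(1, 5000000080)], 'STORE_PATH': [(1, 5000000082)]}
```

An empty result would mean the events are not in the master&#39;s sl_event at all, which is the case worth ruling out first.<br>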

<br>
In the error case it got as far as 1,5000000081; in the normal case the STORE_NODE event was 1,5000000180. So if the time from when you started slon until you ran the storeNode is similar in both cases, you still have a fair number of events left to process (though processing 100 SYNC events when no tables are replicated should be pretty fast).<br>
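The estimate in the paragraph above is a difference of sequence numbers from the same origin: 5000000180 &minus; 5000000081 = 99 events, which is the &quot;about 100&quot; figure. A tiny Python sketch of that arithmetic follows. It assumes the origin&#39;s sequence numbers have no gaps, and the parsing of &quot;origin,seqno&quot; ids is written for illustration only.<br>

```python
def parse_event_id(event_id: str) -> tuple[int, int]:
    """Split an event id written as 'origin,seqno', e.g. '1,5000000081'."""
    origin, seqno = event_id.split(",")
    return int(origin), int(seqno)


def events_remaining(last_processed: str, target: str) -> int:
    """Events after last_processed up to and including target, same origin only.

    Assumes the origin's sequence numbers are consecutive (no gaps).
    """
    origin_a, seqno_a = parse_event_id(last_processed)
    origin_b, seqno_b = parse_event_id(target)
    if origin_a != origin_b:
        raise ValueError("sequence numbers are only comparable within one origin")
    return max(0, seqno_b - seqno_a)


if __name__ == "__main__":
    # Last event reached in the error case vs. the STORE_NODE id in the normal case.
    print(events_remaining("1,5000000081", "1,5000000180"))  # 99
```

The target here comes from the normal run, so the result only holds if, as the paragraph says, the timing of the two runs is similar.<br>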

<br>
</blockquote></div><br>