Hi,<br><br>I am using postgresql-8.4 and slony1-1.2.0.3, and I have been able to implement a 4-node replication cluster in which the nodes communicate successfully with each other. I did this by writing scripts (say cluster_setup.sh and subscribe.sh) to be run with slonik: I run cluster_setup.sh on the master node, then start the slon daemons on all 4 nodes with the necessary connection information, and finally run subscribe.sh on the master node. This works perfectly fine, and even when I kill some of the slons on different machines and then start slon again, replication at that node picks up where it left off. <br>
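<br>The manual sequence described above can be sketched roughly as follows. This is only an illustration, not my actual scripts: the host names, database name, and the use of ssh are placeholders, and the cluster name is taken from the schema name (_four_node_rep_cluster20) that appears in the error output. By default it only prints the commands it would run.<br>

```shell
#!/bin/sh
# Dry-run sketch of the manual start order: setup, slon on every node, subscribe.
# All values below are placeholders (assumptions), not the real configuration.
CLUSTER="four_node_rep_cluster20"   # from the schema name _four_node_rep_cluster20
DBNAME="mydb"                       # hypothetical database name
NODES="node1 node2 node3 node4"     # hypothetical host names
RUN="${RUN-echo}"                   # defaults to echo, so commands are only printed

start_order() {
    # 1. Define the cluster on the master node (slonik reads the script file).
    $RUN slonik cluster_setup.sh

    # 2. Start one slon daemon per node with that node's connection info.
    #    slon stays in the foreground, so a real run would need to background it.
    for host in $NODES; do
        conninfo="dbname=$DBNAME host=$host user=postgres"
        $RUN ssh "$host" "slon $CLUSTER '$conninfo'"
    done

    # 3. Subscribe the nodes, again from the master node.
    $RUN slonik subscribe.sh
}

start_order
```

The point of the sketch is only the ordering: the slon daemons have to be running on every node before subscribe.sh is run.<br>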

<br>After this I tried automating the whole process so that replication continues to work as normal after a network disconnect, power failure, or reboot. So instead of running the slons manually on each machine, I placed a script containing the command &#39;bash -U postgres -c &quot;./slon conninfo=&quot; &#39; in the init.d directory on each machine. Once database replication was running again on all nodes, I rebooted one of the machines, but replication was not restored after that. The node acting as the provider to the rebooted machine started showing this error:<br>


<br>2011-08-05 09:25:40 PDTERROR  remoteListenThread_3: &quot;select con_origin, con_received, max(con_seqno) as con_seqno, max(con_timestamp) as con_timestamp from &quot;_four_node_rep_cluster20&quot;.sl_confirm where con_received &lt;&gt; 2 group by con_origin, con_received&quot;<br>
2011-08-05 09:25:42 PDTERROR  remoteListenThread_3: &quot;select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, &quot;pg_catalog&quot;.txid_snapshot_xmin(ev_snapshot), &quot;pg_catalog&quot;.txid_snapshot_xmax(ev_snapshot), ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from &quot;_four_node_rep_cluster20&quot;.sl_event e where (e.ev_origin = &#39;3&#39; and e.ev_seqno &gt; &#39;5000000005&#39;) or (e.ev_origin = &#39;4&#39; and e.ev_seqno &gt; &#39;5000000039&#39;) order by e.ev_origin, e.ev_seqno limit 40&quot; - no connection to the server<br>


<br>After that, replication does not start working again until I reboot all the nodes. My guess is that the provider node gets reinitialized on reboot, which is why replication starts again. Since Slony is meant for automated database replication, I was wondering whether there is any way to make this work without rebooting all the nodes, which would be inconvenient as the number of nodes increases or on a production server.<br>

<br>Any input on the above error will be greatly appreciated. <br><br>Regards<br>Dilraj Singh<br>