Hi,<br><br>I am using postgresql-8.4 and slony1-1.2.0.3, and I have been able to implement a 4-node replication cluster in which the nodes communicate successfully with each other. The way I went about this was to write scripts (say cluster_setup.sh and subscribe.sh) to be run with slonik: I run cluster_setup.sh on the master node, then start the slon daemons on all 4 nodes with the necessary connection information, and finally run subscribe.sh on the master node again. This works perfectly, and even when I kill some of the slons on the different machines, replication on that node picks up where it left off once I start slon again. <br>
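<br>A minimal sketch of the manual bring-up order described above. Only the script names cluster_setup.sh and subscribe.sh come from my setup. The cluster name is inferred from the `_four_node_rep_cluster20` schema in the log below, and `SLONIK_BIN`, `SLON_BIN`, the function names and the conninfo string are hypothetical placeholders. The sketch uses slon's positional `clustername conninfo` form.<br>

```shell
#!/bin/bash
# Hypothetical outline of the manual start-up order; binaries are
# overridable placeholders so the steps can be dry-run with stubs.
SLONIK_BIN="${SLONIK_BIN:-slonik}"
SLON_BIN="${SLON_BIN:-slon}"
CLUSTER="four_node_rep_cluster20"   # inferred from the _four_node_rep_cluster20 schema

# Step 1 (master node only): run cluster_setup.sh through slonik.
setup_cluster() { "$SLONIK_BIN" cluster_setup.sh; }

# Step 2 (each of the 4 nodes): start that node's slon daemon in the
# background; $1 is the node's own conninfo string (hypothetical).
start_slon() { "$SLON_BIN" "$CLUSTER" "$1" & }

# Step 3 (master node only): run subscribe.sh through slonik.
subscribe_sets() { "$SLONIK_BIN" subscribe.sh; }
```

Usage would be `setup_cluster` on the master, then something like `start_slon "dbname=mydb host=node2 user=postgres"` on each node (placeholder values), and finally `subscribe_sets` on the master.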
<br>After this I tried automating the whole process so that replication can continue as normal after a network disconnect, power failure or reboot. Instead of running slon manually on each machine, I placed a script containing the command 'bash -U postgres -c "./slon conninfo=" ' in the init.d directory on each machine. With replication running on all nodes again, I rebooted one of the machines, but replication could not be restored after that. The node that was acting as provider to the rebooted machine started showing this error:<br>
<br>2011-08-05 09:25:40 PDTERROR remoteListenThread_3: "select con_origin, con_received, max(con_seqno) as con_seqno, max(con_timestamp) as con_timestamp from "_four_node_rep_cluster20".sl_confirm where con_received <> 2 group by con_origin, con_received"<br>2011-08-05 09:25:42 PDTERROR remoteListenThread_3: "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, "pg_catalog".txid_snapshot_xmin(ev_snapshot), "pg_catalog".txid_snapshot_xmax(ev_snapshot), ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from "_four_node_rep_cluster20".sl_event e where (e.ev_origin = '3' and e.ev_seqno > '5000000005') or (e.ev_origin = '4' and e.ev_seqno > '5000000039') order by e.ev_origin, e.ev_seqno limit 40" - no connection to the server<br>
<br>After that, replication won't start working again until I reboot all the nodes. My guess is that the provider node gets reinitialized on reboot, which is why replication starts again. I know Slony is meant for automated database replication, so I was wondering whether there is any way to make this work without rebooting all the nodes, which would become inconvenient as the number of nodes grows, and on a production server.<br>
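<br>For reference, a minimal sketch of the init.d approach described above, with a loop that starts slon again whenever it exits instead of starting it only once at boot. This is an illustration under assumptions, not my actual script and not a verified fix for the error above: `CONNINFO`, `RETRY_DELAY`, `MAX_RESTARTS`, `supervise_slon` and the `start` argument handling are hypothetical, and the script is meant to run as the postgres user (for example via `su - postgres -c ...` from the init script).<br>

```shell
#!/bin/bash
# Hypothetical slon supervisor for an init.d-style start-up script.
# All values are placeholders and can be overridden via the environment.
CLUSTER="${CLUSTER:-four_node_rep_cluster20}"      # inferred from the log's schema name
CONNINFO="${CONNINFO:-dbname=mydb user=postgres}"  # hypothetical conninfo
SLON_BIN="${SLON_BIN:-slon}"
RETRY_DELAY="${RETRY_DELAY:-10}"                   # seconds to wait between restarts
MAX_RESTARTS="${MAX_RESTARTS:-0}"                  # 0 means restart forever

# Run slon in the foreground and start it again each time it exits,
# for whatever reason; give up only if MAX_RESTARTS is reached.
supervise_slon() {
    local restarts=0 status
    while :; do
        "$SLON_BIN" "$CLUSTER" "$CONNINFO"
        status=$?
        restarts=$((restarts + 1))
        if [ "$MAX_RESTARTS" -gt 0 ] && [ "$restarts" -ge "$MAX_RESTARTS" ]; then
            echo "slon exited ($status); giving up after $restarts runs" >&2
            return 1
        fi
        echo "slon exited ($status); restarting in ${RETRY_DELAY}s" >&2
        sleep "$RETRY_DELAY"
    done
}

# Only start supervising when invoked as "script start" (init.d convention).
case "${1:-}" in
    start) supervise_slon & ;;
esac
```

The loop keeps one slon per node alive without manual intervention; whether restarting slon on the provider is enough to clear the "no connection to the server" state is exactly what I am unsure about.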
<br>Any input on the above error would be greatly appreciated. <br><br>Regards<br>Dilraj Singh<br>