<div dir="ltr">Sorry, I&#39;m a bit sleep deprived. This is almost exactly the problem I asked for help with in 2014. Jan and Jeff both chimed in and suggested keep-alive settings that are much more aggressive than the ones I have.<div><br></div><div>So I&#39;m going to test with the more aggressive settings from that 2014 thread: &quot;<a href="https://www.mail-archive.com/slony1-general@lists.slony.info/msg06967.html">https://www.mail-archive.com/slony1-general@lists.slony.info/msg06967.html</a>&quot;</div><div><br></div><div>How lame. I knew Jan had been helpful, but I totally forgot about this thread. UUGH! Sorry.</div><div><br></div><div>And yes, doubly bad: top posting!!</div><div><br></div><div>Tory</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Sep 11, 2016 at 11:41 PM, Tory M Blue <span dir="ltr">&lt;<a href="mailto:tmblue@gmail.com" target="_blank">tmblue@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Jan has helped me before with wide-area replication. The problem seems to be that during a large set copy and/or an index creation, no bits cross the wire for a long time, so the firewall (or something else along the path) drops the idle connection. Slony finishes a table or index creation and tries to grab the next table, but the connection is no longer there, so Slony reports a failure and tries again.<div class="gmail_extra"><br></div><div class="gmail_extra">I think I&#39;m running into this between my colo and Amazon, using their VPN gateway.
</div><div class="gmail_extra"><br></div><div class="gmail_extra">Here is a snippet of the logs. There is no index here; we dropped it on the new node so that it would not fail. What&#39;s odd is that it copies all the data, and then about 35 minutes later it reports the copy time, which tells me it&#39;s doing something, but I&#39;m not sure what, since there is no index on that table. (There is a primary key, which maintains integrity, and we didn&#39;t think we should drop that.) There are no other indexes, so those 35 minutes are a mystery.</div>
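<p>As a sanity check on the timing, here is a small sketch that computes the gaps between the three timestamps in the log excerpt. It assumes all three lines describe the same copy (the first two name "adimpressions", the third "impressions"). The total comes out to 6121 s, in line with the 6121.393 seconds slon reports, and about 35 minutes of that passes after the "bytes copied" line with nothing logged.</p>

```python
from datetime import datetime

# Timestamps copied from the slon log excerpt (all PDT, same day).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S"
begin_copy = datetime.strptime("2016-09-11 21:32:24", LOG_FORMAT)    # "Begin COPY of table"
bytes_copied = datetime.strptime("2016-09-11 22:39:39", LOG_FORMAT)  # "... bytes copied for table"
copy_done = datetime.strptime("2016-09-11 23:14:25", LOG_FORMAT)     # "... seconds to copy table"


def seconds_between(start: datetime, end: datetime) -> int:
    """Whole seconds elapsed from start to end."""
    return int((end - start).total_seconds())


streaming = seconds_between(begin_copy, bytes_copied)  # COPY data on the wire
silent = seconds_between(bytes_copied, copy_done)      # nothing logged during this phase
total = seconds_between(begin_copy, copy_done)         # compare with slon's 6121.393 s

print(f"streaming: {streaming} s ({streaming / 60:.1f} min)")
print(f"silent:    {silent} s ({silent / 60:.1f} min)")
print(f"total:     {total} s")
```

<p>If the connection carries no traffic during that silent stretch, any firewall or VPN idle timeout shorter than roughly 35 minutes would be enough to drop it.</p>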


<div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra"><div class="gmail_extra"><br></div><div class="gmail_extra">2016-09-11 21:32:24 PDT CONFIG remoteWorkerThread_1: Begin COPY of table &quot;torque&quot;.&quot;adimpressions&quot;</div><div class="gmail_extra">2016-09-11 <b>22:39:39 </b>PDT CONFIG remoteWorkerThread_1: 76955497834 bytes copied for table &quot;torque&quot;.&quot;adimpressions&quot;</div></div><div class="gmail_extra">2016-09-11 <b>23:14:25 </b>PDT CONFIG remoteWorkerThread_1: 6121.393 seconds to copy table &quot;torque&quot;.&quot;impressions&quot;</div><div class="gmail_extra">2016-09-11 23:14:25 PDT CONFIG remoteWorkerThread_1: copy table &quot;torque&quot;.impressions_archive&quot;</div><div class="gmail_extra">2016-09-11 23:14:25 PDT CONFIG remoteWorkerThread_1: Begin COPY of table &quot;torque&quot;.&quot;impressions_archive&quot;</div><div class="gmail_extra">2016-09-11 23:14:25 PDT ERROR  remoteWorkerThread_1: &quot;select &quot;_cls&quot;.copyFields(237);&quot; </div><div class="gmail_extra">2016-09-11 23:14:25 PDT WARN   remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds</div><div class="gmail_extra">2016-09-11 23:14:25 PDT INFO   cleanupThread: 7606.655 seconds for cleanupEvent()</div><div class="gmail_extra"><br></div><div class="gmail_extra">For this run I added keep-alives as described below. The timing and results are the same without them: set 2 fails at the same copyFields(237) call.</div><div class="gmail_extra"><br></div><div class="gmail_extra">I added the following to both slon commands, on the origin and on the new node:</div><div class="gmail_extra">


<p><span>tcp_keepalive_idle 300<br>tcp_keepalive_count 5<br>tcp_keepalive_interval 300</span></p><p>I&#39;m not entirely sure how this is supposed to work, or whether I tuned it wrong. It clearly fails at the 30-minute mark, and these settings add up to 25 minutes. However, the servers never lose connectivity: I have a ping running (not quite the same thing, I know), and it shows zero packet loss over the 2+ hours these replication attempts take. So maybe someone smarter than me can advise how I should tune the keep-alives, if that&#39;s what is happening.</p><p>I thought the keep-alives would only kick in if it felt the partner was no longer there, but since pings show no connectivity issues, I&#39;m at a loss. AGAIN :)</p><p>Thanks for the assist</p><span class="HOEnZb"><font color="#888888"><p>Tory</p></font></span></div></div></div>
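<p>For reference, a minimal sketch of the arithmetic behind these settings, assuming standard Linux TCP keep-alive semantics: the first probe goes out after <code>idle</code> seconds of silence, then up to <code>count</code> unanswered probes follow, spaced <code>interval</code> seconds apart. It separates two different questions: how long a genuinely dead peer takes to detect, and whether probes are frequent enough to stop a firewall or VPN gateway from expiring a quiet connection. The class and method names are hypothetical, and the middlebox timeouts in the loop are made-up values, not the actual limits of this colo-to-Amazon path.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveSettings:
    """TCP keep-alive knobs, named after the slon tcp_keepalive_* settings."""

    idle: int      # seconds a connection must be silent before the first probe
    interval: int  # seconds between probes that go unanswered
    count: int     # unanswered probes before the connection is declared dead

    def dead_peer_timeout(self) -> int:
        """Worst-case seconds of silence before a dead peer is detected."""
        return self.idle + self.count * self.interval

    def refreshes_middlebox(self, middlebox_idle_timeout: int) -> bool:
        """True if a probe crosses the wire before a firewall/VPN idle timer expires.

        On a healthy link each answered probe resets the idle timer, so a probe
        goes out roughly every `idle` seconds while the connection is quiet.
        """
        return middlebox_idle_timeout > self.idle


current = KeepaliveSettings(idle=300, interval=300, count=5)
print(current.dead_peer_timeout())  # 1800 seconds, i.e. 30 minutes

# HYPOTHETICAL middlebox idle timeouts; the real value on the colo-to-AWS path is unknown.
for timeout in (240, 600, 3600):
    print(timeout, current.refreshes_middlebox(timeout))
```

<p>Under these assumptions, detection with the settings above works out to 300 + 5 × 300 = 1800 seconds: the 25 minutes of probes plus the initial 5-minute idle period. The second check is the one that matters for a silent COPY or index build. If the gateway forgets idle connections sooner than <code>tcp_keepalive_idle</code>, the state is gone before the first probe is sent, no matter how the count and interval are tuned.</p>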

</blockquote></div><br></div>