<div dir="ltr">Sorry, I'm a bit sleep deprived. This is almost exactly the thing I asked for help with in 2014. Jan and Jeff both chimed in and gave me keep-alive suggestions that are much more aggressive than what I have set.<div><br></div><div>So I'm going to test with the more aggressive settings from this 2014 thread: "<a href="https://www.mail-archive.com/slony1-general@lists.slony.info/msg06967.html">https://www.mail-archive.com/slony1-general@lists.slony.info/msg06967.html</a>"</div><div><br></div><div>How lame of me. I knew Jan had been helpful, but I totally forgot about this thread. UGH! Sorry.</div><div><br></div><div>And yes, doubly bad: top posting!! </div><div><br></div><div>Tory</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Sep 11, 2016 at 11:41 PM, Tory M Blue <span dir="ltr"><<a href="mailto:tmblue@gmail.com" target="_blank">tmblue@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Jan has helped me before with ideas for wide-area replication, where the connection seems to drop during a large copy and/or an index creation. While no bits are crossing the wire, the connection gets dropped by the firewall (or something else), so when Slony finishes a table or index creation and tries to grab the next table, the connection is no longer there; Slony reports a failure and tries again.<div class="gmail_extra"><br></div><div class="gmail_extra">I think I'm running into this between my colo and Amazon, using their VPN gateway. </div><div class="gmail_extra"><br></div><div class="gmail_extra">Here is a snippet of the logs. There is no index here; we dropped it on the new node so that it would not fail. What's odd is that it copies all the data and then, 35 minutes later, reports the copy time, which tells me it's doing something, but I'm not sure what if there is no index on that table. 
(There is a primary key, which maintains integrity, and we didn't think we should drop that.) But there are no other indexes, so the 35 minutes is a mystery.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra"><div class="gmail_extra"><br></div><div class="gmail_extra">2016-09-11 21:32:24 PDT CONFIG remoteWorkerThread_1: Begin COPY of table "torque"."adimpressions"</div><div class="gmail_extra">2016-09-11 <b>22:39:39 </b>PDT CONFIG remoteWorkerThread_1: 76955497834 bytes copied for table "torque"."adimpressions"</div></div><div class="gmail_extra">2016-09-11 <b>23:14:25 </b>PDT CONFIG remoteWorkerThread_1: 6121.393 seconds to copy table "torque"."impressions"</div><div class="gmail_extra">2016-09-11 23:14:25 PDT CONFIG remoteWorkerThread_1: copy table "torque".impressions_archive"</div><div class="gmail_extra">2016-09-11 23:14:25 PDT CONFIG remoteWorkerThread_1: Begin COPY of table "torque"."impressions_archive"</div><div class="gmail_extra">2016-09-11 23:14:25 PDT ERROR remoteWorkerThread_1: "select "_cls".copyFields(237);" </div><div class="gmail_extra">2016-09-11 23:14:25 PDT WARN remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds</div><div class="gmail_extra">2016-09-11 23:14:25 PDT INFO cleanupThread: 7606.655 seconds for cleanupEvent()</div><div class="gmail_extra"><br></div><div class="gmail_extra">For this run, I added keep-alives as follows. (The timing and results are the same without them: set 2 fails at the same copyFields(237) call.)</div><div class="gmail_extra"><br></div><div class="gmail_extra">I added the following to both slon commands, on the origin and on the new node:</div><div class="gmail_extra">
<p><span>tcp_keepalive_idle 300 tcp_keepalive_count 5 tcp_keepalive_interval 300</span></p><p>Now, I'm not entirely sure how this is supposed to work, or whether I tuned it wrong. It obviously fails at the 30-minute mark, and this is 25 minutes; however, the servers never lose connectivity. I have a ping running (not quite the same thing), and it shows zero packet loss over the 2+ hours these replication attempts take. So maybe someone smarter than me can advise how I should tune the keep-alives, if that's what is happening.</p><p>I thought it would only use the keep-alives if it felt the partner was no longer there, but since pings show no connectivity issues, I'm at a loss. AGAIN :)</p><p>Thanks for the assist</p><span class="HOEnZb"><font color="#888888"><p>Tory</p></font></span></div></div></div>
</blockquote></div><br></div>
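A quick sanity check of the numbers in the quoted message, as a minimal Python sketch. The timestamps come from the quoted log; the helper names elapsed_seconds and keepalive_timeline are hypothetical. The sketch assumes slon's tcp_keepalive_idle, tcp_keepalive_interval and tcp_keepalive_count end up as the Linux kernel's TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options (for example via libpq's keepalives_* connection parameters); that mapping is an assumption, not something confirmed in this thread. Note that TCP keep-alive probes are sent whenever a connection has been idle for the idle interval, whether or not the peer looks alive.

```python
from datetime import datetime
from typing import Tuple


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day 'HH:MM:SS' timestamps from the slon log."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def keepalive_timeline(idle: int, interval: int, count: int) -> Tuple[int, int]:
    """Return (seconds of silence before the first probe is sent,
    worst-case seconds before an unanswered connection is dropped).

    ASSUMPTION: Linux TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT semantics.
    """
    first_probe = idle
    give_up = idle + interval * count
    return first_probe, give_up


if __name__ == "__main__":
    # Timestamps taken from the quoted log excerpt.
    copy_phase = elapsed_seconds("21:32:24", "22:39:39")  # Begin COPY -> bytes copied
    post_copy = elapsed_seconds("22:39:39", "23:14:25")   # bytes copied -> copy time reported
    print(f"COPY data: {copy_phase} s, after COPY: {post_copy} s, "
          f"total: {copy_phase + post_copy} s")

    # Settings from the quoted slon command line.
    first, give_up = keepalive_timeline(idle=300, interval=300, count=5)
    print(f"first probe after {first} s idle; dead peer declared after {give_up} s")
```

The totals line up with the logged 6121.393 seconds, so the roughly 35-minute gap really does fall between the "bytes copied" line and the final timing line. On the keep-alive side, with idle=300 a silent connection sends nothing for its first 5 minutes; if the VPN gateway or firewall expires idle flows sooner than that, the probes would arrive too late to keep the flow open. The 1800 s figure is only how long the kernel waits before giving up on a peer that stops answering.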