<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi!<br>

    </p>

    <div class="moz-cite-prefix">On 07.10.2019 at 19:17, Christopher

      Browne wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAFNqd5U7vQ8rpn4mNcFQNRMryoWovdRqRY7QBJdEG=7Q3QWt_g@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr">On Mon, 7 Oct 2019 at 11:50, Klaus Darilion &lt;<a

            href="mailto:klaus.mailinglists@pernau.at" target="_blank"

            moz-do-not-send="true">klaus.mailinglists@pernau.at</a>&gt;

          wrote:<br>

        </div>

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">Hello!<br>

            <br>

            We use slony 2.1.4 and will be forced to this version for

            some more time.<br>

            <br>

            Today I debugged an issue where the logswitching did not

            finish.<br>

            Although it would be safe (in my opinion) to truncate the

            old log table,<br>

            the logswitch_finish() fails with:<br>

            <br>

              could not lock sl_log_2 - sl_log_2 not truncated<br>

            <br>

            The function tries to lock the sl_log table with:<br>

            <br>

             begin;<br>

             lock table "_regdnscluster".sl_log_2 in access exclusive

            mode nowait;<br>

            <br>

            The problem seems to be that the table is read so heavily (55

            slaves) that<br>

            the lock hardly ever succeeds.<br>

            <br>

            If I call logswitch_finish() manually (because the cleanup

            thread tries<br>

            only every 10 minutes - hard coded) I need to call it approx

            100 times<br>

            until I get the lock.<br>

            <br>

            Is there a reason to use "nowait"? As far as I understand,

            it should be<br>

            safe to wait some time until giving up, i.e.:<br>

            <br>

            SET lock_timeout TO '10s';<br>

            begin;<br>

            lock table "_regdnscluster".sl_log_2 in access exclusive

            mode;<br>

            <br>

            <br>

            This way, log switching can happen more often.<br>

          </blockquote>

          <div><br>

          </div>

          <div>set lock_timeout was introduced in PostgreSQL 9.3, so it

            isn't available in "all versions."<br>

            <br>

          </div>

          <div>When it was introduced, we wouldn't have been keen on

            directly adopting it due to that factor, especially in view

            that one of the major use cases for Slony is as a way of

            upgrading from elderly versions of PostgreSQL.</div>

          <div><br>

          </div>

          <div>It surely seems like a reasonable idea to attempt to use

            it now, for the reasons you suggest.</div>

          <div><br>

          </div>

        </div>

      </div>

    </blockquote>
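    <p>For reference, a minimal sketch of the bounded-wait variant
      discussed above, assuming PostgreSQL 9.3 or later. Using SET LOCAL
      is my assumption here (the proposal above used a plain SET); it
      scopes the timeout to this one transaction instead of the whole
      session:</p>

    <pre>
BEGIN;
-- Only in effect until this transaction ends.
SET LOCAL lock_timeout = '10s';
-- Waits up to 10 seconds for the lock. If the timeout expires, the
-- statement fails with SQLSTATE 55P03 (lock_not_available) and the
-- transaction is aborted.
LOCK TABLE "_regdnscluster".sl_log_2 IN ACCESS EXCLUSIVE MODE;
-- ... the truncate of the old log table would follow here ...
COMMIT;
    </pre>

    <p>Unlike "nowait", this variant holds its place in the lock queue
      while waiting, so other sessions requesting the same table queue
      up behind it for up to the timeout.</p>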

    <p>I am no longer sure that it is so easy. I changed the function

      and called it from a cron job every minute, and I got plenty of

      "deadlock detected" errors, e.g.:</p>

    <p>(Relations 83002 and 83009 are the sl_log_1 and sl_log_2 tables.)

    </p>

    <p>2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598

      ERROR:  deadlock detected<br>

      2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598

      DETAIL:  Process 17816 waits for AccessExclusiveLock on relation

      83002 of database 16414; blocked by process 19342.<br>

              Process 19342 waits for AccessShareLock on relation 83009

      of database 16414; blocked by process 17816.<br>

              Process 17816: select * from

      _regdnscluster.logswitch_finish_klaus();<br>

              Process 19342: declare LOG cursor for select log_origin,

      log_txid, log_tableid, log_actionseq, log_cmdtype,

      octet_length(log_cmddata), case when octet_length(log_cmddata)

      &lt;= 8192 then log_cmddata else null end from

      "_regdnscluster".sl_log_1 where log_origin = 1 and log_tableid in

      (1,3,5,7,9,10) and log_txid &gt;= '10374380842' and log_txid &lt;

      '10374380941' and "pg_catalog".txid_visible_in_snapshot(log_txid,

      '10374380941:10374380941:') union all select log_origin, log_txid,

      log_tableid, log_actionseq, log_cmdtype,

      octet_length(log_cmddata), case when octet_length(log_cmddata)

      &lt;= 8192 then log_cmddata else null end from

      "_regdnscluster".sl_log_1 where log_origin = 1 and log_tableid in

      (1,3,5,7,9,10) and log_txid in (select * from

      "pg_catalog".txid_snapshot_xip('10374380842:10374380842:') except

      select * from

      "pg_catalog".txid_snapshot_xip('10374380941:10374380941:') ) union

      all select log_origin, log_txid, log_tableid, log_actionseq,

      log_cmdtype, octet_length(log_cmddata), case when

      octet_length(log_cmddata) &lt;= 8192 then log_cmd<br>

      2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 HINT: 

      See server log for query details.<br>

      2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598

      CONTEXT:  SQL statement "truncate "_regdnscluster".sl_log_1"<br>

              PL/pgSQL function _regdnscluster.logswitch_finish_klaus()

      line 129 at SQL statement<br>

      <br>

    </p>
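    <p>To see which sessions hold or wait for locks on the two log
      tables while this happens, a query along these lines against the
      pg_locks view may help. This is only a diagnostic sketch; the
      schema name is the one from the log above, and the query has to
      be run in the affected database:</p>

    <pre>
SELECT l.pid, c.relname, l.mode, l.granted
  FROM pg_catalog.pg_locks l
  JOIN pg_catalog.pg_class c ON c.oid = l.relation
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE l.locktype = 'relation'
   -- relation OIDs are only meaningful within the current database
   AND l.database = (SELECT oid FROM pg_catalog.pg_database
                      WHERE datname = current_database())
   AND n.nspname = '_regdnscluster'
   AND c.relname IN ('sl_log_1', 'sl_log_2')
 ORDER BY c.relname, l.granted DESC, l.pid;
    </pre>

    <p>Rows with granted = false are the waiters. In the log above,
      process 17816 would show up as waiting for AccessExclusiveLock on
      sl_log_1, and process 19342 as waiting for AccessShareLock on
      sl_log_2.</p>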

    <p>So, for now I have activated the old locking again.</p>

    <p><br>

    </p>

    <p>regards</p>

    <p>Klaus</p>

    <p><br>

    </p>

    <br>

  </body>

</html>