Christopher Browne cbbrowne at ca.afilias.info
Mon Jan 7 09:43:53 PST 2008
I have been looking at this idea for a while, and I *think* it has
enough merit to consider implementing it.

The present way that SYNCs are processed may be outlined thus:

 - slon opens a LOG cursor to the provider that selects the contents
   of sl_log_1/sl_log_2 that are relevant to the SYNC set being
   processed

 - slon fetches 100 entries at a time, leaving out all Large Tuples
   (based on configurable parameter)

 - For each of the 100 entries, the slon generates an
   INSERT/UPDATE/DELETE, and delivers them to the worker process.

   In cases where the tuples were Very Large, those tuples get queried
   individually.

 - If the subscriber is a logging subscriber, we then also insert the
   sl_log_? data into the appropriate sl_log_? table on the subscriber.

I propose a rather different approach, that uses COPY and a stored
procedure.

 - Rather than opening a cursor, we submit a COPY query against the
 provider of the form:

   "COPY (select * from sl_log_1 where [SYNC criteria] union all
          select * from sl_log_2 where [SYNC criteria]) to stdout;"

 - We then set up a "COPY [subscriber's_current_sl_log_? table] from stdin;" 
   on the subscriber.

 - We loop, so long as there is data provided by the first statement, and
   COPY the data to the subscriber node.

 - Once the COPY is complete, we run a stored procedure on the
   subscriber node, telling it the same [SYNC criteria] given above, and
   this stored procedure runs through the data in sl_log_? on the subscriber
   in actionseq order.  For each row, it will:

    - build an insert/update/delete statement
    - execute that statement

I can already see several straightforward improvements and tradeoffs
in this, as compared with what we have now.  (Hopefully none of it
involves overly baroque plans with obvious fatal errors! - see below!)

  1.  Some processing load gets taken off the provider

      The "LOG cursor" query becomes a "COPY [foo] to stdout" query,
      and it's worth noting that we lose the need to have an ORDER BY
      clause, which eliminates a sort.

  2.  Processing load is moved from slon to subscriber DBMS

      The present slon loads sl_log_? tuples into memory, and the C
      code generates the INSERT/UPDATE/DELETE.

      In the proposed method, the stored procedure does that work,
      on the subscriber DBMS.

      An important question: Will that loop lead to grossly excessive
      backend memory usage in cases where Large Tuples are processed?
      (e.g. - where the INSERT statement is inserting a tuple
      consisting of 40MB of data)

  3.  We use COPY to load data onto the subscriber

      There are two very large benefits to this, in that:
       i) COPY should be *WAY* faster than the INSERT presently used;
       ii) We can COPY in specific-sized-buffer chunks, which eliminates
           the somewhat-overly baroque code that tries to limit slon
           memory usage.

      There is a downside: with this approach, we now have no option
      for a subscriber node to NOT be configured to be a provider; all
      nodes now load data into the sl_log_? tables.

   4. The "shape" of lag times, on subscribers, changes somewhat.

      Right now, there is a sort of "linear" behaviour; if a SYNC is
      half-done, then that means that
           - half of the tuples needing updates have been updated;
           - half of the sl_log_? copies have been loaded.

      With the new behaviour, we load all of the data into sl_log_?
      before doing anything about applying changes into the replicated
      tables.

      I'm not sure if this one matters at all.

   5. There is the new potential to untie loading sl_log_? from
      processing it.

      That is, we could, in principle, make those processes
      asynchronous, with separate COMMITs.  

      De-synchronizing log application from log copying adds the
      interesting result that we probably cut down on the potential
      for data loss in case of loss of an origin node.

      Let us suppose, for sake of argument (and I'm making up
      numbers!)  that present processing takes 2.0s per SYNC, and that
      the new approach takes 0.15s to copy sl_log_?, and 1.5s to apply
      sl_log_?.

      Given those (made-up) numbers, the new approach means that that
      SYNC is vulnerable to loss for only 0.15s, rather than the 2.0s
      of the past.

      Further, if we did enough tracking to do this, it ought to be
      possible to set up a replication node which consists *solely* of
      the Slony-I schema.  You'd have a small database whose contents
      would "churn" pretty heavily, but where it would likely all be
      cached.

   6. Replication is applied based on subscriber node configuration

      Thus, you could rename tables on a subscriber-by-subscriber
      basis.

      I think that's more fitting...

There's enough interaction with worker threads that this would
represent a fairly significant code change to Slony-I, and the number
of changes in behaviour warrant community discussion.

My belief is that it's a worthwhile change, even considering the
tradeoffs (notable one being #3 - you can't have "subscribe only"
nodes that don't record sl_log_? data).

Jan has observed that there is something of a "downside" in that we do
*not* gain, in any of this, the ability to use prepared statements (or
similar) against the replicated tables.  He's keen on creating some
new approach where the triggers capture something looking more like
the underlying data structure of the tuple, and where the action, on
the subscriber, involves updating the table down nearer to the
"storage level" deeper inside PostgreSQL.  I'm not sure that's
actually an argument against doing this.
-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/
Evil Overlords tend to get overthrown due to overly baroque plans with
obvious fatal  errors.  Follow the  "Rules of the Evil  Overlord," and
you need not fear heroic  opposition, whether that hero be James Bond,
Flash Gordon, or a little hobbit named Frodo.


More information about the Slony1-general mailing list