[Slony1-hackers] Separating log pulling from log processing

Mon Dec 17 15:53:51 PST 2007

One idea that has been raised as a possible performance improvement
is to change how the sl_log_* tables are processed, from the present
"SELECT [relevant tuples] from sl_log_n" to...

On Provider:
  copy (select * from sl_log_n where [relevant tuples]) to stdout;

On Subscriber:
  copy _slony_schema.sl_log_[current] from stdin;

This would have a number of interesting direct effects:

- It eliminates the need for us to optimize this for tuple size; the
  COPY won't consume slon memory without bound.

- COPY gets the data into the log table pretty much As Quickly As Is
  Conceivably Possible.  That is far more efficient than loading it
  as a set of INSERT statements.

- We then need to run some process on the subscriber node to process
  all the tuples.  It has been suggested that this be either rules or
  triggers on sl_log_n.
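
As a rough illustration of the provider-to-subscriber COPY relay
described above, here is a minimal Python sketch.  It is not how slon
does (or would do) it; it assumes both cursors behave like psycopg2
cursors, i.e. that they offer copy_expert(sql, file).  The function
name relay_log_rows and the spool_limit parameter are hypothetical,
and the caller supplies the SELECT, since the "[relevant tuples]"
condition is not spelled out here.

```python
import tempfile


def relay_log_rows(provider_cur, subscriber_cur, select_sql, target_table,
                   spool_limit=1 << 20):
    """Relay rows from a provider query into a subscriber log table.

    Assumption: both cursors offer psycopg2-style copy_expert(sql, file).
    select_sql and target_table are supplied by the caller.
    """
    # Spool through a temp file that spills to disk once it passes
    # spool_limit bytes, so memory use stays bounded regardless of how
    # much log data (or how large a tuple) comes across.
    with tempfile.SpooledTemporaryFile(max_size=spool_limit) as spool:
        # Provider side: copy (select ...) to stdout
        provider_cur.copy_expert(f"COPY ({select_sql}) TO STDOUT", spool)
        spool.seek(0)
        # Subscriber side: copy <log table> from stdin
        subscriber_cur.copy_expert(f"COPY {target_table} FROM STDIN", spool)
```

Spooling to a file keeps the sketch short; a real implementation
would more likely stream the rows straight from one connection to the
other.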

In some conversations, though, some neat further ideas have suggested
themselves...

Forget the triggers/rules; *all* that the COPY does is the COPY.  A
*later* "APPLY" phase sees to applying the updates to replicated
tables.

This would have a number of interesting effects:

- It eliminates the need for the "SYNC pipelining" idea (or rather,
  simplifies it drastically).

  If "COPY" and "APPLY" are separate threads altogether, then the
  "COPY" thread can keep busy pulling data, perhaps pulling
  dramatically ahead of the "APPLY" thread.

- It will improve latency for cascaded subscribers.

  Data is available for forwarding to other nodes as soon as "COPY" is
  done; no need to wait for "APPLY."

- We might even have nodes that don't bother with an APPLY thread.

  In this case, the database would consist purely of the
  Slony-I-specific schema, with *no* data other than configuration and
  sl_log_* data.

  As this node has no "application data", it might be set up to store
  sl_log_* data for a much longer period of time.

- Arguable downside: This eliminates the ability to have a
  non-"Forwarding" node.
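
To make the COPY/APPLY split concrete, here is a small Python sketch
of the threading arrangement discussed above.  It is only a model:
the "log" is an in-memory list standing in for the subscriber's
sl_log_* table, and run_node, incoming_syncs and apply_enabled are
hypothetical names, not anything in slon.

```python
import queue
import threading


def run_node(incoming_syncs, apply_enabled=True):
    """Run COPY and (optionally) APPLY as independent threads.

    incoming_syncs: list of (sync_id, rows) batches standing in for
    data pulled from the provider.
    Returns (local_log, applied).
    """
    local_log = []           # stands in for the subscriber's sl_log_* table
    pending = queue.Queue()  # SYNCs copied but not yet applied
    applied = []
    done = object()          # sentinel: the COPY thread has finished

    def copy_thread():
        for sync_id, rows in incoming_syncs:
            # Once stored locally, the batch could be forwarded to
            # cascaded subscribers without waiting for APPLY.
            local_log.append((sync_id, rows))
            pending.put(sync_id)
        pending.put(done)

    def apply_thread():
        while True:
            sync_id = pending.get()
            if sync_id is done:
                break
            # Real code would replay the rows onto replicated tables.
            applied.append(sync_id)

    threads = [threading.Thread(target=copy_thread)]
    if apply_enabled:
        threads.append(threading.Thread(target=apply_thread))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local_log, applied
```

Calling run_node(syncs, apply_enabled=False) models the node with no
APPLY thread: the log fills up, and nothing is ever applied.
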
-- 
output = ("cbbrowne" "@" "linuxdatabases.info")
http://linuxdatabases.info/info/unix.html