Christopher Browne cbbrowne
Wed Oct 20 22:50:59 PDT 2004
One of the features intended for 1.1 is the ability to serialize the
updates to go out into files that can be kept in a spool directory.

The spool files could then be transferred via whatever means was
desired to a "slave system," whether that be via FTP, rsync, or
perhaps even by pushing them onto a 1GB "USB key" to be sent to the
destination by clipping it to the ankle of some sort of "avian
transport" system ;-).

There are plenty of neat things you can do with a data stream in this
form, including:

  -> Using it to replicate to nodes that _aren't_ securable
  -> Supporting a different form of PITR
  -> If disaster strikes, you can look at the logs of queries
     themselves
  -> This is a neat scheme for building load for tests...
  -> We have a data "escrow" system that would become considerably
     cheaper given 'log shipping'

But we need to start thinking about how to implement it to be usable.
I'm at the stage of starting to think about questions; this will be
WAY richer on questions than on answers...

Q1: Where should the "spool files" for a subscription set be generated?

 Several thoughts come to mind:

  A1 -> The slon for the origin node generates them

  A2 -> Any slon node participating in the subscription set can generate
        them

  A3 -> A special "pseudo-node" generates spool files rather than applying
        changes to a database

 Based on the implications, A1 seems somewhat preferable... 

Q2: What takes place when a failover/MOVE SET takes place?

   -> If we picked A2 or A3 for Q1, then the answer is "nothing."

   -> If Q1's answer was A1, then it becomes necessary for the new
      origin to start generating spool files.  

      What do we do if that slon hasn't got suitable configuration?
      Simply stop spooling?

  A natural implication is that you'd want all of the slons for a
  particular cluster that are permitted to "log ship" to run on the
  same host, so that they all have the same filesystem to work with.

Q3: What if we run out of "spool space"?

   -> The slon is forced to stop writing out logs; this should _prevent_
      purging of sl_log_1/sl_log_2 entries in the affected range so
      that "log shipping" isn't corrupted.

      In effect, "log shipping" is a sort of 'virtual destination'
      that Slony-I's existing data structures need to know something
      about.  It's not a true node, but we may need to create
      something resembling one...
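
      One way to picture that: the cleanup logic would treat the
      spool area as one more consumer whose confirmations gate the
      purge.  Very roughly, and with the bookkeeping names entirely
      made up for illustration, the purge might become something like:

         -- only purge log rows that every consumer, including the
         -- log-shipping "pseudo-node", has confirmed it is past
         delete from sl_log_1
          where log_origin = 1
            and log_xid < (select min(last_confirmed_xid)
                             from consumers_including_spool_pseudo_node);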

Q4: How do we configure it?

   Things that need to be configured include:

   a) Path in which to put "spool files"

   b) Set identification; each replication set should be handled
      separately

      The path probably should have, appended to it:
       sprintf("/set%09d/", set_id)

   c) Naming convention for the spool files, likely using a
      strftime()-conformant name string, also with the option of
      having it use the starting and/or ending SYNC ids.

   d) This needs to be passed to the slon; probably in new options
      in the config file (ergo it is 1.1-only...)

   e) Activating log shipping will be an issue, depending on Q1.

      - If Q1's answer was A1, where logging only takes place at the
        origin, then all we strictly need to do is to have an option
        like:
 
        gen_logs_at_origin = t

      - If Q1's answer was A3, then what we already have is a 
        "special slon" and there's no need for any extra options.

      - If Q1's answer was A2, namely that the slon can decide to
        do log shipping for any set, we'll need to have an option
        akin to one of the following:

        gen_logs_sets = 1,2,4    # Generate logs for sets 1,2,4
 
        or the "Java properties" style:

        gen_logs.set1.path = '/var/spool/slonylogs/somesystem/set1'
        gen_logs.set1.spoolname = '%s.log'
        gen_logs.set2.path = '/var/spool/slonylogs/somesystem/set2'
        gen_logs.set2.spoolname = '%s.log'
        gen_logs.set4.path = '/var/spool/slonylogs/somesystem/set4'
        gen_logs.set4.spoolname = '%s.log'

It looks to me like it is preferable to log at the origin, which means
that there's just one option, "gen_logs_at_origin", which would be set
to "t" on all the nodes that are intended as candidates for being
'master' nodes...
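
To make (c) and the options sketched in (e) a bit more concrete, a
slon config stanza might end up looking something like the following;
every option name here is hypothetical, just extending the sketches
above:

   # only meaningful on nodes that are candidates for being the origin
   gen_logs_at_origin = t
   gen_logs.set1.path = '/var/spool/slonylogs/somesystem/set1'
   gen_logs.set1.spoolname = 'slony1_log_%Y%m%d_%H%M%S.sql'

With a strftime()-style name string like that, a file written at
22:50:59 on 2004-10-20 would come out as
slony1_log_20041020_225059.sql; naming by SYNC ids instead would give
something more like slony1_log_sync_5017_5018.sql.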

Q5: What should the logs consist of?

  -> Should they simply consist of the updates on the tables Slony-I
     is to replicate?

  -> Should there also be some information stored concerning which
     SYNCs are processed?

  -> Would the log-shipped-subscribers also operate in a
     "mostly-read-only" mode as is the case for 'direct' subscribers?

  -> How much metadata should get added in?  E.g. - comments about
     SYNCs, when data was added in, data about events that aren't
     directly updating data

     It would be irritating for this to become overly voluminous, but
     having some metadata to search through would be handy (see the
     sketch just after this list)...

  -> Would it be a useful notion to try to make it possible for a
     resulting "node" to join a replication set?

I'm sure there are some "oughtn't try to do that" answers to be had
here, but we might as well get the questions out in the open.
-- 
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

