Chris Browne cbbrowne at lists.slony.info
Fri Jan 16 09:16:55 PST 2009
Update of /home/cvsd/slony1/slony1-engine/doc/adminguide
In directory main.slony.info:/tmp/cvs-serv32389

Modified Files:
	complexenv.dia complexenv.png failover.sgml slonik_ref.sgml 
Added Files:
	complexfail.dia complexfail.png 
Log Message:
Add in documentation about complex failover scenarios, e.g. - the handling
of failover where a whole site is lost.


--- NEW FILE: complexfail.dia ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: complexfail.png ---
(This appears to be a binary file; contents omitted.)

Index: complexenv.png
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/complexenv.png,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
Binary files /tmp/cvsohDcSc and /tmp/cvsEIGX36 differ

Index: slonik_ref.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonik_ref.sgml,v
retrieving revision 1.92
retrieving revision 1.93
diff -C2 -d -r1.92 -r1.93
*** slonik_ref.sgml	17 Nov 2008 22:41:21 -0000	1.92
--- slonik_ref.sgml	16 Jan 2009 17:16:52 -0000	1.93
***************
*** 2576,2579 ****
--- 2576,2588 ----
      <emphasis>not</emphasis> abandon the failed node.
      </para>
+ 
+     <para> If there are many nodes in a cluster, and failover includes
+     dropping out additional nodes (<emphasis>e.g.</emphasis> when it
+     is necessary to treat <emphasis>all</emphasis> nodes at a site
+     including an origin as well as subscribers as failed), it is
+     necessary to carefully sequence the actions, as described in <xref
+     linkend="complexfailover">.
+     </para>
+ 
     </refsect1>
     <refsect1> <title> Version Information </title>

Index: complexenv.dia
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/complexenv.dia,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
Binary files /tmp/cvskFeegf and /tmp/cvs6JhGu9 differ

Index: failover.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/failover.sgml,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** failover.sgml	13 Oct 2008 19:29:12 -0000	1.28
--- failover.sgml	16 Jan 2009 17:16:52 -0000	1.29
***************
*** 241,244 ****
--- 241,319 ----
  </sect2>
  
+ <sect2 id="complexfailover"> <title> Failover With Complex Node Set </title>
+ 
+ <para> Failover is relatively <quote/simple/ if there are only two
+ nodes; if a &slony1; cluster comprises many nodes, achieving a clean
+ failover requires careful planning and execution. </para>
+ 
+ <para> Consider the following diagram describing a set of six nodes at two sites.
+ 
+ <inlinemediaobject> <imageobject> <imagedata fileref="complexenv.png">
+ </imageobject> <textobject> <phrase> Symmetric Multisites </phrase>
+ </textobject> </inlinemediaobject></para>
+ 
+ <para> Let us assume that nodes 1, 2, and 3 reside at one data
+ centre, and that we find ourselves needing to perform failover due to
+ failure of that entire site.  Causes could range from a persistent
+ loss of communications to the physical destruction of the site; the
+ precise cause is unimportant; what matters is how to get &slony1;
+ to fail over properly to the new site.</para>
+ 
+ <para> We will further assume that node 5 is to be the new origin,
+ after failover. </para>
+ 
+ <para> The sequence of &slony1; reconfiguration steps required to
+ fail over this sort of node configuration properly is as follows:
+ </para>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> Resubscribe (using <xref linkend="stmtsubscribeset">)
+ each node that is to remain in the reformed cluster and is not
+ already subscribed to the intended data provider.  </para>
+ 
+ <para> In the example cluster, this means we would likely wish to
+ resubscribe nodes 4 and 6 to both point to node 5.</para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    subscribe set (id = 1, provider = 5, receiver = 4);
+    subscribe set (id = 1, provider = 5, receiver = 6);
+ </programlisting>
+ 
+ </listitem>
+ <listitem><para> Drop all failed nodes other than the failed origin, starting with the leaf nodes.</para>
+ 
+ <para> Since nodes 1, 2, and 3 are inaccessible, we must indicate the
+ <envar>EVENT NODE</envar> so that the event reaches the still-live
+ portions of the cluster. </para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    drop node (id=2, event node = 4);
+    drop node (id=3, event node = 4);
+ </programlisting>
+ 
+ </listitem>
+ 
+ <listitem><para> Now, run <command>FAILOVER</command>.</para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    failover (id = 1, backup node = 5);
+ </programlisting>
+ 
+ </listitem>
+ 
+ <listitem><para> Finally, drop the former origin from the cluster.</para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    drop node (id=1, event node = 4);
+ </programlisting>
+ </listitem>
+ 
+ </itemizedlist>
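+ 
+ <para> Each of the <application>slonik</application> scripts above
+ begins by including a common preamble file.  As a sketch (the cluster
+ name and conninfo values shown here are illustrative assumptions, not
+ part of the examples above),
+ <filename>/tmp/failover-preamble.slonik</filename> might contain the
+ cluster name and admin conninfo entries for the surviving nodes: </para>
+ 
+ <programlisting>
+    cluster name = testcluster;
+    node 4 admin conninfo = 'dbname=mydb host=host4 user=slony';
+    node 5 admin conninfo = 'dbname=mydb host=host5 user=slony';
+    node 6 admin conninfo = 'dbname=mydb host=host6 user=slony';
+ </programlisting>
+ 
+ <para> Note that only the nodes that <application>slonik</application>
+ actually connects to (here, the surviving nodes) need admin conninfo
+ entries; the failed nodes cannot be reached in any case. </para>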
+ 
  <sect2><title> Automating <command> FAIL OVER </command> </title>
  


