Chris Browne cbbrowne at lists.slony.info
Mon Mar 24 08:57:36 PDT 2008
Update of /home/cvsd/slony1/slony1-engine/doc/adminguide
In directory main.slony.info:/tmp/cvs-serv20409

Modified Files:
	addthings.sgml bestpractices.sgml dropthings.sgml 
	failover.sgml firstdb.sgml help.sgml maintenance.sgml 
	monitoring.sgml reshape.sgml slony.sgml 
Log Message:
Ensure that test_slony_state scripts are prominently mentioned in the
admin guide as a "best practice."


Index: dropthings.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/dropthings.sgml,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** dropthings.sgml	5 Jan 2007 19:11:28 -0000	1.17
--- dropthings.sgml	24 Mar 2008 15:57:34 -0000	1.18
***************
*** 159,162 ****
--- 159,172 ----
  nodes.</para>
  </sect2>
+ 
+ <sect2> <title> Verifying Cluster Health </title>
+ 
+ <para> After performing any of these procedures, it is an excellent
+ idea to run the <filename>tools</filename> script &lteststate;, which
+ rummages through the state of the entire cluster, pointing out any
+ anomalies that it finds.  This includes a variety of sorts of
+ communications problems.</para>
+ 
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file
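[Editor's note: as a concrete illustration of the advice added above, a run of
the state-test script immediately after dropping a set, table, or node might
look roughly like the sketch below. The option names mirror those documented
in monitoring.sgml but are assumptions here; consult the script's own usage
message for the syntax your version actually accepts.]

    # Minimal sketch: verify cluster health right after a drop operation.
    # Flag names and connection values are illustrative assumptions.
    cd tools
    perl test_slony_state-dbi.pl --host=node1 --port=5432 \
         --database=mydb --user=slony --cluster=my_cluster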

Index: reshape.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/reshape.sgml,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** reshape.sgml	22 Oct 2007 20:50:35 -0000	1.21
--- reshape.sgml	24 Mar 2008 15:57:34 -0000	1.22
***************
*** 40,43 ****
--- 40,48 ----
  about <xref linkend="stmtstorelisten">.</para></listitem>
  
+ <listitem><para> After performing the configuration change, you
+ should, as suggested in <xref linkend="bestpractices">, run the
+ &lteststate; scripts in order to validate that the cluster state
+ remains in good order after this change. </para> </listitem>
+ 
  </itemizedlist>
  </para>

Index: monitoring.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/monitoring.sgml,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** monitoring.sgml	25 Feb 2008 15:37:58 -0000	1.41
--- monitoring.sgml	24 Mar 2008 15:57:34 -0000	1.42
***************
*** 5,8 ****
--- 5,78 ----
  <indexterm><primary>monitoring &slony1;</primary></indexterm>
  
+ <sect2 id="testslonystate"> <title> test_slony_state</title>
+ 
+ <indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
+ 
+ <para> This invaluable script does various sorts of analysis of the
+ state of a &slony1; cluster.  &slony1; <xref linkend="bestpractices">
+ recommend running these scripts frequently (hourly seems suitable) to
+ find problems as early as possible.  </para>
+ 
+ <para> You specify arguments including <option>database</option>,
+ <option>host</option>, <option>user</option>,
+ <option>cluster</option>, <option>password</option>, and
+ <option>port</option> to connect to any of the nodes on a cluster.
+ You also specify a <option>mailprog</option> command (which should be
+ a program equivalent to <productname>Unix</productname>
+ <application>mailx</application>) and a recipient of email. </para>
+ 
+ <para> You may alternatively specify database connection parameters
+ via the environment variables used by
+ <application>libpq</application>, <emphasis>e.g.</emphasis> - using
+ <envar>PGPORT</envar>, <envar>PGDATABASE</envar>,
+ <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para>
+ 
+ <para> The script then rummages through <xref linkend="table.sl-path">
+ to find all of the nodes in the cluster, and the DSNs to allow it to,
+ in turn, connect to each of them.</para>
+ 
+ <para> For each node, the script examines the state of things,
+ including:
+ 
+ <itemizedlist>
+ <listitem><para> Checking <xref linkend="table.sl-listen"> for some
+ <quote>analytically determinable</quote> problems.  It lists paths
+ that are not covered.</para></listitem>
+ 
+ <listitem><para> Providing a summary of events by origin node</para>
+ 
+ <para> If a node hasn't submitted any events in a while, that likely
+ suggests a problem.</para></listitem>
+ 
+ <listitem><para> Summarizing the <quote>aging</quote> of table <xref
+ linkend="table.sl-confirm"> </para>
+ 
+ <para> If one or another of the nodes in the cluster hasn't reported
+ back recently, that tends to lead to cleanups of tables like &sllog1;,
+ &sllog2; and &slseqlog; not taking place.</para></listitem>
+ 
+ <listitem><para> Summarizing what transactions have been running for
+ a long time</para>
+ 
+ <para> This only works properly if the statistics collector is
+ configured to collect command strings, as controlled by the option
+ <option> stats_command_string = true </option> in <filename>
+ postgresql.conf </filename>.</para>
+ 
+ <para> If you have broken applications that hold connections open,
+ this will find them.</para>
+ 
+ <para> Having applications hold connections open in this way has
+ several unsalutary effects, as <link
+ linkend="longtxnsareevil"> described in the
+ FAQ</link>.</para></listitem>
+ 
+ </itemizedlist></para>
+ 
+ <para> The script does some diagnosis work based on parameters in the
+ script; if you don't like the values, pick your favorites!</para>
+ 
+ </sect2>
+ 
  <sect2> <title> &nagios; Replication Checks </title>
  
***************
*** 132,203 ****
  </sect2>
  
- <sect2 id="testslonystate"> <title> test_slony_state</title>
- 
- <indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
- 
- <para> This script does various sorts of analysis of the state of a
- &slony1; cluster.</para>
- 
- <para> You specify arguments including <option>database</option>,
- <option>host</option>, <option>user</option>,
- <option>cluster</option>, <option>password</option>, and
- <option>port</option> to connect to any of the nodes on a cluster.
- You also specify a <option>mailprog</option> command (which should be
- a program equivalent to <productname>Unix</productname>
- <application>mailx</application>) and a recipient of email. </para>
- 
- <para> You may alternatively specify database connection parameters
- via the environment variables used by
- <application>libpq</application>, <emphasis>e.g.</emphasis> - using
- <envar>PGPORT</envar>, <envar>PGDATABASE</envar>,
- <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para>
- 
- <para> The script then rummages through <xref linkend="table.sl-path">
- to find all of the nodes in the cluster, and the DSNs to allow it to,
- in turn, connect to each of them.</para>
- 
- <para> For each node, the script examines the state of things,
- including such things as:
- 
- <itemizedlist>
- <listitem><para> Checking <xref linkend="table.sl-listen"> for some
- <quote>analytically determinable</quote> problems.  It lists paths
- that are not covered.</para></listitem>
- 
- <listitem><para> Providing a summary of events by origin node</para>
- 
- <para> If a node hasn't submitted any events in a while, that likely
- suggests a problem.</para></listitem>
- 
- <listitem><para> Summarizes the <quote>aging</quote> of table <xref
- linkend="table.sl-confirm"> </para>
- 
- <para> If one or another of the nodes in the cluster hasn't reported
- back recently, that tends to lead to cleanups of tables like &sllog1;,
- &sllog2; and &slseqlog; not taking place.</para></listitem>
- 
- <listitem><para> Summarizes what transactions have been running for a
- long time</para>
- 
- <para> This only works properly if the statistics collector is
- configured to collect command strings, as controlled by the option
- <option> stats_command_string = true </option> in <filename>
- postgresql.conf </filename>.</para>
- 
- <para> If you have broken applications that hold connections open,
- this will find them.</para>
- 
- <para> If you have broken applications that hold connections open,
- that has several unsalutory effects as <link
- linkend="longtxnsareevil"> described in the
- FAQ</link>.</para></listitem>
- 
- </itemizedlist></para>
- 
- <para> The script does some diagnosis work based on parameters in the
- script; if you don't like the values, pick your favorites!</para>
- 
- </sect2>
- 
  <sect2 id="search-logs"> <title> <command>search-logs.sh</command> </title>
  
--- 202,205 ----
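[Editor's note: the paragraphs added above describe two ways of telling the
script where to connect -- explicit options, or the standard libpq environment
variables. A hedged sketch of both styles follows, with hypothetical host,
database, cluster, and recipient names; the exact option spellings should be
checked against the script itself.]

    # Style 1: explicit connection options plus email notification.
    # Option names are assumed from the documentation above.
    perl tools/test_slony_state-dbi.pl \
         --host=node1.example.com --port=5432 \
         --database=payroll --user=slony --cluster=payroll_cluster \
         --mailprog=/usr/bin/mailx --recipient=dba@example.com

    # Style 2: let the libpq environment variables supply the
    # connection parameters instead.
    export PGHOST=node1.example.com PGPORT=5432 PGDATABASE=payroll PGUSER=slony
    perl tools/test_slony_state.pl --cluster=payroll_cluster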

Index: bestpractices.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/bestpractices.sgml,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** bestpractices.sgml	11 Mar 2008 15:56:00 -0000	1.32
--- bestpractices.sgml	24 Mar 2008 15:57:34 -0000	1.33
***************
*** 447,452 ****
  </listitem>
  
! <listitem><para> Use <filename>test_slony_state.pl</filename> to look
! for configuration problems.</para>
  
  <para>This is a Perl script which connects to a &slony1; node and then
--- 447,452 ----
  </listitem>
  
! <listitem><para> Run &lteststate; frequently to discover configuration
! problems as early as possible.</para>
  
  <para>This is a Perl script which connects to a &slony1; node and then
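[Editor's note: since the monitoring chapter suggests that hourly runs are
suitable, the revised best-practice item above amounts in practice to
something like the crontab entry below. Every path, connection value, and
option name here is an illustrative assumption; adjust them to your
installation.]

    # Hourly cluster health check, mailing any anomalies to the DBA.
    # (One crontab line; paths and option names are illustrative assumptions.)
    5 * * * * perl /usr/local/slony1/tools/test_slony_state-dbi.pl --cluster=payroll_cluster --host=node1.example.com --database=payroll --user=slony --mailprog=/usr/bin/mailx --recipient=dba@example.com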

Index: failover.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/failover.sgml,v
retrieving revision 1.26
retrieving revision 1.27
diff -C2 -d -r1.26 -r1.27
*** failover.sgml	27 Feb 2008 19:37:03 -0000	1.26
--- failover.sgml	24 Mar 2008 15:57:34 -0000	1.27
***************
*** 133,136 ****
--- 133,141 ----
  be any loss of data.</para>
  
+ <para> After performing the configuration change, you should, as
+ suggested in <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para>
+ 
  </sect2>
  <sect2><title> Failover</title>
***************
*** 226,229 ****
--- 231,240 ----
  
  </listitem>
+ 
+ <listitem> <para> After performing the configuration change, you
+ should, as suggested in <xref linkend="bestpractices">, run the
+ &lteststate; scripts in order to validate that the cluster state
+ remains in good order after this change. </para> </listitem>
+ 
  </itemizedlist>
  

Index: slony.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slony.sgml,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** slony.sgml	25 Feb 2008 15:37:58 -0000	1.40
--- slony.sgml	24 Mar 2008 15:57:34 -0000	1.41
***************
*** 52,55 ****
--- 52,56 ----
  <!ENTITY lslon "<xref linkend=slon>">
  <!ENTITY lslonik "<xref linkend=slonik>">
+ <!ENTITY lteststate "<xref linkend=testslonystate>">
  
  ]>

Index: help.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/help.sgml,v
retrieving revision 1.20
retrieving revision 1.21
diff -C2 -d -r1.20 -r1.21
*** help.sgml	27 Nov 2006 17:27:42 -0000	1.20
--- help.sgml	24 Mar 2008 15:57:34 -0000	1.21
***************
*** 11,17 ****
  <listitem><para> Before submitting questions to any public forum as to
  why <quote>something mysterious</quote> has happened to your
! replication cluster, please run the <xref linkend="testslonystate">
! tool.  It may give some clues as to what is wrong, and the results are
! likely to be of some assistance in analyzing the problem. </para>
  </listitem>
  
--- 11,18 ----
  <listitem><para> Before submitting questions to any public forum as to
  why <quote>something mysterious</quote> has happened to your
! replication cluster, be sure to run the &lteststate; tool and be
! prepared to provide its output.  It may give some clues as to what is
! wrong, and the results are likely to be of some assistance in
! analyzing the problem. </para>
  </listitem>
  

Index: firstdb.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/firstdb.sgml,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** firstdb.sgml	22 Jun 2007 16:15:57 -0000	1.25
--- firstdb.sgml	24 Mar 2008 15:57:34 -0000	1.26
***************
*** 322,331 ****
  the database.  When the copy process is finished, the replication
  daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying
! the accumulated replication log.  It will do this in little steps, 10
! seconds worth of application work at a time.  Depending on the
! performance of the two systems involved, the sizing of the two
! databases, the actual transaction load and how well the two databases
! are tuned and maintained, this catchup process can be a matter of
! minutes, hours, or eons.</para>
  
  <para>You have now successfully set up your first basic master/slave
--- 322,337 ----
  the database.  When the copy process is finished, the replication
  daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying
! the accumulated replication log.  It will do this in little steps,
! initially doing about 10 seconds worth of application work at a time.
! Depending on the performance of the two systems involved, the sizing
! of the two databases, the actual transaction load and how well the two
! databases are tuned and maintained, this catchup process may be a
! matter of minutes, hours, or eons.</para>
! 
! <para> If you encounter problems getting this working, check over the
! logs for the &lslon; processes, as error messages are likely to be
! suggestive of the nature of the problem.  The tool &lteststate; is
! also useful for diagnosing problems with nearly-functioning
! replication clusters.</para>
  
  <para>You have now successfully set up your first basic master/slave
***************
*** 380,386 ****
  <filename>slony-I-basic-mstr-slv.txt</filename>.</para>
  
! <para>If this script returns <command>FAILED</command> please contact the
! developers at <ulink url="http://slony.info/">
! http://slony.info/</ulink></para></sect3>
  </sect2>
  </sect1>
--- 386,394 ----
  <filename>slony-I-basic-mstr-slv.txt</filename>.</para>
  
! <para>If this script returns <command>FAILED</command>, please contact
! the developers at <ulink url="http://slony.info/">
! http://slony.info/</ulink>.  Be prepared to provide useful diagnostic
! information, including the logs generated by the &lslon; processes
! and the output of &lteststate;. </para></sect3>
  </sect2>
  </sect1>
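[Editor's note: when catch-up appears stuck, the new paragraph above suggests
looking at the slon logs first. Assuming each slon was started with its output
redirected to a per-node log file (the path below is purely a placeholder), a
quick scan might be:]

    # Scan the slon log for recent errors; the log path is an assumption --
    # use whatever location your slon startup scripts actually write to.
    grep -iE 'ERROR|WARN|FATAL' /var/log/slony1/slon_node2.log | tail -n 20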

Index: maintenance.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/maintenance.sgml,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** maintenance.sgml	25 Feb 2008 15:37:58 -0000	1.29
--- maintenance.sgml	24 Mar 2008 15:57:34 -0000	1.30
***************
*** 180,187 ****
  <indexterm><primary>testing cluster status</primary></indexterm>
  
! <para> In the <filename>tools</filename> directory, you may find
! scripts called <filename>test_slony_state.pl</filename> and
! <filename>test_slony_state-dbi.pl</filename>.  One uses the Perl/DBI
! interface; the other uses the Pg interface.
  </para>
  
--- 180,187 ----
  <indexterm><primary>testing cluster status</primary></indexterm>
  
! <para> In the <filename>tools</filename> directory, you will find
! &lteststate; scripts called <filename>test_slony_state.pl</filename>
! and <filename>test_slony_state-dbi.pl</filename>.  One uses the
! Perl/DBI interface; the other uses the Pg interface.
  </para>
  
***************
*** 189,195 ****
  &slony1; node (you can pick any one), and from that, determine all the
  nodes in the cluster.  They then run a series of queries (read only,
! so this should be quite safe to run) which look at the various
! &slony1; tables, looking for a variety of sorts of conditions
! suggestive of problems, including:
  </para>
  
--- 189,195 ----
  &slony1; node (you can pick any one), and from that, determine all the
  nodes in the cluster.  They then run a series of queries (read only,
! so this should be quite safe to run) which examine various &slony1;
! tables, looking for a variety of sorts of conditions suggestive of
! problems, including:
  </para>
  

Index: addthings.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/addthings.sgml,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** addthings.sgml	11 Jun 2007 16:02:50 -0000	1.29
--- addthings.sgml	24 Mar 2008 15:57:34 -0000	1.30
***************
*** 296,304 ****
  </para></listitem>
  
! <listitem><para> At this point, it is an excellent idea to run
! the <filename>tools</filename>
! script <command>test_slony_state-dbi.pl</command>, which rummages
! through the state of the entire cluster, pointing out any anomalies
! that it finds.  This includes a variety of sorts of communications
  problems.</para> </listitem>
  
--- 296,303 ----
  </para></listitem>
  
! <listitem><para> At this point, it is an excellent idea to run the
! <filename>tools</filename> script &lteststate;, which rummages through
! the state of the entire cluster, pointing out any anomalies that it
! finds.  This includes a variety of sorts of communications
  problems.</para> </listitem>
  
***************
*** 353,361 ****
  originates a replication set.</para> </listitem>
  
! <listitem><para> Run the <filename>tools</filename>
! script <command>test_slony_state-dbi.pl</command>, which rummages
! through the state of the entire cluster, pointing out any anomalies
! that it notices, as well as some information on the status of each
! node. </para> </listitem>
  
  </itemizedlist>
--- 352,359 ----
  originates a replication set.</para> </listitem>
  
! <listitem><para> Run the <filename>tools</filename> script
! &lteststate;, which rummages through the state of the entire cluster,
! pointing out any anomalies that it notices, as well as some
! information on the status of each node. </para> </listitem>
  
  </itemizedlist>


