Mon Mar 24 08:57:36 PDT 2008
Update of /home/cvsd/slony1/slony1-engine/doc/adminguide
In directory main.slony.info:/tmp/cvs-serv20409

Modified Files:
	addthings.sgml bestpractices.sgml dropthings.sgml failover.sgml
	firstdb.sgml help.sgml maintenance.sgml monitoring.sgml
	reshape.sgml slony.sgml
Log Message:
Ensure that test_slony_schema scripts are prominently mentioned in the
admin guide as a "best practice."

Index: dropthings.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/dropthings.sgml,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** dropthings.sgml 5 Jan 2007 19:11:28 -0000 1.17
--- dropthings.sgml 24 Mar 2008 15:57:34 -0000 1.18
***************
*** 159,162 ****
--- 159,172 ----
  nodes.</para>
  </sect2>
+
+ <sect2> <title> Verifying Cluster Health </title>
+
+ <para> After performing any of these procedures, it is an excellent
+ idea to run the <filename>tools</filename> script &lteststate;, which
+ rummages through the state of the entire cluster, pointing out any
+ anomalies that it finds. This includes a variety of sorts of
+ communications problems.</para>
+
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

Index: reshape.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/reshape.sgml,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** reshape.sgml 22 Oct 2007 20:50:35 -0000 1.21
--- reshape.sgml 24 Mar 2008 15:57:34 -0000 1.22
***************
*** 40,43 ****
--- 40,48 ----
  about <xref linkend="stmtstorelisten">.</para></listitem>
+ <listitem><para> After performing the configuration change, you
+ should, as <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para> </listitem>
+
  </itemizedlist>
  </para>

Index: monitoring.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/monitoring.sgml,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** monitoring.sgml 25 Feb 2008 15:37:58 -0000 1.41
--- monitoring.sgml 24 Mar 2008 15:57:34 -0000 1.42
***************
*** 5,8 ****
--- 5,78 ----
  <indexterm><primary>monitoring &slony1;</primary></indexterm>
+ <sect2 id="testslonystate"> <title> test_slony_state</title>
+
+ <indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
+
+ <para> This invaluable script does various sorts of analysis of the
+ state of a &slony1; cluster. &slony1; <xref linkend="bestpractices">
+ recommend running these scripts frequently (hourly seems suitable) to
+ find problems as early as possible. </para>
+
+ <para> You specify arguments including <option>database</option>,
+ <option>host</option>, <option>user</option>,
+ <option>cluster</option>, <option>password</option>, and
+ <option>port</option> to connect to any of the nodes on a cluster.
+ You also specify a <option>mailprog</option> command (which should be
+ a program equivalent to <productname>Unix</productname>
+ <application>mailx</application>) and a recipient of email. </para>
+
+ <para> You may alternatively specify database connection parameters
+ via the environment variables used by
+ <application>libpq</application>, <emphasis>e.g.</emphasis> - using
+ <envar>PGPORT</envar>, <envar>PGDATABASE</envar>,
+ <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para>
+
+ <para> The script then rummages through <xref linkend="table.sl-path">
+ to find all of the nodes in the cluster, and the DSNs to allow it to,
+ in turn, connect to each of them.</para>
+
+ <para> For each node, the script examines the state of things,
+ including such things as:
+
+ <itemizedlist>
+ <listitem><para> Checking <xref linkend="table.sl-listen"> for some
+ <quote>analytically determinable</quote> problems. It lists paths
+ that are not covered.</para></listitem>
+
+ <listitem><para> Providing a summary of events by origin node</para>
+
+ <para> If a node hasn't submitted any events in a while, that likely
+ suggests a problem.</para></listitem>
+
+ <listitem><para> Summarizes the <quote>aging</quote> of table <xref
+ linkend="table.sl-confirm"> </para>
+
+ <para> If one or another of the nodes in the cluster hasn't reported
+ back recently, that tends to lead to cleanups of tables like &sllog1;,
+ &sllog2; and &slseqlog; not taking place.</para></listitem>
+
+ <listitem><para> Summarizes what transactions have been running for a
+ long time</para>
+
+ <para> This only works properly if the statistics collector is
+ configured to collect command strings, as controlled by the option
+ <option> stats_command_string = true </option> in <filename>
+ postgresql.conf </filename>.</para>
+
+ <para> If you have broken applications that hold connections open,
+ this will find them.</para>
+
+ <para> If you have broken applications that hold connections open,
+ that has several unsalutory effects as <link
+ linkend="longtxnsareevil"> described in the
+ FAQ</link>.</para></listitem>
+
+ </itemizedlist></para>
+
+ <para> The script does some diagnosis work based on parameters in the
+ script; if you don't like the values, pick your favorites!</para>
+
+ </sect2>
+
  <sect2> <title> &nagios; Replication Checks </title>
***************
*** 132,203 ****
  </sect2>
- <sect2 id="testslonystate"> <title> test_slony_state</title>
-
- <indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
-
- <para> This script does various sorts of analysis of the state of a
- &slony1; cluster.</para>
-
- <para> You specify arguments including <option>database</option>,
- <option>host</option>, <option>user</option>,
- <option>cluster</option>, <option>password</option>, and
- <option>port</option> to connect to any of the nodes on a cluster.
- You also specify a <option>mailprog</option> command (which should be
- a program equivalent to <productname>Unix</productname>
- <application>mailx</application>) and a recipient of email. </para>
-
- <para> You may alternatively specify database connection parameters
- via the environment variables used by
- <application>libpq</application>, <emphasis>e.g.</emphasis> - using
- <envar>PGPORT</envar>, <envar>PGDATABASE</envar>,
- <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para>
-
- <para> The script then rummages through <xref linkend="table.sl-path">
- to find all of the nodes in the cluster, and the DSNs to allow it to,
- in turn, connect to each of them.</para>
-
- <para> For each node, the script examines the state of things,
- including such things as:
-
- <itemizedlist>
- <listitem><para> Checking <xref linkend="table.sl-listen"> for some
- <quote>analytically determinable</quote> problems. It lists paths
- that are not covered.</para></listitem>
-
- <listitem><para> Providing a summary of events by origin node</para>
-
- <para> If a node hasn't submitted any events in a while, that likely
- suggests a problem.</para></listitem>
-
- <listitem><para> Summarizes the <quote>aging</quote> of table <xref
- linkend="table.sl-confirm"> </para>
-
- <para> If one or another of the nodes in the cluster hasn't reported
- back recently, that tends to lead to cleanups of tables like &sllog1;,
- &sllog2; and &slseqlog; not taking place.</para></listitem>
-
- <listitem><para> Summarizes what transactions have been running for a
- long time</para>
-
- <para> This only works properly if the statistics collector is
- configured to collect command strings, as controlled by the option
- <option> stats_command_string = true </option> in <filename>
- postgresql.conf </filename>.</para>
-
- <para> If you have broken applications that hold connections open,
- this will find them.</para>
-
- <para> If you have broken applications that hold connections open,
- that has several unsalutory effects as <link
- linkend="longtxnsareevil"> described in the
- FAQ</link>.</para></listitem>
-
- </itemizedlist></para>
-
- <para> The script does some diagnosis work based on parameters in the
- script; if you don't like the values, pick your favorites!</para>
-
- </sect2>
-
  <sect2 id="search-logs"> <title> <command>search-logs.sh</command> </title>
--- 202,205 ----

Index: bestpractices.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/bestpractices.sgml,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** bestpractices.sgml 11 Mar 2008 15:56:00 -0000 1.32
--- bestpractices.sgml 24 Mar 2008 15:57:34 -0000 1.33
***************
*** 447,452 ****
  </listitem>
! <listitem><para> Use <filename>test_slony_state.pl</filename> to look
! for configuration problems.</para>
  <para>This is a Perl script which connects to a &slony1; node and then
--- 447,452 ----
  </listitem>
! <listitem><para> Run &lteststate; frequently to discover configuration
! problems as early as possible.</para>
  <para>This is a Perl script which connects to a &slony1; node and then

Index: failover.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/failover.sgml,v
retrieving revision 1.26
retrieving revision 1.27
diff -C2 -d -r1.26 -r1.27
*** failover.sgml 27 Feb 2008 19:37:03 -0000 1.26
--- failover.sgml 24 Mar 2008 15:57:34 -0000 1.27
***************
*** 133,136 ****
--- 133,141 ----
  be any loss of data.</para>
+ <para> After performing the configuration change, you should, as <xref
+ linkend="bestpractices">, run the &lteststate; scripts in order to
+ validate that the cluster state remains in good order after this
+ change. </para>
+
  </sect2>
  <sect2><title> Failover</title>
***************
*** 226,229 ****
--- 231,240 ----
  </listitem>
+
+ <listitem> <para> After performing the configuration change, you
+ should, as <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para> </listitem>
+
  </itemizedlist>

Index: slony.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slony.sgml,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** slony.sgml 25 Feb 2008 15:37:58 -0000 1.40
--- slony.sgml 24 Mar 2008 15:57:34 -0000 1.41
***************
*** 52,55 ****
--- 52,56 ----
  <!ENTITY lslon "<xref linkend=slon>">
  <!ENTITY lslonik "<xref linkend=slonik>">
+ <!ENTITY lteststate "<xref linkend=testslonystate>">
  ]>

Index: help.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/help.sgml,v
retrieving revision 1.20
retrieving revision 1.21
diff -C2 -d -r1.20 -r1.21
*** help.sgml 27 Nov 2006 17:27:42 -0000 1.20
--- help.sgml 24 Mar 2008 15:57:34 -0000 1.21
***************
*** 11,17 ****
  <listitem><para> Before submitting questions to any public forum as to
  why <quote>something mysterious</quote> has happened to your
! replication cluster, please run the <xref linkend="testslonystate">
! tool. It may give some clues as to what is wrong, and the results are
! likely to be of some assistance in analyzing the problem. </para>
  </listitem>
--- 11,18 ----
  <listitem><para> Before submitting questions to any public forum as to
  why <quote>something mysterious</quote> has happened to your
! replication cluster, be sure to run the &lteststate; tool and be
! prepared to provide its output. It may give some clues as to what is
! wrong, and the results are likely to be of some assistance in
! analyzing the problem. </para>
  </listitem>

Index: firstdb.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/firstdb.sgml,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** firstdb.sgml 22 Jun 2007 16:15:57 -0000 1.25
--- firstdb.sgml 24 Mar 2008 15:57:34 -0000 1.26
***************
*** 322,331 ****
  the database. When the copy process is finished, the replication
  daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying
! the accumulated replication log. It will do this in little steps, 10
! seconds worth of application work at a time. Depending on the
! performance of the two systems involved, the sizing of the two
! databases, the actual transaction load and how well the two databases
! are tuned and maintained, this catchup process can be a matter of
! minutes, hours, or eons.</para>
  <para>You have now successfully set up your first basic master/slave
--- 322,337 ----
  the database. When the copy process is finished, the replication
  daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying
! the accumulated replication log. It will do this in little steps,
! initially doing about 10 seconds worth of application work at a time.
! Depending on the performance of the two systems involved, the sizing
! of the two databases, the actual transaction load and how well the two
! databases are tuned and maintained, this catchup process may be a
! matter of minutes, hours, or eons.</para>
!
! <para> If you encounter problems getting this working, check over the
! logs for the &lslon; processes, as error messages are likely to be
! suggestive of the nature of the problem. The tool &lteststate; is
! also useful for diagnosing problems with nearly-functioning
! replication clusters.</para>
  <para>You have now successfully set up your first basic master/slave
***************
*** 380,386 ****
  <filename>slony-I-basic-mstr-slv.txt</filename>.</para>
! <para>If this script returns <command>FAILED</command> please contact the
! developers at <ulink url="http://slony.info/">
! http://slony.info/</ulink></para></sect3>
  </sect2>
  </sect1>
--- 386,394 ----
  <filename>slony-I-basic-mstr-slv.txt</filename>.</para>
! <para>If this script returns <command>FAILED</command> please contact
! the developers at <ulink url="http://slony.info/">
! http://slony.info/</ulink>. Be sure to be prepared with useful
! diagnostic information including the logs generated by &lslon;
! processes and the output of &lteststate;. </para></sect3>
  </sect2>
  </sect1>

Index: maintenance.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/maintenance.sgml,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** maintenance.sgml 25 Feb 2008 15:37:58 -0000 1.29
--- maintenance.sgml 24 Mar 2008 15:57:34 -0000 1.30
***************
*** 180,187 ****
  <indexterm><primary>testing cluster status</primary></indexterm>
! <para> In the <filename>tools</filename> directory, you may find
! scripts called <filename>test_slony_state.pl</filename> and
! <filename>test_slony_state-dbi.pl</filename>. One uses the Perl/DBI
! interface; the other uses the Pg interface.
  </para>
--- 180,187 ----
  <indexterm><primary>testing cluster status</primary></indexterm>
! <para> In the <filename>tools</filename> directory, you will find
! &lteststate; scripts called <filename>test_slony_state.pl</filename>
! and <filename>test_slony_state-dbi.pl</filename>. One uses the
! Perl/DBI interface; the other uses the Pg interface.
  </para>
***************
*** 189,195 ****
  &slony1; node (you can pick any one), and from that, determine all the
  nodes in the cluster. They then run a series of queries (read only,
! so this should be quite safe to run) which look at the various
! &slony1; tables, looking for a variety of sorts of conditions
! suggestive of problems, including:
  </para>
--- 189,195 ----
  &slony1; node (you can pick any one), and from that, determine all the
  nodes in the cluster. They then run a series of queries (read only,
! so this should be quite safe to run) which examine various &slony1;
! tables, looking for a variety of sorts of conditions suggestive of
! problems, including:
  </para>

Index: addthings.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/addthings.sgml,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** addthings.sgml 11 Jun 2007 16:02:50 -0000 1.29
--- addthings.sgml 24 Mar 2008 15:57:34 -0000 1.30
***************
*** 296,304 ****
  </para></listitem>
! <listitem><para> At this point, it is an excellent idea to run
! the <filename>tools</filename>
! script <command>test_slony_state-dbi.pl</command>, which rummages
! through the state of the entire cluster, pointing out any anomalies
! that it finds. This includes a variety of sorts of communications
  problems.</para>
  </listitem>
--- 296,303 ----
  </para></listitem>
! <listitem><para> At this point, it is an excellent idea to run the
! <filename>tools</filename> script &lteststate;, which rummages through
! the state of the entire cluster, pointing out any anomalies that it
! finds. This includes a variety of sorts of communications
  problems.</para>
  </listitem>
***************
*** 353,361 ****
  originates a replication set.</para> </listitem>
! <listitem><para> Run the <filename>tools</filename>
! script <command>test_slony_state-dbi.pl</command>, which rummages
! through the state of the entire cluster, pointing out any anomalies
! that it notices, as well as some information on the status of each
! node. </para>
  </listitem>
  </itemizedlist>
--- 352,359 ----
  originates a replication set.</para> </listitem>
! <listitem><para> Run the <filename>tools</filename> script
! &lteststate;, which rummages through the state of the entire cluster,
! pointing out any anomalies that it notices, as well as some
! information on the status of each node. </para>
  </listitem>
  </itemizedlist>
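
Illustration only, not part of the commit: the monitoring.sgml text in this diff says the script summarizes events by origin node, and that a node which has not submitted any events in a while likely has a problem. The sketch below shows one way such a staleness check could look. It is not the script's actual Perl code; the function name, the 30-minute threshold, and the in-memory sample data are all hypothetical, and a real check would read the timestamps from the cluster's tables.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real script keeps its own tunable parameters.
STALE_AFTER = timedelta(minutes=30)


def find_stale_origins(last_event_by_origin, now, stale_after=STALE_AFTER):
    """Return (node_id, age) pairs for origins whose newest event is too old.

    last_event_by_origin maps a node id to the timestamp of the most recent
    event that node originated (assumed to be gathered elsewhere).
    """
    stale = []
    for node_id, last_event in sorted(last_event_by_origin.items()):
        age = now - last_event
        if age > stale_after:
            stale.append((node_id, age))
    return stale


if __name__ == "__main__":
    # Made-up sample data: node 1 is current, node 2 has been silent for 2 hours.
    now = datetime(2008, 3, 24, 9, 0, 0)
    sample = {
        1: now - timedelta(seconds=20),
        2: now - timedelta(hours=2),
    }
    for node_id, age in find_stale_origins(sample, now):
        print(f"node {node_id}: no events for {age}")
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic, which makes it easier to exercise against fixed sample data.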