CVS User Account cvsuser
Sat Jul 29 04:30:07 PDT 2006
Log Message:
-----------
Reorganized log documentation, and linked it into various places that
should point to it.

Modified Files:
--------------
    slony1-engine/doc/adminguide:
        addthings.sgml (r1.17 -> r1.18)
        ddlchanges.sgml (r1.24 -> r1.25)
        filelist.sgml (r1.16 -> r1.17)
        logshipping.sgml (r1.14 -> r1.15)
        monitoring.sgml (r1.27 -> r1.28)
        slony.sgml (r1.32 -> r1.33)

-------------- next part --------------
Index: addthings.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/addthings.sgml,v
retrieving revision 1.17
retrieving revision 1.18
diff -Ldoc/adminguide/addthings.sgml -Ldoc/adminguide/addthings.sgml -u -w -r1.17 -r1.18
--- doc/adminguide/addthings.sgml
+++ doc/adminguide/addthings.sgml
@@ -39,9 +39,9 @@
 things.  For instance, submitting multiple subscription requests for a
 particular set in one <xref linkend="slonik"> script often turns out
 quite badly.  If it is <emphasis>truly</emphasis> necessary to
-automate this, you'll probably want to submit <xref
-linkend="stmtwaitevent"> requests in between subscription requests in
-order that the <xref linkend="slonik"> script wait for one
+automate this, you'll probably want to
+submit <xref linkend="stmtwaitevent"> requests in between subscription
+requests in order that the <xref linkend="slonik"> script wait for one
 subscription to complete processing before requesting the next
 one.</para>
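+
+<para> As an illustrative sketch only (the cluster name, conninfo
+strings, and ID numbers are placeholders, not values drawn from this
+documentation), such a script might interleave the requests like
+this:
+
+<screen>
+cluster name = examplecluster;
+node 1 admin conninfo = 'dbname=mydb host=origin   user=slony';
+node 2 admin conninfo = 'dbname=mydb host=replica1 user=slony';
+node 3 admin conninfo = 'dbname=mydb host=replica2 user=slony';
+
+subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
+wait for event (origin = 1, confirmed = 2, wait on = 1);
+subscribe set (id = 1, provider = 2, receiver = 3, forward = no);
+</screen></para>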
 
@@ -111,7 +111,12 @@
 <listitem><para> There are a number of <quote>sharp edges</quote> to note...</para></listitem>
 </itemizedlist>
 <itemizedlist>
-    <listitem><para> You absolutely <emphasis>must not</emphasis> include transaction control commands, particularly <command>BEGIN</command> and <command>COMMIT</command>, inside these DDL scripts. &slony1; wraps DDL scripts with a <command>BEGIN</command>/<command>COMMIT</command> pair; adding extra transaction control will mean that parts of the DDL will commit outside the control of &slony1; </para></listitem>
+<listitem><para> You absolutely <emphasis>must not</emphasis> include
+transaction control commands, particularly <command>BEGIN</command>
+and <command>COMMIT</command>, inside these DDL scripts. &slony1;
+wraps DDL scripts with a <command>BEGIN</command>/<command>COMMIT</command> 
+pair; adding extra transaction control will mean that parts of the DDL
+will commit outside the control of &slony1;. </para></listitem>
 
 <listitem><para> Before version 1.2, it was necessary to be
 exceedingly restrictive about what you tried to process using
@@ -137,7 +142,8 @@
 
 <listitem><para> How to remove replication for a node</para>
 
-<para> You will want to remove the various &slony1; components connected to the database(s).</para>
+<para> You will want to remove the various &slony1; components
+connected to the database(s).</para>
 
 <para> We will just consider, for now, doing this to one node. If you
 have multiple nodes, you will have to repeat this as many times as
@@ -149,13 +155,15 @@
 <listitem><para> Log Triggers / Update Denial Triggers
 
 </para></listitem>
-<listitem><para>    The "cluster" schema containing &slony1; tables indicating the state of the node as well as various stored functions 
-</para></listitem>
-<listitem><para>
-    <xref linkend="slon"> process that manages the node 
-</para></listitem>
-<listitem><para>
-    Optionally, the SQL and pl/pgsql scripts and &slony1; binaries that are part of the &postgres; build. (Of course, this would make it challenging to restart replication; it is unlikely that you truly need to do this...) 
+<listitem><para> The <quote>cluster</quote> schema containing &slony1;
+tables indicating the state of the node as well as various stored
+functions
+</para></listitem>
+<listitem><para> &lslon; process that manages the node </para></listitem>
+<listitem><para> Optionally, the SQL and pl/pgsql scripts and &slony1;
+binaries that are part of the &postgres; build. (Of course, this would
+make it challenging to restart replication; it is unlikely that you
+truly need to do this...)
 </para></listitem>
 </itemizedlist>
 
@@ -186,8 +194,8 @@
 CASCADE;</command>, which will drop out &slony1; functions, tables,
 and triggers alike.  That is generally less suitable
 than <xref linkend="stmtuninstallnode">, because that command not only
-drops the schema and its contents, but also removes any columns added
-in using <xref linkend= "stmttableaddkey">.
+drops the schema and its contents, but also removes any columns
+previously added using <xref linkend="stmttableaddkey">; a minimal
+sketch of invoking it appears just after this list.
 </para></listitem>
 </itemizedlist>
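+
+<para> As an illustrative sketch only (the cluster name and conninfo
+string are placeholders, not values drawn from this documentation),
+uninstalling a node via &lslonik; might look like:
+
+<screen>
+cluster name = examplecluster;
+node 1 admin conninfo = 'dbname=mydb host=oldnode user=slony';
+
+uninstall node (id = 1);
+</screen></para>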
 </listitem>
@@ -306,15 +314,31 @@
 
 </listitem>
 
+<listitem><para> <link linkend="slonyupgrade">How do I upgrade
+&slony1; to a newer version?</link> </para> </listitem>
+
 <listitem><para> What happens when I fail over?</para> 
 
-<para> To be written...</para> </listitem>
+<para> Some of this is described under <xref linkend="failover">, but
+a fuller procedure remains to be written...</para> </listitem>
 
 <listitem><para> How do I <quote>move master</quote> to a new node? </para> 
 
-<para> Obviously, use <xref linkend="stmtmoveset">; more details
-should be added...</para>
-</listitem>
+<para> You must first pick a node that is connected to the former
+origin; otherwise it is not straightforward to reverse the
+connections during the move and keep everything connected. </para>
+
+<para> Second, you must run a &lslonik; script with the
+command <xref linkend="stmtlockset"> to lock the set on the origin
+node.  Note that at this point you have an application outage under
+way, as this puts triggers on the origin that reject
+updates. </para>
+
+<para> Next, submit the &lslonik; <xref linkend="stmtmoveset">
+request; it is perfectly reasonable to submit both requests in the
+same &lslonik; script, as sketched below.  The origin then gets
+switched over to the new origin node.  If the new node is a few
+events behind, it may take a little while for this to take
+place. </para> </listitem>
 
 </itemizedlist>
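+
+<para> As an illustrative sketch only (the cluster name, conninfo
+strings, and ID numbers here are placeholders rather than values
+drawn from this documentation), the <quote>move master</quote>
+procedure above might be expressed in a &lslonik; script as:
+
+<screen>
+cluster name = examplecluster;
+node 1 admin conninfo = 'dbname=mydb host=oldorigin user=slony';
+node 2 admin conninfo = 'dbname=mydb host=neworigin user=slony';
+
+lock set (id = 1, origin = 1);
+move set (id = 1, old origin = 1, new origin = 2);
+</screen></para>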
 </sect1>
Index: monitoring.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/monitoring.sgml,v
retrieving revision 1.27
retrieving revision 1.28
diff -Ldoc/adminguide/monitoring.sgml -Ldoc/adminguide/monitoring.sgml -u -w -r1.27 -r1.28
--- doc/adminguide/monitoring.sgml
+++ doc/adminguide/monitoring.sgml
@@ -4,794 +4,9 @@
 
 <indexterm><primary>monitoring &slony1;</primary></indexterm>
 
-<para>Here are some of things that you may find in your &slony1; logs,
-and explanations of what they mean.</para>
-
-<sect2><title>CONFIG notices</title>
-
-<para>These entries are pretty straightforward. They are informative
-messages about your configuration.</para>
-
-<para>Here are some typical entries that you will probably run into in
-your logs:
-
-<screen>
-CONFIG main: local node id = 1
-CONFIG main: loading current cluster configuration
-CONFIG storeNode: no_id=3 no_comment='Node 3'
-CONFIG storePath: pa_server=5 pa_client=1 pa_conninfo="host=127.0.0.1 dbname=foo user=postgres port=6132" pa_connretry=10
-CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
-CONFIG storeSet: set_id=1 set_origin=1 set_comment='Set 1'
-CONFIG main: configuration complete - starting threads
-</screen></para></sect2>
-
-<sect2><title>DEBUG Notices</title>
-
-<para>Debug notices are always prefaced by the name of the thread that
-the notice originates from. You will see messages from the following
-threads:
-
-<variablelist>
-<varlistentry><term>localListenThread</term> 
-
-<listitem><para> This is the local thread that listens for events on
-the local node.</para></listitem></varlistentry>
-
-<varlistentry><term>remoteWorkerThread-X</term> 
-
-<listitem><para> The thread processing remote events.  You can expect
-to see one of these for each node that this node communicates
-with.</para></listitem></varlistentry>
-
-<varlistentry><term>remoteListenThread-X</term>
-
-<listitem><para>Listens for events on a remote node database.  You may
-expect to see one of these for each node in the
-cluster.</para></listitem></varlistentry>
-
-<varlistentry><term>cleanupThread</term> <listitem><para> Takes care
-of things like vacuuming, cleaning out the confirm and event tables,
-and deleting old data.</para></listitem></varlistentry>
-
-<varlistentry><term>syncThread</term> <listitem><para> Generates SYNC
-events.</para></listitem></varlistentry>
-
-</variablelist>
-</para>
-</sect2>
-
-<sect2> <title> How to read &slony1; logs </title>
-
-<para> Note that as far as slon is concerned, there is no
-<quote>master</quote> or <quote>slave.</quote> They are just
-nodes. </para>
-
-<para>What you can expect, initially, is to see, on both nodes, some
-events propagating back and forth.  Firstly, there should be some
-events published to indicate creation of the nodes and paths.  If you
-don't see those, then the nodes aren't likely to be able to
-communicate with one another, and nothing else will happen... </para>
-
-<itemizedlist>
-
-<listitem><para>Create the two nodes.</para> 
-
-<para> No slons are running yet, so there are no logs to look
-at.</para>
-
-</listitem>
-
-<listitem><para> Start the two slons</para>
-
-<para> The logs for each will start out very quiet, as neither node
-has much to say, and neither node knows how to talk to another
-node. </para>
-
-</listitem>
-
-<listitem><para> Do the <xref linkend="stmtstorepath"> to set up
-communications paths.  That will allow the nodes to start to become
-aware of one another.</para>
-
-<para> The slon logs should now start to receive events from
-<quote>foreign</quote> nodes.</para>
-
-<para> In version 1.0, <xref linkend="table.sl-listen"> is not set up
-automatically, so things still remain quiet until you explicitly
-submit <command>STORE LISTEN</command> requests. In version 1.1, the
-<quote>listen paths</quote> are set up automatically, which will much
-more quickly get the communications network up and running.  </para>
-
-<para> If you look at the contents of the tables <xref
-linkend="table.sl-node"> and <xref linkend="table.sl-path"> and <xref
-linkend="table.sl-listen">, on each node, that should give a good idea
-as to where things stand.  Until the <xref linkend="slon"> starts,
-each node may only be partly configured.  If there are two nodes,
-there should be two entries in all three of these tables once the
-communications configuration is set up properly.  If there are fewer
-entries than that, well, that should give you some idea of what is
-missing.</para>
-</listitem>
-
-<listitem><para> If needed (<emphasis>e.g.</emphasis> - before version
-1.1), submit <xref linkend="stmtstorelisten"> requests to indicate how
-the nodes will use the communications paths. </para>
-
-<para> Once this has been done, the nodes' logs should show a greater
-level of activity, with events periodically being initiated on one
-node or the other, and propagating to the other. </para>
-</listitem>
-
-<listitem> <para> You'll set up the set (<xref
-linkend="stmtcreateset">), add tables (<xref
-linkend="stmtsetaddtable">), and sequences (<xref
-linkend="stmtsetaddsequence">), and will see relevant events only on
-the origin node for the set. </para></listitem>
-
-<listitem><para> Then, when you submit the <xref
-linkend="stmtsubscribeset"> request, the event should go to both
-nodes. </para>
-
-<para> The origin node has little more to do, after that...  The
-subscriber will then have a <command>COPY_SET</command> event, which
-will lead to logging information about adding each table and copying
-its data.</para></listitem>
-
-</itemizedlist>
-
-<para>After that, you'll mainly see two sorts of behaviour:</para>
-
-<itemizedlist>
-
-<listitem><para> On the origin, there won't be too terribly much
-logged, just indication that some <command>SYNC</command> events are
-being generated and confirmed by other nodes.</para></listitem>
-
-<listitem><para> On the subscriber, there will be reports of
-<command>SYNC</command> events, and that the subscriber pulls data
-from the provider for the relevant set(s).  This will happen
-infrequently if there are no updates going to the origin node; it will
-happen frequently when the origin sees heavy updates. </para>
-</listitem>
-
-</itemizedlist>
-
-<para> WriteMe: I can't decide the format for the rest of this. I
-think maybe there should be a "how it works" page, explaining more
-about how the threads work, what to expect in the logs after you run a
-</para>
-</sect2>
-
-<sect2> <title> Errors and Implications </title>
-
-<itemizedlist>
-<listitem><para><command>remoteWorkerThread_%d: log archive failed %s - %s\n</command> </para> 
-
-<para> This indicates that an error was encountered trying to write a
-log shipping file.  Normally the &lslon; will retry, and hopefully
-succeed. </para> </listitem>
-
-<listitem><para><command>remoteWorkerThread_%d: DDL preparation failed - set %d - only on node %</command></para> 
-<para> Something broke when applying a DDL script on one of the nodes.
-This is quite likely indicates that the node's schema differed from
-that on the origin; you may need to apply a change manually to the
-node to allow the event to proceed.  The scary, scary alternative
-might be to delete the offending event, assuming it can't possibly
-work...</para> </listitem>
-<listitem><para><command>SLON_CONFIG: remoteWorkerThread_%d: DDL request with %d statements</command></para> 
-<para> This is informational, indicating how many SQL statements were processed. </listitem>
-<listitem><para><command>SLON_ERROR: remoteWorkerThread_%d: DDL had invalid number of statements - %d</command></para> 
-
-<para> Occurs if there were < 0 statements (which should be impossible) or > MAXSTATEMENTS statements.  Probably the script was bad...</listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: malloc()
-failure in DDL_SCRIPT - could not allocate %d bytes of
-memory</command></para>
-
-<para> This should only occur if you submit some extraordinarily large
-DDL script that makes a &lslon; run out of memory </para></listitem>
-
-<listitem><para><command>CONFIG: remoteWorkerThread_%d: DDL Statement %d: [%s]</command></para> 
-
-<para> This lists each DDL statement as it is submitted. </listitem>
-<listitem><para><command>ERROR: DDL Statement failed - %s</command></para> 
-
-<para> Oh, dear, one of the DDL statements that worked on the origin
-failed on this remote node... </para></listitem>
-
-<listitem><para><command>CONFIG: DDL Statement success - %s</command></para> 
-
-<para> All's well...  </para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not generate DDL archive tracker %s - %s</command></para> 
-
-<para> Apparently the DDL script couldn't be written to a log shipping file... </para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not submit DDL script %s - %s</command></para> 
-
-<para>Couldn't write the script to a log shipping file.</para> </listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not close DDL script %s - %s</command></para> 
-
-<para>Couldn't close a log shipping file for a DDL script.</para> </listitem>
-
-<listitem><para><command>FATAL: remoteWorkerThread_%d: pthread_create() - %s</command></para> 
-
-<para> Couldn't create a new remote worker thread. </para>
-</listitem>
-<listitem><para><command>DEBUG1 remoteWorkerThread_%d: helper thread for provider %d created</command></para> 
-
-<para> This normally happens when the &lslon; starts: a thread is created for each node to which the local node should be listening for events.</para></listitem>
-
-<listitem><para><command>DEBUG4: remoteWorkerThread_%d: added active set %d to provider %d</command></para> 
-
-<para> Indicates that this set is being provided by this
-provider.</para></listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: helper thread for provider %d terminated </command></para> 
-
-<para> If subscriptions reshape such that a node no longer provides a
-subscription, then the thread that works on that node can be
-dropped. </para></listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: disconnecting
-from data provider %d </command></para>
-
-<para> A no-longer-used data provider may be dropped; if connection
-information is changed, the &lslon; needs to disconnect and
-reconnect. </para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: ignore new events due to shutdown</command></para> 
-
-<para> If the &lslon; is shutting down, it is futile to process more events</listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: event %d ignored - unknown origin</command></para> 
-
-<para> Probably happens if events arrive before
-the <command>STORE_NODE</command> event that tells that the new node
-now exists... </para></listitem>
-
-<listitem><para><command>WARN: remoteWorkerThread_%d: event %d ignored - origin inactive</command></para> 
-<para> This shouldn't occur now (2006) as we don't support the notion of deactivating a node... </para>
-</listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: event %d ignored - duplicate </command></para>
-
-<para> This might be expected to happen if the event notification
-comes in concurrently from two sources... </para> </listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: unknown node %d</command></para> 
-
-<para> Happens if the &lslon; is unaware of this node; probably a sign
-of <command>STORE_NODE</command> requests not
-propagating... </para> </listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: node %d - no worker thread</command></para> 
-
-<para> Curious: we can't wake up the worker thread; there probably
-should already be one... </para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: forward confirm %d,%s received by %d</command></para> 
-
-<para> These events should occur frequently and routinely as nodes report confirations of the events they receive. </para></listitem>
-
-<listitem><para><command>DEBUG1: copy_set %d</command></para> 
-
-<para> This indicates the beginning of copying data for a new subscription. </para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: set %d not found in runtime configuration </command></para> 
-
-<para> &lslon; tried starting up a subscription; it couldn't find conninfo for the data source.  Perhaps paths are not properly propagated?</para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: node %d has no pa_conninfo </command></para> 
-
-<para> Apparently the conninfo configuration
-was <emphasis>wrong</emphasis>...</para></listitem>
-
-<listitem><para><command>ERROR: copy set %d cannot connect to provider DB node %d </command></para> 
-
-<para> &lslon; couldn't connect to the provider.  Is the conninfo
-wrong?  Or perhaps authentication is misconfigured?  Or perhaps the
-database is down?</para></listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: connected to provider DB </command></para> 
-
-<para> Excellent: the copy set has a connection to its provider</para></listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not open COPY SET archive file %s - %s</command></para> 
-
-<para> Seems pretty self-explanatory... </para></listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not generate COPY SET archive header %s - %s</command></para> 
-
-<para> Probably means that we just filled up a filesystem... </para></listitem>
-
-<listitem><para><command>WARN: remoteWorkerThread_%d: transactions
-earlier than XID %s are still in progress</command></para>
-
-<para> This indicates that some old transaction is in progress from before the earliest available <command>SYNC</command> on the provider.  &slony1; cannot start replicating until that transaction completes.  This will repeat until thetransaction completes...
-</para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: prepare to copy table %s </command></para> 
-
-<para> This indicates that &lslon; is beginning preparations to set up subscription for a table.</para></listitem>
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: table %s will require Slony-I serial key</command></para> 
-
-<para> Evidently this is a table defined with <xref linkend="stmttableaddkey"> where &slony1; has to add a surrogate primary key.</listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not lock table %s on subscriber</command></para> 
-
-<para> For whatever reason, the table could not be locked, so the
-subscription needs to be restarted.  If the problem was something like
-a deadlock, retrying may help.  If the problem was otherwise, you may
-need to intervene...</para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: all tables for set %d found on subscriber</command></para> 
-
-<para> An informational message indicating that the first pass through the tables found no problems... </para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy sequence %s</command></para> 
-
-<para> Processing some sequence... </para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy table %s</command></para> 
-
-<para> &lslon; is starting to copy a table... </para></listitem>
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: table %s Slony-I serial key added local</command></para> 
-
-<para> Just added new column to the table to provide surrogate primary key.</listitem>
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: local table %s already has Slony-I serial key</command></para> 
-
-<para> Did not need to add serial key; apparently it was already there.</listitem>
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: table %s does not require Slony-I serial key</command></para> 
-
-<para> Apparently this table didn't require a special serial key... </para>
-</listitem>
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: table %s Slony-I serial key added local</command></para> </listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: Begin COPY of table %s</command></para> 
-
-<para> &lslon; is about to start the COPY on both sides to copy a table... </para></listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not generate copy_set request for %s - %s</command></para> 
-
-<para> This indicates that the <command>delete/copy</command> requests
-failed on the subscriber.  The &lslon; will repeat
-the <command>COPY_SET</command> attempt; it will probably continue to
-fail.. </para> </listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: copy to stdout on provider - %s %s</command></para> 
-
-<para> Evidently something about the COPY to <filename>stdout</filename> on the provider node broke...  The event will be retried... </para> </listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: copy from stdin on local node - %s %s</command></para> 
-
-<para> Evidently something about the COPY into the table on the
-subscriber node broke...  The event will be
-retried... </para> </listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: %d bytes copied for table %s</command></para> 
-
-<para> This message indicates that the COPY of the table has
-completed.  This is followed by running <command>ANALYZE</command> and
-reindexing the table on the subscriber.</para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: %.3f seconds
-to copy table %s</command></para>
-
-<para> After this message, copying and reindexing and analyzing the table on the subscriber is complete.</para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: set last_value of sequence %s (%s) to %s</command></para> 
-
-<para> As should be no surprise, this indicates that a sequence has been processed on the subscriber.</para>
-</listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: %.3 seconds to opy sequences</command></para> 
-
-<para> Summarizing the time spent processing sequences in the <command>COPY_SET</command> event. </para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: query %s did not return a result </command></para> 
-
-<para> This indicates that the query, as part of final processing of <command>COPY_SET</command>, failed.  The copy will restart... </para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy_set no previous SYNC found, use enable event</command></para> 
-
-<para> This takes place if no past SYNC event was found; the current
-event gets set to the event point of
-the <command>ENABLE_SUBSCRIPTION</command> event.
-</para> </listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy_set SYNC found, use event seqno %s</command></para> 
-
-<para> This takes place if a SYNC event was found; the current
-event gets set as shown. </para> </listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: sl_setsyn entry for set %d not found on provider</command></para> 
-
-<para> SYNC synchronization information was expected to be drawn from
-an existing subscriber, but wasn't found.  Something
-replication-breakingly-bad has probably
-happened... </para> </listitem>
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: could not insert to sl_setsync_offline</command></para> 
-
-<para> Oh, dear.  After setting up a subscriber, and getting pretty
-well everything ready, some writes to a log shipping file failed.
-Perhaps disk filled up...  </para></listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: %.3f seconds to build initial setsync status</command></para> 
-
-<para> Indicates the total time required to get the copy_set event finalized...</para>
-</listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: disconnected from provider DB</command></para> 
-
-<para> At the end of a subscribe set event, the subscriber's &lslon;
-will disconnect from the provider, clearing out
-connections... </listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: copy_set %d done in %.3f seconds</command></para> 
-
-<para> Indicates the total time required to complete copy_set...  This indicates a successful subscription!</para>
-</listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: SYNC %d processing</command></para> 
-
-<para> This indicates the start of processing of a <command>SYNC</command> </para> </listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: No pa_conninfo
-for data provider %d</command></para>
-
-<para> Oh dear, we haven't connection information to connect to the
-data provider.  That shouldn't be possible,
-normally...</para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: cannot connect to data provider %d on 'dsn'</command></para>
-
-<para> Oh dear, we haven't got <emphasis>correct</emphasis> connection
-information to connect to the data provider.</para></listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: connected to data provider %d on 'dsn'</command></para> 
-
-<para> Excellent; the &lslon; has connected to the provider. </para> </listitem>
-
-<listitem><para><command>WARN: remoteWorkerThread_%d: don't know what ev_seqno node %d confirmed for ev_origin %d</command></para> 
-
-<para> There's no confirmation information available for this node's provider; need to abort the <command>SYNC</command> and wait a bit in hopes that that information will emerge soon...</para></listitem>
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: data provider %d only confirmed up to ev_seqno %d for ev_origin %d </command></para> 
-
-<para> The provider for this node is a subscriber, and apparently that subscriber is a bit behind.  The &lslon; will need to wait for the provider to catch up until it has <emphasis> new</emphasis> data. </para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: data provider %d confirmed up to ev_seqno %s for ev_origin %d - OK</command></para> 
-
-<para> All's well; the provider should have the data that the subscriber needs...</para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: syncing set %d with %d table(s) from provider %d</command></para> 
-<para> This is declaring the plans for a <command>SYNC</command>: we have a set with some tables to process.</para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: ssy_action_list value: %s length: %d</command></para> 
-
-<para> This portion of the query to collect log data to be applied has been known to <quote>bloat up</quote>; this shows how it has gotten compressed... </para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: writing archive log...</command></para> 
-
-<para> This indicates that a log shipping archive log is being written for a particular <command>SYNC</command> set. </para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: Didn't add OR to provider</command></para> 
-
-<para> This indicates that there wasn't anything in a <quote>provider</quote> clause in the query to collect log data to be applied, which shouldn't be.  Things are quite likely to go bad at this point... </para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: no sets need syncing for this event</command></para> 
-
-<para>This will be the case for all <command>SYNC</command> events generated on nodes that are not originating replication sets.  You can expect to see these messages reasonably frequently. </para></listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d: cannot determine current log status</command></para> 
-
-<para> The attempt to read from sl_log_status, which determines
-whether we're working on <envar>sl_log_1</envar>
-or <envar>sl_log_2</envar> got no results; that can't be a good thing,
-as there certainly should be data here...  Replication is likely about
-to halt...</para> </listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: current local log_status is %d</command></para> 
-<para> This indicates which of <envar>sl_log_1</envar> and <envar>sl_log_2</envar> are being used to store replication data. </para> 
-</listitem>
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: activate helper %d</command></para> 
-
-<para> We're about to kick off a thread to help process <command>SYNC</command> data...</para></listitem>
-
-<listitem><para><command>DEBUG4: remoteWorkerThread_%d: waiting for log data</command></para> 
-
-<para> The thread is waiting to get data to consume (e.g. - apply to the replica).</para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: %s %s - qualification was %s </command></para> 
-
-<para> Apparently an application of replication data to the subscriber failed...  This quite likely indicates some sort of serious corruption.</para></listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d: replication query did not affect one row (cmdTuples = %s) - query was: %s qualification was: %s</command></para> 
-
-<para> If <envar> SLON_CHECK_CMDTUPLES</envar> is set, &lslon; applies
-changes one tuple at a time, and verifies that each change affects
-exactly one tuple.  Apparently that wasn't the case here, which
-suggests a corruption of replication.  That's a rather bad
-thing...</para></listitem>
-
-<listitem><para><command>ERROR: remoteWorkerThread_%d: SYNC aborted</command></para> 
-
-<para> If any errors have been encountered that haven't already aborted the <command>SYNC</command>, this catches and aborts it.</para></listitem>
-
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: new sl_rowid_seq value: %s</command></para> 
-
-<para> This marks the progression of this internal &slony1; sequence. </para></listitem>
-<listitem><para><command>INFO: remoteWorkerThread_%d: Run Archive Command %s</command></para> 
-
-<para> If &lslon; has been configured to run a command after generating each log shipping archive log, this reports when that process is spawned using <function>system()</function>. </para> </listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d: SYNC %d done in %.3f seconds</command></para> 
-
-<para> This indicates the successful completion of a <command>SYNC</command>.  Hurray!</para></listitem>
-
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d_d:%.3f seconds delay for first row </command></para> 
-
-<para> This indicates how long it took to get the first row from the LOG cursor that pulls in data from the sl_log tables.</para> </listitem>
-<listitem><para><command>ERROR: remoteWorkerThread_%d_d: large log_cmddata for actionseq %s not found</command></para> 
-
-<para> &lslon; could not find the data for one of the <quote>very large</quote> sl_log table tuples that are pulled individually.  This shouldn't happen.</para></listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d_d:%.3f seconds until close cursor </command></para> 
-
-<para> This indicates how long it took to complete reading data from the LOG cursor that pulls in data from the sl_log tables.</para> </listitem>
-<listitem><para><command>DEBUG2: remoteWorkerThread_%d_d: inserts=%d updates=%d deletes=%d </command></para> 
-
-<para> This reports how much activity was recorded in the current <command>SYNC</command> set. </para> </listitem>
-
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: compress_actionseq(list,subquery) Action list: %s</command></para> 
-
-<para> This indicates a portion of the LOG cursor query that is about to be compressed.  (In some cases, this could grow to <emphasis>enormous</emphasis> size, blowing up the query parser...) </para></listitem>
-<listitem><para><command>DEBUG3: remoteWorkerThread_%d: compressed actionseq subquery %s</command></para> 
-
-<para> This indicates what that subquery compressed into. </para></listitem>
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: </command></para> </listitem>
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: </command></para> </listitem>
-<listitem><para><command>DEBUG1: remoteWorkerThread_%d: </command></para> </listitem>
-
-<listitem><para><command>ERROR: Slonik version: @MODULEVERSION@ != Slony-I version in PG build %</command></para> 
-
-<para> This is raised in <function>checkmoduleversion()</function> if there is a mismatch between the version of &slony1; as reported by &lslonik; versus what the &postgres; build has.</para></listitem>
-<listitem><para><command>ERROR: Slony-I: registry key % is not an int4 value</command></para> 
-
-<para> Raised in <function>registry_get_int4()</function>, this complains if a requested value turns out to be NULL.</listitem>
-<listitem><para><command>ERROR: registry key % is not a text value</command></para> 
-
-<para> Raised in <function>registry_get_text()</function>, this complains if a requested value turns out to be NULL.</listitem>
-<listitem><para><command>ERROR: registry key % is not a timestamp value</command></para> 
-
-<para> Raised in <function>registry_get_timestamp()</function>, this complains if a requested value turns out to be NULL.</listitem>
-<listitem><para><command>NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=%</command></para> 
-
-<para> This will occur when a &lslon; starts up after another has crashed; this is routine cleanup.</para></listitem>
-<listitem><para><command>ERROR: Slony-I: This node is already initialized</command></para> 
-
-<para> This would typically happen if you submit <xref linkend="stmtstorenode"> gainst a node that has already been set up with the &slony1; schema. </para></listitem>
-<listitem><para><command>ERROR: Slony-I: node % not found</command></para> 
-
-<para> An attempt to mark a node not listed locally as enabled should fail...</para></listitem>
-<listitem><para><command>ERROR: Slony-I: node % is already active</command></para> 
-
-<para> An attempt to mark a node already marked as active as active should fail...</para></listitem>
-<listitem><para><command>ERROR: Slony-I: DROP_NODE cannot initiate on the dropped node</command></para> 
-
-<para> You need to have an EVENT NODE other than the node that is to be dropped.... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I: Node % is still configured as a data provider</command></para> 
-
-<para> You cannot drop a node that is in use as a data provider; you
-need to reshape subscriptions so no nodes are dependent on it first. </para></listitem>
-<listitem><para><command>ERROR: Slony-I: Node % is still origin of one or more sets</command></para> 
-
-<para> You can't drop a node if it is the origin for a set!  Use <xref linkend="stmtmoveset"> or <xref linkend="stmtfailover"> first. </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I: cannot failover - node % has no path to the backup node</command></para> 
-
-<para> You cannot failover to a node that isn't connected to all the subscribers, at least indirectly. </para></listitem>
-<listitem><para><command>ERROR: Slony-I: cannot failover - node % is not subscribed to set %</command></para> 
-
-<para> You can't failover to a node that doesn't subscribe to all the relevant sets. </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I: cannot failover - subscription for set % is not active</command></para> <para> If the subscription has been set up, but isn't yet active, that's still no good.</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I: cannot failover - node % is not a forwarder of set %</command></para> 
-
-<para> You can only failover or move a set to a node that has
-forwarding turned on. </para></listitem>
-
-<listitem><para><command>NOTICE: failedNode: set % has no other direct receivers - move now</command></para> 
-
-<para> If the backup node is the only direct subscriber, then life is a bit simplified...  No need to reshape any subscriptions!</para></listitem>
-<listitem><para><command>NOTICE: failedNode set % has other direct receivers - change providers only</command></para> 
-<para> In this case, all direct subscribers are pointed to the backup node, and the backup node is pointed to receive from another node so it can get caught up.</para></listitem>
-<listitem><para><command>NOTICE: Slony-I: Please drop schema _ at CLUSTERNAME@</command></para> 
-
-<para> A node has been uninstalled; you may need to drop the schema... </para></listitem>
-<listitem><para><command>ERROR: Slony-I: setAddTable_int(): table % has no index %</command></para> 
-
-<para> Apparently a PK index was specified that is absent on this node...</para></listitem>
-<listitem><para><command>ERROR: Slony-I setAddTable_int(): table % not found</command></para> 
-
-<para> Table wasn't found on this node; did you load the schema in properly?. </para></listitem>
-<listitem><para><command>ERROR: Slony-I setAddTable_int(): table % is not a regular table</command></para> 
-
-<para> You tried to replicate something that isn't a table; you can't do that! </para></listitem>
-<listitem><para><command>NOTICE: Slony-I setAddTable_int(): table % PK column % nullable</command></para> 
-
-<para> You tried to replicate a table where one of the columns in the would-be primary key is allowed to be null.  All PK columns must be <command>NOT NULL.</command> This request is about to fail. </para></listitem>
-<listitem><para><command>ERROR: Slony-I setAddTable_int(): table % not replicable!</command></para> 
-
-<para> 
-This happens because of the NULLable PK column. </para></listitem>
-<listitem><para><command>ERROR: Slony-I setAddTable_int(): table id % has already been assigned!</command></para> 
-
-<para> The table ID value needs to be assigned uniquely
-in <xref linkend="stmtsetaddtable">; apparently you requested a value
-already in use.
-</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setDropTable(): table % not found</command></para> 
-
-<para> Table wasn't found on this node; are you sure you had the ID right? </para></listitem>
-<listitem><para><command>ERROR: Slony-I setDropTable(): set % not found</command></para> 
-
-<para> The replication set wasn't found on this node; are you sure you had the ID right? </para></listitem>
-<listitem><para><command>ERROR: Slony-I setDropTable(): set % has remote origin</command></para> 
-
-<para> The replication set doesn't originate on this node; you probably need to specify an <command>EVENT NODE</command> in the <xref linkend="stmtsetdroptable"> command. </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setAddSequence(): set % not found</command></para> 
-<para> Apparently the set you requested is not available...</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setAddSequence(): set % has remote origin</command></para> 
-<para> You may only add things at the origin node.</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setAddSequence(): cannot add sequence to currently subscribed set %</command></para> 
-<para> Apparently the set you requested has already been subscribed.  You cannot add tables/sequences to an already-subscribed set.  You will need to create a new set, add the objects to that new set, and set up subscriptions to that.</para></listitem>
-<listitem><para><command>ERROR: Slony-I setAddSequence_int(): set % not found</command></para> 
-<para> Apparently the set you requested is not available...</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setAddSequence_int(): sequence % not found</command></para> 
-<para> Apparently the sequence you requested is not available on this node.  How did you set up the schemas on the subscribers???</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setAddSequence_int(): % is not a sequence</command></para> 
-<para> Seems pretty obvious :-).</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setAddSequence_int(): sequence ID % has already been assigned</command></para> 
-<para> Each sequence ID added must be unique; apparently you have reused an ID.</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setDropSequence_int(): sequence % not found</command></para> 
-<para> Could this sequence be in another set?</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setDropSequence_int(): set % not found</command></para> 
-<para> Could you have gotten the set ID wrong?</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setDropSequence_int(): set % has origin at another node - submit this to that node</command></para> 
-
-<para> This message seems fairly self-explanatory...</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveTable_int(): table % not found</command></para> 
-
-<para> Table wasn't found on this node; you probably gave the wrong ID number... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveTable_int(): set ids cannot be identical</command></para> 
-
-<para> Does it make sense to move a table from a set into the very same set? </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveTable_int(): set % not found</command></para> 
-
-<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveTable_int(): set % does not originate on local node</command></para> 
-
-<para> Set wasn't found to have origin on this node; you probably gave the wrong EVENT NODE... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveTable_int(): subscriber lists of set % and % are different</command></para> 
-
-<para> You can only move objects between sets that have identical subscriber lists. </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): sequence % not found</command></para> 
-
-<para> Sequence wasn't found on this node; you probably gave the wrong ID number... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): set ids cannot be identical</command></para> 
-
-<para> Does it make sense to move a sequence from a set into the very same set? </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): set % not found</command></para> 
-
-<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): set % does not originate on local node</command></para> 
-
-<para> Set wasn't found to have origin on this node; you probably gave the wrong EVENT NODE... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): subscriber lists of set % and % are different</command></para> 
-
-<para> You can only move objects between sets that have identical subscriber lists. </para></listitem>
-
-<listitem><para><command>Slony-I: sequenceSetValue(): sequence % not found</command></para> 
-
-<para> Curious; the sequence object is missing.  Could someone have dropped it from the schema by hand (e.g. - not using <xref linkend="stmtddlscript">)?</listitem>
-
-<listitem><para><command>ERROR: Slony-I ddlScript_prepare(): set % not found</command></para> 
-
-<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I ddlScript_prepare_int(): set % not found</command></para> 
-
-<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem>
-
-<listitem><para><command>ERROR: Slony-I: alterTableForReplication(): Table with id % not found</command></para> 
-<para> Apparently the table wasn't found; could the schema be messed up? </para> </listitem>
-
-<listitem><para><command>ERROR: Slony-I: alterTableForReplication(): Table % is already in altered state</command></para> 
-
-<para> Curious...  We're trying to set a table up for replication
-a <emphasis>second</emphasis> time?  </para> </listitem>
-
-<listitem><para><command>NOTICE: Slony-I: alterTableForReplication(): multiple instances of trigger % on table %'',</command></para> 
-
-<para> This normally happens if you have a table that had a trigger attached to it that replication hid due to this being a subscriber node, and then you added a trigger by the same name back to replication.  Now, when trying to "fix up" triggers, those two triggers conflict. </para>
-
-<para> The DDL script will keep running and rerunning, or the UNINSTALL NODE will keep failing, until you drop the <quote>visible</quote> trigger, by hand, much as you must have added it, by hand, earlier. </para>
-</listitem>
-<listitem><para><command>ERROR: Slony-I: Unable to disable triggers</command></para> 
-<para> This is the error that follows the <quote>multiple triggers</quote> problem. </para></listitem>
-<listitem><para><command>ERROR: Slony-I: alterTableRestore(): Table with id % not found</command></para> 
-
-<para> This runs when a table is being restored to <quote>non-replicated</quote> state; apparently the replicated table wasn't found.</para></listitem>
-
-<listitem><para><command>ERROR: Slony-I: alterTableRestore(): Table % is not in altered state</command></para> 
-
-<para> Hmm.  The table isn't in altered replicated state.  That shouldn't be, if replication had been working properly...</para> </listitem>
-
-<listitem><para><command>ERROR: Slony-I: subscribeSet() must be called on provider</command></para> 
-<para> This function should only get called on the provider node.  &lslonik; normally handles this right, unless one had wrong DSNs in a &lslonik; script...</para>
-</listitem>
-
-<listitem><para><command>ERROR: Slony-I: subscribeSet(): set % not found</command></para> 
-<para> Hmm.  The provider node isn't aware of this set.  Wrong parms to a &lslonik; script?</para> </listitem>
-
-<listitem><para><command>ERROR: Slony-I: subscribeSet(): set origin and receiver cannot be identical</command></para> 
-<para> Duh, an origin node can't subscribe to itself.</para> </listitem>
-
-<listitem><para><command>ERROR: Slony-I: subscribeSet(): set provider and receiver cannot be identical</command></para> 
-<para> A receiver must subscribe to a <emphasis>different</emphasis> node...</para> </listitem>
-<listitem><para><command>Slony-I: subscribeSet(): provider % is not an active forwarding node for replication set %</command></para> 
-
-<para> You can only use a live, active, forwarding provider as a data
-source.  </para></listitem>
-
-<listitem><para>Slony-I: subscribeSet_int(): set % is not active, cannot change provider<command></command></para> 
-<para> You can't change the provider just yet...</para></listitem>
-<listitem><para><command>Slony-I: subscribeSet_int(): set % not found</command></para> 
-<para> This node isn't aware of the set...  Perhaps you submitted wrong parms?</para></listitem>
-<listitem><para><command>Slony-I: unsubscribeSet() must be called on receiver</command></para> 
-<para> Seems obvious...  This probably indicates a bad &lslonik; admin DSN...</para></listitem>
-<listitem><para><command>Slony-I: Cannot unsubscribe set % while being provider</command></para> 
-
-<para> This should seem obvious; <xref linkend="stmtunsubscribeset"> will fail if a node has dependent subscribers for which it is the provider </para> </listitem>
-
-<listitem><para><command>Slony-I: cleanupEvent(): Single node - deleting events < %</command></para> 
-<para> If there's only one node, the cleanup event will delete old events so that you don't get <quote>build-up of crud.</quote></para></listitem>
-<listitem><para><command>Slony-I: tableAddKey(): table % not found</command></para> 
-<para> Perhaps you didn't copy the schema over properly?</para></listitem>
-<listitem><para><command>Slony-I: tableDropKey(): table with ID% not found</command></para> 
-<para> Seems curious; you were presumably replicating to this table, so for this to be gone seems rather odd...</para></listitem>
-<listitem><para><command>Slony-I: determineIdxnameUnique(): table % not found</command></para> 
-
-<para>Did you properly copy over the schema to a new node???</para></listitem>
-<listitem><para><command>Slony-I: table % has no primary key</command></para> 
-
-<para> This likely signifies a bad loading of schema...</para></listitem>
-
-<listitem><para><command>Slony-I: table % has no unique index %</command></para> 
-
-<para> This likely signifies a bad loading of schema...</para></listitem>
-
-<listitem><para><command>Slony-I: Logswitch to sl_log_2 initiated'</command></para> 
-<para> Indicates that &lslon; is in the process of switching over to this log table.</para></listitem>
-<listitem><para><command>Slony-I: Logswitch to sl_log_1 initiated'</command></para> 
-<para> Indicates that &lslon; is in the process of switching over to this log table.</para></listitem>
-<listitem><para><command>Previous logswitch still in progress</command></para> 
-
-<para> An attempt was made to do a log switch while one was in progress...</para></listitem>
-
-</itemizedlist>
-</sect2>
-
 <sect2> <title> &nagios; Replication Checks </title>
 
-<indexterm><primary>&nagios; for monitoring
-replication</primary></indexterm>
+<indexterm><primary>&nagios; for monitoring replication</primary></indexterm>
 
 <para> The script in the <filename>tools</filename> directory called
 <command> pgsql_replication_check.pl </command> represents some of the
@@ -883,6 +98,8 @@
 
 <sect2 id="testslonystate"> <title> test_slony_state</title>
 
+<indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
+
 <para> This script is in preliminary stages, and may be used to do
 some analysis of the state of a &slony1; cluster.</para>
 
Index: ddlchanges.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/ddlchanges.sgml,v
retrieving revision 1.24
retrieving revision 1.25
diff -Ldoc/adminguide/ddlchanges.sgml -Ldoc/adminguide/ddlchanges.sgml -u -w -r1.24 -r1.25
--- doc/adminguide/ddlchanges.sgml
+++ doc/adminguide/ddlchanges.sgml
@@ -40,42 +40,45 @@
 statements, as the script is already executed inside a transaction.
 In &postgres; version 8, the introduction of nested transactions
 changes this somewhat, but you must still remain aware that the
-actions in the script are wrapped inside a single
-transaction.</para></listitem>
+actions in the script are processed within the scope of a single
+transaction whose <command>BEGIN</command> and <command>END</command>
+you do not control.</para></listitem>
 
 <listitem><para>If there is <emphasis>anything</emphasis> broken about
 the script, or about how it executes on a particular node, this will
 cause the <xref linkend="slon"> daemon for that node to panic and
-crash. If you restart the node, it will, more likely than not, try to
+crash.  Examples of the messages (positive and negative) that you may
+see are shown in <xref linkend="ddllogs">.  If you restart the
+&lslon;, it will, more likely than not, try to
 <emphasis>repeat</emphasis> the DDL script, which will, almost
-certainly, fail the second time just as it did the first time.  I have
-found this scenario to lead to a need to go to the
-<quote>master</quote> node to delete the event to stop it from
-continuing to fail.</para>
+certainly, fail the second time in the same way it did the first.  In
+this scenario I have found it necessary to go to the
+<quote>master</quote> node and delete the offending event from the
+table <envar>sl_event</envar> in order to stop it from continuing to
+fail.</para>
 
 <para> The implication of this is that it is
 <emphasis>vital</emphasis> that modifications not be made in a
 haphazard way on one node or another.  The schemas must always stay in
 sync.</para> </listitem>
 
-<listitem><para> For <application>slon</application> to, at that
-point, <quote>panic</quote> is probably the
+<listitem><para> For &lslon; to <quote>panic</quote> at that point is
+probably the
 <emphasis>correct</emphasis> answer, as it allows the DBA to head over
 to the database node that is broken, and manually fix things before
 cleaning out the defective event and restarting
-<application>slon</application>.  You can be certain that the updates
+&lslon;.  You can be certain that the updates
 made <emphasis>after</emphasis> the DDL change on the provider node
 are queued up, waiting to head to the subscriber.  You don't run the
 risk of there being updates made that depended on the DDL changes in
 order to be correct.</para></listitem>
 
 <listitem><para> When you run <xref linkend="stmtddlscript">, this
-causes the <application>slonik</application> to request, <emphasis>for
-each table in the specified set</emphasis>, an exclusive table
-lock.</para>
+causes the &lslonik; to request, <emphasis>for each table in the
+specified set</emphasis>, an exclusive table lock.</para>
 
-<para> It starts by requesting the lock, and altering the table to
-remove &slony1; triggers:
+<para> It starts by requesting the lock, altering the table to remove
+&slony1; triggers, and restoring any triggers that had been hidden:
 
 <screen>
 BEGIN;
@@ -115,7 +118,14 @@
 
 <para> If a particular DDL script only affects one table, it should be
 unnecessary to lock <emphasis>all</emphasis> application
-tables.</para></listitem>
+tables.</para>
+
+<note><para> Actually, in version 1.1.5 and later, this
+is <emphasis>NOT TRUE.</emphasis> The danger of someone making DDL
+changes that cross replication sets seems sufficiently palpable that
+&lslon; has been changed to lock <emphasis>ALL</emphasis> replicated
+tables, whether they are in the specified replication set or
+not. </para></note></listitem>
 
 <listitem><para> You may need to take a brief application outage in
 order to ensure that your applications are not demanding locks that
@@ -124,18 +134,18 @@
 
 <listitem><para> In &slony1; versions 1.0 thru 1.1.5, the script is
 processed as a single query request, which can cause problems if you
-are making complex changes.  In version 1.2, the script is parsed into
-individual SQL statements, and each statement is submitted separately,
-which is a preferable handling of this.  </para>
+are making complex changes.  Starting in version 1.2, the script is
+properly parsed into individual SQL statements, and each statement is
+submitted separately, which handles such changes much better.  </para>
 
 <para> The trouble with one query processing a <quote>compound
-statement</quote> is that the SQL parser does its processing for that
+statement</quote> is that the SQL parser does its planning for that
 entire set of queries based on the state of the database at the
 <emphasis>beginning</emphasis> of the query.</para>   
 
 <para> This causes no particular trouble if those statements are
-independent of one another, such as if you add two columns to a
-table.</para>
+independent of one another, such as if you have two statements to add
+two columns to a table.</para>
 
 <para> <command> alter table t1 add column c1 integer; alter table t1 add
 column c2 integer; </command></para>
@@ -206,10 +216,9 @@
 
 <para> If it <emphasis>does</emphasis> matter that the object be
 propagated at the same location in the transaction stream on all the
-nodes, then you but no tables need to be locked, then you might create
-a replication set that contains <emphasis>no</emphasis> tables,
-subscribe all the appropriate nodes to it, and use <command>EXECUTE
-SCRIPT</command>, specifying that <quote>empty set.</quote></para></listitem>
+nodes, but no tables need to be locked, then you still need to use
+<command>EXECUTE SCRIPT</command>, locking challenges or
+no.</para></listitem>
 
 <listitem><para> You may want an extra index on some replicated
 node(s) in order to improve performance there.</para>
@@ -264,7 +273,7 @@
 at the end, so that the would-be changes roll back.</para>
 
 <para> If this script works OK on all of the nodes, that suggests that
-it should work fine everywhere if executed via Slonik.  If problems
+it should work fine everywhere if executed via &lslonik;.  If problems
 are encountered on some nodes, that will hopefully allow you to fix
 the state of affairs on those nodes so that the script
 <emphasis>will</emphasis> run without error.
Index: logshipping.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/logshipping.sgml,v
retrieving revision 1.14
retrieving revision 1.15
diff -Ldoc/adminguide/logshipping.sgml -Ldoc/adminguide/logshipping.sgml -u -w -r1.14 -r1.15
--- doc/adminguide/logshipping.sgml
+++ doc/adminguide/logshipping.sgml
@@ -294,6 +294,10 @@
 represent the time you will wish to have apply to all of the data in
 the given log shipping transaction log. </para>
 </listitem>
+
+<listitem><para> You may find information on how relevant activity is
+logged in <xref linkend="logshiplog">. </para></listitem>
+
 </itemizedlist>
 </sect2>
 </sect1>
Index: slony.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/slony.sgml,v
retrieving revision 1.32
retrieving revision 1.33
diff -Ldoc/adminguide/slony.sgml -Ldoc/adminguide/slony.sgml -u -w -r1.32 -r1.33
--- doc/adminguide/slony.sgml
+++ doc/adminguide/slony.sgml
@@ -83,6 +83,7 @@
   <corpauthor>The PostgreSQL Global Development Group</corpauthor>
   <author> <firstname>Christopher</firstname> <surname>Browne</surname> </author>
  </articleinfo>
+ &bestpractices;
  &firstdb;
  &startslons;
  &subscribenodes;
@@ -99,9 +100,10 @@
  &ddlchanges;
  &usingslonik;
  &adminscripts;
+ &slonyupgrade;
  &versionupgrade;
- &bestpractices;
  &testbed;
+ &loganalysis;
  &help;
 </article>
 
Index: filelist.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/filelist.sgml,v
retrieving revision 1.16
retrieving revision 1.17
diff -Ldoc/adminguide/filelist.sgml -Ldoc/adminguide/filelist.sgml -u -w -r1.16 -r1.17
--- doc/adminguide/filelist.sgml
+++ doc/adminguide/filelist.sgml
@@ -41,6 +41,8 @@
 <!entity locking            SYSTEM "locking.sgml">
 <!entity supportedplatforms SYSTEM "supportedplatforms.sgml">
 <!entity testbed            SYSTEM "testbed.sgml">
+<!entity loganalysis        SYSTEM "loganalysis.sgml">
+<!entity slonyupgrade       SYSTEM "slonyupgrade.sgml">
 
 <!-- back matter -->
 <!entity biblio     SYSTEM "biblio.sgml">


