Sat Jul 29 05:26:55 PDT 2006
- Previous message: [Slony1-commit] By cbbrowne: Reorganized log documentation, and linked it into various
- Next message: [Slony1-commit] By cbbrowne: Add in document on how to perform Slony-I version upgrade
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Log Message: ----------- Doc Updates: A new file has been added (already referenced) that documents what log messages you can expect to see. This contains log-related material that used to be in monitoring.sgml Added Files: ----------- slony1-engine/doc/adminguide: loganalysis.sgml (r1.1) -------------- next part -------------- --- /dev/null +++ doc/adminguide/loganalysis.sgml @@ -0,0 +1,945 @@ +<!-- $Id: loganalysis.sgml,v 1.1 2006/07/29 12:26:54 cbbrowne Exp $ --> +<sect1 id="loganalysis"> +<title>Log Analysis</title> + +<indexterm><primary>Log analysis</primary></indexterm> + +<para>Here are some of things that you may find in your &slony1; logs, +and explanations of what they mean.</para> + +<sect2><title>CONFIG notices</title> + +<para>These entries are pretty straightforward. They are informative +messages about your configuration.</para> + +<para>Here are some typical entries that you will probably run into in +your logs: + +<screen> +CONFIG main: local node id = 1 +CONFIG main: loading current cluster configuration +CONFIG storeNode: no_id=3 no_comment='Node 3' +CONFIG storePath: pa_server=5 pa_client=1 pa_conninfo="host=127.0.0.1 dbname=foo user=postgres port=6132" pa_connretry=10 +CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3 +CONFIG storeSet: set_id=1 set_origin=1 set_comment='Set 1' +CONFIG main: configuration complete - starting threads +</screen></para></sect2> + +<sect2><title>DEBUG Notices</title> + +<para>Debug notices are always prefaced by the name of the thread that +the notice originates from. You will see messages from the following +threads: + +<variablelist> +<varlistentry><term>localListenThread</term> + +<listitem><para> This is the local thread that listens for events on +the local node.</para></listitem></varlistentry> + +<varlistentry><term>remoteWorkerThread-X</term> + +<listitem><para> The thread processing remote events. You can expect +to see one of these for each node that this node communicates +with.</para></listitem></varlistentry> + +<varlistentry><term>remoteListenThread-X</term> + +<listitem><para>Listens for events on a remote node database. You may +expect to see one of these for each node in the +cluster.</para></listitem></varlistentry> + +<varlistentry><term>cleanupThread</term> <listitem><para> Takes care +of things like vacuuming, cleaning out the confirm and event tables, +and deleting old data.</para></listitem></varlistentry> + +<varlistentry><term>syncThread</term> <listitem><para> Generates SYNC +events.</para></listitem></varlistentry> + +</variablelist> +</para> +</sect2> + +<sect2> <title> How to read &slony1; logs </title> + +<para> Note that as far as slon is concerned, there is no +<quote>master</quote> or <quote>slave.</quote> They are just +nodes. </para> + +<para>What you can expect, initially, is to see, on both nodes, some +events propagating back and forth. Firstly, there should be some +events published to indicate creation of the nodes and paths. If you +don't see those, then the nodes aren't properly communicating with one +another, and nothing else will happen... </para> + +<itemizedlist> + +<listitem><para>Create the two nodes.</para> + +<para> No slons are running yet, so there are no logs to look +at.</para> + +</listitem> + +<listitem><para> Start the two slons</para> + +<para> The logs for each will start out very quiet, as neither node +has much to say, and neither node knows how to talk to another node. +Each node will periodically generate a <command>SYNC</command> event, +but recognize <emphasis>nothing</emphasis> about what is going on on +other nodes.</para> + +</listitem> + +<listitem><para> Do the <xref linkend="stmtstorepath"> to set up +communications paths. That will allow the nodes to start to become +aware of one another.</para> + +<para> The slon logs should now start to receive events from +<quote>foreign</quote> nodes.</para> + +<para> In version 1.0, <xref linkend="table.sl-listen"> is not set up +automatically, so things still remain quiet until you explicitly +submit <command>STORE LISTEN</command> requests. In version 1.1, the +<quote>listen paths</quote> are set up automatically, which will much +more quickly get the communications network up and running. </para> + +<para> If you look at the contents of the tables <xref +linkend="table.sl-node"> and <xref linkend="table.sl-path"> and <xref +linkend="table.sl-listen">, on each node, that should give a good idea +as to where things stand. Until the <xref linkend="slon"> starts, +each node may only be partly configured. If there are two nodes, +there should be two entries in all three of these tables once the +communications configuration is set up properly. If there are fewer +entries than that, well, that should give you some idea of what is +missing.</para> +</listitem> + +<listitem><para> If needed (<emphasis>e.g.</emphasis> - before version +1.1), submit <xref linkend="stmtstorelisten"> requests to indicate how +the nodes will use the communications paths. </para> + +<para> Once this has been done, the nodes' logs should show a greater +level of activity, with events periodically being initiated on one +node or the other, and propagating to the other. </para> +</listitem> + +<listitem> <para> You'll set up the set +(<xref linkend="stmtcreateset">), add tables +(<xref linkend="stmtsetaddtable">), and sequences +(<xref linkend="stmtsetaddsequence">), and will see relevant +events <xref linkend="logaddobjects"> only in the logs for the origin +node for the set. </para></listitem> + +<listitem><para> Then, when you submit the <xref +linkend="stmtsubscribeset"> request, the event should go to both +nodes. </para> + +<para> The origin node has little more to do, after that... The +subscriber will then have a <command>COPY_SET</command> event, which +will lead to logging information about adding each table and copying +its data. See <xref linkend="logsubtime"> for more +details.</para></listitem> + +</itemizedlist> + +<para>After that, you'll mainly see two sorts of behaviour:</para> + +<itemizedlist> + +<listitem><para> On the origin, there won't be too terribly much +logged, just indication that some <command>SYNC</command> events are +being generated and confirmed by other nodes. +See <xref linkend="lognormalsync"> to see the sorts of log entries to +expect.</para></listitem> + +<listitem><para> On the subscriber, there will be reports of +<command>SYNC</command> events, and that the subscriber pulls data +from the provider for the relevant set(s). This will happen +infrequently if there are no updates going to the origin node; it will +happen frequently when the origin sees heavy updates. </para> +</listitem> + +</itemizedlist> + +</sect2> + +<sect2> <title> Log Messages and Implications </title> + +<para> This section lists numerous of the error messages found in +&slony1;, along with a brief explanation of implications. It is a +fairly well comprehensive list, leaving out mostly some of +the <command>DEBUG4</command> messages that are generally +uninteresting.</para> + +<sect3 id="logshiplog"><title> Log Messages Associated with Log +Shipping <title> + +<para> Most of these represent errors that come up if +the <xref linkend="logshipping"> functionality breaks. You may expect +things to break if the filesystem being used for log shipping fills, +or if permissions on that directory are wrongly set. </para> + +<itemizedlist> +<listitem><para><command>ERROR: remoteWorkerThread_%d: log archive failed %s - %s\n</command> </para> + +<para> This indicates that an error was encountered trying to write a +log shipping file. Normally the &lslon; will retry, and hopefully +succeed. </para> </listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: writing archive log...</command></para> + +<para> This indicates that a log shipping archive log is being written for a particular <command>SYNC</command> set. </para></listitem> + +<listitem><para><command>INFO: remoteWorkerThread_%d: Run Archive Command %s</command></para> + +<para> If &lslon; has been configured (<option>-x</option> +aka <envar>command_on_logarchive</envar>) to run a command after +generating each log shipping archive log, this reports when that +process is spawned using <function>system()</function>. </para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not open +COPY SET archive file %s - %s</command></para> + +<para> Seems pretty self-explanatory... </para></listitem> +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not generate COPY SET archive header %s - %s</command></para> + +<para> Probably means that we just filled up a filesystem... </para></listitem> +</itemizedlist> +</sect3> + +<sect3 id="ddllogs"><title> Log Messages - DDL scripts </title> + +<para> The handling of DDL is somewhat fragile, as described +in <xref linkend="ddlchanges">; here are both informational and error +messages that may occur in the progress of +an <xref linkend="stmtddlscript"> request.</para> + +<itemizedlist> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: DDL preparation +failed - set %d - only on node %</command></para> + +<para> Something broke when applying a DDL script on one of the nodes. +This is quite likely indicates that the node's schema differed from +that on the origin; you may need to apply a change manually to the +node to allow the event to proceed. The scary, scary alternative +might be to delete the offending event, assuming it can't possibly +work...</para> </listitem> +<listitem><para><command>SLON_CONFIG: remoteWorkerThread_%d: DDL request with %d statements</command></para> +<para> This is informational, indicating how many SQL statements were processed. </listitem> +<listitem><para><command>SLON_ERROR: remoteWorkerThread_%d: DDL had invalid number of statements - %d</command></para> + +<para> Occurs if there were < 0 statements (which should be impossible) or > MAXSTATEMENTS statements. Probably the script was bad...</listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: malloc() +failure in DDL_SCRIPT - could not allocate %d bytes of +memory</command></para> + +<para> This should only occur if you submit some extraordinarily large +DDL script that makes a &lslon; run out of memory </para></listitem> + +<listitem><para><command>CONFIG: remoteWorkerThread_%d: DDL Statement %d: [%s]</command></para> + +<para> This lists each DDL statement as it is submitted. </listitem> +<listitem><para><command>ERROR: DDL Statement failed - %s</command></para> + +<para> Oh, dear, one of the DDL statements that worked on the origin +failed on this remote node... </para></listitem> + +<listitem><para><command>CONFIG: DDL Statement success - %s</command></para> + +<para> All's well... </para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not generate DDL archive tracker %s - %s</command></para> + +<para> Apparently the DDL script couldn't be written to a log shipping file... </para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not submit DDL script %s - %s</command></para> + +<para>Couldn't write the script to a log shipping file.</para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not close DDL script %s - %s</command></para> + +<para>Couldn't close a log shipping file for a DDL script.</para> </listitem> +<listitem><para><command>ERROR: Slony-I ddlScript_prepare(): set % not found</command></para> + +<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem> + +<listitem><para><command>ERROR: Slony-I ddlScript_prepare_int(): set % not found</command></para> + +<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem> + +<listitem><para><command>ERROR: Slony-I: alterTableForReplication(): Table with id % not found</command></para> +<para> Apparently the table wasn't found; could the schema be messed up? </para> </listitem> + +<listitem><para><command>ERROR: Slony-I: alterTableForReplication(): Table % is already in altered state</command></para> + +<para> Curious... We're trying to set a table up for replication +a <emphasis>second</emphasis> time? </para> </listitem> + +<listitem><para><command>ERROR: Slony-I: alterTableRestore(): Table with id % not found</command></para> + +<para> This runs when a table is being restored to <quote>non-replicated</quote> state; apparently the replicated table wasn't found.</para></listitem> + +<listitem><para><command>ERROR: Slony-I: alterTableRestore(): Table % is not in altered state</command></para> + +<para> Hmm. The table isn't in altered replicated state. That shouldn't be, if replication had been working properly...</para> </listitem> +<listitem><para><command>NOTICE: Slony-I: alterTableForReplication(): multiple instances of trigger % on table %'',</command></para> + +<para> This normally happens if you have a table that had a trigger attached to it that replication hid due to this being a subscriber node, and then you added a trigger by the same name back to replication. Now, when trying to "fix up" triggers, those two triggers conflict. </para> + +<para> The DDL script will keep running and rerunning, or the UNINSTALL NODE will keep failing, until you drop the <quote>visible</quote> trigger, by hand, much as you must have added it, by hand, earlier. </para> +</listitem> +<listitem><para><command>ERROR: Slony-I: Unable to disable triggers</command></para> +<para> This is the error that follows the <quote>multiple triggers</quote> problem. </para></listitem> +</itemizedlist> +</sect3> + +<sect3><title> Threading Issues </title> + +<para> There should not be any <quote>user-serviceable</quote> aspects +to the &slony1; threading model; each &lslon; creates a quite +well-specified set of helper threads to manage the various database +connections that it requires. The only way that anything should break +on the threading side is if you have not compiled &postgres; libraries +to <quote>play well</quote> with threading, in which case you will be +unable to compile &slony1; in the first place. </para> + +<itemizedlist> +<listitem><para><command>FATAL: remoteWorkerThread_%d: pthread_create() - %s</command></para> + +<para> Couldn't create a new remote worker thread. </para> +</listitem> +<listitem><para><command>DEBUG1 remoteWorkerThread_%d: helper thread for provider %d created</command></para> + +<para> This normally happens when the &lslon; starts: a thread is created for each node to which the local node should be listening for events.</para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: helper thread for provider %d terminated </command></para> + +<para> If subscriptions reshape such that a node no longer provides a +subscription, then the thread that works on that node can be +dropped. </para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: disconnecting +from data provider %d </command></para> + +<para> A no-longer-used data provider may be dropped; if connection +information is changed, the &lslon; needs to disconnect and +reconnect. </para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: ignore new events due to shutdown</command></para> + +<para> If the &lslon; is shutting down, it is futile to process more events</listitem> +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: node %d - no worker thread</command></para> + +<para> Curious: we can't wake up the worker thread; there probably +should already be one... </para></listitem> + +</itemizedlist> +</sect3> + +<sect3 id="logsubtime"><title> Log Entries At Subscription Time </title> + +<para> Subscription time is quite a special time in &slony1;. If you +have a large amount of data to be copied to subscribers, this may take +a considerable period of time. &slony1; logs a fairly considerable +amount of information about its progress, which is sure to be useful +to the gentle reader. In particular, it generates log output every +time it starts and finishes copying data for a given table as well as +when it completes reindexing the table. That may not make a 28 hour +subscription go any faster, but at least helps you have some idea of +how it is progressing. </para> + +<itemizedlist> +<listitem><para><command>DEBUG1: copy_set %d</command></para> + +<para> This indicates the beginning of copying data for a new subscription. </para></listitem> +<listitem><para><command>ERROR: remoteWorkerThread_%d: set %d not found in runtime configuration </command></para> + +<para> &lslon; tried starting up a subscription; it couldn't find conninfo for the data source. Perhaps paths are not properly propagated?</para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: node %d has no pa_conninfo </command></para> + +<para> Apparently the conninfo configuration +was <emphasis>wrong</emphasis>...</para></listitem> + +<listitem><para><command>ERROR: copy set %d cannot connect to provider DB node %d </command></para> + +<para> &lslon; couldn't connect to the provider. Is the conninfo +wrong? Or perhaps authentication is misconfigured? Or perhaps the +database is down?</para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: connected to provider DB </command></para> + +<para> Excellent: the copy set has a connection to its provider</para></listitem> +<listitem><para><command>ERROR: Slony-I: sequenceSetValue(): sequence % not found</command></para> +<para> Curious; the sequence object is missing. Could someone have dropped it from the schema by hand (e.g. - not using <xref linkend="stmtddlscript">)?</listitem> + +<listitem><para><command>ERROR: Slony-I: subscribeSet() must be called on provider</command></para> +<para> This function should only get called on the provider node. &lslonik; normally handles this right, unless one had wrong DSNs in a &lslonik; script...</para> +</listitem> + +<listitem><para><command>ERROR: Slony-I: subscribeSet(): set % not found</command></para> +<para> Hmm. The provider node isn't aware of this set. Wrong parms to a &lslonik; script?</para> </listitem> + +<listitem><para><command>ERROR: Slony-I: subscribeSet(): set origin and receiver cannot be identical</command></para> +<para> Duh, an origin node can't subscribe to itself.</para> </listitem> + +<listitem><para><command>ERROR: Slony-I: subscribeSet(): set provider and receiver cannot be identical</command></para> +<para> A receiver must subscribe to a <emphasis>different</emphasis> node...</para> </listitem> +<listitem><para><command>Slony-I: subscribeSet(): provider % is not an active forwarding node for replication set %</command></para> + +<para> You can only use a live, active, forwarding provider as a data +source. </para></listitem> + +<listitem><para>Slony-I: subscribeSet_int(): set % is not active, cannot change provider<command></command></para> +<para> You can't change the provider just yet...</para></listitem> +<listitem><para><command>Slony-I: subscribeSet_int(): set % not found</command></para> +<para> This node isn't aware of the set... Perhaps you submitted wrong parms?</para></listitem> +<listitem><para><command>Slony-I: unsubscribeSet() must be called on receiver</command></para> +<para> Seems obvious... This probably indicates a bad &lslonik; admin DSN...</para></listitem> +<listitem><para><command>Slony-I: Cannot unsubscribe set % while being provider</command></para> + +<para> This should seem obvious; <xref linkend="stmtunsubscribeset"> will fail if a node has dependent subscribers for which it is the provider </para> </listitem> + +<listitem><para><command>Slony-I: cleanupEvent(): Single node - deleting events < %</command></para> +<para> If there's only one node, the cleanup event will delete old events so that you don't get <quote>build-up of crud.</quote></para></listitem> +<listitem><para><command>Slony-I: tableAddKey(): table % not found</command></para> +<para> Perhaps you didn't copy the schema over properly?</para></listitem> +<listitem><para><command>Slony-I: tableDropKey(): table with ID% not found</command></para> +<para> Seems curious; you were presumably replicating to this table, so for this to be gone seems rather odd...</para></listitem> +<listitem><para><command>Slony-I: determineIdxnameUnique(): table % not found</command></para> + +<para>Did you properly copy over the schema to a new node???</para></listitem> +<listitem><para><command>Slony-I: table % has no primary key</command></para> + +<para> This likely signifies a bad loading of schema...</para></listitem> + +<listitem><para><command>Slony-I: table % has no unique index %</command></para> + +<para> This likely signifies a bad loading of schema...</para></listitem> +<listitem><para><command>WARN: remoteWorkerThread_%d: transactions +earlier than XID %s are still in progress</command></para> + +<para> This indicates that some old transaction is in progress from before the earliest available <command>SYNC</command> on the provider. &slony1; cannot start replicating until that transaction completes. This will repeat until thetransaction completes... +</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: prepare to copy table %s </command></para> + +<para> This indicates that &lslon; is beginning preparations to set up subscription for a table.</para></listitem> +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: table %s will require Slony-I serial key</command></para> + +<para> Evidently this is a table defined with <xref linkend="stmttableaddkey"> where &slony1; has to add a surrogate primary key.</listitem> +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not lock table %s on subscriber</command></para> + +<para> For whatever reason, the table could not be locked, so the +subscription needs to be restarted. If the problem was something like +a deadlock, retrying may help. If the problem was otherwise, you may +need to intervene...</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: all tables for set %d found on subscriber</command></para> + +<para> An informational message indicating that the first pass through the tables found no problems... </para></listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy sequence %s</command></para> + +<para> Processing some sequence... </para></listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy table %s</command></para> + +<para> &lslon; is starting to copy a table... </para></listitem> +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: table %s Slony-I serial key added local</command></para> + +<para> Just added new column to the table to provide surrogate primary key.</listitem> +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: local table %s already has Slony-I serial key</command></para> + +<para> Did not need to add serial key; apparently it was already there.</listitem> +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: table %s does not require Slony-I serial key</command></para> + +<para> Apparently this table didn't require a special serial key... </para> +</listitem> +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: table %s Slony-I serial key added local</command></para> </listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: Begin COPY of table %s</command></para> + +<para> &lslon; is about to start the COPY on both sides to copy a table... </para></listitem> +<listitem><para><command>ERROR: remoteWorkerThread_%d: Could not generate copy_set request for %s - %s</command></para> + +<para> This indicates that the <command>delete/copy</command> requests +failed on the subscriber. The &lslon; will repeat +the <command>COPY_SET</command> attempt; it will probably continue to +fail.. </para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: copy to stdout on provider - %s %s</command></para> + +<para> Evidently something about the COPY to <filename>stdout</filename> on the provider node broke... The event will be retried... </para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: copy from stdin on local node - %s %s</command></para> + +<para> Evidently something about the COPY into the table on the +subscriber node broke... The event will be +retried... </para> </listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: %d bytes copied for table %s</command></para> + +<para> This message indicates that the COPY of the table has +completed. This is followed by running <command>ANALYZE</command> and +reindexing the table on the subscriber.</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: %.3f seconds +to copy table %s</command></para> + +<para> After this message, copying and reindexing and analyzing the table on the subscriber is complete.</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: set last_value of sequence %s (%s) to %s</command></para> + +<para> As should be no surprise, this indicates that a sequence has been processed on the subscriber.</para> +</listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: %.3 seconds to copy sequences</command></para> + +<para> Summarizing the time spent processing sequences in the <command>COPY_SET</command> event. </para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: query %s did not return a result </command></para> + +<para> This indicates that the query, as part of final processing of <command>COPY_SET</command>, failed. The copy will restart... </para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy_set no previous SYNC found, use enable event</command></para> + +<para> This takes place if no past SYNC event was found; the current +event gets set to the event point of +the <command>ENABLE_SUBSCRIPTION</command> event. +</para> </listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: copy_set SYNC found, use event seqno %s</command></para> + +<para> This takes place if a SYNC event was found; the current +event gets set as shown. </para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: sl_setsync entry for set %d not found on provider</command></para> + +<para> SYNC synchronization information was expected to be drawn from +an existing subscriber, but wasn't found. Something +replication-breakingly-bad has probably +happened... </para> </listitem> +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: could not insert to sl_setsync_offline</command></para> + +<para> Oh, dear. After setting up a subscriber, and getting pretty +well everything ready, some writes to a log shipping file failed. +Perhaps disk filled up... </para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: %.3f seconds to build initial setsync status</command></para> + +<para> Indicates the total time required to get the copy_set event finalized...</para> +</listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: disconnected from provider DB</command></para> + +<para> At the end of a subscribe set event, the subscriber's &lslon; +will disconnect from the provider, clearing out +connections... </listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: copy_set %d done in %.3f seconds</command></para> + +<para> Indicates the total time required to complete copy_set... This indicates a successful subscription!</para> +</listitem> +</itemizedlist> +</sect3> + +<sect3 id="lognormalsync"><title> Log Entries Associated With Normal SYNC activity </title> + +<para> Some of these messages indicate exceptions, but +the <quote>normal</quote> stuff represents what you should expect to +see most of the time when replication is just plain working.</para> + +<itemizedlist> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: forward confirm %d,%s received by %d</command></para> + +<para> These events should occur frequently and routinely as nodes report confirations of the events they receive. </para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: SYNC %d processing</command></para> + +<para> This indicates the start of processing of a <command>SYNC</command> </para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: No pa_conninfo +for data provider %d</command></para> + +<para> Oh dear, we haven't connection information to connect to the +data provider. That shouldn't be possible, +normally...</para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: cannot connect to data provider %d on 'dsn'</command></para> + +<para> Oh dear, we haven't got <emphasis>correct</emphasis> connection +information to connect to the data provider.</para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: connected to data provider %d on 'dsn'</command></para> + +<para> Excellent; the &lslon; has connected to the provider. </para> </listitem> + +<listitem><para><command>WARN: remoteWorkerThread_%d: don't know what ev_seqno node %d confirmed for ev_origin %d</command></para> + +<para> There's no confirmation information available for this node's provider; need to abort the <command>SYNC</command> and wait a bit in hopes that that information will emerge soon...</para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d: data provider %d only confirmed up to ev_seqno %d for ev_origin %d </command></para> + +<para> The provider for this node is a subscriber, and apparently that subscriber is a bit behind. The &lslon; will need to wait for the provider to catch up until it has <emphasis> new</emphasis> data. </para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: data provider %d confirmed up to ev_seqno %s for ev_origin %d - OK</command></para> + +<para> All's well; the provider should have the data that the subscriber needs...</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: syncing set %d with %d table(s) from provider %d</command></para> + +<para> This is declaring the plans for a <command>SYNC</command>: we have a set with some tables to process.</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: ssy_action_list value: %s length: %d</command></para> + +<para> This portion of the query to collect log data to be applied has been known to <quote>bloat up</quote>; this shows how it has gotten compressed... </para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: Didn't add OR to provider</command></para> + +<para> This indicates that there wasn't anything in a <quote>provider</quote> clause in the query to collect log data to be applied, which shouldn't be. Things are quite likely to go bad at this point... </para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: no sets need syncing for this event</command></para> + +<para>This will be the case for all <command>SYNC</command> events generated on nodes that are not originating replication sets. You can expect to see these messages reasonably frequently. </para></listitem> +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: activate helper %d</command></para> + +<para> We're about to kick off a thread to help process <command>SYNC</command> data...</para></listitem> + +<listitem><para><command>DEBUG4: remoteWorkerThread_%d: waiting for log data</command></para> + +<para> The thread is waiting to get data to consume (e.g. - apply to the replica).</para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: %s %s - qualification was %s </command></para> + +<para> Apparently an application of replication data to the subscriber failed... This quite likely indicates some sort of serious corruption.</para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: replication query did not affect one row (cmdTuples = %s) - query was: %s qualification was: %s</command></para> + +<para> If <envar> SLON_CHECK_CMDTUPLES</envar> is set, &lslon; applies +changes one tuple at a time, and verifies that each change affects +exactly one tuple. Apparently that wasn't the case here, which +suggests a corruption of replication. That's a rather bad +thing...</para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: SYNC aborted</command></para> + +<para> If any errors have been encountered that haven't already aborted the <command>SYNC</command>, this catches and aborts it.</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: new sl_rowid_seq value: %s</command></para> + +<para> This marks the progression of this internal &slony1; sequence. </para></listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: SYNC %d done in %.3f seconds</command></para> + +<para> This indicates the successful completion of a <command>SYNC</command>. Hurray!</para></listitem> + +<listitem><para><command>DEBUG1: remoteWorkerThread_%d_d:%.3f seconds delay for first row </command></para> + +<para> This indicates how long it took to get the first row from the LOG cursor that pulls in data from the sl_log tables.</para> </listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d_d: large log_cmddata for actionseq %s not found</command></para> + +<para> &lslon; could not find the data for one of the <quote>very large</quote> sl_log table tuples that are pulled individually. This shouldn't happen.</para></listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d_d:%.3f seconds until close cursor </command></para> + +<para> This indicates how long it took to complete reading data from the LOG cursor that pulls in data from the sl_log tables.</para> </listitem> +<listitem><para><command>DEBUG2: remoteWorkerThread_%d_d: inserts=%d updates=%d deletes=%d </command></para> + +<para> This reports how much activity was recorded in the current <command>SYNC</command> set. </para> </listitem> + +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: compress_actionseq(list,subquery) Action list: %s</command></para> + +<para> This indicates a portion of the LOG cursor query that is about to be compressed. (In some cases, this could grow to <emphasis>enormous</emphasis> size, blowing up the query parser...) </para></listitem> +<listitem><para><command>DEBUG3: remoteWorkerThread_%d: compressed actionseq subquery %s</command></para> + +<para> This indicates what that subquery compressed into. </para></listitem> + +</itemizedlist> +</sect3> + +<sect3 id="logaddobjects"><title> Log Entries - Adding Objects to Sets </title> + +<para> These entries will be seen on an origin node's logs at the time +you are configuring a replication set; some of them will be seen on +subscribers at subscription time. </para> + +<itemizedlist> +<listitem><para><command>ERROR: Slony-I: setAddTable_int(): table % has no index %</command></para> + +<para> Apparently a PK index was specified that is absent on this node...</para></listitem> +<listitem><para><command>ERROR: Slony-I setAddTable_int(): table % not found</command></para> + +<para> Table wasn't found on this node; did you load the schema in properly?. </para></listitem> +<listitem><para><command>ERROR: Slony-I setAddTable_int(): table % is not a regular table</command></para> + +<para> You tried to replicate something that isn't a table; you can't do that! </para></listitem> +<listitem><para><command>NOTICE: Slony-I setAddTable_int(): table % PK column % nullable</command></para> + +<para> You tried to replicate a table where one of the columns in the would-be primary key is allowed to be null. All PK columns must be <command>NOT NULL.</command> This request is about to fail. </para></listitem> +<listitem><para><command>ERROR: Slony-I setAddTable_int(): table % not replicable!</command></para> + +<para> +This happens because of the NULLable PK column. </para></listitem> +<listitem><para><command>ERROR: Slony-I setAddTable_int(): table id % has already been assigned!</command></para> + +<para> The table ID value needs to be assigned uniquely +in <xref linkend="stmtsetaddtable">; apparently you requested a value +already in use. +</para></listitem> +<listitem><para><command>ERROR: Slony-I setAddSequence(): set % not found</command></para> +<para> Apparently the set you requested is not available...</para></listitem> + +<listitem><para><command>ERROR: Slony-I setAddSequence(): set % has remote origin</command></para> +<para> You may only add things at the origin node.</para></listitem> + +<listitem><para><command>ERROR: Slony-I setAddSequence(): cannot add sequence to currently subscribed set %</command></para> +<para> Apparently the set you requested has already been subscribed. You cannot add tables/sequences to an already-subscribed set. You will need to create a new set, add the objects to that new set, and set up subscriptions to that.</para></listitem> +<listitem><para><command>ERROR: Slony-I setAddSequence_int(): set % not found</command></para> +<para> Apparently the set you requested is not available...</para></listitem> + +<listitem><para><command>ERROR: Slony-I setAddSequence_int(): sequence % not found</command></para> +<para> Apparently the sequence you requested is not available on this node. How did you set up the schemas on the subscribers???</para></listitem> + +<listitem><para><command>ERROR: Slony-I setAddSequence_int(): % is not a sequence</command></para> +<para> Seems pretty obvious :-).</para></listitem> + +<listitem><para><command>ERROR: Slony-I setAddSequence_int(): sequence ID % has already been assigned</command></para> +<para> Each sequence ID added must be unique; apparently you have reused an ID.</para></listitem> + +</itemizedlist> +</sect3> + +<sect3><title> Logging When Moving Objects Between Sets </title> + +<itemizedlist> +<listitem><para><command>ERROR: Slony-I setMoveTable_int(): table % not found</command></para> + +<para> Table wasn't found on this node; you probably gave the wrong ID number... </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveTable_int(): set ids cannot be identical</command></para> + +<para> Does it make sense to move a table from a set into the very same set? </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveTable_int(): set % not found</command></para> + +<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveTable_int(): set % does not originate on local node</command></para> + +<para> Set wasn't found to have origin on this node; you probably gave the wrong EVENT NODE... </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveTable_int(): subscriber lists of set % and % are different</command></para> + +<para> You can only move objects between sets that have identical subscriber lists. </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): sequence % not found</command></para> + +<para> Sequence wasn't found on this node; you probably gave the wrong ID number... </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): set ids cannot be identical</command></para> + +<para> Does it make sense to move a sequence from a set into the very same set? </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): set % not found</command></para> + +<para> Set wasn't found on this node; you probably gave the wrong ID number... </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): set % does not originate on local node</command></para> + +<para> Set wasn't found to have origin on this node; you probably gave the wrong EVENT NODE... </para></listitem> + +<listitem><para><command>ERROR: Slony-I setMoveSequence_int(): subscriber lists of set % and % are different</command></para> + +<para> You can only move objects between sets that have identical subscriber lists. </para></listitem> +</itemizedlist> +</sect3> + +<sect3><title> Issues with Dropping Objects </title> + +<itemizedlist> +<listitem><para><command>ERROR: Slony-I setDropTable(): table % not found</command></para> + +<para> Table wasn't found on this node; are you sure you had the ID right? </para></listitem> +<listitem><para><command>ERROR: Slony-I setDropTable(): set % not found</command></para> + +<para> The replication set wasn't found on this node; are you sure you had the ID right? </para></listitem> +<listitem><para><command>ERROR: Slony-I setDropTable(): set % has remote origin</command></para> + +<para> The replication set doesn't originate on this node; you probably need to specify an <command>EVENT NODE</command> in the <xref linkend="stmtsetdroptable"> command. </para></listitem> + +<listitem><para><command>ERROR: Slony-I setDropSequence_int(): sequence % not found</command></para> +<para> Could this sequence be in another set?</para></listitem> + +<listitem><para><command>ERROR: Slony-I setDropSequence_int(): set % not found</command></para> +<para> Could you have gotten the set ID wrong?</para></listitem> + +<listitem><para><command>ERROR: Slony-I setDropSequence_int(): set % has origin at another node - submit this to that node</command></para> + +<para> This message seems fairly self-explanatory...</para></listitem> + +</itemizedlist> +</sect3> + +<sect3><title> Issues with MOVE SET, FAILOVER, DROP NODE </title> + +<para> Many of these errors will occur if you submit a &lslonik; +script that describes a reconfiguration incompatible with your +cluster's current configuration. Those will lead to the +feeling: <quote>Whew, I'm glad &lslonik; caught that for me!</quote> </quote> + +<para> Some of the others lead to a &lslon; telling itself to fall +over; all <emphasis>should</emphasis> be well when you restart it, as +it will read in the revised, newly-correct configuration when it +starts up.</para> + +<para> Alas, a few indicate that <quote>something bad +happened,</quote> for which the resolution may not necessarily be +easy. Nobody said that replication was easy, alas...</para> + +<itemizedlist> +<listitem><para><command>ERROR: Slony-I: DROP_NODE cannot initiate on the dropped node</command></para> + +<para> You need to have an EVENT NODE other than the node that is to be dropped.... </para></listitem> + +<listitem><para><command>ERROR: Slony-I: Node % is still configured as a data provider</command></para> + +<para> You cannot drop a node that is in use as a data provider; you +need to reshape subscriptions so no nodes are dependent on it first. </para></listitem> +<listitem><para><command>ERROR: Slony-I: Node % is still origin of one or more sets</command></para> + +<para> You can't drop a node if it is the origin for a set! Use <xref linkend="stmtmoveset"> or <xref linkend="stmtfailover"> first. </para></listitem> + +<listitem><para><command>ERROR: Slony-I: cannot failover - node % has no path to the backup node</command></para> + +<para> You cannot failover to a node that isn't connected to all the subscribers, at least indirectly. </para></listitem> +<listitem><para><command>ERROR: Slony-I: cannot failover - node % is not subscribed to set %</command></para> + +<para> You can't failover to a node that doesn't subscribe to all the relevant sets. </para></listitem> + +<listitem><para><command>ERROR: Slony-I: cannot failover - subscription for set % is not active</command></para> <para> If the subscription has been set up, but isn't yet active, that's still no good.</para></listitem> + +<listitem><para><command>ERROR: Slony-I: cannot failover - node % is not a forwarder of set %</command></para> + +<para> You can only failover or move a set to a node that has +forwarding turned on. </para></listitem> + +<listitem><para><command>NOTICE: failedNode: set % has no other direct receivers - move now</command></para> + +<para> If the backup node is the only direct subscriber, then life is a bit simplified... No need to reshape any subscriptions!</para></listitem> +<listitem><para><command>NOTICE: failedNode set % has other direct receivers - change providers only</command></para> +<para> In this case, all direct subscribers are pointed to the backup node, and the backup node is pointed to receive from another node so it can get caught up.</para></listitem> +<listitem><para><command>NOTICE: Slony-I: Please drop schema _ at CLUSTERNAME@</command></para> + +<para> A node has been uninstalled; you may need to drop the schema... </para></listitem> + + +</itemizedlist> +</sect3> + +<sect3><title> Log Switching </title> + +<para> These messages relate to the new-in-1.2 facility whereby +&slony1; periodically switches back and forth between storing data +in <envar>sl_log_1</envar> and <envar>sl_log_2</envar>.</para> + +<itemizedlist> +<listitem><para><command>Slony-I: Logswitch to sl_log_2 initiated'</command></para> +<para> Indicates that &lslon; is in the process of switching over to this log table.</para></listitem> +<listitem><para><command>Slony-I: Logswitch to sl_log_1 initiated'</command></para> +<para> Indicates that &lslon; is in the process of switching over to this log table.</para></listitem> +<listitem><para><command>Previous logswitch still in progress</command></para> + +<para> An attempt was made to do a log switch while one was in progress...</para></listitem> + +<listitem><para><command>ERROR: remoteWorkerThread_%d: cannot determine current log status</command></para> + +<para> The attempt to read from sl_log_status, which determines +whether we're working on <envar>sl_log_1</envar> +or <envar>sl_log_2</envar> got no results; that can't be a good thing, +as there certainly should be data here... Replication is likely about +to halt...</para> </listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: current local log_status is %d</command></para> +<para> This indicates which of <envar>sl_log_1</envar> and <envar>sl_log_2</envar> are being used to store replication data. </para> +</listitem> + +</itemizedlist> +</sect3> +<sect3><title> Miscellanea </title> + +<para> Perhaps these messages should be categorized further; that +remains a task for the documentors.</para> + +<itemizedlist> + +<listitem><para><command>ERROR: Slonik version: @MODULEVERSION@ != Slony-I version in PG build %</command></para> + +<para> This is raised in <function>checkmoduleversion()</function> if there is a mismatch between the version of &slony1; as reported by &lslonik; versus what the &postgres; build has.</para></listitem> +<listitem><para><command>ERROR: Slony-I: registry key % is not an int4 value</command></para> + +<para> Raised in <function>registry_get_int4()</function>, this complains if a requested value turns out to be NULL.</listitem> +<listitem><para><command>ERROR: registry key % is not a text value</command></para> + +<para> Raised in <function>registry_get_text()</function>, this complains if a requested value turns out to be NULL.</listitem> +<listitem><para><command>ERROR: registry key % is not a timestamp value</command></para> + +<para> Raised in <function>registry_get_timestamp()</function>, this complains if a requested value turns out to be NULL.</listitem> +<listitem><para><command>NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=%</command></para> + +<para> This will occur when a &lslon; starts up after another has crashed; this is routine cleanup.</para></listitem> +<listitem><para><command>ERROR: Slony-I: This node is already initialized</command></para> + +<para> This would typically happen if you submit <xref linkend="stmtstorenode"> gainst a node that has already been set up with the &slony1; schema. </para></listitem> +<listitem><para><command>ERROR: Slony-I: node % not found</command></para> + +<para> An attempt to mark a node not listed locally as enabled should fail...</para></listitem> +<listitem><para><command>ERROR: Slony-I: node % is already active</command></para> + +<para> An attempt to mark a node already marked as active as active should fail...</para></listitem> +<listitem><para><command>DEBUG4: remoteWorkerThread_%d: added active set %d to provider %d</command></para> + +<para> Indicates that this set is being provided by this +provider.</para></listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: event %d +ignored - unknown origin</command></para> + +<para> Probably happens if events arrive before +the <command>STORE_NODE</command> event that tells that the new node +now exists... </para></listitem> + +<listitem><para><command>WARN: remoteWorkerThread_%d: event %d ignored - origin inactive</command></para> + +<para> This shouldn't occur now (2006) as we don't support the notion +of deactivating a node... </para> +</listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: event %d ignored - duplicate </command></para> + +<para> This might be expected to happen if the event notification +comes in concurrently from two sources... </para> </listitem> + +<listitem><para><command>DEBUG2: remoteWorkerThread_%d: unknown node %d</command></para> + +<para> Happens if the &lslon; is unaware of this node; probably a sign +of <command>STORE_NODE</command> requests not +propagating... </para> </listitem> +</itemizedlist> +</sect3> +</sect2> +</sect1> +<!-- Keep this comment at the end of the file +Local variables: +mode: sgml +sgml-omittag:nil +sgml-shorttag:t +sgml-minimize-attributes:nil +sgml-always-quote-attributes:t +sgml-indent-step:1 +sgml-indent-data:t +sgml-parent-document:"book.sgml" +sgml-exposed-tags:nil +sgml-local-catalogs:"/usr/lib/sgml/catalog" +sgml-local-ecat-files:nil +End: +-->
- Previous message: [Slony1-commit] By cbbrowne: Reorganized log documentation, and linked it into various
- Next message: [Slony1-commit] By cbbrowne: Add in document on how to perform Slony-I version upgrade
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-commit mailing list