CVS User Account cvsuser
Mon Dec 12 20:44:12 PST 2005
Log Message:
-----------
Add a discussion of assumptions about encodings, FAQ on Unicode problem
between PG 8.0 and 8.1

Modified Files:
--------------
    slony1-engine/doc/adminguide:
        faq.sgml (r1.48 -> r1.49)
        prerequisites.sgml (r1.23 -> r1.24)

-------------- next part --------------
Index: prerequisites.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/prerequisites.sgml,v
retrieving revision 1.23
retrieving revision 1.24
diff -Ldoc/adminguide/prerequisites.sgml -Ldoc/adminguide/prerequisites.sgml -u -w -r1.23 -r1.24
--- doc/adminguide/prerequisites.sgml
+++ doc/adminguide/prerequisites.sgml
@@ -109,6 +109,30 @@
 
 </sect2>
 
+<sect2 id="encoding">
+<title> Database Encoding </title>
+
+<para> &postgres; databases may be created in a number of language
+encodings, set up via the <command>createdb --encoding=$ENCODING
+databasename</command> option.  &slony1; assumes that the databases it
+replicates use <emphasis>identical</emphasis> encodings.
+</para>
+
+<para> If the encodings are <quote>closely equivalent</quote>, you may
+be able to get away with them not being absolutely identical.  For
+instance, if the origin system used <envar>LATIN1</envar> and a
+subscriber used <envar>SQL_ASCII</envar> and another subscriber used
+<envar>UNICODE</envar>, and your application never challenges the
+boundary conditions between these variant encodings, you may never
+experience any problems.  </para>
+
+<para> In &postgres; 8.1, changes were made to the
+<envar>UNICODE</envar> encoding because earlier versions accepted some
+invalid encodings.  This can lead to <link
+linkend="faqunicode">replication problems</link>.  </para>
+
+</sect2>
+
 <sect2 id="times">
 <title> Time Synchronization</title>
 
Index: faq.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/faq.sgml,v
retrieving revision 1.48
retrieving revision 1.49
diff -Ldoc/adminguide/faq.sgml -Ldoc/adminguide/faq.sgml -u -w -r1.48 -r1.49
--- doc/adminguide/faq.sgml
+++ doc/adminguide/faq.sgml
@@ -1760,6 +1760,52 @@
 when they sporadically have series' of very large tuples. </para>
 </answer>
 </qandaentry>
+
+<qandaentry id="faqunicode"> <question> <para> I am trying to replicate
+<envar>UNICODE</envar> data from &postgres; 8.0 to &postgres; 8.1, and
+am experiencing problems. </para>
+</question>
+
+<answer> <para> &postgres; 8.1 is considerably stricter than version
+8.0 about which UTF-8 mappings of Unicode characters it
+accepts.</para>
+
+<para> If you intend to use &slony1; to upgrade an older database to 8.1, and
+might have invalid UTF-8 values, you may be in for an unpleasant
+surprise.</para>
+
+<para> Let us suppose we have a database running 8.0, encoded in UTF-8.
+That database will accept the sequence <command>'\060\242'</command> as UTF-8 compliant,
+even though it is really not. </para>
+
+<para> If you replicate into a &postgres; 8.1 instance, it will complain
+about this in one of two ways.  At subscribe time, &slony1; will report
+an invalid Unicode sequence during the COPY of the data, which
+prevents the subscription from proceeding.  Alternatively, if the bad
+data is added later, replication will hang more or less
+irretrievably.  (You could hack on the contents of sl_log_1, but
+that quickly gets <emphasis>really</emphasis> unattractive...)</para>
+
+<para>There have been discussions as to what might be done about this.  No
+compelling strategy has yet emerged, as all are unattractive. </para>
+
+<para>If you are using Unicode with &postgres; 8.0, you run a
+considerable risk of corrupting data.  </para>
+
+<para> If you use replication for a one-time conversion, there is a risk of
+failure due to the issues mentioned earlier; if that happens, it
+appears likely that the best answer is to fix the data on the 8.0
+system, and retry. </para>
+
+<para> In view of the risks, replication between versions seems to be
+something you should not keep running any longer than is necessary to
+migrate to 8.1. </para>
+
+<para> For more details, see the <ulink url=
+"http://archives.postgresql.org/pgsql-hackers/2005-12/msg00181.php">
+discussion on the postgresql-hackers mailing list</ulink>.  </para>
+</answer>
+</qandaentry>
 </qandaset>
 
 <!-- Keep this comment at the end of the file Local variables:

