Christopher Browne cbbrowne
Tue Sep 12 13:54:11 PDT 2006
Marcin Mank wrote:
>> There surely should be some better way, such as finding which specific
>> tuples are problematic, and updating them on the source database.
>>
>> A thought...  You might do two dumps:
>>
>> 1.  Raw, no conversion using iconv
>>
>> 2.  Another, which converts using iconv [generate this by passing file #1
>>     thru iconv...]
>
> I am just struggling with this issue, my solution:
>
> CREATE OR REPLACE FUNCTION utf8_encode(text)
>   RETURNS text AS
> $BODY$
> ($s)=@_;
>  utf8::encode($s);
> return $s;
> $BODY$
>   LANGUAGE 'plperlu' IMMUTABLE;
>
>
>
> and now :
>
> foreach suspect table {
>     update table set field=utf8_encode(field) where field <> utf8_encode(field)
> }
>
> kinda slow, but might be good enough.
>
> Greetings
> Marcin
>   

That is likely to break when the field can be NULL, right? After all,
NULL <> NULL yields NULL, not true...
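A quick way to see the three-valued logic at work is the sketch below. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption on my part, though SQLite follows the same SQL rule that comparing anything with NULL yields NULL. The `utf8_encode` here is a hypothetical Python mirror of the PL/Perl function above: it turns each character into its UTF-8 octets, so pure ASCII passes through unchanged.

```python
import sqlite3

def utf8_encode(s):
    """Hypothetical Python mirror of the plperlu utf8_encode() above.

    Each character becomes its UTF-8 byte sequence, returned as a string of
    octets (one char per byte), similar to Perl's utf8::encode.
    NULL (None) is passed straight through as NULL.
    """
    if s is None:
        return None
    return s.encode("utf-8").decode("latin-1")

conn = sqlite3.connect(":memory:")
conn.create_function("utf8_encode", 1, utf8_encode)

# Comparing NULL with anything, even another NULL, yields NULL, not true.
null_cmp = conn.execute("SELECT NULL <> NULL").fetchone()[0]
print(null_cmp)  # None

conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany("INSERT INTO t (field) VALUES (?)",
                 [("plain ascii",), ("caf\u00e9",), (None,)])

# A WHERE clause that evaluates to NULL does not select the row,
# so only the row whose text actually changes is matched.
matched = [row[0] for row in conn.execute(
    "SELECT id FROM t WHERE field <> utf8_encode(field) ORDER BY id")]
print(matched)  # [2]
```

The NULL row drops out of the result in the same way as the pure-ASCII row does.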

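You could possibly get a bit better performance by converting all of a
table's suspect columns in one UPDATE, rather than scanning the table
once per column: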
  foreach suspect table {
    update table set f1 = utf8_encode(f1), f2 = utf8_encode(f2), ...,
                     fn = utf8_encode(fn)
    where
       (f1 <> utf8_encode(f1) and f1 is not null) or
       (f2 <> utf8_encode(f2) and f2 is not null) or ...
       (fn <> utf8_encode(fn) and fn is not null);
  }
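The single-pass variant can be sketched the same way. Again this assumes sqlite3 as a stand-in for PostgreSQL, a hypothetical Python `utf8_encode` mirroring the PL/Perl one, and a hypothetical two-column table `t(f1, f2)` in place of the real f1 ... fn. The `is not null` guards from the pseudocode are kept as written.

```python
import sqlite3

def utf8_encode(s):
    # Hypothetical mirror of the plperlu function: characters -> UTF-8 octets.
    if s is None:
        return None
    return s.encode("utf-8").decode("latin-1")

conn = sqlite3.connect(":memory:")
conn.create_function("utf8_encode", 1, utf8_encode)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT)")
conn.executemany("INSERT INTO t (f1, f2) VALUES (?, ?)", [
    ("ascii", "ascii"),      # nothing to convert
    ("caf\u00e9", None),     # f1 needs converting, f2 is NULL
    (None, "na\u00efve"),    # f1 is NULL, f2 needs converting
])

# One UPDATE per table handles every suspect column at once,
# instead of one full pass over the table for each column.
cur = conn.execute("""
    UPDATE t
       SET f1 = utf8_encode(f1),
           f2 = utf8_encode(f2)
     WHERE (f1 <> utf8_encode(f1) AND f1 IS NOT NULL)
        OR (f2 <> utf8_encode(f2) AND f2 IS NOT NULL)
""")
updated = cur.rowcount
rows = conn.execute("SELECT f1, f2 FROM t ORDER BY id").fetchall()
print(updated)  # 2
print(rows)
```

One caveat of this sketch: the WHERE clause cannot tell already-converted text from unconverted text, so running it a second time would encode the non-ASCII values again.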

By the way, we have *some* Unicode tests in the Slony-I test suite.
Does anyone have a comprehensive way of generating large swathes of
the Unicode character set? I'd like to have something that's more
nearly complete.
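One brute-force possibility is simply to enumerate every Unicode scalar value and chop the sequence into test strings. This is a sketch under assumptions, not something from the Slony-I test suite: `unicode_chunks` and its `size` parameter are hypothetical, surrogates (U+D800 to U+DFFF) are skipped because they cannot be encoded as UTF-8, and U+0000 is skipped because PostgreSQL text values cannot contain NUL.

```python
def unicode_chunks(size=64):
    """Yield strings covering every Unicode scalar value except U+0000.

    Hypothetical helper: surrogates are excluded (not encodable as UTF-8),
    and NUL is excluded because PostgreSQL text cannot store it.
    """
    chunk = []
    for cp in range(1, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        chunk.append(chr(cp))
        if len(chunk) == size:
            yield "".join(chunk)
            chunk = []
    if chunk:
        yield "".join(chunk)

chunks = list(unicode_chunks())
total = sum(len(c) for c in chunks)
print(len(chunks), total)

# Every chunk should round-trip through UTF-8 unchanged.
assert all(c.encode("utf-8").decode("utf-8") == c for c in chunks)
```

Each chunk could then become one row of test data; keeping chunks short makes it easier to pin down which code points a failing row contains.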



More information about the Slony1-general mailing list