[Slony1-general] UTF-8 Data

Tue Sep 12 13:54:11 PDT 2006

Marcin Mank wrote:
>> There surely should be some better way, such as finding which specific
>> tuples are problematic, and updating them on the source database.
>>
>> A thought...  You might do two dumps:
>>
>> 1.  Raw, no conversion using iconv
>>
>> 2.  Another, which converts using iconv [generate this by passing file #1
>>     thru iconv...]
>
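
You can also skip the diff. Instead of comparing a raw dump with an iconv-converted copy, you can find the offending rows directly by trying to decode each line of the raw dump as UTF-8. Here is a minimal sketch in Python, chosen only for illustration. It assumes a plain-text pg_dump, where each COPY data row sits on one line (embedded newlines are written as \n), so a failing line number points at one problematic tuple. The name find_invalid_utf8_lines is made up for this example.

```python
def find_invalid_utf8_lines(dump: bytes):
    """Return (line_number, byte_offset) for every line that is not valid UTF-8.

    Assumes a plain-text dump with one COPY data row per line, so each hit
    identifies a single suspect tuple. Line numbers start at 1.
    """
    problems = []
    for line_no, line in enumerate(dump.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that broke decoding
            problems.append((line_no, err.start))
    return problems
```

For a real dump, open the file in binary mode and pass its contents in. A very large dump would be better scanned line by line than read in one go.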
> I am just struggling with this issue, my solution:
>
> CREATE OR REPLACE FUNCTION utf8_encode(text)
>   RETURNS text AS
> $BODY$
>   my ($s) = @_;
>   # convert the character string to UTF-8 octets, in place
>   utf8::encode($s);
>   return $s;
> $BODY$
>   LANGUAGE 'plperlu' IMMUTABLE;
>
>
>
> and now :
>
> foreach suspect table {
>     update table set field = utf8_encode(field) where field <> utf8_encode(field)
> }
>
> kinda slow, but might be good enough.
>
> Greetings
> Marcin
>   
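
Here is a rough Python model that shows what utf8_encode() does to a value. It assumes Perl receives the column value as a plain byte string without the UTF-8 flag. You would expect that in a SQL_ASCII database, but it is an assumption here. In that case utf8::encode treats every byte as a Latin-1 character and re-emits it as UTF-8. The names perl_utf8_encode and needs_update are hypothetical.

```python
def perl_utf8_encode(raw: bytes) -> bytes:
    """Model of utf8::encode on a byte string Perl has not flagged as UTF-8.

    Each byte is read as a Latin-1 code point and re-emitted as UTF-8.
    """
    return raw.decode("latin-1").encode("utf-8")


def needs_update(raw: bytes) -> bool:
    """Mirror of the WHERE clause: field <> utf8_encode(field)."""
    return raw != perl_utf8_encode(raw)


print(perl_utf8_encode(b"caf\xe9"))      # b'caf\xc3\xa9'
print(perl_utf8_encode(b"caf\xc3\xa9"))  # b'caf\xc3\x83\xc2\xa9' (double-encoded)
print(needs_update(b"plain ascii"))      # False
```

ASCII bytes map to themselves, so the field <> utf8_encode(field) test selects only values that contain a byte >= 0x80. The second example shows the catch: a value that is already valid UTF-8 gets encoded a second time. Under these assumptions, the update should be run once, and only against data known to be Latin-1.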

That is likely to break when the field can be NULL, right?  After all,
NULL <> NULL evaluates to NULL, not to true or false...

You could possibly get a bit better performance by handling all of the
columns in a single UPDATE, so that each table is scanned once and each
row is rewritten at most once, rather than once per column:
  foreach suspect table {
    update table set f1 = utf8_encode(f1), f2 = utf8_encode(f2), ...,
                     fn = utf8_encode(fn)
    where
       (f1 <> utf8_encode(f1) and f1 is not null) or
       (f2 <> utf8_encode(f2) and f2 is not null) or ...
       (fn <> utf8_encode(fn) and fn is not null);
  }
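
You can try the NULL behaviour and the combined UPDATE without a PostgreSQL server by using Python's built-in sqlite3 module as a stand-in. SQLite follows the same three-valued logic for <> and NULL, but it is not PostgreSQL, so treat this as an illustration only. The table suspect, its columns f1 and f2, and the Python utf8_encode() are all hypothetical; that last function models the Latin-1 to UTF-8 conversion on BLOB values. Returning None for None mimics what declaring the PostgreSQL function STRICT would do.

```python
import sqlite3


def utf8_encode(value):
    """Stand-in for the plperlu utf8_encode(): Latin-1 bytes -> UTF-8 bytes.

    Returning None for None mimics a PostgreSQL function declared STRICT.
    """
    if value is None:
        return None
    return bytes(value).decode("latin-1").encode("utf-8")


def convert_table(conn):
    """Convert all suspect columns in a single UPDATE; return rows touched."""
    cur = conn.execute(
        """
        UPDATE suspect
           SET f1 = utf8_encode(f1),
               f2 = utf8_encode(f2)
         WHERE (f1 <> utf8_encode(f1) AND f1 IS NOT NULL)
            OR (f2 <> utf8_encode(f2) AND f2 IS NOT NULL)
        """
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.create_function("utf8_encode", 1, utf8_encode)
conn.execute("CREATE TABLE suspect (id INTEGER PRIMARY KEY, f1 BLOB, f2 BLOB)")
conn.executemany(
    "INSERT INTO suspect VALUES (?, ?, ?)",
    [
        (1, b"caf\xe9", None),    # Latin-1 in f1, NULL in f2
        (2, b"plain", b"ascii"),  # nothing to convert
        (3, None, b"na\xefve"),   # NULL in f1, Latin-1 in f2
    ],
)

# NULL <> NULL is neither true nor false: it yields NULL.
print(conn.execute("SELECT NULL <> NULL").fetchone())  # (None,)

print(convert_table(conn))  # 2
print(conn.execute("SELECT * FROM suspect ORDER BY id").fetchall())
```

In this sketch, row 2 is left alone. Rows whose only non-ASCII data is in one column are still picked up. The NULLs come through the SET list unchanged because utf8_encode() returns NULL for NULL input.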

By the way, we have *some* Unicode tests in the Slony-I test suite.
Does anyone have a comprehensive way of generating large swathes of
the Unicode character set?  I'd like to have something that's more
nearly complete.
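
One way to get nearly complete coverage is to generate the data programmatically:

- walk every Unicode scalar value;
- skip the surrogate range U+D800 to U+DFFF, which cannot be encoded in UTF-8;
- skip U+0000, which PostgreSQL text values cannot store;
- pack the characters into fixed-size strings written in COPY text format.

Here is a minimal Python sketch. The function names and the default chunk size are arbitrary choices for illustration.

```python
SURROGATES = range(0xD800, 0xE000)  # U+D800..U+DFFF cannot be encoded in UTF-8


def unicode_chunks(start=0x1, end=0x10FFFF, chunk_size=64):
    """Yield strings that together contain every Unicode scalar value in [start, end].

    Starts at U+0001 by default because PostgreSQL text cannot hold NUL.
    """
    chunk = []
    for cp in range(start, end + 1):
        if cp in SURROGATES:
            continue
        chunk.append(chr(cp))
        if len(chunk) == chunk_size:
            yield "".join(chunk)
            chunk = []
    if chunk:
        yield "".join(chunk)


def copy_escape(value):
    """Escape backslash, tab, newline and carriage return for COPY text format."""
    return (
        value.replace("\\", "\\\\")  # must come first
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def write_copy_data(out, **kwargs):
    """Write UTF-8 encoded 'id<TAB>text' rows to a binary file-like object."""
    for row_id, text in enumerate(unicode_chunks(**kwargs), start=1):
        out.write(f"{row_id}\t{copy_escape(text)}\n".encode("utf-8"))
```

You could then write the file with something like `with open("unicode.copy", "wb") as f: write_copy_data(f)` and load it with COPY or psql's \copy into a hypothetical table such as unicode_test(id integer, val text), with client_encoding set to UTF8. This covers every code point, including unassigned ones, rather than realistic text. The end parameter can be lowered if a narrower range is wanted.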