Brian Fehrle brianf at consistentstate.com
Thu Sep 16 16:35:53 PDT 2010
Hi all,
    We realized that our 1 master -> 1 slave Slony cluster had 
different encodings on each box, and we attempted to fix that. Our 
master had an encoding of LATIN1 and our slave had SQL_ASCII (they 
were initialized so long ago that we don't know who did it or why it 
was done that way).
    Slony worked with this setup, but because of some other problems 
we wanted to fix it by moving the slave from SQL_ASCII to LATIN1.

    So we brought down the slon daemons, brought down the slave 
database, and rebooted the physical machine the slave is on (we had 
commented out dozens of cron jobs and wanted to verify they were all 
dead).

    Once the machine was back up, we brought the slave Postgres 
cluster online and performed a pg_dump of the entire database 
(including the _slony schema). Then we brought down the Postgres 
cluster, ran initdb to create a new one with LATIN1 encoding, brought 
the new cluster online, and ran pg_restore on it with the dump file we 
had created.
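
    For reference, the dump/restore sequence we ran was roughly the 
following (paths and the database name are placeholders, and this 
assumes a custom-format dump so pg_restore can read it):

    pg_dump -Fc -f /tmp/slave_db.dump slave_db      # dump everything, incl. the _slony schema
    pg_ctl -D /var/lib/pgsql/data stop              # shut the old cluster down
    initdb -E LATIN1 -D /var/lib/pgsql/data_latin1  # new cluster with LATIN1 encoding
    pg_ctl -D /var/lib/pgsql/data_latin1 start
    createdb -E LATIN1 slave_db
    pg_restore -d slave_db /tmp/slave_db.dump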

    After that we restarted our cron jobs, which also started up the 
two slon daemons. We started monitoring the slave and noticed that no 
updates were being applied. We're running the slon daemons with -s 
60000 (force a sync every 60 seconds) and an -x flag to generate Slony 
logs for log shipping. The Slony logs generated with -x are empty 
(they have the Slony header and footer, but no insert data).
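
    For what it's worth, the slave slon daemon is started along these 
lines (the cluster name, conninfo and archive script are placeholders 
for our local setup):

    slon -s 60000 -x /usr/local/bin/apply_archive.sh replication \
         'dbname=slave_db host=slavehost port=5432 user=slony'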

    On the master, if I do a # select * from _slony.sl_status; it 
reports anywhere between 0 and 2 lag events, and a lag time no greater 
than 3 minutes. Monitoring the slave's slon log output also verifies 
that events are being received and processed without error every 
minute.
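
    In case the exact check matters, this is what I'm looking at 
(column names are from the standard sl_status view and sl_event table, 
so correct me if I'm misreading them; node 1 is a guess at our origin 
id):

    -- on the master
    select st_origin, st_received, st_lag_num_events, st_lag_time
      from _slony.sl_status;

    -- on the slave: last event received from the master
    select max(ev_seqno) from _slony.sl_event where ev_origin = 1;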

    Again on the master, # select count(*) from _slony.sl_log_1; 
returns 12,000+ rows, and it continually grows. So from what I can 
tell, the master is getting events queued up but not pushing the data 
in those events to the slave; each event arrives completely void of 
data, and it looks like sl_log_1 just keeps building up.
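
    My understanding is that sl_log_1 only gets trimmed once every 
subscriber has confirmed the events, so I've also been watching the 
confirmations on the master with something like this (again assuming 
the standard sl_confirm columns):

    -- last event each node has confirmed, as seen on the master
    select con_origin, con_received, max(con_seqno) as last_confirmed
      from _slony.sl_confirm
     group by con_origin, con_received
     order by con_origin, con_received;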

    One theory is that even though we have an exact data dump of the 
old slave cluster restored to the new slave cluster, since the 
encoding has changed perhaps the master doesn't recognize the slave as 
the same slave it had before. If that's the case, is there any way we 
can get it to recognize it without having to rebuild the Slony 
cluster? (Rebuilding the cluster would mean a few days of work, if not 
weeks.)
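
    If it would help diagnose this, I can compare the node and 
subscription rows on both boxes with something like the following 
(assuming the standard sl_node / sl_subscribe layout), to confirm the 
restored slave still carries the same node id and an active 
subscription:

    select no_id, no_active, no_comment from _slony.sl_node;
    select sub_set, sub_provider, sub_receiver, sub_active
      from _slony.sl_subscribe;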

    Other than that, I'm unsure what to make of this. I've restarted 
the daemons, and neither the master nor the slave daemon reports any 
errors in its logs. I verified that the triggers exist on the master 
as they should (we never touched the master anyway, but we're still 
checking everything), and the path to the slave remained the same as 
for the previous slave (same dbname, host, port, user).
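
    For completeness, this is roughly how I checked the triggers and 
the path (the trigger name pattern is a guess based on how our cluster 
names them):

    -- on the master: Slony log triggers on the replicated tables
    select tgrelid::regclass as table_name, tgname
      from pg_trigger
     where tgname like '%logtrigger%';

    -- path to the slave, as stored on the master
    select pa_server, pa_client, pa_conninfo from _slony.sl_path;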

    Any thoughts or things I can check would be appreciated. Or, if my 
theory about the master not recognizing the new slave cluster as the 
old one is correct, a way to fix that would be great.

thanks in advance,
                      Brian F

