Steve Singer ssinger at ca.afilias.info
Wed Sep 22 12:14:08 PDT 2010
On 10-09-21 07:23 PM, Brian Fehrle wrote:
> I got some time and decided to test this again on some VM boxes rather
> than our live environment, but had little luck.
>
> Simply so I can have this logged in the mailing list with what was done
> (and hopefully a solution in the near future), here's my process I
> preformed.
>
> I created two clusters that mirror our live boxes as closely as possible.
> - PostgreSQL version 8.4.2
> - Slony version 1.2.20
> - both installed via source
>
> I created the master cluster as:
> # initdb -D /usr/local/pgsql/encoding_master/ --locale=C --encoding=LATIN1
> I created the slave cluster as:
> # initdb -D /usr/local/pgsql/encoding_slave/ --locale=C --encoding=SQL_ASCII
>
> I set up a master -> slave Slony cluster and replicated a single table
> in a single replication set, and verified that replication was taking place.
>
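[For illustration, a setup like that can be scripted with slonik roughly as follows. This is only a sketch: the set/table ids, the table name public.test_table, and the node comments are placeholders, not taken from the original message; the environment variables are the same ones used in the REPAIR CONFIG script later in the post.]

#!/bin/bash
. etc/slony.env

slonik <<_EOF_
cluster name = $CLUSTERNAME ;
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPUSER';

# Initialize the cluster on the master and register the slave node.
init cluster (id = 1, comment = 'master');
store node (id = 2, comment = 'slave', event node = 1);
store path (server = 1, client = 2, conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPUSER');
store path (server = 2, client = 1, conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPUSER');

# One replication set containing a single table, subscribed by node 2.
create set (id = 1, origin = 1, comment = 'test set');
set add table (set id = 1, origin = 1, id = 1, fully qualified name = 'public.test_table');

subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
_EOF_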
> I wrote a small daemon that inserts a row into the table being
> replicated on the master once a minute.
>
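[Something along these lines would reproduce that load; the table and its columns are made up for illustration, and the connection details are taken from the same slony.env variables used elsewhere in the post.]

#!/bin/bash
# Insert one row per minute into the replicated table on the master.
. etc/slony.env
while true; do
    psql -h $MASTERHOST -p $MASTERPORT -U $REPUSER -d $MASTERDBNAME \
         -c "INSERT INTO public.test_table (created_at, note) VALUES (now(), 'test row');"
    sleep 60
done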
> I brought down the slon daemons, and performed a pg_dump on the slave:
> # pg_dump -p 5433 -Fc postgres > /tmp/postgres_dump.sql
>
> I brought down the slave cluster, then created a new one with the LATIN1
> encoding:
> # initdb -D /usr/local/pgsql/encoding_slave_latin/ --locale=C
> --encoding=LATIN1
>
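[The post does not show the restore step, but presumably the dump was loaded into the new cluster before it was brought online. With a custom-format (-Fc) dump that would look roughly like:]

# pg_restore -p 5433 -d postgres /tmp/postgres_dump.sql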
> I brought the cluster online and started up the slon daemons. The slave
> slon daemon reported remote worker and remote listener threads and
> increasing SYNC numbers, but it did not actually replicate data from the
> master to the slave, and _slony.sl_log_1 on the master grew with every
> insert. NOTE: This is the same behavior I experienced before on our live
> servers.
>
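[For what it's worth, that backlog can be watched directly on the master, assuming the cluster schema is _slony as in the message above:]

-- on the master
SELECT count(*) FROM _slony.sl_log_1;   -- rows waiting to be replicated
SELECT * FROM _slony.sl_status;         -- per-subscriber event/confirm lag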
> I then executed the following:
> #!/bin/bash
> . etc/slony.env
> echo "Repair config"
>
> slonik <<_EOF_
> cluster name = $CLUSTERNAME ;
> node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPUSER';
> node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPUSER';
> REPAIR CONFIG (SET ID = 1, EVENT NODE = 1, EXECUTE ONLY ON = 2);
> _EOF_
>

Try

REPAIR CONFIG (SET ID=1, EVENT NODE=2, EXECUTE ONLY ON=2);

I tried a somewhat similar sequence to what you described (though with a 
different PostgreSQL and Slony version), and the REPAIR CONFIG did not 
seem to do anything on node 2; i.e. the oid values in sl_table did NOT 
match what was in pg_class.  When I ran it with EVENT NODE = 2, it did 
update sl_table on node 2.
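
[A query along these lines, run on node 2, shows whether the reloids recorded in sl_table still match the catalogs. Column names here come from the Slony-I 1.2 sl_table definition and the cluster schema is again assumed to be _slony; adjust if your setup differs.]

SELECT t.tab_id,
       t.tab_nspname || '.' || t.tab_relname AS table_name,
       t.tab_reloid                           AS slony_reloid,
       c.oid                                  AS current_oid,
       t.tab_reloid = c.oid                   AS matches
  FROM _slony.sl_table t
  LEFT JOIN pg_catalog.pg_namespace n ON n.nspname = t.tab_nspname
  LEFT JOIN pg_catalog.pg_class c
         ON c.relnamespace = n.oid AND c.relname = t.tab_relname;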




> it executed without error, but replication still did not start working,
> and the slave slon daemon began behaving strangely: its child process
> was terminated constantly, restarted every 10 seconds, and then
> terminated again.

