Slawek Jarosz Slawek.Jarosz at
Fri Sep 14 03:23:52 PDT 2007
Hello, I am having problems with the stability of Slony-I (version
1.2.6).  I have a simple setup with 1 master and 1 slave database.
Both are running on 2 GHz SuSE 9.2 Linux servers connected directly via
an ethernet cable.  I'm also running High-Availability Linux which I'm
using to manage the virtual database IP addresses and handle
network/machine failure events.
The test I'm doing is writing a UNIX timestamp to the prime database and
checking if both databases are updated with the timestamp.  This runs
fine for a number of hours then a number of problems occur (sometimes
1) the slave database is no longer updated with the timestamp (the
master is updated)
2) database primeship changes (master becomes slave) but according to HA
Linux no failure has occurred.
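The timestamp check described above can be sketched roughly as follows. This is only a minimal sketch, not the actual test script: the table name heartbeat, its column ts, and all function names are hypothetical, and Python's built-in sqlite3 module stands in for the two PostgreSQL connections so the example runs on its own.

```python
import sqlite3


def make_db():
    """Create a stand-in database with the (hypothetical) heartbeat table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE heartbeat (ts INTEGER)")
    return conn


def write_timestamp(conn, ts):
    """Replace the stored value with the given UNIX timestamp."""
    # sqlite3 uses '?' placeholders; a PostgreSQL driver may use a different style.
    conn.execute("DELETE FROM heartbeat")
    conn.execute("INSERT INTO heartbeat (ts) VALUES (?)", (ts,))
    conn.commit()


def read_timestamp(conn):
    """Return the stored timestamp, or None if nothing has been written."""
    row = conn.execute("SELECT ts FROM heartbeat").fetchone()
    return row[0] if row else None


def both_updated(master, slave, ts):
    """True only if master and slave both hold the timestamp just written."""
    return read_timestamp(master) == ts and read_timestamp(slave) == ts
```

In the real setup only the master is written to and the slave is filled in by replication, so both_updated() returning False after a reasonable wait corresponds to problem 1) above.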
From the Slony log file I can see some errors which occur every 30
seconds or so:
2007-09-12 13:39:15 GMT ERROR  remoteWorkerThread_1: "begin transaction;
set transaction isolation level serializable; lock table
"_t1".sl_config_lock; select "_t1".failoverSet_int(1, 2, 1, 10787);
notify "_t1_Event"; notify "_t1_Confirm"; insert into "_t1".sl_event
(ev_origin, ev_seqno, ev_timestamp,      ev_minxid, ev_maxxid, ev_xip,
ev_type , ev_data1, ev_data2, ev_data3    ) values ('1', '10787',
'2007-09-12 07:34:50.791482', '9692768', '9692769', '', 'FAILOVER_SET',
'1', '2', '1'); insert into "_t1".sl_confirm      (con_origin,
con_received, con_seqno, con_timestamp)    values (1, 2, '10787',
now()); commit transaction;" PGRES_FATAL_ERROR ERROR:  duplicate key
violates unique constraint "pg_trigger_tgrelid_tgname_index"
From log file slon-smsdb-node2.err (where smsdb is the name of my
WATCHDOG: No Slon is running for node node2!
WATCHDOG: You ought to check the postmaster and slon for evidence of a
WATCHDOG: I'm going to restart slon for node2...
WATCHDOG: Restarted slon for the t1 cluster, PID 3240
From the PostgreSQL log file:
2007-09-13 04:16:53 LOG:  SSL SYSCALL error: EOF detected
2007-09-13 04:16:53 LOG:  could not receive data from client: Connection
reset by peer
2007-09-13 04:16:53 LOG:  unexpected EOF on client connection

So the questions I have:
1) Where (i.e. log files) can I find out more information about what's
2) If Slony-I fails and looks like watchdog cannot recover from it, how
can I restart it?
3) And of course, any ideas why Slony is failing?
Thank you for your help,

More information about the Slony1-general mailing list