<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3157" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>Hello, I am having
problems with the stability of Slony-I (version 1.2.6). I have a simple
setup with one master and one slave database. Both run on 2 GHz SuSE
9.2 Linux servers connected directly by an Ethernet cable. I'm also
running High-Availability Linux, which I use to manage the virtual
database IP addresses and to handle network and machine
failure events.</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>My test writes
a UNIX timestamp to the primary (master) database and checks whether both databases
are updated with it. This runs fine for a number of hours, after which
several problems occur (sometimes independently):</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>1) The slave
database stops receiving the timestamp updates, although the master is still
updated.</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>2) The database
roles switch (the master becomes the slave), even though HA Linux reports that no
failure has occurred.</FONT></SPAN></DIV>
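<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>For illustration, the timestamp check amounts to roughly the following minimal Python sketch. The callables write_master, read_master and read_slave, and the attempts/interval parameters, are hypothetical placeholders for the real queries against each database. Because Slony-I replicates asynchronously, the sketch polls the slave several times instead of reading it only once.</FONT></SPAN></DIV>
<PRE>
```python
import time


def check_replication(write_master, read_master, read_slave, attempts=30, interval=1.0):
    """Write the current UNIX timestamp to the master, then check both nodes.

    write_master, read_master and read_slave are hypothetical callables that
    stand in for the real queries against the master and slave databases.
    """
    ts = int(time.time())
    write_master(ts)
    master_ok = read_master() == ts

    # Slony-I replicates asynchronously, so poll the slave for a while
    # rather than reading it only once.
    slave_ok = False
    for _ in range(attempts):
        if read_slave() == ts:
            slave_ok = True
            break
        time.sleep(interval)

    return {"master": master_ok, "slave": slave_ok}
```
</PRE>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>In this sketch, failure 1) above would show up as {"master": True, "slave": False} once all attempts are exhausted.</FONT></SPAN></DIV>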
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>In the Slony
log file, the following error appears roughly every 30
seconds:</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>2007-09-12 13:39:15
GMT ERROR remoteWorkerThread_1: "begin transaction; set transaction
isolation level serializable; lock table "_t1".sl_config_lock; select
"_t1".failoverSet_int(1, 2, 1, 10787); notify "_t1_Event"; notify "_t1_Confirm";
insert into "_t1".sl_event (ev_origin, ev_seqno,
ev_timestamp, ev_minxid, ev_maxxid, ev_xip,
ev_type , ev_data1, ev_data2, ev_data3 ) values ('1', '10787',
'2007-09-12 07:34:50.791482', '9692768', '9692769', '', 'FAILOVER_SET', '1',
'2', '1'); insert into "_t1".sl_confirm
(con_origin, con_received, con_seqno, con_timestamp) values
(1, 2, '10787', now()); commit transaction;" PGRES_FATAL_ERROR ERROR:
duplicate key violates unique constraint
"pg_trigger_tgrelid_tgname_index"</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>From the log file
slon-smsdb-node2.err (where smsdb is the name of my
database):</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>WATCHDOG: No Slon is
running for node node2!<BR>WATCHDOG: You ought to check the postmaster and slon
for evidence of a crash!<BR>WATCHDOG: I'm going to restart slon for
node2...<BR>WATCHDOG: Restarted slon for the t1 cluster, PID
3240<BR></FONT> </SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>From the PostgreSQL log
file:</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>2007-09-13 04:16:53
LOG: SSL SYSCALL error: EOF detected<BR>2007-09-13 04:16:53 LOG:
could not receive data from client: Connection reset by peer<BR>2007-09-13
04:16:53 LOG: unexpected EOF on client connection<BR></FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>So my questions
are:</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>1) Where (e.g. which log
files) can I find more information about what's
happening?</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>2) If Slony-I fails
and the watchdog does not seem able to recover it, how can I restart
it?</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>3) And of course,
any ideas why Slony is failing?</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial size=2>Thank you for your
help,</FONT></SPAN></DIV>
<DIV><SPAN class=682350209-14092007><FONT face=Arial
size=2>Slawek</FONT></SPAN></DIV></BODY></HTML>