Fri Sep 14 03:23:52 PDT 2007
- Previous message: [Slony1-general] cluster broken
- Next message: [Slony1-general] Replication stopping
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello,

I am having problems with the stability of Slony-I (version 1.2.6). I have a simple setup with 1 master and 1 slave database. Both are running on 2 GHz SuSE 9.2 Linux servers connected directly via an ethernet cable. I'm also running High-Availability Linux, which I use to manage the virtual database IP addresses and to handle network/machine failure events.

The test I'm doing is writing a UNIX timestamp to the prime database and checking that both databases are updated with the timestamp. This runs fine for a number of hours, and then a number of problems occur (sometimes independently):

1) the slave database is no longer updated with the timestamp (the master is updated);
2) database primeship changes (the master becomes the slave), although according to HA Linux no failure has occurred.

From the Slony log file I can see errors which occur every 30 seconds or so:

2007-09-12 13:39:15 GMT ERROR remoteWorkerThread_1: "begin transaction; set transaction isolation level serializable; lock table "_t1".sl_config_lock; select "_t1".failoverSet_int(1, 2, 1, 10787); notify "_t1_Event"; notify "_t1_Confirm"; insert into "_t1".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3) values ('1', '10787', '2007-09-12 07:34:50.791482', '9692768', '9692769', '', 'FAILOVER_SET', '1', '2', '1'); insert into "_t1".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 2, '10787', now()); commit transaction;" PGRES_FATAL_ERROR ERROR: duplicate key violates unique constraint "pg_trigger_tgrelid_tgname_index"

From log file slon-smsdb-node2.err (where smsdb is the name of my database):

WATCHDOG: No Slon is running for node node2!
WATCHDOG: You ought to check the postmaster and slon for evidence of a crash!
WATCHDOG: I'm going to restart slon for node2...
WATCHDOG: Restarted slon for the t1 cluster, PID 3240

From the PostgreSQL log file:

2007-09-13 04:16:53 LOG: SSL SYSCALL error: EOF detected
2007-09-13 04:16:53 LOG: could not receive data from client: Connection reset by peer
2007-09-13 04:16:53 LOG: unexpected EOF on client connection

So the questions I have:

1) Where (i.e. which log files) can I find more information about what's happening?
2) If Slony-I fails and it looks like the watchdog cannot recover from it, how can I restart it?
3) And of course, any ideas why Slony is failing?

Thank you for your help,
Slawek
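[For reference, the timestamp test described above can be scripted roughly as follows. This is a minimal sketch only: the table name "heartbeat", the hosts master-vip/slave-vip, and the database user are assumptions, not details from the original post.]

    #!/bin/sh
    # Minimal heartbeat check: write a UNIX timestamp on the master, wait for
    # the SYNC to propagate, then compare what the two nodes report.
    # Table, host and user names are illustrative assumptions.
    TS=$(date +%s)
    psql -h master-vip -U slony -d smsdb -c "UPDATE heartbeat SET ts = $TS WHERE id = 1;"
    sleep 10   # give slon time to ship the SYNC event to the slave
    M=$(psql -h master-vip -U slony -d smsdb -At -c "SELECT ts FROM heartbeat WHERE id = 1;")
    S=$(psql -h slave-vip  -U slony -d smsdb -At -c "SELECT ts FROM heartbeat WHERE id = 1;")
    [ "$M" = "$S" ] || echo "replication broken or lagging: master=$M slave=$S"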
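[On the duplicate-key error: the failing statement in the log is a FAILOVER_SET event, so one thing worth checking is whether a failover is being issued repeatedly, e.g. by the HA scripts. A query along these lines would show that; the schema "_t1", the sl_event table and the ev_* columns are taken from the error message above, while the host and user are assumed:]

    # List recent FAILOVER_SET events recorded in the cluster's event log.
    psql -h master-vip -U slony -d smsdb -c \
        "SELECT ev_origin, ev_seqno, ev_timestamp, ev_type, ev_data1, ev_data2
           FROM \"_t1\".sl_event
          WHERE ev_type = 'FAILOVER_SET'
          ORDER BY ev_seqno DESC LIMIT 20;"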
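[On question 2: a slon daemon can always be stopped and restarted by hand; it only needs the cluster name and a conninfo string for the node's database. A sketch, with the PID and cluster name taken from the watchdog log above and the conninfo and log path assumed:]

    # Kill the (possibly hung) slon the watchdog reported, then start a fresh one.
    kill 3240    # PID from the watchdog log; verify with `ps ax | grep slon` first
    slon t1 "dbname=smsdb host=slave-vip user=slony" \
        >> /var/log/slony/slon-smsdb-node2.log 2>&1 &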