Thu Apr 20 05:53:31 PDT 2006
- Previous message: [Slony1-general] Postmaster restart breaks slony
- Next message: [Slony1-general] Postmaster restart breaks slony
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 4/20/2006 2:24 AM, Ian Burrell wrote:
> On 4/19/06, Christopher Browne <cbbrowne at ca.afilias.info> wrote:
>> Ian Burrell wrote:
>> >
>> > We didn't notice the node1 slon was down until a few hours later. I
>> > started the node1 slon daemon, which inserted SYNC events. The slave
>> > slon daemon then started processing the SYNC events and trying to
>> > transfer data. Since the data connection had failed, the slave
>> > slon daemons started failing. I noticed the error and restarted all the
>> > slon daemons, which fixed the problem.

From the CVS log for src/slon/remote_worker.c:

revision 1.86.2.5
date: 2005/10/08 19:37:29;  author: wieck;  state: Exp;  lines: +10 -1
Check existing provider DB connection in sync event processing.
A DB connection loss during fetching of log rows does not cause
the database connection to be dropped within the helper thread.
This was able to cause a dead connection to stall replication.

This fix was released with version 1.1.2.


Jan

>> >
>> > Shouldn't the slon daemons reconnect if the remoteWorkerThread
>> > connection goes down? Even dying and being restarted would be better
>> > than continuously failing in a loop. We are using 1.1.0 with most of
>> > the 1.1.1 patches. Has this problem been fixed in 1.1.5?
>> >
>> Unfortunately, the real fix to this is in CVS HEAD/1.2, which fairly
>> significantly restructures the thread handling, needful for Windows
>> support...
>>
>> I wish there were a better answer; unfortunately, the "sorta-internal
>> watchdog" that was added in 1.1.0 leaves something to be desired. The
>> fix is the "big fix," which is what's in CVS HEAD.
>>
>> All I can recommend for now is to make sure that there's a watchdog running.
>>
>
> The problem is that the watchdog does not help for the slave slon
> daemons because they do not die. They kept trying to use the broken
> database connection, aborting the sync event, and then trying again.
>
> We had a different problem with the slon_watchdog script dying itself
> instead of starting the master slon daemon.
>
> - Ian
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #
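The CVS log entry quoted above describes a worker that kept reusing a provider connection after it had died, so every SYNC attempt aborted on the same dead handle. The Python sketch below models only that control flow. Slony-I itself is written in C, and every name here (ProviderConnection, RemoteWorker, the status flag, the check_connection switch) is hypothetical, not slon's actual code. With the check disabled, each retry aborts forever, which is the loop Ian describes. With it enabled, the first failure marks the connection bad and the next SYNC attempt replaces it.

```python
class ProviderConnection:
    """Hypothetical stand-in for a connection to the provider node."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.status = "ok"        # client-side view of the connection
        self.server_gone = False  # server side died (e.g. postmaster restart)

    def fetch_log_rows(self):
        if self.server_gone:
            # The client only learns about the loss when it uses the connection.
            self.status = "bad"
            raise ConnectionError(f"connection {self.conn_id} lost")
        return ["log row 1", "log row 2"]

    def close(self):
        self.status = "closed"


class RemoteWorker:
    """Hypothetical worker that reuses one provider connection across SYNCs."""

    def __init__(self, check_connection=True):
        self.check_connection = check_connection
        self.connects = 0
        self.conn = self._connect()

    def _connect(self):
        self.connects += 1
        return ProviderConnection(self.connects)

    def process_sync(self):
        # Before processing a SYNC, verify the cached connection and
        # replace it if an earlier fetch found it broken.
        if self.check_connection and self.conn.status != "ok":
            self.conn.close()
            self.conn = self._connect()
        return self.conn.fetch_log_rows()


def run(worker, attempts):
    """Kill the current provider connection, then retry SYNC processing."""
    worker.conn.server_gone = True  # later connections succeed (server is back)
    results = []
    for _ in range(attempts):
        try:
            worker.process_sync()
            results.append("ok")
        except ConnectionError:
            results.append("aborted")
    return results


print(run(RemoteWorker(check_connection=False), 3))  # ['aborted', 'aborted', 'aborted']
print(run(RemoteWorker(check_connection=True), 3))   # ['aborted', 'ok', 'ok']
```

The check costs nothing on the normal path, since it only inspects state that the failed fetch already recorded. It also avoids relying on an external watchdog, which cannot help when the daemon stays alive but stuck.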
More information about the Slony1-general mailing list