bugzilla-daemon at main.slony.info
Wed Sep 7 10:19:04 PDT 2016
http://www.slony.info/bugzilla/show_bug.cgi?id=365

           Summary: Replication Lag? - All nodes appear to lag when a
                    single provider node is unreachable
           Product: Slony-I
           Version: devel
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: low
         Component: slon
        AssignedTo: slony1-bugs at lists.slony.info
        ReportedBy: glynastill at yahoo.co.uk
                CC: slony1-bugs at lists.slony.info
   Estimated Hours: 0.0


This bug was reported on Slony1-general back in February here:

    http://lists.slony.info/pipermail/slony1-general/2016-February/013267.html

I read the message and recalled seeing similar behaviour myself, but then got
waylaid by something else and forgot about it.

I remembered just now, and could reproduce it in a quick test with 2.2.4 (I'm
assuming this hasn't been fixed in 2.2.5) as follows:

 - Have multiple subscribers to a set that are also providers (i.e. were
   subscribed with FORWARD = YES)
 - Stop postgres on one of those subscribers

What appears to happen is that changes are still replicated to the remaining
subscribers and confirms are generated on them, but those confirms don't
make their way back to the origin until the postgres instance we stopped is
started again.

In my test setup I've 4 nodes, as follows (though I'm pretty sure node 5 being
subscribed to node 4 is irrelevant to the issue):

node 8 = origin of set
node 7 = forwarding subscriber to set, subscribed to node 8
node 4 = forwarding subscriber to set, subscribed to node 8
node 5 = forwarding subscriber to set, subscribed to node 4
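
As a minimal sketch only, the topology listed above can be written down in
Python to make the provider chains explicit. The names PROVIDER,
path_to_origin and downstream_of are hypothetical and chosen for illustration;
they are not part of Slony-I. Only the node numbers and their providers are
taken from the list above.

```python
# Subscription topology from the report: node -> its provider.
# None marks the origin of the set. All names here are illustrative only.
PROVIDER = {8: None, 7: 8, 4: 8, 5: 4}


def path_to_origin(node, provider=PROVIDER):
    """Return the chain of nodes from `node` up to the origin of the set."""
    chain = [node]
    while provider[chain[-1]] is not None:
        chain.append(provider[chain[-1]])
    return chain


def downstream_of(node, provider=PROVIDER):
    """Return every node that receives its data through `node`."""
    return sorted(
        n for n in provider
        if n != node and node in path_to_origin(n, provider)
    )


if __name__ == "__main__":
    for n in sorted(PROVIDER):
        print(n, "path:", path_to_origin(n), "downstream:", downstream_of(n))
```

In this layout node 4 is the only subscriber with another node (node 5)
downstream of it; nodes 7 and 5 provide data to nobody.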

Here is the log from the slon against the origin (node 8):

2016-09-07_155951 BSTERROR  slon_connectdb: PQconnectdb("dbname=TEST
host=192.168.0.102 user=slony") failed - could not connect to server:
Connection refused
        Is the server running on host "192.168.0.102" and accepting
        TCP/IP connections on port 5432?
2016-09-07_155951 BSTWARN   remoteListenThread_5: DB connection failed - sleep
10 seconds

2016-09-07_155951 BSTERROR  slon_connectdb: PQconnectdb("dbname=TEST
host=192.168.0.102 user=slony") failed - could not connect to server:
Connection refused
        Is the server running on host "192.168.0.102" and accepting
        TCP/IP connections on port 5432?
2016-09-07_155951 BSTERROR  remoteWorkerThread_7: cannot connect to data
provider 5 on 'dbname=TEST host=192.168.0.102 user=slony'

2016-09-07_155958 BSTERROR  slon_connectdb: PQconnectdb("dbname=TEST
host=192.168.0.102 user=slony") failed - could not connect to server:
Connection refused
        Is the server running on host "192.168.0.102" and accepting
        TCP/IP connections on port 5432?
2016-09-07_155958 BSTERROR  remoteWorkerThread_4: cannot connect to data
provider 5 on 'dbname=TEST host=192.168.0.102 user=slony'

2016-09-07_160001 BSTERROR  slon_connectdb: PQconnectdb("dbname=TEST
host=192.168.0.102 user=slony") failed - could not connect to server:
Connection refused
        Is the server running on host "192.168.0.102" and accepting
        TCP/IP connections on port 5432?
2016-09-07_160001 BSTWARN   remoteListenThread_5: DB connection failed - sleep
10 seconds

And on another subscriber (node 7):

2016-09-07_155957 BSTERROR  slon_connectdb: PQconnectdb("dbname=TEST
host=192.168.0.102 user=slony") failed - could not connect to server:
Connection refused
        Is the server running on host "192.168.0.102" and accepting
        TCP/IP connections on port 5432?
2016-09-07_155957 BSTERROR  remoteWorkerThread_4: cannot connect to data
provider 5 on 'dbname=TEST host=192.168.0.102 user=slony'

2016-09-07_160001 BSTERROR  slon_connectdb: PQconnectdb("dbname=TEST
host=192.168.0.102 user=slony") failed - could not connect to server:
Connection refused
        Is the server running on host "192.168.0.102" and accepting
        TCP/IP connections on port 5432?
2016-09-07_160001 BSTWARN   remoteListenThread_5: DB connection failed - sleep
10 seconds
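
A rough sketch, assuming the message wording shown in the excerpts above, of
how such logs could be summarised in Python. It re-joins the wrapped lines and
then lists which remoteWorkerThread reported which unreachable data provider,
and which remoteListenThread reported a failed connection. The function name
summarise and both regular expressions are made up for this example and are
not part of Slony-I.

```python
import re

# Patterns based only on the message wording quoted in this report.
WORKER_RE = re.compile(
    r"remoteWorkerThread_(\d+): cannot connect to data provider (\d+)"
)
LISTEN_RE = re.compile(r"remoteListenThread_(\d+): DB connection failed")


def summarise(log_text):
    """Return (worker_failures, listen_failures) found in a slon log excerpt.

    worker_failures: sorted (thread_suffix, provider_id) pairs
    listen_failures: sorted thread suffixes
    """
    # The mail client wrapped long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    workers = sorted(
        {(int(m.group(1)), int(m.group(2))) for m in WORKER_RE.finditer(flat)}
    )
    listeners = sorted({int(m.group(1)) for m in LISTEN_RE.finditer(flat)})
    return workers, listeners


if __name__ == "__main__":
    import sys
    # Usage: python summarise.py < slon.log
    print(summarise(sys.stdin.read()))
```

The sketch only reports the numeric suffix of each thread name and the
provider id as they appear in the messages; it does not interpret them.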
