Tue Sep 13 06:59:46 PDT 2016
- Previous message: [Slony1-general] sync performance
- Next message: [Slony1-general] Controlled Switchover
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Interesting test results. By adding date commands inside and outside the script, it’s clear there’s 11-12 secs of startup contact before any commands get going. After that, I see syncs can take anywhere from 6-15 secs to execute. Once in a while, I’ll also get a postgres timeout error, and I know the DB hasn’t gone down.

Early on I adopted a habit of providing all conninfo for every node at the start of each script. It seems now I should be aiming for either minimal conn info or fewer scripts, or both.

root@prodrpl-Amst:~# date && akaslonik /tmp/commcheck-2.slk
Tue Sep 13 13:50:14 UTC 2016
/tmp/commcheck-2.slk:44: 2016-09-13 13:50:26
/tmp/commcheck-2.slk:47: 2016-09-13 13:50:34
/tmp/commcheck-2.slk:50: waiting for event (2,5001432258) to be confirmed on node 5
/tmp/commcheck-2.slk:51: 2016-09-13 13:50:47
/tmp/commcheck-2.slk:55: 2016-09-13 13:50:53
root@prodrpl-Amst:~#
root@prodrpl-Amst:~#
root@prodrpl-Amst:~# date && akaslonik /tmp/commcheck-2.slk
Tue Sep 13 13:51:01 UTC 2016
/tmp/commcheck-2.slk:44: 2016-09-13 13:51:12
/tmp/commcheck-2.slk:46: waiting for event (2,5001432264) to be confirmed on node 5
/tmp/commcheck-2.slk:47: 2016-09-13 13:51:22
/tmp/commcheck-2.slk:51: 2016-09-13 13:51:31
/tmp/commcheck-2.slk:55: 2016-09-13 13:51:40
root@prodrpl-Amst:~#
root@prodrpl-Amst:~# date && akaslonik /tmp/commcheck-2.slk
Tue Sep 13 13:51:47 UTC 2016
/tmp/commcheck-2.slk:44: 2016-09-13 13:51:58
/tmp/commcheck-2.slk:46: waiting for event (2,5001432272) to be confirmed on node 5
/tmp/commcheck-2.slk:47: 2016-09-13 13:52:12
/tmp/commcheck-2.slk:50: waiting for event (2,5001432274) to be confirmed on node 5
/tmp/commcheck-2.slk:51: 2016-09-13 13:52:23
/tmp/commcheck-2.slk:54: waiting for event (2,5001432276) to be confirmed on node 5
/tmp/commcheck-2.slk:55: 2016-09-13 13:52:38
root@prodrpl-Amst:~#

Tom ☺

On 9/12/16, 4:38 PM, "Steve Singer" <steve at ssinger.info> wrote:

    On 09/12/2016 11:39 AM, Tignor, Tom wrote:
    > Seems I have an additional data point: the sync test
    > always takes longer (> 20 secs) if I include conninfo for all cluster
    > nodes instead of just the local node. I had previously thought conninfo
    > data was only used when needed. Is this not the case?

    What if you do

    sync(id=2);
    wait for event (origin=2, confirmed=5, wait on=2, timeout=30);
    sync(id=2);
    wait for event (origin=2, confirmed=5, wait on=2, timeout=30);
    sync(id=2);
    wait for event (origin=2, confirmed=5, wait on=2, timeout=30);

    3 times (or more) in a row, does it still take about the same amount of
    time as 1 sync?

    When slonik starts up it contacts all the nodes it has admin conninfo
    for to get the current state/last event from each node. Maybe your time
    is spent establishing all those connections over SSL.

    >
    > Tom ☺
    >
    > *From: *Tom Tignor <ttignor at akamai.com>
    > *Date: *Monday, September 12, 2016 at 10:52 AM
    > *To: *"slony1-general at lists.slony.info" <slony1-general at lists.slony.info>
    > *Subject: *sync performance
    >
    > Hello slony1 community,
    >
    > We’ve recently been testing communication reliability
    > between our cluster nodes. Our config is a simple setup with one
    > provider producing a modest volume of changes (measured in KB/s)
    > consumed by 5 direct subscribers, though these are geographically
    > distributed. The test is just a sync event followed by a wait on the
    > sync originator. Example:
    >
    > cluster name = ams_cluster;
    >
    > node 5 admin
    > conninfo='dbname=ams
    > host=23.79.242.182
    > user=ams_slony
    > sslmode=verify-ca
    > sslcert=/usr/local/akamai/.ams_certs/complete-ams_slony.crt
    > sslkey=/usr/local/akamai/.ams_certs/ams_slony.private_key
    > sslrootcert=/usr/local/akamai/etc/ssl_ca/canonical_ca_roots.pem';
    >
    > node 2 admin conninfo = 'dbname=ams user=ams_slony';
    >
    > sync(id=2);
    >
    > wait for event (origin=2, confirmed=5, wait on=2, timeout=30);
    >
    > Tests show the script takes 10-20 secs to run on
    > different nodes.
    >
    > Can anyone explain what’s happening internally during
    > this time, and why it takes so long? On a healthy, lightly loaded
    > system, we might have hoped for a sync response in just a couple
    > seconds. Our slon daemons are running with mostly default startup options.
    >
    > Thanks in advance,
    >
    > Tom ☺
    >
    > _______________________________________________
    > Slony1-general mailing list
    > Slony1-general at lists.slony.info
    > http://lists.slony.info/mailman/listinfo/slony1-general
    >
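
The split between startup time and per-step time in the transcripts above can be checked mechanically. What follows is a minimal Python sketch, not part of the original thread. It assumes the output format shown in this message (one shell date line, then slonik lines that end in a timestamp) and that a run does not cross midnight. The name split_latency is hypothetical.

```python
import re
from datetime import datetime

# First run copied from the transcript in this message:
# the shell `date` line, followed by the slonik output.
RUN = """\
Tue Sep 13 13:50:14 UTC 2016
/tmp/commcheck-2.slk:44: 2016-09-13 13:50:26
/tmp/commcheck-2.slk:47: 2016-09-13 13:50:34
/tmp/commcheck-2.slk:50: waiting for event (2,5001432258) to be confirmed on node 5
/tmp/commcheck-2.slk:51: 2016-09-13 13:50:47
/tmp/commcheck-2.slk:55: 2016-09-13 13:50:53
"""

# Matches an HH:MM:SS clock time; "waiting for event" lines contain none.
CLOCK = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def split_latency(run):
    """Return (startup_seconds, [seconds between consecutive timestamps]).

    Assumption: all timestamps fall on the same day, so only the
    clock time is compared.
    """
    times = []
    for line in run.strip().splitlines():
        match = CLOCK.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    # times[0] is the shell `date` output taken just before slonik started;
    # times[1] is the first timestamp printed from inside the script.
    startup = (times[1] - times[0]).total_seconds()
    steps = [(b - a).total_seconds() for a, b in zip(times[1:], times[2:])]
    return startup, steps


if __name__ == "__main__":
    startup, steps = split_latency(RUN)
    print("startup:", startup)   # 12.0
    print("steps:", steps)       # [8.0, 13.0, 6.0]
```

For the first run this gives 12 seconds before the first in-script timestamp and 8, 13, and 6 seconds between the later ones, which is consistent with the 11-12 sec startup and 6-15 sec per-sync figures quoted in the message.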