From bugzilla-daemon at main.slony.info Thu Dec 10 11:12:47 2015 From: bugzilla-daemon at main.slony.info (bugzilla-daemon at main.slony.info) Date: Thu, 10 Dec 2015 11:12:47 -0800 (PST) Subject: [Slony1-bugs] [Bug 336] Review transaction isolation level requirements for slon database connections. In-Reply-To: References: Message-ID: <20151210191247.1A4FB2917B4@main.slony.info> http://www.slony.info/bugzilla/show_bug.cgi?id=336 --- Comment #1 from Tom Tignor 2015-12-10 11:12:47 PST --- The following is a summary of discussion on a crippling performance problem resolved by reducing the remote listener from serializable to repeatable read isolation. Per Steve S, I'm using this ticket to recommend we move forward with a solution like this in some future release. The consensus from the discussion appeared to be: - Reduced isolation for the remote listener could be safe and effective for certain workloads. - Should provide as a non-default runtime option. - Should include big red warnings in the documentation. ---- Hello slony1 community, I?m part of a team at Akamai working on a notification service based on postgres. (We call it an Alert Management System.) We?re at the point where we need to scale past the single instance DB and so have been working with slony1-2.2.4 (and postgresql-9.1.18) to make that happen. Most tests in the past few months have been great, but in recent tests the reassuring SYNC-event-output-per-two-seconds suddenly disappeared. Throughout the day, it returns for a few minutes (normally less than 5, never 10) and then re-enters limbo. Vigorous debugging ensued, and the problem was proven to be the serializable isolation level set in slon/remote_listen.c. Our recent test environment doesn?t have a tremendous write rate (measured in KB/s), but it does have 200-400 clients at any one time, which may be a factor. Below is the stack shown in gdb of the postgres server proc (identified via pg_stat_activity) while slon is in limbo. What are the thoughts on possible changes to the remote listener isolation level and their impact? I?ve tested changes using repeatable read instead, and also with serializable but dropping the deferrable option. The latter offers little improvement if any, but the former seems to return us to healthy replication. In searching around, I found Jan W filed Bug 336 last year (link below) which suggests we could relax the isolation level here and elsewhere. If it was helpful, I could verify an agreed solution and submit it back as a patch. (Not really in the slony community yet, just looking at the process now.) Thanks in advance, http://www.slony.info/bugzilla/show_bug.cgi?id=336 (gdb) thread apply all bt Thread 1 (process 13052): #0 0xffffe430 in __kernel_vsyscall () #1 0xf76d2c0f in semop () from /lib32/libc.so.6 #2 0x08275a26 in PGSemaphoreLock (sema=0xf69d6784, interruptOK=1 '\001') at pg_sema.c:424 #3 0x082b52cb in ProcWaitForSignal () at proc.c:1443 #4 0x082bb57a in GetSafeSnapshot (origSnapshot=) at predicate.c:1520 #5 RegisterSerializableTransaction (snapshot=0x88105a0) at predicate.c:1580 #6 0x083b3f35 in GetTransactionSnapshot () at snapmgr.c:138 #7 0x082c460a in exec_simple_query ( query_string=0xa87d248 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), ev_type, ev_data1,"...) at postgres.c:948 #8 PostgresMain (argc=1, argv=0xa7cd1e0, dbname=0xa7cd1d0 "ams", username=0xa7cd1b8 "ams_slony") at postgres.c:4021 #9 0x08284a58 in BackendRun (port=0xa808118) at postmaster.c:3657 #10 BackendStartup (port=0xa808118) at postmaster.c:3330 #11 ServerLoop () at postmaster.c:1483 #12 0x082854d8 in PostmasterMain (argc=3, argv=0xa7ccb58) at postmaster.c:1144 #13 0x080cb430 in main (argc=3, argv=0xa7ccb58) at main.c:210 (gdb) Tom :-) ---- The last time we had a change to isolation levels was in response to this thread http://lists.slony.info/pipermail/slony1-general/2011-November/011939.html Also know as bug #255 (http://www.slony.info/bugzilla/show_bug.cgi?id=255) I can't recall if anyone figured out if we could reduce the remote listener isolation level to read committed - read only or not. One concern at the back of my mind is if a read only repeatable read transactions would result in more pivot conflicts than a read only serializable deferrable transaction where the conflicts are with a serializable transaction running on the origin by some application transaction. ---- Hi Steve, Sorry for the delay getting back. Inspired by your questions, I?ve been reading up on SSI, the Cahill paper and slony1 and postgres code. To answer your question, I don?t believe reducing the isolation level for the remote listener can increase pivot conflicts. As I understand pivots, they sit in the middle of a ?dangerous structure,? on either side of an rw-dependency relationship for two other transactions. So a read-only transaction can?t be a pivot. Also, since we?re not changing the data remote listener reads, I don?t believe we?d be creating new rw-dependencies and so making pivots of other transactions. For us, I think there is a broader issue. I found the README-SSI in the postgres 9.1.18 package. It seems clear the benefits of SSI in postgres only arrive if all your transactions are serializable. ?? * Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI. If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default. ?? Comments in predicate.c also seem to support the idea. I believe all the apps in our DB (other than slony1) are using the default read committed isolation level. As I review our DB-facing procs, I can see listeners have rw-dependencies on the remote worker (via sl_event) and the remote worker has an rw-dependency on any of our clients writing to sl_log_1/2. As I understand SSI, that constitutes a ?dangerous structure,? but we still can?t expect postgres SSI to save us if the clients are non-serializable. Under these conditions, what benefit comes from serializable slony1 transactions? Maybe a solution could be to provide a reduced serialization level as a runtime option? Requirements vary between apps. For bank transactions, it?s certainly clear that everything should be bulletproof. Far better to get it done late than to do it wrong. For our notification service, though, timeliness is more important. No ones likes losing data, but the value of the data degrades in minutes (and unaddressed alarms are likely to be regenerated.) It?s far less tolerable to stop replication in its tracks for long periods in order to achieve serializability. I see this message has gotten long. Thanks in advance for your time and consideration. Tom :-) ---- For whatever it's worth, when Slony was first designed SERIALIZABLE was the only option -- the alternative was READ COMMITTED and IIRC there were some corner cases where the data changing in the log tables could make a big difference (memory fails, but I think it had to do with log switching). ISTR it was partly a lesson we learned from erserver, where some of the basic concepts underlying Slony had been worked out. I'd advocate a lot of caution in making changes, ideally by analysing all the kinds of transaction paths Slony could be seeing. But with new isolation levels available, it's probably worth doing some analysis. A ---- On Wed, Nov 18, 2015 at 02:26:15PM +0000, Tom Tignor wrote: ... Sorry for the delay getting back. Inspired by your questions, I?ve been reading up on SSI, the Cahill paper and slony1 and postgres code. ... It should be pointed out that 9.1 goes EOL (End of Life) in less than a year (Sep 2016), and transaction handling has changed a *lot* since then, so any changes that core Slony makes may not even work for you. (FWIW, I think dropping the isolation level in this particular instance seems safe, however.) -- Greg Sabino Mullane greg at endpoint.com End Point Corporation PGP Key: 0x14964AC8 ---- Greg, Andrew, Thanks for the feedback. Greg, can you describe the transaction handling changes you?re referring to? I recently got the latest pg 9.4 distribution. The README-SSI is identical and while there have been some changes in predicate.c, they don?t appear sweeping. The doc on Transaction Isolation appears pretty much the same too. A general question for the group: if we would consider a change like this (as a runtime option or otherwise), what?s the correct way to move it forward? Should I file a bug? Are there specific tests or analysis which should be performed? Tom :-) ---- I would use bug 336 for this. My gut is to make this a runtime option since I am not confident that downgrading the isolation level won't impact someones workload, having said that I am open to being convinced otherwise. I would also be inclined to leave the default value unchanged for 2.2.x and then maybe consider changing the default in 2.3. If we actually any concrete plans for a 2.3 version I would say just leave the change for 2.3 but I don't see a 2.3 release happening soon and I don't want to change the default behaviour in a minor release. Generally when we make a change like this I try to do lots of runs through the disorder/clustertest suite that doesn't cover all types of workloads. Have you tried this change on your workload? Does it actually help things? It would be great if Jan and Chris were to comment on this thread ---- Yes indeed. With this change, we see replication on all three subscriber nodes in the 2-5 secs range per SYNC event. (We?re configured with the 2-sec default.) Without it, the subscribers spend most of the day paralyzed, with sporadic blocks of some minutes of consciousness which they use to frenetically catch up. Figure I?ll await others? comments and then post excerpts to bug 336. Your point on not impacting others? workloads is very well taken. A non-default runtime option seems like the right way forward. Tom :-) ---- Without taking SYNC snapshots in a SERIALIZABLE transaction I believe that a Slony-I replica could suffer the same inconsistency dangers that a pg_dump without --serializable-deferrable can suffer. Namely a replica would not be usable as a source for reporting. From the 9.4 pg_dump docs: "This option is not beneficial for a dump which is intended only for disaster recovery. It could be useful for a dump used to load a copy of the database for reporting or other read-only load sharing while the original database continues to be updated. Without it the dump may reflect a state which is not consistent with any serial execution of the transactions eventually committed. For example, if batch processing techniques are used, a batch may show as closed in the dump without all of the items which are in the batch appearing." Changing the default isolation levels(s) may therefore change, what a replica can safely be used for and I believe that creating reports is one of the major use cases. Using options with big, bold, red, flashing warnings in the documentation would be the only way to go. Regards, Jan ---- I was wondering if this is actually possible or not. The remote slon only selects from sl_event and sl_log_*. The remote worker is only going to see rows that are covered by a snapshot range in the SYNC in sl_event. Rows in sl_log_* might be visible from a transaction point of view but they won't be captured by the where conditions for pulling from sl_log. The snapshot for the event in sl_log is done by the local sync connection which is a read-write connection not by a slon remote read-only connection. (I'm ignoring copy_set from the above analysis). ---- One point to clarify: while a general review of isolation levels could be helpful, the problem I saw and addressed was specifically in the remote listener. The remedy I?ve been working with is a one-line change. The behavior we had without it seems similar to the observation Jan makes in the bug 336 description. --- src/slon/remote_listen.c.SAVE 2015-11-12 11:09:06.405693227 -0500 +++ src/slon/remote_listen.c 2015-11-12 11:13:00.157825287 -0500 @@ -318,7 +318,7 @@ } if (PQserverVersion(dbconn) >= 90100) { - slon_mkquery(&query1, "SET SESSION CHARACTERISTICS AS TRANSACTION read only isolation level serializable deferrable"); + slon_mkquery(&query1, "SET SESSION CHARACTERISTICS AS TRANSACTION read only isolation level repeatable read"); res = PQexec(dbconn, dstring_data(&query1)); if (PQresultStatus(res) != PGRES_COMMAND_OK) { Tom :-) ---- On Thu, Nov 19, 2015 at 06:09:54PM +0000, Tignor, Tom wrote: Thanks for the feedback. Greg, can you describe the transaction handling changes you?re referring to? Sorry, my bad - I misremembered the serializable overhaul as happening in 9.2, not 9.1. -- Greg Sabino Mullane greg at endpoint.com End Point Corporation PGP Key: 0x14964AC8 -- Configure bugmail: http://www.slony.info/bugzilla/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug. You are the assignee for the bug. From bugzilla-daemon at main.slony.info Thu Dec 10 11:17:03 2015 From: bugzilla-daemon at main.slony.info (bugzilla-daemon at main.slony.info) Date: Thu, 10 Dec 2015 11:17:03 -0800 (PST) Subject: [Slony1-bugs] [Bug 336] Review transaction isolation level requirements for slon database connections. In-Reply-To: References: Message-ID: <20151210191703.E508A2917BC@main.slony.info> http://www.slony.info/bugzilla/show_bug.cgi?id=336 Tom Tignor changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ttignor at akamai.com -- Configure bugmail: http://www.slony.info/bugzilla/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug. You are the assignee for the bug.