Wed Nov 30 13:17:33 PST 2011
- Previous message: [Slony1-general] Excessive locking during Slony catch-up
On 11-11-30 08:55 AM, Zac Bentley wrote:

You might want to review
http://bugs.slony.info/bugzilla/show_bug.cgi?id=222 and
http://bugs.slony.info/bugzilla/show_bug.cgi?id=167

Bug 167 means that the load on the master will be very high after 10
days of no replication (a query for checking the size of that backlog
is sketched below the quoted message). I am not sure whether bug 222
would cause the issue you are describing. Exactly which locks was
PostgreSQL waiting to obtain, and which transactions held those locks?
(A pg_locks query for answering that is also sketched below the quote.)

> We recently had a database load failure, and it seems to have been
> caused by Slony. I'm wondering why/how this occurred, and how it
> could be prevented in the future.
>
> We run a webapp with a relatively high number of database hits per
> minute. We have a pair of database servers replicated via Slony: one
> master, one slave. The slave runs the slons for both nodes. The two
> are in physically distant locations across the US from each other,
> but our hosting provider maintains a very high-bandwidth, low-latency
> link between the two; it transfers data faster than our 100mbit
> onsite LAN does. We use Postgres 8.4, Slony 2.0, and apache/php for
> the webapp. All of our Slony options are the defaults.
>
> A while ago, we set up replication between the two servers and
> completed the initial subscription process. Everything went well, and
> replication was tested working. Then, due to a firewall problem, node
> 2 (the slave) couldn’t talk to node 1 (the master) for 10 days.
> During that time, there was a LOT of database activity (DML only) on
> the master, but Slony wasn’t replicating any of it.
>
> When we finally fixed the problem after 10 days of non-syncing, I
> could see hundreds of sync requests being received, queued, and
> processed in the slave’s log. I figured it would take a day or more
> to catch up, but that wasn’t a problem.
>
> Around an hour after re-establishing the link, our webapp crashed.
> Checking apache’s monitoring showed that all available database
> connections were filled (apache limits them; Postgres allows
> unlimited) and waiting for the database to respond: a garden-variety
> database load failure. We purged the connections and restarted
> Postgres on the master.
>
> Then it happened again. And again, 20 minutes later. I checked
> pgadmin for the master and saw a fair amount of replication activity,
> but it appeared to be the generation of a lot of SYNC events, nothing
> more. However, the number of locks on important tables was so large
> and growing so rapidly that it was causing load failures. These
> failures kept occurring until we disconnected the slon host (and the
> slave database -- same computer) from the network again.
>
> Why would SYNC catch-up cause lock bloat on our master node?
> According to the SYNC documentation, no locking is supposed to take
> place. Is this caused by the fact that our slon daemons run remotely
> from the master? Is this normal behavior for Slony when a slave has a
> many-days-out-of-date database that needs to be caught up?
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
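To answer the lock question yourself, something along these lines
works on 8.4 (pg_stat_activity there still uses procpid and
current_query). This is only a sketch of the usual simplified pg_locks
self-join for pairing waiters with holders, not Slony-specific
tooling, and it can occasionally pair a false positive since it
ignores page/tuple lock details:

    SELECT w.pid                 AS waiting_pid,
           wa.current_query      AS waiting_query,
           h.pid                 AS holding_pid,
           ha.current_query      AS holding_query,
           w.locktype,
           w.relation::regclass  AS relation,
           h.mode                AS held_mode
      FROM pg_locks w
      JOIN pg_locks h
        ON h.granted
       AND NOT w.granted
       AND h.pid <> w.pid
       AND h.locktype = w.locktype
       AND h.database      IS NOT DISTINCT FROM w.database
       AND h.relation      IS NOT DISTINCT FROM w.relation
       AND h.transactionid IS NOT DISTINCT FROM w.transactionid
      JOIN pg_stat_activity wa ON wa.procpid = w.pid
      JOIN pg_stat_activity ha ON ha.procpid = h.pid;

Run that on the master while the lock count is climbing; the
holding_query column should tell you whether the blockers are slon
connections or your own application's transactions.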
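For the bug 167 angle, you can see how big the accumulated backlog is
from the Slony schema itself. Substitute your own cluster schema for
_mycluster below (an underscore followed by the cluster name -- the
placeholder here is made up); the view and log tables are part of a
stock Slony 2.0 install:

    -- Lag per subscriber, in events and wall-clock time
    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM _mycluster.sl_status;

    -- Rows waiting to be replayed (check both log tables, since
    -- Slony switches between them)
    SELECT (SELECT count(*) FROM _mycluster.sl_log_1) AS log_1_rows,
           (SELECT count(*) FROM _mycluster.sl_log_2) AS log_2_rows;

Ten days of heavy DML can leave tens of millions of rows in the log
tables, and the count(*) itself may take a while on tables that size.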