Wed Aug 17 22:25:09 PDT 2005
- Previous message: [Slony1-general] Re: Slony1-general Digest, Vol 19, Issue 25
- Next message: [Slony1-general] Re: Slow catch-up
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Marc Munro wrote:

> Brad, thank you so much for your response. I have more information but
> still no solution.
>
> In response to Brad's suggestions:
>
> 1) pg_stat_activity
> Interesting. There is an idle-in-transaction process from the provider.
> Shutting down and restarting the database resulted in a new session with
> the same status. This appears to be something to do with slony.

That may be fine, as long as the idle-in-transaction process isn't old.

> 2) vacuum subscriber's pg_listener
> No bloat to report. Only a handful of rows existed.

The question is how many rows it *couldn't* purge, that is, how many dead
tuples were kicking around; the output of VACUUM ANALYZE on the table
would show that...

> 3) vacuum subscriber's sl_log_1
> Nothing - there are no rows in the subscriber.
>
> 4) Vacuuming subscriber
> Already done.
>
> Results of further investigation:
>
> I think the problem may simply be that the load on the provider during
> my initial sync was very high. This has resulted in sl_log_1 becoming
> huge. In my slony logs on the subscriber I often see lines like:
>
> 2005-08-17 09:08:47 PDT DEBUG2 remoteHelperThread_1_1: 83.952 seconds
> delay for first row
>
> Vacuuming the sl_log_1 table on the provider shows it to be pretty big:
>
> INFO: --Relation _cage2.sl_log_1--
> INFO: Pages 714558: Changed 1084, Empty 0; Tup 35986240: Vac 0,
>       Keep 0, UnUsed 7172.
>       Total CPU 52.50s/9.31u sec elapsed 177.23 sec.
> INFO: --Relation pg_toast.pg_toast_2177383887--
> INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
>       Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
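As an aside, for anyone reproducing this diagnosis: the two checks above can be run along these lines. This is only a sketch; "mydb" is a placeholder database name, and on the 7.4/8.0-era servers in this thread the idle state shows up in pg_stat_activity's current_query column:

```shell
# On the provider: look for long-lived idle-in-transaction backends
# ("mydb" is a placeholder; compare query_start against the clock)
psql -d mydb -c "SELECT procpid, usename, query_start
                 FROM pg_stat_activity
                 WHERE current_query = '<IDLE> in transaction';"

# On the subscriber: VACUUM VERBOSE reports how many dead tuples
# ("Vac" / "UnUsed" in the INFO lines) it found in pg_listener
psql -d mydb -c "VACUUM VERBOSE ANALYZE pg_listener;"
```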
> INFO: Analyzing _cage2.sl_log_1
>
> I have tracked down a specific slony query that takes a long time to run
> on the provider (100+ secs):
>
> select log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype,
> log_cmddata from "_cage2".sl_log_1 where log_origin = 1 and ( (
>   log_tableid in
> (100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,99999)
>   and (log_xid < '2106689363' and "_cage2".xxid_lt_snapshot(log_xid,
> '2106433975:2106689363:''2106433975'',''2106688480'''))
>   and (log_xid >= '2106433975' and "_cage2".xxid_ge_snapshot(log_xid,
> '2106433975:2106689266:''2106433975'',''2106688480'''))
> ) ) order by log_actionseq;
>
> This returns 662 rows. If I remove the xxid_ge_snapshot and
> xxid_lt_snapshot function calls it returns 2135674 rows. This would seem
> to be the root of my performance problem.
>
> I don't understand what the functions do, but allowing them to so
> drastically prune the result set means that the query does a lot of work
> for relatively few rows.
>
> Can anyone explain what is going on here?

The snapshot functions restrict sl_log_1 to just the rows committed
between the transaction snapshots of the SYNC events being processed;
the query still has to wade through the whole (huge) table to find the
relatively few rows that qualify.

You're hitting a "Rod Taylor" style situation, where you have some pretty
enormous amount of activity going on. Each SYNC is running for a really
long time, and there's not much way of avoiding that :-(.

What Rod did under analogous conditions was to group a lot of SYNCs
together. Since the query will probably be doing a Seq Scan anyway, you
might as well try to process most of sl_log_1 in one pass. With the
default "-g 6", you'll probably see replication falling behind :-(

Suggestion: set the -g option as high as you can get away with. The SYNCs
will still run for quite a long time, but at least each one will find
lots of data to replicate.
In v1.0.x, the maximum out-of-the-box setting for -g is "-g 100", which
tries to process up to 100 SYNCs at a time if you're running behind. If
you're running 1.1.x, the maximum value is 10000, and you'd probably find
it useful to set it pretty high until things catch up. Also set the
"desired_sync_time" value (the -o option to slon) to 0, so that it
doesn't do the "adaptive" thing and shrink the sync groups again.
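To make that concrete, a 1.1.x slon for the subscriber might be started along these lines. This is only a sketch: the conninfo and the group size of 1000 are placeholders, while the cluster name cage2 comes from the "_cage2" schema in the output above:

```shell
# Group up to 1000 SYNCs per round while catching up, and set
# desired_sync_time (-o) to 0 to disable adaptive group sizing
slon -g 1000 -o 0 cage2 \
    'dbname=mydb host=subscriber.example.com user=slony'
```

Once the subscriber has caught up, the -g value can be dropped back toward the default so individual SYNCs stay short.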