Sun Dec 11 10:50:34 PST 2005
- Previous message: [Slony1-general] Preparations for Slony-I 1.1.5
- Next message: [Slony1-general] replication stalled - with no errors raported by slon ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
hi i have database of about 40G running under postgresql 8.0.4 i added slony 1.1.2 replication for all tables (about 50 tables, in 5 schemas). database is quite loaded - both witl a lot of selects, and tens of thousands of updates/inserts daily. everything worked really fine. i made mrtg graph showing replication lag (taken from st_lag_time in sl_status view). during the day the lag was up to 5 seconds, withusually 2-3 peaks of up to 200 seconds every night (big, fast imports take place at this time). then - on friday - replication seemed to stop - or works *very* slow. even: select * from sl_status takes a lot of time: # explain analyze SELECT * from sl_status ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=4487.07..4497.20 rows=1 width=48) (actual time= 7527.381..7527.442 rows=1 loops=1) -> Nested Loop (cost=4487.07..4490.56 rows=1 width=60) (actual time= 7527.302..7527.316 rows=1 loops=1) -> Merge Join (cost=4487.07..4487.09 rows=1 width=40) (actual time=7527.191..7527.199 rows=1 loops=1) Merge Cond: ("outer".ev_origin = "inner".con_origin) -> Sort (cost=4220.76..4220.76 rows=1 width=24) (actual time=6399.919..6399.920 rows=1 loops=1) Sort Key: e.ev_origin -> Nested Loop (cost=4217.27..4220.75 rows=1 width=24) (actual time=6399.896..6399.909 rows=1 loops=1) -> Subquery Scan "IN_subquery" (cost= 4217.27..4217.28 rows=1 width=12) (actual time=6399.829..6399.834 rows=1 loops=1) -> HashAggregate (cost=4217.27..4217.27rows=1 width=12) (actual time= 6399.824..6399.826 rows=1 loops=1) -> Seq Scan on sl_event (cost= 0.00..3963.37 rows=50779 width=12) (actual time=71.569..6251.907 rows=48949 loops=1) Filter: (ev_origin = _irr.getlocalnodeid('_irr'::name)) -> Index Scan using "sl_event-pkey" on sl_event e (cost=0.00..3.45 rows=1 width=20) (actual time=0.051..0.054 rows=1 loops=1) Index Cond: ((e.ev_origin = "outer".ev_origin) AND (e.ev_seqno = "outer".max)) -> Sort (cost=266.31..266.31 rows=2 width=16) (actual time= 1127.256..1127.257 rows=1 loops=1) Sort Key: "IN_subquery".con_origin -> Subquery Scan "IN_subquery" (cost=266.27..266.30rows=2 width=16) (actual time= 1127.224..1127.229 rows=1 loops=1) -> HashAggregate (cost=266.27..266.28 rows=2 width=16) (actual time=1127.218..1127.220 rows=1 loops=1) -> Seq Scan on sl_confirm (cost= 0.00..266.02 rows=34 width=16) (actual time=1125.590..1127.156 rows=7 loops=1) Filter: (con_origin = _irr.getlocalnodeid('_irr'::name)) -> Index Scan using "sl_event-pkey" on sl_event ce (cost= 0.00..3.45 rows=1 width=20) (actual time=0.094..0.097 rows=1 loops=1) Index Cond: (("outer".ev_origin = ce.ev_origin) AND ("outer".max = ce.ev_seqno)) -> Index Scan using sl_confirm_idx1 on sl_confirm c (cost=0.00..6.61rows=1 width=24) (actual time= 0.038..0.064 rows=1 loops=1) Index Cond: ((c.con_origin = "outer".ev_origin) AND (c.con_received= "outer".con_received) AND ( c.con_seqno = "outer".max)) Total runtime: 7527.723 ms (24 rows) which is very strange. as i see it - the problem lies in sl_event table, which has now 49222 records. sl_status shows: # SELECT * from sl_status ; -[ RECORD 1 ]-------------+--------------------------- st_origin | 1 st_received | 2 st_last_event | 251011 st_last_event_ts | 2005-12-11 13:47:00.524709 st_last_received | 201902 st_last_received_ts | 2005-12-11 13:44:55.793112 st_last_received_event_ts | 2005-12-09 20:44:57.414378 st_lag_num_events | 49109 st_lag_time | 1 day 17:02:03.85666 most probably - the information i am giving you are not enoght, but i just dont know what else i can supply to help you help me. basically - when the problem showed - there was nothing special happening in database. right now both slons are raporting similar things. master: 2005-12-11 12:24:50 MSK DEBUG1 cleanupThread: 5.653 seconds for delete logs 2005-12-11 12:36:52 MSK DEBUG1 cleanupThread: 28.315 seconds for cleanupEvent() 2005-12-11 12:37:01 MSK DEBUG1 cleanupThread: 9.285 seconds for delete logs 2005-12-11 12:48:56 MSK DEBUG1 cleanupThread: 31.703 seconds for cleanupEvent() 2005-12-11 12:49:05 MSK DEBUG1 cleanupThread: 8.735 seconds for delete logs 2005-12-11 13:01:44 MSK DEBUG1 cleanupThread: 27.764 seconds for cleanupEvent() 2005-12-11 13:01:51 MSK DEBUG1 cleanupThread: 7.696 seconds for delete logs 2005-12-11 13:13:36 MSK DEBUG1 cleanupThread: 12.179 seconds for cleanupEvent() 2005-12-11 13:13:38 MSK DEBUG1 cleanupThread: 2.141 seconds for delete logs 2005-12-11 13:26:16 MSK DEBUG1 cleanupThread: 29.539 seconds for cleanupEvent() 2005-12-11 13:26:22 MSK DEBUG1 cleanupThread: 6.845 seconds for delete logs 2005-12-11 13:37:28 MSK DEBUG1 cleanupThread: 23.460 seconds for cleanupEvent() 2005-12-11 13:37:34 MSK DEBUG1 cleanupThread: 5.345 seconds for delete logs slave: 2005-12-11 12:28:13 MSK DEBUG1 cleanupThread: 0.025 seconds for delete logs 2005-12-11 12:39:07 MSK DEBUG1 cleanupThread: 0.038 seconds for cleanupEvent() 2005-12-11 12:39:07 MSK DEBUG1 cleanupThread: 0.033 seconds for delete logs 2005-12-11 12:50:04 MSK DEBUG1 cleanupThread: 0.033 seconds for cleanupEvent() 2005-12-11 12:50:04 MSK DEBUG1 cleanupThread: 0.026 seconds for delete logs 2005-12-11 13:02:02 MSK DEBUG1 cleanupThread: 0.049 seconds for cleanupEvent() 2005-12-11 13:02:02 MSK DEBUG1 cleanupThread: 0.025 seconds for delete logs 2005-12-11 13:14:07 MSK DEBUG1 cleanupThread: 0.037 seconds for cleanupEvent() 2005-12-11 13:14:07 MSK DEBUG1 cleanupThread: 0.037 seconds for delete logs 2005-12-11 13:25:40 MSK DEBUG1 cleanupThread: 0.032 seconds for cleanupEvent() 2005-12-11 13:25:40 MSK DEBUG1 cleanupThread: 0.025 seconds for delete logs 2005-12-11 13:36:46 MSK DEBUG1 cleanupThread: 0.055 seconds for cleanupEvent() 2005-12-11 13:36:46 MSK DEBUG1 cleanupThread: 0.035 seconds for delete logs 2005-12-11 13:48:20 MSK DEBUG1 cleanupThread: 0.036 seconds for cleanupEvent() 2005-12-11 13:48:20 MSK DEBUG1 cleanupThread: 0.026 seconds for delete logs could you help me somehow? this replication is very improtant for me - as we have to deploy it finally, and still have problems. best regards depesz -------------- next part -------------- An HTML attachment was scrubbed... URL: http://gborg.postgresql.org/pipermail/slony1-general/attachments/20051211/1e5feb1f/attachment-0001.html
- Previous message: [Slony1-general] Preparations for Slony-I 1.1.5
- Next message: [Slony1-general] replication stalled - with no errors raported by slon ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list