Mon Sep 10 15:44:20 PDT 2007
- Previous message: [Slony1-general] Processing of SYNC from origin node
- Next message: [Slony1-general] Processing of SYNC from origin node
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 9/10/07, Cyril SCETBON <cscetbon.ext at orange-ftgroup.com> wrote:
>
>
> Cyril SCETBON wrote:
> >
> >
> > Jan Wieck wrote:
> >> On 9/7/2007 9:36 AM, Cyril SCETBON wrote:
> >>> Hi,
> >>>
> >>> I have this configuration:
> >>>
> >>> Node1 --> Node2 (5 seconds late)
> >>>   |
> >>>   --> Node3 (2 hours late)
> >>>
> >>> Node2 is processing each SYNC from Node1 and Node3, but Node3 is
> >>> processing each SYNC from Node2 but not from Node1, which is the
> >>> origin of the sets:
> >>>
> >>> On Node3 we see `grep processing
> >>> /var/log/slony1/node3-pns_profiles_preprod.log|awk '{print
> >>> $5}'|sort|uniq -c`
> >>> 19 remoteWorkerThread_1:
> >>> 963 remoteWorkerThread_2:
> >>>
> >>> On Node2 we see `grep processing
> >>> /var/log/slony1/node2-pns_profiles_preprod.log |awk '{print
> >>> $5}'|sort|uniq -c`
> >>> 1570 remoteWorkerThread_1:
> >>> 865 remoteWorkerThread_3:
> >>>
> >>> Why are there so many SYNCs not processed on Node3?
> >>>
> >>> Node3 got 22440 queue events and 25 Received events from
> >>> remoteWorkerThread_1, while Node2 got 4467 queue events and 1578
> >>> Received events from the same worker.
> >>>
> >>> Is there something to do?
> >>
> >> How about looking for some error messages?
> > None.
> I've put slon in debug level 2.
> >>
> >> What comes to mind would be that sl_event is grossly out of shape and
> >> that the event selection times out.
> > It seems vacuuming sl_log_1 takes too much time because of
> > vacuum_cost_delay, and that selecting from this table uses a seq scan.
> > I'm investigating.
> I forced vacuum to go faster and checked the slon logs of the subscribers.
> They have similar disk capabilities, which seem to be the bottleneck on
> all nodes (wait io ~= 50% in vmstat).
>
> I found that replication task times are different:
>
> On node 3:
> delay in seconds = 585.974ms
> cleanupEvent in seconds = 9.25167s
>
> On node 2:
> delay in seconds = 37.6463ms
> cleanupEvent in seconds = 0.203265s
>
> Might these times explain why node 3 is late compared to node 2? What do
> you think I have to investigate now?
>
> PS: the hosts consume the same processor load, but node 2 is a biprocessor
> 2.6GHz and node 3 is a biprocessor dual core 1.8GHz (4 processors seen
> by the Linux SMP kernel)
>

So... the computer with the slower processor is slower?

What delay are you referring to? If it's from _foo.sl_status.st_lag_time,
then you should be aware that its actual precision is about +/-5 seconds.

While the cleanup is disk intensive, it also does a good chunk of number
crunching. I'm surprised to see an order of magnitude of difference, but...
not shocked.

Andrew
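
For anyone wanting to repeat the per-thread count quoted above without the grep/awk pipeline, here is a minimal Python sketch of the same tally. It only assumes what the pipeline itself assumes: that the worker name is the fifth whitespace-separated field of each line containing "processing". The function name and the sample lines are hypothetical and made up for illustration; real slon log lines may be laid out differently.

```python
from collections import Counter


def count_processing_by_thread(lines):
    """Tally lines containing 'processing' by their 5th whitespace field.

    Roughly mirrors: grep processing LOG | awk '{print $5}' | sort | uniq -c
    (lines with fewer than five fields are skipped here, whereas awk
    would print an empty string for them).
    """
    counts = Counter()
    for line in lines:
        if "processing" not in line:
            continue
        fields = line.split()
        if len(fields) >= 5:
            counts[fields[4]] += 1
    return counts


# Hypothetical sample lines, NOT copied from a real slon log.
sample = [
    "2007-09-10 15:00:01 PDT DEBUG2 remoteWorkerThread_1: SYNC 100 processing",
    "2007-09-10 15:00:02 PDT DEBUG2 remoteWorkerThread_2: SYNC 200 processing",
    "2007-09-10 15:00:03 PDT DEBUG2 remoteWorkerThread_2: SYNC 201 processing",
    "2007-09-10 15:00:04 PDT DEBUG2 cleanupThread: 0.203 seconds for cleanupEvent()",
]

counts = count_processing_by_thread(sample)
for thread, n in sorted(counts.items()):
    print(f"{n:7d} {thread}")
```

To run it against a real file, pass an open file object instead of the sample list, for example `count_processing_by_thread(open(path))` with `path` pointing at the node's slon log.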