John Sidney-Woollett johnsw
Thu Dec 22 08:40:34 PST 2005
While investigating a possible memory issue that affects only one 
of our servers, I have been logging the process list for 
postgres-related processes four times a day for the past few days.

This server uses postgres 7.4.6 + slon 1.1.0 on Debian i686 (Linux 
server2 2.6.8.1-4-686-smp) and is a slon slave in a two-server 
replicated cluster. Our master DB (similar setup) does not exhibit this 
problem at all; only the subscriber node does.

The load average starts to go mental once the machine has to start 
swapping (i.e. once it runs out of physical RAM). The solution so far is 
to stop and restart both slon and postgres, after which things return to 
normal for about another 2 weeks.

I know other people have reported similar things but there doesn't seem 
to be an explanation or solution (other than stopping and starting the 
two processes).

Can anyone suggest what else to look at on the server to see what might 
be going on?

Appreciate any help or advice anyone can offer. I'm not a C programmer 
nor a unix sysadmin, so any advice needs to be simple to understand.

Thanks

John

The first log is from 14th Dec and the second is from 22nd Dec. You can 
see the slon process (id=27844) using more memory over time. Its memory 
map and that of the postmaster are posted below too.

~/meminfo # cat 200512141855.log
27806     1 1052 15288  0.0  0.1 /usr/local/pgsql/bin/postmaster
27809 27806  812  6024  0.0  0.0 pg: stats buffer process
27810 27809  816  5032  0.0  0.0 pg: stats collector process
27821 27806 10744 16236  0.1  1.0 pg: postgres bp_live 192.168.22.76 idle
27842     1  620  2324  0.0  0.0 /usr/local/pgsql/bin/slon -d 1 bprepl4
27844 27842 5920 66876  0.0  0.5 /usr/local/pgsql/bin/slon -d 1 bprepl4
27847 27806 10488 16020  0.0  1.0 pg: postgres bp_live [local] idle
27852 27806 12012 17020  1.1  1.1 pg: postgres bp_live [local] idle
27853 27806 11452 16868  0.0  1.1 pg: postgres bp_live [local] idle
27854 27806 10756 16240  0.0  1.0 pg: postgres bp_live [local] idle

~/meminfo # cat 200512220655.log
27806     1  940 15288  0.0  0.0 /usr/local/pgsql/bin/postmaster
27809 27806  752  6024  0.0  0.0 pg: stats buffer process
27810 27809  764  5032  0.0  0.0 pg: stats collector process
27821 27806 4684 16236  0.0  0.4 pg: postgres bp_live 192.168.22.76 idle
27842     1  564  2324  0.0  0.0 /usr/local/pgsql/bin/slon -d 1 bprepl4
27844 27842 2368 70096  0.0  0.2 /usr/local/pgsql/bin/slon -d 1 bprepl4
27847 27806 4460 16020  0.0  0.4 pg: postgres bp_live [local] idle
27852 27806 11576 17020  1.0  1.1 pg: postgres bp_live [local] idle
27853 27806 11328 16868  0.0  1.0 pg: postgres bp_live [local] idle
27854 27806 4640 16240  0.0  0.4 pg: postgres bp_live [local] idle
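
To compare the two snapshots line by line, a small script can help. The 
following is only a minimal sketch. It assumes the log columns are PID, 
PPID, RSS (kB), VSZ (kB), %CPU, %MEM and command; the ps format string is 
not given in this post, although the fourth column does match the VIRT 
figure that top reports for PID 27821. The function names are made up for 
illustration.

```python
def parse_snapshot(text):
    """Map PID -> (rss_kb, vsz_kb, command) for each line of a logged process list."""
    procs = {}
    for line in text.strip().splitlines():
        # Assumed column order: PID PPID RSS VSZ %CPU %MEM COMMAND...
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        pid, _ppid, rss, vsz, _cpu, _mem, cmd = fields
        procs[int(pid)] = (int(rss), int(vsz), cmd)
    return procs


def memory_deltas(before, after):
    """Return {pid: (rss_change_kb, vsz_change_kb)} for PIDs present in both snapshots."""
    return {
        pid: (after[pid][0] - before[pid][0], after[pid][1] - before[pid][1])
        for pid in before
        if pid in after
    }


# Two lines taken from each log above (the slon worker and one idle backend).
DEC_14 = """
27844 27842 5920 66876  0.0  0.5 /usr/local/pgsql/bin/slon -d 1 bprepl4
27821 27806 10744 16236  0.1  1.0 pg: postgres bp_live 192.168.22.76 idle
"""
DEC_22 = """
27844 27842 2368 70096  0.0  0.2 /usr/local/pgsql/bin/slon -d 1 bprepl4
27821 27806 4684 16236  0.0  0.4 pg: postgres bp_live 192.168.22.76 idle
"""

deltas = memory_deltas(parse_snapshot(DEC_14), parse_snapshot(DEC_22))
for pid, (d_rss, d_vsz) in sorted(deltas.items()):
    print(f"{pid}: RSS {d_rss:+d} kB, VSZ {d_vsz:+d} kB")
```

Under that column assumption, the slon worker's virtual size grows by 
3220 kB between the two logs while its resident size shrinks, which would 
be consistent with its pages being pushed out to swap rather than freed.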

The top listing right now is shown next; the key thing is the kswapd0 
process. Once physical memory becomes exhausted, the server goes into 
rapid decline as the swap burden increases...

top - 08:27:27 up 43 days, 42 min, 1 user, load average: 0.01, 0.04, 0.00
Tasks:  85 total,  1 running, 84 sleeping,  0 stopped,  0 zombie
Cpu(s):  0.1% us, 0.0% sy, 0.0% ni, 99.4% id, 0.5% wa, 0.0% hi, 0.0% si
Mem:   1035612k total,  1030512k used,     5100k free,    46416k buffers
Swap:   497972k total,   157088k used,   340884k free,    28088k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27821 postgres  16   0 16236 6480  14m S  0.3  0.6  14:00.34 postmaster
18939 root      16   0  2044 1040 1820 R  0.3  0.1   0:00.02 top
     1 root      16   0  1492  136 1340 S  0.0  0.0   0:05.43 init
     2 root      RT   0     0    0    0 S  0.0  0.0   0:02.51 migration/0
     3 root      34  19     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/0
     4 root      RT   0     0    0    0 S  0.0  0.0   0:05.35 migration/1
     5 root      34  19     0    0    0 S  0.0  0.0   0:00.05 ksoftirqd/1
     6 root      RT   0     0    0    0 S  0.0  0.0   0:04.91 migration/2
     7 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/2
     8 root      RT   0     0    0    0 S  0.0  0.0   0:21.87 migration/3
     9 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/3
    10 root       5 -10     0    0    0 S  0.0  0.0   0:00.20 events/0
    11 root       5 -10     0    0    0 S  0.0  0.0   0:00.06 events/1
    12 root       5 -10     0    0    0 S  0.0  0.0   0:00.01 events/2
    13 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 events/3
    14 root       8 -10     0    0    0 S  0.0  0.0   0:00.00 khelper
    15 root       7 -10     0    0    0 S  0.0  0.0   0:00.00 kacpid
    67 root       5 -10     0    0    0 S  0.0  0.0  19:26.36 kblockd/0
    68 root       5 -10     0    0    0 S  0.0  0.0   0:59.05 kblockd/1
    69 root       5 -10     0    0    0 S  0.0  0.0   0:08.40 kblockd/2
    70 root       5 -10     0    0    0 S  0.0  0.0   0:10.17 kblockd/3
    82 root      15   0     0    0    0 S  0.0  0.0 624:18.25 kswapd0
[snipped]
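
The "free" figure in the header on its own makes things look worse than 
they are, because memory held as buffers and page cache is counted as 
used but can normally be reclaimed. As a rough sketch (the function name 
is hypothetical, and the regular expression assumes the header layout 
shown above), the three figures can be added up like this:

```python
import re


def reclaimable_kb(top_header):
    """Sum free + buffers + cached (kB) from the Mem:/Swap: lines of a top header."""
    values = {}
    for amount, label in re.findall(r"(\d+)k (free|buffers|cached)", top_header):
        # 'free' appears on both the Mem: and Swap: lines; keep the first (Mem:) one.
        values.setdefault(label, int(amount))
    return values["free"] + values["buffers"] + values["cached"]


HEADER = """\
Mem:   1035612k total,  1030512k used,     5100k free,    46416k buffers
Swap:   497972k total,   157088k used,   340884k free,    28088k cached
"""

print(reclaimable_kb(HEADER))
```

For the listing above this gives 79604 kB, i.e. roughly 78 MB of the 1 GB 
is free or easily reclaimable, with 157088 kB already in swap.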

The memory map for the slon process is below.

cat /proc/27844/maps
08048000-08067000 r-xp 00000000 08:0c 198200   /usr/local/pgsql/bin/slon
08067000-08069000 rw-p 0001f000 08:0c 198200   /usr/local/pgsql/bin/slon
08069000-088ae000 rw-p 08069000 00:00 0
40000000-40015000 r-xp 00000000 08:02 10242439 /lib/ld-2.3.2.so
40015000-40016000 rw-p 00015000 08:02 10242439 /lib/ld-2.3.2.so
40016000-40017000 rw-p 40016000 00:00 0
40017000-4002e000 r-xp 00000000 08:0c 211788 /usr/local/pgsql/lib/libpq.so.3.1
4002e000-4002f000 rw-p 00017000 08:0c 211788 /usr/local/pgsql/lib/libpq.so.3.1
40033000-40040000 r-xp 00000000 08:02 10240037 /lib/tls/i686/cmov/libpthread-0.60.so
40040000-40041000 rw-p 0000c000 08:02 10240037 /lib/tls/i686/cmov/libpthread-0.60.so
40041000-40043000 rw-p 40041000 00:00 0
40043000-4016b000 r-xp 00000000 08:02 10240024 /lib/tls/i686/cmov/libc-2.3.2.so
4016b000-40174000 rw-p 00127000 08:02 10240024 /lib/tls/i686/cmov/libc-2.3.2.so
40174000-40177000 rw-p 40174000 00:00 0
40177000-4017b000 r-xp 00000000 08:02 10240025 /lib/tls/i686/cmov/libcrypt-2.3.2.so
4017b000-4017c000 rw-p 00003000 08:02 10240025 /lib/tls/i686/cmov/libcrypt-2.3.2.so
4017c000-401a3000 rw-p 4017c000 00:00 0
401a3000-401b1000 r-xp 00000000 08:02 10240038 /lib/tls/i686/cmov/libresolv-2.3.2.so
401b1000-401b2000 rw-p 0000e000 08:02 10240038 /lib/tls/i686/cmov/libresolv-2.3.2.so
401b2000-401b4000 rw-p 401b2000 00:00 0
401b4000-401c5000 r-xp 00000000 08:02 10240029 /lib/tls/i686/cmov/libnsl-2.3.2.so
401c5000-401c6000 rw-p 00011000 08:02 10240029 /lib/tls/i686/cmov/libnsl-2.3.2.so
401c6000-401e9000 rw-p 401c6000 00:00 0
401ed000-401f4000 r-xp 00000000 08:02 10240030 /lib/tls/i686/cmov/libnss_compat-2.3.2.so
401f4000-401f5000 rw-p 00006000 08:02 10240030 /lib/tls/i686/cmov/libnss_compat-2.3.2.so
401f5000-401fd000 r-xp 00000000 08:02 10240034 /lib/tls/i686/cmov/libnss_nis-2.3.2.so
401fd000-401fe000 rw-p 00008000 08:02 10240034 /lib/tls/i686/cmov/libnss_nis-2.3.2.so
401fe000-40206000 r-xp 00000000 08:02 10240032 /lib/tls/i686/cmov/libnss_files-2.3.2.so
40206000-40207000 rw-p 00008000 08:02 10240032 /lib/tls/i686/cmov/libnss_files-2.3.2.so
40207000-40208000 ---p 40207000 00:00 0
40208000-40a08000 rwxp 40208000 00:00 0
40a08000-40a09000 ---p 40a08000 00:00 0
40a09000-41209000 rwxp 40a09000 00:00 0
41209000-4120a000 ---p 41209000 00:00 0
4120a000-41a0a000 rwxp 4120a000 00:00 0
41a0a000-41a0b000 ---p 41a0a000 00:00 0
41a0b000-4220b000 rwxp 41a0b000 00:00 0
4220b000-4220c000 ---p 4220b000 00:00 0
4220c000-42a0c000 rwxp 4220c000 00:00 0
42a0c000-42a0d000 ---p 42a0c000 00:00 0
42a0d000-4320d000 rwxp 42a0d000 00:00 0
43211000-43214000 r-xp 00000000 08:02 10240031 /lib/tls/i686/cmov/libnss_dns-2.3.2.so
43214000-43215000 rw-p 00002000 08:02 10240031 /lib/tls/i686/cmov/libnss_dns-2.3.2.so
43300000-43382000 rw-p 43300000 00:00 0
43382000-43400000 ---p 43382000 00:00 0
43400000-43401000 ---p 43400000 00:00 0
43401000-43c01000 rwxp 43401000 00:00 0
43d00000-43d21000 rw-p 43d00000 00:00 0
43d21000-43e00000 ---p 43d21000 00:00 0
bfffc000-c0000000 rw-p bfffc000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
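
A map like this is easier to read once the address ranges are turned into 
sizes. The sketch below is one possible way to do that and is not part of 
any Slony or PostgreSQL tooling; the function name and the three category 
labels are made up. It treats any mapping with a non-zero inode as 
file-backed, any inode-0 mapping with no permissions as a guard region, 
and every other inode-0 mapping as anonymous memory (heap, thread stacks 
and similar).

```python
def summarize_maps(maps_text):
    """Return total mapped kB per category: 'file', 'anon' and 'guard'."""
    totals = {"file": 0, "anon": 0, "guard": 0}
    for line in maps_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 5 or "-" not in fields[0]:
            continue  # not a mapping line (e.g. a wrapped pathname)
        start, end = (int(x, 16) for x in fields[0].split("-"))
        perms, inode = fields[1], fields[4]
        size_kb = (end - start) // 1024
        if inode != "0":
            totals["file"] += size_kb   # backed by a file or shared segment
        elif perms.startswith("---"):
            totals["guard"] += size_kb  # inaccessible, e.g. a guard page
        else:
            totals["anon"] += size_kb   # heap, thread stacks, other anonymous memory
    return totals


# Four lines taken from the slon map above.
SAMPLE = """
08069000-088ae000 rw-p 08069000 00:00 0
40017000-4002e000 r-xp 00000000 08:0c 211788 /usr/local/pgsql/lib/libpq.so.3.1
40207000-40208000 ---p 40207000 00:00 0
40208000-40a08000 rwxp 40208000 00:00 0
"""

print(summarize_maps(SAMPLE))
```

On those four lines the region directly after the slon binary comes to 
8468 kB and the rwxp region to exactly 8192 kB. The repeated pattern of a 
4 kB ---p page followed by an 8 MB rwxp region looks like thread stacks 
with guard pages, though that is an inference from the layout rather than 
something the map states.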

The memory map for the postmaster is

cat /proc/27806/maps
08048000-08274000 r-xp 00000000 08:0c 198133 /usr/local/pgsql/bin/postgres
08274000-0827d000 rw-p 0022b000 08:0c 198133 /usr/local/pgsql/bin/postgres
0827d000-082d8000 rw-p 0827d000 00:00 0
40000000-40015000 r-xp 00000000 08:02 10242439   /lib/ld-2.3.2.so
40015000-40016000 rw-p 00015000 08:02 10242439   /lib/ld-2.3.2.so
40016000-40017000 rw-p 40016000 00:00 0
4001b000-4002b000 r-xp 00000000 08:0c 1107640    /usr/lib/libz.so.1.2.1.1
4002b000-4002c000 rw-p 0000f000 08:0c 1107640    /usr/lib/libz.so.1.2.1.1
4002c000-40052000 r-xp 00000000 08:02 10240428   /lib/libreadline.so.4.3
40052000-40056000 rw-p 00025000 08:02 10240428   /lib/libreadline.so.4.3
40056000-40057000 rw-p 40056000 00:00 0
40057000-4005b000 r-xp 00000000 08:02 10240025 /lib/tls/i686/cmov/libcrypt-2.3.2.so
4005b000-4005c000 rw-p 00003000 08:02 10240025 /lib/tls/i686/cmov/libcrypt-2.3.2.so
4005c000-40084000 rw-p 4005c000 00:00 0
40084000-40092000 r-xp 00000000 08:02 10240038 /lib/tls/i686/cmov/libresolv-2.3.2.so
40092000-40093000 rw-p 0000e000 08:02 10240038 /lib/tls/i686/cmov/libresolv-2.3.2.so
40093000-40095000 rw-p 40093000 00:00 0
40095000-400a6000 r-xp 00000000 08:02 10240029 /lib/tls/i686/cmov/libnsl-2.3.2.so
400a6000-400a7000 rw-p 00011000 08:02 10240029 /lib/tls/i686/cmov/libnsl-2.3.2.so
400a7000-400a9000 rw-p 400a7000 00:00 0
400a9000-400ab000 r-xp 00000000 08:02 10240026 /lib/tls/i686/cmov/libdl-2.3.2.so
400ab000-400ac000 rw-p 00001000 08:02 10240026 /lib/tls/i686/cmov/libdl-2.3.2.so
400ac000-400ce000 r-xp 00000000 08:02 10240027 /lib/tls/i686/cmov/libm-2.3.2.so
400ce000-400cf000 rw-p 00021000 08:02 10240027 /lib/tls/i686/cmov/libm-2.3.2.so
400cf000-401f7000 r-xp 00000000 08:02 10240024 /lib/tls/i686/cmov/libc-2.3.2.so
401f7000-40200000 rw-p 00127000 08:02 10240024 /lib/tls/i686/cmov/libc-2.3.2.so
40200000-40202000 rw-p 40200000 00:00 0
40202000-40236000 r-xp 00000000 08:02 10240103   /lib/libncurses.so.5.4
40236000-4023e000 rw-p 00034000 08:02 10240103   /lib/libncurses.so.5.4
4023e000-40240000 rw-p 4023e000 00:00 0
40244000-4024b000 r-xp 00000000 08:02 10240030 /lib/tls/i686/cmov/libnss_compat-2.3.2.so
4024b000-4024c000 rw-p 00006000 08:02 10240030 /lib/tls/i686/cmov/libnss_compat-2.3.2.so
4024c000-40254000 r-xp 00000000 08:02 10240034 /lib/tls/i686/cmov/libnss_nis-2.3.2.so
40254000-40255000 rw-p 00008000 08:02 10240034 /lib/tls/i686/cmov/libnss_nis-2.3.2.so
40255000-4025d000 r-xp 00000000 08:02 10240032 /lib/tls/i686/cmov/libnss_files-2.3.2.so
4025d000-4025e000 rw-p 00008000 08:02 10240032 /lib/tls/i686/cmov/libnss_files-2.3.2.so
4025e000-40c62000 rw-s 00000000 00:06 65536      /SYSV0052e2c1 (deleted)
bfffc000-c0000000 rw-p bfffc000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0


