Jason Chen yunfeng82 at gmail.com
Sat Oct 30 04:19:53 PDT 2010
After the failing system had run for several hours, it became normal again,
so I need to redeploy the testbed to get a backtrace of the error state. I
have compared the failing master node with a normal master node. The only
difference is that the failing master has only 5 threads: it is missing the
remoteListener and remoteWorker threads. If you need these details, I will
get them and let you know next Monday, once I have access to my system.

Do you think there is an issue in the configuration process?

Here is the backtrace of the master node that was in the error state and has
now returned to normal.

(gdb) thread apply all bt

Thread 7 (Thread 0x4159a940 (LWP 6365)):
#0  0x00007fc74a5f6da2 in select () from /lib64/libc.so.6
#1  0x000000000041396e in sched_mainloop (dummy=<value optimized out>) at scheduler.c:532
#2  0x00007fc74a8852f7 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fc74a5fd85d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x4094c940 (LWP 6371)):
#0  0x00007fc74a8894a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000041334e in sched_wait_conn (conn=0x630d70, condition=0) at scheduler.c:230
#2  0x00000000004056ee in localListenThread_main (dummy=<value optimized out>) at local_listen.c:701
#3  0x00007fc74a8852f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fc74a5fd85d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x41d9b940 (LWP 6376)):
#0  0x00007fc74a8894a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000041334e in sched_wait_conn (conn=0x6317f0, condition=0) at scheduler.c:230
#2  0x0000000000412a7e in cleanupThread_main (dummy=<value optimized out>) at cleanup_thread.c:113
#3  0x00007fc74a8852f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fc74a5fd85d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x4274c940 (LWP 6380)):
#0  0x00007fc74a8894a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000041334e in sched_wait_conn (conn=0x642650, condition=0) at scheduler.c:230
#2  0x00000000004125b6 in syncThread_main (dummy=<value optimized out>) at sync_thread.c:101
#3  0x00007fc74a8852f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fc74a5fd85d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x42f4d940 (LWP 8283)):
#0  0x00007fc74a8894a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000040c5c3 in remoteWorkerThread_main (cdata=<value optimized out>) at remote_worker.c:479
#2  0x00007fc74a8852f7 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fc74a5fd85d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x4374e940 (LWP 8285)):
#0  0x00007fc74a8894a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000041334e in sched_wait_conn (conn=0x6433b0, condition=0) at scheduler.c:230
#2  0x0000000000406b0a in remoteListenThread_main (cdata=<value optimized out>) at remote_listen.c:339
#3  0x00007fc74a8852f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fc74a5fd85d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fc74aeba6e0 (LWP 6363)):
#0  0x00007fc74a8865b5 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000413582 in sched_wait_mainloop () at scheduler.c:172
#2  0x0000000000402f31 in SlonWatchdog () at slon.c:740
#3  0x0000000000403c58 in main (argc=6, argv=0x7fff9a3202b8) at slon.c:355
#0  0x00007fc74a8865b5 in pthread_join () from /lib64/libpthread.so.0


On Sat, Oct 30, 2010 at 4:28 AM, Steve Singer <ssinger at ca.afilias.info> wrote:

> On 10-10-29 11:12 AM, Jason Chen wrote:
>
>> That is correct. On the error node, the master cannot get the STORE_PATH
>> event and cannot start the remoteListen and remoteWorker threads.
>>
>>
> You mentioned something about gdb previously.
>
> Can you connect to the slon process while it is in this state to see what
> it is doing?
>
> i.e. 'info threads' to display a list of threads
>
> thread 1
> thread 2
> etc..
> to switch between threads.
>
> and bt to show the stack trace of each thread.