DeJuan Jackson djackson
Mon Aug 30 22:29:16 PDT 2004
I seem to be making a habit of replying to my own posts.  Anyone seen
this before?  I'm beginning to wonder if I have a BO problem.

DeJuan Jackson wrote:

> Well, bad form replying to my own post, but no one else has, so I'll
> have to.
>
> This is the slon output at -d 4.
>
> Node 1:
> CONFIG main: local node id = 1
> CONFIG main: loading current cluster configuration
> CONFIG storeNode: no_id=2 no_comment='Node 2'
> DEBUG2 setNodeLastEvent: no_id=2 event_seq=7032
> CONFIG storePath: pa_server=2 pa_client=1 
> pa_conninfo="dbname=test_destination host=river user=postgres" 
> pa_connretry=10
> CONFIG storeListen: li_origin=2 li_receiver=1 li_provider=2
> CONFIG storeSet: set_id=1 set_origin=1 set_comment='All pgbench tables'
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> CONFIG storeSet: set_id=2 set_origin=1 set_comment='seq_test table'
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> DEBUG2 main: last local event sequence = 8095
> CONFIG main: configuration complete - starting threads
> DEBUG1 localListenThread: thread starts
> FATAL  localListenThread: Another slon daemon is serving this node 
> already
>
> Node 2:
> CONFIG main: local node id = 2
> CONFIG main: loading current cluster configuration
> CONFIG storeNode: no_id=1 no_comment='Node 1'
> DEBUG2 setNodeLastEvent: no_id=1 event_seq=8083
> CONFIG storePath: pa_server=1 pa_client=2 
> pa_conninfo="dbname=test_source host=river user=postgres" pa_connretry=10
> CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
> CONFIG storeSet: set_id=1 set_origin=1 set_comment='All pgbench tables'
> WARN   remoteWorker_wakeup: node 1 - no worker thread
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> CONFIG storeSet: set_id=2 set_origin=1 set_comment='seq_test table'
> WARN   remoteWorker_wakeup: node 1 - no worker thread
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
> WARN   remoteWorker_wakeup: node 1 - no worker thread
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> CONFIG enableSubscription: sub_set=1
> WARN   remoteWorker_wakeup: node 1 - no worker thread
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> CONFIG storeSubscribe: sub_set=2 sub_provider=1 sub_forward='f'
> WARN   remoteWorker_wakeup: node 1 - no worker thread
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> CONFIG enableSubscription: sub_set=2
> WARN   remoteWorker_wakeup: node 1 - no worker thread
> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> DEBUG2 main: last local event sequence = 7032
> CONFIG main: configuration complete - starting threads
> DEBUG1 localListenThread: thread starts
> FATAL  localListenThread: Another slon daemon is serving this node 
> already
>
> Don't know if that will help.
> I looked at the pg_listener layout, and the only fix I can think of is
> to check for the PID in the query.  This would only work from the DB,
> and only if stats are on.  But assuming stats are on, the
> pg_stat_get_backend_idset function combined with the
> pg_stat_get_backend_pid function would tell you which PIDs are
> currently connected.  You could then filter your listener list by that
> data and get a more representative list of active listeners.  This
> would necessitate a way to determine whether stats are on; for that
> you could just call pg_stat_get_backend_idset and see if any rows come
> back after an appropriate delay (I believe Tom said it can be as much
> as 500ms), because your own connection should always be there at a
> minimum.
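>
> Something like the following query sketches that idea (a rough,
> untested sketch against the 7.4 catalogs; listenerpid is the
> pg_listener column holding the listening backend's PID):
>
>     -- Keep only pg_listener rows whose backend PID is still reported
>     -- by the stats collector (requires stats to be enabled).
>     SELECT l.relname, l.listenerpid
>       FROM pg_catalog.pg_listener l
>      WHERE l.listenerpid IN (
>                SELECT pg_stat_get_backend_pid(i.backendid)
>                  FROM pg_stat_get_backend_idset() AS i(backendid)
>            );
>
> Any row whose listenerpid is missing from that subquery belongs to a
> backend that no longer exists, so it could be treated as stale.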
>
> So, should I file this as a bug, should I submit a patch, or should I
> just stick my problem somewhere dark and dank so that it's never heard
> from again?  Enquiring minds want to know.
>
> DeJuan Jackson wrote:
>
>> I've been putting Slony-I 1.0.2 through its paces, so to speak, and I
>> have a concern/question.
>>
>> select version();
>>                                                 version
>> ---------------------------------------------------------------------------------------------------------
>>  PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
>>
>> When I do the ever-faithful "pull the power on the test box" while
>> pgbench and replication are running, once the box comes back up the
>> slons (both source and destination) die with the FATAL message
>> "localListenThread: Another slon daemon is serving this node
>> already".  I tracked this down to a check in
>> src/slon/local_listen.c.  The error message only happens when a row
>> exists in pg_catalog.pg_listener where relname =
>> '_<clustername>_Restart'.
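>>
>> For example, with a hypothetical cluster named 'test', the stale row
>> can be seen with:
>>
>>     -- 'test' stands in for the real cluster name; after the power
>>     -- pull, the row's listenerpid points at a backend that is gone.
>>     SELECT relname, listenerpid
>>       FROM pg_catalog.pg_listener
>>      WHERE relname = '_test_Restart';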
>>
>> I can clear the error up by issuing a NOTIFY "_<clustername>_Restart"
>> on both the source and the target, then issuing a kill -9 on the two
>> slons that are running, and then re-launching them (I've waited
>> approximately 3 minutes with no response from the slons, and a normal
>> kill doesn't work).  The NOTIFY gets rid of the old pg_listener
>> entries, the kill gets rid of the current entries, and the restart
>> prompts the new slons to pick up where they left off before the
>> simulated outage.
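>>
>> The NOTIFY step looks like this, again assuming the hypothetical
>> cluster name 'test' (run it in both the source and the destination
>> database before the kill -9):
>>
>>     -- Sending the notification is what prompts the backend to clear
>>     -- the dead listeners' "_test_Restart" rows out of pg_listener.
>>     NOTIFY "_test_Restart";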
>>
>> Need any more info?
>
>
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general



