Jason Chen yunfeng82 at gmail.com
Fri Oct 29 03:15:03 PDT 2010
Hi Steve,

After more troubleshooting and reconfiguring the whole system several times,
it seems the issue is in the master node configuration. However, I still
haven't figured out the root cause or the steps to reproduce it. It does not
reproduce every time.

In the normal case, after the master node is configured, SYNC events are
generated consistently. However, in the error case, it looks like the master
node's slon has hung and no new SYNC events are generated.
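
To tell the two cases apart, a check along these lines could help. It is
only a minimal sketch: it assumes the rows come from something like
"select ev_type, ev_timestamp from sl_event where ev_origin = <master id>"
run in the cluster schema, and the helper name sync_stalled and the
5-minute threshold are hypothetical choices, not part of Slony.

```python
from datetime import datetime, timedelta


def sync_stalled(events, now, max_gap=timedelta(minutes=5)):
    """Return True if no SYNC event was seen within max_gap of `now`.

    `events` is an iterable of (ev_type, ev_timestamp) tuples, assumed to
    be rows read from sl_event for the master (origin) node.
    `max_gap` is an arbitrary illustrative threshold.
    """
    # Keep only the timestamps of SYNC events; other event types
    # (for example STORE_PATH) do not count as evidence of SYNC generation.
    sync_times = [ts for ev_type, ts in events if ev_type == "SYNC"]
    if not sync_times:
        # No SYNC events at all is treated as stalled.
        return True
    return now - max(sync_times) > max_gap
```

Running this periodically against the master's sl_event rows would show
whether SYNC generation has stopped, without having to watch the slon log.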

Do you have any more insight into potential issues that might occur
during master node configuration?

Thanks,
Jason


On Fri, Oct 29, 2010 at 11:05 AM, Jason Chen <yunfeng82 at gmail.com> wrote:

> Hi Steve,
>
> Thanks for your response. During those 7 hours I actually did nothing
> but leave it running. sl_path is updated correctly after that time.
> Another workaround is to restart the master node's slon service, which
> also updates pa_conninfo in sl_path to the correct value.
>
> I have turned on the highest log level for both slon and PostgreSQL but
> didn't find any more useful information. In the normal case, pa_conninfo in
> sl_path is <event pending> only for a short time and is then updated
> correctly after the master generates the STORE_PATH event. However, in the
> error case, it looks like the master does not generate the STORE_PATH event.
>
> What could cause the master node to hang and fail to generate the
> STORE_PATH event? Is there any more debug information I can check in gdb for
> the slon process?
>
> Thanks,
> Jason
>
>
>
> On Fri, Oct 29, 2010 at 2:05 AM, Steve Singer <ssinger at ca.afilias.info> wrote:
>
>> My question is: what were the slons doing during those 7 hours?
>>
>> If you configure your slon processes to log at the debug level they should
>> print a fair amount of stuff.
>>
>> You get the <event pending> entries in sl_path when you subscribe the set
>> before processing the STORE_PATH message on the other node.
>>
>> What you may want to do is move define_replication_set to come after
>> you've started up the slons.
>>
>> Having said that, even if you do things in the order you described, things
>> should still have worked, and I don't see why it took 7 hours to update
>> sl_path.  What were the slons doing during those 7 hours?
>>
>>
>