Mike Wilson mfwilson at gmail.com
Fri Nov 11 12:46:55 PST 2011
General lag on the slave node (as recorded in sl_status) is less than 30 seconds.  This is a heavily transacted system running on very capable hardware, so perhaps any problems are being masked by that.

I've read up on the issue and we don't appear to be experiencing any of the bugs related to it that I can find in the newsgroups.  No long-running transactions, no old nodes in the sl_ tables.  In general the system appears healthy: idle CPU time ~95%, good buffer cache hit ratios, etc.
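
For anyone following along, these are the sorts of checks I mean.  A rough sketch, assuming the cluster schema is named "_mycluster" (substitute your own cluster name) and a 9.0/9.1-era pg_stat_activity:

-- replication lag per node, as recorded by Slony
select st_origin, st_received, st_lag_num_events, st_lag_time
  from "_mycluster".sl_status;

-- nodes still registered in the cluster (anything stale shows up here)
select no_id, no_active, no_comment
  from "_mycluster".sl_node;

-- long-running transactions that could hold up a log switch
select procpid, now() - xact_start as xact_age, current_query
  from pg_stat_activity
 where xact_start is not null
 order by xact_start
 limit 10;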

Thanks for the replies, though.  I'll look into implementing 2.1, although we just did the upgrade to 2.0.7 and I'm not sure management will go for another downtime during the holiday season.  I'm just doing my due diligence: our load will rise steadily through the holidays to a very large load on these servers, and I wanted to make sure they looked solid before we throw 30x the current load at them.

Mike Wilson
Predicate Logic
Cell: (310) 600-8777
SkypeID: lycovian




On Nov 11, 2011, at 11:09 AM, Steve Singer wrote:

> On 11-11-11 02:04 PM, Mike Wilson wrote:
>> 
>> Mike Wilson
>> Predicate Logic
>> Cell: (310) 600-8777
>> SkypeID: lycovian
>> 
>> 
>> From my postgresql.log:
>> 2011-11-11 11:03:15.237 PST db1.lax.jib(55096):LOG:  duration: 133.011 ms  statement: fetch 500 from LOG;
>> 2011-11-11 11:03:17.241 PST db1.lax.jib(55096):LOG:  duration: 134.842 ms  statement: fetch 500 from LOG;
>> 2011-11-11 11:03:19.239 PST db1.lax.jib(55096):LOG:  duration: 133.919 ms  statement: fetch 500 from LOG;
>> 2011-11-11 11:03:21.240 PST db1.lax.jib(55096):LOG:  duration: 133.194 ms  statement: fetch 500 from LOG;
>> 2011-11-11 11:03:23.241 PST db1.lax.jib(55096):LOG:  duration: 134.288 ms  statement: fetch 500 from LOG;
>> 2011-11-11 11:03:25.241 PST db1.lax.jib(55096):LOG:  duration: 133.226 ms  statement: fetch 500 from LOG;
>> 
>> I'm only logging statements that take longer than 100ms to run.
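>> 
>> For reference, that threshold is the usual log_min_duration_statement setting in postgresql.conf:
>> 
>> # log the duration of any statement that runs for 100 ms or longer
>> log_min_duration_statement = 100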
>> 
>> Here is my output from sl_log1/2:
>> select (select count(*) from sl_log_1) sl_log_1, (select count(*) from sl_log_2) sl_log_2;
>>  sl_log_1 | sl_log_2
>> ----------+----------
>>    119239 |    43685
> 
> The fetch is taking a long time because sl_log_1 is big.  (The reason it takes so long is actually a bug that was fixed in 2.1.)  sl_log_1 being that big probably means that log switching isn't happening.
> 
> Do you have any nodes that are behind?  (query sl_status on all your nodes)
> Do you have any old nodes that are still listed in sl_node that you aren't using anymore?
> Do (did) you have a long running transaction in your system that is preventing the log switch from taking place?
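> 
> You can also check the log switch state directly and, if need be, kick one off by hand.  A rough sketch, assuming your cluster schema is "_mycluster" (substitute your cluster name):
> 
> -- which log table is currently active / whether a switch is in progress
> select last_value from "_mycluster".sl_log_status;
> 
> -- request a log switch if one is not already underway
> select "_mycluster".logswitch_start();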
> 
> 
> 
> 
> 
>> 
>> 
>> On Nov 11, 2011, at 5:07 AM, Steve Singer wrote:
>> 
>>> On 11-11-09 01:19 PM, Mike Wilson wrote:
>>>> Seeing "fetch 500 from LOG" almost continuously in my PG logs for a new Slony 2.0.7 install.  The previous version (2.0.3?) didn't show these messages in the PG log.  Researching the issue, historically, this message was usually accompanied by a performance issue.  This isn't the case with my databases though, they appear to be running just as well as ever and the lag between replicated nodes appears to be about the same as the previous version.
>>>> 
>>>> I guess my question is what does this message mean in this version of Slony?  Is it an indication of sub-optimal slon parameters?
>>>> slon -g 20 $SLON_CLUSTER "host=$HOSTNAME port=$PORT dbname=$DB user=$USER"
>>>> 
>>>> And how can I get rid of it if it's not an issue?
>>>> 
>>>> Mike
>>> 
>>> What is causing the 'fetch 500' statements to show up in the server log?  Are you only logging SQL that takes longer than x milliseconds?  If so, how long are your 'fetch 500' statements taking?  How many rows are in your sl_log_1 and sl_log_2?
>>> 
>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Slony1-general mailing list
>>>> Slony1-general at lists.slony.info
>>>> http://lists.slony.info/mailman/listinfo/slony1-general
>>> 
>> 
> 


