JP Fletcher jpfletch at ca.afilias.info
Wed Sep 17 09:44:26 PDT 2008
Jan Wieck wrote:
> On 9/16/2008 3:21 PM, JP Fletcher wrote:
>> Hi,
>>
>> I've just compiled v 1.2.15 with pg 8.1.13, and found that the 
>> testcase 'testmultiplemoves' fails about 50% of the time.
>
> Such test should not have a timeout on wait for event. Unfortunately, 
> the wait for event has a default timeout, so one has to explicitly 
> specify timeout = 0.
Probably there shouldn't be a timeout, but i'm a bit confused as to why 
an '<IDLE IN TRANSACTION>' connection appears on the third node.  I 
reproduced the case, and found that the offending connection did the 
following, prior to becoming '<IDLE IN TRANSACTION>':

2008-09-17 16:11:52.804 UTC [672172] [unknown] [unknown] LOG:  
connection received: host=127.0.0.1 port=60694
2008-09-17 16:11:52.805 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
connection authorized: user=pgdba database=slonyregress3
2008-09-17 16:11:52.815 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
statement: set datestyle to 'ISO'
2008-09-17 16:11:52.818 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
statement: SELECT version();
2008-09-17 16:11:52.819 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
statement: set escape_string_warning to 'off'
2008-09-17 16:11:52.820 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
statement: select "_slony_regress1".setSessionRole('_slony_regress1', 
'slon');
2008-09-17 16:11:52.831 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
statement: begin transaction; set transaction isolation level 
serializable; lock table "_slony_regress1".sl_config_lock;
2008-09-17 16:11:52.832 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG:  
statement: update "_slony_regress1".sl_archive_counter     set ac_num = 
ac_num + 1,         ac_timestamp = CURRENT_TIMESTAMP; select ac_num, 
ac_timestamp from "_slony_regress1".sl_archive_counter;

I've seen the following messages in slon logs, seemingly related:

slon_log.3:2008-09-17 16:11:52 UTC WARN   serialization problem updating 
sl_archive_counter: restarting slon

>
>
> Jan
>
>>
>> The subscription of node2 to set 2 eventually times out when it 
>> doesn't get a confirmation from node 3.
>>
>> ./run_test.sh: ERROR: Slonik error see 
>> /tmp/slony-regress.xbeqaimyk/slonik.log for details
>> pgdba at ydb1.int.libertyrms.com:/opt/rg/data_dba/src/slony1-1.2.15/tests 
>> $ cat /tmp/slony-regress.xbeqaimyk/slonik.log
>> <stdin>:25: timeout exceeded while waiting for event confirmation
>>
>> slonyregress3=# SELECT * from _slony_regress1.sl_event where ev_type 
>> <> 'SYNC' order by ev_seqno desc limit 1;
>>  ev_origin | ev_seqno |       ev_timestamp        | ev_minxid | 
>> ev_maxxid | ev_xip |       ev_type       | ev_data1 | ev_data2 | 
>> ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
>> -----------+----------+---------------------------+-----------+-----------+--------+---------------------+----------+----------+----------+----------+----------+----------+----------+---------- 
>>
>>          1 |       26 | 2008-09-16 19:02:24.31597 | 97730     | 
>> 97731     |        | ENABLE_SUBSCRIPTION | 2        | 1        | 
>> 2        | t        |          |          |          |
>> (1 row)
>>
>>
>> This is caused by the presence of an '<IDLE IN TRANSACTION>' 
>> connection on node 3 that is holding an AccessExclusiveLock on 
>> sl_config_lock:
>>
>> slonyregress3=# SELECT * from pg_locks where not granted;
>>  locktype | database | relation | page | tuple | transactionid | 
>> classid | objid | objsubid | transaction |  pid   |        
>> mode         | granted
>> ----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+--------+---------------------+--------- 
>>
>>  relation |    29184 |    30112 |      |       |               
>> |         |       |          |       98030 | 979420 | 
>> AccessExclusiveLock | f
>> (1 row)
>>
>> slonyregress3=# SELECT relname from pg_class where oid = 30112;
>>     relname    ----------------
>>  sl_config_lock
>> (1 row)
>>
>> slonyregress3=# SELECT * from pg_stat_activity where procpid = 
>> (SELECT pid from pg_locks where not granted);
>>  datid |    datname    | procpid | usesysid | usename 
>> |                                                 
>> current_query                                                  
>> |          query_start          |         backend_start         | 
>> client_addr | client_port
>> -------+---------------+---------+----------+---------+----------------------------------------------------------------------------------------------------------------+-------------------------------+-------------------------------+-------------+------------- 
>>
>>  29184 | slonyregress3 |  979420 |       10 | pgdba   | begin 
>> transaction; set transaction isolation level serializable; lock table 
>> "_slony_regress1".sl_config_lock;  | 2008-09-16 19:02:37.567212+00 | 
>> 2008-09-16 19:01:23.220162+00 | 127.0.0.1   |       57649
>> (1 row)
>>
>> slonyregress3=# SELECT * from pg_locks where relation = 30112 and 
>> granted; locktype | database | relation | page | tuple | 
>> transactionid | classid | objid | objsubid | transaction |  pid   
>> |        mode         | granted
>> ----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+--------+---------------------+--------- 
>>
>>  relation |    29184 |    30112 |      |       |               
>> |         |       |          |       98029 | 950632 | 
>> AccessExclusiveLock | t
>> (1 row)
>>
>> slonyregress3=# SELECT * from pg_stat_activity where procpid = 950632;
>>  datid |    datname    | procpid | usesysid | usename |     
>> current_query     |          query_start          |         
>> backend_start         | client_addr | client_port
>> -------+---------------+---------+----------+---------+-----------------------+-------------------------------+-------------------------------+-------------+------------- 
>>
>>  29184 | slonyregress3 |  950632 |       10 | pgdba   | <IDLE> in 
>> transaction | 2008-09-16 19:02:37.568543+00 | 2008-09-16 
>> 19:01:23.201131+00 | 127.0.0.1   |       57644
>> (1 row)
>>
>> Killing the '<IDLE IN TRANSACTION>' connection makes the problem go 
>> away.  What's going on here?
>>
>>
>
>


-- 
JP Fletcher
Database Administrator
Afilias Canada
voice: 416.646.3304 ext. 4123
fax: 416.646.3305
mobile: 416.561.4763
jpfletch at ca.afilias.info




More information about the Slony1-general mailing list