Wed Sep 17 09:44:26 PDT 2008
- Previous message: [Slony1-general] testmultiplemoves testcase fails
- Next message: [Slony1-general] Replication works for a short while on windows XP, then breaks on restart
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jan Wieck wrote: > On 9/16/2008 3:21 PM, JP Fletcher wrote: >> Hi, >> >> I've just compiled v 1.2.15 with pg 8.1.13, and found that the >> testcase 'testmultiplemoves' fails about 50% of the time. > > Such test should not have a timeout on wait for event. Unfortunately, > the wait for event has a default timeout, so one has to explicitly > specify timeout = 0. Probably there shouldn't be a timeout, but i'm a bit confused as to why an '<IDLE IN TRANSACTION>' connection appears on the third node. I reproduced the case, and found that the offending connection did the following, prior to becoming '<IDLE IN TRANSACTION>': 2008-09-17 16:11:52.804 UTC [672172] [unknown] [unknown] LOG: connection received: host=127.0.0.1 port=60694 2008-09-17 16:11:52.805 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: connection authorized: user=pgdba database=slonyregress3 2008-09-17 16:11:52.815 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: statement: set datestyle to 'ISO' 2008-09-17 16:11:52.818 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: statement: SELECT version(); 2008-09-17 16:11:52.819 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: statement: set escape_string_warning to 'off' 2008-09-17 16:11:52.820 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: statement: select "_slony_regress1".setSessionRole('_slony_regress1', 'slon'); 2008-09-17 16:11:52.831 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: statement: begin transaction; set transaction isolation level serializable; lock table "_slony_regress1".sl_config_lock; 2008-09-17 16:11:52.832 UTC [672172] slonyregress3 pgdba 127.0.0.1LOG: statement: update "_slony_regress1".sl_archive_counter set ac_num = ac_num + 1, ac_timestamp = CURRENT_TIMESTAMP; select ac_num, ac_timestamp from "_slony_regress1".sl_archive_counter; I've seen the following messages in slon logs, seemingly related: slon_log.3:2008-09-17 16:11:52 UTC WARN serialization problem updating sl_archive_counter: restarting slon > > > Jan > >> >> The subscription of node2 to set 2 eventually times out when it >> doesn't get a confirmation from node 3. >> >> ./run_test.sh: ERROR: Slonik error see >> /tmp/slony-regress.xbeqaimyk/slonik.log for details >> pgdba at ydb1.int.libertyrms.com:/opt/rg/data_dba/src/slony1-1.2.15/tests >> $ cat /tmp/slony-regress.xbeqaimyk/slonik.log >> <stdin>:25: timeout exceeded while waiting for event confirmation >> >> slonyregress3=# SELECT * from _slony_regress1.sl_event where ev_type >> <> 'SYNC' order by ev_seqno desc limit 1; >> ev_origin | ev_seqno | ev_timestamp | ev_minxid | >> ev_maxxid | ev_xip | ev_type | ev_data1 | ev_data2 | >> ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8 >> -----------+----------+---------------------------+-----------+-----------+--------+---------------------+----------+----------+----------+----------+----------+----------+----------+---------- >> >> 1 | 26 | 2008-09-16 19:02:24.31597 | 97730 | >> 97731 | | ENABLE_SUBSCRIPTION | 2 | 1 | >> 2 | t | | | | >> (1 row) >> >> >> This is caused by the presence of an '<IDLE IN TRANSACTION>' >> connection on node 3 that is holding an AccessExclusiveLock on >> sl_config_lock: >> >> slonyregress3=# SELECT * from pg_locks where not granted; >> locktype | database | relation | page | tuple | transactionid | >> classid | objid | objsubid | transaction | pid | >> mode | granted >> ----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+--------+---------------------+--------- >> >> relation | 29184 | 30112 | | | >> | | | | 98030 | 979420 | >> AccessExclusiveLock | f >> (1 row) >> >> slonyregress3=# SELECT relname from pg_class where oid = 30112; >> relname ---------------- >> sl_config_lock >> (1 row) >> >> slonyregress3=# SELECT * from pg_stat_activity where procpid = >> (SELECT pid from pg_locks where not granted); >> datid | datname | procpid | usesysid | usename >> | >> current_query >> | query_start | backend_start | >> client_addr | client_port >> -------+---------------+---------+----------+---------+----------------------------------------------------------------------------------------------------------------+-------------------------------+-------------------------------+-------------+------------- >> >> 29184 | slonyregress3 | 979420 | 10 | pgdba | begin >> transaction; set transaction isolation level serializable; lock table >> "_slony_regress1".sl_config_lock; | 2008-09-16 19:02:37.567212+00 | >> 2008-09-16 19:01:23.220162+00 | 127.0.0.1 | 57649 >> (1 row) >> >> slonyregress3=# SELECT * from pg_locks where relation = 30112 and >> granted; locktype | database | relation | page | tuple | >> transactionid | classid | objid | objsubid | transaction | pid >> | mode | granted >> ----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+--------+---------------------+--------- >> >> relation | 29184 | 30112 | | | >> | | | | 98029 | 950632 | >> AccessExclusiveLock | t >> (1 row) >> >> slonyregress3=# SELECT * from pg_stat_activity where procpid = 950632; >> datid | datname | procpid | usesysid | usename | >> current_query | query_start | >> backend_start | client_addr | client_port >> -------+---------------+---------+----------+---------+-----------------------+-------------------------------+-------------------------------+-------------+------------- >> >> 29184 | slonyregress3 | 950632 | 10 | pgdba | <IDLE> in >> transaction | 2008-09-16 19:02:37.568543+00 | 2008-09-16 >> 19:01:23.201131+00 | 127.0.0.1 | 57644 >> (1 row) >> >> Killing the '<IDLE IN TRANSACTION>' connection makes the problem go >> away. What's going on here? >> >> > > -- JP Fletcher Database Administrator Afilias Canada voice: 416.646.3304 ext. 4123 fax: 416.646.3305 mobile: 416.561.4763 jpfletch at ca.afilias.info
- Previous message: [Slony1-general] testmultiplemoves testcase fails
- Next message: [Slony1-general] Replication works for a short while on windows XP, then breaks on restart
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list