Steve Singer steve at ssinger.info
Thu Feb 22 15:02:30 PST 2018
On Thu, 22 Feb 2018, Tignor, Tom wrote:

Looks like?
http://lists.slony.info/pipermail/slony1-general/2016-September/013331.html

I can't remember if that was what prompted 
http://lists.slony.info/pipermail/slony1-hackers/2016-December/000560.html


https://github.com/ssinger/slony1-engine/tree/bug375

I can't seem to find a reason why this wasn't committed.


> 
>  
> 
>                 Hello slony1 community,
> 
>                 We have a head scratcher here. It appears a DROP NODE command was not fully processed. The
> command was issued and confirmed on all our nodes at approximately 2018-02-21 19:19:50 UTC. When we went to
> restore it over two hours later, all replication stopped on an sl_event constraint violation. Investigation
> showed a SYNC event for the dropped node with a timestamp of just a few seconds before the drop. I believe this
> is a first for us. The DROP NODE command is supposed to remove all state for the dropped node. Is that right? Is
> there a potential race condition somewhere which could leave behind state?
> 
>                 Thanks in advance,
> 
>  
> 
> ---- master log replication freeze error ----
> 
> 2018-02-21 21:38:52 UTC [5775] ERROR  remoteWorkerThread_8: "insert into "_ams_cluster".sl_event     (ev_origin,
> ev_seqno, ev_timestamp,      ev_snapshot, ev\
> 
> _type     ) values ('8', '5002075962', '2018-02-21 19:19:41.958719+00', '87044110:87044110:', 'SYNC'); insert
> into "_ams_cluster".sl_confirm       (con_origi\
> 
> n, con_received, con_seqno, con_timestamp)    values (8, 1, '5002075962', now()); select
> "_ams_cluster".logApplySaveStats('_ams_cluster', 8, '0.139 s'::inter\
> 
> val); commit transaction;" PGRES_FATAL_ERROR ERROR:  duplicate key value violates unique constraint
> "sl_event-pkey"
> 
> DETAIL:  Key (ev_origin, ev_seqno)=(8, 5002075962) already exists.
> 
> 2018-02-21 21:38:52 UTC [13649] CONFIG slon: child terminated signal: 9; pid: 5775, current worker pid: 5775
> 
> 2018-02-21 21:38:52 UTC [13649] CONFIG slon: restart of worker in 10 seconds
> 
> ---- master log replication freeze error ----
> 
>  
> 
>  
> 
> ---- master DB leftover event ----
> 
> ams at ams6.cmb.netmgmt:~$ psql -U akamai -d ams
> 
> psql (9.1.24)
> 
> Type "help" for help.
> 
>  
> 
> ams=# select * from sl_event_bak;
> 
>  ev_origin |  ev_seqno  |         ev_timestamp          |    ev_snapshot     | ev_type | ev_data1 | ev_data2 |
> ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_
> 
> data7 | ev_data8 
> 
> -----------+------------+-------------------------------+--------------------+---------+----------+----------+-
> ---------+----------+----------+----------+----
> 
> ------+----------
> 
>          8 | 5002075962 | 2018-02-21 19:19:41.958719+00 | 87044110:87044110: | SYNC    |          |          | 
>         |          |          |          |    
> 
>       | 
> 
> (1 row)
> 
>  
> 
> ams=# 
> 
> ---- master DB leftover event ----
> 
>  
> 
> ---- master log drop node record ----
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG disableNode: no_id=8
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=4 li_receiver=1 li_provider=4
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=7 li_receiver=1 li_provider=7
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: update provider configuration
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 4 terminated
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: disconnecting from data provider 4
> 
> 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 7 terminated
> 
> ---- master log drop node record ----
> 
>  
> 
> ---- replica log drop node record ----
> 
> 2018-02-21 19:19:51 UTC [22650] WARN   remoteWorkerThread_1: got DROP NODE for local node ID
> 
> NOTICE:  Slony-I: Please drop schema "_ams_cluster"
> 
> 2018-02-21 19:19:53 UTC [22650] INFO   remoteWorkerThread_7: SYNC 5001868819 done in 2.153 seconds
> 
> NOTICE:  drop cascades to 243 other objects
> 
> DETAIL:  drop cascades to table _ams_cluster.sl_node
> 
> drop cascades to table _ams_cluster.sl_nodelock
> 
> drop cascades to table _ams_cluster.sl_set
> 
> drop cascades to table _ams_cluster.sl_setsync
> 
> drop cascades to table _ams_cluster.sl_table
> 
> drop cascades to table _ams_cluster.sl_sequence
> 
> ---- replica log drop node record ----
> 
>  
> 
>  
> 
>                 Tom    ☺
> 
>  
> 
>  
> 
> 
>


More information about the Slony1-general mailing list