Richard Yen dba at richyen.com
Mon Sep 29 11:35:22 PDT 2008
Hi,

I'm noticing that there's a very old row in sl_confirm on one of my  
nodes.  Wondering what could be the problem...

test_slony_state says the following (I can paste more if needed):
> Listen Path Analysis
> ===================================================
> No problems found with sl_listen
>
> --------------------------------------------------------------------------------
> Summary of event info
>  Origin  Min SYNC  Max SYNC Min SYNC Age Max SYNC Age
> ======================================================================
>       1   5269981   5272295     00:00:00     00:20:00    0
>       4   1172599   1172992     00:00:00     00:20:00    0
>       2    846926    847325     00:00:00     00:20:00    0
>       3   1697518   1700013     00:00:00     00:21:00    0
>
>
> ---------------------------------------------------------------------------------
> Summary of sl_confirm aging
>    Origin   Receiver   Min SYNC   Max SYNC  Age of latest SYNC  Age of eldest SYNC
> ======================================================================
>         1          2    5269992    5272295      00:00:00       00:20:00    0
>         1          3    5269984    5272295      00:00:00       00:20:00    0
>         1          4    5269981    5272293      00:00:00       00:20:00    0
>         2          1     846926     847325      00:00:00       00:20:00    0
>         2          3     846926     847325      00:00:00       00:20:00    0
>         2          4     846926     847325      00:00:00       00:20:00    0
>         3          1    3778920    3778920  44 days 14:13:00  44 days 14:13:00    1
>         3          2    1697665    1700013      00:00:00       00:20:00    0
>         3          4    1697518    1699955      00:00:00       00:21:00    0
>         4          1    1172599    1172992      00:00:00       00:20:00    0
>         4          2    1172599    1172992      00:00:00       00:20:00    0
>         4          3    1172599    1172992      00:00:00       00:20:00    0
>
>
> ------------------------------------------------------------------------------
>
> Listing of old open connections
>        Database             PID            User    Query Age                Query
> ======================================================================
>
> ISSUES FOUND:
>
>
> Node: 3 Confirmations not propagating from 3 to 1
> ================================================
> Confirmations not propagating quickly in sl_confirm -
>
> For origin node 3, receiver node 1, earliest propagated
> confirmation has age 44 days 14:13:00 > 00:30:00
>
> Are slons running for both nodes?
>
> Could listen paths be missing so that confirmations are not propagating?

I've looked at sl_log1, sl_log2, and sl_event, but nothing in them  
corresponds to the con_seqno of that row in the sl_confirm table:
> tii=# select * from _tii.sl_confirm order by con_timestamp;
>  con_origin | con_received | con_seqno |       con_timestamp
> ------------+--------------+-----------+----------------------------
>           3 |            1 |   3778920 | 2008-08-15 21:12:43.108286
>
> tii=# select * from _tii.sl_event where ev_seqno = 3778920;
>  ev_origin | ev_seqno | ev_timestamp | ev_minxid | ev_maxxid | ev_xip | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
> -----------+----------+--------------+-----------+-----------+--------+---------+----------+----------+----------+----------+----------+----------+----------+----------
> (0 rows)
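
For comparison, here is a rough sketch of how the same check could be made per origin/receiver pair rather than by sequence number alone. It uses only the sl_confirm and sl_event columns shown above and assumes the cluster schema is _tii; the column aliases are made up for readability and are not part of Slony.

```sql
-- Sketch only: assumes the Slony cluster schema is _tii, as in the queries above.
-- Newest confirmation per (origin, receiver) pair and how old it is.
SELECT con_origin,
       con_received,
       max(con_seqno)             AS last_confirmed_seqno,
       max(con_timestamp)         AS last_confirmed_at,
       now() - max(con_timestamp) AS confirmation_age
FROM _tii.sl_confirm
GROUP BY con_origin, con_received
ORDER BY confirmation_age DESC;

-- Range of events still held in sl_event for each origin, to compare
-- against last_confirmed_seqno from the first query.
SELECT ev_origin,
       min(ev_seqno) AS min_seqno,
       max(ev_seqno) AS max_seqno
FROM _tii.sl_event
GROUP BY ev_origin
ORDER BY ev_origin;
```

Lining up the first result against the second by origin would show whether a confirmed sequence number falls inside the range of events that origin still holds.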

I have also restarted the slon daemons on nodes 1 and 3, but that  
doesn't change anything.

Could this be the cause of erratic sync behavior?  My monitoring  
indicates that node 3 is regularly having lag issues, while the other  
two nodes (nodes 2 and 4) show no problems at all.

Any ideas where else to look?  What else can I do?

--Richard

