Jan Wieck JanWieck
Wed Dec 21 13:44:47 PST 2005
On 12/19/2005 7:56 PM, Marc G. Fournier wrote:
> 'k, setting up monitoring, and the script is reporting 1 out of 3 nodes 
> out of sync:
> 
> ./check_slony_cluster.sh dns ams ams.hub.org
> ERROR - 2 of 3 nodes not in sync
> 
> no problem, figured out in the script how it is being determined, and:
> 
>   st_received |    cfmdelay
> -------------+-----------------
>             2 | 00:00:00.010721
>             3 | 03:59:55.2181
>             4 | 00:00:00.125318
> (3 rows)
> 
> wow ... 3 hours and 59 minutes where the other two (Node 4 is a remote 
> server, somewhere in the US, while node 3 is the server beside the master) 
> ...
> 
> Now, I've checked Node 3, and it contains the same # of records as Node 1 
> ..
> 
> Now, I just did an update on one record in the table, and checked all 3 
> slaves and they see the change, yet now I'm seeing:
> 
>   st_received |    cfmdelay
> -------------+-----------------
>             2 | 00:00:00.009916
>             3 | 03:59:55.175099
>             4 | 01:46:02.69134
> (3 rows)
> 
> Node 4 just shot up ...
> 
> Looking at sl_status:
> 
> # select * from "_dns".sl_status;
>   st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |   st_lag_time 
> -----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------
>           1 |           2 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-19 20:52:23.589583 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
>           1 |           3 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-20 00:52:18.736552 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
>           1 |           4 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-19 22:36:25.514229 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
> 
> So, what is st_last_received_ts, and why isn't Node 3 updating it?  I've 
> checked my slon_ams.out file on Node 3, and there are no errors being 
> generated that I can see ... and replication appears to be working fine on 
> all the Nodes ...
> 
> Somewhere else I need to be looking for this?

The timestamps st_last_event_ts (origin) and st_last_received_ts 
(subscriber) are taken on different servers, so any difference between 
them includes the offset between those servers' clocks. If you play 
around with the clocks, you will also find settings where sl_status 
reports them to be in the future.
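
To illustrate, here is a minimal Python sketch that subtracts the origin 
timestamp from each subscriber timestamp in the sl_status output quoted 
above. The helper names (apparent_delay, observed_delay) and the 
five-minute skew are made up for illustration, and this is not 
necessarily the exact expression the monitoring script uses for cfmdelay.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"


def apparent_delay(event_ts: str, received_ts: str) -> timedelta:
    """Difference between two timestamps taken on two different servers."""
    return datetime.strptime(received_ts, FMT) - datetime.strptime(event_ts, FMT)


# st_last_event_ts for event 837, taken on the origin (node 1).
EVENT_TS = "2005-12-19 20:52:23.576685"

# st_last_received_ts per subscriber, copied from the quoted sl_status rows.
RECEIVED_TS = {
    2: "2005-12-19 20:52:23.589583",
    3: "2005-12-20 00:52:18.736552",
    4: "2005-12-19 22:36:25.514229",
}

delays = {node: apparent_delay(EVENT_TS, ts) for node, ts in RECEIVED_TS.items()}

for node, delay in sorted(delays.items()):
    print(f"node {node}: apparent delay {delay}")


def observed_delay(true_delay: timedelta, subscriber_skew: timedelta) -> timedelta:
    """What the subtraction yields when the subscriber clock is off by subscriber_skew."""
    return true_delay + subscriber_skew


# Hypothetical case: a subscriber whose clock runs 5 minutes behind the origin
# confirms an event 10 ms after it was created. The computed delay is negative,
# i.e. the event appears to have been received before it happened.
skewed = observed_delay(timedelta(milliseconds=10), timedelta(minutes=-5))
print(f"skewed example: {skewed.total_seconds()} seconds")
```

A large apparent delay alongside st_lag_num_events = 0 is consistent with 
the subscriber's clock being offset from the origin's, not with the 
subscriber actually being behind.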


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

