Vick Khera vivek at khera.org
Sat Nov 6 22:53:53 PDT 2010
Following up on the discussion earlier this week regarding using
timestamp with timezone for slony's tables... I was just awoken to a
false alarm page from my nagios monitoring of the replication on my
servers.  The negative time lag confused nagios.  It will remain this
way until 2am, at which point it will become 1am again.


[d01]% slony-vkmlm/repstatus
-[ RECORD 1 ]-------------+---------------------------
st_origin                 | 4
st_received               | 1
st_last_event             | 15075665
st_last_event_ts          | 2010-11-07 01:06:22.61718
st_last_received          | 15075664
st_last_received_ts       | 2010-11-07 01:06:22.494841
st_last_received_event_ts | 2010-11-07 01:06:20.611323
st_lag_num_events         | 1
st_lag_time               | -00:59:56.265608


[d01]% uname -a
FreeBSD d01.m1e.net 8.1-RELEASE FreeBSD 8.1-RELEASE #7: Mon Aug 30
13:57:04 EDT 2010
steve at d01:/n/lorax1/usr8/obj.amd64/n/lorax1/usr8/src/sys/KCI64SMP
amd64
[d01]% slon -v
slon version 1.2.20
[d01]% psql --version
psql (PostgreSQL) 9.0.1

The origin is running Pg 8.3.11.

This is occurring on all my DB replicas (I have two other databases I
replicate between different boxes).  I restarted all the slons, and
even set the TZ=UTC but that made not one bit of difference.

It appears that the error is coming from the discrepancy of timezones
of now() and the ev_timestamp:


# select now()-ev_timestamp,now()-CAST(ev_timestamp as timestamptz) AS
x from _mailermailer.sl_event order by ev_timestamp desc limit 3;
     ?column?     |        x
------------------+------------------
 -00:59:56.741957 | -00:59:56.741957
 -00:59:54.736794 | -00:59:54.736794
 -00:59:52.731122 | -00:59:52.731122
(3 rows)


# select ev_timestamp,CAST(ev_timestamp as timestamptz) AS x from
_mailermailer.sl_event order by ev_timestamp desc limit 3;
        ev_timestamp        |               x
----------------------------+-------------------------------
 2010-11-07 01:40:37.72615  | 2010-11-07 01:40:37.72615-05
 2010-11-07 01:40:36.176595 | 2010-11-07 01:40:36.176595-05
 2010-11-07 01:40:34.171789 | 2010-11-07 01:40:34.171789-05
(3 rows)



# select now();
              now
-------------------------------
 2010-11-07 01:40:44.088584-04
(1 row)


Note how now() says it is in zone -04 but the cast of ev_timestamp to
timestamptz gives zone -05.

The above is run on my 9.0.1 replica server.

I suppose this could be a postgres bug, but then again, the timestamps
right now are ambiguous... are they in the pre-switch or post-switch
"1am" hour?  Postgres seems to decide that the 1am hour is always
after the switchover, so puts it in the standard timezone.

The only fix I see is to store the times with timezone in slony.


More information about the Slony1-general mailing list