Christopher Browne cbbrowne at ca.afilias.info
Tue Jan 15 07:35:04 PST 2008
Craig James <craig_james at emolecules.com> writes:
> Brad Nicholson wrote:
>>> Andrew Sullivan wrote:
>>>> On Sat, Jan 12, 2008 at 05:51:57PM -0800, Jeff Frost wrote:
>>>>> You might check the slon logs for any indication this is what
>>>>> caused your problem.  I thought slon did its work based entirely
>>>>> on xxid, but I could be wrong.  Perhaps one of the developers
>>>>> will comment.  In any event, I'm sure you know the clocks should
>>>>> be in sync as a best practice.
>>>> Not entirely based on xxid, no.  They can't be synced across nodes.  You
>>>> _must_ be NTP-synced between your nodes.  They should peer from one
>>>> another, and _then_ get their time from some other source.
>>> I want to be sure I understand this: Are you saying that a few milliseconds
>>>  of time skew between servers will make Slony fail?
>>>
>> No, it won't.  There is always going to be some minor skew, even when
>> synced with NTP.
>
> Then why is it required to run NTP between nodes, and only have one
> external connection to a time server?  Even using independent
> connections to a time service, the various Slony hosts keep their
> clocks very closely synchronized.  If a few milliseconds doesn't
> matter, then it seems to me that an ordinary ntpd(1) should be good
> enough.
>
> Is there an actual spec for "how much time skew is too much" for
> Slony?  Like, "All hosts must have their clocks synced within 100
> msec."  Or is it just a "good idea" that's not formalized?
>
> What are the consequences of having two servers with out-of-sync
> clocks?

The direct consequences for Slony-I are not terribly bad.

It uses local XID values to determine ordering of things, so it
oughtn't get confused.[1]

The consequences may be bad, *for your application.*  If your
application looks at timestamps on DB servers, then it will make wrong
decisions to the extent to which the times are wrong.
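
To make that concrete, here is a minimal sketch in Python of an
application-level check going wrong.  Nothing in it comes from
Slony-I; the function name, the 30-second threshold, and the 2-minute
skew are all made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical application rule: a row counts as "fresh" if it was
# written no more than max_age ago, judged by the reader's own clock.
def is_fresh(row_written_at, reader_now, max_age=timedelta(seconds=30)):
    return reader_now - row_written_at <= max_age

true_time = datetime(2008, 1, 15, 7, 35, 0)
skew = timedelta(minutes=2)   # assumed: the origin's clock runs 2 minutes slow

# The row is written right now, but stamped by the origin's slow clock.
written_at = true_time - skew

# A reader with a correct clock judges the brand-new row to be stale.
stale_by_skew = not is_fresh(written_at, true_time)

# With both clocks in agreement, the same row is judged fresh.
fresh_in_sync = is_fresh(true_time, true_time)

print(stale_by_skew, fresh_in_sync)
```

Replication itself is unaffected here; only the application's reading
of the timestamp is off, and by exactly the amount of the skew.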

And certainly you'll find things confusing if you are rummaging thru
logs, trying to figure out when something happened, only to discover
that timestamps steer you wrong.

Footnotes: 

[1] One possible exception: The cleanup thread won't clean up events
until they are confirmed everywhere, and at least 10 minutes old.  If
systems disagree on what "10 minutes old" means, you might purge them
earlier or later.  However, note that the events aren't purged until
*confirmed everywhere,* so replication will have taken place before
the purge...
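
The purge rule described in that footnote can be sketched roughly as
follows.  This is an illustration in Python, not the cleanup thread's
actual code: the function, its arguments, and the node numbers are
hypothetical, and it assumes the event's timestamp and the "now" used
for the age check come from clocks that may disagree.

```python
from datetime import datetime, timedelta

MIN_AGE = timedelta(minutes=10)   # the "at least 10 minutes old" rule

def can_purge(event_time, local_now, confirmed_by, all_nodes):
    """An event may go only if every node confirmed it AND it is old enough."""
    confirmed_everywhere = set(all_nodes) <= set(confirmed_by)
    old_enough = local_now - event_time >= MIN_AGE
    return confirmed_everywhere and old_enough

nodes = {1, 2, 3}
event_time = datetime(2008, 1, 15, 7, 0, 0)

# Accurate clock, event is 8 minutes old: kept for now.
on_time = can_purge(event_time, datetime(2008, 1, 15, 7, 8, 0), nodes, nodes)

# Clock running 5 minutes fast: the same event looks 13 minutes old
# and is purged early, but only because every node has confirmed it.
clock_fast = can_purge(event_time, datetime(2008, 1, 15, 7, 13, 0), nodes, nodes)

# Node 3 has not confirmed: never purged, however old the event looks.
unconfirmed = can_purge(event_time, datetime(2008, 1, 15, 7, 30, 0), {1, 2}, nodes)

print(on_time, clock_fast, unconfirmed)
```

The confirmation test does not involve any clock, which is why skew
can only shift the timing of a purge and not lose data.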
-- 
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://cbbrowne.com/info/sgml.html
"Any man whose errors take ten years to correct is quite a man."
-- J. Robert Oppenheimer, speaking of Albert Einstein 

