Wed Feb 20 15:33:07 PST 2008
- Previous message: [Slony1-general] proper procedure for re-starting slony after replication slave reboots
- Next message: [Slony1-general] proper procedure for re-starting slony after replication slave reboots
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Geoffrey <lists at serioustechnology.com> writes:
> Andrew Sullivan wrote:
>> I am by no means willing to dismiss the suggestion that there are bugs in
>> Slony; but this still looks to me very much like there's something we don't
>> know about what happened, that explains the errors you're seeing.
>
> I would so love to figure out this issue.  I appreciate your efforts.
>
> I simply don't understand how one table in particular could get so far
> out of sync.  We're talking 300 records.
>
> I can't imagine that slony is that fragile.  There's got to be
> something going on that we don't see.

I agree.  From what I have heard, it doesn't sound like you have
experienced anything that should be scratching any of the edge points of
Slony-I.  300 records don't just disappear.

Putting this all together, I'm increasingly suspicious that you may have
experienced hardware problems or some such thing that caused data loss
Slony-I would have no way to address.

> I started the replication of this database last night.  Neither
> machine has been rebooted and neither postmaster was restarted.
>
> Is it possible I should be tweaking the configuration in some way?  I
> see a default value for SYNC_CHECK_INTERVAL.  Is 1000 a good value?

That makes slon try to do a SYNC each second, so the granularity of
possible data loss is, well, 1000 ms.  Reducing that to 100 ms would
lead to somewhat more aggressive replication, though it is not obvious
that the system would replicate much faster.  I don't see fiddling with
that as a particularly useful thing to do; it's "grasping at straws."

You've grown suspicious about *every* component, which is, on the one
hand, unsurprising, but on the other, not very useful.  I haven't heard
you mention anything that would cause me to expect Slony-I to have eaten
data, or even to have "started to look hungrily at the data."  The
notices you have mentioned are all benign.
The one question that comes to mind: any interesting ERROR messages in
the PostgreSQL logs?  I'm getting more and more suspicious that
something about the entire DB cluster has become unstable, and if that's
the case, Slony-I wouldn't do any better than the DB it is running on...
--
let name="cbbrowne" and tld="linuxdatabases.info" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/lisp.html
"On the other hand, O'Reilly's book about running W95 has a toad as the
cover animal.  Makes sense; both have lots of warts and croak all the
time."  --- Michael Kagalenko