Jeff Frost jeff at pgexperts.com
Mon Aug 24 06:49:37 PDT 2009
Karl Denninger wrote:
> But they should have switched the master to Node #4 when the move set
> command was executed.  When they reconnect they should be doing so to
> Node #4, not Node #2 - IF they saw the "move set" command (and it
> appears they did.)
>
> Further, I ran the path change on that node - that is, locally
> to that machine.  No difference.
When you say you ran the store path on that node, can you be specific
about exactly what you did?


>>> I'm wondering what happened here.  It is almost as if the "move set"
>>> never executed on the other subscribers - an impossibility, no?  They
>>> WERE all replicating and current just before the shutdown - I checked
>>> them all.  How does that happen under these circumstances?
>>>
>>> Is there a better way for the future?  I'm back up now, but the entire
>>> point of this exercise was to AVOID having to copy the entire database
>>> over - while I avoided any material downtime for my users, I was left
>>> EXPOSED to a failure for the copy period, which was kinda nasty.
>>>
>>> Thoughts appreciated.
>>>

>>
>> Probably the way to avoid it would have been to issue the store path
>> changes before switching the ports.  But, if you forget to do it in the
>> future, you can fix it afterwards by going bare metal and updating the
>> paths in the _tickerform.sl_path table on the nodes that don't have the
>> correct information.
>>
> I still don't understand why the node change wasn't picked up by these
> slaves when the move set executed; I would have expected them to treat
> Node #4 as the master afterwards, and although it showed up on the
> "wrong" IP address, a store path should have fixed that.
>
> It APPEARS that it was looking for the old master on Node #2....
> implying (I think) that it never saw the move set.
>
> Or am I misunderstanding how the internals work here?
>

I don't think the problem is that it didn't see the move set; I think
the problem is that it never received the store path commands, because
it couldn't connect to the 'new' master after you changed the ports out
from under it.  Slony isn't really designed to have its paths changed
out from under it, so you'll likely have to fix them by hand when you
do this - something like the sketch below.
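
For the archives, here is a rough sketch of what fixing the paths by
hand might look like, run with psql directly against the broken
subscriber's database (the _tickerform schema name comes from earlier
in the thread; the node id, host, port, dbname and user below are just
placeholders):

  -- Inspect the stale paths on the broken subscriber first:
  SELECT pa_server, pa_client, pa_conninfo FROM _tickerform.sl_path;

  -- Point the path at the new master's actual host/port.  This has to
  -- be done locally, since the node can no longer reach the master to
  -- receive configuration events:
  UPDATE _tickerform.sl_path
     SET pa_conninfo = 'host=newhost port=5433 dbname=ticker user=slony'
   WHERE pa_server = 4;

Then restart the local slon daemon so it picks up the change.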

I'm pretty sure what happened (and hopefully someone will correct me if
I'm wrong) is this: even though you ran the slonik store path command
for the broken node, slonik connected to the new master, updated the
master's DB with the store path info, and put the event in the log to
propagate out to the slaves.  Unfortunately, because the broken slave
still had the old path in its sl_path table, it didn't know how to
connect to the new master and therefore never received the new path
information.
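
In other words, the ordering that would probably have avoided this is
to push the new paths out while every node can still reach every other
node, before touching the ports.  A rough slonik sketch of that, using
the same placeholder conninfo values as above:

  cluster name = tickerform;
  node 2 admin conninfo = 'host=oldhost port=5432 dbname=ticker user=slony';
  node 4 admin conninfo = 'host=newhost port=5432 dbname=ticker user=slony';

  # Tell node 2 (and, via replicated events, the other subscribers)
  # where node 4 will live after the port switch, BEFORE switching:
  store path (server = 4, client = 2,
      conninfo = 'host=newhost port=5433 dbname=ticker user=slony');

That way the new conninfo is already in every node's sl_path table by
the time the old address stops working.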


--

Jeff Frost <jeff at pgexperts.com>
COO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
http://www.pgexperts.com/

