Jan Wieck JanWieck at Yahoo.com
Sat Feb 23 10:46:58 PST 2008
On 2/23/2008 1:28 PM, Craig James wrote:
> Jan Wieck wrote:
>> On 2/23/2008 12:20 AM, Craig James wrote:
>>> A little more info on this problem...
>>>
>>> Craig James wrote:
>>>> I'm trying to migrate a node for the second time, and no luck.  Last 
>>>> time I tried it, it just got stuck, and due to lack of time, I didn't 
>>>> investigate.
>>>>
>>>> This time I watched -- it got stuck again, doing some sort of huge 
>>>> SELECT statement.  I was under the impression that migrating a node 
>>>> was a fairly simple operation that should happen in a short time 
>>>> (less than a minute?) even for large databases.
>>>>
>>>> I waited 10 minutes, during which the entire system was completely 
>>>> locked up (no other process could access the database), and our web 
>>>> site was offline.  I finally had to kill all of the slon daemons and 
>>>> kill Postgres to get our site back on the air, then run the 
>>>> node-unlock command to get Slony back in shape.
>>>>
>>>> This system appears to otherwise be working well.  I can insert, 
>>>> update and delete records, and they're copied to the slave node 
>>>> immediately.
>>>>
>>>> What's up?  Am I just too impatient?
>>>
>>> I tried it again, after vacuuming the slony tables that are subject to 
>>> bloat.  This time I shut everything off, started the migration of the 
>>> master to node 2, and waited for 35 minutes, but the SELECT never 
>>> finished.  vmstat showed massive I/O and CPU activity the whole time.
>> 
>> What SELECT are you referring to? I don't see where in the MOVE SET you 
>> have to perform any SELECT.
> 
> You tell me?  It is the slon daemon that is executing this select.  There were no other connections to the database the second time I tried this.
> 
> 
>>> Again, after I killed postgres, restarted, and unlocked the node, 
>>> Slony went back to performing perfectly.
>> 
>> Killing postgres is a bad idea. Stop that habit right now, before you 
>> physically corrupt any of your databases.
> 
> Thanks for the advice, but I don't think it's a problem.  That's one of the features of a robust relational database with a write-ahead log -- it can withstand being killed without corrupting data.  Besides, I had no choice, my web site went offline because slon apparently took an exclusive lock on the tables, blocking all other activity.  And I killed a SELECT, not an INSERT or UPDATE.
> 
> But that's a topic for a separate discussion ... I have to fix this Slony problem first.

I don't think there is anything to discuss about killing the postmaster.

If you can't do it with pg_ctl, then don't do it.


> 
>> Anyhow, apparently the LOCK SET part of the process succeeds. So what I 
>> now assume is that the WAIT FOR EVENT never finishes. First, you don't 
>> need a WAIT FOR EVENT between LOCK SET and MOVE SET. Both events are 
>> executed on the origin, so by the time the LOCK SET finishes, everything 
>> is ready for the MOVE.
> 
> I don't think it got as far as this, but I don't know the internals.  When I execute the script, the SELECT starts, and that's where everything comes to a sudden halt.

I do know a little bit about the internals.

The lockSet() stored procedure, called by slonik to perform the LOCK SET 
command, tries to add a trigger to all the tables in the set. That 
requires an access exclusive lock on those tables. If nothing else is 
connected to the database, then there is no reason why that would block.

So either your slonik script gets past that point, or your assumption 
that nothing is holding any kind of lock on any table in the set is wrong.
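
One way to test that assumption is to look at pg_locks while the slonik 
script appears stuck. What follows is only a rough sketch, not something 
taken from Slony itself. pg_locks and pg_class are standard PostgreSQL 
catalogs, but the function names, the table names you pass in, and the 
psycopg2-style DB-API cursor (with %s parameters and list-to-array 
adaptation) are all assumptions made for illustration.

```python
# Sketch only: list every lock held or awaited on the tables of a set.
# Assumes a psycopg2-style DB-API cursor; all names here are hypothetical.

LOCK_QUERY = """
SELECT c.relname, l.pid, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_class c ON c.oid = l.relation
 WHERE l.locktype = 'relation'
   AND c.relname = ANY(%s)
 ORDER BY c.relname, l.granted DESC
"""


def locks_on_tables(cursor, table_names):
    """Return one dict per lock entry found on the named tables."""
    cursor.execute(LOCK_QUERY, (list(table_names),))
    return [
        {"table": row[0], "pid": row[1], "mode": row[2], "granted": row[3]}
        for row in cursor.fetchall()
    ]


def blockers(locks, own_pid):
    """Granted locks held by other backends.

    An access exclusive lock conflicts with every other lock mode, so any
    entry returned here would keep the lock request from being granted.
    """
    return [l for l in locks if l["granted"] and l["pid"] != own_pid]
```

If blockers() comes back empty for the backend that is running lockSet(), 
then the lock is not what the script is waiting on.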

> 
>> But what this indicates is that node 2 never confirms the LOCK SET. Can 
>> it be that you actually have a problem with the connection from node 2 
>> to node 1? What is the content of the view sl_status on both nodes?
> 
> Both nodes seem to be normal -- the st_last_event_ts is just a few seconds prior to the query, st_lag_time is 00:00:11.465251 (node 1) and 00:00:07.50164.

That tells us at least that your sl_path settings are correct.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


