[Slony1-general] Re: slon won't start after EXECUTE QUERY

Mon Nov 15 03:16:33 PST 2004

>===== Original Message From Jan Wieck <JanWieck at Yahoo.com> =====
>On 11/13/2004 4:42 PM, David Pitkin wrote:
>> I guess my little problem was a bit vague, but I'm somewhat surprised no-one
>> thought of what now seems like the obvious reason that slon wouldn't start.
>> I was right in that node 2 died on the DDL SCRIPT. Then, every time I tried
>> to restart slon, it re-received the event and died again. I'm still quite
>> confused as to why the script fails on only this node (especially since the
>> exact same script succeeds when run from psql).
>
>On the same node? Hmmm ... it might have to do with different
>permissions or search paths. What is the exact error message you get in
>the postmaster log ... the one that causes the script to abort the
>transaction?

I thought of that. So I ran the script from psql, logging in as the exact same 
user that slony logs in as. The script still worked. (Admittedly, I had to add 
some sql commands before running the script, such as starting a transaction, 
locking tables I needed to modify, and dropping slony's 'denyaccess' triggers. 
But that wouldn't explain why the script succeeds in psql, but not in slon).

The error message I get is 'Column field_new does not exist'. This would seem 
to suggest that the command which creates the temporary column 'field_new' is 
failing, but that doesn't make any sense.
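
Since search paths were raised as a possible cause, one way to check is to run
the following in a session opened as the user slon connects as. This is only a
sketch: 'my_table' is a placeholder for the real table name, and the idea is to
see whether an unqualified table name could resolve to a different schema than
the one intended.

```sql
-- Effective search path for this session (connect as the user slon uses).
SHOW search_path;

-- Every schema holding a table of this name, and whether it has the old
-- column, the temporary column, or both. 'my_table' is a placeholder.
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name = 'my_table'
  AND column_name IN ('field', 'field_new')
ORDER BY table_schema, column_name;
```

If more than one schema shows up, schema-qualifying every table name in the
script would rule out the search path as the cause.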

>
>>
>> Anyways, I need to get this problem fixed, which means one of three things:
>> deleting the DDL SCRIPT event from node 1, adding a fake confirmation from
>> node 2, or changing the script itself (the one saved in sl_event) to
>> something that will definitely succeed.
>
>What about fixing the root of the problem? And without really knowing
>what causes the script to fail, the next question is a bit hard to answer.
>
>
>Jan
>

I'd love to fix the root of the problem, but so far the cause is unclear. I'm 
about ready to blame PostgreSQL; this isn't entirely far-fetched, because node 
2 is running 8.0beta3, whereas node 3 is 8.0beta2 and node 1 is 7.4.6. But if 
it IS PostgreSQL, then the only solution is to convince the master node to stop 
sending the DDL SCRIPT event. Is there a way to do that?

David Pitkin
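
Before changing anything in sl_event, it may help to see exactly which event
node 2 is stuck on. The query below is a read-only sketch run on node 1. It
assumes the cluster is named 'mycluster' (a placeholder, so the Slony-I schema
would be "_mycluster"), and the sl_event and sl_confirm column names are given
from memory of the Slony-I schema, so they should be checked against the
Schemadoc first.

```sql
-- DDL_SCRIPT events that originated on node 1 and that node 2 has not yet
-- confirmed. "_mycluster" is a placeholder for the real cluster schema.
SELECT e.ev_origin, e.ev_seqno, e.ev_timestamp, e.ev_type
FROM "_mycluster".sl_event e
WHERE e.ev_origin = 1
  AND e.ev_type = 'DDL_SCRIPT'
  AND NOT EXISTS (
        SELECT 1
        FROM "_mycluster".sl_confirm c
        WHERE c.con_origin   = e.ev_origin
          AND c.con_received = 2          -- the node that keeps dying
          AND c.con_seqno   >= e.ev_seqno
      )
ORDER BY e.ev_seqno;
```

This only reads the tables; it does not answer whether deleting the event or
faking a confirmation is safe.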

>>
>> Can anyone tell me which would be the best solution, and more importantly,
>> how to do it safely?
>>
>> David Pitkin
>>
>>
>> -------Original Message-------------------------
>> Hello,
>>
>> I'm brand new to Slony-I. Someone else set it up with a master node (node 1)
>> and two slaves (node 2 and node 3). I needed to change the schema, and have
>> successfully managed to break node 2 in the process (happily this is still
>> in the development stage). Here's what happened. Hopefully someone can tell
>> me what I did wrong:
>>
>> 1. First, I should mention that node 1 and node 2 are on the same machine
>> (Linux), with node 3 on a separate machine. I needed to change the data
>> type of a column, using sql like this:
>> ALTER TABLE table ADD COLUMN field_new;
>> UPDATE table SET field_new = field;
>> ALTER TABLE table DROP COLUMN field;
>> ALTER TABLE table RENAME COLUMN field_new TO field;
>>
>> 2. I ran this script using the EXECUTE QUERY command in slonik. It failed
>> initially, because I forgot that the schema containing the table I needed
>> to modify was not in the search path for the 'slony' user. It failed on
>> node 1, and appeared to be isolated there (i.e. the event did not get sent
>> to the other two nodes). I've checked the Schemadoc, and this seems to be
>> what happens. I also double checked the process list at that point, and
>> verified that two slon processes were still running (for nodes 1 and 2).
>>
>> 3. I fixed the script and ran it a second time. It succeeded on node 1, and
>> on node 3. But node 2 was unchanged, and further investigation showed that
>> the corresponding slon process was dead. I tried restarting it, and it
>> complained a few times about there being no remote worker thread for node 1,
>> and died with an empty error message.
>>
>> 4. I manually fixed the schema on node 2, and started slon again. Slon died
>> in the same way.
>>
>> I checked the Slony-I tables, and it appears that node 2 confirmed the SYNC
>> event sent by node 1 just before the DDL_SCRIPT event (the timestamps of
>> both events match). This suggests that the script killed node 2, and a quick
>> glance at the remote worker thread source code suggests that if a script
>> were to fail, the thread would immediately die. But I can't figure out why
>> the slon process refuses to restart.
>>
>> Does anyone have any thoughts?
>>
>> David Pitkin
>>
>>
>
>
>--
>#======================================================================#
># It's easier to get forgiveness for being wrong than for being right. #
># Let's break this rule - forgive me.                                  #
>#================================================== JanWieck at Yahoo.com #