[Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script

Mon Nov 12 13:18:33 PST 2007

On Mon, 12 Nov 2007, Jeff Frost wrote:

> On Fri, 28 Sep 2007, Christopher Browne wrote:
>
>> Jeff Frost <jeff at frostconsultingllc.com> writes:
>>> I think the deadlocks aren't load related but speed related.  That is,
>>> if the acquiring of all the locks by the execute script takes longer
>>> on a slower machine, the window of opportunity for one of these
>>> selects to cause a deadlock seems greater, no?  They do seem to happen
>>> on the slower machine more regularly than the faster one.
>> 
>> The problem had nothing to do with deadlocks, per se, but rather with
>> the fact that a refactoring of the code *broke* things by taking out a
>> leading "begin;" statement.
>> 
>> It should present no *fundamental* problem if the node hits a
>> deadlock; if the deadlock affects the "EXECUTE SCRIPT" event, then the
>> worst that should happen is that the work gets rolled back, and ten
>> seconds later, the node retries, hopefully with greater success.
>> 
>> The fix for this has been committed to the 1.2 branch (never was a
>> problem in 2.0), so that we should have this addressed RSN.
>
> Will this fix be in 1.2.12?
>
> Also, can someone explain to me why this doesn't fix it:
>
> select _cluster.altertableforreplication(tab_id) from
> _cluster.sl_table where tab_altered is false;
>
> When I run that, it seems to restore them all to their pre-Slony state.
>
> But if I run:
>
> select _cluster.altertableforreplication(1);
>
> it properly alters it for replication.

Oh, but after I run _cluster.altertableforreplication on the last one, they
all get reset to false and restored. *scratches head*

That's with slony1-1.2.10.
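
For reference, the retry behavior described in the quoted reply (a deadlocked
EXECUTE SCRIPT gets rolled back and the node tries again ten seconds later)
could be sketched roughly as follows. This is only an illustration in Python,
not slon's actual code: DeadlockError, run_with_retry, max_attempts and the
injectable sleep argument are all hypothetical names, and the only detail
taken from the thread is the ten-second delay.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the error a driver raises on deadlock."""


def run_with_retry(apply_event, retry_delay=10.0, max_attempts=5, sleep=time.sleep):
    """Call apply_event(); if it hits a deadlock, wait and try again.

    apply_event is assumed to wrap its work in its own transaction
    (BEGIN ... COMMIT, with ROLLBACK on error), so a failed attempt
    leaves nothing half-applied. That leading BEGIN is the piece the
    quoted reply says was lost in the refactoring.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return apply_event()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            sleep(retry_delay)  # ten seconds by default, per the thread
```

The max_attempts cap is an assumption added so the sketch terminates; the
thread does not say whether the real retry loop is bounded.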

-- 
Jeff Frost, Owner 	<jeff at frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954