Jan Wieck JanWieck at Yahoo.com
Sat Sep 29 04:30:34 PDT 2007
On 9/28/2007 5:30 PM, Christopher Browne wrote:
> Jeff Frost <jeff at frostconsultingllc.com> writes:
>> I think the deadlocks aren't load related but speed related.  That is,
>> if the acquiring of all the locks by the execute script takes longer
>> on a slower machine, the window of opportunity for one of these
>> selects to cause a deadlock seems greater, no?  They do seem to happen
>> on the slower machine more regularly than the faster one.
> 
> The problem had nothing to do with deadlocks, per se, but rather with
> the fact that a refactoring of the code *broke* things by taking out a
> leading "begin;" statement.
> 
> It should present no *fundamental* problem if the node hits a
> deadlock; if the deadlock affects the "EXECUTE SCRIPT" event, then the
> worst that should happen is that the work gets rolled back, and ten
> seconds later, the node retries, hopefully with greater success.
> 
> The fix for this has been committed to the 1.2 branch (never was a
> problem in 2.0), so that we should have this addressed RSN.

Just as a side remark, a deadlock is caused (in a simple case) by two 
sessions trying to lock two objects in opposite order (one can create 
more complex cases, but for the sake of this discussion the 2x2 case is 
sufficient).
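
To make the 2x2 case concrete, here is a minimal sketch in Python. It is not Slony code: the table names and the function can_deadlock are made up for illustration. It assumes every lock conflicts with every other and is held until the end of the transaction. It only checks whether two sessions take some pair of objects in opposite order.

```python
from itertools import combinations


def can_deadlock(order_a, order_b):
    """Return True if sessions A and B lock some pair of objects in opposite order.

    Assumption (illustrative only): all locks conflict with each other and
    are held until the end of the transaction.
    """
    pos_a = {obj: i for i, obj in enumerate(order_a)}
    pos_b = {obj: i for i, obj in enumerate(order_b)}
    # Objects both sessions touch, listed in session A's acquisition order.
    shared = [obj for obj in order_a if obj in pos_b]
    for x, y in combinations(shared, 2):
        # A takes x before y; a deadlock is possible if B takes y before x.
        if pos_b[x] > pos_b[y]:
            return True
    return False


# Hypothetical table names, not taken from any real schema.
ddl_script = ["orders", "customers"]
client_txn = ["customers", "orders"]

print(can_deadlock(ddl_script, client_txn))               # opposite order
print(can_deadlock(ddl_script, ["orders", "customers"]))  # same order
```

The first call reports a possible deadlock, the second does not. Whether the deadlock actually occurs still depends on timing, which is why it may show up more often on a slower machine.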

It doesn't matter at all that the client transaction is only doing 
SELECT, since the lock the DDL script requires is an access exclusive 
one, and that conflicts with even the access share lock a plain SELECT 
takes.
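
The conflict can be written down as a small lookup table. This is a sketch in Python covering only four of PostgreSQL's table-level lock modes, so treat it as an excerpt of the conflict matrix and not the full thing. The helper name conflicts is made up for illustration.

```python
# Excerpt of PostgreSQL's table-level lock conflict matrix (four of the
# eight modes only). Each key maps to the modes it conflicts with,
# restricted to this subset.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE": {
        "ACCESS SHARE",
        "ROW EXCLUSIVE",
        "SHARE",
        "ACCESS EXCLUSIVE",
    },
}


def conflicts(held, requested):
    """Return True if a lock in mode `requested` must wait for one in mode `held`."""
    return requested in CONFLICTS[held]


# A plain SELECT takes ACCESS SHARE; the DDL script needs ACCESS EXCLUSIVE.
print(conflicts("ACCESS SHARE", "ACCESS EXCLUSIVE"))
# Two plain SELECTs do not block each other.
print(conflicts("ACCESS SHARE", "ACCESS SHARE"))
```

So a read-only transaction is enough to block the DDL script, and to be blocked by it.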

I thought that we would at least guarantee that Slony locks the tables 
in the order of their ID in sl_table, but a quick glance over the code 
didn't confirm that. So it is indeed unpredictable whether and when the 
locking order will be compatible with the access pattern of the 
concurrent application.
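
For illustration only, here is what a fixed locking order buys. This Python sketch uses in-process threading locks as stand-ins for table locks. The table IDs, session names and functions are hypothetical, and nothing here reflects how Slony actually takes its locks. The assumption is that every session sorts the IDs it needs before acquiring anything, which is exactly the guarantee the code does not appear to give.

```python
import threading


def run_session(locks_by_id, wanted_ids, done, name):
    """Acquire the wanted locks in ascending ID order, record completion, release."""
    ordered = sorted(wanted_ids)  # the assumed global locking order
    for table_id in ordered:
        locks_by_id[table_id].acquire()
    try:
        done.append(name)
    finally:
        for table_id in reversed(ordered):
            locks_by_id[table_id].release()


def simulate():
    """Run two sessions that ask for overlapping tables in different orders."""
    # Hypothetical table IDs standing in for entries in sl_table.
    locks_by_id = {1: threading.Lock(), 2: threading.Lock(), 3: threading.Lock()}
    done = []
    sessions = [
        threading.Thread(target=run_session, args=(locks_by_id, [3, 1, 2], done, "ddl")),
        threading.Thread(target=run_session, args=(locks_by_id, [2, 3], done, "client")),
    ]
    for t in sessions:
        t.start()
    for t in sessions:
        t.join(timeout=5)
    return sorted(done)


print(simulate())
```

Both sessions finish because neither can hold a higher ID while waiting for a lower one. That only helps if the concurrent application follows the same order, which is nothing Slony can enforce.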

This locking order problem will be history in version 2.0, since it does 
not do an alterTableForReplication() and alterTableRestore() any more 
but uses the new session_replication_role feature in PostgreSQL 8.3. So 
in 2.0, the only tables that are locked are the ones actually touched by 
the DDL script itself, in exactly the order they are used in there.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

