[Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script

Fri Sep 28 08:14:51 PDT 2007

Jeff Frost <jeff at frostconsultingllc.com> writes:
> I think the deadlocks aren't load related but speed related.  That is,
> if the acquiring of all the locks by the execute script takes longer
> on a slower machine, the window of opportunity for one of these
> selects to cause a deadlock seems greater, no?  They do seem to happen
> on the slower machine more regularly than the faster one.

Oh, dear.  I found the problem.

<http://lists.slony.info/pipermail/slony1-commit/2007-April/001677.html>

This patch (which I applied, so I'm the guilty one) drew the DDL
processing into a function (rather than it being inline, in the event
loop).

Unfortunately, that code did not apply the "begin; set transaction
isolation level serializable;" that is in the main loop, so the DDL
statements were no longer wrapped in a single transaction.  That
explains why you could get a "partial application" of updates.
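
As a rough illustration of the wrapper that went missing, here is a
minimal sketch in plain PostgreSQL SQL.  The table and column names
are hypothetical stand-ins for whatever the DDL script contains; this
is not the actual slon code.

```sql
-- Sketch only: ddl_demo and its columns are made-up names standing in
-- for the statements of a DDL script.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Every statement below belongs to the same transaction, so if any one
-- of them fails, the whole script rolls back together.
CREATE TABLE ddl_demo (id integer PRIMARY KEY);
ALTER TABLE ddl_demo ADD COLUMN note text;

COMMIT;
```

Without the leading BEGIN, each statement commits on its own, so a
failure partway through leaves the earlier statements applied.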

I'm going to shift the code *BACK* to the main loop, where it
belonged.

It's rather surprising to me that you could get into a deadlock at the
particular point that you did; in thinking that part through, I
realized that at the point in the code where it tries "restoring
replication", the slon connection must already hold exclusive locks on
ALL of the replicated tables.

If you could check as to what relation it was reporting it deadlocked
on (e.g. - "select * from pg_class where oid = [relation number that
was in the log];"), that would be somewhat interesting to know.  If it
was a Slony-I-created table, and you have reinitialized replication,
then you won't be able to find out :-(.
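
A slightly fuller version of that lookup, as a sketch: it joins to
pg_namespace so that the schema shows up as well, which makes it easy
to spot a table living in the cluster's own schema.  The OID 16384 is
a hypothetical placeholder; substitute the relation number from the
log.

```sql
-- Sketch only: 16384 is a placeholder for the relation number that
-- appears in the deadlock message.
SELECT c.oid,
       n.nspname AS schema_name,
       c.relname,
       c.relkind
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.oid = 16384;
```

If this returns no rows, the relation no longer exists.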

Whichever relation it turns out to be, I expect the answer will be
somewhat surprising.
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://linuxfinances.info/info/rdbms.html
One good turn gets most of the blankets.