Jeff Frost jeff at frostconsultingllc.com
Wed Sep 26 15:41:52 PDT 2007
On Wed, 26 Sep 2007, Christopher Browne wrote:

> Jeff Frost wrote:
>> On Wed, 26 Sep 2007, Christopher Browne wrote:
>> 
>>> Jeff Frost <jeff at frostconsultingllc.com> writes:
>>>> For extra background on this bug, see the thread here:
>>>> http://lists.slony.info/pipermail/slony1-general/2007-September/006687.html 
>>>> 
>>>> This is on Slony1-1.2.10, PostgreSQL-8.1.9:
>>>> 
>>>> I've run into a situation on a client cluster composed of a master and
>>>> 2 slave nodes where a deadlock on one of the slaves happens quite
>>>> regularly during any EXECUTE SCRIT commands.  It seems if slony loses
>>>> the deadlock, some of the tables are left in a not altered for
>>>> replication state and this breaks replication.
>>> 
>>> I'm going to set up (heh) a test case, let's call it
>>> "testdeadlockddl", which will try to "tickle" this problem.
>>> 
>>> The approach:
>>> 
>>> - I'll start by creating a set that is replicated, and where some data
>>>  is flowing thru.
>>> 
>>> - I'll set up some queries against the replica in order to try to
>>>  encourage deadlocks.
>>> 
>>> - Then an "EXECUTE SCRIPT" will try adding a new table.  As a side effect,
>>>  it will implicitly require locks on all the replicated tables.
>>> 
>>> I don't see that there are other particular details to this, right?
>> 
>> You seem to be right on the money with my experience, Christopher.  Perhaps 
>> if your slave is inside a VM making it a little less performant would be 
>> more likley to cause the deadlock and show the problem?  I mention this 
>> because our slave is not as performant as the master and so it deadlocks 
>> more often during execute scripts than the master did when the master 
>> shoulders the entire load.
>> 
> I have a test added in; it hasn't yet tickled any deadlocks, so I think I'm 
> not yet seeing the pattern of queries that you are using.
>
> I'll fight further with this tomorrow; if it's at all possible for you to 
> show a pattern of queries (not involving Slony-I necessarily at all) that 
> causes the deadlock to emerge, that would be helpful.
>
> This is NOT about system load - deadlocks can emerge under pretty simple 
> conditions, and the simpler the condition we can come up with, the better.  A 
> good test case for this won't involve variations of load - it should be able 
> to remain pretty simple.

The queries that we're seeing trigger it are SELECT queries with lots of 
INTERSECTs like this ugly one:

    SELECT m.name as manf
         , p.mfgno_stripped as mfgno
         , p.onhand
         , p.price
         , p.sdescr
         , pi.img
         , pm.categoryid
      INTO TEMP TABLE searchc8f9d43421c4d72570b9bb9288ff3c10
      FROM products p
      JOIN man m
        ON (p.man_id        = m.id)
      LEFT JOIN product_images pi
        ON (
             p.mfgno_stripped = pi.mfgno_stripped
         AND p.man_id         = pi.man_id
           )
      LEFT JOIN category.product_map pm
        ON (
             p.mfgno_stripped = pm.mfgno_stripped
         AND p.man_id         = pm.manid
           )
     WHERE (
             (upper(p.mfgno_stripped) LIKE '%ACER%')
         OR (upper(m.name) LIKE '%ACER%')
         OR (upper(p.sdescr) LIKE '%ACER%')
           )
INTERSECT
    SELECT m.name as manf
         , p.mfgno_stripped as mfgno
         , p.onhand
         , p.price
         , p.sdescr
         , pi.img
         , pm.categoryid
      FROM products p
      JOIN man m
        ON (p.man_id        = m.id)
      LEFT JOIN product_images pi
        ON (
             p.mfgno_stripped = pi.mfgno_stripped
         AND p.man_id         = pi.man_id
           )
      LEFT JOIN category.product_map pm
        ON (
             p.mfgno_stripped = pm.mfgno_stripped
         AND p.man_id         = pm.manid
           )
     WHERE (
             (upper(p.mfgno_stripped) LIKE '%NOTEBOOK%')
         OR (upper(m.name) LIKE '%NOTEBOOK%')
         OR (upper(p.sdescr) LIKE '%NOTEBOOK%')
           )
INTERSECT
    SELECT m.name as manf
         , p.mfgno_stripped as mfgno
         , p.onhand
         , p.price
         , p.sdescr
         , pi.img
         , pm.categoryid
      FROM products p
      JOIN man m
        ON (p.man_id        = m.id)
      LEFT JOIN product_images pi
        ON (
             p.mfgno_stripped = pi.mfgno_stripped
         AND p.man_id         = pi.man_id
           )
      LEFT JOIN category.product_map pm
        ON (
             p.mfgno_stripped = pm.mfgno_stripped
         AND p.man_id         = pm.manid
           )
     WHERE (
             (upper(p.mfgno_stripped) LIKE '%1GB%')
         OR (upper(m.name) LIKE '%1GB%')
         OR (upper(p.sdescr) LIKE '%1GB%')
           )
INTERSECT
    SELECT m.name as manf
         , p.mfgno_stripped as mfgno
         , p.onhand
         , p.price
         , p.sdescr
         , pi.img
         , pm.categoryid
      FROM products p
      JOIN man m
        ON (p.man_id        = m.id)
      LEFT JOIN product_images pi
        ON (
             p.mfgno_stripped = pi.mfgno_stripped
         AND p.man_id         = pi.man_id
           )
      LEFT JOIN category.product_map pm
        ON (
             p.mfgno_stripped = pm.mfgno_stripped
         AND p.man_id         = pm.manid
           )
     WHERE (
             (upper(p.mfgno_stripped) LIKE '%120GB%')
         OR (upper(m.name) LIKE '%120GB%')
         OR (upper(p.sdescr) LIKE '%120GB%')
           );

Note that these queries are old and will be replaced with some tsearch2 
functionality as soon as it makes its way through testing.

I think the deadlocks aren't load related but speed related.  That is, if the 
acquiring of all the locks by the execute script takes longer on a slower 
machine, the window of opportunity for one of these selects to cause a 
deadlock seems greater, no?  They do seem to happen on the slower machine more 
regularly than the faster one.

-- 
Jeff Frost, Owner 	<jeff at frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954


More information about the Slony1-bugs mailing list