Troy Wolf troy at troywolf.com
Tue Feb 26 13:12:44 PST 2008
I have some more data that may shed some light for you gurus.

To recap, we had an episode where slony replication stopped--that is,
no SYNC events were being processed. At that time we ran a couple of
queries to try to find out why. We did a select on pg_locks, and
unfortunately, as previously explained, I don't have that data now,
but will not make that mistake the next time. We also ran this query:

select * from pg_stat_activity where current_query != '<IDLE>';

I do have the results from that, and it shows at least the time
sequence of the activity. Based on this, it appears that Slony is
waiting for an index creation to complete. The rows below have been
cleaned to protect the innocent.

Process ID | User     | Waiting | Query Start         | Query
------------------------------------------------------------------------------------------------
407        | postgres | t       | (null)              | ANALYZE myapp.a_large_app_table
27479      | appusr   | f       | 2008-02-21 19:43:02 | create index a_large_app_table_ak1 on myapp.a_large_app_table(col1,col2) tablespace my_table_space
7061       | slony    | t       | 2008-02-21 19:43:16 | listen "_slony_Event";
7073       | slony    | t       | 2008-02-21 19:43:18 | commit transaction;
7071       | slony    | t       | 2008-02-21 19:43:25 | notify "_slony_Event"; notify "_slony_Confirm"; insert into "_slony".sl_event(ev_origin,...
28045      | appusr   | t       | 2008-02-21 19:43:41 | commit
7072       | slony    | t       | 2008-02-21 19:47:45 | select "_slony".cleanupEvent();

Important things to note -- I only have a single replication set, and
"a_large_app_table" is not in it. This large table does not have any
foreign key relationships to any other tables. Therefore, I do not
expect an index being created on this table--even if it takes days to
generate--would block slony. On top of that, Slony did not have any
data to replicate, so it was only doing its bare minimum SYNC
logging and, apparently, a cleanupEvent(). Our attention was drawn to
this for two reasons. First, we have a "health monitoring" batch
script that simply checks the sl_event table for recent SYNCs and
alerts us if they do not exist. Second, process_id 28045 was blocked
waiting to commit some inserts, and that app has a health check that
alerted us. The table the app is trying to insert into is also not in
the replication set nor does it have any relationships. So we can't
imagine why it would be blocked by Slony or the index creation. That
app was never blocked by anything before we started running Slony, so
the off-the-top assumption is that it was blocked by Slony, but why
would Slony be blocked by an index creation on an unrelated table?
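
A check of the kind described above could be as simple as the
following sketch. The five-minute threshold is an illustrative
assumption, not a recommended value, and "_slony" is the cluster
schema that appears in the queries above.

```sql
-- Age of the newest SYNC event recorded in sl_event.
-- The '5 minutes' threshold is an arbitrary, illustrative value.
select max(ev_timestamp)                                   as last_sync,
       now() - max(ev_timestamp)                           as sync_age,
       (now() - max(ev_timestamp)) > interval '5 minutes'  as stale
  from "_slony".sl_event
 where ev_type = 'SYNC';
```

If sl_event holds no SYNC rows at all, all three columns come back
null, so a monitoring script would need to treat null as an alert
condition too.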

I don't know what else the data could suggest. This has happened
twice. The next time, I'll be better prepared to collect data.
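
For that next time, a snapshot along the following lines might be
worth saving. This is only a sketch: it assumes a PostgreSQL release
of this era, where pg_stat_activity exposes procpid and current_query,
and the regclass cast only resolves names for relations in the
database you are connected to.

```sql
-- One row per lock, with the owning backend's activity alongside it.
-- Ungranted locks (granted = f) sort first; those are the waiters.
select l.pid,
       l.locktype,
       l.relation::regclass as relation,
       l.transactionid,
       l.mode,
       l.granted,
       a.usename,
       a.waiting,
       a.query_start,
       a.current_query
  from pg_locks l
  left join pg_stat_activity a on a.procpid = l.pid
 order by l.granted, l.pid;
```

For each waiting row, the rows with granted = t on the same relation
or transactionid show which backend holds the conflicting lock.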

