Tue Feb 26 13:12:44 PST 2008
- Previous message: [Slony1-general] failover problems with 3 nodes
- Next message: [Slony1-general] slonik_uninstall_nodes unsafe ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have some more data that may shed some light for you gurus. To recap, we had an episode where Slony replication stopped--that is, no SYNC events were being processed. At that time we ran a couple of queries to try to find out why. We did a select on pg_locks, but unfortunately, as previously explained, I don't have that data now. I will not make that mistake the next time.

We also ran this query:

    select * from pg_stat_activity where current_query != '<IDLE>';

I do have the results from that, and they show at least the time sequence of the activity. Based on this, it appears that Slony is waiting for an index creation to complete. The rows below have been cleaned to protect the innocent.

Process ID | User     | Waiting | Query Start         | Query
-----------+----------+---------+---------------------+------------------------------------------
407        | postgres | t       | (null)              | ANALYZE myapp.a_large_app_table
27479      | appusr   | f       | 2008-02-21 19:43:02 | create index a_large_app_table_ak1 on myapp.a_large_app_table(col1,col2) tablespace my_table_space
7061       | slony    | t       | 2008-02-21 19:43:16 | listen "_slony_Event";
7073       | slony    | t       | 2008-02-21 19:43:18 | commit transaction;
7071       | slony    | t       | 2008-02-21 19:43:25 | notify "_slony_Event"; notify "_slony_Confirm"; insert into "_slony".sl_event(ev_origin,...
28045      | appusr   | t       | 2008-02-21 19:43:41 | commit
7072       | slony    | t       | 2008-02-21 19:47:45 | select "_slony".cleanupEvent();

Important things to note -- I only have a single replication set, and "a_large_app_table" is not in it. This large table does not have any foreign key relationships to any other tables. Therefore, I would not expect an index being created on this table--even if it takes days to generate--to block Slony. On top of that, Slony did not have any data to replicate, so it was only doing its bare minimum SYNC logging and, apparently, a cleanupEvent().

Our attention was drawn to this for two reasons.
First, we have a "health monitoring" batch script that simply checks the sl_event table for recent SYNCs and alerts us if there are none. Second, process ID 28045 was blocked waiting to commit some inserts, and that app has a health check that alerted us. The table the app is trying to insert into is also not in the replication set, nor does it have any relationships to other tables, so we can't imagine why it would be blocked by Slony or by the index creation.

That app was never blocked by anything before we introduced Slony, so the off-the-top assumption is that it was blocked by Slony. But why would Slony be blocked by an index creation on an unrelated table? I don't know what else the data could suggest.

This has happened twice. The next time, I'll be better prepared to collect data.
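
For the next occurrence, something along these lines might capture the pg_locks picture that was lost this time. It is only a sketch, not a tested diagnostic: it pairs each ungranted lock request with the sessions that hold a granted lock on the same object, and pulls the query text from pg_stat_activity. The column names procpid and current_query are an assumption based on the current_query column used in the query above; newer PostgreSQL releases name them pid and query. The output aliases (waiting_pid, holding_pid, and so on) are made up for illustration.

```sql
-- Sketch: who is waiting on a lock, and who holds a lock on the same object.
-- Assumes the older pg_stat_activity column names (procpid, current_query).
SELECT w.pid                AS waiting_pid,
       wa.usename           AS waiting_user,
       wa.current_query     AS waiting_query,
       w.locktype,
       w.mode               AS requested_mode,
       w.relation::regclass AS relation,      -- NULL for non-relation locks
       h.pid                AS holding_pid,
       h.mode               AS held_mode,
       ha.current_query     AS holding_query
FROM pg_locks w
JOIN pg_locks h
  ON  h.granted
  AND h.pid <> w.pid
  AND h.locktype = w.locktype
  -- Match the locked object; NULL-safe because most of these columns
  -- are NULL for any given lock type.
  AND h.database      IS NOT DISTINCT FROM w.database
  AND h.relation      IS NOT DISTINCT FROM w.relation
  AND h.transactionid IS NOT DISTINCT FROM w.transactionid
JOIN pg_stat_activity wa ON wa.procpid = w.pid
JOIN pg_stat_activity ha ON ha.procpid = h.pid
WHERE NOT w.granted
ORDER BY wa.query_start;
```

Note that this lists every session holding a lock on the same object, not only those whose lock mode actually conflicts with the requested one, so the held_mode column still has to be read against the requested_mode by hand. The relation names are only resolved for tables in the database the query is run from.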