Craig A. James cjames at emolecules.com
Thu Feb 28 18:17:29 PST 2008
Jan Wieck wrote:
> On 2/25/2008 9:59 AM, Craig A. James wrote:
>> I sent the attached email only to Jan by mistake, here it is, I hope 
>> Jan or somebody has an idea what's going on.
> 
> What I still don't understand is what that stored procedure moveSet() is 
> waiting for. Are there any pg_locks entries with granted='f' at all?

Forget the whole question, the mystery is solved.

A rogue process with a bug in it had created five million tables (that's right, five MILLION tables) in the database.  The process was creating about 50,000 tables per day, and had been running for almost three months.

Now we can't even do a single "pg_dump --table" because getting an exclusive lock seems to take forever and a day.  So we can't dump it, and we can't use Slony to move the node.  Fortunately, we already had the most important tables replicated via Slony, which was still working well, so we were able to just drop Slony replication and transfer the whole load to the backup database.  We'll simply discard the corrupted database and recreate it with Slony.

What's really amazing is that Postgres works at all with five million tables.  Performance had degraded just a bit, but not enough to trigger any alarms.  We didn't even notice that anything was wrong for three months, until we tried to do some sysadmin work (moving a Slony master node) and couldn't.
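
A periodic check on the number of tables would probably have caught this much sooner.  What follows is only a sketch, not something we actually run: it assumes a Python DB-API cursor (from psycopg2, for example) connected to the database, and the names check_table_count and max_tables, along with the 10000 threshold, are made up for illustration.  It counts ordinary tables (relkind = 'r') in the pg_class catalog.

```python
# Sketch only: function names and the default threshold are hypothetical.
# Counts ordinary tables (relkind = 'r') in the pg_class system catalog.
TABLE_COUNT_SQL = "SELECT count(*) FROM pg_class WHERE relkind = 'r'"


def count_tables(cursor):
    """Return the number of ordinary tables listed in pg_class."""
    cursor.execute(TABLE_COUNT_SQL)
    return cursor.fetchone()[0]


def check_table_count(cursor, max_tables=10000):
    """Return (ok, count); ok is False once count exceeds max_tables."""
    count = count_tables(cursor)
    return count <= max_tables, count
```

Run from cron with a threshold suited to the schema, a check like this would raise a flag within days of a process starting to create tables by the tens of thousands.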

Thanks for all your help.

Craig

