Brian Fehrle brianf at consistentstate.com
Mon Nov 22 08:45:52 PST 2010
Hi all,
I have a Slony master-to-slave setup on Slony 1.2.20 and Postgres 8.4.2. It is configured so that the slave daemon also writes everything out to archive files via the -x option of the slon daemon. A short while ago, I started getting this message in the slave daemon log:

2010-11-22 08:04:10 PST DEBUG2 remoteWorkerThread_1: SYNC 1317332 processing
2010-11-22 08:04:10 PST ERROR  remoteWorkerThread_1: Cannot open archive file /usr/local/pgsql/slon_logs/slony1_log_2_00000000000001092631.sql.tmp - Too many open files
2010-11-22 08:04:10 PST DEBUG2 remoteListenThread_1: queue event 1,1317688 SYNC
2010-11-22 08:04:31 PST DEBUG1 cleanupThread:    0.002 seconds for cleanupEvent()
2010-11-22 08:04:31 PST DEBUG1 cleanupThread:    0.002 seconds for delete logs
2010-11-22 08:04:31 PST DEBUG3 cleanupThread: minxid: 4780671
2010-11-22 08:04:31 PST DEBUG2 cleanupThread:    0.188 seconds for vacuuming
2010-11-22 08:05:10 PST DEBUG2 remoteWorkerThread_1: SYNC 1317332 processing
2010-11-22 08:05:10 PST ERROR  remoteWorkerThread_1: Cannot open archive file /usr/local/pgsql/slon_logs/slony1_log_2_00000000000001092631.sql.tmp - Too many open files
2010-11-22 08:05:10 PST DEBUG2 remoteListenThread_1: queue event 1,1317689 SYNC


The file exists and has the same permissions as all my other archive files, but it is empty. I restarted the slon daemons, and now I don't get any error messages in the daemon log; I just get:

(many, many lines of "simple entry" / "Finished number")
2010-11-22 08:39:23 PST DEBUG4 simple entry - 25055376
2010-11-22 08:39:23 PST DEBUG4 Finished number: 25057024
2010-11-22 08:39:23 PST DEBUG4 simple entry - 25057867
2010-11-22 08:39:23 PST DEBUG4 simple entry - 25057024
2010-11-22 08:39:23 PST DEBUG2 slon: child terminated status: 11; pid: 24931, current worker pid: 24931
2010-11-22 08:39:23 PST DEBUG1 slon: restart of worker in 10 seconds

And it just loops. I also removed that empty file; a new one was created, but it is also empty.
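The "child terminated status: 11" line may be worth decoding. Assuming the number slon logs there is the raw status returned by waitpid() (not confirmed here), the standard POSIX macros can interpret it. A minimal sketch; describe_wait_status is a hypothetical helper, not part of Slony:

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Interpret a raw waitpid()-style status value (POSIX encoding).

    Assumption: the value slon prints after "child terminated status:"
    is the raw waitpid() status; this sketch does not verify that.
    """
    if os.WIFEXITED(status):
        # Normal exit: the exit code lives in the high byte.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Killed by a signal: the low 7 bits hold the signal number.
        sig = os.WTERMSIG(status)
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = f"signal {sig}"
        return f"killed by {name}"
    return f"stopped or unrecognized status {status}"


if __name__ == "__main__":
    print(describe_wait_status(11))  # on Linux prints: killed by SIGSEGV
```

If that assumption holds, status 11 would mean the worker is crashing with a segmentation fault on each restart rather than exiting cleanly.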


Does anyone know what could have caused this? The only thing I know of that was going on at the time was adding additional replication sets to the Slony cluster. I suspect this may be more of a Linux issue (too many open files) than a Postgres/Slony one, as I haven't found anything related to Slony/Postgres when googling these errors.
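If this is the per-process descriptor limit (the EMFILE "Too many open files" error), one way to check is to compare how many descriptors the slon process holds against its soft limit. A minimal sketch, assuming Linux /proc (reading another user's process may require root); open_fd_count and soft_fd_limit are hypothetical helpers, not Slony tools:

```python
import os
import sys


def open_fd_count(pid: int) -> int:
    """Count the file descriptors a process currently holds.

    Assumption: Linux procfs, where /proc/<pid>/fd has one entry per open fd.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid: int) -> int:
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns -1 if the limit is reported as "unlimited".
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files" (3 words), soft, hard, units.
                soft = line.split()[3]
                return -1 if soft == "unlimited" else int(soft)
    raise RuntimeError("'Max open files' not found in limits file")


def main() -> None:
    # Hypothetical usage: pass the slon worker PID (e.g. from pgrep);
    # defaults to this script's own PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used = open_fd_count(pid)
    limit = soft_fd_limit(pid)
    print(f"pid {pid}: {used} open fds, soft limit {limit}")


if __name__ == "__main__":
    main()
```

If the count sits near the limit, raising it (for example with ulimit -n in the shell or init script that starts slon) and checking for leaked archive file handles would be the next things to look at.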

Thanks in advance for any feedback,
- Brian F


