Fri Jan 16 09:06:34 PST 2009
- Previous message: [Slony1-general] adding a new node to an existing cluster
- Next message: [Slony1-general] BUG? Logshipping causes failure when slony waiting on older transactions to finish
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I consider things carefully before throwing out the "BUG" word, but I believe I have uncovered a bug related to log shipping. It is not a show-stopper and does not cause missing data, corruption, or anything like that. However, it is a repeatable pattern and it makes my life with Slony more difficult than necessary.

We are a growing organization with a rapidly changing schema, so we have a relatively frequent need to add new tables both to the schema and to replication. I developed a slonik shell script to make adding new tables easy. It creates a temporary set (ID = 999), adds the table to it, subscribes the new set, then merges set 999 into my real set (ID = 1). When the database is not busy, this works perfectly.

Before I started using log shipping, we would often see the pattern in lines 01 through 12 of the log below. We would then investigate to figure out what the older transaction was and eliminate it, or simply let Slony wait for it to complete naturally. In either case, once the older transaction was out of the way, Slony would complete the merge successfully.

Since enabling log shipping, we see the full pattern below, and it happens every time the merge is waiting behind older transactions. The merge starts (line 01), then fails because there are older transactions in the way. Slon sleeps for 15 seconds and tries again. It fails again because the older transactions are still in the way, sleeps for 30 seconds, and tries once more. By this time the older transaction(s) have cleared, so slon proceeds with the merge request. However, it now fails because it is unable to write to the log shipping archive file (line 16). The BUG, I believe, has something to do with slon closing the file handle, or otherwise losing access to the file, when it hits the wait condition. There is nothing actually wrong with that tmp file; there is no other reason slon should be unable to write to it.
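For context, the script amounts to roughly the following slonik sequence. This is a sketch, not my exact script: the cluster name, conninfo strings, table ID, and table name here are placeholders, and the wait-for-event step is the usual precaution before merging, not necessarily verbatim from my script.

```slonik
cluster name = my_cluster;
node 1 admin conninfo = 'dbname=mydb host=origin user=slony';
node 2 admin conninfo = 'dbname=mydb host=subscriber user=slony';

# Temporary set on the origin, holding just the new table.
create set (id = 999, origin = 1,
    comment = 'temporary merge set to add a table into an existing set');
set add table (set id = 999, origin = 1, id = 50,
    fully qualified name = 'my_schema.my_new_table');

# Subscribe the temporary set to the existing subscriber.
subscribe set (id = 999, provider = 1, receiver = 2, forward = yes);

# Wait for the subscription to become active, then fold the
# temporary set into the real set.
wait for event (origin = 1, confirmed = 2, wait on = 1);
merge set (id = 1, add id = 999, origin = 1);
```

It is during the copy/merge of set 999 that slon hits the wait-for-older-transactions condition described below.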
At this point, the way I "fix" this is to simply stop and restart the slon process on the subscriber node (the node that creates the log shipping files). I then also have to manually drop my temp set (ID = 999) and start the merge over when the database is not so busy.

By the way, my configuration uses log_level=1 to reduce noise in the log files. The lines below are all the lines from that time period. I modified the schema, table, and dir path names to protect the innocent.

=========================================================================================================
01: 2009-01-15 16:29:07 EST CONFIG storeSet: set_id=999 set_origin=1 set_comment='temporary merge set to add a table into an existing set'
02: 2009-01-15 16:29:07 EST CONFIG storeSubscribe: sub_set=999 sub_provider=1 sub_forward='t'
03: 2009-01-15 16:29:07 EST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
04: 2009-01-15 16:29:07 EST DEBUG1 copy_set 999
05: 2009-01-15 16:29:07 EST DEBUG1 remoteWorkerThread_1: connected to provider DB
06: 2009-01-15 16:29:07 EST WARN remoteWorkerThread_1: transactions earlier than XID 1077488892 are still in progress
07: 2009-01-15 16:29:07 EST WARN remoteWorkerThread_1: data copy for set 999 failed - sleep 15 seconds
08: 2009-01-15 16:29:22 EST DEBUG1 copy_set 999
09: 2009-01-15 16:29:22 EST DEBUG1 remoteWorkerThread_1: connected to provider DB
10: 2009-01-15 16:29:22 EST WARN remoteWorkerThread_1: transactions earlier than XID 1077488892 are still in progress
11: 2009-01-15 16:29:22 EST WARN remoteWorkerThread_1: data copy for set 999 failed - sleep 30 seconds
12: WARNING: there is no transaction in progress
13: 2009-01-15 16:29:52 EST DEBUG1 copy_set 999
14: 2009-01-15 16:29:52 EST DEBUG1 remoteWorkerThread_1: connected to provider DB
15: NOTICE: truncate of "my_schema"."my_new_table" succeeded
16: 2009-01-15 16:29:52 EST ERROR remoteWorkerThread_1: Cannot write to archive file /opt/slony/slony1_log_2_00000000000001428342.sql.tmp - not open
17: 2009-01-15 16:29:52 EST WARN remoteWorkerThread_1: data copy for set 999 failed - sleep 60 seconds
18: WARNING: there is no transaction in progress
=========================================================================================================
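For completeness, the manual cleanup after restarting the subscriber's slon process is essentially a single slonik command to get rid of the half-merged temporary set before retrying. Again a sketch, with placeholder cluster/conninfo values:

```slonik
cluster name = my_cluster;
node 1 admin conninfo = 'dbname=mydb host=origin user=slony';
node 2 admin conninfo = 'dbname=mydb host=subscriber user=slony';

# Discard the temporary set so the whole add-table procedure
# can be rerun from scratch when the database is less busy.
drop set (id = 999, origin = 1);
```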