Tue Apr 29 17:23:00 PDT 2008
- Previous message: [Slony1-general] Slony version 1.2.14?
- Next message: re[Slony1-general] try: subscrptions lost after move set ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Jennifer, Thanks for the info! Indeed, when I query the sl_archive_tracking table, I see that the most recently applied SQL file is #44: shahaf_logshipped_slave=# select * from _shahaf_cluster.sl_archive_tracking; at_counter | at_created | at_applied ------------+---------------------------+---------------------------- 44 | 2008-04-29 13:41:26.20039 | 2008-04-29 13:56:24.255029 (1 row) So, it makes sense that the logshipped slave would want #45 next. To produce the SQL files I simply used the -a flag on the slon process for my regular slave, as described here: http://lists.slony.info/documentation/logshipping.html I have a directory (/data/tmp/shahaf/logshipping/SYNC) that now includes several hundred files like this: [root at idb SYNC]# ls | more slony1_log_10_00000000000000000001.sql slony1_log_10_00000000000000000002.sql slony1_log_10_00000000000000000003.sql slony1_log_10_00000000000000000004.sql slony1_log_10_00000000000000000005.sql And, this directory does in fact contain files for the segment where the problem occurred... <snip> slony1_log_10_00000000000000000043.sql slony1_log_10_00000000000000000044.sql slony1_log_10_00000000000000000045.sql slony1_log_10_00000000000000000046.sql slony1_log_10_00000000000000000047.sql <snip> Also, I verified that the file name and the contents are in sync. In other words, the file that ends in 045.sql includes the following: select "_shahaf_cluster".archiveTracking_offline('45', '2008-04-29 13:41:36.201339'); To apply these logs, I'm using the /usr/bin/slony_logshipper tool, also described here: http://lists.slony.info/documentation/logshipping.html So, either something is wrong with my setup, or there's something wrong with the slony_logshipper tool. To dig a little further, I tried an experiment. I started with a fresh shahaf_logshipped_slave DB and I applied the DB dump to it (the one I got from the slony1_dump.sh script). I then checked the sl_archive_tracking table and found that at_counter = 34. Then I manually applied the scripts from the SYNC directory by feeding them into psql one by one, starting with 035.sql. This worked. I was able to go as high as 050.sql and I didn't see the issue described below. Then I repeated my experiment by once again starting with a fresh DB on which I applied the dump. This time I went back to using the slony_logshipper tool to do the work of choosing and applying the scripts, one by one. And once it failed with the same issue, only this time it happened at #42/43 instead of #44/45. Here's what I see in the log file produced by the slony_logshipper tool: INFO 2008-04-29 17:18:07 > Processing archive file ./SYNC/slony1_log_10_00000000000000000040.sql INFO 2008-04-29 17:18:07 > NOTICE: Slony-I: Process archive with counter 40 created 2008-04-29 13:40:46.188066 INFO 2008-04-29 17:18:07 > Processing archive file ./SYNC/slony1_log_10_00000000000000000041.sql INFO 2008-04-29 17:18:07 > NOTICE: Slony-I: Process archive with counter 41 created 2008-04-29 13:40:56.191449 INFO 2008-04-29 17:18:07 > Processing archive file ./SYNC/slony1_log_10_00000000000000000042.sql WARN 2008-04-29 17:18:07 > skip archive with counter (r)V - already applied INFO 2008-04-29 17:18:07 > Processing archive file ./SYNC/slony1_log_10_00000000000000000043.sql ERROR 2008-04-29 17:18:07 > PGRES_FATAL_ERROR: ERROR: Slony-I: node is on archive counter 41, this archive log expects 42 Query was: select "_shahaf_cluster".archiveTracking_offline('43', '2008-04-29 13:41:16.196712'); WARN 2008-04-29 17:18:07 > waiting for resume So, something went sideways when the tool tried to process file #42. It decided to skip it (I'm not sure what "counter (r)V" is all about). As a result, #43 failed. How do you go about running the SQL scripts on your logshipped slave? Thanks, --S -----Original Message----- From: Jennifer Spencer [mailto:jennifer at sun.Stanford.EDU] Sent: Tuesday, April 29, 2008 4:39 PM To: Shahaf Abileah Cc: slony1-general at lists.slony.info Subject: Re: [Slony1-general] node is on archive counter 44, this archive log expects 45 Hi Shahaf, I have a similar setup, using log shipping. I have noticed that this particular error message is not as helpful as you might want. In fact, the log-shipper may actually want log #46 or #246. Just not the one you gave it (#44). I think the error message just gives you N+1, where N is the number of the log you tried to feed it. Your log-shipped slave has a table called _shahaf_cluster.sl_archive_tracking. It should have been created by the slony1_dump.sh script. That table will contain one row, and the "at counter" (first field there) is the log number it ingested last. It wants the at_counter + 1. So, in the slavedb, select * from _sharaf_cluster.sl_archive_tracking; Your slony logs should be numbered, monotonically increasing, and you should have that next log number in the directory where those logs live. If you have that correct log, have you tried applying it? Which script are you using to apply the logs, anyway? As to how it got "off" from the correct order, I am not sure. You'd have to check any logging programs you have, whose names are hopefully in the log-application script. -Jennifer Spencer Shahaf Abileah wrote: > Hello, > > > > I'm trying to use the slony_logshipper tool to update a log-shipped > slave. I kicked off the process and monitored the log file. After a > short time, I noticed this in the log: > > > > ERROR 2008-04-29 13:56:24 > PGRES_FATAL_ERROR: ERROR: Slony-I: node is > on archive counter 44, this archive log expects 45 > > Query was: select "_shahaf_cluster".archiveTracking_offline('46', > '2008-04-29 13:41:46.204843'); > > WARN 2008-04-29 13:56:24 > waiting for resume > > > > I can see the archiveTracking_offline function in the dump file I got > using slony1_dump.sh, the one I used to initialize my logshipped slave. > But I'm not familiar with the details of this replication scheme, so I'm > not sure why the archive counter would have gotten out of sync (whatever > that means). > > > > Can anyone offer suggestions on how to fix this issue or how to analyze > it further? > > > > Thanks, > > > > --S > > > > > > Shahaf Abileah | Lead Software Developer - Data Team > > shahaf at redfin.com <mailto:shahaf at redfin.com> | tel: 206.859.2869 | > cell: 206.331.2057 | www.redfin.com <http://www.redfin.com> > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Slony1-general mailing list > Slony1-general at lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general
- Previous message: [Slony1-general] Slony version 1.2.14?
- Next message: re[Slony1-general] try: subscrptions lost after move set ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list