From francescoboccacci at libero.it Thu Feb 2 05:30:18 2012
From: francescoboccacci at libero.it (francescoboccacci at libero.it)
Date: Thu, 2 Feb 2012 14:30:18 +0100 (CET)
Subject: [Slony1-general] change password
Message-ID: <10925035.957191328189418573.JavaMail.defaultUser@defaultHost>

Hi,

I'm using PostgreSQL 8.4 and I have some tables replicated between two servers with Slony 2.0.6. For security reasons, I need to change the password of the master database. Is there a way to change the password stored in the cluster without stopping replication and starting again from the beginning?

Thank you
Francesco

From rajjmalhotra25 at yahoo.com Thu Feb 2 09:13:35 2012
From: rajjmalhotra25 at yahoo.com (Raj Malhotra)
Date: Thu, 2 Feb 2012 09:13:35 -0800 (PST)
Subject: [Slony1-general] What is the behavior of slony wait for event after lock set command.
Message-ID: <1328202815.77163.YahooMailNeo@web122511.mail.ne1.yahoo.com>

We are trying to do a switchover and it is taking a lot of time. We follow these steps to perform the switchover:

1. Lock Set
2. Wait for Event (timeout = 300)
3. Move Set (timeout = 300)
4. Wait for Event (timeout = 300)

The altperl script that ships with the Slony source uses the following sequence:

1. Lock Set
2. Sync
3. Wait for Event
4. Move Set

What is the correct sequence of events, and how can I make the switchover faster?

Regards

From cbbrowne at afilias.info Thu Feb 2 11:27:25 2012
From: cbbrowne at afilias.info (Christopher Browne)
Date: Thu, 2 Feb 2012 14:27:25 -0500
Subject: [Slony1-general] change password
In-Reply-To: <10925035.957191328189418573.JavaMail.defaultUser@defaultHost>
References: <10925035.957191328189418573.JavaMail.defaultUser@defaultHost>
Message-ID:

Sure, you can submit STORE PATH with the new password, and that will propagate across the cluster. (Likely several requests, one for each communications path.)

It is considered better form to use a .pgpass file, so that passwords are not visible in the slony configuration at all.

You will also need to revise the configuration used to launch slon processes. Look for slon.conf in the documentation.

From ssinger at ca.afilias.info Thu Feb 2 17:00:25 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Thu, 02 Feb 2012 20:00:25 -0500
Subject: [Slony1-general] Slony-I 2.1.1 & 1.2.23 released
Message-ID: <4F2B31A9.2050602@ca.afilias.info>

The Slony-I team is pleased to announce the release of Slony-I 2.1.1, a bug fix release for the 2.1 stream, and Slony-I 1.2.23, a bug fix release in the 1.2 stream.

2.1.1
=============
- Bug #260 :: Fixed issue with the FAILOVER command when the failed node has multiple sets.
- Bug #246 :: Include path order changes
- Bug #161 :: fix memory overrun in EXECUTE SCRIPT parser
- Bug #247 :: slony_logshipper to handle TRUNCATE commands
- Bug #249 :: Add parentheses to txid_current() in function for TRUNCATE logging
- slonik_drop_table and slonik_drop_sequence no longer attempt to return -1 on an error (invalid as a slonik exit code in 2.1)
- Bug #244 :: The CREATE SET command now requires a set id to be specified.
- Bug #255 :: Fix serialization conflict issues when using PostgreSQL 9.1.
- Bug #256 :: set_conf_option() has an extra elevel parameter on PG 9.2
- Bug #259 :: Fix TRUNCATE logging so it works with mixed case slony clusters.

http://www.slony.info/downloads/2.1/source/slony1-2.1.1.tar.bz2
http://www.slony.info/downloads/2.1/source/slony1-2.1.1-docs.tar.bz2

1.2.23
===========
- Bug #195 - make slon_quote_* functions immutable
- Bug #209 - dollar quoting doesn't work on PG 7.4
- Bug #224 - PKEYEDTABLES misspelled in altperl script
- Bug #236 - fix misformatting of log string for timestamp
- Bug #239 - Fix FAILOVER on PG 9.0 by not querying pg_listener

http://www.slony.info/downloads/1.2/source/slony1-1.2.23.tar.bz2
http://www.slony.info/downloads/1.2/source/slony1-1.2.23-docs.tar.bz2

Unless additional maintainers from the community step up, 1.2.23 is likely to be the last 1.2.x release.

From ssinger at ca.afilias.info Thu Feb 2 17:16:02 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Thu, 02 Feb 2012 20:16:02 -0500
Subject: [Slony1-general] Slony 2.0.x and PostgreSQL 9.1
Message-ID: <4F2B3552.9000007@ca.afilias.info>

Those of you paying attention noticed that I just released 1.2.23 and 2.1.1 but not 2.0.8.

We had applied the fix for bug #255, the 'serialization conflicts with PostgreSQL 9.1', on the 2.0 branch in addition to the 2.1 branch.

When testing 2.0.8 RC2 I noticed that once in a while the disorder "MoveSet" test in the clustertest framework gave me failures where the nodes could end up with different data. The MoveSet test does 11 move sets between nodes in the cluster. If I run the MoveSet test in a loop 10 times I would typically get one failed test run.

On the version before the bug #255 commit I was never able to reproduce that failure over many runs. I have also not been able to reproduce that failure on 2.1.1, which includes a fix for #255 (which is why I released 2.1.1).

(All of the above is with PostgreSQL 9.1).

My inclination is to take out the fix for #255 from 2.0.8 and also put back in the warning about 9.1 "might be unsupported" that you would see with 2.0.7. We can also leave the bug #255 fix in and put the warning back.

If someone can figure out why this is failing in 2.0 then we can talk about addressing that. I can't rule out it being a problem in the test itself, but nothing pops out at me.

(A version of the disorder tests back-ported to 2.0 can be found at https://github.com/ssinger/slony1-engine/tree/REL_2_0_STABLE-clustertest)

From francescoboccacci at libero.it Fri Feb 3 01:31:40 2012
From: francescoboccacci at libero.it (francescoboccacci at libero.it)
Date: Fri, 3 Feb 2012 10:31:40 +0100 (CET)
Subject: [Slony1-general] R: Re: change password
Message-ID: <32495585.22084061328261500476.JavaMail.defaultUser@defaultHost>

Thanks, I'll try it and let you know.

From efraindector at motumweb.com Wed Feb 15 11:28:59 2012
From: efraindector at motumweb.com (Efraín Déctor)
Date: Wed, 15 Feb 2012 13:28:59 -0600
Subject: [Slony1-general] Cascading replication using altperl scripts.
Message-ID:

Hello list:

My name is Efrain and this week I've been using Slony-I. With the help of the documentation I've been able to set up replication between 2 nodes using slonik scripts and also with the altperl scripts. For me, using the altperl scripts is more comfortable, but now I'm stuck trying to set up cascading replication. I have this in mind:

A1 -> B1 -> B2

Meaning that A1 would be the master to B1, and B1 would in turn be the master of B2. My slon_tools.conf looks like this:

    add_node(node => 1,
             host => '192.168.1.231',
             dbname => 'prueba',
             port => 5432,
             user => 'postgres',
             password => '1');

    add_node(node => 2,
             host => '192.168.1.232',
             dbname => 'prueba',
             port => 5432,
             noforward => allow,
             user => 'pgsql',
             password => '2');

    add_node(node => 3,
             parent => 2,
             host => '192.168.1.233',
             dbname => 'prueba',
             port => 5432,
             user => 'pgsql',
             password => '3');

I understand that B1 has to be configured to forward, but I don't know how to do this.

Any help would be really appreciated.

Thank you.

From smccloud at geo-comm.com Thu Feb 16 07:59:05 2012
From: smccloud at geo-comm.com (Shaun McCloud)
Date: Thu, 16 Feb 2012 15:59:05 +0000
Subject: [Slony1-general] Slony not replicating changes
Message-ID: <7742DD496427B743BC8B7BBF6D380BA0373B683F@EXCHANGE10.geo-comm.local>

Hello,

I'm working on a project where we are using pg_dump to get the data out of one database and then using psql to load it into another. The second database is the master db in a Slony replication cluster. However, no matter what options I specify in pg_dump, the changes are not being replicated. I've used a standard dump, which uses COPY; an --inserts dump, which uses INSERT INTO; and a --column-inserts dump, which adds column names to the INSERT INTO statements. However, if I edit the data in pgAdmin III using the view option for the table, that change is replicated just fine. What is the difference between editing the data that way in pgAdmin III and using psql to load a dump file?

Shaun McCloud - Associate Software Developer
Geo-Comm, Inc
601 W. Saint Germain St., Saint Cloud, MN 56301
Office: 320.240.0040 Fax: 320.240.2389 Toll Free: 888.436.2666
www.geo-comm.com
Think before you print!
Microsoft Certified Desktop Support Technician (MCDST)
Do or do not, there is no try.
    -Yoda

From glynastill at yahoo.co.uk Thu Feb 16 08:38:12 2012
From: glynastill at yahoo.co.uk (Glyn Astill)
Date: Thu, 16 Feb 2012 16:38:12 +0000 (GMT)
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <7742DD496427B743BC8B7BBF6D380BA0373B683F@EXCHANGE10.geo-comm.local>
References: <7742DD496427B743BC8B7BBF6D380BA0373B683F@EXCHANGE10.geo-comm.local>
Message-ID: <1329410292.58988.YahooMailNeo@web171408.mail.ir2.yahoo.com>

Is there any possibility you made your dump with --disable-triggers?
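
One quick way to check is to search the dump file itself. The sketch below assumes the dump was written as plain SQL to a file; the helper name check_dump and the file name dump.sql are made up for illustration and are not part of Slony or PostgreSQL. It only looks for the ALTER TABLE ... DISABLE TRIGGER statements that pg_dump --disable-triggers emits, and for any change to session_replication_role, either of which could keep the Slony log triggers from firing during the load.

```shell
#!/bin/sh
# check_dump FILE
# Hypothetical helper: reports whether a plain-text pg_dump file contains
# statements that can keep triggers from firing while the data is loaded.
check_dump() {
    dump_file="$1"
    if [ ! -r "$dump_file" ]; then
        echo "cannot read $dump_file" >&2
        return 2
    fi
    # pg_dump --disable-triggers wraps each table's data in
    # ALTER TABLE ... DISABLE TRIGGER ALL / ENABLE TRIGGER ALL.
    # A changed session_replication_role can have a similar effect.
    if grep -n -i -E 'DISABLE TRIGGER|session_replication_role' "$dump_file"; then
        echo "found statements that can keep the Slony log triggers from firing"
    else
        echo "no trigger-disabling statements found"
    fi
}
```

Calling check_dump dump.sql prints any matching lines with their line numbers, followed by a one-line summary. A clean result does not prove the load is replicated; it only rules out this one cause.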

From smccloud at geo-comm.com Fri Feb 17 06:30:24 2012
From: smccloud at geo-comm.com (Shaun McCloud)
Date: Fri, 17 Feb 2012 14:30:24 +0000
Subject: [Slony1-general] Slony not replicating changes
Message-ID: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local>

Glyn,

My code is not creating the dump with --disable-triggers. So far, I have only been able to get messages to replicate if I use the view option in pgAdmin III or ESRI's ArcCatalog to edit the data (it is GIS data stored with PostGIS types).

From greg at endpoint.com Fri Feb 17 08:44:41 2012
From: greg at endpoint.com (Greg Sabino Mullane)
Date: Fri, 17 Feb 2012 11:44:41 -0500
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local>
References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local>
Message-ID: <20120217164441.GM2889@tinybird.home>

On Fri, Feb 17, 2012 at 02:30:24PM +0000, Shaun McCloud wrote:
> My code is not creating the dump with --disable-triggers.

There are very few things it could be then. If it works in other situations, that means the Slony triggers are enabled and working on the tables, which rules out one possibility. Open the pg_dump output in a text editor and make sure it is using INSERTs, and that there is nothing funny at the top of the script (e.g. changing the session_replication_role or disabling triggers). Also, I'm assuming this is a --data-only pg_dump? Finally, make sure you are loading the output into the correct database and that no search_path/schema tweaks could be affecting things.

If all else fails, see if you can create a simple test case (e.g. pg_dump a single small table) and post it here.

--
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-general/attachments/20120217/48b3a139/attachment.pgp

From smccloud at geo-comm.com Fri Feb 17 08:52:06 2012
From: smccloud at geo-comm.com (Shaun McCloud)
Date: Fri, 17 Feb 2012 16:52:06 +0000
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <20120217164441.GM2889@tinybird.home>
References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local> <20120217164441.GM2889@tinybird.home>
Message-ID: <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local>

Greg,

For performance reasons we are using a COPY command now (INSERT INTO was just for testing). With some of our tables having 5 million+ records, we want it to go as fast as possible. You are correct in assuming it is a --data-only dump. Before the first COPY statement the dump contains the following.

    SET client_encoding = 'UTF8';
    SET standard_conforming_strings = off;
    SET check_function_bodies = false;
    SET client_min_messages = warning;
    SET escape_string_warning = off;
    SET search_path = sde, pg_catalog;

I have had the data replicate correctly once when loading the data for a single table using COPY, but not since then. I get the following statistics for the slave node.

    Last event - 465
    Last event timestamp - 2/17/2012 10:49:46
    Last acknowledged - 56
    Last ack timestamp - 2/17/2012 09:20:05
    Last response time - 0.016 s
    Outstanding acks - 409
    No ack for - 01:29:41.127
    Hanging event - 57
    Command - SYNC

Also, I'm not sure if it matters, but we are using two engines for one Slony-I service: one for mostly static data and one for data that can change quite often. Each engine has its own DB, and the data that can change quite often is always replicating fine, but it is always edited using ESRI's Arc Map at some point in the process.

From greg at endpoint.com Sun Feb 19 04:40:37 2012
From: greg at endpoint.com (Greg Sabino Mullane)
Date: Sun, 19 Feb 2012 07:40:37 -0500
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local>
References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local> <20120217164441.GM2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local>
Message-ID: <20120219124036.GV2889@tinybird.home>

On Fri, Feb 17, 2012 at 04:52:06PM +0000, Shaun McCloud wrote:
> Before the first COPY statement the dump contains the following.
>
> SET client_encoding = 'UTF8';
> SET standard_conforming_strings = off;
> SET check_function_bodies = false;
> SET client_min_messages = warning;
> SET escape_string_warning = off;
> SET search_path = sde, pg_catalog;

Nothing unusual there.

> I have had the data replicate correctly once when loading the data for
> a single table using COPY, but not since then.

Well, I cannot see any other obvious problems, so I would suggest seeing if you can get that single table COPY pg_dump to work again, then expand the scope (if working) or narrow the scope (if not).

> Also, I'm not sure if it matters; but we are using two engines for one
> Slony-I service. One for mostly static data and one for data that can
> change quite often. Each engine has its own DB and the data that can
> change quite often is always replicating fine, but it is always edited
> using ESRI's Arc Map at some point in the process.

Ah, that could certainly be a factor - can you explain the layout a bit more to the list?

--
Greg Sabino Mullane greg at endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-general/attachments/20120219/07627f89/attachment.pgp

From ngramsky at cs.umd.edu Mon Feb 20 08:45:05 2012
From: ngramsky at cs.umd.edu (NewToSlony)
Date: Mon, 20 Feb 2012 08:45:05 -0800 (PST)
Subject: [Slony1-general] Syntax error?
Message-ID: <33357837.post@talk.nabble.com>

Hi,

I'm trying to use the example in the 2.1.1 documentation but I keep getting a syntax error. I've noted that the doc shows

    cluster name = $CLUSTERNAME;

as valid syntax, but unless I change the $ to a @ I get a syntax error. As I'm new and unsure, I don't know if there are errors in the documentation or if I am missing something. Nevertheless, my script is as follows:

    #!/bin/sh

    CLUSTERNAME= slony_example;
    /opt/local/lib/postgresql90/bin/slonik <<_EOL_
    define CLUSTERNAME slony_example;
    cluster name = @CLUSTERNAME;
    node 1 admin conninfo = 'dbname=my_primary host=localhost user=user';
    node 2 admin conninfo = 'dbname=my_rep host=localhost user=user';
    #--
    # init the first node. Its id MUST be 1. This creates the schema
    # _$CLUSTERNAME containing all replication system specific database
    # objects.
    #--
    init cluster ( id=1, comment='Master Node');
    #--
    # Slony-I organizes tables into sets. The smallest unit a node can
    # subscribe is a set. The following commands create one set containing
    # all 4 pgbench tables. The master or origin of the set is node 1.
    #--
    create set (id=1, origin=1, comment='All pgbench tables');
    set add table (set id=1, origin=1, id=1, fully qualified name='public.pgbench_accounts', comment='accounts table');
    set add table (set id=1, origin=1, id=2, fully qualified name='public.pgbench_branches', comment='branches table');
    set add table (set id=1, origin=1, id=3, fully qualified name='public.pgbench_tellers', comment='tellers table');
    set add table (set id=1, origin=1, id=4, fully qualified name='public.pgbench_history', comment='history table');
    #--
    # Create the second node (the slave) tell the 2 nodes how to connect to
    # each other and how they should listen for events.
    #--
    store node (id=2, comment = 'Slave node', event node=1);
    store path (server = 1, client = 2, conninfo='dbname=my_primary host=localhost user=user');
    store path (server = 2, client = 1, conninfo='dbname=my_rep host=localhost user=user');
    _EOF_

Yet I get the following syntax error:

    /tmp/slonik_example.sh: line 3: slony_example: command not found
    :24: ERROR: syntax error at or near _EOF_

From efraindector at motumweb.com Mon Feb 20 09:56:23 2012
From: efraindector at motumweb.com (Efraín Déctor)
Date: Mon, 20 Feb 2012 11:56:23 -0600
Subject: [Slony1-general] Syntax error?
In-Reply-To: <33357837.post@talk.nabble.com>
References: <33357837.post@talk.nabble.com>
Message-ID: <4E6FA67E54034BA294B3DB3660A34B8D@CMOTUM25PC>

Delete the _EOF_ at the end of the script.

From smccloud at geo-comm.com Tue Feb 21 06:25:39 2012
From: smccloud at geo-comm.com (Shaun McCloud)
Date: Tue, 21 Feb 2012 14:25:39 +0000
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <20120219124036.GV2889@tinybird.home>
References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local> <20120217164441.GM2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local> <20120219124036.GV2889@tinybird.home>
Message-ID: <7742DD496427B743BC8B7BBF6D380BA0373D3A7C@EXCHANGE10.geo-comm.local>

Greg,

I populated one table with 100000 bogus records and that worked fine. If I add a second table it stops working. The setup is as follows. We use ESRI replication technology to get data from one ArcSDE instance into a PostgreSQL + PostGIS ArcSDE instance on our server. We then do a data-only pg_dump on that database and a psql load on our "live" database. Slony is then supposed to replicate the data out to the "live" database on our other servers. The problem is occurring on the last step :(

Shaun McCloud - Associate Software Developer
Geo-Comm, Inc
601 W. Saint Germain St., Saint Cloud, MN 56301
Office: 320.240.0040 Fax: 320.240.2389 Toll Free: 888.436.2666
www.geo-comm.com
Think before you print!
Microsoft Certified Desktop Support Technician (MCDST)
Do or do not, there is no try.
    -Yoda

From ngramsky at cs.umd.edu Tue Feb 21 08:41:15 2012
From: ngramsky at cs.umd.edu (NewToSlony)
Date: Tue, 21 Feb 2012 08:41:15 -0800 (PST)
Subject: [Slony1-general] Cannot find slony1_funcs on macos despite lib in place
Message-ID: <33365133.post@talk.nabble.com>

I'm trying to run the example slony config script from the documentation but I'm getting the following error:

    postgres$ /tmp/slonik_example.sh
    :8: PGRES_FATAL_ERROR load '$libdir/slony1_funcs'; - ERROR: could not access file "$libdir/slony1_funcs": No such file or directory
    :8: Error: the extension for the Slony-I C functions cannot be loaded in database 'dbname=my_primary host=localhost user=warfish password=coalitions'

Yet the LIBDIR variable is set correctly:

    postgres$ ./pg_config
    BINDIR = /opt/local/lib/postgresql90/bin
    DOCDIR = /opt/local/share/doc/postgresql
    HTMLDIR = /opt/local/share/doc/postgresql
    INCLUDEDIR = /opt/local/include/postgresql90
    PKGINCLUDEDIR = /opt/local/include/postgresql90
    INCLUDEDIR-SERVER = /opt/local/include/postgresql90/server
    LIBDIR = /opt/local/lib/postgresql90

And the lib is present:

    ls -l /opt/local/lib/postgresql90/slony1_funcs.so
    -rwxr-xr-x 1 root admin 34944 Feb 17 16:20 /opt/local/lib/postgresql90/slony1_funcs.so

The script is as follows:

    cat /tmp/slonik_example.sh
    #!/bin/sh
    CLUSTERNAME=slony_example;
    /opt/local/lib/postgresql90/bin/slonik <<_EOF_
    define CLUSTERNAME slony_example;
    cluster name = @CLUSTERNAME;
    node 1 admin conninfo = 'dbname=my_primary host=localhost user=user1 password=pw';
    node 2 admin conninfo = 'dbname=my_rep host=localhost user=user1 password=pw';
    #--
    # init the first node. Its id MUST be 1. This creates the schema
    # _$CLUSTERNAME containing all replication system specific database
    # objects.
    #--
    init cluster ( id=1, comment='Master Node');
    #--
    # Slony-I organizes tables into sets. The smallest unit a node can
    # subscribe is a set. The following commands create one set containing
    # all 4 pgbench tables. The master or origin of the set is node 1.
    #--
    create set (id=1, origin=1, comment='All pgbench tables');
    set add table (set id=1, origin=1, id=1, fully qualified name='public.pgbench_accounts', comment='accounts table');
    set add table (set id=1, origin=1, id=2, fully qualified name='public.pgbench_branches', comment='branches table');
    set add table (set id=1, origin=1, id=3, fully qualified name='public.pgbench_tellers', comment='tellers table');
    set add table (set id=1, origin=1, id=4, fully qualified name='public.pgbench_history', comment='history table');
    #--
    # Create the second node (the slave) tell the 2 nodes how to connect to
    # each other and how they should listen for events.
    #--
    store node (id=2, comment = 'Slave node', event node=1);
    store path (server = 1, client = 2, conninfo='dbname=my_primary host=localhost user=user1 password=pw');
    store path (server = 2, client = 1, conninfo='dbname=my_rep host=localhost user=user1 password=pw');
    _EOF_

I read in the doc that the lib is likely not in the correct place or $libdir is not set correctly, but everything looks to be in place. Am I missing something else I am not aware of?

From dbrb2002-sql at yahoo.com Tue Feb 21 19:30:18 2012
From: dbrb2002-sql at yahoo.com (Brian Trudal)
Date: Tue, 21 Feb 2012 19:30:18 -0800 (PST)
Subject: [Slony1-general] Replication Lag - fetch 500 from LOG
Message-ID: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com>

All of a sudden, for the past day, the lag has been increasing and the server is not catching up; all I can see is "fetch 500 from LOG" in pg_stat_activity. I restarted slony across the board and also killed all sessions in the "in transaction" state, and still no luck.
Is there anything I am missing? I would really appreciate someone's answer.

It's on PG 8.4 + Slony 2.0.4.

Thanks
Brian

From dbrb2002-sql at yahoo.com Tue Feb 21 22:04:31 2012
From: dbrb2002-sql at yahoo.com (Brian Trudal)
Date: Tue, 21 Feb 2012 22:04:31 -0800 (PST)
Subject: [Slony1-general] Replication Lag - fetch 500 from LOG
In-Reply-To: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com>
References: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com>
Message-ID: <1329890671.982.YahooMailNeo@web31805.mail.mud.yahoo.com>

Some Slony log info; is anyone out there with Slony experience?

slony 10.1.1.2(35428) 2012-02-21 12:56:04 PST 0 LOG: duration: 502.162 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:56:44 PST 0 LOG: duration: 547.304 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:56:56 PST 0 LOG: duration: 595.473 ms statement: fetch 500 from LOG;
slony 10.1.1.3(52676) 2012-02-21 12:57:27 PST 0 LOG: duration: 1249.432 ms statement: select "_slony1".cleanupEvent('10 minutes'::interval, 'false'::boolean);
slony 10.1.1.2(35428) 2012-02-21 12:57:39 PST 0 LOG: duration: 521.751 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:58:05 PST 0 LOG: duration: 663.164 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:58:33 PST 0 LOG: duration: 584.578 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:58:53 PST 0 LOG: duration: 524.076 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:59:22 PST 0 LOG: duration: 610.761 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 12:59:42 PST 0 LOG: duration: 629.949 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:00:10 PST 0 LOG: duration: 509.607 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:00:48 PST 0 LOG: duration: 519.203 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:01:00 PST 0 LOG: duration: 527.713 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:01:09 PST 0 LOG: duration: 616.263 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:01:17 PST 0 LOG: duration: 520.641 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:02:13 PST 0 LOG: duration: 519.352 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:02:49 PST 0 LOG: duration: 505.472 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:03:52 PST 0 LOG: duration: 601.902 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:04:50 PST 0 LOG: duration: 643.636 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:06:03 PST 0 LOG: duration: 513.065 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:06:29 PST 0 LOG: duration: 648.466 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:07:08 PST 0 LOG: duration: 645.737 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:07:56 PST 0 LOG: duration: 559.368 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:08:08 PST 0 LOG: duration: 524.154 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:08:20 PST 0 LOG: duration: 594.628 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:08:34 PST 0 LOG: duration: 515.906 ms statement: fetch 500 from LOG;
slony 10.1.1.2(35428) 2012-02-21 13:08:48 PST 0 LOG: duration: 535.204 ms statement: fetch 500 from LOG;
slony 10.1.1.3(52676) 2012-02-21 13:08:49 PST 0 LOG: duration: 582.382 ms statement: select "_slony1".cleanupEvent('10 minutes'::interval, 'false'::boolean);

From Ger.Timmens at adyen.com Wed Feb 22 04:05:22 2012
From: Ger.Timmens at adyen.com (Ger Timmens)
Date: Wed, 22 Feb 2012 13:05:22 +0100
Subject: [Slony1-general] Replication based on content
In-Reply-To: References:
Message-ID: <4F44DA02.40102@adyen.com>

Hi,

We are currently using Slony 2.1.1 / PostgreSQL 9.1.2.

We were wondering whether future versions of Slony will be able to replicate depending on the contents of some fields in a table record. In our current implementation we have

table SomeTable ( name varchar, ......);

on the master. This table is successfully replicated to several slaves (slave1, slave2, ......).

What we would like is to replicate each record depending on the value of 'name', e.g. to

slave1 if (name like 'a%'),
slave2 if (name like 'b%'),

etc...

Are there any thoughts on Slony becoming capable of doing this? How can we help develop this? Are there other replication solutions that are capable of doing this?

Thanks!
Ger Timmens

--
Ger Timmens
Adyen - Payments Made Easy
http://www.adyen.com
Visiting Address: Kantoorgebouw Nijenburg, Simon Carmiggelstraat 6-50, 5th floor, 1011 DJ Amsterdam, The Netherlands
Mail Address: P.O. Box 10095, 1001 EB Amsterdam, The Netherlands
Direct +31.20.240.1248
Office +31.20.240.1240
Mobile +31.62.483.8468
Email ger.timmens at adyen.com

From rod at iol.ie Wed Feb 22 04:08:34 2012
From: rod at iol.ie (Raymond O'Donnell)
Date: Wed, 22 Feb 2012 12:08:34 +0000
Subject: [Slony1-general] Replication based on content
In-Reply-To: <4F44DA02.40102@adyen.com>
References: <4F44DA02.40102@adyen.com>
Message-ID: <4F44DAC2.90107@iol.ie>

On 22/02/2012 12:05, Ger Timmens wrote:
> Are there any thoughts for slony being capable of doing this ?
> How can we help developing this ? Are there other replication solutions
> that are capable of doing this ?

Just wondering if you could do this by partitioning first, and then using Slony to replicate the partitions?

Ray.
-- Raymond O'Donnell :: Galway :: Ireland rod at iol.ie From vivek at khera.org Wed Feb 22 04:28:30 2012 From: vivek at khera.org (Vick Khera) Date: Wed, 22 Feb 2012 07:28:30 -0500 Subject: [Slony1-general] Replication Lag - fetch 500 from LOG In-Reply-To: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com> References: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com> Message-ID: On Tue, Feb 21, 2012 at 10:30 PM, Brian Trudal wrote: > Anything that am missing ? Has your query (mostly write) load changed? Did one of your disks in your array die? You should poke around to see if you're saturating either your network or one of the servers' disk I/O bandwidth. From vivek at khera.org Wed Feb 22 04:29:27 2012 From: vivek at khera.org (Vick Khera) Date: Wed, 22 Feb 2012 07:29:27 -0500 Subject: [Slony1-general] Replication based on content In-Reply-To: <4F44DAC2.90107@iol.ie> References: <4F44DA02.40102@adyen.com> <4F44DAC2.90107@iol.ie> Message-ID: On Wed, Feb 22, 2012 at 7:08 AM, Raymond O'Donnell wrote: >> Are there any thoughts for slony being capable of doing this ? >> How can we help developing this ? Are there other replication solutions >> that are capable of doing this ? > > Just wondering if you could do this by partitioning first, and then > using Slony to replicate the partitions? Exactly what I was thinking. Replicate the partitions as you need in different sets to different destinations. From ssinger at ca.afilias.info Wed Feb 22 05:32:23 2012 From: ssinger at ca.afilias.info (Steve Singer) Date: Wed, 22 Feb 2012 08:32:23 -0500 Subject: [Slony1-general] Replication Lag - fetch 500 from LOG In-Reply-To: <1329890671.982.YahooMailNeo@web31805.mail.mud.yahoo.com> References: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com> <1329890671.982.YahooMailNeo@web31805.mail.mud.yahoo.com> Message-ID: <4F44EE67.2010603@ca.afilias.info> On 12-02-22 01:04 AM, Brian Trudal wrote: > Some slony log info, anyone out there with slony experience ? 
>
> slony 10.1.1.2(35428) 2012-02-21 12:56:04 PST 0 LOG: duration: 502.162 ms statement: fetch 500 from LOG;

You might want to read up on Slony bug #167. If replication falls behind (for whatever reason), versions of Slony prior to 2.1 can exhibit the behaviour you're describing until they have caught up.

From ssinger at ca.afilias.info Wed Feb 22 06:10:15 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Wed, 22 Feb 2012 09:10:15 -0500
Subject: [Slony1-general] [bug] config variable "quit_sync_finalsync" missing in slony 2.1.0 ?
In-Reply-To: References:
Message-ID: <4F44F747.4050301@ca.afilias.info>

On 12-01-22 05:08 AM, Brian Fehrle wrote:
> Hi all,
> I'm trying to get a slony slave that is very behind synced up, and I'm
> running into an issue where the sync query is overloading PostgreSQL's
> So I tried to use the variables 'quit_sync_provider' and
> 'quit_sync_finalsync' as described in the docs:
> http://slony.info/documentation/2.1/slon-config-interval.html
>
> In my slon log output, I'm getting the following error:
> WARN conf option quit_sync_finalsync not foundUnrecognized
> configuration parameter "quit_sync_finalsync"
>
> So I did some digging and I think I found a missing block of code in the
> 2.1.0 version of confoptions.c

I've not forgotten about this; it just took me a while to get to it.

Even if I add that block back into confoptions.c, I am seeing some issues.

1) quit_sync_finalsync is an int, not an int64, and most of our event sequence numbers exceed the space available in a 32-bit int. We don't have a SLON_C_INT64 array + processing code in confoptions.c; we would need to add that.

2) Even with a smaller quit sequence number, slon is segfaulting in the call to slon_log(FATAL). I am not sure why.
3) If slon didn't segfault I think it would keep restarting itself and exiting due to the slon_retry() call, I don't think this is what the user intends to happen. Getting this feature working seems a bit more complicated than just adding in the confoption block. > > In slony 2.0.7 - confoptions.c we have the following block: > ----------------------------------------------- > { > { > (const char *) "quit_sync_finalsync", > gettext_noop("SYNC number at which slon should abort"), > gettext_noop("We want to terminate slon when the worker > thread reaches a certain SYNC number " > "against a certain provider. This is the SYNC number... "), > SLON_C_INT > }, > &quit_sync_finalsync, > 0, > 0, > 2147483647 > }, > ----------------------------------------------- > > However, when I look at confoptions.c , it does not have this block at > all. Both versions have a block that grabs "quit_sync_provider". And the > remote_worker.c references this variable several times, so it should be > expecting to receive it. > > - Brian F > > > > _______________________________________________ > Slony1-general mailing list > Slony1-general at lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general From cbbrowne at afilias.info Wed Feb 22 08:32:02 2012 From: cbbrowne at afilias.info (Christopher Browne) Date: Wed, 22 Feb 2012 11:32:02 -0500 Subject: [Slony1-general] Replication based on content In-Reply-To: <4F44DA02.40102@adyen.com> References: <4F44DA02.40102@adyen.com> Message-ID: On Wed, Feb 22, 2012 at 7:05 AM, Ger Timmens wrote: > We were wondering if it will be possible ?in future versions of slony to > replicate > depending on the contents of some fields in a table record. > So in our current implemenation we have > > table SomeTable ( name varchar, ......); > > on the master. This table is succesfully replicated to several slaves > (slave1, slave2, ......). > > What we would like is depending on the value of 'name', replicate the > record, > to e.g. 
>
> slave1 if (name like 'a%'),
> slave2 if (name like 'b%'),

Changes committed into master in the last week provide the underpinnings to make it a lot easier to implement this sort of thing in the future.

The format of log records has changed, so that instead of the log table capturing literal portions of SQL statements, it captures an array indicating the data.

This means that, in the next version of Slony, splitting apart the log data to perform special logic based on the data no longer requires parsing SQL; instead, data can be accessed more directly from the array.

This isn't particularly well documented at this point, which is something I probably ought to add to my ToDo list to rectify.

It's plausible that in a subsequent version of Slony, we'll introduce some sort of "hook" to provide an intentional way of injecting the sort of logic that you're talking about.

"Plausible" here means that since nobody has thought very hard about it, nobody has arrived at any particularly elegant way to fit it in. I'd think it desirable to NOT add a hook until there's an idea of how to do it well. Of course, once someone arrives at a reasonably elegant way of describing the plumbing, it's entirely possible that this would prove pretty easy to implement.

From filip.rembialkowski at gmail.com Wed Feb 22 10:35:06 2012
From: filip.rembialkowski at gmail.com (Filip Rembiałkowski)
Date: Wed, 22 Feb 2012 19:35:06 +0100
Subject: [Slony1-general] Replication based on content
In-Reply-To: <4F44DA02.40102@adyen.com>
References: <4F44DA02.40102@adyen.com>
Message-ID: <4F45355A.4060308@gmail.com>

At 2012-02-22 13:05, Ger Timmens wrote:
> What we would like is depending on the value of 'name', replicate the
> record,
> to e.g.
>
> slave1 if (name like 'a%'),
> slave2 if (name like 'b%),
>
> etc...
>
> Are there any thoughts for slony being capable of doing this ?
> How can we help developing this ? Are there other replication solutions
> that are capable of doing this ?

Some time ago I had to implement such a thing. I did it as follows:

- on Master and SlaveA, create a "shadow table" SomeTablePartA, identical to SomeTable but holding only a well-defined subset of it.
- on Master, create triggers on SomeTable to maintain the subset in SomeTablePartA.
- add SomeTablePartA to the replication set, so it is replicated to SlaveA like any other table.
- on the SlaveA database, create a trigger on SomeTablePartA (the trigger must be active on the slave, so it has to be added to the Slony config) which propagates the changes back to SomeTable on SlaveA.

From smccloud at geo-comm.com Wed Feb 22 13:01:00 2012
From: smccloud at geo-comm.com (Shaun McCloud)
Date: Wed, 22 Feb 2012 21:01:00 +0000
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <20120219124036.GV2889@tinybird.home>
References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local> <20120217164441.GM2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local> <20120219124036.GV2889@tinybird.home>
Message-ID: <7742DD496427B743BC8B7BBF6D380BA0373E3972@EXCHANGE10.geo-comm.local>

I have tried to migrate from PostgreSQL 8.3 and its associated version of Slony to PostgreSQL 8.4 and Slony 2.0.4. However, even following the directions at http://www.pgadmin.org/docs/dev/slony-example.html I cannot get any data to replicate to the slave node(s). Everything appears to work great and, according to the steps, it should be working. But the slave node(s) never get a Replication Set and the associated data in it.
Does anyone know of a tutorial similar to http://www.enterprisedb.com/resources-community/tutorials-quickstarts/all-platforms/how-setup-slony-i-replication-postgres-plus for Slony 2.0.4? Shaun McCloud ? Associate Software Developer Geo-Comm, Inc 601 W. Saint Germain St., Saint Cloud, MN 56301 Office: 320.240.0040 Fax: 320.240.2389 Toll Free: 888.436.2666 click here to visit www.geo-comm.com ? Think before you print! Microsoft Certified Desktop Support Technician (MCDST) Do or do not, there is no try. -Yoda -----Original Message----- From: Greg Sabino Mullane [mailto:greg at endpoint.com] Sent: Sunday, February 19, 2012 06:41 To: Shaun McCloud Cc: slony1-general at lists.slony.info Subject: Re: [Slony1-general] Slony not replicating changes On Fri, Feb 17, 2012 at 04:52:06PM +0000, Shaun McCloud wrote: > Before the first COPY statement the dump contains the following. > > SET client_encoding = 'UTF8'; > SET standard_conforming_strings = off; SET check_function_bodies = > false; SET client_min_messages = warning; SET escape_string_warning = > off; SET search_path = sde, pg_catalog; Nothing unusual there. > I have had the data replicate correctly once when loading the data for > a single table using COPY, but not since then. Well I cannot see any other obvious problems, so I would suggest seeing if you can get that single table COPY pg_dump to work again, then expand the scope (if working) or narrow the scope (if not). > Also, I'm not sure if it matters; but we are using two engines for one > Slony-I service. One for mostly static data and one for data that can > change quite often. Each engine has its own DB and the data that can > change quite often is always replicating fine, but it is always edited > using ESRI's Arc Map at some point in the process. Ah, that could certainly be a factor - can you explain the layout a bit more to the list? 
-- Greg Sabino Mullane greg at endpoint.com End Point Corporation PGP Key: 0x14964AC8

From venu_anuganti at yahoo.com Wed Feb 22 19:54:59 2012
From: venu_anuganti at yahoo.com (Venu Anuganti)
Date: Wed, 22 Feb 2012 19:54:59 -0800 (PST)
Subject: [Slony1-general] Replication Lag - fetch 500 from LOG
In-Reply-To: References: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com>
Message-ID: <1329969299.91172.YahooMailNeo@web31816.mail.mud.yahoo.com>

Hi,

None of those; but one of our slaves has been down for the last 8 days and is still in the subscriber list. Do you think that has an effect on the other nodes?

 st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |      st_lag_time
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+------------------------
        45 |          60 |    5001958753 | 2012-02-22 21:32:02.555943 |       5001566396 | 2012-02-13 19:09:04.346031 | 2012-02-13 12:28:53.968515 |            392357 | 9 days 09:03:09.219735
        45 |          67 |    5001958753 | 2012-02-22 21:32:02.555943 |       5001949595 | 2012-02-22 20:47:55.672691 | 2012-02-22 16:18:10.759348 |              9158 | 05:13:52.428902
        45 |          66 |    5001958753 | 2012-02-22 21:32:02.555943 |       5001930109 | 2012-02-22 20:46:02.378948 | 2012-02-22 04:19:30.589199 |             28644 | 17:12:32.599051

________________________________
From: Vick Khera
To: "slony1-general at lists.slony.info"
Sent: Wednesday, February 22, 2012 4:28 AM
Subject: Re: [Slony1-general] Replication Lag - fetch 500 from LOG

On Tue, Feb 21, 2012 at 10:30 PM, Brian Trudal wrote:
> Anything that am missing ?

Has your query (mostly write) load changed? Did one of your disks in your array die?
You should poke around to see if you're saturating either your network or one of the servers' disk I/O bandwidth.

From vivek at khera.org Wed Feb 22 20:12:22 2012
From: vivek at khera.org (Vick Khera)
Date: Wed, 22 Feb 2012 23:12:22 -0500
Subject: [Slony1-general] Replication Lag - fetch 500 from LOG
In-Reply-To: <1329967993.15574.YahooMailNeo@web31802.mail.mud.yahoo.com>
References: <1329881418.14948.YahooMailNeo@web31807.mail.mud.yahoo.com> <1329967993.15574.YahooMailNeo@web31802.mail.mud.yahoo.com>
Message-ID: <-2089944524760419074@unknownmsgid>

On Feb 22, 2012, at 10:33 PM, Venu Anuganti wrote:
> Nothing of that; but we had one of the slave down for last 8 days and still in subscribe list. Do you think that has an effect on this

For sure. Your change log is really big, and thus slow to scan for new changes. You need to drop that node.

Also, please keep replies on list.

From dbrb2002-sql at yahoo.com Wed Feb 22 22:22:45 2012
From: dbrb2002-sql at yahoo.com (Brian Trudal)
Date: Wed, 22 Feb 2012 22:22:45 -0800 (PST)
Subject: [Slony1-general] Cleanup after drop node
Message-ID: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>

OK, I finally dropped the node from the cluster, and I still see that the other nodes are not able to keep up, as "fetch 500 from LOG;" takes a long time.

Is any cleanup needed after the drop, so that all logs/events for the dropped node can be purged?

From dbrb2002-sql at yahoo.com Wed Feb 22 22:57:51 2012
From: dbrb2002-sql at yahoo.com (Brian Trudal)
Date: Wed, 22 Feb 2012 22:57:51 -0800 (PST)
Subject: [Slony1-general] Cleanup after drop node
In-Reply-To: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>
References: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>
Message-ID: <1329980271.41602.YahooMailNeo@web31805.mail.mud.yahoo.com>

Here is the issue:

2012-02-23 00:47:24 CSTDEBUG2 remoteWorkerThread_7: Received event #7 from 5000000078 type:DROP_NODE
2012-02-23 00:47:24 CSTFATAL  enableNode: unknown node ID 10
2012-02-23 00:47:24 CSTDEBUG2 slon_retry() from pid=5177
2012-02-23 00:47:24 CSTINFO   slon: retry requested
2012-02-23 00:47:24 CSTINFO   slon: notify worker process to shutdown

Can I safely delete that event (the one with ev_type='DROP_NODE') from sl_event?

From ssinger at ca.afilias.info Thu Feb 23 06:05:14 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Thu, 23 Feb 2012 09:05:14 -0500
Subject: [Slony1-general] Cleanup after drop node
In-Reply-To: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>
References: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>
Message-ID: <4F46479A.4030603@ca.afilias.info>

On 12-02-23 01:22 AM, Brian Trudal wrote:
> Ok, finally I dropped a node from cluster; and I still see other nodes
> are still not able to keep up as "Fetch 500 from LOG;" takes a long time...
>
> Is there any post cleanup needed, so that all logs/events can be purged
> for dropped node ?

How far behind are your other nodes now? Did the slow log selections caused by that node being behind for 8 days make your other nodes fall behind as well?

Do not manually delete events from sl_event; I don't see that helping things.

From francescoboccacci at libero.it Thu Feb 23 07:38:04 2012
From: francescoboccacci at libero.it (francescoboccacci at libero.it)
Date: Thu, 23 Feb 2012 16:38:04 +0100 (CET)
Subject: [Slony1-general] New slave
Message-ID: <28401013.47991330011484193.JavaMail.defaultUser@defaultHost>

Hi,

Currently I have a Slony replica with one master and one slave. I need to have an additional slave on my system. I cannot stop the master to create a backup (it requires a lot of time), so my question is: is it possible with Slony to populate the new slave database and sync it? With the single master/slave pair I have had until now, I started the cluster after populating the slave database with the same data as the master, and it works fine.

Please let me know.
Thank you,
Francesco

From cbbrowne at afilias.info Thu Feb 23 08:46:33 2012
From: cbbrowne at afilias.info (Christopher Browne)
Date: Thu, 23 Feb 2012 11:46:33 -0500
Subject: [Slony1-general] New slave
In-Reply-To: <28401013.47991330011484193.JavaMail.defaultUser@defaultHost>
References: <28401013.47991330011484193.JavaMail.defaultUser@defaultHost>
Message-ID:

On Thu, Feb 23, 2012 at 10:38 AM, francescoboccacci at libero.it wrote:
> Hi,
> Currently I have a slony replica with one master and one slave. I need to have
> an additional slave on my system.
> I cannot stop the master to create a backup (it requires a lot of time), so my
> question is: is it possible with slony to populate the new slave database and
> sync it?
> With the single master slave I have till now, I started the cluster after
> populating the slave database with the same data of the master and it works
> fine.

Well, when you populated the *first* slave database, that didn't require any outage on the origin node, did it?

You simply need to:

a) Populate the new database with the *schema* from the master. There is no need to capture the data; Slony will delete the data anyways. "pg_dump -s" should do the job. You'll want to exclude the schema that Slony creates, or drop it from the new database, if it got loaded.

b) STORE NODE to indicate the new node

c) Start up slon for the new node

d) SUBSCRIBE SET against each set that you want on the new node

None of those steps involve an outage of the "master" database.
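
As a rough illustration of steps (a), (b) and (d), here is a minimal sketch. Everything concrete in it is an assumption made for the example rather than something taken from this thread: the cluster name slony_example, node id 3 for the new slave, node 1 as origin and provider of set 1, and the placeholder conninfo strings (no password is given, on the assumption that a .pgpass file is in use). The two functions only print slonik scripts so they can be reviewed first; they are split in two because slon for the new node (step c) has to be started in between.

```shell
#!/bin/sh
# Sketch only: cluster name, node ids, set id and conninfo strings below
# are illustrative placeholders, not values from this thread.

CLUSTER=slony_example
ORIGIN_CONN='dbname=my_primary host=master.example user=slony'
NEW_CONN='dbname=my_rep2 host=slave2.example user=slony'

# Step (a), run by hand first: copy the schema only, leaving out the
# schema that Slony created (-s = schema only, -N = exclude schema):
#   pg_dump -s -N "_$CLUSTER" -h master.example my_primary | psql -h slave2.example my_rep2

# Step (b): print a slonik script that registers node 3 and the
# communication paths between it and the origin.
store_node_script() {
    cat <<EOF
cluster name = $CLUSTER;
node 1 admin conninfo = '$ORIGIN_CONN';
node 3 admin conninfo = '$NEW_CONN';
store node (id = 3, comment = 'Second slave', event node = 1);
store path (server = 1, client = 3, conninfo = '$ORIGIN_CONN');
store path (server = 3, client = 1, conninfo = '$NEW_CONN');
EOF
}

# Step (d): print a slonik script that subscribes node 3 to set 1.
# Run it only after slon for node 3 (step c) is up.
subscribe_script() {
    cat <<EOF
cluster name = $CLUSTER;
node 1 admin conninfo = '$ORIGIN_CONN';
node 3 admin conninfo = '$NEW_CONN';
subscribe set (id = 1, provider = 1, receiver = 3, forward = no);
EOF
}
```

Once the printed scripts match your cluster, something like "store_node_script | slonik" followed, after starting slon for the new node, by "subscribe_script | slonik" would apply them; repeat the subscribe set line for every set the new node should receive.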
From greg at endpoint.com Thu Feb 23 09:40:16 2012 From: greg at endpoint.com (Greg Sabino Mullane) Date: Thu, 23 Feb 2012 12:40:16 -0500 Subject: [Slony1-general] Slony not replicating changes In-Reply-To: <7742DD496427B743BC8B7BBF6D380BA0373D3A7C@EXCHANGE10.geo-comm.local> References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local> <20120217164441.GM2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local> <20120219124036.GV2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373D3A7C@EXCHANGE10.geo-comm.local> Message-ID: <20120223174016.GD2889@tinybird.home> On Tue, Feb 21, 2012 at 02:25:39PM +0000, Shaun McCloud wrote: > Greg, I populated one table with 100000 bogus records and that > worked fine. If I add a second table it stops working. > > The setup is as follows. We use ESRI replication technology to get > data from one ArcSDE instance into a PostgreSQL + PostGIS ArcSDE > instance on our server. We then do a data only pg_dump on that > database and a psql load on our "live" database. Slony is then > supposed to replicate the data out to the "live" database on our > other servers. The problem is occurring on the last step :( Yeah, can't think of what else it might be. Probably the only thing to do at this point, unless others on the list can think of things, is to edit one of the dumps to limit to a few rows, load it via psql, then check the slony tables (e.g. sl_log_1) to see if the triggers fired and how far it got. In other words, where is it breaking down? My guess is that the triggers are not populating the Slony tables for some reason, but there are other steps in which it could break down. -- Greg Sabino Mullane greg at endpoint.com End Point Corporation PGP Key: 0x14964AC8 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-general/attachments/20120223/26962561/attachment.pgp

From smccloud at geo-comm.com Thu Feb 23 09:40:53 2012
From: smccloud at geo-comm.com (Shaun McCloud)
Date: Thu, 23 Feb 2012 17:40:53 +0000
Subject: [Slony1-general] Slony not replicating changes
In-Reply-To: <20120223174016.GD2889@tinybird.home>
References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local>
 <20120217164441.GM2889@tinybird.home>
 <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local>
 <20120219124036.GV2889@tinybird.home>
 <7742DD496427B743BC8B7BBF6D380BA0373D3A7C@EXCHANGE10.geo-comm.local>
 <20120223174016.GD2889@tinybird.home>
Message-ID: <7742DD496427B743BC8B7BBF6D380BA0373E431F@EXCHANGE10.geo-comm.local>

Greg,

I managed to get 2.0.4 running on PostgreSQL 8.4. Using this setup the
changes are replicating out. It must have been something in Slony 1.x &
PostgreSQL 8.3 that was the issue.

Shaun McCloud
Associate Software Developer
Geo-Comm, Inc
601 W. Saint Germain St., Saint Cloud, MN 56301
Office: 320.240.0040 Fax: 320.240.2389 Toll Free: 888.436.2666
www.geo-comm.com
Microsoft Certified Desktop Support Technician (MCDST)
Do or do not, there is no try.
-Yoda

-----Original Message-----
From: Greg Sabino Mullane [mailto:greg at endpoint.com]
Sent: Thursday, February 23, 2012 11:40
To: Shaun McCloud
Cc: slony1-general at lists.slony.info
Subject: Re: [Slony1-general] Slony not replicating changes

On Tue, Feb 21, 2012 at 02:25:39PM +0000, Shaun McCloud wrote:
> Greg, I populated one table with 100000 bogus records and that worked
> fine. If I add a second table it stops working.
>
> The setup is as follows. We use ESRI replication technology to get
> data from one ArcSDE instance into a PostgreSQL + PostGIS ArcSDE
> instance on our server.
We then do a data only pg_dump on that > database and a psql load on our "live" database. Slony is then > supposed to replicate the data out to the "live" database on our other > servers. The problem is occurring on the last step :( Yeah, can't think of what else it might be. Probably the only thing to do at this point, unless others on the list can think of things, is to edit one of the dumps to limit to a few rows, load it via psql, then check the slony tables (e.g. sl_log_1) to see if the triggers fired and how far it got. In other words, where is it breaking down? My guess is that the triggers are not populating the Slony tables for some reason, but there are other steps in which it could break down. -- Greg Sabino Mullane greg at endpoint.com End Point Corporation PGP Key: 0x14964AC8 From greg at endpoint.com Thu Feb 23 09:44:56 2012 From: greg at endpoint.com (Greg Sabino Mullane) Date: Thu, 23 Feb 2012 12:44:56 -0500 Subject: [Slony1-general] Slony not replicating changes In-Reply-To: <7742DD496427B743BC8B7BBF6D380BA0373E3972@EXCHANGE10.geo-comm.local> References: <7742DD496427B743BC8B7BBF6D380BA0373D145F@EXCHANGE10.geo-comm.local> <20120217164441.GM2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373D15E1@EXCHANGE10.geo-comm.local> <20120219124036.GV2889@tinybird.home> <7742DD496427B743BC8B7BBF6D380BA0373E3972@EXCHANGE10.geo-comm.local> Message-ID: <20120223174456.GE2889@tinybird.home> > Everything appears to work great and according to the steps it > should be working. But the slave node(s) never get a Replication > Set and the associated data in it. Is there anything in the Slony logs? -- Greg Sabino Mullane greg at endpoint.com End Point Corporation PGP Key: 0x14964AC8 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-general/attachments/20120223/0812f630/attachment.pgp

From dbrb2002-sql at yahoo.com Thu Feb 23 13:05:49 2012
From: dbrb2002-sql at yahoo.com (Brian Trudal)
Date: Thu, 23 Feb 2012 13:05:49 -0800 (PST)
Subject: [Slony1-general] sl_status empty
Message-ID: <1330031149.40657.YahooMailNeo@web31802.mail.mud.yahoo.com>

Hi

When I restarted slony, then all of a sudden my sl_status is empty and
nothing is getting replicated in the subscriber (node 37) nor its
forwarding.

All I can see in the logs...

2012-02-23 14:58:20 CSTCONFIG main: done
2012-02-23 14:58:20 CSTCONFIG slon: child terminated status: 0; pid: 5254, current worker pid: 5254
2012-02-23 14:58:20 CSTCONFIG slon: restart of worker
2012-02-23 14:58:20 CSTCONFIG main: slon version 2.0.4 starting up
...
2012-02-23 14:58:55 CSTWARN   remoteWorker_wakeup: node 34 - no worker thread
2012-02-23 14:58:55 CSTDEBUG2 sched_wakeup_node(): no_id=34 (0 threads + worker signaled)
2012-02-23 14:58:55 CSTCONFIG storeSubscribe: sub_set=2 sub_provider=36 sub_forward='t'
2012-02-23 14:58:55 CSTWARN   remoteWorker_wakeup: node 36 - no worker thread
...

finally it dies with dup error..

2012-02-23 14:58:55 CSTFATAL  localListenThread: "select "_application".cleanupNodelock(); insert into "_application".sl_nodelock values (    37, 0, "pg_catalog".pg_backend_pid()); " - ERROR:  duplicate key value violates unique constraint "sl_nodelock-pkey"
2012-02-23 14:58:55 CSTDEBUG2 slon_abort() from pid=12679
2012-02-23 14:58:55 CSTINFO   slon: shutdown requested
2012-02-23 14:58:55 CSTINFO   slon: notify worker process to shutdown
2012-02-23 14:59:15 CSTINFO   slon: child termination timeout - kill child

Any help on how to fix this will be appreciated.. other nodes are fine in
the cluster.

Thanks
Venu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.slony.info/pipermail/slony1-general/attachments/20120223/30221b6a/attachment.htm

From vivek at khera.org Thu Feb 23 16:45:56 2012
From: vivek at khera.org (Vick Khera)
Date: Thu, 23 Feb 2012 19:45:56 -0500
Subject: [Slony1-general] Cleanup after drop node
In-Reply-To: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>
References: <1329978165.60569.YahooMailNeo@web31806.mail.mud.yahoo.com>
Message-ID:

On Thu, Feb 23, 2012 at 1:22 AM, Brian Trudal wrote:
> Is there any post cleanup needed, so that all logs/events can be purged for
> dropped node ?

How long have you waited to see if slony cleans up all by itself? I've
seen it take several hours sometimes.

From ulas.albayrak at gmail.com Fri Feb 24 05:21:38 2012
From: ulas.albayrak at gmail.com (Ulas Albayrak)
Date: Fri, 24 Feb 2012 14:21:38 +0100
Subject: [Slony1-general] Slony replication stops right after start Slony replication stops right after start
Message-ID:

Hi,

I have been trying to set up a small Slony cluster (only 2 nodes) for
the last 2 days but I can't get it to work. Every time I get the same
result: the replication starts off fine. Slony starts copying, trying to
get all the tables in the subscribing node up to speed. But somewhere
along the way the 2nd node stops getting updates. Slony replicates all
the data in a specific table up to a specific point in time and then
no more. And this time seems to coincide with when the copying of data
for that specific table started.

An example to illustrate the scenario:

Let's say I have set up the whole replication system and then at 12:00
I start the actual replication. Around 12:05 copying of table A from
node 1 to node 2 starts. It finishes but only the data that was
received before 12:05 gets copied to node 2. Then at 12:10 copying of
table B starts. Same thing here: Slony copies all the data that was
received before 12:10 to node 2. And this is the same for all tables.
The logs for the slon deamons show: Origin node: NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21942 CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" PL/pgSQL function "cleanupevent" line 83 at PERFORM NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21945 CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" PL/pgSQL function "cleanupevent" line 83 at PERFORM NOTICE: Slony-I: Logswitch to sl_log_2 initiated CONTEXT: SQL statement "SELECT "_fleetcluster".logswitch_start()" PL/pgSQL function "cleanupevent" line 101 at PERFORM 2012-02-24 12:17:39 CETINFO cleanupThread: 0.019 seconds for cleanupEvent() NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21949 CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" PL/pgSQL function "cleanupevent" line 83 at PERFORM NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=23779 CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" PL/pgSQL function "cleanupevent" line 83 at PERFORM Subscribing node: 2012-02-24 13:20:23 CETINFO remoteWorkerThread_1: SYNC 5000000856 done in 0.012 seconds 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 2 with 15 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1 2012-02-24 
13:20:41 CETINFO remoteWorkerThread_1: SYNC 5000000857 done in 0.014 seconds 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 2 with 15 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: SYNC 5000000858 done in 0.011 seconds Have anyone experienced this before or have any idea what could be causing this? -- Ulas Albayrak ulas.albayrak at gmail.com
From ssinger at ca.afilias.info Fri Feb 24 06:34:18 2012 From: ssinger at ca.afilias.info (Steve Singer) Date: Fri, 24 Feb 2012 09:34:18 -0500 Subject: [Slony1-general] Slony replication stops right after start Slony replication stops right after start In-Reply-To: References: Message-ID: <4F479FEA.5080606@ca.afilias.info> On 12-02-24 08:21 AM, Ulas Albayrak wrote: You didn't say what version of slony you are using with which version of postgresql. I don't see anything in the logs you posted about the slon for the origin node generating sync events. At DEBUG2 or higher (at least ons some versions of slony) you should be getting "syncThread: new sl_action_seq %s " type messages in the log for the slon origin. Are new SYNC events being generated in the origin sl_event table with ev_origin=$originid? Many versions of slony require an exclusive lock on sl_event to generate sync events. Do you have something preventing this? (ie look in pg_locks to see if the slony sync connection is waiting on a lock). > Hi, > > I have been trying to set up a small Slony cluster (only 2 nodes) for > the last 2 days but I can't get it to work. Everytime I get the same > result: The replication starts of fine. Slony start copying, trying to > get all the tables in the subscribing node up to speed. But somewhere > along the way the 2nd node stops getting updates. Slony replicates all > the data in a specific table up to a specific point in time and then > no more. And this time seems to coincide with when the copying of data > for that specific table started. > > An example to illustrate the scenario: > > Let's say I have set up the whole replication system and then at 12:00 > I start the actual replication. Around 12:05 copying of table A from > node 1 to node 2 starts. It finishes but only the data that was > received before 12:05 get copied to node 2. Then at 12:10 copying of > table B starts.
Same thing here: Slony copies all the data that was > received before 12:10 to node 2. And this is the same for all tables. > > The logs for the slon deamons show: > > Origin node: > NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21942 > CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" > PL/pgSQL function "cleanupevent" line 83 at PERFORM > NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21945 > CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" > PL/pgSQL function "cleanupevent" line 83 at PERFORM > NOTICE: Slony-I: Logswitch to sl_log_2 initiated > CONTEXT: SQL statement "SELECT "_fleetcluster".logswitch_start()" > PL/pgSQL function "cleanupevent" line 101 at PERFORM > 2012-02-24 12:17:39 CETINFO cleanupThread: 0.019 seconds for cleanupEvent() > NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21949 > CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" > PL/pgSQL function "cleanupevent" line 83 at PERFORM > NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=23779 > CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" > PL/pgSQL function "cleanupevent" line 83 at PERFORM > > Subscribing node: > 2012-02-24 13:20:23 CETINFO remoteWorkerThread_1: SYNC 5000000856 > done in 0.012 seconds > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 1 with > 9 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 2 with > 15 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 3 with > 4 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 4 with > 6 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 5 with > 3 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 6 with > 4 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 7 with > 3 table(s) from provider 1 > 
2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 8 with > 23 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 9 with > 8 table(s) from provider 1 > 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: SYNC 5000000857 > done in 0.014 seconds > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 1 with > 9 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 2 with > 15 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 3 with > 4 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 4 with > 6 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 5 with > 3 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 6 with > 4 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 7 with > 3 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 8 with > 23 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 9 with > 8 table(s) from provider 1 > 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: SYNC 5000000858 > done in 0.011 seconds > > > > Have anyone experienced this before or have any idea what could be causing this? 
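The two checks suggested in this reply (are SYNC events showing up in sl_event, and is anything waiting on a lock) could be run roughly as below. This is a sketch under assumptions: the schema name "_fleetcluster" is taken from the logs quoted above, the origin is assumed to be node 1 because the subscriber logs say "provider 1", and the database name in the usage note is a placeholder. The function only prints the SQL.

```shell
#!/bin/sh
# Sketch only: schema name read off the quoted logs ("_fleetcluster");
# origin node id 1 is an assumption based on the "provider 1" log lines.
print_sync_checks() {
    cat <<'EOF'
-- Latest SYNC events generated by the origin (node 1).
select ev_origin, ev_seqno, ev_timestamp
  from "_fleetcluster".sl_event
 where ev_origin = 1 and ev_type = 'SYNC'
 order by ev_seqno desc
 limit 5;

-- Lock requests that are still waiting; no rows means nothing is blocked.
select locktype, relation::regclass, mode, pid, granted
  from pg_locks
 where not granted;
EOF
}

print_sync_checks
```

On the origin this could be used as "print_sync_checks | psql mydb", where mydb stands for the replicated database. If the first query keeps returning newer ev_seqno values over time, the origin is generating SYNC events.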
> From ulas.albayrak at gmail.com Fri Feb 24 07:41:52 2012 From: ulas.albayrak at gmail.com (Ulas Albayrak) Date: Fri, 24 Feb 2012 16:41:52 +0100 Subject: [Slony1-general] Slony replication stops right after start Slony replication stops right after start In-Reply-To: <4F479FEA.5080606@ca.afilias.info> References: <4F479FEA.5080606@ca.afilias.info> Message-ID: Hi, Sorry, seems I forgot to post versions: Slony: 2.0.7 PostgreSQL: 9.0.5 I restarted the slon deamon with debug=4 on both nodes and this is what I got: Origin: 2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event 1,5000002765 SYNC 2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 - SYNC 5000002766 2012-02-24 16:20:52 CETDEBUG2 localListenThread: Received event 1,5000002766 SYNC 2012-02-24 16:20:52 CETDEBUG2 syncThread: new sl_action_seq 45233 - SYNC 5000002767 2012-02-24 16:20:54 CETDEBUG2 localListenThread: Received event 1,5000002767 SYNC 2012-02-24 16:20:54 CETDEBUG2 syncThread: new sl_action_seq 45263 - SYNC 5000002768 2012-02-24 16:20:56 CETDEBUG2 localListenThread: Received event 1,5000002768 SYNC 2012-02-24 16:20:56 CETDEBUG2 remoteWorkerThread_2: forward confirm 1,5000002768 received by 2 2012-02-24 16:21:04 CETDEBUG2 syncThread: new sl_action_seq 45263 - SYNC 5000002769 2012-02-24 16:21:06 CETDEBUG2 remoteListenThread_2: queue event 2,5000001298 SYNC 2012-02-24 16:21:06 CETDEBUG2 remoteWorkerThread_2: Received event #2 from 5000001298 type:SYNC 2012-02-24 16:21:06 CETDEBUG1 calc sync size - last time: 1 last length: 18005 ideal: 3 proposed size: 3 2012-02-24 16:21:06 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001298 processing 2012-02-24 16:21:06 CETDEBUG1 remoteWorkerThread_2: no sets need syncing for this event 2012-02-24 16:21:08 CETDEBUG2 localListenThread: Received event 1,5000002769 SYNC 2012-02-24 16:21:08 CETDEBUG2 syncThread: new sl_action_seq 45292 - SYNC 5000002770 2012-02-24 16:21:08 CETDEBUG2 remoteListenThread_2: queue event 2,5000001299 SYNC 2012-02-24 
16:21:08 CETDEBUG2 remoteWorkerThread_2: Received event #2 from 5000001299 type:SYNC 2012-02-24 16:21:08 CETDEBUG1 calc sync size - last time: 1 last length: 2003 ideal: 29 proposed size: 3 2012-02-24 16:21:08 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001299 processing 2012-02-24 16:21:08 CETDEBUG1 remoteWorkerThread_2: no sets need syncing for this event 2012-02-24 16:21:10 CETDEBUG2 localListenThread: Received event 1,5000002770 SYNC 2012-02-24 16:21:10 CETDEBUG2 syncThread: new sl_action_seq 45322 - SYNC 5000002771 2012-02-24 16:21:12 CETDEBUG2 localListenThread: Received event 1,5000002771 SYNC 2012-02-24 16:21:12 CETDEBUG2 syncThread: new sl_action_seq 45382 - SYNC 5000002772 2012-02-24 16:21:14 CETDEBUG2 localListenThread: Received event 1,5000002772 SYNC 2012-02-24 16:21:16 CETDEBUG2 remoteWorkerThread_2: forward confirm 1,5000002772 received by 2 2012-02-24 16:21:18 CETDEBUG2 syncThread: new sl_action_seq 45411 - SYNC 5000002773 2012-02-24 16:21:20 CETDEBUG2 localListenThread: Received event 1,5000002773 SYNC 2012-02-24 16:21:26 CETDEBUG2 syncThread: new sl_action_seq 45417 - SYNC 5000002774 2012-02-24 16:21:26 CETDEBUG2 remoteListenThread_2: queue event 2,5000001300 SYNC 2012-02-24 16:21:26 CETDEBUG2 remoteWorkerThread_2: Received event #2 from 5000001300 type:SYNC 2012-02-24 16:21:26 CETDEBUG1 calc sync size - last time: 1 last length: 18012 ideal: 3 proposed size: 3 2012-02-24 16:21:26 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001300 processing 2012-02-24 16:21:26 CETDEBUG1 remoteWorkerThread_2: no sets need syncing for this event 2012-02-24 16:21:26 CETDEBUG2 remoteWorkerThread_2: forward confirm 1,5000002773 received by 2 2012-02-24 16:21:28 CETDEBUG2 syncThread: new sl_action_seq 45430 - SYNC 5000002775 2012-02-24 16:21:28 CETDEBUG2 remoteListenThread_2: queue event 2,5000001301 SYNC 2012-02-24 16:21:28 CETDEBUG2 remoteWorkerThread_2: Received event #2 from 5000001301 type:SYNC 2012-02-24 16:21:28 CETDEBUG1 calc sync size - last time: 1 last length: 
2004 ideal: 29 proposed size: 3 2012-02-24 16:21:28 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001301 processing 2012-02-24 16:21:28 CETDEBUG1 remoteWorkerThread_2: no sets need syncing for this event 2012-02-24 16:21:30 CETDEBUG2 syncThread: new sl_action_seq 45460 - SYNC 5000002776 2012-02-24 16:21:30 CETDEBUG2 remoteWorkerThread_2: forward confirm 1,5000002775 received by 2 2012-02-24 16:21:32 CETDEBUG2 localListenThread: Received event 1,5000002774 SYNC 2012-02-24 16:21:32 CETDEBUG2 localListenThread: Received event 1,5000002775 SYNC 2012-02-24 16:21:32 CETDEBUG2 localListenThread: Received event 1,5000002776 SYNC Subscriber: 2012-02-24 16:28:53 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local log_status is 2 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote log_status = 1 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line buffers 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds delay for first row 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused line buffers 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds until close cursor 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0 updates=0 deletes=0 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: large tuples 0.000/0 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper thread status 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send 
DONE/ERROR line to worker 2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for workgroup to finish 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to pool 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done. 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to IDLE 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: cleanup 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: waiting for work 2012-02-24 16:28:53 CETINFO remoteWorkerThread_1: SYNC 5000002873 done in 0.014 seconds 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002873 sync_event timing: pqexec (s/count)- provider 0.001/1 - subsc riber 0.008/1 - IUD 0.001/4 2012-02-24 16:28:55 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC 5000001345 2012-02-24 16:28:59 CETDEBUG2 localListenThread: Received event 2,5000001345 SYNC 2012-02-24 16:29:05 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC 5000001346 2012-02-24 16:29:11 CETDEBUG2 localListenThread: Received event 2,5000001346 SYNC 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event 1,5000002874 SYNC 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event 1,5000002875 SYNC 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event 1,5000002876 SYNC 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: Received event #1 from 5000002874 type:SYNC 2012-02-24 16:29:11 CETDEBUG1 calc sync size - last time: 1 last length: 18011 ideal: 3 proposed size: 3 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: SYNC 5000002876 processing 2012-02-24 16:29:11 CETDEBUG1 about to monitor_subscriber_query - pulling big actionid list for 1 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 2 
with 15 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: current local log_status is 2 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: activate helper 1 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: waiting for log data 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: got work to do 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1_1: current remote log_status = 1 2012-02-24 16:29:11 CETDEBUG4 
remoteHelperThread_1_1: allocate line buffers 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds delay for first row 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: return 50 unused line buffers 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds until close cursor 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: inserts=0 updates=0 deletes=0 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: large tuples 0.000/0 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: change helper thread status 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR line to worker 2012-02-24 16:29:11 CETDEBUG3 remoteHelperThread_1_1: waiting for workgroup to finish 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: helper 1 finished 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: returning lines to pool 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: all helpers done. 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to IDLE 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: cleanup 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: waiting for work 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: SYNC 5000002876 done in 0.014 seconds 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002876 sync_event timing: pqexec (s/count)- provider 0.001/1 - subsc riber 0.008/1 - IUD 0.001/4 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: forward confirm 2,5000001346 received by 1 New SYNC events are being generated. I can find several rows in the origin's sl_event table with ev_origin=$originid. 
None of them however (including rows with ev_origin=$subscribedid) are older than about 15 minutes, even though the most recent data in the subscriber db is several hours old.

I'm not really sure how to interpret the data in the pg_locks view, but none of them (neither origin nor subscriber) have any rows where the "granted" column isn't set to TRUE, if that means anything?

/Ulas

On Fri, Feb 24, 2012 at 3:34 PM, Steve Singer wrote:
> On 12-02-24 08:21 AM, Ulas Albayrak wrote:
>
> You didn't say what version of slony you are using with which version of
> postgresql.
>
> I don't see anything in the logs you posted about the slon for the origin
> node generating sync events. At DEBUG2 or higher (at least on some
> versions of slony) you should be getting "syncThread: new sl_action_seq %s "
> type messages in the log for the slon origin.
>
> Are new SYNC events being generated in the origin sl_event table with
> ev_origin=$originid?
>
> Many versions of slony require an exclusive lock on sl_event to generate
> sync events. Do you have something preventing this? (ie look in pg_locks
> to see if the slony sync connection is waiting on a lock).
>
>> Hi,
>>
>> I have been trying to set up a small Slony cluster (only 2 nodes) for
>> the last 2 days but I can't get it to work. Every time I get the same
>> result: The replication starts off fine. Slony starts copying, trying to
>> get all the tables in the subscribing node up to speed. But somewhere
>> along the way the 2nd node stops getting updates. Slony replicates all
>> the data in a specific table up to a specific point in time and then
>> no more. And this time seems to coincide with when the copying of data
>> for that specific table started.
>>
>> An example to illustrate the scenario:
>>
>> Let's say I have set up the whole replication system and then at 12:00
>> I start the actual replication. Around 12:05 copying of table A from
>> node 1 to node 2 starts. It finishes but only the data that was
>> received before 12:05 gets copied to node 2. Then at 12:10 copying of
>> table B starts. Same thing here: Slony copies all the data that was
>> received before 12:10 to node 2. And this is the same for all tables.
>>
>> The logs for the slon daemons show:
>>
>> Origin node:
>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21942
>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21945
>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>> NOTICE: Slony-I: Logswitch to sl_log_2 initiated
>> CONTEXT: SQL statement "SELECT "_fleetcluster".logswitch_start()"
>> PL/pgSQL function "cleanupevent" line 101 at PERFORM
>> 2012-02-24 12:17:39 CETINFO cleanupThread: 0.019 seconds for
>> cleanupEvent()
>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21949
>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=23779
>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>
>> Subscribing node:
>> 2012-02-24 13:20:23 CETINFO remoteWorkerThread_1: SYNC 5000000856
>> done in 0.012 seconds
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 1 with
>> 9 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 2 with
>> 15 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 3 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 4 with
>> 6 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 5 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 6 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 7 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 8 with
>> 23 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 9 with
>> 8 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: SYNC 5000000857
>> done in 0.014 seconds
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 1 with
>> 9 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 2 with
>> 15 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 3 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 4 with
>> 6 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 5 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 6 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 7 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 8 with
>> 23 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 9 with
>> 8 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: SYNC 5000000858
>> done in 0.011 seconds
>>
>> Has anyone experienced this before or have any idea what could be causing
>> this?
--
Ulas Albayrak
ulas.albayrak at gmail.com

From ssinger at ca.afilias.info Fri Feb 24 08:44:19 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Fri, 24 Feb 2012 11:44:19 -0500
Subject: [Slony1-general] Slony replication stops right after start Slony replication stops right after start
In-Reply-To:
References: <4F479FEA.5080606@ca.afilias.info>
Message-ID: <4F47BE63.9020000@ca.afilias.info>

On 12-02-24 10:41 AM, Ulas Albayrak wrote:
> Hi,
>
> Sorry, seems I forgot to post versions:
>
> Slony: 2.0.7
> PostgreSQL: 9.0.5
>
> I restarted the slon daemon with debug=4 on both nodes and this is what I got:
>
> Origin:
> 2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event
> 1,5000002765 SYNC
> 2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 -
> SYNC 5000002766

This shows that SYNC events are now being generated on the origin.

> Subscriber:
> 2012-02-24 16:28:53 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local
> log_status is 2
> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1
> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do
> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote
> log_status = 1
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line buffers
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor
> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds
> delay for first row
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused
> line buffers
> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds
> until close cursor
> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0
> updates=0 deletes=0

This shows that SYNC events are being processed on the slave and there is nothing to do. I also notice that you have 8 replication sets. Are changes not being replicated to all tables in all the sets or only some of the sets?

sl_log_1 and sl_log_2 on the origin should have a record of rows that need to be replicated. If you insert a row into one of your tables you should then see that data in sl_log_1 or sl_log_2. Is this the case?

> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper
> timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3
> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper
> timing: large tuples 0.000/0
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper
> thread status
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR
> line to worker
> 2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for
> workgroup to finish
> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished
> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to pool
> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done.
From ulas.albayrak at gmail.com Mon Feb 27 02:58:56 2012
From: ulas.albayrak at gmail.com (Ulas Albayrak)
Date: Mon, 27 Feb 2012 11:58:56 +0100
Subject: [Slony1-general] Slony replication stops right after start Slony replication stops right after start
In-Reply-To: <4F47BE63.9020000@ca.afilias.info>
References: <4F479FEA.5080606@ca.afilias.info> <4F47BE63.9020000@ca.afilias.info>
Message-ID:

Hi,

None of the tables in any of the sets are being replicated (there are 9 of them in total by the way). The most recent data in all of them is from within half an hour of each other.

The sl_log_1 and sl_log_2 tables on the provider node are currently holding around 10k rows between the two of them. Nothing older than 1.5 hours. Does this mean that Slony thinks everything older than that has already been replicated?

I don't know if it matters but maybe I should mention that the two machines in the cluster were previously in a cluster where the roles were reversed, i.e. the subscriber used to be the provider and vice versa. After a system failure and a subsequent failover I cleaned the whole cluster (drop node and uninstall node) before attempting to set it up again. I can't help thinking that there may be some lingering configuration that is getting in the way?

/Ulas

On Fri, Feb 24, 2012 at 5:44 PM, Steve Singer wrote:
> On 12-02-24 10:41 AM, Ulas Albayrak wrote:
>
>> Hi,
>>
>> Sorry, seems I forgot to post versions:
>>
>> Slony: 2.0.7
>> PostgreSQL: 9.0.5
>>
>> I restarted the slon daemon with debug=4 on both nodes and this is what I
>> got:
>>
>> Origin:
>> 2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event
>> 1,5000002765 SYNC
>> 2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 -
>> SYNC 5000002766
>
> This shows that SYNC events are now being generated on the origin.
>
>> Subscriber:
>> 2012-02-24 16:28:53 CETDEBUG2 ssy_action_list length: 0
>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local
>> log_status is 2
>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1
>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data
>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do
>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote
>> log_status = 1
>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line
>> buffers
>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor
>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds
>> delay for first row
>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows
>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused
>> line buffers
>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds
>> until close cursor
>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0
>> updates=0 deletes=0
>
> This shows that SYNC events are being processed on the slave and there is
> nothing to do. I also notice that you have 8 replication sets. Are changes
> not being replicated to all tables in all the sets or only some of the sets?
>
> sl_log_1 and sl_log_2 on the origin should have a record of rows that need
> to be replicated. If you insert a row into one of your tables you should
> then see that data in sl_log_1 or sl_log_2. Is this the case?
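The "nothing to do" reading of the subscriber log rests on two kinds of lines: the per-helper counters (`inserts=0 updates=0 deletes=0`) and the `SYNC ... done` lines. As a minimal sketch, assuming unwrapped slon log text in the format quoted in this thread, those can be tallied mechanically. The name `summarize_slon_log` and the sample text are hypothetical and only for illustration; they are not part of Slony.

```python
import re

# Matches e.g. "remoteHelperThread_1_1: inserts=0 updates=0 deletes=0"
IUD_RE = re.compile(r"inserts=(\d+)\s+updates=(\d+)\s+deletes=(\d+)")
# Matches e.g. "remoteWorkerThread_1: SYNC 5000002876 done in 0.014 seconds"
SYNC_DONE_RE = re.compile(r"SYNC\s+(\d+)\s+done in\s+[\d.]+\s+seconds")


def summarize_slon_log(text):
    """Tally applied row changes and completed SYNC ids in slon log text.

    Hypothetical helper: assumes each log entry is on one line, without
    the mail-quote markers and line wraps seen in the messages above.
    """
    changes = 0
    for match in IUD_RE.finditer(text):
        # Sum the insert, update and delete counters of this helper run.
        changes += sum(int(group) for group in match.groups())
    syncs = [int(match.group(1)) for match in SYNC_DONE_RE.finditer(text)]
    return {"changes": changes, "syncs": syncs}


# Two entries taken from the subscriber log in this thread.
sample = (
    "2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: "
    "inserts=0 updates=0 deletes=0\n"
    "2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: "
    "SYNC 5000002876 done in 0.014 seconds\n"
)
summary = summarize_slon_log(sample)
print(summary)
```

A change total of zero across many completed SYNCs is what "processed, but nothing to do" looks like in the log. It only says the subscriber is keeping up with events; whether the origin is actually logging rows still has to be checked in sl_log_1 and sl_log_2.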
> > > > >> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper >> timing: ?pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 >> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper >> timing: ?large tuples 0.000/0 >> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper >> thread status >> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR >> line to worker >> 2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for >> workgroup to finish >> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished >> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to >> pool >> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done. >> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to >> IDLE >> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: cleanup >> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: waiting for work >> 2012-02-24 16:28:53 CETINFO ? remoteWorkerThread_1: SYNC 5000002873 >> done in 0.014 seconds >> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002873 >> sync_event timing: ?pqexec (s/count)- provider 0.001/1 - subsc >> riber 0.008/1 - IUD 0.001/4 >> 2012-02-24 16:28:55 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC >> 5000001345 >> 2012-02-24 16:28:59 CETDEBUG2 localListenThread: Received event >> 2,5000001345 SYNC >> 2012-02-24 16:29:05 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC >> 5000001346 >> 2012-02-24 16:29:11 CETDEBUG2 localListenThread: Received event >> 2,5000001346 SYNC >> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >> 1,5000002874 SYNC >> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >> 1,5000002875 SYNC >> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >> 1,5000002876 SYNC >> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: Received event #1 >> from 5000002874 type:SYNC >> 2012-02-24 16:29:11 CETDEBUG1 calc sync size - last 
time: 1 last >> length: 18011 ideal: 3 proposed size: 3 >> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: SYNC 5000002876 >> processing >> 2012-02-24 16:29:11 CETDEBUG1 about to monitor_subscriber_query - >> pulling big actionid list for 1 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 1 with >> 9 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 2 with >> 15 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 3 with >> 4 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 4 with >> 6 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 5 with >> 3 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 6 with >> 4 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 7 with >> 3 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? 
remoteWorkerThread_1: syncing set 8 with >> 23 table(s) from provider 1 >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 9 with >> 8 table(s) from provider 1 > > > > >> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: current local >> log_status is 2 >> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: activate helper 1 >> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: waiting for log data >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: got work to do >> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1_1: current remote >> log_status = 1 >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: allocate line >> buffers >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor >> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >> delay for first row >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: return 50 unused >> line buffers >> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >> until close cursor >> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: inserts=0 >> updates=0 deletes=0 >> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper >> timing: ?pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 >> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper >> timing: ?large tuples 0.000/0 >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: change helper >> thread status >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR >> line to worker >> 2012-02-24 16:29:11 CETDEBUG3 remoteHelperThread_1_1: waiting for >> workgroup to finish >> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: helper 1 finished >> 
2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: returning lines to >> pool >> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: all helpers done. >> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to >> IDLE >> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: cleanup >> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: waiting for work >> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: SYNC 5000002876 >> done in 0.014 seconds >> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002876 >> sync_event timing: pqexec (s/count)- provider 0.001/1 - subsc >> riber 0.008/1 - IUD 0.001/4 >> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: forward confirm >> 2,5000001346 received by 1 >> >> New SYNC events are being generated. I can find several rows in the >> origin's sl_event table with ev_origin=$originid. None of them however >> (including rows with ev_origin=$subscribedid) are older than about 15 >> minutes, even though the most recent data in the subscriber db is >> several hours old. >> >> I'm not really sure how to interpret the data in the pg_locks view, >> but none of them (neither origin or subscriber) have any rows where >> the "granted" column isn't set to TRUE, if that means anything? >> >> /Ulas >> >> >> On Fri, Feb 24, 2012 at 3:34 PM, Steve Singer >> wrote: >>> >>> On 12-02-24 08:21 AM, Ulas Albayrak wrote: >>> >>> You didn't say what version of slony you are using with which version of >>> postgresql. >>> >>> I don't see anything in the logs you posted about the slon for the origin >>> node generating sync events. At DEBUG2 or higher (at least ons some >>> versions of slony) you should be getting "syncThread: new sl_action_seq >>> %s " >>> type messages in the log for the slon origin. >>> >>> Are new SYNC events being generated in the origin sl_event table with >>> ev_origin=$originid? >>> >>> Many versions of slony require an exclusive lock on sl_event to generate >>> sync events.
Do you have something preventing this? (ie look in >>> pg_locks >>> to see if the slony sync connection is waiting on a lock). >>> >>> >>> >>> >>> >>>> Hi, >>>> >>>> I have been trying to set up a small Slony cluster (only 2 nodes) for >>>> the last 2 days but I can't get it to work. Everytime I get the same >>>> result: The replication starts of fine. Slony start copying, trying to >>>> get all the tables in the subscribing node up to speed. But somewhere >>>> along the way the 2nd node stops getting updates. Slony replicates all >>>> the data in a specific table up to a specific point in time and then >>>> no more. And this time seems to coincide with when the copying of data >>>> for that specific table started. >>>> >>>> An example to illustrate the scenario: >>>> >>>> Let's say I have set up the whole replication system and then at 12:00 >>>> I start the actual replication. Around 12:05 copying of table A from >>>> node 1 to node 2 starts. It finishes but only the data that was >>>> received before 12:05 get copied to node 2. Then at 12:10 copying of >>>> table B starts. Same thing here: Slony copies all the data that was >>>> received before 12:10 to node 2. And this is the same for all tables. >>>> >>>> The logs for the slon deamons show: >>>> >>>> Origin node: >>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21942 >>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21945 >>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>> NOTICE: Slony-I: Logswitch to sl_log_2 initiated >>>> CONTEXT: SQL statement "SELECT "_fleetcluster".logswitch_start()" >>>> PL/pgSQL function "cleanupevent" line 101 at PERFORM >>>> 2012-02-24 12:17:39 CETINFO cleanupThread:
0.019 seconds for >>>> cleanupEvent() >>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21949 >>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=23779 >>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>> >>>> Subscribing node: >>>> 2012-02-24 13:20:23 CETINFO remoteWorkerThread_1: SYNC 5000000856 >>>> done in 0.012 seconds >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 1 with >>>> 9 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 2 with >>>> 15 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 3 with >>>> 4 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 4 with >>>> 6 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 5 with >>>> 3 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 6 with >>>> 4 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 7 with >>>> 3 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 8 with >>>> 23 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 9 with >>>> 8 table(s) from provider 1 >>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: SYNC 5000000857 >>>> done in 0.014 seconds >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 1 with >>>> 9 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 2 with >>>> 15 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 3 with >>>> 4 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO
remoteWorkerThread_1: syncing set 4 with >>>> 6 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 5 with >>>> 3 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 6 with >>>> 4 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 7 with >>>> 3 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 8 with >>>> 23 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 9 with >>>> 8 table(s) from provider 1 >>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: SYNC 5000000858 >>>> done in 0.011 seconds >>>> >>>> >>>> >>>> Have anyone experienced this before or have any idea what could be >>>> causing >>>> this? >>>> >>> >> >> >> > -- Ulas Albayrak ulas.albayrak at gmail.com From ssinger at ca.afilias.info Mon Feb 27 04:54:15 2012 From: ssinger at ca.afilias.info (Steve Singer) Date: Mon, 27 Feb 2012 07:54:15 -0500 Subject: [Slony1-general] Slony replication stops right after start In-Reply-To: References: <4F479FEA.5080606@ca.afilias.info> <4F47BE63.9020000@ca.afilias.info> Message-ID: <4F4B7CF7.8080402@ca.afilias.info> On 12-02-27 05:58 AM, Ulas Albayrak wrote: > Hi, > > None of the tables in any of the sets are being replicated (there are > 9 of them in total by the way). The most recent data in all of them is > from within half an hour of each other. > > The s_log_1 and s_log_2 tables on the provider node are currently > holding around 10k rows between the two of them. Nothing older that > 1,5 hours. Does this mean that Slony thinks everything older than that > has already been replicated?
> This means that one of the following is true: 1) No changes were made to a replicated table before then 2) Slony thinks all changes earlier than 1.5 hours ago have been replicated 3) Changes made before then didn't need to be replicated anywhere (no subscriptions?) Do the changes you now see in sl_log_1 or sl_log_2 show up on your slave? > I don't know if it matters but maybe I should mention that the two > machines in the cluster were previously in a cluster where the roles > were reversed, i.e. the subscriber used to be the provider and vice > versa. After a system failure and a subsequent failover I cleaned the > whole cluster (drop node and uninstall node) before attempting to set > it up again. I can't help to think that there maybe some lingering > configurations that are getting in the way? > > /Ulas > > On Fri, Feb 24, 2012 at 5:44 PM, Steve Singer wrote: >> On 12-02-24 10:41 AM, Ulas Albayrak wrote: >> >> >> >>> Hi, >>> >>> Sorry, seems I forgot to post versions: >>> >>> Slony: 2.0.7 >>> PostgreSQL: 9.0.5 >>> >>> I restarted the slon deamon with debug=4 on both nodes and this is what I >>> got: >>> >>> Origin: >>> 2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event >>> 1,5000002765 SYNC >>> 2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 - >>> SYNC 5000002766 >> >> >> This shows that SYNC events are now being generated on the origin.
>> >> >> >>> Subscriber: >>> 2012-02-24 16:28:53 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local >>> log_status is 2 >>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1 >>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do >>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote >>> log_status = 1 >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line >>> buffers >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor >>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >>> delay for first row >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused >>> line buffers >>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds >>> until close cursor >>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0 >>> updates=0 deletes=0 >> >> >> This shows that SYNC events are being processed on the slave and there is >> nothing to do. I also notice that you have 8 replication sets. Are changes >> not being replicated to all tables in all the sets or only some of the sets? >> >> sl_log_1 and sl_log_2 on the origin should have a record of rows that need >> to be replicated. If you insert a row into one of your tables you should >> then see that data in sl_log_1 or sl_log_2. Is this the case. 
>> >> >> >> >>> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper >>> timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 >>> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper >>> timing: large tuples 0.000/0 >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper >>> thread status >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR >>> line to worker >>> 2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for >>> workgroup to finish >>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished >>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to >>> pool >>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done. >>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to >>> IDLE >>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: cleanup >>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: waiting for work >>> 2012-02-24 16:28:53 CETINFO remoteWorkerThread_1: SYNC 5000002873 >>> done in 0.014 seconds >>> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002873 >>> sync_event timing: pqexec (s/count)- provider 0.001/1 - subsc >>> riber 0.008/1 - IUD 0.001/4 >>> 2012-02-24 16:28:55 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC >>> 5000001345 >>> 2012-02-24 16:28:59 CETDEBUG2 localListenThread: Received event >>> 2,5000001345 SYNC >>> 2012-02-24 16:29:05 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC >>> 5000001346 >>> 2012-02-24 16:29:11 CETDEBUG2 localListenThread: Received event >>> 2,5000001346 SYNC >>> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >>> 1,5000002874 SYNC >>> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >>> 1,5000002875 SYNC >>> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >>> 1,5000002876 SYNC >>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: Received event #1 >>> from 5000002874 type:SYNC >>> 2012-02-24 
16:29:11 CETDEBUG1 calc sync size - last time: 1 last >>> length: 18011 ideal: 3 proposed size: 3 >>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: SYNC 5000002876 >>> processing >>> 2012-02-24 16:29:11 CETDEBUG1 about to monitor_subscriber_query - >>> pulling big actionid list for 1 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 1 with >>> 9 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 2 with >>> 15 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 3 with >>> 4 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 4 with >>> 6 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 5 with >>> 3 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 6 with >>> 4 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 7 with >>> 3 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 8 with >>> 23 table(s) from provider 1 >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list 
value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 9 with >>> 8 table(s) from provider 1 >> >> >> >> >>> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value: >>> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0 >>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: current local >>> log_status is 2 >>> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: activate helper 1 >>> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: waiting for log data >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: got work to do >>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1_1: current remote >>> log_status = 1 >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: allocate line >>> buffers >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor >>> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >>> delay for first row >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: return 50 unused >>> line buffers >>> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >>> until close cursor >>> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: inserts=0 >>> updates=0 deletes=0 >>> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper >>> timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 >>> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper >>> timing: large tuples 0.000/0 >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: change helper >>> thread status >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR >>> line to worker >>> 2012-02-24 16:29:11 CETDEBUG3 remoteHelperThread_1_1: waiting for >>> workgroup to finish >>> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: helper 1 finished >>> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: returning lines to >>> pool >>> 
2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: all helpers done. >>> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to >>> IDLE >>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: cleanup >>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: waiting for work >>> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: SYNC 5000002876 >>> done in 0.014 seconds >>> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002876 >>> sync_event timing: pqexec (s/count)- provider 0.001/1 - subsc >>> riber 0.008/1 - IUD 0.001/4 >>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: forward confirm >>> 2,5000001346 received by 1 >>> >>> New SYNC events are being generated. I can find several rows in the >>> origin's sl_event table with ev_origin=$originid. None of them however >>> (including rows with ev_origin=$subscribedid) are older than about 15 >>> minutes, even though the most recent data in the subscriber db is >>> several hours old. >>> >>> I'm not really sure how to interpret the data in the pg_locks view, >>> but none of them (neither origin or subscriber) have any rows where >>> the "granted" column isn't set to TRUE, if that means anything? >>> >>> /Ulas >>> >>> >>> On Fri, Feb 24, 2012 at 3:34 PM, Steve Singer >>> wrote: >>>> >>>> On 12-02-24 08:21 AM, Ulas Albayrak wrote: >>>> >>>> You didn't say what version of slony you are using with which version of >>>> postgresql. >>>> >>>> I don't see anything in the logs you posted about the slon for the origin >>>> node generating sync events. At DEBUG2 or higher (at least ons some >>>> versions of slony) you should be getting "syncThread: new sl_action_seq >>>> %s " >>>> type messages in the log for the slon origin. >>>> >>>> Are new SYNC events being generated in the origin sl_event table with >>>> ev_origin=$originid? >>>> >>>> Many versions of slony require an exclusive lock on sl_event to generate >>>> sync events. Do you have something preventing this? 
(ie look in >>>> pg_locks >>>> to see if the slony sync connection is waiting on a lock). >>>> >>>> >>>> >>>> >>>> >>>>> Hi, >>>>> >>>>> I have been trying to set up a small Slony cluster (only 2 nodes) for >>>>> the last 2 days but I can't get it to work. Everytime I get the same >>>>> result: The replication starts of fine. Slony start copying, trying to >>>>> get all the tables in the subscribing node up to speed. But somewhere >>>>> along the way the 2nd node stops getting updates. Slony replicates all >>>>> the data in a specific table up to a specific point in time and then >>>>> no more. And this time seems to coincide with when the copying of data >>>>> for that specific table started. >>>>> >>>>> An example to illustrate the scenario: >>>>> >>>>> Let's say I have set up the whole replication system and then at 12:00 >>>>> I start the actual replication. Around 12:05 copying of table A from >>>>> node 1 to node 2 starts. It finishes but only the data that was >>>>> received before 12:05 get copied to node 2. Then at 12:10 copying of >>>>> table B starts. Same thing here: Slony copies all the data that was >>>>> received before 12:10 to node 2. And this is the same for all tables. 
>>>>> >>>>> The logs for the slon deamons show: >>>>> >>>>> Origin node: >>>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21942 >>>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21945 >>>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>>> NOTICE: Slony-I: Logswitch to sl_log_2 initiated >>>>> CONTEXT: SQL statement "SELECT "_fleetcluster".logswitch_start()" >>>>> PL/pgSQL function "cleanupevent" line 101 at PERFORM >>>>> 2012-02-24 12:17:39 CETINFO cleanupThread: 0.019 seconds for >>>>> cleanupEvent() >>>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21949 >>>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=23779 >>>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>>> >>>>> Subscribing node: >>>>> 2012-02-24 13:20:23 CETINFO remoteWorkerThread_1: SYNC 5000000856 >>>>> done in 0.012 seconds >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 1 with >>>>> 9 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 2 with >>>>> 15 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 3 with >>>>> 4 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 4 with >>>>> 6 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 5 with >>>>> 3 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 6 with >>>>> 4 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 7 with 
>>>>> 3 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 8 with >>>>> 23 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 9 with >>>>> 8 table(s) from provider 1 >>>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: SYNC 5000000857 >>>>> done in 0.014 seconds >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 1 with >>>>> 9 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 2 with >>>>> 15 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 3 with >>>>> 4 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 4 with >>>>> 6 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 5 with >>>>> 3 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 6 with >>>>> 4 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 7 with >>>>> 3 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 8 with >>>>> 23 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 9 with >>>>> 8 table(s) from provider 1 >>>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: SYNC 5000000858 >>>>> done in 0.011 seconds >>>>> >>>>> >>>>> >>>>> Have anyone experienced this before or have any idea what could be >>>>> causing >>>>> this? 
>>>>> >>>> >>> >>> >>> >> > > > From ulas.albayrak at gmail.com Mon Feb 27 05:53:29 2012 From: ulas.albayrak at gmail.com (Ulas Albayrak) Date: Mon, 27 Feb 2012 14:53:29 +0100 Subject: [Slony1-general] Slony replication stops right after start In-Reply-To: <4F4B7CF7.8080402@ca.afilias.info> References: <4F479FEA.5080606@ca.afilias.info> <4F47BE63.9020000@ca.afilias.info> <4F4B7CF7.8080402@ca.afilias.info> Message-ID: Hi, I noted a few specific rows in the sl_log_x tables and later, after they had been removed from the sl_log_x tables, checked for them on the subscriber, but they were never inserted. The answer to your question is #2: Slony thinks all changes earlier than 1.5 hours ago have been replicated. I know that changes were made to the tables before that time; those specific tables continuously receive new data. But if Slony thinks the data has been replicated, the question is why. Does the worker thread on the provider get some sort of acknowledgement that the data has been transferred to the subscriber? And if so, what is this ack based on? /Ulas On Mon, Feb 27, 2012 at 1:54 PM, Steve Singer wrote: > On 12-02-27 05:58 AM, Ulas Albayrak wrote: >> >> Hi, >> >> None of the tables in any of the sets are being replicated (there are >> 9 of them in total by the way). The most recent data in all of them is >> from within half an hour of each other. >> >> The s_log_1 and s_log_2 tables on the provider node are currently >> holding around 10k rows between the two of them. Nothing older that >> 1,5 hours. Does this mean that Slony thinks everything older than that >> has already been replicated? >> > > This means that either > > 1) No changes were made before then to a replicated table > 2) Slony thinks All changes earlier than 1.5 hours ago have been replicated > 3) Changes made before then didn't need to be replicated anywhere (no > subscriptions?)
> > Do the changes you now see in sl_log_1 or sl_log_2 show up on your slave? > > > > >> I don't know if it matters but maybe I should mention that the two >> machines in the cluster were previously in a cluster where the roles >> were reversed, i.e. the subscriber used to be the provider and vice >> versa. After a system failure and a subsequent failover I cleaned the >> whole cluster (drop node and uninstall node) before attempting to set >> it up again. I can't help to think that there maybe some lingering >> configurations that are getting in the way? >> >> /Ulas >> >> On Fri, Feb 24, 2012 at 5:44 PM, Steve Singer >> wrote: >>> >>> On 12-02-24 10:41 AM, Ulas Albayrak wrote: >>> >>> >>> >>>> Hi, >>>> >>>> Sorry, seems I forgot to post versions: >>>> >>>> Slony: 2.0.7 >>>> PostgreSQL: 9.0.5 >>>> >>>> I restarted the slon deamon with debug=4 on both nodes and this is what >>>> I >>>> got: >>>> >>>> Origin: >>>> 2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event >>>> 1,5000002765 SYNC >>>> 2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 - >>>> SYNC 5000002766 >>> >>> >>> >>> This shows that SYNC events are now being generated on the origin.
>>> >>> >>> >>>> Subscriber: >>>> 2012-02-24 16:28:53 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local >>>> log_status is 2 >>>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1 >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do >>>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote >>>> log_status = 1 >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line >>>> buffers >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor >>>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >>>> delay for first row >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused >>>> line buffers >>>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds >>>> until close cursor >>>> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0 >>>> updates=0 deletes=0 >>> >>> >>> >>> This shows that SYNC events are being processed on the slave and there is >>> nothing to do. ?I also notice that you have 8 replication sets. ?Are >>> changes >>> not being replicated to all tables in all the sets or only some of the >>> sets? >>> >>> sl_log_1 and sl_log_2 on the origin should have a record of rows that >>> need >>> to be replicated. ?If you insert a row into one of your tables you should >>> then see that data in sl_log_1 or sl_log_2. ?Is this the case. 
>>> >>> >>> >>> >>>> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper >>>> timing: ?pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 >>>> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper >>>> timing: ?large tuples 0.000/0 >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper >>>> thread status >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR >>>> line to worker >>>> 2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for >>>> workgroup to finish >>>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to >>>> pool >>>> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done. >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to >>>> IDLE >>>> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: cleanup >>>> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: waiting for work >>>> 2012-02-24 16:28:53 CETINFO ? 
remoteWorkerThread_1: SYNC 5000002873 >>>> done in 0.014 seconds >>>> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002873 >>>> sync_event timing: ?pqexec (s/count)- provider 0.001/1 - subsc >>>> riber 0.008/1 - IUD 0.001/4 >>>> 2012-02-24 16:28:55 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC >>>> 5000001345 >>>> 2012-02-24 16:28:59 CETDEBUG2 localListenThread: Received event >>>> 2,5000001345 SYNC >>>> 2012-02-24 16:29:05 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC >>>> 5000001346 >>>> 2012-02-24 16:29:11 CETDEBUG2 localListenThread: Received event >>>> 2,5000001346 SYNC >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >>>> 1,5000002874 SYNC >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >>>> 1,5000002875 SYNC >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event >>>> 1,5000002876 SYNC >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: Received event #1 >>>> from 5000002874 type:SYNC >>>> 2012-02-24 16:29:11 CETDEBUG1 calc sync size - last time: 1 last >>>> length: 18011 ideal: 3 proposed size: 3 >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: SYNC 5000002876 >>>> processing >>>> 2012-02-24 16:29:11 CETDEBUG1 about to monitor_subscriber_query - >>>> pulling big actionid list for 1 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 1 with >>>> 9 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 2 with >>>> 15 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? 
remoteWorkerThread_1: syncing set 3 with >>>> 4 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 4 with >>>> 6 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 5 with >>>> 3 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 6 with >>>> 4 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 7 with >>>> 3 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: syncing set 8 with >>>> 23 table(s) from provider 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETINFO ? 
remoteWorkerThread_1: syncing set 9 with >>>> 8 table(s) from provider 1 >>> >>> >>> >>> >>> >>>> 2012-02-24 16:29:11 CETDEBUG4 ?ssy_action_list value: >>>> 2012-02-24 16:29:11 CETDEBUG2 ?ssy_action_list length: 0 >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: current local >>>> log_status is 2 >>>> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: activate helper 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: waiting for log data >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: got work to do >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1_1: current remote >>>> log_status = 1 >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: allocate line >>>> buffers >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor >>>> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >>>> delay for first row >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: return 50 unused >>>> line buffers >>>> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds >>>> until close cursor >>>> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: inserts=0 >>>> updates=0 deletes=0 >>>> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper >>>> timing: ?pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3 >>>> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper >>>> timing: ?large tuples 0.000/0 >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: change helper >>>> thread status >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR >>>> line to worker >>>> 2012-02-24 16:29:11 CETDEBUG3 remoteHelperThread_1_1: waiting for >>>> workgroup to finish >>>> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: helper 1 finished >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: returning lines to >>>> pool >>>> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: all 
helpers done. >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to >>>> IDLE >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: cleanup >>>> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: waiting for work >>>> 2012-02-24 16:29:11 CETINFO ? remoteWorkerThread_1: SYNC 5000002876 >>>> done in 0.014 seconds >>>> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002876 >>>> sync_event timing: ?pqexec (s/count)- provider 0.001/1 - subsc >>>> riber 0.008/1 - IUD 0.001/4 >>>> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: forward confirm >>>> 2,5000001346 received by 1 >>>> >>>> New SYNC events are being generated. I can find several rows in the >>>> origin's sl_event table with ev_origin=$originid. None of them however >>>> (including rows with ev_origin=$subscribedid) are older than about 15 >>>> minutes, even though the most recent data in the subscriber db is >>>> several hours old. >>>> >>>> I'm not really sure how to interpret the data in the pg_locks view, >>>> but none of them (neither origin or subscriber) have any rows where >>>> the "granted" column isn't set to TRUE, if that means anything? >>>> >>>> /Ulas >>>> >>>> >>>> On Fri, Feb 24, 2012 at 3:34 PM, Steve Singer >>>> ?wrote: >>>>> >>>>> >>>>> On 12-02-24 08:21 AM, Ulas Albayrak wrote: >>>>> >>>>> You didn't say what version of slony you are using with which version >>>>> of >>>>> postgresql. >>>>> >>>>> I don't see anything in the logs you posted about the slon for the >>>>> origin >>>>> node generating sync events. ?At DEBUG2 or higher (at least ons some >>>>> versions of slony) you should be getting "syncThread: new sl_action_seq >>>>> %s " >>>>> type messages in the log for the slon origin. >>>>> >>>>> Are new SYNC events being generated in the origin sl_event table with >>>>> ev_origin=$originid? >>>>> >>>>> Many versions of slony require an exclusive lock on sl_event to >>>>> generate >>>>> sync events. ?Do you have something preventing this? 
>>>>> (ie look in pg_locks
>>>>> to see if the slony sync connection is waiting on a lock).
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have been trying to set up a small Slony cluster (only 2 nodes) for
>>>>>> the last 2 days but I can't get it to work. Every time I get the same
>>>>>> result: The replication starts off fine. Slony starts copying, trying to
>>>>>> get all the tables in the subscribing node up to speed. But somewhere
>>>>>> along the way the 2nd node stops getting updates. Slony replicates all
>>>>>> the data in a specific table up to a specific point in time and then
>>>>>> no more. And this time seems to coincide with when the copying of data
>>>>>> for that specific table started.
>>>>>>
>>>>>> An example to illustrate the scenario:
>>>>>>
>>>>>> Let's say I have set up the whole replication system and then at 12:00
>>>>>> I start the actual replication. Around 12:05 copying of table A from
>>>>>> node 1 to node 2 starts. It finishes but only the data that was
>>>>>> received before 12:05 gets copied to node 2. Then at 12:10 copying of
>>>>>> table B starts. Same thing here: Slony copies all the data that was
>>>>>> received before 12:10 to node 2. And this is the same for all tables.
>>>>>>
>>>>>> The logs for the slon daemons show:
>>>>>>
>>>>>> Origin node:
>>>>>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21942
>>>>>> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>>>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>>>>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21945
>>>>>> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>>>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>>>>> NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
>>>>>> CONTEXT:  SQL statement "SELECT "_fleetcluster".logswitch_start()"
>>>>>> PL/pgSQL function "cleanupevent" line 101 at PERFORM
>>>>>> 2012-02-24 12:17:39 CETINFO   cleanupThread:
?0.019 seconds for >>>>>> cleanupEvent() >>>>>> NOTICE: ?Slony-I: cleanup stale sl_nodelock entry for pid=21949 >>>>>> CONTEXT: ?SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>>>> NOTICE: ?Slony-I: cleanup stale sl_nodelock entry for pid=23779 >>>>>> CONTEXT: ?SQL statement "SELECT "_fleetcluster".cleanupNodelock()" >>>>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM >>>>>> >>>>>> Subscribing node: >>>>>> 2012-02-24 13:20:23 CETINFO ? remoteWorkerThread_1: SYNC 5000000856 >>>>>> done in 0.012 seconds >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 1 with >>>>>> 9 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 2 with >>>>>> 15 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 3 with >>>>>> 4 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 4 with >>>>>> 6 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 5 with >>>>>> 3 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 6 with >>>>>> 4 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 7 with >>>>>> 3 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 8 with >>>>>> 23 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: syncing set 9 with >>>>>> 8 table(s) from provider 1 >>>>>> 2012-02-24 13:20:41 CETINFO ? remoteWorkerThread_1: SYNC 5000000857 >>>>>> done in 0.014 seconds >>>>>> 2012-02-24 13:20:43 CETINFO ? remoteWorkerThread_1: syncing set 1 with >>>>>> 9 table(s) from provider 1 >>>>>> 2012-02-24 13:20:43 CETINFO ? remoteWorkerThread_1: syncing set 2 with >>>>>> 15 table(s) from provider 1 >>>>>> 2012-02-24 13:20:43 CETINFO ? 
>>>>>> remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1
>>>>>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: SYNC 5000000858 done in 0.011 seconds
>>>>>>
>>>>>>
>>>>>> Has anyone experienced this before or have any idea what could be
>>>>>> causing this?
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
Ulas Albayrak
ulas.albayrak at gmail.com

From stephane.schildknecht at postgresql.fr Mon Feb 27 06:02:09 2012
From: stephane.schildknecht at postgresql.fr ("Stéphane A. Schildknecht")
Date: Mon, 27 Feb 2012 15:02:09 +0100
Subject: [Slony1-general] Syntax error?
In-Reply-To: <4E6FA67E54034BA294B3DB3660A34B8D@CMOTUM25PC>
References: <33357837.post@talk.nabble.com> <4E6FA67E54034BA294B3DB3660A34B8D@CMOTUM25PC>
Message-ID: <4F4B8CE1.1030401@postgresql.fr>

On 20/02/2012 18:56, Efraín Déctor wrote:
> Delete the _EOF_ at the end of the script.
>
> -----Original Message-----
> From: NewToSlony
> Sent: Monday, February 20, 2012 10:45 AM
> To: slony1-general at lists.slony.info
> Subject: [Slony1-general] Syntax error?
>
>
> Hi,
>
> I'm trying to use the example in the 2.1.1 documentation but I keep getting
> a syntax error.
> I've noted that the doc shows
> cluster name = $CLUSTERNAME;
> is a valid syntax but unless I change the $ to a @ I get a syntax error,
> thus as I'm new and unsure I don't know if there are errors in the
> documentation or something I am missing. Nevertheless, my script is as
> follows:
>
>
> #!/bin/sh
>
> CLUSTERNAME= slony_example;
> /opt/local/lib/postgresql90/bin/slonik <<_EOL_
> define CLUSTERNAME slony_example;
> cluster name = @CLUSTERNAME;
> node 1 admin conninfo = 'dbname=my_primary host=localhost user=user';
> node 2 admin conninfo = 'dbname=my_rep host=localhost user=user';
> #--
> # init the first node. Its id MUST be 1. This creates the schema #
> _$CLUSTERNAME containing all replication system specific database # objects.
> #--
> init cluster ( id=1, comment='Master Node');
> #--
> # Slony-I organizes tables into sets. The smallest unit a node can #
> subscribe is a set. The following commands create one set containing # all 4
> pgbench tables. The master or origin of the set is node 1.
> #--
> create set (id=1, origin=1, comment='All pgbench tables');
> set add table (set id=1, origin=1, id=1, fully qualified
> name='public.pgbench_accounts', comment='accounts table');
> set add table (set id=1, origin=1, id=2, fully qualified
> name='public.pgbench_branches', comment='branches table');
> set add table (set id=1, origin=1, id=3, fully qualified
> name='public.pgbench_tellers', comment='tellers table');
> set add table (set id=1, origin=1, id=4, fully qualified
> name='public.pgbench_history', comment='history table');
> #--
> # Create the second node (the slave) tell the 2 nodes how to connect to
> Slony-I 2.1.1 Documentation 10 / 163
> # each other and how they should listen for events.
> #--
> store node (id=2, comment = 'Slave node', event node=1);
> store path (server = 1, client = 2, conninfo='dbname=my_primary
> host=localhost user=user');
> store path (server = 2, client = 1, conninfo='dbname=my_rep host=localhost
> user=user');
> _EOF_
>
>
> Yet I get the following syntax error:
>
>
> /tmp/slonik_example.sh: line 3: slony_example: command not found
> :24: ERROR: syntax error at or near _EOF_
>

You opened the here-document with _EOL_, so you have to close it with the same
marker instead of the _EOF_ you used at the end.

--
Stéphane Schildknecht
http://www.Loxodata.com
PostgreSQL regional contact
http://bistri.me/sas

From stephane.schildknecht at postgresql.fr Mon Feb 27 06:10:10 2012
From: stephane.schildknecht at postgresql.fr ("Stéphane A. Schildknecht")
Date: Mon, 27 Feb 2012 15:10:10 +0100
Subject: [Slony1-general] Cannot find slony1_funcs on macos despite lib in place
In-Reply-To: <33365133.post@talk.nabble.com>
References: <33365133.post@talk.nabble.com>
Message-ID: <4F4B8EC2.20208@postgresql.fr>

On 21/02/2012 17:41, NewToSlony wrote:
>
> Trying to run the example slony config script in the documentation but
> getting the following error:
>
>
> postgres$ /tmp/slonik_example.sh
> :8: PGRES_FATAL_ERROR load '$libdir/slony1_funcs'; - ERROR: could
> not access file "$libdir/slony1_funcs": No such file or directory
> :8: Error: the extension for the Slony-I C functions cannot be loaded
> in database 'dbname=my_primary host=localhost user=warfish
> password=coalitions'
>
> Yet the LIBDIR variable is set correctly:
> postgres$ ./pg_config
> BINDIR = /opt/local/lib/postgresql90/bin
> DOCDIR = /opt/local/share/doc/postgresql
> HTMLDIR = /opt/local/share/doc/postgresql
> INCLUDEDIR = /opt/local/include/postgresql90
> PKGINCLUDEDIR = /opt/local/include/postgresql90
> INCLUDEDIR-SERVER = /opt/local/include/postgresql90/server
> LIBDIR = /opt/local/lib/postgresql90
>
> And the lib is present:
>
> ls -l
> /opt/local/lib/postgresql90/slony1_funcs.so
> -rwxr-xr-x 1 root admin 34944 Feb 17 16:20
> /opt/local/lib/postgresql90/slony1_funcs.so
>

Could it be a permission problem on the libraries, as they seem to be owned by
root?

--
Stéphane Schildknecht
http://www.Loxodata.com
PostgreSQL regional contact
http://bistri.me/sas

From ssinger at ca.afilias.info Mon Feb 27 06:41:22 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Mon, 27 Feb 2012 09:41:22 -0500
Subject: [Slony1-general] Slony replication stops right after start
In-Reply-To:
References: <4F479FEA.5080606@ca.afilias.info> <4F47BE63.9020000@ca.afilias.info> <4F4B7CF7.8080402@ca.afilias.info>
Message-ID: <4F4B9612.6010503@ca.afilias.info>

On 12-02-27 08:53 AM, Ulas Albayrak wrote:
> Hi,
>
> I noted a few specific rows in the sl_log_x tables and later, after
> they had been removed from the sl_log_x table, checked for them in the
> subscriber but they never got inserted. The answer to the question is
> answer #2: Slony thinks all changes earlier than 1.5 hours ago have
> been replicated. I know that changes were made before that time to the
> tables. Those specific tables continuously receive new data.
>
> But if Slony thinks the data has been replicated the question arises
> of why? Does the worker thread on the provider get some sort of
> acknowledgement that the data has been transferred to the provider?
> And if so, what is this ack based on?
>

The slon threads figure out which rows in sl_log belong to a particular
SYNC event based on the transaction IDs stored as part of the sync
event. Slon then replicates these rows and stores a confirmation (that
gets sent back to the master) in sl_confirm.

Do you have any rows in sl_confirm with con_origin=$master where the
con_seqno is bigger than the largest ev_seqno in sl_event with
ev_origin=$master? (This situation should not be possible.)
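The check described above can be written out as a query. What follows is only a sketch, not something posted in the thread: the function name confirm_check_sql and its two arguments are made up for illustration, while the cluster name "fleetcluster" and origin node id 1 are read off the logs quoted earlier. The script only prints the SQL; it does not connect to any database.

```shell
#!/bin/sh
# Sketch: print a query listing sl_confirm rows whose con_seqno is ahead of
# the newest event the origin has recorded in sl_event.
# confirm_check_sql and its arguments are hypothetical names for this example.

confirm_check_sql() {
    cluster="$1"   # Slony cluster name, without the leading underscore
    origin="$2"    # node id of the origin ($master in the message above)
    cat <<EOF
SELECT con_origin, con_received, con_seqno
  FROM "_${cluster}".sl_confirm
 WHERE con_origin = ${origin}
   AND con_seqno > (SELECT max(ev_seqno)
                      FROM "_${cluster}".sl_event
                     WHERE ev_origin = ${origin});
EOF
}

# "fleetcluster" and node 1 come from the slon logs quoted in this thread.
confirm_check_sql fleetcluster 1
```

The output can be pasted into psql on the origin database. Going by the message above, any row that comes back would point at the "should not be possible" situation, and an empty result would rule it out.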
> /Ulas
>

From JanWieck at Yahoo.com Tue Feb 28 08:12:56 2012
From: JanWieck at Yahoo.com (Jan Wieck)
Date: Tue, 28 Feb 2012 11:12:56 -0500
Subject: [Slony1-general] Slony replication stops right after start
In-Reply-To:
References:
Message-ID: <4F4CFD08.6020203@Yahoo.com>

On 2/24/2012 8:12 AM, Ulas Albayrak wrote:
> Let's say I have set up the whole replication system and then at 12:00
> I start the actual replication. Around 12:05 copying of table A from
> node 1 to node 2 starts. It finishes but only the data that was
> received before 12:05 gets copied to node 2. Then at 12:10 copying of
> table B starts. Same thing here: Slony copies all the data that was
> received before 12:10 to node 2. And this is the same for all tables.

This should only happen if the two tables belong to different sets. If
they belong to the same set, they are copied with the same snapshot
inside the same transaction processing the ENABLE_SUBSCRIPTION event.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From ngramsky at cs.umd.edu Tue Feb 28 19:42:56 2012
From: ngramsky at cs.umd.edu (NewToSlony)
Date: Tue, 28 Feb 2012 19:42:56 -0800 (PST)
Subject: [Slony1-general] Can you replicate tables between two db's with different schemas?
Message-ID: <33411498.post@talk.nabble.com>

Is it possible to replicate tables from one database to another if they have
different schemas? For example

Working set is:
DB1
Schema1

DB2
Schema2

Want to replicate a subset of tables from DB1 -> DB2. The tables have the
exact same composition, but as DB2 does not have the entire set of
tables from DB1, the schema is different.

I ask because, as I look through the documentation, I'm confused about how to
accomplish this. In the tutorial I see that the 'SET ADD' command is where you
tell Slony to use the tables from pgbench. There, public is the schema, but if
I were to do this with the above example, is it Schema1, Schema2, or is this
not possible?
create set (id=1, origin=1, comment='All pgbench tables');
set add table (set id=1, origin=1, id=1, fully qualified name = 'public.pgbench_accounts', comment='accounts table');
set add table (set id=1, origin=1, id=2, fully qualified name = 'public.pgbench_branches', comment='branches table');

--
View this message in context: http://old.nabble.com/Can-you-replicate-tables-between-two-db%27s-with-different-schemas--tp33411498p33411498.html
Sent from the Slony-I -- General mailing list archive at Nabble.com.

From ajs at crankycanuck.ca Wed Feb 29 06:26:38 2012
From: ajs at crankycanuck.ca (Andrew Sullivan)
Date: Wed, 29 Feb 2012 09:26:38 -0500
Subject: [Slony1-general] Can you replicate tables between two db's with different schemas?
In-Reply-To: <33411498.post@talk.nabble.com>
References: <33411498.post@talk.nabble.com>
Message-ID: <20120229142626.GA67758@crankycanuck.ca>

On Tue, Feb 28, 2012 at 07:42:56PM -0800, NewToSlony wrote:
> Is it possible to replicate tables from one database to another if they have
> different schemas? For example
>
> Working set is:
> DB1
> Schema1
>
> DB2
> Schema2
>
> Want to replicate a subset of tables from DB1 -> DB2. The tables have the
> exact same composition, but as DB2 does not have have the entire set of
> tables from DB1, the schema is different.

That works, _as long as_ the tables themselves have the same schema.
That is, Slony works on sets of tables. Those sets are not identical
to the database schema, but the table definition must match on all
nodes in the set.

A

--
Andrew Sullivan
ajs at crankycanuck.ca

From ssinger at ca.afilias.info Wed Feb 29 06:27:48 2012
From: ssinger at ca.afilias.info (Steve Singer)
Date: Wed, 29 Feb 2012 09:27:48 -0500
Subject: [Slony1-general] Can you replicate tables between two db's with different schemas?
In-Reply-To: <33411498.post@talk.nabble.com>
References: <33411498.post@talk.nabble.com>
Message-ID: <4F4E35E4.6000309@ca.afilias.info>

On 12-02-28 10:42 PM, NewToSlony wrote:
>
> Is it possible to replicate tables from one database to another if they have
> different schemas? For example
>
> Working set is:
> DB1
> Schema1
>
> DB2
> Schema2

This is not possible.

You can have

DB1:
public.a
public.b

DB2:
public.a

where some tables on db1 don't exist in db2 but any of the tables that
are being replicated must have the same fully qualified name on both
systems.

>
> Want to replicate a subset of tables from DB1 -> DB2. The tables have the
> exact same composition, but as DB2 does not have have the entire set of
> tables from DB1, the schema is different.
>
> I ask as I look through the documentation I'm confused how to accomplish
> this. As I look at the tutorial I see where via the 'SET ADD' command you
> tell Slony to use the tables from pgbench. public is the schema but if I
> were to do this with the above example, is it Schema1, Schema2 or is this
> not possible?
>
>
> create set (id=1, origin=1, comment='All pgbench tables');
> set add table (set id=1, origin=1, id=1, fully qualified name =
> 'public.pgbench_accounts', comment='accounts table');
> set add table (set id=1, origin=1, id=2, fully qualified name =
> 'public.pgbench_branches', comment='branches table');

From JanWieck at Yahoo.com Wed Feb 29 08:04:40 2012
From: JanWieck at Yahoo.com (Jan Wieck)
Date: Wed, 29 Feb 2012 11:04:40 -0500
Subject: [Slony1-general] Can you replicate tables between two db's with different schemas?
In-Reply-To: <4F4E35E4.6000309@ca.afilias.info>
References: <33411498.post@talk.nabble.com> <4F4E35E4.6000309@ca.afilias.info>
Message-ID: <4F4E4C98.2000706@Yahoo.com>

On 2/29/2012 9:27 AM, Steve Singer wrote:
> On 12-02-28 10:42 PM, NewToSlony wrote:
>>
>> Is it possible to replicate tables from one database to another if they have
>> different schemas?
>> For example
>>
>> Working set is:
>> DB1
>> Schema1
>>
>> DB2
>> Schema2
>
> This is not possible.
>
> You can have
>
> DB1:
> public.a
> public.b
>
> DB2:
> public.a
>
> where some tables on db1 don't exist in db2 but any of the tables that
> are being replicated must have the same fully qualified name on both
> systems.

For now. We do have plans to change that in a future release, but it
did not make it into the 2.2 development cycle.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From dbrb2002-sql at yahoo.com Wed Feb 29 09:13:36 2012
From: dbrb2002-sql at yahoo.com (Brian Trudal)
Date: Wed, 29 Feb 2012 09:13:36 -0800 (PST)
Subject: [Slony1-general] Clone vs subscribe
Message-ID: <1330535616.5592.YahooMailNeo@web31807.mail.mud.yahoo.com>

Hi

I have a fairly large database (~1T) with a few tables being replicated to
another node in the cluster using slony 2.0.7. Now I want to add a fresh node
with a complete copy of the main master.

So, what's the best recommended approach? Clone (clone prepare, dump/load,
clone finish) or use the subscribe mode? Or any other alternative?

Thanks
Brian

From cbbrowne at afilias.info Wed Feb 29 09:58:39 2012
From: cbbrowne at afilias.info (Christopher Browne)
Date: Wed, 29 Feb 2012 12:58:39 -0500
Subject: [Slony1-general] Clone vs subscribe
In-Reply-To: <1330535616.5592.YahooMailNeo@web31807.mail.mud.yahoo.com>
References: <1330535616.5592.YahooMailNeo@web31807.mail.mud.yahoo.com>
Message-ID:

On Wed, Feb 29, 2012 at 12:13 PM, Brian Trudal wrote:
> I have fairly large database (~1T) with few tables being replicated to other
> node in the cluster using slony 2.0.7. Now I wanted to add a fresh node with
> complete copy of the main master.
>
> So, what's the best recommended approach?
> Clone (clone prepare, dump/load,
> clone finish) or use the subscribe mode? Or any other alternative?

Well, CLONE can only be done against a subscriber, not against the
origin. That seems to rule out using it for the situation you describe.