Fri Mar 31 18:50:29 PST 2006
- Previous message: [Slony1-general] Initial COPY works, but no new data since.
- Next message: [Slony1-general] Initial COPY works, but no new data since.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, 31 Mar 2006 17:26:07 +0000 Christopher Browne <cbbrowne at ca.afilias.info> wrote: > I see a first problem here... > > It's useless to use the "-a" option against the "master" node. The > master node is not subscribing to any sets, so nothing will be recorded > in the "log shipping" logs there. OK fair enough. This evening I decided to uninstall both nodes, drop the _replication schema on both databases, and start again, this time much closer to the HOWTO... dropdb + createdb on slave, then import the schema only. The following was imported into slonik on Node1 without failure: cluster name = replication; node 1 admin conninfo='host=194.24.250.137 dbname=laterooms user=XXX port=5432 password=XXX'; node 2 admin conninfo='host=194.24.250.143 dbname=laterooms user=XXX port=5432 password=XXX'; init cluster (id=1, comment = 'Cayenne'); # CREATE SET try { create set (id = 1, origin = 1, comment = 'Set 1 for replication'); } on error { echo 'Could not create subscription set 1 for replication!'; exit -1; } # SET ADD TABLE echo 'Subscription set 1 created'; echo 'Adding tables to the subscription set'; set add table (set id = 1, origin = 1, id = 1, full qualified name = 'public.AccessRoles', comment = 'Table public.AccessRoles with primary key'); ... and so on for 85 tables and 89 sequences... # STORE NODE store node (id = 2, event node = 1, comment = 'Node 2 - Jalapeno'); echo 'Set up replication nodes'; # STORE PATH echo 'Next: configure paths for each node/origin'; store path (server = 2, client = 1, conninfo = 'host=194.24.250.143 dbname=laterooms user=XXX port=5432 password=XXX'); store path (server = 1, client = 2, conninfo = 'host=194.24.250.137 dbname=laterooms user=XXX port=5432 password=XXX'); # STORE LISTEN store listen (origin = 1, receiver = 2, provider = 1); store listen (origin = 2, receiver = 1, provider = 2); echo 'Replication nodes prepared'; echo 'Please start a slon replication daemon for each node'; So, then .. Node1$ slon replication "host=194.24.250.137 dbname=laterooms user=XXX password=XXX" Node2$ slon replication "host=194.24.250.143 dbname=laterooms user=XXX password=XXX" then watching the slon log on Node2.... 2006-04-01 01:24:09 BST CONFIG main: slon version 1.1.5 starting up 2006-04-01 01:24:09 BST CONFIG main: local node id = 2 2006-04-01 01:24:09 BST CONFIG main: launching sched_start_mainloop 2006-04-01 01:24:09 BST CONFIG main: loading current cluster configuration 2006-04-01 01:24:09 BST CONFIG storeNode: no_id=1 no_comment='Cayenne' 2006-04-01 01:24:09 BST CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="host=194.24.250.137 dbname=laterooms user=XXX port=5432 password=XXX" pa_connretry=10 2006-04-01 01:24:09 BST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2006-04-01 01:24:09 BST CONFIG storeSet: set_id=1 set_origin=1 set_comment='Set 1 for replication' 2006-04-01 01:24:09 BST WARN remoteWorker_wakeup: node 1 - no worker thread 2006-04-01 01:24:09 BST CONFIG main: configuration complete - starting threads 2006-04-01 01:24:09 BST DEBUG1 localListenThread: thread starts 2006-04-01 01:24:09 BST CONFIG enableNode: no_id=1 2006-04-01 01:24:09 BST DEBUG1 remoteWorkerThread_1: thread starts 2006-04-01 01:24:09 BST DEBUG1 remoteListenThread_1: thread starts 2006-04-01 01:24:09 BST DEBUG1 cleanupThread: thread starts 2006-04-01 01:24:09 BST DEBUG1 main: running scheduler mainloop 2006-04-01 01:24:09 BST DEBUG1 syncThread: thread starts 2006-04-01 01:24:09 BST DEBUG1 remoteListenThread_1: connected to 'host=194.24.250.137 dbname=laterooms user=XXX port=5432 password=XXX' 2006-04-01 01:24:09 BST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2006-04-01 01:24:09 BST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2006-04-01 01:24:09 BST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 Now I submit the magic 'subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);' 2006-04-01 01:25:45 BST CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f' 2006-04-01 01:25:45 BST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2006-04-01 01:25:45 BST DEBUG1 copy_set 1 2006-04-01 01:25:45 BST DEBUG1 remoteWorkerThread_1: connected to provider DB 2006-04-01 01:34:31 BST DEBUG1 cleanupThread: 0.018 seconds for cleanupEvent() 2006-04-01 01:34:31 BST DEBUG1 cleanupThread: 0.006 seconds for delete logs 2006-04-01 01:44:58 BST DEBUG1 cleanupThread: 0.006 seconds for cleanupEvent() 2006-04-01 01:44:58 BST DEBUG1 cleanupThread: 0.004 seconds for delete logs 2006-04-01 01:55:33 BST DEBUG1 cleanupThread: 0.003 seconds for cleanupEvent() 2006-04-01 01:55:33 BST DEBUG1 cleanupThread: 0.003 seconds for delete logs 2006-04-01 02:06:26 BST DEBUG1 cleanupThread: 0.003 seconds for cleanupEvent() 2006-04-01 02:06:26 BST DEBUG1 cleanupThread: 0.005 seconds for delete logs 2006-04-01 02:16:49 BST DEBUG1 cleanupThread: 0.005 seconds for cleanupEvent() 2006-04-01 02:16:49 BST DEBUG1 cleanupThread: 0.004 seconds for delete logs 2006-04-01 02:27:28 BST DEBUG1 cleanupThread: 0.003 seconds for cleanupEvent() 2006-04-01 02:27:28 BST DEBUG1 cleanupThread: 0.004 seconds for delete logs 2006-04-01 02:39:00 BST DEBUG1 cleanupThread: 0.006 seconds for cleanupEvent() 2006-04-01 02:39:00 BST DEBUG1 cleanupThread: 0.007 seconds for delete logs 2006-04-01 02:49:33 BST DEBUG1 cleanupThread: 0.003 seconds for cleanupEvent() 2006-04-01 02:49:33 BST DEBUG1 cleanupThread: 0.005 seconds for delete logs 2006-04-01 02:59:54 BST DEBUG1 cleanupThread: 0.002 seconds for cleanupEvent() 2006-04-01 02:59:54 BST DEBUG1 cleanupThread: 0.005 seconds for delete logs 2006-04-01 03:10:14 BST DEBUG1 cleanupThread: 0.003 seconds for cleanupEvent() 2006-04-01 03:10:14 BST DEBUG1 cleanupThread: 0.007 seconds for delete logs 2006-04-01 03:13:46 BST DEBUG1 remoteWorkerThread_1: disconnected from provider DB 2006-04-01 03:13:46 BST DEBUG1 copy_set 1 done in 6480.502 seconds 2006-04-01 03:13:46 BST CONFIG enableSubscription: sub_set=1 2006-04-01 03:13:58 BST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2006-04-01 03:13:58 BST DEBUG1 remoteWorkerThread_1: helper thread for provider 1 created 2006-04-01 03:13:58 BST DEBUG1 remoteWorkerThread_1: connected to data provider 1 on 'host=194.24.250.137 dbname=laterooms user=XXX port=5432 password=XXXXXXXXX' At this point I would be expecting Node2 to start catching up with whatever Node1's been up to during the 2 hours of the copy_set.. but it does not. If I run this: $ tcpdump -i bond0 -n host 194.24.250.143 and port 5432 -s 1500 -w out Then do a trivial UPDATE of one row on a table on Node1, and let tcpdump run for 30 seconds, the only queries I see sent across are "select ev_origin, ev_seqno, ev_timestamp..." - I'm not seeing any real data. The thing that worries me most is that 'slon' on Node2 is using 100% CPU time and apparently doing nothing at all. There is no CPU use for any postgres process, nor any disk activity (confirmed with iostat). Related to this /must/ be the fact that there's a postgres process on /both/ master+slave marked as 'idle in transaction'. I've asked our devels if they're using any transactions in the codebase and they've told me "We don't, but NHibernate likely does" if that makes any difference. How can I find out a bit more about what transaction the system is stuck in, since it may be causing some kind of busy-wait loop in slon hence the 100% CPU use? Node1 is an Quad Xeon machine, and Node2 is an AIX 5.3 box (yep, PG compiled --with-thread-safety - slony complained loudly when I didn't have this flag set, so I recompiled PG.) and PG 8.1.3 + Slony 1.1.5 are in use on both machines. I've followed all the instructions and I understand what each step does - I just can't work out what the problem is :( Finally, just after copy_set finished, this was noticed in the postgres log: 2006-04-01 03:13:57 BST LOG: duration: 11414.154 ms statement: select "_replication".enableSubscription(1, 1, 2); notify "_replication_Event"; notify "_replication_Confirm"; insert into "_replication".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1, ev_data2, ev_data3, ev_data4 ) values ('1', '249', '2006-04-01 01:25:45.586148', '196845367', '196845368', '', 'ENABLE_SUBSCRIPTION', '1', '1', '2', 'f'); insert into "_replication".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 2, '249', now()); commit transaction; Yours in a mix of hope + desperation, Gavin.
- Previous message: [Slony1-general] Initial COPY works, but no new data since.
- Next message: [Slony1-general] Initial COPY works, but no new data since.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list