Tue Oct 4 23:17:11 PDT 2005
- Previous message: [Slony1-general] Re: Slony1-1.0.5 Failover does not work - replication set isn't being moved
- Next message: [Slony1-general] Re: Slony1-1.0.5 Failover does not work - replication set isn't being moved
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The sl_event table on Node 2 contains a FAILOVER_SET event but node 3
(the backup node specified in the failover command) does not. Should
the backup node's sl_event table contain the FAILOVER_SET?

sl_event on node 2 contains a FAILOVER_SET:

        ev_timestamp        | ev_origin | ev_seqno |       ev_type
----------------------------+-----------+----------+---------------------
 2005-10-04 17:49:10.487603 |         2 |        1 | STORE_PATH
 2005-10-04 17:49:10.70457  |         2 |        2 | STORE_PATH
 2005-10-04 17:49:10.712416 |         2 |        3 | STORE_LISTEN
 2005-10-04 17:49:10.77891  |         2 |        4 | STORE_LISTEN
 2005-10-04 17:49:38.146642 |         2 |        5 | SUBSCRIBE_SET
 2005-10-04 17:49:05.608095 |         1 |      306 | STORE_NODE
 2005-10-04 17:49:05.608095 |         1 |      307 | ENABLE_NODE
 2005-10-04 17:49:08.029042 |         1 |      308 | STORE_NODE
 2005-10-04 17:49:08.029042 |         1 |      309 | ENABLE_NODE
 2005-10-04 17:49:10.641208 |         1 |      310 | STORE_PATH
 2005-10-04 17:49:10.679501 |         1 |      311 | STORE_PATH
 2005-10-04 17:49:10.722549 |         1 |      312 | STORE_LISTEN
 2005-10-04 17:49:10.751999 |         1 |      313 | STORE_LISTEN
 2005-10-04 17:55:02.413185 |         2 |        6 | SYNC
 2005-10-04 17:49:42.44082  |         1 |      314 | ENABLE_SUBSCRIPTION
 2005-10-04 17:49:10.60801  |         3 |        1 | STORE_PATH
 2005-10-04 17:49:42.769833 |         1 |      315 | ENABLE_SUBSCRIPTION
 2005-10-04 17:49:10.678128 |         3 |        2 | STORE_PATH
 2005-10-04 17:49:10.713706 |         3 |        3 | STORE_LISTEN
 2005-10-04 17:49:10.743235 |         3 |        4 | STORE_LISTEN
 2005-10-04 17:49:38.417454 |         3 |        5 | SUBSCRIBE_SET
 2005-10-04 17:49:52.680621 |         1 |      316 | SYNC
 2005-10-04 17:50:53.010532 |         1 |      317 | SYNC
 2005-10-04 17:51:53.112317 |         1 |      318 | SYNC
 2005-10-04 17:52:53.146222 |         1 |      319 | SYNC
 2005-10-04 17:53:53.192119 |         1 |      320 | SYNC
 2005-10-04 17:54:53.602106 |         1 |      321 | SYNC
 2005-10-04 17:55:53.710807 |         1 |      322 | SYNC
 2005-10-04 17:56:02.893106 |         2 |        7 | SYNC
 2005-10-04 17:56:42.786823 |         3 |        6 | SYNC
 2005-10-04 17:56:53.833985 |         1 |      323 | SYNC
 2005-10-04 17:57:03.007883 |         2 |        8 | SYNC
 2005-10-04 17:57:43.692981 |         3 |        7 | SYNC
 2005-10-04 17:57:53.902912 |         1 |      324 | SYNC
 2005-10-04 17:58:03.062867 |         2 |        9 | SYNC
 2005-10-04 17:58:43.736478 |         3 |        8 | SYNC
 2005-10-04 17:58:53.953325 |         1 |      325 | SYNC
 2005-10-04 17:59:03.112996 |         2 |       10 | SYNC
 2005-10-04 17:59:43.77303  |         3 |        9 | SYNC
 2005-10-04 17:59:54.095892 |         1 |      326 | SYNC
 2005-10-04 18:00:03.155204 |         2 |       11 | SYNC
 2005-10-04 18:00:43.810793 |         3 |       10 | SYNC
 2005-10-04 18:01:03.196571 |         2 |       12 | SYNC
 2005-10-04 18:01:43.865925 |         3 |       11 | SYNC
 2005-10-04 18:02:03.216029 |         2 |       13 | SYNC
 2005-10-04 18:02:43.905505 |         3 |       12 | SYNC
 2005-10-04 18:03:03.238632 |         2 |       14 | SYNC
 2005-10-04 18:03:38.947704 |         1 |      327 | FAILOVER_SET
 2005-10-04 18:03:48.819508 |         3 |       13 | SYNC
 2005-10-04 18:03:49.921361 |         2 |       15 | SYNC
 2005-10-04 18:04:48.875801 |         3 |       14 | SYNC
 2005-10-04 18:04:49.970829 |         2 |       16 | SYNC
 2005-10-04 18:05:48.92941  |         3 |       15 | SYNC
 2005-10-04 18:05:49.985511 |         2 |       17 | SYNC
 2005-10-04 18:06:48.963277 |         3 |       16 | SYNC
 2005-10-04 18:06:49.998737 |         2 |       18 | SYNC
 2005-10-04 18:07:49.033346 |         3 |       17 | SYNC
 2005-10-04 18:07:50.028334 |         2 |       19 | SYNC
 2005-10-04 18:08:49.051861 |         3 |       18 | SYNC
 2005-10-04 18:08:50.056542 |         2 |       20 | SYNC
 2005-10-04 18:09:49.075309 |         3 |       19 | SYNC
 2005-10-04 18:09:50.093277 |         2 |       21 | SYNC
(62 rows)

sl_event on node 3 (backup node) does not have the FAILOVER_SET:

        ev_timestamp        | ev_origin | ev_seqno |       ev_type
----------------------------+-----------+----------+---------------------
 2005-10-04 17:49:10.60801  |         3 |        1 | STORE_PATH
 2005-10-04 17:49:10.678128 |         3 |        2 | STORE_PATH
 2005-10-04 17:49:10.713706 |         3 |        3 | STORE_LISTEN
 2005-10-04 17:49:10.743235 |         3 |        4 | STORE_LISTEN
 2005-10-04 17:49:38.417454 |         3 |        5 | SUBSCRIBE_SET
 2005-10-04 17:49:10.487603 |         2 |        1 | STORE_PATH
 2005-10-04 17:49:08.029042 |         1 |      308 | STORE_NODE
 2005-10-04 17:49:10.70457  |         2 |        2 | STORE_PATH
 2005-10-04 17:49:08.029042 |         1 |      309 | ENABLE_NODE
 2005-10-04 17:49:10.712416 |         2 |        3 | STORE_LISTEN
 2005-10-04 17:49:10.641208 |         1 |      310 | STORE_PATH
 2005-10-04 17:49:10.77891  |         2 |        4 | STORE_LISTEN
 2005-10-04 17:49:10.679501 |         1 |      311 | STORE_PATH
 2005-10-04 17:49:38.146642 |         2 |        5 | SUBSCRIBE_SET
 2005-10-04 17:49:10.722549 |         1 |      312 | STORE_LISTEN
 2005-10-04 17:55:02.413185 |         2 |        6 | SYNC
 2005-10-04 17:56:02.893106 |         2 |        7 | SYNC
 2005-10-04 17:49:10.751999 |         1 |      313 | STORE_LISTEN
 2005-10-04 17:49:42.44082  |         1 |      314 | ENABLE_SUBSCRIPTION
 2005-10-04 17:56:42.786823 |         3 |        6 | SYNC
 2005-10-04 17:57:03.007883 |         2 |        8 | SYNC
 2005-10-04 17:49:42.769833 |         1 |      315 | ENABLE_SUBSCRIPTION
 2005-10-04 17:49:52.680621 |         1 |      316 | SYNC
 2005-10-04 17:50:53.010532 |         1 |      317 | SYNC
 2005-10-04 17:51:53.112317 |         1 |      318 | SYNC
 2005-10-04 17:52:53.146222 |         1 |      319 | SYNC
 2005-10-04 17:53:53.192119 |         1 |      320 | SYNC
 2005-10-04 17:54:53.602106 |         1 |      321 | SYNC
 2005-10-04 17:55:53.710807 |         1 |      322 | SYNC
 2005-10-04 17:56:53.833985 |         1 |      323 | SYNC
 2005-10-04 17:57:43.692981 |         3 |        7 | SYNC
 2005-10-04 17:57:53.902912 |         1 |      324 | SYNC
 2005-10-04 17:58:03.062867 |         2 |        9 | SYNC
 2005-10-04 17:58:43.736478 |         3 |        8 | SYNC
 2005-10-04 17:58:53.953325 |         1 |      325 | SYNC
 2005-10-04 17:59:03.112996 |         2 |       10 | SYNC
 2005-10-04 17:59:43.77303  |         3 |        9 | SYNC
 2005-10-04 17:59:54.095892 |         1 |      326 | SYNC
 2005-10-04 18:00:03.155204 |         2 |       11 | SYNC
 2005-10-04 18:00:43.810793 |         3 |       10 | SYNC
 2005-10-04 18:01:03.196571 |         2 |       12 | SYNC
 2005-10-04 18:01:43.865925 |         3 |       11 | SYNC
 2005-10-04 18:02:03.216029 |         2 |       13 | SYNC
 2005-10-04 18:02:43.905505 |         3 |       12 | SYNC
 2005-10-04 18:03:03.238632 |         2 |       14 | SYNC
 2005-10-04 18:03:48.819508 |         3 |       13 | SYNC
 2005-10-04 18:03:49.921361 |         2 |       15 | SYNC
 2005-10-04 18:04:48.875801 |         3 |       14 | SYNC
 2005-10-04 18:04:49.970829 |         2 |       16 | SYNC
 2005-10-04 18:05:48.92941  |         3 |       15 | SYNC
 2005-10-04 18:05:49.985511 |         2 |       17 | SYNC
 2005-10-04 18:06:48.963277 |         3 |       16 | SYNC
 2005-10-04 18:06:49.998737 |         2 |       18 | SYNC
 2005-10-04 18:07:49.033346 |         3 |       17 | SYNC
 2005-10-04 18:07:50.028334 |         2 |       19 | SYNC
 2005-10-04 18:08:49.051861 |         3 |       18 | SYNC
 2005-10-04 18:08:50.056542 |         2 |       20 | SYNC
 2005-10-04 18:09:49.075309 |         3 |       19 | SYNC
 2005-10-04 18:09:50.093277 |         2 |       21 | SYNC
 2005-10-04 18:10:49.100012 |         3 |       20 | SYNC
 2005-10-04 18:10:50.117138 |         2 |       22 | SYNC
(61 rows)

On 10/4/05, Fiel Cabral <e4696wyoa63emq6w3250kiw60i45e1 at gmail.com> wrote:
>
> The problem persists after the node IDs were changed from [1, 2, 3] to
> [10, 20, 30].
>
> Inside gdb, the failedNode2 query did not return an error (function
> return value was 0).
>
> Node 2 was able to move the set_origin = node 3.
> Node 3 is stuck with set_origin = node 1.
>
> On 10/4/05, Fiel Cabral <e4696wyoa63emq6w3250kiw60i45e1 at gmail.com> wrote:
> >
> > Thanks Elein. I'll run gdb and step through slonik_failed_node to
> > (maybe) see if failedNode2 is failing.
> >
> > On 10/4/05, elein <elein at varlena.com> wrote:
> > >
> > > Fiel,
> > >
> > > In my own tests, with node 10->20->30, failover from 10 to 20 failed
> > > because node 30 was unusable and had to be recreated from scratch.
> > > This is a serious bug in my book.
> > >
> > > In one case the problem seemed to be dropping the first node
> > > "too soon". I have not tested that case so I don't know that
> > > this was the problem.
> > >
> > > What I have verified is that the third node never received any message
> > > regarding the failover and did not change its information
> > > to get its table set from the new origin, 20.
> > >
> > > Also, try not to use Node 1, 2, 3. Node 1 has some special meaning
> > > in some cases that you will want to avoid.
> > >
> > > We are with you, not ignoring you.
> > >
> > > --elein
> > >
> > > On Tue, Oct 04, 2005 at 11:13:19AM -0400, Fiel Cabral wrote:
> > > > Right after running the failover command I issue the DROP NODE
> > > > command to drop node 1. slonik prints an error message and exits
> > > > with return value 12:
> > > >
> > > >   sys:17: TRY: drop node
> > > >   sys:19: PGRES_FATAL_ERROR select "_whatever".dropNode(1); - ERROR: Slony-I: Node 1 is still origin of one or more sets
> > > >
> > > > Something should have changed the origin to node 3 but it isn't
> > > > happening.
> > > >
> > > > On 10/4/05, Fiel Cabral <e4696wyoa63emq6w3250kiw60i45e1 at gmail.com> wrote:
> > > >
> > > > I have 3 nodes. Nodes 2 and 3 are subscribers of node 1 and I'm
> > > > trying to failover from node 1 to node 3. The failover command
> > > > succeeds but the database of node 3 is still read-only and the
> > > > origin is still node 1. I don't have the same problem when doing
> > > > failover with only two nodes because the set is moved immediately
> > > > by failedNode.
> > > >
> > > > failedNode (in the code below) is able to set the provider
> > > > successfully.
> > > >
> > > > Some code elsewhere is actually moving the replication set. Where
> > > > is that code? Is it in slon or slonik or in the sql scripts?
> > > >
> > > > How do I find out that slon caught the signal and is doing the
> > > > right thing in response to the signal?
> > > >
> > > >   raise notice ''failedNode: set % has other direct receivers - change providers only'', v_row.set_id;
> > > >   -- ----
> > > >   -- Backup node is not the only direct subscriber. This
> > > >   -- means that at this moment, we redirect all direct
> > > >   -- subscribers to receive from the backup node, and the
> > > >   -- backup node itself to receive from another one.
> > > >   -- The admin utility will wait for the slon engine to
> > > >   -- restart and then call failedNode2() on the node with
> > > >   -- the highest SYNC and redirect this to it on
> > > >   -- backup node later.
> > > >   -- ----
> > > >   ... etc ...
> > > >
> > > >   -- ----
> > > >   -- Make sure the node daemon will restart
> > > >   -- ----
> > > >   notify "_@CLUSTERNAME@_Restart";
> > > >
> > > > -Fiel
> > > >
> > > > _______________________________________________
> > > > Slony1-general mailing list
> > > > Slony1-general at gborg.postgresql.org
> > > > http://gborg.postgresql.org/mailman/listinfo/slony1-general
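
The two sl_event listings at the top of this message can also be compared mechanically rather than by eye. The following is only a minimal sketch, assuming each listing has been saved as psql text output. The helper names `parse_events` and `missing_on` are hypothetical (they are not part of Slony-I), and the sample rows are a short excerpt of the listings in this message.

```python
def parse_events(psql_output):
    """Map (ev_origin, ev_seqno) -> ev_type for each data row of psql output."""
    events = {}
    for line in psql_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header, the dashed separator and the "(N rows)" footer.
        if len(parts) != 4 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        events[(int(parts[1]), int(parts[2]))] = parts[3]
    return events


def missing_on(reference, other):
    """Events present in `reference` but absent from `other`, sorted by key."""
    return sorted((key, ev_type) for key, ev_type in reference.items()
                  if key not in other)


# Excerpt of the node 2 listing (assumed saved from psql).
NODE2 = """\
        ev_timestamp        | ev_origin | ev_seqno |       ev_type
----------------------------+-----------+----------+---------------------
 2005-10-04 17:59:54.095892 |         1 |      326 | SYNC
 2005-10-04 18:03:38.947704 |         1 |      327 | FAILOVER_SET
 2005-10-04 18:03:48.819508 |         3 |       13 | SYNC
(3 rows)
"""

# Excerpt of the node 3 listing; the FAILOVER_SET row is absent.
NODE3 = """\
        ev_timestamp        | ev_origin | ev_seqno |       ev_type
----------------------------+-----------+----------+---------------------
 2005-10-04 17:59:54.095892 |         1 |      326 | SYNC
 2005-10-04 18:03:48.819508 |         3 |       13 | SYNC
(2 rows)
"""

node2 = parse_events(NODE2)
node3 = parse_events(NODE3)
print(missing_on(node2, node3))  # prints [((1, 327), 'FAILOVER_SET')]
```

Keying on (ev_origin, ev_seqno) rather than on the timestamp is a design choice of this sketch: the two listings show the same events in a different row order, so a positional comparison would not line up.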