cbbrowne at ca.afilias.info
Thu Feb 3 14:33:27 PST 2005
> I've been busy working on an example that shows an overly complex and
> advanced Slony-I replication cluster layout (12 nodes).  In the process
> of setting it up I have found all sorts of ways to introduce head
> scratching while trying to troubleshoot why events aren't propagating,
> most of it related to conninfo.
>
> Is there a valid reason that we need to specify conninfo in more than
> one place?  I would think the only place we need it is in the "node N
> admin conninfo" strings; everywhere else it can be implied by
> referencing the node id.  Or am I missing something rather fundamental
> here?

You're speaking of the distinction between 'node N admin conninfo' and
'store path (server=a, client=b, conninfo='whatever')', I presume?

Using the full _potential_ generality of it all would be madness.

But I think I can outline where there needs to be what you might _call_
redundancy.

1.  The 'node N admin conninfo' entries in the preamble represent the way
that the Slony-I administrator accesses all of the nodes from his (or her)
workstation.

2.  STORE PATH defines how the nodes talk to one another.
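
In slonik terms those are two different statements; roughly like this
(the database name, user, and host placeholders are just illustrative):

  # preamble: how the admin tool, run from my workstation, reaches node 1
  node 1 admin conninfo = 'dbname=db1 host=address-I-can-reach user=slony';

  # configuration: how node 2's slon daemon reaches node 1
  store path (server = 1, client = 2,
              conninfo = 'dbname=db1 host=address-node2-can-reach user=slony');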

Consider a Slony-I configuration of six nodes at two sites:

Site 1                                    Site 2

  db1                           ----------- db4
                              /
  db2               ---------               db5
                   /
  db3  ------------                         db6

Within site 1, the three nodes db1-db3 are tightly connected on a Gigabit
Ethernet LAN, with addresses 10.1.2.[1-3].  The same is true for db4-db6
at site 2, with addresses 10.1.3.[4-6].

But the only communication between the sites is a slower WAN link that
carries data from site 1 ("primary") to site 2 ("backup").  db4 can
potentially access any of db1-db3, at IP addresses 192.168.4.[1-3], and
db1-db3 can access db4 as 192.168.4.4.

The point of this configuration is that db1-db3 are the "main" cluster; if
a huge disaster befalls that site, there is a secondary site that can take
over.
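
So the STORE PATH entries have to use whichever addresses are visible to
the particular client node; a few representative ones (database names and
the "slony" user are invented for the sake of the sketch) might be:

  # within site 1, over the Gigabit LAN
  store path (server = 1, client = 2,
              conninfo = 'dbname=db1 host=10.1.2.1 user=slony');
  store path (server = 2, client = 1,
              conninfo = 'dbname=db2 host=10.1.2.2 user=slony');

  # across the WAN link, in both directions
  store path (server = 1, client = 4,
              conninfo = 'dbname=db1 host=192.168.4.1 user=slony');
  store path (server = 4, client = 1,
              conninfo = 'dbname=db4 host=192.168.4.4 user=slony');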

And note also the detail that both of those sites are remote, to me.  I'm
running my console somewhere else, and can only get at them all via SSH
tunnels, so my conninfo entries are of the form:

host=127.0.0.1 port=6001 - db1
host=127.0.0.1 port=6002 - db2
host=127.0.0.1 port=6003 - db3
host=127.0.0.1 port=6004 - db4
host=127.0.0.1 port=6005 - db5
host=127.0.0.1 port=6006 - db6

This leads to a bunch of pretty different conninfos.
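
Concretely, the preamble of my slonik scripts ends up looking something
like this (cluster and database names invented; the ports are the local
ends of the SSH tunnels):

  cluster name = demo;
  node 1 admin conninfo = 'dbname=db1 host=127.0.0.1 port=6001 user=slony';
  node 2 admin conninfo = 'dbname=db2 host=127.0.0.1 port=6002 user=slony';
  node 3 admin conninfo = 'dbname=db3 host=127.0.0.1 port=6003 user=slony';
  node 4 admin conninfo = 'dbname=db4 host=127.0.0.1 port=6004 user=slony';
  node 5 admin conninfo = 'dbname=db5 host=127.0.0.1 port=6005 user=slony';
  node 6 admin conninfo = 'dbname=db6 host=127.0.0.1 port=6006 user=slony';

and none of those addresses appear anywhere in the STORE PATH entries.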

Note that the 10.1.2.* addresses are _useless_ to me, in Toronto, and they
are equally useless at site #2.  Likewise, the 10.1.3.* addresses are
useless at site #1 and at my workstation in Toronto.

It's not a matter of redundancy; it's a matter of the "perspective" of how
nodes talk to one another.

In practice, I almost always find that the communications problems have to
do with the sl_listen network not being set up right.  This example is one
where it would be easy to get that "off."  It's pretty easy to get things
functioning if every node can talk to every other node directly; in this
example, that is not the case.
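
Concretely, that means adding STORE LISTEN entries that route events
through a node both sides can reach.  For instance, since db4 is the only
site-2 node with a path to site 1, db5 and db6 would hear about events
originating on node 1 via node 4, along the lines of (the exact set you
need depends on the paths you actually stored):

  # db5 and db6 receive node 1's events through node 4
  store listen (origin = 1, receiver = 5, provider = 4);
  store listen (origin = 1, receiver = 6, provider = 4);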


