Christopher Browne cbbrowne
Fri Dec 1 08:27:22 PST 2006
Victoria Parsons wrote:
>
> Hi,
>
> I have been using slony to replicate two databases from a master
> machine to a varied number of slaves on a production system. The
> maximum that has been tested is 12. I have been asked how many we
> could get up to. From following the mailing list I have got an idea in
> my head of no more than about 20. This is because of the increased CPU
> each slon daemon uses. I know this could be increased to some extent
> by getting a more powerful machine for my master.
>
> There is talk here of replicating two databases to 1024 machines. I'm
> pretty sure that will fall over in a big heap. Has anyone ever tried
> that many? I have never used the log shipping method - would that help
> by reducing load on the master? Also I run all slon daemons from the
> master server. Would it become more scalable if I moved the slon
> daemons to each slave in the system?
>
> If it's crazy to try to get that many machines replicating, then just
> scream at me now and we'll sort out a different way to send data.
>
Each slave will put some load on the provider it is requesting data
from, which will limit the number of direct subscribers you can set up.

Of course, using cascaded subscribers (e.g. node 1 feeds node 2, and
node 2 then feeds node 3) can alleviate this particular problem.

Indeed, you could keep Slony-I-related load on the "master", let's say
it's node #1, to a minimum by setting up node #2 which subscribes to
node #1, and then have further subscribers feed off node #2.  The number
of nodes could further multiply if the subscription arrangement were set
up something like:

#1 feeds #2

#2 feeds #3-#8  (six nodes)

#3 feeds #9-#14 (another six nodes)

#4 feeds #15-#20 (another six nodes)

and so forth; once #5 through #8 each feed six nodes of their own, this
rapidly gets you to 44 nodes, with node #1 only under the direct load
of feeding one other node.

One might hope for this to allow exponential growth of numbers of nodes;
with 6-way feeding, after the first node, you get growth like...

;; Total nodes reachable with LEVELS tiers of subscription, where the
;; first two tiers hold one node each and every later tier fans out
;; six ways.
(defun growth (levels)
  (let ((nodes 0) (clevel 0))
    (loop for i from 1 to levels
          do (if (<= i 2)
                 (progn (incf nodes) (setf clevel 1))
                 (progn (setf clevel (* clevel 6))
                        (incf nodes clevel))))
    nodes))

(loop for i from 1 to 20
  do (format t "~D  ~D~%" i (growth i)))

1  1
2  2
3  8
4  44
5  260
6  1556
7  9332
8  55988
9  335924
10  2015540
11  12093236
12  72559412
13  435356468
14  2612138804
15  15672832820
16  94036996916
17  564221981492
18  3385331888948
19  20311991333684
20  121871948002100

That would seem to point to this being good news; in principle, you
ought to be able to support an arbitrary number of nodes in this
fashion, each provider being only under moderate load.

There is, alas, a fly in the ointment in the form of the event
communication that takes place.

Each node responds to every other node, confirming that it has received
each Slony-I event.  The proliferation of event confirmation messages
provides a workload that grows at a quadratic rate based on the number
of nodes, and which applies to every single node.  THAT would be what
prevents you from getting to 1024 nodes.
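To see concretely why that quadratic growth bites, here is a
back-of-envelope sketch (the function name is mine for illustration,
not anything in Slony-I):

```python
# Back-of-envelope sketch of the all-to-all confirmation traffic.
# Each of n nodes confirms every event to the n - 1 other nodes,
# so one event generates n * (n - 1) confirmations cluster-wide.

def confirmations_per_event(n):
    """Cluster-wide confirmation messages generated by one event."""
    return n * (n - 1)

for n in (12, 44, 1024):
    print(f"{n:5d} nodes -> {confirmations_per_event(n):,} confirmations")
# 12 nodes -> 132; 44 nodes -> 1,892; 1024 nodes -> 1,047,552
```

At 1024 nodes, every single event would generate over a million
confirmation messages, which is why the cascading trick alone doesn't
get you there.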

As has been mentioned, an alternative that can get you to 1024 without
having any obvious single bottleneck would be to use log shipping.  That
is, you serialize the update queries into files by running a subscriber
slon with the "-a" option.  Those files may be copied around in ways
that shouldn't run into any of the bottlenecks mentioned thus far.
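For concreteness, a rough sketch of what that might look like; the
paths, cluster name, and connection info below are placeholders of my
own, not anything from this thread:

```shell
# Hypothetical log-shipping sketch; all names/paths are placeholders.

# On a first-class subscriber, run slon with -a so each applied SYNC
# is also serialized as a file into the archive directory:
slon -a /var/lib/slony/archive replcluster \
     "dbname=mydb host=subscriber1 user=slony"

# Copy the archive files to downstream hosts (rsync, scp, whatever),
# then replay them in order against each log-shipped database:
for f in /var/lib/slony/archive/slony1_log_*.sql; do
    psql -d mydb -f "$f"
done
```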

Thus, you might set up some "reasonable" set of normal, first class
subscribers, perhaps on the order of a dozen of them.  Perhaps one per
major city, or one per continent, or one per region, or whatever.

Then you set up some (many?  most?)  of those subscribers to generate
log shipping data, and use that data to feed nodes further downstream,
where the logs get shipped from the nearest "major" node to various
smaller destinations.

Hope that helps...



More information about the Slony1-general mailing list