[Slony1-general] 2d grid partitioned geographical database with local replication: practical with slony?

Thu Nov 24 08:05:13 PST 2005

Hello,

I'm new to Slony, but have been evaluating it to examine the feasibility of
splitting up a national database so that it can run on a farm of simple
servers.

Very briefly, I'll describe the scheme and my tests so far, and then ask my
questions:

------------------------------------------------

THE SCHEME
----------

a) The entire country is subdivided into a grid of cells (size to be
decided; let's say 2 km square for now).

b) All the objects in the database are located by grid reference.

c) All queries are designed to return results that are geographically close
(let's say no more than 2 km from a particular point). No query ever needs
to return data from more than one cell away.

d) Each cell maintains its own database.

e) A cell's database is the 'master' database for all data located within
the cell's boundaries.

f) The tables in each cell are slaved to the adjoining cells (with
Slony-I).

g) To provide cross-border searches (so that people living near the edge of
a cell don't see only half of what is nearby), queries served by a cell use
a union of its master tables and the sets slaved from the adjoining cells.

h) Cells are distributed across a server farm. The number of cells on each
server depends upon the activity in each cell and the capability of the
server. The worst-case scenario is that a single cell occupies its own
server. To start with, many cells (20 or so) may share a single server, but
they will be migrated to new servers as they become busier.
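Points (a) to (c) imply a simple lookup from a location to its cell, plus a check that a query whose radius is no larger than the cell size never reaches beyond the 3x3 block around the home cell. This is a minimal sketch, assuming flat national-grid coordinates in metres and the 2 km cell size suggested above; the function names and the coordinate convention are hypothetical.

```python
import math

CELL_SIZE_M = 2000  # assumed 2 km square cells, as suggested in point (a)


def cell_of(x_m: float, y_m: float, size: float = CELL_SIZE_M) -> tuple:
    """Return the (column, row) of the cell containing a point.

    Assumes x/y are metres on a flat grid; real grid references would
    need converting to this form first (hypothetical convention).
    """
    return (math.floor(x_m / size), math.floor(y_m / size))


def cells_for_query(x_m, y_m, radius_m, size=CELL_SIZE_M):
    """Return every cell overlapped by the bounding box of a circular query."""
    c0, r0 = cell_of(x_m - radius_m, y_m - radius_m, size)
    c1, r1 = cell_of(x_m + radius_m, y_m + radius_m, size)
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}


def within_one_cell(x_m, y_m, radius_m, size=CELL_SIZE_M):
    """True if the query touches only the home cell and its 8 neighbours."""
    home_c, home_r = cell_of(x_m, y_m, size)
    return all(
        abs(c - home_c) <= 1 and abs(r - home_r) <= 1
        for c, r in cells_for_query(x_m, y_m, radius_m, size)
    )
```

Because floor is monotonic, any radius up to the cell size keeps the query inside the neighbouring cells, which is exactly what lets a cell answer from its own tables plus the sets slaved from its neighbours. A radius even slightly larger than the cell size can reach a second ring of cells.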

What this means in practice is best shown with a diagram:

----------------------------
|        |        |        |
| Cell A | Cell B | Cell C |
|        |        |        |
----------------------------
|        |        |        |
| Cell D | Cell E | Cell F |
|        |        |        |
----------------------------
|        |        |        |
| Cell G | Cell H | Cell I |
|        |        |        |
----------------------------

Considering Cell E:

a) Cell E's database is the master database for information located
geographically within Cell E.

b) The 8 adjoining cells slave this data.

c) Cell E slaves data from all 8 adjoining cells.
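The topology described for Cell E generalises to the whole grid: every cell is the origin of one set, and each adjoining cell subscribes to it. The sketch below, with hypothetical function names and cells indexed by (column, row), only enumerates the (origin, subscriber) pairs; it does not generate any Slony-I configuration. Cells on the edge of the grid have fewer than 8 neighbours, so for the three-cell test row it yields A->B, B->A, B->C and C->B, and for the full 3x3 grid above it yields 40 pairs, 8 of which originate at Cell E.

```python
from itertools import product


def neighbours(col, row, cols, rows):
    """Yield the up-to-8 cells adjoining (col, row) inside a cols x rows grid."""
    for dc, dr in product((-1, 0, 1), repeat=2):
        if (dc, dr) == (0, 0):
            continue  # a cell does not subscribe to itself
        c, r = col + dc, row + dr
        if 0 <= c < cols and 0 <= r < rows:
            yield (c, r)


def subscriptions(cols, rows):
    """Return (origin, subscriber) pairs: each cell's set is slaved to
    every adjoining cell, as in points (f) and (g) of the scheme."""
    pairs = []
    for origin in product(range(cols), range(rows)):
        for subscriber in neighbours(*origin, cols, rows):
            pairs.append((origin, subscriber))
    return pairs


if __name__ == "__main__":
    # Three-cell test row: A=(0,0), B=(1,0), C=(2,0).
    for origin, subscriber in subscriptions(3, 1):
        print(origin, "->", subscriber)
```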

---------------------------------------------------------

MY TESTS SO FAR
---------------

So far, I've manually set up a test case with just the top row of the
example above (three cells, A, B and C, in a row).

It works. Add something to A, it appears in B. Add something to B, it
appears in A and C. Add something to C, it appears in B.

So far, so good.

---------------------------------------------------------

MY CONCERNS
-----------

1. For Cell E, the number of slon threads would be 18 (it seems to be two
for each node). Is this within acceptable limits, or is it a really bad
idea? What are the system overheads? In my example of a single server
running 20 (not very active) cells, this would inflate to 360 processes.

2. To provide 'cross-border' local searches, all 'cross-border data' is
effectively slaved 8 times. Should I be concerned by this?

The aim is that both the web serving and the database for a particular cell
are managed by the same server, and that, because the cells are small, the
local data will easily fit within RAM (allowing for Apache and other
services), even with the local slaved copies of adjacent cells' data.

3. Has this been tried before with disastrous consequences?!
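The arithmetic behind concerns 1 and 2 can be made explicit. This is a rough sketch under loudly stated assumptions: the "two per node" figure is the one observed in the three-cell test, not a documented Slony-I constant; every cell is assumed to hold the same amount of data; and the sizes in the usage example are hypothetical placeholders, not measurements.

```python
def copies_of_cell_data(neighbour_count: int) -> int:
    """Concern 2: the origin copy plus one slaved copy per adjoining cell.

    By symmetry of adjacency, (1 + n) is also how many cells' worth of
    data a cell with n neighbours holds, if all cells are the same size.
    """
    return 1 + neighbour_count


def estimated_slon_workers(neighbour_count: int, per_node: int = 2) -> int:
    """Concern 1: per_node workers for each node a cell deals with
    (itself plus its neighbours). per_node=2 is an observed figure,
    assumed here, not a documented Slony-I constant."""
    return per_node * (neighbour_count + 1)


def server_estimates(cell_neighbour_counts, own_mb, ram_mb, reserved_mb, per_node=2):
    """Return (workers, ram_needed_mb, fits) for one server's cells.

    own_mb is a hypothetical, uniform per-cell data size; reserved_mb is
    RAM set aside for Apache and other services.
    """
    workers = sum(estimated_slon_workers(n, per_node) for n in cell_neighbour_counts)
    ram_needed = sum(copies_of_cell_data(n) * own_mb for n in cell_neighbour_counts)
    return workers, ram_needed, ram_needed <= ram_mb - reserved_mb


if __name__ == "__main__":
    # 20 interior cells, hypothetical 50 MB per cell, 16 GB RAM, 2 GB reserved.
    print(server_estimates([8] * 20, own_mb=50, ram_mb=16384, reserved_mb=2048))
```

For 20 interior cells this reproduces the 360 figure from concern 1; cells on the edge of the grid have fewer neighbours and so lower counts. When neighbouring cells share a server, each cell's database holds its own slaved copy, so the same rows occupy RAM more than once.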

Many thanks,

Andy Ballingall
