murphy pope pope_murphy
Mon Feb 28 22:49:15 PST 2005
I'm trying to get a handle on how this stuff all works so I can explain
it to some team members.  I know that it does work, but it still looks
like magic to me :-)  

I apologize in advance for the length of this message, but I would
greatly appreciate any help.  The best answer(s) would point me to some
already existing documentation.  Thanks in advance.

Here's the scenario:

Let's say that I have a small chain of auto-part stores.  One in
Homeville, one in Nearville, and one in Farville (three imaginary
cities).  Each store keeps an inventory table that keeps track of the
parts in that store.  The Homeville store keeps its inventory table in a
database named homeville, the Nearville store keeps its inventory in a
database named nearville, and the Farville store keeps its inventory in
a database named farville.

I want the Homeville store to replicate its homeville.inventory table to
Nearville and Farville. Likewise, Nearville and Farville each replicate
their databases to the other two.  That way, any store can find
(reasonably) current stock levels for all three stores.  

Question 1 - Is this do-able? Can a single node be both a master and a
slave (Homeville would be a master for homeville.inventory and would
subscribe to nearville.inventory and farville.inventory)?

Assuming the answer to question 1 is yes, here's how I think I should
set up my replication cluster (I'll call it 'stores').  First, I need a
single cluster with three nodes (homeville, nearville, and farville).  

The Homeville node creates a single SET(set number 1) that exposes
homeville.inventory to anyone who's interested.  
The Nearville node creates another SET (set number 2) that exposes
nearville.inventory.  
The Farville node creates a third SET (set number 3) that exposes
farville.inventory.

Homeville subscribes to sets 2 and 3, Nearville subscribes to sets 1 and
3, and Farville subscribes to sets 1 and 2.
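
In slonik terms, I'm guessing the whole thing would look something like
this (hostnames are made up, I'm assuming inventory has a primary key,
and I've left out the store path/store listen plumbing between the
nodes, so I may well have details wrong):

    cluster name = stores;

    node 1 admin conninfo = 'dbname=homeville host=homeville';
    node 2 admin conninfo = 'dbname=nearville host=nearville';
    node 3 admin conninfo = 'dbname=farville host=farville';

    init cluster (id = 1, comment = 'Homeville');
    store node (id = 2, comment = 'Nearville');
    store node (id = 3, comment = 'Farville');

    # one set per origin, each exposing that store's inventory table
    create set (id = 1, origin = 1, comment = 'homeville inventory');
    set add table (set id = 1, origin = 1, id = 1,
        fully qualified name = 'public.inventory');

    create set (id = 2, origin = 2, comment = 'nearville inventory');
    set add table (set id = 2, origin = 2, id = 2,
        fully qualified name = 'public.inventory');

    create set (id = 3, origin = 3, comment = 'farville inventory');
    set add table (set id = 3, origin = 3, id = 3,
        fully qualified name = 'public.inventory');

    # every node subscribes to the other two sets
    subscribe set (id = 2, provider = 2, receiver = 1, forward = no);
    subscribe set (id = 3, provider = 3, receiver = 1, forward = no);
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    subscribe set (id = 3, provider = 3, receiver = 2, forward = no);
    subscribe set (id = 1, provider = 1, receiver = 3, forward = no);
    subscribe set (id = 2, provider = 2, receiver = 3, forward = no);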

Question 2 - Did I get that right?  Can I do all of this in a single
replication cluster or do I need three separate clusters?

Question 3 - Since Farville is a long way away from Homeville, I want
Farville to pull its copy of homeville.inventory from Nearville instead
of directly from Homeville.  That's a cascading (or chained) slave,
right?  How do I do that?
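
My guess from the docs is that the provider in a subscribe set doesn't
have to be the origin, so two of the subscribe statements in my sketch
above would change to something like this (and I assume Nearville's own
subscription to set 1 then needs forward = yes so it keeps a copy of
the log around to hand on):

    # nearville must be allowed to forward set 1 ...
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
    # ... so farville can pull set 1 from nearville instead of homeville
    subscribe set (id = 1, provider = 2, receiver = 3, forward = no);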

Now the meaty stuff... I've been browsing through the Slony source
trying to understand how the magic works.  Here's my understanding so
far:

When I define set number 1 (homeville.inventory), slonik adds an AFTER
UPDATE/INSERT/DELETE trigger to my homeville.inventory table.  When I
modify a row in homeville.inventory, that trigger (_Slony_I_logTrigger)
records my changes in _stores.sl_log_1.  When the trigger writes to
sl_log_1, it also increments the sl_action_seq sequence.   
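
For what it's worth, I think I can watch this happen by changing a row
and then peeking at the log table (partno and qty are made-up column
names for my inventory table; the log_* columns are what I see in the
installed _stores schema):

    UPDATE inventory SET qty = qty - 1 WHERE partno = 'A100';

    SELECT log_actionseq, log_tableid, log_cmdtype, log_cmddata
      FROM _stores.sl_log_1
     ORDER BY log_actionseq DESC
     LIMIT 5;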

Question 4 - That's it - the trigger doesn't NOTIFY anybody and it
doesn't push any data to the slaves, is that correct?

The slon process that services node homeville (I'll call that
slon.homeville) wakes up every 10 seconds and checks sl_action_seq to
see if any modifications have been made to tables exposed by node
homeville.  If no modifications have been made, slon.homeville goes back
to sleep, except that every sixty seconds it emits a SYNC event even if
there isn't any work to do.
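
If I'm reading the slon options right, those two intervals are the -s
and -t switches (both in milliseconds), so I assume the homeville
daemon is effectively being run as:

    # check sl_action_seq every 10s, emit a SYNC at least every 60s
    slon -s 10000 -t 60000 stores 'dbname=homeville host=homeville'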

Question 5 - Why does slon.homeville sleep for a pre-determined (but
adjustable) amount of time?  Why doesn't it LISTEN for a NOTIFY thrown
by _Slony_I_logTrigger instead?  My guess is that slon.homeville would
get flooded by notifications (and the postgres processes would waste a
lot of time sending the notifications), is that right?

If slon.homeville wakes up and finds that sl_action_seq has been changed
(or that 60 seconds have elapsed), it creates a SYNC event.  From what I
can see, slon.homeville writes a row to the _stores.sl_event table and
then does a NOTIFY _storesEvent.  Here's where I start to get a little
confused.
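
Schematically, I picture the event generation boiling down to something
like this (this is my mental model of createEvent(), not the real
function body; I haven't verified the column list, the sequence name,
or the exact notification name):

    -- what I think _stores.createEvent('SYNC') roughly amounts to
    INSERT INTO _stores.sl_event
           (ev_origin, ev_seqno, ev_timestamp, ev_type)
    VALUES (1, nextval('_stores.sl_event_seq'), now(), 'SYNC');
    NOTIFY "_storesEvent";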

I know that slon.nearville has a connection to the homeville node (while
slon.nearville is running, I can see a backend process running on
homeville that's connected to slon.nearville).  I also know that the
node_1_listen thread in slon.nearville is LISTENing for a notification
on homeville._storesEvent (I can see that in homeville's pg_listener).
So, when homeville does a NOTIFY _storesEvent, slon.nearville gets the
notification. 
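
For the record, that check is nothing fancier than querying the
ordinary system catalog on homeville while slon.nearville is up:

    -- one row per condition somebody is LISTENing for
    SELECT relname, listenerpid FROM pg_listener;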

When slon.nearville receives the SYNC event, it reads through the
sl_log_1 table on homeville, converts each entry into an UPDATE, INSERT,
or DELETE statement, and executes that statement, effectively
replicating the modifications from homeville to nearville.  
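
If I squint at a log row I can almost see the statement it turns into.
I'm guessing at the exact encoding of log_cmddata here, but something
like this seems to be the idea:

    -- given an sl_log_1 row on homeville (encoding is my guess):
    --   log_tableid = 1           (the inventory table)
    --   log_cmdtype = 'U'         (an UPDATE)
    --   log_cmddata = 'qty=''41'' where partno=''A100'''
    -- slon.nearville would replay it locally as roughly:
    UPDATE inventory SET qty = '41' WHERE partno = 'A100';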
 
Question 6 - Why does slon.homeville have to write a SYNC event to
sl_event and then do a NOTIFY?  Why isn't the NOTIFY sufficient to tell
nearville that it needs to read through sl_log_1?  I guess it's because
the _storesEvent notification can signal replication data or it can
signal an administrative message - the sl_event record tells
slon.nearville which event actually occurred.  Right so far?

Question 7 - If slon.nearville isn't running, it obviously misses the
NOTIFY that slon.homeville signalled.  Is that what the 60-second
timeout thing is all about?  When slon.nearville finally gets up and
running, it gets a SYNC notification so that it can catch up.

Question 8 - How does homeville know that it can start cleaning out
sl_log_1?  homeville can't remove a modification record from sl_log_1
until it knows that all subscribers have pulled that modification.  Is
that what confirmations are all about?  Does homeville batch together
modification records (maybe each batch corresponds to a SYNC event?) and
keep track of which subscribers have pulled each batch?  
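
I do see an sl_confirm table that looks like it records exactly that
kind of bookkeeping; I'd expect a query like this to show how far each
node has gotten with homeville's events:

    -- highest homeville (node 1) event each node has confirmed
    SELECT con_received, max(con_seqno) AS last_confirmed
      FROM _stores.sl_confirm
     WHERE con_origin = 1
     GROUP BY con_received;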

Question 9 - How does homeville know that it can remove an entry from
sl_event?

Thanks again

      ++ Murphy

Question 10 - (sorry, one more bonus question).  Instead of keeping
three databases on each system, can I keep a single database and three
schemas?