elein
Tue May 30 15:52:14 PDT 2006
On Tue, May 30, 2006 at 02:46:47PM -0400, Rod Taylor wrote:
> On Fri, 2006-05-26 at 21:09 +0000, Mark Stosberg wrote:
> > Hello,
> > 
> > I'm getting started with Slony 1.1.5, and have done a lot of testing, and
> > read a lot of docs, including the "Best Practices" section. Things have
> > gone fairly well.
> 
> There are a few issues with 1.1.5 you might run into if you replicate
> more than about 100GB of data, particularly if it is to multiple nodes.
> Earlier versions had more issues but they have been solved (thanks!).
> 
> If you have more than about 500GB of data on a heavily used OLTP
> database, I strongly advise posting a detailed plan of attack or hiring
> a consultant. There are lots of gotchas at that level of the game,
> simply because of how long every operation takes.

Some of the frontline consultants are also in this mess...

> 
> 1) Long running transactions. pg_dump, vacuum, and Slony itself (initial
> copy) all run in long running transactions and can play havoc with your
> regularly scheduled vacuum process.
>         
> You may find that the IO required as a result of the long running
> initial copy process is significant due to the lack of vacuum during
> that time. If you can, replicate large tables in separate sets at
> different times, and merge the sets after the initial copy has completed.
> 
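For example, a quick way to spot the long-running transactions that are
holding vacuum back (a sketch against the 8.1 pg_stat_activity view;
current_query is only populated if stats_command_string is on):

    -- Oldest active backends first; anything running for hours is
    -- preventing vacuum from reclaiming dead tuples behind it.
    SELECT procpid, usename, query_start, current_query
      FROM pg_stat_activity
     ORDER BY query_start
     LIMIT 5;
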
> 2) pg_listener bloat. This should be solved in 1.2, but in the meantime
> beware pg_listener bloat. The most surprising place this can bite you
> is in a system of 3 or more nodes when creating an additional node. The
> backlog in this table during the initial node creation (a long running
> transaction) can cripple the system.
> 
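To keep an eye on this while a long copy runs, something like the query
below works (a sketch; relpages is only as fresh as the last VACUUM or
ANALYZE of the catalogs):

    -- A handful of pages is normal; tens of thousands means the
    -- notification queue is bloating behind a long transaction.
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname = 'pg_listener';
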
> 3) Large tuples require approximately 4x their size in memory (source
> db, destination db, 2 copies in Slony) to be copied. If you have a
> 500MB tuple and the source and destination are on the same machine, it
> will require a minimum of 2GB of memory to accomplish the job. This is
> a dramatic improvement over the 1.0 days.
> 
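To find out ahead of time whether you have tuples big enough to matter,
a sketch along these lines (big_table and payload are placeholder names
for your own wide table and column):

    -- Widest tuple in bytes; multiply by roughly 4 to estimate the
    -- memory needed to push it through Slony on a single host.
    SELECT max(octet_length(payload)) AS widest_bytes
      FROM big_table;
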
> 4) Stampeding Slony. If for any reason Slony is unable to complete the
> replication work available within the timeframe allotted (5 minutes I
> think), Slony will abandon the first connection and establish a new one
> to retry.

Exactly what "replication work" do you mean?  One table? All tables being copied?
In my situation I have 6500*5 + 100 tables to copy.  No way is that going to be
completed in 5 minutes no matter that the tables are small.  (And no
I did not design the schema :)

> 
> The problem is the database doesn't necessarily get rid of the old
> connection. I've seen Slony with several hundred connections to the
> database from a single backend.
> 
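You can watch for this by counting connections per role (a sketch
against pg_stat_activity):

    -- Each slon should only need a handful of connections; a count in
    -- the hundreds means abandoned connections are piling up.
    SELECT usename, count(*) AS conns
      FROM pg_stat_activity
     GROUP BY usename
     ORDER BY conns DESC;
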
> With PostgreSQL 8.1, apply a CONNECTION LIMIT to the slony user to
> ensure it doesn't eat up all of the DB connections, preventing real
> work from happening.
> 
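Something along these lines (8.1 syntax; 'slony' and the limit of 20
are placeholders for your own replication role and headroom):

    -- Cap the replication role so stampeding slons cannot exhaust
    -- max_connections and lock out real users.
    ALTER USER slony CONNECTION LIMIT 20;
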
> I believe that fixing pg_listener issues (Slony 1.2 when released) may
> solve this problem as well.
> 
> 5) Large DBs will require some patches that are not in Slony 1.1.5 but
> are in CVS; without them, they will simply never create an up to date
> node. Probably not an issue unless you have > 100GB of data to
> replicate. Maximum group size is also something to look at here.

Are you saying Slony won't handle databases of >100GB?  Or tables?
If the database is larger than that, exactly what patches should be
added, and for exactly what result?  For the most part I am doing
production work and never apply patches that aren't in the main
release, for obvious reasons.  If there are crucial ones, though, I
need more details.

> 
> Consider shutting down slon for other subscribers while creating a new
> subscriber (e.g. if node 1 feeds node 2, consider shutting down slon
> for node 2 while setting up node 3 as a subscriber to node 1).
> 
> 6) If you have large tables (> 50GB without indexes), consider removing
> indexes other than the primary key before having that node subscribe to
> the set. Add the indexes back by hand after it is initialized, if
> they're necessary for the work you will be doing on that machine.
> 
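For instance (a sketch with placeholder table and index names; the
primary key index has to stay, since Slony needs it to apply rows):

    -- Before subscribing: drop secondary indexes so the initial COPY
    -- does not pay index maintenance costs row by row.
    DROP INDEX big_table_created_idx;
    DROP INDEX big_table_owner_idx;

    -- After the subscription has caught up: rebuild them in one pass.
    CREATE INDEX big_table_created_idx ON big_table (created);
    CREATE INDEX big_table_owner_idx ON big_table (owner_id);
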
> 7) Network communication problems play havoc with Slony and can result
> in symptoms similar to those of #4. If the admins will be doing network
> maintenance, I recommend shutting down the impacted nodes' slon daemons.
> 
> 8) Don't even try to use Slony with more than 3 to 4 nodes on an active
> dataset without having a dedicated forwarder to feed them. Dedicated
> means no user connections. Strictly slony work.
> 
>        1
>        |
>        2 (dedicated forwarder)
>    / / | \ \
>    3 4 5 6 7
> 
> -- 

elein
elein at varlena.com


