Guthrie, Jeremy jeremy.guthrie
Fri Jul 16 17:54:18 PDT 2004
I'm running v7.4.3.  7.4 has much better index space reuse, but it is not perfect and can still orphan space.  It seems to depend on how closely the size of a new entry matches the unused space, and whatever tolerance PostgreSQL allows there.  For example, with my app the DB runs around 8 GB on Sunday after a full reindex, and during the week only the indexes bloat, but they grow the footprint to 10 GB (with 7.2 I would be at 12-13 GB a week later).  Annoying.

It changes how I need to code my maintenance, i.e. I have my own version of an autovacuum daemon because I need to account for a few things before I drain I/O doing a vacuum.  My experience so far with Slony's replication is that it does reuse index space.

Here is how I normally handle my reindexing (just to give some perspective on why I'm not really hurt by reindexing, Slony aside):
1.  I have a temporary table called TSyslog; all new logs are dumped there.
2.  Every five minutes I grab all of the logs out of it, process them, and move them to a long-term storage table, syslog_tarchive (a table comprising 99% of my footprint).
3.  I 'autovac/reindex' the snot out of this thing to keep it small.  The syslog server queues up logs (it can hold a few thousand) if it cannot write, but in most cases the autovac/reindex takes 20-30 ms, so this isn't a huge problem.
4.  When I run the big ugly reindex on Sundays, the logs in the temporary table cannot be moved over to syslog_tarchive because it is locked.  No problem; the logs queue up in TSyslog in the meantime.
5.  When the reindex is done, I grab the backlog from TSyslog, process it, and move it over to syslog_tarchive, and I am on my way with zero data lost.
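In SQL terms, the cycle above boils down to something like this (a minimal sketch; the column list and exact statements are assumptions, my real tables carry more columns):

```sql
-- Assumed staging table; incoming logs are dumped here (step 1).
CREATE TABLE tsyslog (received timestamptz, message text);

-- Every five minutes: move everything to long-term storage (step 2).
BEGIN;
INSERT INTO syslog_tarchive SELECT * FROM tsyslog;
DELETE FROM tsyslog;
COMMIT;

-- Keep the staging table small (step 3); this is fast because
-- tsyslog never holds more than a few minutes of logs.
VACUUM tsyslog;
REINDEX TABLE tsyslog;
```

The big Sunday reindex is the same REINDEX run against syslog_tarchive instead, which is why TSyslog has to absorb the backlog while the archive table is locked.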

Here is how I see Slony possibly complicating the situation, based on my initial findings:
1.  Reindex maintenance is getting ready to run.
2.  The sl_log_1/2 tables are, let's say, 150 MB in size with another 40 MB of index space.
3.  I vacuum sl_log_1/2 and then begin reindexing sl_log_1/2.
4.  An incoming log message arrives in TSyslog to be written, but the table seizes up after the first insert because the triggers cannot write to sl_log_1/2.
5.  After a few minutes the reindex finishes and the system starts working again, but in the meantime I have lost messages because I've run out of in-queue space on the syslog server.
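The blocking in steps 3-5 would look roughly like this in two sessions (a sketch; sl_log_1 actually lives in Slony's cluster schema, whose name depends on the cluster, and the insert is my assumed staging table from above):

```sql
-- Session 1 (maintenance): takes an exclusive lock on the Slony log table.
REINDEX TABLE sl_log_1;

-- Session 2 (application): this insert fires Slony's log trigger, which
-- must write a row into sl_log_1 and therefore blocks until the
-- REINDEX in session 1 finishes.
INSERT INTO tsyslog VALUES (now(), 'incoming syslog message');
```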

So I am looking for confirmation that steps 2-4 are what I should expect.  I've been so busy beating up Slony on my laptop in VMware that I haven't been able to test this yet.

My other concern is understanding the impact of a slave node being down for some long duration, with Slony building up logs to replicate.  At some point I'll need to shrink the log space and indexes, but the impact is the same as in the question above.

-----Original Message-----
From: Jan Wieck [mailto:JanWieck at Yahoo.com]
Sent: Fri 7/16/2004 11:57 AM
To: Guthrie, Jeremy
Cc: slony1-general at gborg.postgresql.org
Subject: Re: [Slony1-general] Slony index space usage
 
On 7/15/2004 3:07 PM, Guthrie, Jeremy wrote:

> First off, congrats to the writers of Slony.  Excellent software.
> 
> I am putting it through its paces as I try to update my application to 
> better account for Slony.  My application automatically takes care of 
> weekly reindexing because of the volume of data that changes daily.  
> However it is running into errors when looking at the Slony indexes.  
> Reindex is erroring out on the '-'.  I thought I saw something about 
> this before but I cannot find where.  Specifically, the index is 
> tsyslog__Slony-I_tsyslog_cluster_rowID_key.
> 

You would have to double quote that index name because of the mixed case 
and dash.
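For example, in psql (unquoted identifiers are folded to lower case, and the bare '-' is parsed as an operator):

```sql
-- Fails: the name is down-cased and the dash breaks parsing.
REINDEX INDEX tsyslog__Slony-I_tsyslog_cluster_rowID_key;

-- Works: double quotes preserve the mixed case and allow the dash.
REINDEX INDEX "tsyslog__Slony-I_tsyslog_cluster_rowID_key";
```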

I wonder though if that reindexing is an artefact left over from running 
older versions of PostgreSQL. Are you sure that the version you are 
currently running still suffers from index bloat despite frequent 
(non-full) vacuuming?

> 1.
> a)  should I worry about having to reindex this?
> b)  if so, how?  psql won't let me.
> 
> 2.  I've noticed that the above listed index seems to stay the same 
> size after running a replication test.  Is my assumption correct that 
> because of how Slony behaves it will rarely leave unused index space?
> 
> 3.  By doing a full vacuum or a reindex, will that block an entire 
> replication set, or just the one table of the set, until the full 
> vacuum or reindex finishes?

VACUUM FULL as well as REINDEX acquire an exclusive lock on the table. 
You're not doing your application a favor by doing that. You should 
rather look into pg_autovacuum or set up your own proper vacuum 
scheduling.
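For comparison, a plain (non-full) vacuum does not take an exclusive lock and can run alongside normal reads and writes; a sketch, reusing the table name from the discussion above:

```sql
-- Plain vacuum: no exclusive lock, safe to run frequently from a
-- scheduler or pg_autovacuum.
VACUUM ANALYZE syslog_tarchive;

-- These two DO take an exclusive lock and will block the Slony log
-- triggers (and everything else) for their full duration:
VACUUM FULL syslog_tarchive;
REINDEX TABLE syslog_tarchive;
```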

> 4.  If you want me to be quiet, just say so.  ;)

Questions welcome.


Jan


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #





