Christopher Browne cbbrowne at ca.afilias.info
Tue Nov 18 14:22:34 PST 2008
Richard Yen <dba at richyen.com> writes:
> This might be moot with the coming release of Slony 2.0.0, but I was
> wondering if there are any thoughts about the following question:
>
> Do the finishTableAfterCopy() and ANALYZE of each table need to happen
> in serial with the data copy from stdin?  i.e., can we create a new
> thread that will do these two things while slon proceeds to copy the
> data of the next table?
>
> I raise this question because for large data sets, I think the
> copy_set process time could be improved by 30-40% if we can split
> these two stages.  I have some large tables that take 30 min or so to
> copy, then another 15-20 min to finishTableAfterCopy() and ANALYZE.
>
> Thought I'd throw this out to get some feedback, before I go and
> mangle code...any thoughts?

I don't think the point is moot; indeed, there is considerable value
to this idea.

Jan and I have been bouncing this one around for a while.  We took the
idea further in concept (if not in implementation!); the thought is
to take it two steps further than you describe...

  Step 1.  Allow as many extra connections as the administrator
  requests.

       Thus, we would have a "number_of_finish_connections" parameter
       (presumably with a better name than that), and hand the
       finishTableAfterCopy()/ANALYZE requests to a "connection pool."

       I'd expect this to have diminishing returns, and that the
       useful maximum would be around 4.
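       A minimal sketch of Step 1, in Python purely for illustration
       (slon itself is written in C).  Worker threads stand in for the
       extra connections; copy_table() and finish_table() are
       hypothetical placeholders, not Slony functions, and
       number_of_finish_connections is the same placeholder name used
       above.  Copies stay serial while finish work runs in the
       background:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Records which tables have been "finished"; shared by worker threads.
finished = []
_lock = threading.Lock()


def copy_table(name):
    """Hypothetical placeholder for COPYing one table's data on the main connection."""
    return name


def finish_table(name):
    """Hypothetical placeholder for finishTableAfterCopy() followed by ANALYZE."""
    with _lock:
        finished.append(name)
    return name


def copy_set(tables, number_of_finish_connections=4):
    """Copy tables one at a time, handing each finish job to a worker pool.

    Each worker thread stands in for one extra database connection
    (an assumption of this sketch, not how slon is structured today).
    """
    with ThreadPoolExecutor(max_workers=number_of_finish_connections) as pool:
        futures = []
        for name in tables:
            copy_table(name)                                 # copies remain serial
            futures.append(pool.submit(finish_table, name))  # finish runs concurrently
        # .result() re-raises any exception from a finish job.
        return [f.result() for f in futures]


result = copy_set(["big", "medium", "small"], number_of_finish_connections=2)
print(result)  # ['big', 'medium', 'small']
```

       Results come back in submission order, but the finish jobs
       themselves may complete in any order.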

  Step 2.  Order the requests so as to maximize parallelism.

       Thus, we subscribe to tables in descending order of their
       estimated size, largest first (pg_class.relpages should be a
       reasonable approximation).

       This means that we tend to push the bigger tables onto the
       "reindex queue" as early as possible in the subscription
       process.
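       To see why the ordering matters, here is a toy scheduling
       model, again hypothetical: one serial COPY stream feeds a pool
       of finish workers, jobs are taken first-come first-served, and
       the per-table times are made-up numbers loosely modelled on the
       30-minute copy and 15-20-minute finish mentioned above.  In
       practice the sizes would come from pg_class.relpages; the
       helpers are illustrative, not Slony code.

```python
import heapq


def order_by_estimated_size(relpages):
    """Return table names largest first, given {name: relpages}."""
    return sorted(relpages, key=lambda name: relpages[name], reverse=True)


def makespan(order, copy_time, finish_time, workers):
    """Simulate one serial COPY stream feeding `workers` finish connections.

    Finish jobs are queued in the order their copies complete and each
    goes to the earliest-idle worker.  Returns when the last one ends.
    """
    free_at = [0] * workers  # min-heap of times at which each worker is idle
    clock = 0                # time at which the copy connection is free
    end = 0
    for name in order:
        clock += copy_time[name]                 # this table's copy is done
        start = max(clock, heapq.heappop(free_at))
        done = start + finish_time[name]
        heapq.heappush(free_at, done)
        end = max(end, done)
    return end


# Illustrative numbers only (minutes), not measurements.
relpages = {"big": 600000, "s1": 100000, "s2": 100000, "s3": 100000}
copy_time = {"big": 30, "s1": 5, "s2": 5, "s3": 5}
finish_time = {"big": 20, "s1": 3, "s2": 3, "s3": 3}

largest_first = order_by_estimated_size(relpages)  # ['big', 's1', 's2', 's3']
smallest_first = list(reversed(largest_first))

print(makespan(largest_first, copy_time, finish_time, 1))   # 59
print(makespan(smallest_first, copy_time, finish_time, 1))  # 65
print(makespan(largest_first, copy_time, finish_time, 2))   # 50
```

       Doing every copy and finish strictly in series would take 74 in
       this model.  Queueing the big table first lets its long finish
       overlap with copying the small ones, and a second worker lets
       the small tables' finishes overlap as well.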

Haven't had the Round Tuits to get to it; if you could provide the
beginnings of it, that would make it easier to find the (hopefully
fewer, if effort is shared!) hours of implementation effort.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://linuxdatabases.info/info/internet.html
As of next Monday, MACLISP will no longer support list structure.
Please downgrade your programs.

