Thu Apr 3 07:45:49 PDT 2008
- Previous message: [Slony1-general] slon can't distinguish nodes?
- Next message: [Slony1-general] Data from slaves pushed back to master - Can it be done?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jacques Caron <jc at oxado.com> writes: > Hi Christopher, > > At 17:37 31/03/2008, Christopher Browne wrote: >>The introduction of these partial indices dates back to the following >>discussion thread on pgsql-hackers: >>http://archives.postgresql.org/pgsql-hackers/2006-06/msg01516.php >> >>At one point, we had this as a second index: >> create index sl_log_1_idx2 on @NAMESPACE at .sl_log_1 >> (log_xid @NAMESPACE at .xxid_ops); >> >>Unfortunately, it was apparently leading to problems in that data >>sourced from different origins might have xxid values of varying sign. >> >>So, in lieu of that, I introduced code that would generate a >>per-origin partial index, which would necessarily not suffer from the >>rollover problem that sl_log_1_idx2 would run into. >> >>It is quite likely that the partial indices will be preferred, as the >>first column in sl_log_1_idx1 doesn't discriminate much. > > OK, so now I understand why there are partial indexes, but these > partial indexes are redundant with quite a bit of the full index. The > next logical step would be to remove the full index, however it seems > there are some cases where it's still needed (i.e. queries with a > where log_origin = node for which there is no partial index), at least > some DELETEs (which my wild guess is that they actually always return > 0 rows?). The other reason to need the "full index" is that immediately after creating a new origin, there probably won't yet be a partial index to consult, because the partial indices are added in as lazily as possible, at the instant that the unused sl_log_n table is being truncated. There may be some time delay between establishing an origin node and there being an index specifically for it. > So, possible options: > - add partial indexes for all possible origins (i.e. all nodes?) > rather than just the ones that are actually origins? Those would most > probably be completely empty most of the time. Seems overkill. > - don't do any queries against nodes that are not really origins (but > I guess this can become a bit complex in switchover/failover > scenarios?) Well, the origin node ID is always stored in sl_log_n, so that you don't do any queries where "log_origin" isn't an origin. It's not an option to restrict queries to origin nodes - that prevents having cascaded cases, which are one of the major reasons for Slony-I to exist in the first place. > The whole idea being of course to save a bit of I/O by avoiding the > need to maintain duplicate indexes (always a good thing for > heavily-updated tables when there's a bit of backlog). > > Comments? If anyone can suggest better index coverage, I'd be happy to hear of it. I don't think we can "improve things" without worsening them in some sense. :-( The quote below was randomly chosen, but fairly serendipitous. The addition of the partial indexes was done with the following reasoning: - We had started with a single index on (log_origin, log_xid, log_actionseq) In cases where log_origin was fixed, this wound up being unnecessarily inefficient. - We then added a secondary index on (log_xid). Unfortunately, this got us in trouble, because there could be a mixture of nodes on different sides of the 32 bit transaction rollover, which would make the ordering seem a tad nondeterministic :-(. This was the point at which Tom Lane pointed out a risk of corruption of this sort of index. *CLEARLY* bad mojo, if Tom warns of such! - I then concluded that the alternative of creating per-origin partial indexes would provide something quite a bit better than the index that got ruled out. I don't think we can afford to drop out either of the two sorts of indices that we are using now, and whether it "costs too much" isn't the problem, because the *other* evaluation is of whether or not we can afford to lose what each of them provides. If they're "too expensive," then I have to point out that running a Seq Scan because a needful index is GONE is likely to be More Expensive Still. We don't want to commit any more computing sins than necessary, particularly if we can't be certain they will actually achieve efficiency! -- (reverse (concatenate 'string "gro.mca" "@" "enworbbc")) http://www3.sympatico.ca/cbbrowne/rdbms.html "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." -- W.A. Wulf
- Previous message: [Slony1-general] slon can't distinguish nodes?
- Next message: [Slony1-general] Data from slaves pushed back to master - Can it be done?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list