Brad Nicholson bnichols at ca.afilias.info
Tue May 5 11:19:56 PDT 2009
On Tue, 2009-05-05 at 00:03 -0400, Joseph S wrote:
> I set up a slony cluster to replicate from an 8.2.13 database to an 8.3.7 
> database (slony 1.2.15). During the initial copy this has been going on 
> for over 3 hours:
> select "_prod".finishTableAfterCopy(29); analyze "public"."el";
> 
> gdb shows:
> 
> (gdb) bt
> #0 0x0000003c144c54f0 in __read_nocancel () from /lib64/libc.so.6
> #1 0x00000000005919c6 in FileRead (file=<value optimized out>, 
> buffer=0x1fd08bcc "\t",
> amount=<value optimized out>) at fd.c:1039
> #2 0x0000000000592223 in BufFileRead (file=0x1fd08ba0, ptr=0x212af860, 
> size=8192) at buffile.c:237
> #3 0x000000000065cca8 in ltsReadBlock (lts=0x21245a00, blocknum=149704, 
> buffer=0x212af860) at logtape.c:227
> #4 0x000000000065d5fe in LogicalTapeRead (lts=0x21245a00, tapenum=<value 
> optimized out>, ptr=0x2121ebe0, size=24)
> at logtape.c:796
> #5 0x000000000065f624 in readtup_index (state=0x1f905300, 
> stup=0x7fff7f710ef0, tapenum=52,
> len=<value optimized out>) at tuplesort.c:2797
> #6 0x000000000065e3ac in mergeprereadone (state=0x1f905300, srcTape=52) 
> at tuplesort.c:1936
> #7 0x000000000065e461 in mergepreread (state=0x97) at tuplesort.c:1904
> #8 0x00000000006608d8 in tuplesort_performsort (state=0x1f905300) at 
> tuplesort.c:1753
> #9 0x0000000000469bb0 in _bt_leafbuild (btspool=0x1fbb1be0, 
> btspool2=0x1fd08bcc) at nbtsort.c:203
> #10 0x0000000000466c36 in btbuild (fcinfo=<value optimized out>) at 
> nbtree.c:136
> #11 0x0000000000648827 in OidFunctionCall3 (functionId=<value optimized 
> out>, arg1=47722312327280,
> arg2=47722312268368, arg3=532345328) at fmgr.c:1586
> #12 0x0000000000492819 in index_build (heapRelation=0x2b6737796070, 
> indexRelation=0x2b6737787a50,
> indexInfo=0x1fbaf1f0, isprimary=0 '\0') at index.c:1366
> #13 0x000000000049330c in reindex_index (indexId=16904) at index.c:2284
> #14 0x000000000049356e in reindex_relation (relid=<value optimized out>, 
> toast_too=1 '\001') at index.c:2393
> #15 0x00000000004e2c02 in ReindexTable (relation=0x1fb15da0) at 
> indexcmds.c:1344
> #16 0x000000000052d1ad in _SPI_execute_plan (plan=0x7fff7f711890, 
> Values=0x0, Nulls=0x0, snapshot=0x0,
> crosscheck_snapshot=0x0, read_only=0 '\0', fire_triggers=1 '\001', 
> tcount=<value optimized out>) at spi.c:1700
> #17 0x000000000052d931 in SPI_execute (src=0x1fb6d260 "reindex table 
> \"public\".\"eventlog\"", read_only=0 '\0',
> tcount=0) at spi.c:342
> #18 0x00002b67377b0413 in exec_stmt_dynexecute (estate=0x7fff7f711ba0, 
> stmt=0x1faa3f90) at pl_exec.c:2688
> #19 0x00002b67377b1ac7 in exec_stmts (estate=0x7fff7f711ba0, 
> stmts=<value optimized out>) at pl_exec.c:1264
> #20 0x00002b67377b31d5 in exec_stmt_block (estate=0x7fff7f711ba0, 
> block=0x1faa40b0) at pl_exec.c:1114
> #21 0x00002b67377b3b4f in plpgsql_exec_function (func=0x1fa8efb0, 
> fcinfo=0x7fff7f711e00) at pl_exec.c:291
> #22 0x00002b67377aaaae in plpgsql_call_handler (fcinfo=0x7fff7f711e00) 
> at pl_handler.c:95
> #23 0x000000000051837e in ExecMakeFunctionResult (fcache=0x1f97a740, 
> econtext=0x1f97a630, isNull=0x1f97ad48 " �]",
> isDone=0x1f97adf0) at execQual.c:1351
> #24 0x00000000005168e3 in ExecProject (projInfo=<value optimized out>, 
> isDone=0x7fff7f7122ac) at execQual.c:4610
> #25 0x0000000000527203 in ExecResult (node=0x1f97a520) at nodeResult.c:155
> #26 0x0000000000516356 in ExecProcNode (node=0x1f97a520) at 
> execProcnode.c:319
> #27 0x00000000005155df in ExecutorRun (queryDesc=<value optimized out>, 
> direction=ForwardScanDirection, count=0)
> at execMain.c:1335
> #28 0x00000000005a68e1 in PortalRunSelect (portal=0x1f9032f0, 
> forward=<value optimized out>, count=0,
> dest=0x1f8c32f0) at pquery.c:943
> #29 0x00000000005a7ab9 in PortalRun (portal=0x1f9032f0, 
> count=9223372036854775807, isTopLevel=0 '\0',
> dest=0x1f8c32f0, altdest=0x1f8c32f0, completionTag=0x7fff7f7125b0 "") at 
> pquery.c:769
> #30 0x00000000005a38a4 in exec_simple_query (
> query_string=0x1f8c1fa0 "select \"_prod\".finishTableAfterCopy(29); 
> analyze \"public\".\"el\"; ")
> at postgres.c:1004
> #31 0x00000000005a44f8 in PostgresMain (argc=4, argv=<value optimized 
> out>, username=0x1f820a10 "postgres")
> at postgres.c:3631
> #32 0x000000000057b16a in ServerLoop () at postmaster.c:3207
> #33 0x000000000057bc2f in PostmasterMain (argc=2, argv=0x1f81bb90) at 
> postmaster.c:1029
> #34 0x00000000005353de in main (argc=2, argv=<value optimized out>) at 
> main.c:188
> (gdb) quit
> 
> 
> What is going on here? This reindexing has been going on for far longer 
> than the initial copy time. And why is it reindexing? 
> finishTableAfterCopy() should be a noop.

I can think of a few reasons for this:

Is something accessing the table(s) and blocking the indexing?
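
You could check pg_locks on the subscriber while the reindex is running to
see if anything holds a conflicting lock on that table.  A rough sketch,
assuming the table in question is "public"."el":

  select l.pid, l.mode, l.granted, c.relname
    from pg_locks l
    join pg_class c on c.oid = l.relation
   where c.relname = 'el';

An ungranted lock (granted = f) for the reindexing backend would point at a
blocker.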

Do you have a lot of indexes which simply take longer to build than the
copy did?
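
Something like this would show how many indexes have to be rebuilt and how
big they are (just a sketch, reusing the table name from your query):

  select i.indexrelid::regclass as index_name,
         pg_size_pretty(pg_relation_size(i.indexrelid)) as index_size
    from pg_index i
   where i.indrelid = '"public"."el"'::regclass;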

Is maintenance_work_mem set too low?
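
For example, you could check the current value and, if it is small, raise it
for the session doing the subscription before retrying (the 512MB below is
only a placeholder, not a recommendation):

  show maintenance_work_mem;
  set maintenance_work_mem = '512MB';  -- session-level override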
-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



