Christopher Browne cbbrowne at ca.afilias.info
Mon Sep 17 12:41:22 PDT 2007
1...  The filenames contain the node number, which is
anti-functional...

 slony1_log_2_00000000000000000115.sql  slony1_log_3_00000000000000000185.sql

2...  There is a race condition where we can get the same file number
generated multiple times (with different node numbers).

slony1_log_1_00000000000000000003.sql slony1_log_2_00000000000000000003.sql
slony1_log_1_00000000000000000005.sql slony1_log_2_00000000000000000005.sql
slony1_log_1_00000000000000000009.sql slony1_log_3_00000000000000000009.sql
slony1_log_1_00000000000000000010.sql slony1_log_3_00000000000000000010.sql
slony1_log_1_00000000000000000011.sql slony1_log_3_00000000000000000011.sql
slony1_log_1_00000000000000000012.sql slony1_log_2_00000000000000000012.sql
slony1_log_1_00000000000000000015.sql slony1_log_2_00000000000000000015.sql
slony1_log_1_00000000000000000017.sql slony1_log_3_00000000000000000017.sql
slony1_log_2_00000000000000000018.sql slony1_log_3_00000000000000000018.sql
slony1_log_1_00000000000000000019.sql slony1_log_3_00000000000000000019.sql
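
A quick way to spot such collisions in an archive directory is to group the files by sequence number and report any number claimed by more than one node. This is only a sketch: the filename pattern is inferred from the names listed above, and `duplicated_sequences` is a hypothetical helper, not part of Slony-I.

```python
import re
from collections import defaultdict

# Pattern inferred from the filenames above (an assumption, not a documented format):
# slony1_log_<node>_<20-digit sequence>.sql
LOG_NAME = re.compile(r"^slony1_log_(\d+)_(\d{20})\.sql$")


def duplicated_sequences(filenames):
    """Map each sequence number used by more than one node to the sorted node list."""
    nodes_by_seq = defaultdict(set)
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:  # silently skip names that do not look like archive logs
            node, seq = int(match.group(1)), int(match.group(2))
            nodes_by_seq[seq].add(node)
    return {seq: sorted(nodes) for seq, nodes in nodes_by_seq.items() if len(nodes) > 1}


names = [
    "slony1_log_1_00000000000000000003.sql",
    "slony1_log_2_00000000000000000003.sql",
    "slony1_log_2_00000000000000000018.sql",
    "slony1_log_3_00000000000000000018.sql",
    "slony1_log_2_00000000000000000115.sql",
]
print(duplicated_sequences(names))  # {3: [1, 2], 18: [2, 3]}
```

Against a real directory, the list could come from `os.listdir()` on the archive path.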

3...  There appears to be some perhaps-deeper race condition...

cbbrowne at dba2:/tmp/slony-regress.d29690/archive_logs_2> grep "'950'" *sql
slony1_log_1_00000000000000000064.sql:insert into "public"."table1" (id,data) values ('950','hRfO[?a4Yl at xmyKvFtGxG?<tUf^`VW@=6M_dIFGsV0jobhYB5E44xCO^Rpkofp9qJ7W8Hy3]P`_SS - Gross C format string: %d%05d%s%s%f%l%-72.52LG');
slony1_log_1_00000000000000000064.sql:insert into "public"."table4" (id,numcol,realcol,ptcol,pathcol,polycol,circcol,ipcol,maccol,bitcol,newcol,newint) values ('950','77.7000','7.77','(7,7)','((7,7),(7,7),(7,7),(7,7))','((7,7),(7,7),(7,7),(7,7))','<(7,7),7>','192.168.7.77','08:00:2d:07:07:07','011101110111','2007-09-17 17:34:37.940536+00',NULL);
slony1_log_1_00000000000000000067.sql:insert into "public"."table1" (id,data) values ('950','hRfO[?a4Yl at xmyKvFtGxG?<tUf^`VW@=6M_dIFGsV0jobhYB5E44xCO^Rpkofp9qJ7W8Hy3]P`_SS - Gross C format string: %d%05d%s%s%f%l%-72.52LG');
slony1_log_1_00000000000000000067.sql:insert into "public"."table4" (id,numcol,realcol,ptcol,pathcol,polycol,circcol,ipcol,maccol,bitcol,newcol,newint) values ('950','77.7000','7.77','(7,7)','((7,7),(7,7),(7,7),(7,7))','((7,7),(7,7),(7,7),(7,7))','<(7,7),7>','192.168.7.77','08:00:2d:07:07:07','011101110111','2007-09-17 17:34:37.940536+00',NULL);

Note that the same two insert statements appear in both file #64 and
file #67, so the same rows would get applied twice.
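
The grep above checks a single id. To look for any insert that shows up in more than one archive file, something along these lines might do. It is a rough sketch: `repeated_inserts` is a hypothetical helper, it assumes one statement per line (as in the grep output), and the sample statements are made-up placeholders rather than real log contents.

```python
from collections import defaultdict


def repeated_inserts(files):
    """files maps filename -> file contents.

    Returns {statement: [filenames]} for every insert statement that
    appears in more than one file.
    """
    files_by_stmt = defaultdict(set)
    for name, text in files.items():
        for line in text.splitlines():
            stmt = line.strip()
            if stmt.lower().startswith("insert into"):
                files_by_stmt[stmt].add(name)
    return {s: sorted(f) for s, f in files_by_stmt.items() if len(f) > 1}


# Placeholder data, shaped like the archive logs but not taken from them.
sample = {
    "slony1_log_1_00000000000000000064.sql":
        "insert into \"public\".\"table1\" (id,data) values ('950','x');\n",
    "slony1_log_1_00000000000000000065.sql":
        "insert into \"public\".\"table1\" (id,data) values ('951','y');\n",
    "slony1_log_1_00000000000000000067.sql":
        "insert into \"public\".\"table1\" (id,data) values ('950','x');\n",
}
for stmt, found_in in repeated_inserts(sample).items():
    print(found_in, stmt)
```

For real files, the dictionary could be built by reading each `*.sql` file in the archive directory.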

What is "triple odd" (hence this fits nicely as item #3) is that there
was no problem on the subscriber node that was writing data locally; I
had 4 nodes, one being a log shipper, and *only* the log shipped node
broke like this.

4...  The log shipper falls over a bit too easily

The race condition of #2 also implies that it is possible that file #7
could get "committed" earlier than file #6.

Consider the case where file #6 is being generated by a pretty big
SYNC from node 1...  Node 3 then generates a SYNC (which does not
mandate replicating any data as it is not an origin node); that can
become file #7.  Since that event involves so little work, file #7
can get submitted to the log shipper before file #6 is finished.

Unfortunately, in that case, that makes slony_logshipper decide to
fall over :-(.
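
To make the ordering problem concrete, here is a sketch of a consumer that tolerates out-of-order arrival by holding files back until the gap is filled. This is NOT how slony_logshipper works, nor a proposed patch; `ReorderBuffer` and its methods are hypothetical names used purely for illustration.

```python
class ReorderBuffer:
    """Hold archive files that arrive early; release them in sequence order."""

    def __init__(self, next_expected):
        self.next_expected = next_expected  # lowest sequence number not yet applied
        self.pending = {}                   # sequence number -> payload waiting for a gap

    def submit(self, seq, payload):
        """Accept a file in any order; return the payloads now ready to apply."""
        if seq < self.next_expected or seq in self.pending:
            raise ValueError(f"duplicate sequence number {seq}")
        self.pending[seq] = payload
        ready = []
        # Drain every consecutive file starting at the next expected number.
        while self.next_expected in self.pending:
            ready.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return ready


buf = ReorderBuffer(next_expected=6)
print(buf.submit(7, "file #7"))  # [] : held back, file #6 has not arrived yet
print(buf.submit(6, "file #6"))  # ['file #6', 'file #7']
```

Note that this only addresses ordering; it rejects a repeated sequence number outright, so the duplicate numbering in #2 would still need fixing at the source.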

Evidently testing is a useful thing to do...  We can probably look
forward to 1.2.12 involving *more* change than was expected :-(.
-- 
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://cbbrowne.com/info/wp.html
Bushydo, the way of the shrub -- BONSAI!

