<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Thanks Jan for your reply.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


Let me make sure that I understand the issue correctly. You say that your master database has a corrupted index, which allowed duplicate keys to be inserted</blockquote><div><br></div><div>Correct</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

. This means that without removing those duplicate rows, you could not REINDEX the table on the master either. Replication fails because you do not have the same index corruption on the replica(s). Correct me if I misunderstood.<br>


<br></blockquote><div><br></div><div>Not exactly. Sorry, I didn&#39;t describe it well. My point is not about fixing the corrupted index on the master node, which is required anyway for syncing to continue. Rather, I want Slony not to repeat the same problem in its own catalogs. Because of the corruption, one of the replicated tables accepted a duplicate row. What worries me is that sl_log_*, which was healthy, recorded the duplicate row as well, and that broke replication. After I removed the duplicate entry, Slony resumed without any hiccups, which impressed me. Let me lay out the stages from crash to fix.</div>

<div><br></div><div>1. Slony replication is running fine between master and slave.</div><div>2. On the master node, the index on one of the replicated tables becomes corrupted (for whatever reason).</div><div>3. Because of the corruption, the master node accepts a duplicate record.</div>

<div>4. The same record is copied into sl_log_* (which means two log rows with the same values).</div><div>5. On the slave, the event fails to apply and the SYNC is aborted. (The PK index on the slave is not corrupted, so it does not allow the duplicate.)</div>

<div><br></div><div>This is what happened in my case. When I checked the master and slave database logs around the time the SYNC aborted, there were only DML entries. The slony_log showed only a duplicate key violation error, followed by the abort. Digging further, I found the two duplicate entries in sl_log_*.</div>
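<div><br></div><div>For reference, a rough sketch of the kind of query that surfaces such entries. It assumes the cluster schema is _rf (as in the example further down) and uses only the sl_log_1 columns visible in the log trigger&#39;s INSERT statement; the same check applies to sl_log_2. Identical argument lists can also occur legitimately (for example, a row that is inserted, deleted and inserted again before the log is cleaned up), so a hit is only a candidate to inspect, not proof of corruption.</div>
<pre>
-- Candidate duplicate INSERTs captured in the Slony log
-- (sketch only; the schema name _rf is an assumption taken from the example).
SELECT log_tableid,
       log_tablenspname,
       log_tablerelname,
       log_cmdargs,
       count(*)           AS copies,
       min(log_actionseq) AS first_actionseq,
       max(log_actionseq) AS last_actionseq
FROM _rf.sl_log_1
WHERE log_cmdtype = 'I'
GROUP BY log_tableid, log_tablenspname, log_tablerelname, log_cmdargs
HAVING count(*) &gt; 1;
</pre>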

<div><br></div><div>How I fixed it:</div><div><br></div><div>6. Stop slon on master and slave.</div><div>7. Remove the duplicate row from the master node. (The slave node will not have the duplicate, since its PK index is fine and does not allow it.)</div>

<div>8. Remove the duplicate row from sl_log_*.</div><div>9. REINDEX the table on the master.</div><div>10. Start slon.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div></div>

Slony log tables are capturing INSERT, UPDATE, DELETE and TRUNCATE operations. Except for the TRUNCATE, all those operations are captured on the row level. These insert operations that led to duplicate keys did happen, even though they should not have. But Slony has A) no way of detecting that and B) it isn&#39;t Slony&#39;s duty to second guess the integrity of the master database.<br>


<br></blockquote><div><br></div><div>I completely agree. If I am not expecting too much here, how about adding such a check to Slony to handle these scenarios too?</div><div><br></div><div>On the same corrupted table, I tried a crude trick just to keep duplicates out of sl_log_*, and it worked. Check this out:</div>

<div><br></div><div><div>postgres=# create unique index isl_log_1 on _rf.sl_log_1(log_cmdargs);</div><div>CREATE INDEX</div><div>postgres=# create unique index isl_log_2 on _rf.sl_log_2(log_cmdargs);</div><div>CREATE INDEX</div>

<div>postgres=# insert into dtest values (1,&#39;D&#39;);</div><div>INSERT 0 1</div><div>postgres=# insert into dtest values (1,&#39;D&#39;);</div><div>ERROR:  duplicate key value violates unique constraint &quot;isl_log_1&quot;</div>

<div>DETAIL:  Key (log_cmdargs)=({id,1,name,&quot;D         &quot;}) already exists.</div><div>CONTEXT:  SQL statement &quot;INSERT INTO _rf.sl_log_1 (log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname,  log_cmdtype, log_cmdupdncols, log_cmdargs) VALUES (1, &quot;pg_catalog&quot;.txid_current(), $1, nextval(&#39;_rf.sl_action_seq&#39;), $2, $3, $4, $5, $6); &quot;</div>

<div>postgres=# select * from dtest ;</div><div> id |    name</div><div>----+------------</div><div>  1 | D</div><div>(1 row)</div><div><br></div><div>postgres=# select * from _rf.sl_log_1 ;</div><div> log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | log_tablerelname | log_cm</div>

<div>------------+----------+-------------+---------------+------------------+------------------+-------</div><div>          1 |    36882 |           2 |         15007 | public           | dtest            | I</div><div>(1 row)</div>

</div><div><br></div><div>It need not be exactly this, but some control along these lines would help Slony keep going even when an index is corrupted, and it warns the user at the database level itself.</div><div><br></div><div>--Raghav</div><div>

<br></div></div>

</div></div>