Brad Nicholson bnichols
Wed Feb 1 11:37:29 PST 2006
Michael Crozier wrote:

>>>Details:
>>>Solaris 9 sparc, 7.3.13, compiled with --thread-safety
>>>      
>>>
>>Upgrade this version ASAP.  The following bug, fixed in 7.4.8, reared
>>its ugly head around here, corrupting replicas with duplicate key
>>violation errors and making several DBAs very cranky...
>>
>>http://www.postgresql.org/docs/7.4/static/release-7-4-8.html
>>    
>>
>
>I believe that this issue was fixed in 7.3 around 7.3.10.  7.3.13 is the 
>latest in the 7.3 series.  Unless I'm looking at the wrong bug/fix?
>  
>

I checked the release notes and it looks like you're right: that fix was backported.

>  
>
>>>I was able to manually remove the offending rows and get the slons
>>>processing events again, but I'm worried about a few things:
>>>
>>>1. How is my data?  Do I need to re-sync?
>>>      
>>>
>>Where did you remove the offending rows from?  If you manually removed
>>the row from sl_log_1, leaving the data on the master, but not the
>>subscriber, then your replica is shot.
>>    
>>
>
>I removed the offending row in the slave table:
> begin;
>     update pg_class set reltriggers=0 where relname = 'the_table';
>     delete from the_table where primary_key = 'the_rows_pk_val';
>     update pg_class set reltriggers=1 where relname = 'the_table';
> commit;
>
>After this, slon successfully processed the event group.
>  
>

Hmmm.... As much as I dislike rebuilding nodes, I certainly wouldn't 
trust one after messing with it manually like that.  I'd be rebuilding it.

>>Since moving to 7.4.8, we haven't had the replica corrupting scenario.
>>We have seen the duplicate key violation error on occasion, but the
>>problem was with a bad index on the table being replicated to.
>>Reindexing that table on the subscriber fixed the problem.
>>    
>>
>
>I don't think that this is my issue, as I could see that the row was actually 
>present.  Additionally, it appears the other changes from the same event 
>group had also been previously processed, but I can't be positively sure.
>  
>
Yup, that's a different case than what we saw.  The row wasn't there, 
but the index thought it was.
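
For anyone who hits the bad-index variant, here is a rough sketch of how one might tell the two cases apart on the subscriber and then rebuild the index.  The table, column and key value are just the placeholders from the quoted snippet above, not real names, and the planner settings only discourage a plan type rather than forbid it, so treat the comparison as a hint, not proof:

```sql
-- Run on the subscriber.  the_table, primary_key and 'the_rows_pk_val'
-- are placeholders taken from the earlier message, not real objects.

-- 1. Look the row up while steering the planner towards the index.
SET enable_seqscan = off;
SELECT count(*) FROM the_table WHERE primary_key = 'the_rows_pk_val';

-- 2. Look the same row up while steering the planner towards a
--    sequential scan of the heap.
SET enable_seqscan = on;
SET enable_indexscan = off;
SELECT count(*) FROM the_table WHERE primary_key = 'the_rows_pk_val';

-- 3. Put the planner settings back to their defaults.
RESET enable_seqscan;
RESET enable_indexscan;

-- 4. If the two counts disagree, the index is suspect; rebuild every
--    index on the table.  REINDEX locks the table while it runs.
REINDEX TABLE the_table;
```

If both lookups return the row, as in your case, reindexing is unlikely to help and the row really is in the table.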

-- 
Brad Nicholson  416-673-4106    bnichols at ca.afilias.info
Database Administrator, Afilias Canada Corp.




