Brad Nicholson bnichols
Wed Feb 1 06:56:58 PST 2006
Michael Crozier wrote:

>Hi,
>
>I encountered some duplicate key errors in my slony cluster today.  Clearly, 
>an event/log was replicated more than once.
>
>I believe that this may be due to "the Solaris threading issue", but I can't 
>find enough clear information about this problem to determine whether I 
>failed to avoid it in the build of Postgresql and Slony.
>
>Details:
> Solaris 9 sparc, 7.3.13, compiled with --thread-safety 
>  
>
Upgrade this version ASAP.  The following bug, fixed in 7.4.8, reared 
its ugly head around here, corrupting replicas with duplicate key 
violation errors and making several DBAs very cranky...

http://www.postgresql.org/docs/7.4/static/release-7-4-8.html

"Repair ancient race condition that allowed a transaction to be seen as 
committed for some purposes (eg SELECT FOR UPDATE) slightly sooner than 
for other purposes. This is an extremely serious bug since it could lead 
to apparent data inconsistencies being briefly visible to applications."


> Solaris 10 opteron, 8.0.6, compiled with --thread-safety
> All the slon's were running from the 8.0.6 instance/build.
>
>I was able to manually remove the offending rows and get the slon's processing 
>events again, but I'm worried about a few things:
>
>1. How is my data?  Do I need to re-sync?
>  
>
Where did you remove the offending rows from?  If you manually removed 
the row from sl_log_1, leaving the data on the master, but not the 
subscriber, then your replica is shot.
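
One rough way to check whether the subscriber still matches the origin is 
to pull the primary keys for the affected table from each node and compare 
the two sets.  A minimal sketch in Python, assuming the keys have already 
been fetched into lists (diff_keys and the sample IDs below are 
hypothetical illustrations, not part of Slony):

```python
def diff_keys(origin_keys, subscriber_keys):
    """Return (missing_on_subscriber, extra_on_subscriber) as sorted lists."""
    origin = set(origin_keys)
    subscriber = set(subscriber_keys)
    # Keys present on the origin but absent on the subscriber were never applied.
    missing = sorted(origin - subscriber)
    # Keys present only on the subscriber should not exist there at all.
    extra = sorted(subscriber - origin)
    return missing, extra


if __name__ == "__main__":
    # Hypothetical key lists; in practice they would come from something like
    # SELECT id FROM my_table on each node (table and column names are assumed).
    origin_ids = [1, 2, 3, 4, 5]
    subscriber_ids = [1, 2, 4, 5]
    missing, extra = diff_keys(origin_ids, subscriber_ids)
    print("missing on subscriber:", missing)
    print("extra on subscriber:", extra)
```

If either list comes back non-empty for a replicated table, the two nodes 
have diverged and a re-sync of that set is probably the safe course.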

>2. How can I prove that this problem is related to threading issue?
>3. What IS the threading issue?  I can't find a good description of the
>    problem and the solution.
>4. If the problem still exists on the 8.0.X build, how do I correct it?
>
>  
>

Since moving to 7.4.8, we haven't had the replica-corrupting scenario.  
We have seen the duplicate key violation error on occasion, but in those 
cases the problem was a bad index on the table being replicated to.  
Reindexing that table on the subscriber fixed the problem.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




