Hannu Krosing
Wed Feb 1 00:02:17 PST 2006
One fine day, Tue, 2006-01-31 at 20:14, Michael Crozier wrote:
> Hi,
> 
> I encountered some duplicate key errors in my slony cluster today.  Clearly, 
> an event/log was replicated more than once.
> 
> I believe that this may be due to "the Solaris threading issue", but I can't 
> find enough clear information about this problem to determine whether I 
> failed to avoid it in the build of Postgresql and Slony.
> 
> Details:
>  Solaris 9 sparc, 7.3.13, compiled with --thread-safety 
>  Solaris 10 opteron, 8.0.6, compiled with --thread-safety
>  All the slon's were running from the 8.0.6 instance/build.
>
> I was able to manually remove the offending rows and get the slon's processing 
> events again, but I'm worried about a few things:
> 
> 1. How is my data?  Do I need to re-sync?

Possible. Check your data :)

> 2. How can I prove that this problem is related to threading issue?

I don't think it is related to the threading issue.

If more than 2G transactions (_xxx_cluster_.sl_log_1.log_xid > 2G) have
been executed during replication without reindexing sl_log_1, then the
indexes on xxid start misbehaving, resulting both in duplicate key
errors *and* in some events not being replicated (i.e. data loss).
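
For a rough idea of whether you are near that boundary, something like
this should work (a sketch only: "mycluster" stands for your cluster
name, and since xxid comparison wraps around, treat the count as an
indicator rather than an exact figure):

    -- transactions burned by each database since its last freeze
    SELECT datname, age(datfrozenxid) FROM pg_database;

    -- render the log_xid > 2G condition directly (2^31 - 1)
    SELECT count(*) FROM _mycluster.sl_log_1
     WHERE log_xid > '2147483647';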

It should be (but is not) documented in BIG FRIENDLY LETTERS on the
title page of the Slony docs.

> 3. What IS the threading issue?  I can't find a good description of the
>     problem and the solution.
> 4. If the problem still exists on the 8.0.X build, how do I correct it?

I've heard there are plans to start alternating between sl_log_1 and
sl_log_2, truncating whichever table is unused, in the upcoming v2.0 of
Slony.

Until then, the only workaround I know of is to REINDEX any indexes
using xxid_ops at least once every 1G transactions.
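
For example (a minimal sketch; the schema name "_mycluster" is
illustrative, and REINDEX TABLE simply rebuilds every index on the
table, the xxid_ops ones included):

    REINDEX TABLE _mycluster.sl_log_1;
    -- repeat for any other replication table whose indexes use xxid_ops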

And NEVER use a setup where data from multiple masters goes through the
same node, as this greatly increases the chance of xxids ending up more
than 2G apart (due to differing transaction rates), at which point the
btree indexes break. The values need not be more than 2G apart at the
same moment; it is enough that the spread occurs at some point during
the lifetime of the index. (The modular 32-bit comparison xxid uses is
only a valid total order while all values in the index fit within a 2G
window; once they span more, the ordering turns circular and the btree
invariants no longer hold.)

This behaviour is especially nasty because it is not detected in
testing (unless you can run tests for more than 2G transactions, which
takes about 23 days at 1000 trx/sec), and even once it kicks in after 2G
transactions it eats your data slowly and undetectably at first: you
won't notice the data loss, only an occasional duplicate key error.

If you want to know a little more about the issue, look for my recent
posts on this list.

----------------
Hannu




