Michael Crozier crozierm
Wed Feb 1 11:22:12 PST 2006
> > 1. How is my data?  Do I need to re-sync?
>
> Possible. Check your data :)

Only a few hundred million rows...  I'd better get started :-)

> > 2. How can I prove that this problem is related to a threading issue?
>
> I don't think it is related to a threading issue.
>
> If you have had more than 2G (_xxx_cluster_.sl_log_1.log_xid > 2G)
> transactions executed during the replication, without reindexing
> sl_log_1, then indexes on xxid start misbehaving, resulting both in
> duplicate key errors *and* some events not being replicated (i.e. data
> loss).

This could be it.  The problem has occurred three times, all after adding a 
new table which took some time to COPY and create indexes, but there were no 
pending events when the COPY started and it caught up quickly after the 
addition/merge was complete.

I very much doubt we've done 2G transactions yet, as this is a new cluster 
with only master->slave replication.  I would estimate ~40 million 
transactions.
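
For illustration, here is a rough sketch of why the 2G mark would matter and
how far ~40 million transactions is from it.  It assumes that xxid values are
ordered the way PostgreSQL orders its 32-bit transaction IDs (circular
comparison modulo 2^32); whether Slony's xxid operators behave exactly this
way is an assumption here.  The function names are hypothetical and are not
part of Slony-I or PostgreSQL.

```python
# Sketch only: assumes xxid ordering follows PostgreSQL's circular
# 32-bit transaction ID comparison. All names are illustrative.

XID_SPACE = 2 ** 32   # size of the 32-bit transaction ID space
HALF_SPACE = 2 ** 31  # the "2G" threshold discussed above


def xid_precedes(a: int, b: int) -> bool:
    """Return True if a is older than b under circular comparison."""
    diff = (a - b) % XID_SPACE
    # Read as a signed 32-bit value, diff is negative when it is
    # at least 2^31, which means a precedes b.
    return diff >= HALF_SPACE


def fraction_of_threshold(transactions: int) -> float:
    """Return how much of the 2G window a transaction count uses."""
    return transactions / HALF_SPACE


if __name__ == "__main__":
    # Three IDs spaced 1.5 billion apart span more than 2G in total.
    a = 100
    b = a + 1_500_000_000
    c = a + 3_000_000_000
    print(xid_precedes(a, b), xid_precedes(b, c), xid_precedes(a, c))
    print(f"{fraction_of_threshold(40_000_000):.1%} of the 2G window")
```

Under this assumption, a precedes b and b precedes c, yet a does not precede
c, so the ordering stops being consistent once the stored values span more
than 2G.  A btree index relies on a consistent ordering, which would fit the
misbehaviour described above.  An estimated 40 million transactions is under
2% of that window.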


> If you want to know a little more about the issue look for my recent
> posts on this list.

I will read this and continue to investigate.

Thanks,

 Michael




More information about the Slony1-general mailing list