Thu May 27 12:14:23 PDT 2010
- Previous message: [Slony1-general] Table not replicating, but no errors reported by slony.
- Next message: [Slony1-general] Table not replicating, but no errors reported by slony.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jaime Casanova wrote:
> On Wed, May 26, 2010 at 4:13 PM, "Stéphane A. Schildknecht"
> <stephane.schildknecht at postgresql.fr> wrote:
>
>> Could you check that this table is in sl_tables, and in which set it is ?
>>
>> Maybe this set isn't subscribed.
>>
>

Those all look fine: the table exists in the replication set on both machines, and the tab_relname matches the correct table in pg_catalog.pg_class.

Just a little status update on the problem. The master and slave databases still do not match; the slave is missing a small chunk of data. Yet replication is still taking place: any new data inserted into the master ends up on the slave. After looking into it further, all the data that is missing from the slave was added to the master within a certain window of time. We're looking into what happened during that period via logs and whatnot.

Our daemons are started with the -a option, and I have a copy of every archive log from the slony slave from the point the table was added to replication until now. I got a list of the IDs of all the rows that are missing from the slony slave, and wrote a little script to search for each of those row IDs in each of the slony archive logs. None of them were present. So I think we can conclude that the data was not deleted from the slave underneath slony by a user, but rather was never replicated to the slave in the first place.

One thing is that we had a daemon that would attempt to start the slon daemons once every minute if they were not already running. Due to a bug, it ended up starting a new set of daemons every minute. This was happening before, during, and after the window in which the missing data was generated. Each minute it generated an error message saying "duplicate key value violates unique constraint "sl_nodelock-pkey"", which points to the new daemon realizing that a daemon is already running and then exiting. No other errors pertaining to the replicated table in question were present in the postgres logs at this time.

At this point we will probably remove the table from replication, then add it again and let it sync up.

A question: I'm still a little unfamiliar with a couple of aspects of slony, but from my understanding (correct me if I'm wrong), when a table is added to replication, slonik modifies the table so that whenever an insert, delete, or update happens, a trigger fires that alerts slony to data that needs to be sent to the slave nodes. I guess my question is: is there a way to insert data into the table without that trigger being executed? And if it is possible, could that cause a situation of "missing data" that slony itself doesn't even know about (since it's reporting that everything is in sync)? If so, then I may have a situation where a user is inserting data in an odd way that prevents the inserted data from being replicated.

Thanks,
Brian Fehrle

> or maybe the table has the wrong tab_reloid in the slave. you can
> probe that with this simple query (the same for sequences):
> select * from _cluster_name.sl_table where tab_reloid <> (tab_nspname
> || '.' || tab_relname)::regclass;
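
For anyone wanting to repeat the archive-log check described above (searching every archive log for each missing row ID), here is a rough sketch of how such a script might look. This is not the script actually used; the function name find_ids_in_logs, the "*.sql" file pattern, and the idea of matching IDs as whole tokens are all assumptions made for illustration, so adjust them to whatever your archive files look like.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List


def find_ids_in_logs(row_ids: Iterable[str], log_dir: str,
                     pattern: str = "*.sql") -> Dict[str, List[str]]:
    """Return, for each row ID, the names of the log files that mention it.

    Hypothetical helper: the directory layout and file pattern are
    assumptions, not something Slony guarantees.
    """
    ids = [str(i) for i in row_ids]
    hits: Dict[str, List[str]] = {i: [] for i in ids}
    if not ids:
        return hits

    # Match an ID only as a whole token, so that 123 does not match 1234.
    alternation = "|".join(re.escape(i) for i in ids)
    token = re.compile(
        r"(?<![0-9A-Za-z_])(" + alternation + r")(?![0-9A-Za-z_])"
    )

    for path in sorted(Path(log_dir).glob(pattern)):
        found = set()
        # errors="replace" keeps the scan going past undecodable bytes.
        with open(path, "r", errors="replace") as fh:
            for line in fh:
                found.update(token.findall(line))
        for i in found:
            hits[i].append(path.name)
    return hits
```

Calling find_ids_in_logs(missing_ids, "/path/to/archive/dir") and listing the IDs whose result is an empty list gives the rows that appear in no archive log. A plain token search like this can produce false positives (the same number may show up in another column or table), but an ID with no hits at all is reasonable evidence that the row never went through the archive logs.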