Venkatraju venkatraju at gmail.com
Thu Mar 18 22:19:58 PDT 2010
Hi Brad,

Thanks for the quick reply. Please see inline.

On Thu, Mar 18, 2010 at 5:58 PM, Brad Nicholson
<bnichols at ca.afilias.info> wrote:
> Venkatraju wrote:
>>
>> Hi,
>>
>> I have a two node Slony cluster running version 1.2.14 on PostgreSQL
>> 8.1.11 on CentOS 5. I noticed that the subscriber had not been
>> replicating for almost 6
>> hours now (from sl_status). Slony logs on publisher and subscriber
>> contain the following messages:
>>
>> NOTICE:  Slony-I: log switch to sl_log_2 still in progress - sl_log_1
>> not truncated
>> WARN   cleanupThread: "select "_cluster".logswitch_weekly(); " -
>> ERROR: Previous logswitch still in progress
>> CONTEXT:  SQL statement "SELECT  "_cluster".logswitch_start()"
>>
>> Both sl_log_1 and sl_log_2 tables contain a large number of rows
>> (~500,000 rows in sl_log_1 and ~8 million in sl_log_2). Some
>> questions:
>>
>> 1) We found some long running transactions active on the publisher.
>> Could this cause logswitch to fail? The long running transactions were
>> against a database that is not used in replication - could those
>> queries hold up the logswitch?
>>
>>
>
> The long running transaction could block the log switch.  The truncate
> of sl_log_1 and the switch to sl_log_2 are waiting on it.
> Check pg_locks for ungranted locks - I imagine you will find that the
> backend doing the log switch is waiting on a lock.

I did not find any entries in pg_locks with granted = false. Is there
something else I should be looking for? Also, would a query running on
database2 affect Slony replication of database1? I am trying to
understand this better, to see whether I should look for some other root
cause for the logswitch failure (besides the long-running queries). I
know that a COPY_SET is blocked by transactions in other databases; does
the same hold for cleanup and/or logswitch as well?
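In case it helps, this is roughly how I have been looking for old
transactions. My understanding is that the transaction-visibility horizon
is cluster-wide, so a transaction in any database can hold back cleanup;
and since 8.1's pg_stat_activity has no xact_start column, query_start is
only an approximation of transaction age:

```sql
-- Oldest active queries across all databases in the cluster,
-- oldest first (query_start approximates transaction age on 8.1).
SELECT procpid, datname, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start
LIMIT 10;
```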

> I would kill the long running transactions.  Slony does not function well
> on systems that have long-running transactions against the provider.
>
> Run a vacuum on pg_listener after killing them as well.
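Will do. For the record, my plan is something like the following (as far
as I know, 8.1 has no pg_terminate_backend, so it is pg_cancel_backend or
kill -TERM from the shell; 12345 below is just a placeholder procpid):

```sql
-- Cancel the long-running backend's current query
-- (replace 12345 with the procpid from pg_stat_activity).
SELECT pg_cancel_backend(12345);

-- Then reclaim dead space in pg_listener:
VACUUM ANALYZE pg_listener;
```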
>>
>> 2) Slony maintenance page
>> (http://www.slony.info/documentation/maintenance.html) says:
>>
>> "That means that on a regular basis, these tables are completely
>> cleared out, so that you will not suffer from them having grown to
>> some significant size, due to heavy load, after which they are
>> incapable of shrinking back down."
>>
>> "incapable of shrinking back down"? Does this mean there is no way to
>> recover from this state without rebuilding the DBs?
>>
>
> You don't have to worry about that for the sl_log_1/2 tables.  Truncate
> gets rid of the dead space altogether.  Other tables in your system may
> be bloated, though.
>
> You don't need to rebuild your database to recover from this, but you do
> need to run some disruptive operations - VACUUM FULL plus REINDEX (really
> slow), CLUSTER (not MVCC-safe in 8.1; be 100% sure there are no old
> transactions accessing your tables before you start if you don't want to
> lose data), or a no-op ALTER TABLE.  All of these operations will block
> access to your table while running.
>
> Depending on the size of your DB and how bad it is bloated - reloading may
> be quicker.
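Thanks - for future reference, if I do end up having to de-bloat a table,
I gather the options look roughly like this (mytable, mytable_pkey, and
somecol are placeholders; this uses the 8.1-era "CLUSTER index ON table"
syntax):

```sql
-- Option 1: full rewrite plus reindex (really slow, exclusive lock):
VACUUM FULL mytable;
REINDEX TABLE mytable;

-- Option 2: CLUSTER (not MVCC-safe in 8.1 -- make sure no old
-- transactions are accessing the table first):
CLUSTER mytable_pkey ON mytable;

-- Option 3: a no-op ALTER TABLE that forces a table rewrite,
-- e.g. retyping a column to the type it already has:
ALTER TABLE mytable ALTER COLUMN somecol TYPE text;
```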
>
> --
> Brad Nicholson  416-673-4106
> Database Administrator, Afilias Canada Corp.
>
>

