[Slony1-general] Slony seemingly only receiving sync events, not syncing data

Mon Mar 27 11:01:04 PST 2006

Kyle Hanson wrote:

>Hi again,
>
>With the risk of sounding uninformed about slony...
>
>Can I make the 'catchup' process much shorter by configuring slon process:
>   - to have a really high '-s' option so it's not checking/issuing for
>syncs as often giving the slave time to do work,
>  
>
That won't help terribly much at this point, but it won't hurt too
badly, either.  If you bumped it up to 30 seconds, you'd at least not
be adding new SYNC events at a rapid rate.

>   - to set a high -g option, any ideas how high might be good for my
>situation, I have it at 24 right now
>  
>
I'd guess that setting it close to 100 (the default maximum in 1.1) is
about as high as would be useful.

The trouble, at this point, looks to be that you had replication down
for so long that it's taking a long, long time for the slon to run
through the events to figure out what work might need to be done.

>Like I say, I'm very new to slony, but these options kinda stuck out as
>potential helpers, please correct me so I can understand.
>
>Much appreciated,
>Kyle
>  
>

What you need to look for in the subscriber's logs are lines like the
following:

2006-03-27 18:55:01 GMT DEBUG2 remoteWorkerThread_36: SYNC 9914013 processing
2006-03-27 18:55:01 GMT DEBUG2 remoteWorkerThread_36: syncing set 1 with 61 table(s) from provider 36
2006-03-27 18:55:01 GMT DEBUG2  ssy_action_list value:  length: 0
2006-03-27 18:55:01 GMT DEBUG2 remoteHelperThread_36_36: 0.041 seconds delay for first row
2006-03-27 18:55:01 GMT DEBUG2 remoteHelperThread_36_36: 0.070 seconds until close cursor
2006-03-27 18:55:15 GMT DEBUG2 remoteWorkerThread_36: new sl_rowid_seq value: 36000000000020165
2006-03-27 18:55:15 GMT DEBUG2 remoteWorkerThread_36: SYNC 9914013 done in 14.695 seconds

Those lines are only output if you have slon's debugging (the -d
option) set to level 2 or higher.  For you, 2 is almost certainly
enough.

I'd watch the logs to see if it's getting one or more of these lines.

It might be that it's taking forever to process each SYNC.
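
To see whether each SYNC really is taking forever, something like the
following could pull the "done in N seconds" figures out of the
subscriber's slon log.  This is only a rough sketch: it assumes each log
entry sits on a single line in the format shown above, and the script
and log file names are made up for illustration.

```python
import re
import sys

# Matches lines such as:
#   ... DEBUG2 remoteWorkerThread_36: SYNC 9914013 done in 14.695 seconds
SYNC_DONE = re.compile(
    r"remoteWorkerThread_(\d+): SYNC (\d+) done in ([0-9.]+) seconds"
)


def sync_durations(lines):
    """Return a list of (provider node, SYNC number, seconds) tuples."""
    found = []
    for line in lines:
        m = SYNC_DONE.search(line)
        if m:
            found.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return found


def summarize(durations):
    """Return (count, mean seconds, max seconds), or None if nothing matched."""
    if not durations:
        return None
    secs = [d[2] for d in durations]
    return len(secs), sum(secs) / len(secs), max(secs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sync_times.py slon_subscriber.log
    with open(sys.argv[1]) as fh:
        print(summarize(sync_durations(fh)))
```

If nothing matches at all, the slon is either not logging at debug
level 2 or is not completing any SYNCs.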

If you have a HUGE number of outstanding SYNCs, we have found that it is
valuable to have an extra index on the table sl_log_1.  The index would
go on the origin node, where you'll probably find that there are
hundreds of thousands (or millions?) of rows in sl_log_1.

In the schema script, it is defined thus...

-- Add in an additional index as sometimes log_origin isn't a useful
-- discriminant
create index sl_log_1_idx2 on @NAMESPACE@.sl_log_1
    (log_xid @NAMESPACE@.xxid_ops);

You can create it by hand; replace @NAMESPACE@ with the cluster's
namespace, which is the cluster name prefixed with "_".

Doing that will interrupt activity on the origin :-(, since building
the index locks sl_log_1 against writes until it finishes.
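
As an illustration of the @NAMESPACE@ substitution, here is a small
sketch that builds the statement for a given cluster name.  The cluster
name "mycluster" is purely hypothetical, and the function is my own
helper rather than anything shipped with Slony-I; the namespace is
double-quoted just to be safe.

```python
def extra_index_sql(cluster_name):
    """Build the CREATE INDEX statement for a given Slony-I cluster name."""
    # The cluster namespace is the cluster name prefixed with "_".
    # Double any embedded quotes so the quoted identifier stays valid.
    namespace = '"_' + cluster_name.replace('"', '""') + '"'
    template = (
        "create index sl_log_1_idx2 on @NAMESPACE@.sl_log_1\n"
        "    (log_xid @NAMESPACE@.xxid_ops);"
    )
    return template.replace("@NAMESPACE@", namespace)


# "mycluster" is a made-up cluster name; substitute your own.
print(extra_index_sql("mycluster"))
```

The resulting statement is what you would run against the origin
database, for instance from psql.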

It might, as I mentioned earlier, be more effective to drop replication
and start from scratch.