Brad Nicholson bnichols
Fri Feb 18 14:33:39 PST 2005
Fiel Cabral wrote:

>No, I put the WAIT FOR EVENT outside the TRY block.
>This is the script that I run to perform switchover.
>
>    cluster name = CLUSTERNAME;
>    node 1 admin conninfo = 'dbname=DATABASE host=NODE1 user=slony sslmode=require';
>    node 2 admin conninfo = 'dbname=DATABASE host=NODE2 user=slony sslmode=require';
>    echo 'SWITCHOVER BEGIN';
>    try {
>      echo 'TRY: lock set';
>      lock set (id = 1, origin = 1);
>    }
>    on success {
>      echo 'SUCCESS: lock set';
>    }
>    on error {
>      echo 'ERROR: lock set';
>      exit 10;
>    }
>    echo 'wait for event';
>    wait for event (origin = 1, confirmed = 2, timeout = 180);
>    try {
>      echo 'TRY: move set';
>      move set (id = 1, old origin = 1, new origin = 2);
>    }
>    on success {
>      echo 'SUCCESS: move set';
>    }
>    on error {
>      echo 'ERROR: move set';
>      unlock set (id = 1, origin = 1);
>      exit 11;
>    }
>    echo 'wait for event';
>    wait for event (origin = 1, confirmed = 2, timeout = 180);
>    echo 'SWITCHOVER END';
>
>While slonik is executing this script the following programs may or
>may not be connected to PostgreSQL:
>a. a C# GUI application (connected some of the time)
>b. Tomcat servlet container (connected most of the time)
>c. another Java application (connected most of the time)
>d. pg_dump (connects very infrequently, once a night)
>e. vacuumdb (connects once a night)
>
>
>I've been testing this by doing 30 switchovers and the hang hasn't
>happened during those tests, but slonik did hang the other day and
>the day before that. When slonik hangs, subsequent attempts to run it
>with this script cause it to output either "a MOVE SET operation is in
>progress" or "a LOCK SET operation is in progress" or "the set is
>already locked" (or something similar).  While slonik was hung, I
>stopped Tomcat, and half a minute later slonik was able to continue to
>completion.
>  
>
Sounds like a locking issue.  MOVE SET requires an exclusive lock on 
each table in the set.  If you have connections to the database other 
than Slony's, they are likely preventing slonik from getting the lock.  If 
at all possible, you really should try to do this during an outage when 
you can have all the clients disconnected.
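
If an outage isn't an option, one way to see who is in the way is to look 
at pg_locks while slonik is hung.  What follows is only a rough sketch, not 
anything Slony ships: the query text, the function name blocking_pids, the 
table names and the sample rows are all made up for illustration.  It 
treats any granted lock on a replicated table as a potential blocker, 
which is a deliberately conservative assumption.

```python
# Sketch only: names, tables and sample rows below are hypothetical.

# Query you could run by hand (e.g. in psql) to list table-level locks.
# pg_locks exposes pid, relation, mode and granted; pg_class maps the
# relation OID to a table name.
LOCK_QUERY = """
SELECT l.pid, c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relkind = 'r'
"""


def blocking_pids(rows, set_tables, slonik_pid=None):
    """Group granted locks on replicated tables by backend pid.

    rows       -- iterable of (pid, relname, mode, granted) tuples,
                  i.e. the shape returned by LOCK_QUERY
    set_tables -- names of the tables in the replication set
    slonik_pid -- backend pid of slonik's own connection, if known,
                  so that it is not reported as blocking itself
    """
    blockers = {}
    for pid, relname, mode, granted in rows:
        if not granted:
            continue  # this session is waiting, not holding
        if relname not in set_tables:
            continue  # table is not part of the replication set
        if pid == slonik_pid:
            continue  # slonik's own backend
        blockers.setdefault(pid, set()).add(relname)
    return {pid: sorted(tables) for pid, tables in blockers.items()}


if __name__ == "__main__":
    sample = [
        (101, "accounts", "AccessShareLock", True),       # e.g. an idle client transaction
        (101, "orders", "RowExclusiveLock", True),
        (4242, "accounts", "AccessExclusiveLock", False),  # slonik, still waiting
        (202, "audit_log", "AccessShareLock", True),      # not in the set
    ]
    print(blocking_pids(sample, {"accounts", "orders"}, slonik_pid=4242))
```

Any pid that shows up there belongs to a client that has to commit, roll 
back or disconnect before the exclusive lock can be granted.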

Brad.

