Synchronizing Disks for HA

What is this document?
This is only ONE way to keep nodes in synchronization for a high availability cluster.  It suits my needs and may be a good
starting point for others, so I'm sharing it.  Your mileage may vary.

We use this on a web server cluster to keep ~1GB worth of documents in sync, updating every 10 minutes.  Depending on how quickly your files change, you may be able to update more data, more frequently.  I would not recommend using this without a private fast ethernet channel, i.e. an extra NIC in each node connected via a crossover cable.

To determine whether this method would be appropriate for you, you may wish to do this:

  1. Set the minimum update interval you find acceptable (say, 3 minutes).
  2. Conservatively estimate the amount of data you would expect to change in that interval (say, 200MB).
  3. Assuming you can count on 6MB/sec from your fast ethernet, multiply that rate by 1/3 of your update interval.  Why only 1/3?  You need to leave time for rsync (relatively slow executing) to determine what needs updating, and for disk writes.  In our example, this gives us 360MB worth of updates per cycle - comfortably above the 200MB we expect to change.  (A quick check of the arithmetic follows below.)

Of course, you can always do what I did - set it up and see if it updates fast enough.  The scripts I've provided will notify you if a sync kicks off before the previous one has finished.
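
For example, here's that budget as a shell one-liner, using the numbers assumed above (a 180 second cycle and a 6MB/sec link):

    # 1/3 of a 180 sec cycle = 60 sec of usable transfer time at 6MB/sec
    $ echo "$((180 / 3 * 6))MB per cycle"
    360MB per cycle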

One last item:  rsync with the --delete option is dangerous!  Make sure your command options point to the proper destination!  Test this on non-critical data.  And test it again.

What you'll need:

    1.  A distribution of SSH.  PLEASE - SSH is NOT freeware.  Respect their license!
            You can get it at:  ftp://ftp.cs.hut.fi/pub/ssh
            Mirrors and additional info can be found at:  http://www.csua.berkeley.edu/ssh-howto.html
            License information is available at:  http://www.ssh.fi/sshprotocols2/licensing/ssh2_non-commercial_licensing.html
    2.  A copy of rsync
            Found at http://rsync.samba.org/ftp/rsync/binaries/  -OR-  http://rsync.samba.org/ftp/rsync/rsync-2.3.1.tar.gz
    3.  Cron and stuff to mirror.

    In actuality, you don't even need SSH.  You could use rsh instead, if your security needs permit it.

Installing SSH

START HERE for source distribution:
Untar your ssh distribution.  You might want to read the INSTALL file or HOWTO, but most can get by with the following:

    ./configure
    make
    make install

START HERE for rpm distribution after running "rpm -ivh ssh_<version#>.rpm":
Once this is done, make sure sshd is started on bootup.  This can be done via inetd or by placing (if ssh is installed to the default location) "/usr/local/sbin/sshd" in your /etc/rc.d/rc.local file.  Type the command in now to start it.
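
If you go the rc.local route, a minimal sketch (the path assumes the default install location):

    # append to /etc/rc.d/rc.local - start sshd at boot if the binary is present
    if [ -x /usr/local/sbin/sshd ]; then
            /usr/local/sbin/sshd
    fi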

Go to your ssh home directory:    cd ~/.ssh

Type "ls" to see what's in there if you like.

Make sure that /usr/local/bin is in your path and type "ssh-keygen".  This will create your identity.pub file.  You'll need to enter a passphrase.
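
For reference, a minimal sketch of that step (these are SSH1's default key file names):

    # creates ~/.ssh/identity (private key) and ~/.ssh/identity.pub (public key)
    /usr/local/bin/ssh-keygen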

Copy the file "/etc/ssh_host_key.pub" to "/etc/ssh_known_hosts" and open "/etc/ssh_known_hosts" for editing - it should look like this (but all on one line!!!):

    1024 35 1240189225834733795967<MANY NUMBERS CUT>354908432570619298213206066427 root@nodeA.domain.com

In front of the "1024", add whatever aliases you want this machine to be known as, separated by commas and no spaces:

    nodeA,nodeA.domain.com,localnetA,localnetA.domain.com 1024 35 12401892258367<MANY NUMBERS CUT>35490843266427 root@nodeA.domain.com

Repeat all the above steps on your other cluster node.

Merge the two "/etc/ssh_known_hosts" files so that each node ends up with an identical copy containing both entries:

    nodeA,nodeA.domain.com,localnetA,localnetA.domain.com 1024 35 12401892258367<MANY NUMBERS CUT>35490843266427 root@nodeA.domain.com
    nodeB,nodeB.domain.com,localnetB,localnetB.domain.com 1024 35 24354454234458<MANY NUMBERS CUT>82638761332764 root@nodeB.domain.com
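
One way to accomplish the merge, sketched from node A (at this stage you'll be prompted for node B's root password, assuming password logins are still permitted):

    # append node B's host key to node A's known hosts file...
    ssh nodeB cat /etc/ssh_host_key.pub >> /etc/ssh_known_hosts
    # ...then add node B's aliases by hand and push the merged file back
    scp /etc/ssh_known_hosts nodeB:/etc/ssh_known_hosts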

On node A, type "cd ~/" and create a ".shosts" file.  If you want to synchronize over the "localnet" route, node A's file should contain node B's alias and the trusted username, separated by whitespace:

    localnetB    username

I use root for the username.  I do not believe this is a risk if you consider your cluster as one machine - any SSH experts out there?

Create the corresponding "~/.shosts" on node B.
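
Node B's file is the mirror image, naming node A's alias:

    localnetA    username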

Similar to the "/etc/ssh_known_hosts" file, we need a "~/.ssh/authorized_keys" file.  On node A, copy "~/.ssh/identity.pub" to "~/.ssh/authorized_keys".

Open "~/.ssh/authorized_keys" for editing and merge the contents of node B's "~/.ssh/authorized_keys" to get something like this:

    1024 37 10847<MANY #s CUT>91475044341719 root@nodeA.domain.com
    1024 37 12544<MANY #s CUT>79835130992747 root@nodeB.domain.com

You should be all set.  Try it out: on node A, type "ssh nodeB".  You should be logged in to nodeB without having to type anything.
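
Since the cron job will use the "localnet" alias, it's worth verifying both routes non-interactively (neither should ask for a password or passphrase):

    ssh nodeB uptime
    ssh localnetB uptime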

Installing Rsync

Well, I'm not really going to tell you how to do this.  I just used the rpm.  If that's not possible for you, I'm sure the good folks at Samba have a nice README.Install for you to follow.  In any case, the binary link in the "What you'll need" section has binaries for just about all flavors.  Here's the link for the rpm:  http://rsync.samba.org/ftp/rsync/binaries/redhat

Determining your Rsync command

For our example, let's say you have a web server cluster.  As a result, you need the directory tree "/html" to be current on both nodes.  Assuming node A is the master, I would run the following command from node A:

    rsync --rsh=/usr/local/bin/ssh -naurvl --delete /html/ localnetB:/html

Let's note a few things:

    1.  Since this will be used with cron, be sure to use the full path for the ssh executable (and the rsync executable, for that matter).
    2.  Use the same alias here that you used in ".shosts".  Otherwise, you'll need to log in - inconvenient for cron.
    3.  Note the "/html/" syntax.  The trailing "/" is necessary - otherwise you'll end up with the tree "/html/html" on your slave.
    4.  The "-n" option makes this a dry run.  Do this first to make sure everything is copied/deleted as you would expect.  When you put the command in your crontab file, leave off the "-n".
    5.  Similarly, the "-v" verbosity option is only for this test.  In your crontab entry, it will be replaced with the "-q" option for quiet.

Test the command now and make sure it does what you want - the "--delete" option can be dangerous!
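
Once the dry run looks right, the production form - the same one the cron script below uses - drops "-n" and swaps "-v" for "-q":

    /usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete /html/ localnetB:/html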

Create your sync script and crontab entry

You now want to create the script which will run rsync every X minutes via cron.  I say script, and not command, because you want to account for the startup case.  If your master has failed and is just coming back up, you don't want it nuking the slave's directories before it takes over and receives any changes made in its absence.  So, I use the following perl script, call it "sync.pl":

#!/usr/bin/perl

# Only sync if this node is currently serving the cluster IP (eth0:0)
open(X, "/sbin/ifconfig|") or die "can't run ifconfig: $!";
while (<X>) {
        if (/eth0:0/) {
                # Make sure the previous mirror has completed - link() fails if
                # the lockfile already exists, so it doubles as our lock
                if (link("/etc/ha.d/ha.cf", "/var/lock/subsys/mirror")) {
                        system "/usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete /html/ localnetB:/html";

                        # release the lock for the next run
                        unlink "/var/lock/subsys/mirror";

                } else {
                        open(MAIL, "|mail username\@domain.com");
                        print MAIL "Next RSYNC process starting before previous has completed!\n";
                        close MAIL;
                        exit;
                }
        }
}
close(X);

NOTE:  This script assumes that your cluster IPaddr service is on eth0 as "eth0:0".  Your setup may be different.  There are other ways of making sure the master has control (such as "IPaddr <IP> status"), but that was the way I wrote it, and all my clusters service the IP on eth0 anyway.

NOTE #2:  You may not want a mail fired off.  You may just want a log entry.  You might want both.  If you're unfamiliar with perl, to add a log entry, replace the "MAIL" lines above with the following:

open(LOG,">>/var/log/ha-log");
$dstr = `date +%Y/%m/%d_%T`;
chomp $dstr;
print LOG "$dstr RSYNC: Process starting before previous one completes!\n";
close LOG;

At this point you want to create your crontab entry.  The "lock file" prevents more than one sync process from running at the same time, but you should still have a decent idea of how often you need to synchronize and how long each synchronization will take.

This brings me to explain the "localnet" gibberish above.  In my cluster, I have 3 NICs - one "global" and two "local".  The global is hooked into our company's network and services the node and cluster IP addresses.  One local is for a UDP heartbeat via a crossover cable.  The other also connects via a crossover cable and is used solely for rsync.  It's 100Mbps fast ethernet and, for our use, I'm not even pushing it hard (yet?).  Anyway, I'd recommend something similar - NICs are cheap.  If you're pushing the bandwidth limit, rsync may not be appropriate for you.
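
For the record, the "localnet" aliases simply resolve to the crossover NICs' private addresses via /etc/hosts on both nodes.  The addresses below are only an illustration - use whatever you assigned to the crossover interfaces:

    10.0.0.1    localnetA localnetA.domain.com
    10.0.0.2    localnetB localnetB.domain.com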

SO, once you determine how often you'll be synchronizing, type "crontab -e" to edit your crontab.  If you don't like vi, try using "setenv EDITOR /usr/local/bin/emacs" (or the export equivalent) to select emacs or a different editor.  If you want to synchronize every 10 minutes, your entry would look like this:

*/10 * * * * /script_directory/sync.pl > /dev/null 2>&1

Note the Bourne-style redirection - cron hands the command to /bin/sh, which doesn't understand bash's "&>".  You could also redirect output to some logfile if you desire, but keep in mind how often it runs.
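
For instance, a logging variant of the same entry (the logfile path is just an illustration):

    */10 * * * * /script_directory/sync.pl >> /var/log/sync.log 2>&1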

Recovering from Failover

We're just about there now.  The last case you need to consider is when the master is coming back up after a failure.  You want to update the master's files to include any changes made while the slave was handling the services.  I do this via a services script in the /etc/ha.d/resources.d directory.  Typically, these scripts are identical on both nodes.  For our purpose, however, we want the script to do different things on the slave than on the master.  The two versions do need the same name, and I call it "mirror".

NOTE:  You will want to be sure that any applications writing to the synch'ed disks stop before services are given back to the master.  If these applications are controlled via the ipresources configuration file, you can ensure this by listing the "mirror" script last.  If they are not, you will want to stop them at the beginning of your mirror script.

NOTE:  As you may be realizing, since the disks are re-synchronized during the fail-back, your rsync time adds to the minimum downtime in transitions.  If your master has been down a long time, you may want to bring it up without heartbeat and run the rsync commands manually.
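
A sketch of that manual catch-up, run on the returning master with heartbeat stopped ("ha2" is the slave's hostname, as in the scripts below; "-v" replaces "-q" so you can watch progress):

    /usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlv --delete ha2:/html/ /html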

    Here is "/etc/ha.d/resources.d/mirror" for the slave node:

#!/bin/sh
#

# See how we were called.
case "$1" in
  start)
        #Nothing to do.
        echo -n "Slave mirror start: "
        ;;
  stop)
        #Need to put lockfile on Master
        echo -n "Slave mirror stop: "
        /usr/local/bin/ssh ha1 touch /var/lock/subsys/mirrorslave
        ;;
  status)  
        echo "Hi There!"  
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  *)
        echo "Usage: syslog {start|stop|status|restart}"
        exit 1
esac

exit 0

When the master fails, the slave will have the latest possible files via cron and sync.pl, so there's nothing to do.  However, when the master comes back up, we want to notify it that there are changes to its files that it must get from the slave.  Hence, the lockfile.  This helps us discern between a failover, which needs the sync, and (for instance) a normal reboot, which doesn't.

Here's the master's mirror script:

#!/bin/sh
#
# Get updates from Slave when re-acquiring cluster IP
# See how we were called.
case "$1" in
  start)
        #Checks to see if failed over.  If so, get update from slave.
        touch /var/lock/subsys/mirror
        if [ -f "/var/lock/subsys/mirrorslave" ]; then
                echo -n "Master start - Synchronizing from slave node: "
                /usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete ha2:/html/ /html
                rm -f /var/lock/subsys/mirrorslave
        fi
        rm -f /var/lock/subsys/mirror
        ;;
  stop)
        #Want to update slave's data before exiting.
        if [ -f "/var/lock/subsys/mirror" ]; then
                exit
        fi
        echo -n "Master stop - Synchronizing data on slave node: "
        link /etc/ha.d/ha.cf /var/lock/subsys/mirror
        /usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete /html/ ha2:/html
        rm -f /var/lock/subsys/mirror
        ;;
  status)
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  *)
        echo "Usage: syslog {start|stop|status|restart}"
        exit 1
esac

exit 0

On startup, the master determines whether the slave has new information for it.  If so, it locks the cron process out and gets the updates.  This is only a safeguard, though, as the cron job can't possibly run until this script has finished - the cluster IP is unavailable until then.  When the master is shut down, the last thing it does is update the information on the slave.  That lock IS necessary - it keeps a cron-fired sync from starting in the middle of the final update.

Finally, you need to install the script in your ipresources on both nodes - I would shut down heartbeat before making changes.  You'll want mirror to be the last service listed.  For our webserver, it would read:

    nodeA 192.168.85.1 httpd mirror

So, the timeline for a recovery from the slave is: "Slave releases IP, Slave stops httpd, Slave runs mirror stop, Master runs mirror start, Master runs httpd start, Master acquires IP".  Bring your cluster back up and you should be in business!

Be Careful

Please test your setup on non-critical data first.  There could be a bad typo above or whatever.  The "--delete" option can be dangerous.  You've been warned.

Rev 0.0.2
Rudy Pawul
rpawul@iso-ne.com