What is this document?
This is only ONE way to keep nodes synchronized in a high
availability cluster. It suits my needs and may be a good
starting point for others, so I'm sharing it. Your mileage
may vary.
We use this on a web server to keep ~1GB worth of documents in sync, updating every 10 minutes. Depending on how quickly your files change, you may be able to update more data, more frequently. I would not recommend using this without a private Fast Ethernet channel, i.e. an extra NIC in each node connected via a crossover cable.
To determine whether this method is appropriate for you, try it on a representative copy of your data first and see how long a full run takes.
One last item: rsync with the --delete
option is dangerous!
Make sure your command options point to the proper destination! Test
this on non-critical data. And test it again.
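If you want to see exactly what "--delete" does before pointing it at real data, a throwaway experiment like the following makes it concrete. This is only a local sketch - temporary directories, no remote host involved:

```shell
# Scratch directories - nothing here touches real data.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "current" > "$src/keep.txt"
echo "orphan"  > "$dst/stale.txt"

# Dry run first: -n reports what WOULD happen, including deletions,
# but changes nothing on the destination.
rsync -n -av --delete "$src/" "$dst"

# The real run: stale.txt, which exists only on the destination,
# is removed; keep.txt is copied over.
rsync -av --delete "$src/" "$dst"
ls "$dst"
```

Note that a destination file not present on the source is deleted outright - which is exactly why you test with "-n" first.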
What you'll need:
1. A distribution of SSH. PLEASE NOTE
- SSH is NOT freeware. Respect their license!
You can get it at: ftp://ftp.cs.hut.fi/pub/ssh
Mirrors and additional info can be found at: http://www.csua.berkeley.edu/ssh-howto.html
License information is available at: http://www.ssh.fi/sshprotocols2/licensing/ssh2_non-commercial_licensing.html
2. A copy of rsync
Found at http://rsync.samba.org/ftp/rsync/binaries/
-OR- http://rsync.samba.org/ftp/rsync/rsync-2.3.1.tar.gz
3. Cron and stuff to mirror.
In actuality, you don't even need SSH. You could use rsh instead, if your security needs permit it.
Installing SSH
START HERE for source distribution:
Untar your ssh distribution. You might want to read the INSTALL
file or HOWTO, but most can get by with the following:
./configure
make
make install
START HERE for rpm distribution after running
"rpm -ivh ssh_<version#>.rpm":
Once this is done, make sure sshd is started on bootup. This
can be done via inetd or by placing "/usr/local/sbin/sshd" (if ssh is
installed to the default location) in your /etc/rc.d/rc.local file.
Run "/usr/local/sbin/sshd" now to start it.
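For the rc.local route, that amounts to one line at the end of the file (assuming the default install prefix):

```shell
# /etc/rc.d/rc.local - start the SSH daemon at boot
/usr/local/sbin/sshd
```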
Go to your ssh home directory: "cd ~/.ssh"
Type "ls" to see what's in there if you like.
Make sure that /usr/local/bin is in your path and type "ssh-keygen". This will create your identity and identity.pub files. You'll be asked to enter a passphrase.
Copy the file "/etc/ssh_host_key.pub" to "/etc/ssh_known_hosts" and open known_hosts for editing - it should look like this (but all on one line!!!):

1024 35 1240189225834733795967<MANY NUMBERS CUT>354908432570619298213206066427 root@nodeA.domain.com

In front of the "1024", add whatever aliases you want this machine to be known as, separated by commas and no spaces:

nodeA,nodeA.domain.com,localnetA,localnetA.domain.com 1024 35 12401892258367<MANY NUMBERS CUT>35490843266427 root@nodeA.domain.com

Repeat all the above steps on your other cluster node.
Merge the two "/etc/ssh_known_hosts" files so that both nodes have an identical file containing the following:

nodeA,nodeA.domain.com,localnetA,localnetA.domain.com 1024 35 12401892258367<MANY NUMBERS CUT>35490843266427 root@nodeA.domain.com
nodeB,nodeB.domain.com,localnetB,localnetB.domain.com 1024 35 24354454234458<MANY NUMBERS CUT>82638761332764 root@nodeB.domain.com

On node A, type "cd ~/" and create a ".shosts" file. For node A, if you want to synchronize over the "localnet" route, it should look like this:

localnetB username

I use root for the username. I do not believe this is a risk if you consider your cluster as one machine - any SSH experts out there? Create the corresponding "~/.shosts" on node B.
Similar to the "/etc/ssh_known_hosts" file, we need a "~/.ssh/authorized_keys" file. On node A, copy "~/.ssh/identity.pub" to "~/.ssh/authorized_keys".
Open "~/.ssh/authorized_keys" for editing and merge in the contents of node B's "~/.ssh/authorized_keys" to get something like this:

1024 37 10847<MANY #s CUT>91475044341719 root@nodeA.domain.com
1024 37 12544<MANY #s CUT>79835130992747 root@nodeB.domain.com

You should be all set. Try it out by typing on node A: "ssh nodeB". You should be logged in to nodeB without having to type anything.
Installing Rsync
Well, I'm not really going to tell you how to do this - I just used the rpm. If that's not possible for you, I'm sure the good folks at samba have a nice README for you to follow. The binaries link in the "What you'll need" section has builds for just about all flavors. Here's the link for the rpm: http://rsync.samba.org/ftp/rsync/binaries/redhat
Determining your Rsync command
For our example, let's say you have a web server cluster. As a result, you need the directory tree "/html" to be current on both nodes. Assuming node A is the master, I would use the command from node A:
rsync --rsh=/usr/local/bin/ssh -naurvl --delete /html/ localnetB:/html

Let's note a few things. First, since this will be used with cron, be sure to use the full path for the ssh executable (and the rsync executable, for that matter). Second, you need to use the same alias here that you used in ".shosts"; otherwise you'll need to log in - inconvenient for cron. Also, note the "/html/" syntax: the trailing "/" is necessary - otherwise you'll end up with the tree "/html/html" on your slave. Lastly, I used the "-n" option. This is for a dry run; you want to do this to make sure everything is copied/deleted as you would expect. When you put this in your crontab file, you'll leave off the "-n" option. Similarly, the "-v" verbosity option is only for this test; in your crontab entry it will be replaced with "-q" for quiet. Test the command now and make sure it does what you want - the "--delete" option can be dangerous!
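For reference, once the dry run checks out, the cron-ready form of the same command looks like this - "-n" dropped, "-v" swapped for "-q", and full paths on both executables (assuming rsync landed in /usr/bin, as the rpm install does):

```shell
/usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete /html/ localnetB:/html
```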
Create your sync script and crontab entry
You now want to create the script which will run rsync every X minutes via cron. I say script and not command because you want to account for the startup case. Should your master fail and then come back up, you don't want it nuking the slave's directories before it takes over and receives any changes made in its absence. So, I use the following perl script, call it "sync.pl":
#!/usr/bin/perl
# Only sync if serving IP - i.e. this node currently holds the
# cluster address (192.168.85.1 in our example).
$ifconfig = `/sbin/ifconfig`;
exit 0 unless $ifconfig =~ /192\.168\.85\.1/;
if (! -e "/var/lock/subsys/rsync.lock") {
    system("touch /var/lock/subsys/rsync.lock");
    system("/usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete /html/ localnetB:/html");
    # release rsync process
    unlink("/var/lock/subsys/rsync.lock");
} else {
    # Previous rsync still running - fire off a warning mail.
    open(MAIL, "|/bin/mail -s 'RSYNC warning' root");
    print MAIL "RSYNC: Process starting before previous one completes!\n";
    close(MAIL);
}
NOTE #2: You may not want a mail fired off; you may just want a log entry, or both. If you're unfamiliar with perl, to add a log entry, substitute the commands with "MAIL" in them above with the following:

open(LOG, ">>/var/log/ha-log");
$dstr = `date +%Y/%m/%d_%T`;
chomp $dstr;
print LOG "$dstr RSYNC: Process starting before previous one completes!\n";
close(LOG);
So, once you determine how often you'll be synchronizing, type "crontab -e" to edit your crontab. If you don't like vi, use "setenv EDITOR /usr/local/bin/emacs" (or the export equivalent) to select emacs or a different editor. If you want to synchronize every 10 minutes, your entry would look like this:
*/10 * * * * /script_directory/sync.pl >/dev/null 2>&1

(Cron runs jobs with /bin/sh, so use the portable redirection rather than bash's "&>".) You could also redirect output to some logfile if you desire, but keep in mind how often it runs.
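If you would rather keep a log than discard the output, a crontab entry like this (the log path is just an example) captures both stdout and stderr:

```shell
*/10 * * * * /script_directory/sync.pl >> /var/log/sync-cron.log 2>&1
```

With a 10-minute interval that is up to 144 entries a day, so rotate or prune the file.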
Recovering from Failover
We're just about there now. The last case you need to consider is when the master is coming back up after a failure. You want to update the master's files to include any changes made while the slave was handling the services. I do this via a services script in the /etc/ha.d/resources.d directory. Typically, these scripts will be the same on both nodes. For our purpose, however, we want the script to have different functions on the slave as opposed to the master. They need to have the same name, however, and I call it "mirror".
NOTE: You will want to be sure that any applications writing to the synchronized disks stop before giving services back to the master. If these applications are controlled via the ipresources configuration file, you can ensure this by listing the "mirror" script last. If they are not, add commands to stop them at the beginning of your mirror script.
NOTE: As you may be realizing, since the disks are re-synchronized during the fail-back, your rsync time has an effect on the minimum downtime in transitions. If your master has been down a long time, you may want to bring it up without heartbeat and manually run the rsync commands.
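Those manual commands would be the pull from the slave, dry-run first, using the same paths and aliases as the examples in this document:

```shell
# On the returning master, with heartbeat stopped - preview first:
/usr/bin/rsync --rsh=/usr/local/bin/ssh -naurvl --delete localnetB:/html/ /html
# Then the real run, once the output looks right:
/usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete localnetB:/html/ /html
```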
Here is "/etc/ha.d/resources.d/mirror" for the slave node, sketched here in outline; its essential job is to flag the master, via the master's "mirrorslave" lock file, that a failover occurred ("ha1" is assumed to be the master's alias, matching the "ha2" used for the slave):

#!/bin/sh
#
# Flag the master that a failover occurred so it pulls our updates
# when it re-acquires the cluster IP.
# See how we were called.
case "$1" in
  start)
	# Slave is taking over the services.
	touch /var/lock/subsys/mirror
	;;
  stop)
	# Failback: the master is up again - mark the failover on the
	# master so that its "mirror start" synchronizes from us.
	echo -n "Slave stop - flagging master to sync from slave: "
	/usr/local/bin/ssh ha1 touch /var/lock/subsys/mirrorslave
	rm -f /var/lock/subsys/mirror
	;;
  status)
	;;
  restart)
	$0 stop
	$0 start
	;;
  *)
	echo "Usage: mirror {start|stop|status|restart}"
	exit 1
esac
exit 0
Here's the master's mirror script:
#!/bin/sh
#
# Get updates from slave when re-acquiring cluster IP.
# See how we were called.
case "$1" in
  start)
	# Checks to see if failed over. If so, get update from slave.
	touch /var/lock/subsys/mirror
	if [ -f "/var/lock/subsys/mirrorslave" ]; then
		echo -n "Master start - Synchronizing from slave node: "
		/usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete ha2:/html/ /html
		rm -f /var/lock/subsys/mirrorslave
	fi
	rm -f /var/lock/subsys/mirror
	;;
  stop)
	# Want to update slave's data before exiting.
	if [ -f "/var/lock/subsys/mirror" ]; then
		exit
	fi
	echo -n "Master stop - Synchronizing data on slave node: "
	link /etc/ha.d/ha.cf /var/lock/subsys/mirror
	/usr/bin/rsync --rsh=/usr/local/bin/ssh -aurlq --delete /html/ ha2:/html
	rm -f /var/lock/subsys/mirror
	;;
  status)
	;;
  restart)
	$0 stop
	$0 start
	;;
  *)
	echo "Usage: mirror {start|stop|status|restart}"
	exit 1
esac
exit 0
Finally, you need to install the script in your ipresources on both nodes - I would shut down heartbeat before making changes. You'll want mirror to be the last service listed. For our webserver, it would read:
nodeA 192.168.85.1 httpd mirror

So, the timeline for a recovery from the slave is: "Slave releases IP, Slave stops httpd, Slave runs mirror stop, Master runs mirror start, Master runs httpd start, Master acquires IP". Bring your cluster back up and you should be in business!
Be Careful
Please test your setup on non-critical data first. There could
be a bad typo above or whatever. The "--delete" option can be dangerous.
You've been warned.
Rev 0.0.2
Rudy Pawul
rpawul@iso-ne.com