Heartbeat-2 on Dom0
These instructions will help you set up heartbeat-2 on a Debian sarge system. Although this guide uses the CRM to configure heartbeat-2, no OCF resource scripts are used, so many of the advanced heartbeat-2 features, like resource monitoring, are unavailable. Documentation for writing OCF resource scripts is mostly non-existent, and right now I don't have the time to experiment. In the future I'd like to come back to this problem, because monitoring would allow the health of each domain to be known and appropriate action to be taken.
Install
Grab the latest Ultra Monkey heartbeat-2 backport for Debian Sarge.
wget [1]
Install it. If it complains about unsatisfied dependencies, that's okay.
dpkg -i heartbeat-2_2.0.2-4bpo1_i386.deb
Run aptitude and install heartbeat-2's unsatisfied dependencies.
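If you'd rather let apt resolve the broken dependencies itself, a minimal alternative (assuming the missing packages are all available from your configured archives):
apt-get -f install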
Configure
There are three files to create in /etc/ha.d:
- authkeys -- network authentication
- ha.cf -- heartbeat configuration
- haresources -- resources being served
I'm not going to give a full explanation of everything here. Heartbeat is sophisticated software and you should take some time to read the docs at Linux-HA.
# /etc/ha.d/authkeys
#
auth 1
1 sha1 SecretPassphraseStuff
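Heartbeat refuses to start if authkeys is readable by anyone other than root, so tighten its permissions after creating it:
chmod 600 /etc/ha.d/authkeys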
# /etc/ha.d/ha.cf
#
logfacility daemon   # Log to syslog as facility "daemon"
crm yes              # use the new Cluster Resource Manager
keepalive 1          # Send one heartbeat each second
warntime 3           # How long before issuing "late heartbeat" warning?
deadtime 10          # Declare nodes dead after 10 seconds
initdead 90          # Dead time for when things first come up
bcast eth0 eth1      # Broadcast heartbeats on eth0 and eth1 interfaces
auto_failback no     # Don't fail back to the primary node automatically
node xena xeno       # List our cluster members
#ping 1.2.3.254      # Ping our router to monitor ethernet connectivity
#respawn hacluster /usr/lib/heartbeat/ipfail   # Failover on network failures
We are running heartbeat-2 because we want to use some of the new resource monitoring features. This means /etc/ha.d/haresources isn't used by heartbeat to configure resources; instead, the new cluster resource manager (CRM) uses some difficult-to-read XML, and resource scripts have to conform to the OCF specification (which is poorly documented). Heartbeat-2 ships with a script to convert the haresources file for use by the CRM. However, the configuration it generates uses the old-style heartbeat resource scripts, which don't support the monitor operation. It's my hope that going through this process brings me at least a little closer to the heartbeat-2 system I envision.
# /etc/ha.d/haresources
#
# !! NOTE: This file is not used directly by heartbeat-2. It must be
#          converted to XML for the CRM. Use the command below to
#          install the required XML file.
#
#   python /usr/lib/heartbeat/cts/haresources2cib.py > /var/lib/heartbeat/crm/cib.xml
#
node1 drbddisk::test xendomains::drbdtest
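After generating the CIB, it's worth making sure heartbeat can actually read it. A minimal sketch, assuming the standard heartbeat-2 file locations and the hacluster/haclient account and group created by the Debian package:
python /usr/lib/heartbeat/cts/haresources2cib.py > /var/lib/heartbeat/crm/cib.xml
chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml
crm_verify -x /var/lib/heartbeat/crm/cib.xml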
Create a resource.d script for heartbeat to control your Xen domains. Below is the script I'm using; it is a modified version of the one found in Setup guide: Active/Passive Redundancy using Xen, DRBD and Heartbeat, posted to the Xen-users list. Since this is not an OCF script, it does not support the monitor operation. I've created it as a starting point for getting the heartbeat system running.
#! /bin/bash
#
# /etc/ha.d/resource.d/xendomains
#
# heartbeat resource script to control Xen domains
#
PATH='/usr/local/sbin:/bin:/usr/sbin:/usr/bin'

RES="$1"
CMD="$2"

case "$CMD" in
    start)
        # Boot the domain from its Xen config file
        xm create "$RES"
        ;;
    stop)
        # Hard-stop the domain; note xm destroy does not give it
        # a chance to shut down cleanly
        exec xm destroy "$RES"
        ;;
    status)
        # Match the domain name exactly in the first column of xm list
        if xm list | awk '{print $1}' | grep -q "^${RES}$" ; then
            echo 'running'
        else
            echo 'stopped'
        fi
        ;;
    *)
        echo "Usage: xendomains domain {start|stop|status}"
        exit 1
        ;;
esac

exit 0
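Heartbeat will only run the script if it is executable, so set the mode on both nodes:
chmod 755 /etc/ha.d/resource.d/xendomains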
Test
Make sure you've placed all the configuration files and the xendomains script onto all the heartbeat nodes.
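For a two-node cluster this is just a couple of copies; a sketch, assuming you are on xena and the peer node is xeno (the node names from ha.cf above):
scp /etc/ha.d/authkeys /etc/ha.d/ha.cf /etc/ha.d/haresources xeno:/etc/ha.d/
scp /etc/ha.d/resource.d/xendomains xeno:/etc/ha.d/resource.d/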
To test the resource scripts, you can run the following commands, checking the output of cat /proc/drbd and xm list after each step.
/etc/ha.d/resource.d/drbddisk testdisk start
cat /proc/drbd
/etc/ha.d/resource.d/xendomains testdom start
xm list
/etc/ha.d/resource.d/xendomains testdom stop
xm list
/etc/ha.d/resource.d/drbddisk testdisk stop
cat /proc/drbd
If everything looks good, start heartbeat on both machines. Check that the primary node has started the VM, then restart heartbeat on that machine so the resources transition over to the other node, and verify that the other machine now has the VM running.
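A sketch of that sequence, assuming the standard Debian init script and the crm_mon tool that ships with heartbeat-2 (run the first command on both nodes):
/etc/init.d/heartbeat start
crm_mon                      # watch until the xendomains resource is started
xm list                      # confirm the domain is up on the primary
/etc/init.d/heartbeat restart
xm list                      # run on the other node; the domain should now be there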