jdmail Dual-Server Hot Standby Configuration (3)
Date: 2010-06-09 Source: libo20100322
4.2.1 Hot Standby Software
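The cluster configuration used in this deployment is shown in Figure 2. It consists of a pair of clustered servers (ha1 and ha2), both of which can access a disk enclosure containing multiple physical disks; the servers run in cold standby mode. Application data must reside on a shared device that both nodes can reach, either a shared disk or a network file system. To prevent data corruption, the device itself should be mirrored or otherwise protected. This setup is often called a shared-disk cluster, but it is really a shared-nothing architecture, because at any given moment each disk is accessed by only one node.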
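Figure 2. heartbeat cluster configuration in a production environment
In the test setup, I used NFS as the shared disk mechanism, as shown in Figure 3. The option shown in Figure 2 is still the recommended one, especially in a production environment. A cable connected directly between the serial ports of the two systems carries the heartbeat between the two nodes.
Figure 3. heartbeat cluster configuration using NFS as the shared file system
For Red Hat 9, install the heartbeat-1.0.4 RPM packages built for Red Hat 9. There are three main components: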
heartbeat-pils-1.0.4-2.rh.9.i386.rpm
heartbeat-stonith-1.0.4-2.rh.9.i386.rpm
heartbeat-1.0.4-2.rh.9.i386.rpm
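Installation may fail with dependency errors. The --nodeps option lets the install proceed anyway, but the safer course is to install every RPM package the error message asks for (they are in the dependancies directory). The remaining RPM packages in the dependancies directory can be skipped if you do not need the features they provide.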
4.2.2 Installing Heartbeat
Log in as root at the Red Hat Linux command line and enter:
# rpm -ivh heartbeat-pils-1.0.4-2.rh.9.i386.rpm
# rpm -ivh heartbeat-stonith-1.0.4-2.rh.9.i386.rpm
# rpm -ivh heartbeat-1.0.4-2.rh.9.i386.rpm
Note that the packages must be installed in this order.
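Because the order matters, the three commands can be wrapped in a small function that stops at the first failure. This is only a sketch: the function name install_heartbeat and the RPM override variable (used for dry runs) are illustrative and are not part of heartbeat.

```shell
#!/bin/sh
# Install the three heartbeat packages in the required order,
# stopping as soon as one of them fails.
# RPM is a hypothetical override: set RPM="echo rpm" to print the
# commands instead of running them.
install_heartbeat() {
    for pkg in heartbeat-pils heartbeat-stonith heartbeat; do
        ${RPM:-rpm} -ivh "${pkg}-1.0.4-2.rh.9.i386.rpm" || return 1
    done
}
```

Run install_heartbeat as root from the directory that holds the three packages.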
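4.2.3 Configuring Heartbeat
Three files need to be configured: ha.cf, haresources (which must be identical on every node), and authkeys. They belong in the /etc/ha.d directory. Sample configurations are in /usr/share/doc/heartbeat-1.0.4; you can edit them and copy them to /etc/ha.d.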
4.2.3.1. Configuring /etc/ha.d/ha.cf
This configuration file tells heartbeat which media to use and how to configure them. ha.cf contains all the options you will need, as follows:
serial /dev/ttyS0
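Use a serial heartbeat. If you do not use a serial heartbeat, you must choose another medium, such as bcast (Ethernet) heartbeat. If your serial heartbeat runs on a different port, change /dev/ttyS0 to that serial device.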
watchdog /dev/watchdog
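Optional: the watchdog feature gives a system that has failed and can no longer send a "heartbeat" a minimal means of recovery: the machine is rebooted one minute after the failure. This helps a server whose heartbeat has genuinely stopped to resume it. To use this feature, you must load the "softdog" kernel module and create the actual device file. First, run "insmod softdog" to load the module. Then run "grep misc /proc/devices" and note the number shown (it should be 10). Next, run "cat /proc/misc | grep watchdog" and note the number in the output (it should be 130). You can now create the device file with: "mknod /dev/watchdog c 10 130".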
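The lookup in /proc/misc can be scripted so that the minor number is not typed by hand. The sketch below only prints the mknod command rather than running it; the function names are illustrative, and the major number 10 is taken from the text above.

```shell
#!/bin/sh
# Print the minor number registered for "watchdog" in a
# /proc/misc-style file (defaults to /proc/misc itself).
watchdog_minor() {
    awk '$2 == "watchdog" { print $1 }' "${1:-/proc/misc}"
}

# Print the mknod command that would create the device file.
# Major number 10 is the misc device class noted in the text.
watchdog_mknod_cmd() {
    minor=$(watchdog_minor "$1")
    if [ -z "$minor" ]; then
        echo "watchdog not found; is the softdog module loaded?" >&2
        return 1
    fi
    echo "mknod /dev/watchdog c 10 $minor"
}
```

After "insmod softdog", calling watchdog_mknod_cmd with no argument reads /proc/misc; review the printed command and run it as root.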
bcast eth1
Specifies the network interface used for broadcast heartbeat, eth1 here (change it to eth0, eth2, or whichever interface you use).
keepalive 2
Sets the interval between heartbeats to 2 seconds.
warntime 10
Sets how long heartbeat waits for a missing heartbeat before writing a "late heartbeat" warning to the log.
deadtime 30
Declares a node dead after 30 seconds without a heartbeat.
initdead 120
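In some configurations, the network takes a while to come up after a node reboots. This interval is separate from "deadtime" and is handled on its own. It should be at least twice the normal deadtime.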
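The relationship between the four timing values can be checked with a few lines of shell. The rule that initdead is at least twice deadtime comes from the text above; the ordering keepalive < warntime < deadtime is an assumption added here as a plausibility check, and the function name check_timing is illustrative.

```shell
#!/bin/sh
# Sanity-check heartbeat timing values (all in seconds).
# Usage: check_timing KEEPALIVE WARNTIME DEADTIME INITDEAD
check_timing() {
    keepalive=$1 warntime=$2 deadtime=$3 initdead=$4
    # Assumption: a warning should fire only after at least one missed beat.
    if [ "$keepalive" -ge "$warntime" ]; then
        echo "warntime should be greater than keepalive" >&2
        return 1
    fi
    # Assumption: the warning should come before the node is declared dead.
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "deadtime should be greater than warntime" >&2
        return 1
    fi
    # From the text: initdead is at least twice deadtime.
    if [ "$initdead" -lt $((deadtime * 2)) ]; then
        echo "initdead should be at least twice deadtime" >&2
        return 1
    fi
    echo ok
}
```

With the values used in this section, "check_timing 2 10 30 120" passes all three checks.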
hopfudge 1
Optional: for ring topologies, the number of hops allowed in addition to the number of nodes in the cluster.
baud 19200
Sets the baud rate for the serial port (bps).
udpport 694
bcast and ucast communication use port 694. This is the default, and it is the official port number registered with IANA.
nice_failback on
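Optional: for those familiar with Tru64 Unix, heartbeat by default behaves like the "favored member" mode. The master node holds all resources until it goes down, at which point the backup node takes over. Once the master node is working again, it takes all resources back from the backup node. This option prevents the master node from reacquiring the cluster resources after it has failed over.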
node linuxha1.linux-ha.org
Mandatory: the machine name of a cluster node, as shown by `uname -n`.
node linuxha2.linux-ha.org
Mandatory: the machine name of a cluster node, as shown by `uname -n`.
respawn userid cmd
Optional: lists a command to be spawned and monitored. For example, to spawn the ccm daemon, add the following:
respawn hacluster /usr/lib/heartbeat/ccm
This tells heartbeat to run the command under a trusted userid (hacluster in our example) and to monitor the health of the process, restarting it if it dies. For ipfail, the line is:
respawn hacluster /usr/lib/heartbeat/ipfail
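NOTE: if the process dies with exit code 100, it will not be respawned.
ping ping1.linux-ha.org ping2.linux-ha.org ....
Optional: specifies ping nodes. These nodes are not members of the cluster. They are used to check network connectivity so that modules such as ipfail can run.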
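Put together, the directives above might form a two-node ha.cf along the following lines. This is a sketch rather than a tested configuration. The node name ha-a is the one used in this project's haresources entry; ha-b is a hypothetical name for the second node, and both must match the output of `uname -n` on the respective machines. The optional watchdog, hopfudge, respawn, and ping directives are left out.

```
# /etc/ha.d/ha.cf (sketch; adjust devices, interfaces and node names)
# Serial heartbeat link and its speed
baud 19200
serial /dev/ttyS0
# Broadcast heartbeat over Ethernet
bcast eth1
udpport 694
# Timing, in seconds
keepalive 2
warntime 10
deadtime 30
initdead 120
# Do not hand resources back automatically when the master returns
nice_failback on
# Cluster members; ha-b is a hypothetical second node name
node ha-a
node ha-b
```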
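4.2.3.2. Configuring haresources
Once ha.cf is configured, the next step is to set up the haresources file. It specifies the services the cluster provides and which node is the default master. Note that this file must be identical on all nodes.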
#
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# nice_failback OFF then these services will be started
# up on the preferred nodes - any time they're up.
#
# If you are running with nice_failback ON, then the node information
# will be used in the case of a simultaneous start-up.
#
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won't work right.
# </VERY IMPORTANT NOTE>
#
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
#
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# Each logical line in the file constitutes a "resource group".
# A resource group is a list of resources which move together from
# one node to another - in the order listed. It is assumed that there
# is no relationship between different resource groups. These
# resource in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
#
# These resources in this file are either IP addresses, or the name
# of scripts to run to "start" or "stop" the given resource.
#
# The format is like this:
#
#node-name resource1 resource2 ... resourceN
#
#
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimeter
#
# In the case of IP addresses, the resource script name IPaddr is
# implied.
#
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
#
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don't set it
# up here -- we key off of it.
#
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
#
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in in the step above.
#
# The base interface for the IPalias that is created defaults to the
# same netmask as the route that it selected in in the step above.
#
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
#
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
#
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
#
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
# The IP addresses you list in this file are called "service" addresses,
# since they're they're the publicly advertised addresses that clients
# use to get at highly available services.
#
# For a hot/standby (non load-sharing) 2-node system with only
# a single service address,
# you will probably only put one system name and one IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
#
# Where the nodename is the name of the node which "normally" owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the adminstrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
#
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple aguments are passed to this script using
# the delimiter '::' to separate each argument.
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#
# Regarding the node-names in this file:
#
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren't
# virtual in any sense of the word.
#
ha-a 192.168.2.50 jdmail
Heartbeat searches the following paths for a startup script with the same name as the resource:
/etc/ha.d/resource.d
/etc/rc.d/init.d
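Service scripts in these locations follow the standard init script syntax, so Heartbeat can conveniently start and stop the standard service daemons found under /etc/rc.d/init.d.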
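To make the haresources syntax concrete, the sketch below prints the script invocations that a resource group line corresponds to at start-up, following the rules quoted in the sample file: a bare IP address implies the IPaddr script, and "::" separates a script name from its arguments. It is an illustration of the syntax only, not heartbeat's own code; the function name expand_start is invented here, and the reverse (right-to-left) stop order is not modeled.

```shell
#!/bin/sh
# Print the "start" invocations implied by one haresources line.
# Usage: expand_start NODE RESOURCE...
expand_start() {
    shift   # drop the preferred node name; it does not affect the commands
    for res in "$@"; do
        case $res in
            # A bare IP address implies the IPaddr resource script.
            [0-9]*.[0-9]*.[0-9]*.[0-9]*)
                echo "IPaddr $res start" ;;
            # Otherwise "::" separates the script name from its arguments.
            *)
                echo "$(echo "$res" | sed 's/::/ /g') start" ;;
        esac
    done
}
```

For this project's line, "expand_start ha-a 192.168.2.50 jdmail" prints an IPaddr invocation for 192.168.2.50 followed by "jdmail start".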
For this project, add the JDMail (金笛邮件) startup script, jdmail, under /etc/ha.d/resource.d/:
/jdmail/startjd.sh &
/jdmail/web/bin/startup.sh &
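Since resource scripts follow the init convention, the jdmail script will be called with "start" and "stop" arguments, so it may help to wrap the two commands in a case statement. The following is a minimal sketch under stated assumptions: the two start commands come from this document, while the stop commands (/jdmail/web/bin/shutdown.sh and /jdmail/stopjd.sh) are hypothetical placeholders that must be replaced with whatever actually stops jdmail on your system. DRY_RUN is an invented testing hook that prints commands instead of running them.

```shell
#!/bin/sh
# Sketch of /etc/ha.d/resource.d/jdmail, called with "start" or "stop".

# Run a command in the background (bg) or foreground (fg),
# or just print it when DRY_RUN is set.
run() {
    mode=$1; shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    elif [ "$mode" = bg ]; then
        "$@" &
    else
        "$@"
    fi
}

jdmail_resource() {
    case "$1" in
        start)
            run bg /jdmail/startjd.sh
            run bg /jdmail/web/bin/startup.sh
            ;;
        stop)
            # HYPOTHETICAL paths: the document does not name the stop scripts.
            run fg /jdmail/web/bin/shutdown.sh
            run fg /jdmail/stopjd.sh
            ;;
        *)
            echo "Usage: jdmail {start|stop}" >&2
            return 1
            ;;
    esac
}

# Dispatch only when an argument was supplied.
if [ $# -gt 0 ]; then
    jdmail_resource "$@"
fi
```

The stop branch runs in the foreground so that the script returns only after the services have been asked to shut down, which is a design choice of this sketch.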