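Lab environment: RHEL 5.5, 64-bit
Requirements: VMware virtual machines and the heartbeat installation packages
Goal: automatic switchover between two Samba servers that share one storage disk, giving simple failover.
Lab plan:
HOSTA:
hostname: sev1.example.com (sev1)  eth0: 192.168.138.10  eth1: 192.168.1.10 (heartbeat interface)  GW: 192.168.138.2  primary node
HOSTB:
hostname: sev2.example.com (sev2)  eth0: 192.168.138.20  eth1: 192.168.1.20 (heartbeat interface)  GW: 192.168.138.2  standby node
Steps: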
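1. Open VMware and install two virtual machines, both running RHEL 5.5 64-bit. Make sure to select the Samba packages during OS installation. (Installing Samba afterwards means sorting out a tangle of dependencies, which is awkward with plain rpm!)
2. In HOSTA's virtual machine settings, manually add a disk to be shared, named share for now. So that both machines can mount the shared storage, the disk's parameters have to be changed. In the VM's directory, locate the newly created shared disk and edit the share.vmx file, adding the following parameters: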
disk.locking = "FALSE"
diskLib.dataCacheMaxSize=0
diskLib.dataCacheMaxReadAheadSize=0
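diskLib.dataCacheMinReadAheadSize=0
diskLib.dataCachePageSize=4096
diskLib.maxUnsyncedWrites=0
scsi0:1.sharedBus = "virtual"
scsi0:1.shared = "true"
(scsi0:1 is the virtual device node of the shared disk; adjust it to match your setup.)
3. Start HOSTA and log in as root (it makes the later steps easier). Open a terminal, find the disk with fdisk -l, then format it. I wanted to use the whole disk, so I did not partition it and formatted it directly as ext3. The commands are: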
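fdisk -l                      find the disk's device name, here /dev/sdb
fdisk /dev/sdb                press m for help (the partitioning options are not covered here, look them up if you need them); reboot afterwards
mkdir -p /home/share          create the mount point
mkfs -t ext3 -c /dev/sdb      format as ext3
Tip: test with a manual mount, mount /dev/sdb /home/share. It worked! (Remember to umount afterwards.)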
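The commands of step 3 can be collected into a small script. This is only a sketch: the device and mount point are the ones used above, while DRY_RUN, run and prepare_share are hypothetical names added for the example. By default it only prints the commands, because mkfs destroys whatever is on the disk.

```shell
#!/bin/bash
# Sketch: format the shared disk once (on HOSTA only) and test-mount it.
# DEVICE and MOUNTPOINT are the values from the article.
# DRY_RUN is a hypothetical safety switch: 1 prints commands, 0 runs them.
DEVICE="${DEVICE:-/dev/sdb}"
MOUNTPOINT="${MOUNTPOINT:-/home/share}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_share() {
    run mkdir -p "$MOUNTPOINT"          # create the mount point
    run mkfs -t ext3 -c "$DEVICE"       # whole-disk ext3; asks for confirmation
    run mount "$DEVICE" "$MOUNTPOINT"   # manual test mount
    run umount "$MOUNTPOINT"            # leave it unmounted for heartbeat
}
```

Calling prepare_share prints the four commands; DRY_RUN=0 prepare_share, run as root, executes them.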
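4. HOSTB does not need a new disk. When adding a hard disk, choose the option to use an existing disk and point it at the share disk. After creating the mount point, test it: a successful mount is all that is needed. (Do this with the disk unmounted on HOSTA, since ext3 must not be mounted on both nodes at the same time.)
5. Configure the Samba server: a. From a terminal, edit the main configuration file with vi /etc/samba/smb.conf. b. Or use the graphical tool under Administration --> Server Settings --> Samba. Samba configuration is simple, so I won't go into it; the key is understanding the permissions. (I'm still a bit fuzzy on that myself!)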
6. Install heartbeat on HOSTA
I installed from rpm: copy the packages into the virtual machine. heartbeat 2.1.3-3 needs three packages, installed in this order:
heartbeat-pils-2.1.3-3.el5.centos.i386.rpm
heartbeat-stonith-2.1.3-3.el5.centos.i386.rpm
heartbeat-2.1.3-3.el5.centos.i386.rpm
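How to install: cd into the directory, ls to list the files, then run rpm -ivh heartbeat-pils-2.1.3-3.el5.centos.i386.rpm (use Tab completion) and follow the prompts. Once all three packages are installed, it is worth running rpm -q heartbeat -d to list the documentation files the package installed. It's a good habit.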
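The install order can be written down as a short loop. A sketch only: the three file names are the ones listed above, and install_heartbeat with its directory argument is a hypothetical helper. It refuses to start unless all three files are present, so a half-installed set is less likely.

```shell
#!/bin/bash
# Install order from the article: pils, then stonith, then heartbeat.
PKGS="heartbeat-pils-2.1.3-3.el5.centos.i386.rpm
heartbeat-stonith-2.1.3-3.el5.centos.i386.rpm
heartbeat-2.1.3-3.el5.centos.i386.rpm"

install_heartbeat() {
    # $1: directory holding the three rpm files (hypothetical parameter)
    local dir="$1" pkg
    # First pass: make sure every package file is there.
    for pkg in $PKGS; do
        if [ ! -f "$dir/$pkg" ]; then
            echo "missing: $dir/$pkg" >&2
            return 1
        fi
    done
    # Second pass: install in order, stopping at the first failure.
    for pkg in $PKGS; do
        rpm -ivh "$dir/$pkg" || return 1
    done
    # List the documentation files, as the article suggests.
    rpm -q heartbeat -d
}
```

Run it as root, for example install_heartbeat /root/heartbeat, where the directory is wherever the packages were copied.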
7. After heartbeat is installed, find these three files under /usr/share/doc/heartbeat-2.1.3: authkeys, haresources and ha.cf. Copy all three to /etc/ha.d. The configuration is as follows:
a. ha.cf:
# There are lots of options in this file. All you have to have is a set
# of nodes listed {"node ...} one of {serial, bcast, mcast, or ucast},
# and a value for "auto_failback".
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
# In particular, make sure that the udpport, serial baud rate
# etc. are set before the heartbeat media are defined!
# debug and log file directives go into effect when they
# are encountered.
# All will be fine if you keep them ordered as in this example.
# Note on logging:
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be logged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messages. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
# File to write debug messages to
#debugfile /var/log/ha-debug
# File to write other messages to
logfile /var/log/ha-log
# Facility to use for syslog()/logger
logfacility local0
# A note on specifying "how long" times below...
# The default time unit is seconds
# 10 means ten seconds
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
# keepalive: how long between heartbeats?
keepalive 2
# deadtime: how long-to-declare-host-dead?
# If you set this too low you will get the problematic
# split-brain (or cluster partition) problem.
# See the FAQ for how to use warntime to tune deadtime.
deadtime 60
# warntime: how long before issuing "late heartbeat" warning?
# See the FAQ for how to use warntime to tune deadtime.
warntime 10
# Very first dead time (initdead)
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
initdead 120
# What UDP port to use for bcast/ucast communication?
#udpport 694
# Baud rate for serial ports...
#baud 19200
# serial serialportname ...
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cuad0 # FreeBSD 6.x
#serial /dev/cua/a # Solaris
# What interfaces to broadcast heartbeats over?
bcast eth1 # Linux
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
#     224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (set this value to the
#     same value as "udpport" above)
# [ttl] the ttl value for outbound heartbeats. this effects
#     how far the multicast packet will propagate. (0-255)
#     Must be greater than zero.
# [loop] toggles loopback for outbound multicast heartbeats.
#     if enabled, an outbound packet will be looped back and
#     received by the interface it was sent on. (0 or 1)
#     Set this value to zero.
#mcast eth0 225.0.0.1 694 1 0
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
ucast eth1 192.168.1.20
# About boolean values...
# Any of the following case-insensitive values will work for true:
# true, on, yes, y, 1
# Any of the following case-insensitive values will work for false:
# false, off, no, n, 0
# auto_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
# The possible values for auto_failback are:
# on - enable automatic failbacks
# off - disable automatic failbacks
# legacy - enable automatic failbacks in systems
#     where all nodes do not yet support
#     the auto_failback option.
# auto_failback "on" and "off" are backwards compatible with the old
# "nice_failback on" setting.
# See the FAQ for information on how to convert
# from "legacy" to "on" without a flash cut.
# (i.e., using a "rolling upgrade" process)
# The default value for auto_failback is "legacy", which
# will issue a warning at startup. So, make sure you put
# an auto_failback directive in your ha.cf file.
# (note: auto_failback can be any boolean or "legacy")
#auto_failback on
# Basic STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
# stonith <type> <configfile>
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#stonith baytech /etc/ha.d/conf/stonith.baytech
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
#     to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
#     supported drives is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
#     format for a particular device, run:
#     stonith -l -t <stonith_type>
# Note that if you put your stonith device access information in
# here, and you make this file publicly readable, you're asking
# for a denial of service attack ;-)
# To get a list of supported stonith devices, run
# stonith -L
# For detailed information on which stonith devices are supported
# and their detailed configuration options, run this command:
# stonith -h
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
# NOTE: If you are using the software watchdog, you very likely
# wish to load the module with the parameter "nowayout=0" or
# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
# an orderly shutdown of heartbeat will trigger a reboot, which is
# very likely NOT what you want.
#watchdog /dev/watchdog
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node sev1.example.com
node sev2.example.com
# Less common options...
# Treats 10.10.10.254 as a pseudo-cluster-member
# Used together with ipfail below...
# note: don't use a cluster node as ping node
ping 192.168.138.2
# Treats 10.10.10.254 and 10.10.10.253 as a pseudo-cluster-member
# called group1. If either 10.10.10.254 or 10.10.10.253 are up
# then group1 is up
# Used together with ipfail below...
#ping_group group1 10.0.0.1 10.0.0.2
# HBA ping directive for Fiber Channel
# Treats fc-card-name as pseudo-cluster-member
# used with ipfail below ...
# You can obtain HBAAPI from http://hbaapi.sourceforge.net. You need
# to get the library specific to your HBA directly from the vendor
# To install HBAAPI stuff, all You need to do is to compile the common
# part you obtained from the sourceforge. This will produce libHBAAPI.so
# which you need to copy to /usr/lib. You need also copy hbaapi.h to
# /usr/include.
# The fc-card-name is the name obtained from the hbaapitest program
# that is part of the hbaapi package. Running hbaapitest will produce
# a verbose output. One of the first line is similar to:
# Adapter number 0 is named: qlogic-qla2200-0
# Here fc-card-name is qlogic-qla2200-0.
#hbaping fc-card-name
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
#respawn userid /path/name/to/run
#respawn root /usr/lib/heartbeat/ipfail
# Access control for client api
# default is no access
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=root uid=root
###########################
# Unusual options.
###########################
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
# deadping - dead time for ping nodes
#deadping 30
# hbgenmethod - Heartbeat generation number creation method
# Normally these are stored on disk and incremented as needed.
#hbgenmethod time
# realtime - enable/disable realtime execution (high priority, etc.)
# defaults to on
#realtime off
# debug - set debug level
# defaults to zero
#debug 1
# API Authentication - replaces the fifo-permissions-based system of the past
# You can put a uid list and/or a gid list.
# If you put both, then a process is authorized if it qualifies under either
# the uid list, or under the gid list.
# The groupname "default" has special meaning. If it is specified, then
# this will be used for authorizing groupless clients, and any client groups
# not otherwise specified.
# There is a subtle exception to this. "default" will never be used in the
# following cases (actual default auth directives noted in brackets)
# ipfail (uid=HA_CCMUSER)
# ccm (uid=HA_CCMUSER)
# ping (gid=HA_APIGROUP)
# cl_status (gid=HA_APIGROUP)
# This is done to avoid creating a gaping security hole and matches the most
# likely desired configuration.
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient
# message format in the wire, it can be classic or netstring,
# default: classic
#msgfmt classic/netstring
# Do we use logging daemon?
# If logging daemon is used, logfile/debugfile/logfacility in this file
# are not meaningful any longer. You should check the config file for logging
# daemon (the default is /etc/logd.cf)
# more information can be found in http://www.linux-ha.org/ha_2ecf_2fUseLogdDirective
# Setting use_logd to "yes" is recommended
use_logd yes
# the interval we reconnect to logging daemon if the previous connection failed
# default: 60 seconds
#conn_logd_time 60
# Configure compression module
# It could be zlib or bz2, depending on whether u have the corresponding
# library in the system.
#compression bz2
# Configure compression threshold
# This value determines the threshold to compress a message,
# e.g. if the threshold is 1, then any message with size greater than 1 KB
# will be compressed, the default is 2 (KB)
# compression_threshold 2
b. authkeys:
# Authentication file. Must be mode 600
# Must have exactly one auth directive at the front.
# auth send authentication using this method-id
# Then, list the method and key that go with that method-id
# Available methods: crc sha1, md5. Crc doesn't need/want a key.
# You normally only have one authentication method-id listed in this file
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
# sha1 is believed to be the "best", md5 next best.
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
Important: after editing, set the permissions on the file with chmod 600 authkeys. This step is mandatory.
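The setup above uses crc, which the file's own comments say adds no security. As a sketch of the alternative those comments recommend, the helper below writes an authkeys file that uses sha1 and sets mode 600 in one go. write_authkeys and its arguments are hypothetical names for this example, and the secret is whatever you choose; the same file has to be present on both nodes.

```shell
#!/bin/bash
write_authkeys() {
    # $1: target file (default /etc/ha.d/authkeys)
    # $2: shared secret used as the sha1 key
    local file="${1:-/etc/ha.d/authkeys}" key="$2"
    if [ -z "$key" ]; then
        echo "usage: write_authkeys FILE KEY" >&2
        return 1
    fi
    # Write with a restrictive umask so the key is never world-readable.
    ( umask 077
      printf 'auth 1\n1 sha1 %s\n' "$key" > "$file" ) || return 1
    # heartbeat requires mode 600 on this file.
    chmod 600 "$file"
}
```

For example, write_authkeys /etc/ha.d/authkeys "some-long-secret" on HOSTA, then copy the file to HOSTB and check its mode again there.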
c. haresources:
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# "administrative" or fixed IP addresses in this file.
#
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# auto_failback ON (or legacy), then these services will be started
# up on the preferred nodes - any time they're up.
# If you are running with auto_failback OFF, then the node information
# will be used in the case of a simultaneous start-up, or when using
# the hb_standby {foreign,local} command.
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won't work right.
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
# You need to make this right for your installation, then install it in
# /etc/ha.d
# Each logical line in the file constitutes a "resource group".
# A resource group is a list of resources which move together from
# one node to another - in the order listed. It is assumed that there
# is no relationship between different resource groups. These
# resource in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
# These resources in this file are either IP addresses, or the name
# of scripts to run to "start" or "stop" the given resource.
# The format is like this:
#node-name resource1 resource2 ... resourceN
sev1.example.com 192.168.138.23 httpd
sev1.example.com 192.168.138.24 Filesystem::/dev/sdb::/home/share::ext3 smb
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimiter
# In the case of IP addresses, the resource script name IPaddr is
# implied.
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don't set it
# up here -- we key off of it.
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in the step above.
# The base interface for the IPalias that is created defaults to the
# same netmask as the route that it selected in the step above.
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
# The IP addresses you list in this file are called "service" addresses,
# since they're the publicly advertised addresses that clients
# use to get at highly available services.
# For a hot/standby (non load-sharing) 2-node system with only
# a single service address,
# you will probably only put one system name and one IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
# Where the nodename is the name of the node which "normally" owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#-------------------------------------------------------------------
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
#just.linux-ha.org 135.9.216.110
#-------------------------------------------------------------------
# Assuming the administrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#-------------------------------------------------------------------
# An example where a shared filesystem is to be used.
# Note that multiple arguments are passed to this script using
# the delimiter '::' to separate each argument.
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
# Regarding the node-names in this file:
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren't
# virtual in any sense of the word.
8. Configure heartbeat on HOSTB
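I took the lazy route here. The configuration is the same as on HOSTA, except that in ha.cf the line ucast eth1 192.168.1.20 must have its address changed to 192.168.1.10. So I simply logged in to HOSTA over ftp, fetched the three configuration files with GET, and made that one change. (A file fetched over ftp does not keep its permissions, so check that authkeys is still mode 600 on HOSTB.)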
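The one-line change on HOSTB can be done with sed instead of by hand. A minimal sketch: set_ucast_peer is a hypothetical helper, it relies on GNU sed's -i option, and it only touches an active line that starts with ucast, leaving the commented examples alone.

```shell
#!/bin/bash
set_ucast_peer() {
    # $1: path to ha.cf, $2: heartbeat interface, $3: peer IP address
    local cf="$1" dev="$2" peer="$3"
    # Rewrite the active ucast directive in place.
    sed -i "s/^ucast[[:space:]].*/ucast $dev $peer/" "$cf" || return 1
    # Succeed only if the expected line is now present.
    grep -q "^ucast $dev $peer\$" "$cf"
}
```

On HOSTB this would be set_ucast_peer /etc/ha.d/ha.cf eth1 192.168.1.10, since HOSTB sends its heartbeats to HOSTA's eth1 address.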
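9. Start heartbeat
HOSTA: in a terminal run service heartbeat start. OK
HOSTB: in a terminal run service heartbeat start. OK
If the configuration is correct and the network is reachable, an eth0:0 alias interface appears automatically; it carries the virtual IP that heartbeat brings up. Remember to check heartbeat's running state with ps -ef!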
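To see which node currently holds a service address, the interface listing can be searched for it. This is a rough sketch: has_address is a hypothetical helper that reads the listing on standard input and accepts both the ifconfig form ("inet addr:IP") and the ip addr form ("inet IP/len").

```shell
#!/bin/bash
has_address() {
    # $1: IPv4 address to look for; interface listing text comes from stdin.
    local ip_re
    # Escape the dots so they match literally in the regular expression.
    ip_re="$(printf '%s' "$1" | sed 's/\./\\./g')"
    grep -Eq "inet (addr:)?${ip_re}[/ ]"
}

# Usage on a node (192.168.138.24 is the share address from haresources):
#   ifconfig | has_address 192.168.138.24 && echo "this node holds the share IP"
#   ps -ef | grep '[h]eartbeat'
```

Stopping heartbeat on HOSTA with service heartbeat stop and running the same check on HOSTB is a simple way to watch the address move.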
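Typing is tiring and screenshots are hard to upload; I wrote all this mainly so I can look it up when I forget. I tested this on the virtual machines: failover is automatic and the smb service starts on the other node. The httpd service is only there for testing. Mounting the disk works too. Do not mount the disk automatically from fstab: heartbeat has to do the mounting, otherwise the disk does not fail over properly.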
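A quick check for the fstab rule can be scripted. This is a sketch under one assumption: it only looks for the device name in the first field, so an entry written with LABEL= or UUID= would not be caught. fstab_mounts_device is a hypothetical name.

```shell
#!/bin/bash
fstab_mounts_device() {
    # $1: device to look for, $2: fstab path (default /etc/fstab)
    # Returns 0 if an active (uncommented) entry for the device exists.
    local dev="$1" fstab="${2:-/etc/fstab}"
    awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$fstab"
}

# Usage on each node:
#   if fstab_mounts_device /dev/sdb; then
#       echo "remove /dev/sdb from /etc/fstab; heartbeat must mount it"
#   fi
```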
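Summary: building a failover cluster with heartbeat takes only simple configuration, but note the following points:
1. Before installing heartbeat, set the hostname, IP addresses and so on. Check the network configuration files such as /etc/hosts and /etc/sysconfig/network, and install only after they are correct.
2. Most of the heartbeat configuration is in ha.cf: add the nodes, choose the heartbeat interface, and set a ping target for checking outside connectivity. authkeys only sets the authentication method, so pick one. In haresources, a single line listing the resources to run is enough! (That line is the essence. It took me a week, and only later did I notice that the comments explain it all. Poor English really hurts...)
3. The comments in Linux configuration files matter. Read them when you have time; they help a lot with configuration!
4. Clusters fall roughly into three types: high availability (failover, as built here, belongs to this type), load balancing, and high-performance computing. All of these are essential for large server deployments, and I need to study them more. I don't know whether I'll get the chance to learn Veritas and Oracle!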
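For point 1 of the summary, the comments in ha.cf and haresources say the node names must match uname -n. A small sketch of that check follows; node_listed is a hypothetical helper, and it reads the active node lines from whatever ha.cf path it is given.

```shell
#!/bin/bash
node_listed() {
    # $1: path to ha.cf, $2: node name (default: output of uname -n)
    # Returns 0 if the name appears on an active "node" line.
    local cf="$1" name="${2:-$(uname -n)}"
    awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) ok = 1 }
        END { exit !ok }
    ' "$cf"
}

# Usage on each node, before starting heartbeat:
#   node_listed /etc/ha.d/ha.cf || echo "uname -n is not listed as a node in ha.cf"
```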