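On CentOS 7, a basic high-availability cluster can be built from three components: corosync (heartbeat and membership), pacemaker (resource management) and pcs (configuration tool). This article explains how these tools work and walks through the configuration.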
1. Lab environment
- Virtualization software: VirtualBox and Vagrant
- Lab VMs:
  - iscsi-disks: 192.168.56.20, provides shared storage over iSCSI; 1 CPU and 1 GB RAM by default.
  - ha-host1: 192.168.56.31, 1 CPU and 1 GB RAM by default.
  - ha-host2: 192.168.56.32, 1 CPU and 1 GB RAM by default.
  - ha-host3: 192.168.56.33, 1 CPU and 1 GB RAM by default.
Installation and management network: 192.168.56.0/24. This is a VirtualBox host-only network, which lets the physical host and the VirtualBox VMs reach each other.
2. Clone the projects and start the VMs
$ git clone https://github.com/lprincewhn/iscsi.git
$ cd iscsi
$ vagrant up iscsi-disks
$ cd ..
$ git clone https://github.com/lprincewhn/linuxha.git
$ cd linuxha
$ vagrant up
Once the VMs are up, you can log in with either of these accounts:
- root/vagrant
- vagrant/vagrant
3. Create the pcs cluster
Step 1 Install the packages
On all three cluster hosts, install the corosync, pacemaker and pcs packages, then start the pcsd service and enable it at boot.
# yum -y install corosync pacemaker pcs
# systemctl start pcsd && systemctl enable pcsd
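Step 2 Set the password of the hacluster user
On all three cluster hosts, set a password for the hacluster user, using the same password on every host. This account is used only for communication between the cluster hosts and cannot log in to the system, so the password is needed just once, when the hosts authenticate to each other.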
[root@ha-host1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
Step 3 Authenticate the cluster hosts to each other
[root@ha-host1 ~]# pcs cluster auth ha-host1 ha-host2 ha-host3
Username: hacluster
Password:
ha-host1: Authorized
ha-host2: Authorized
ha-host3: Authorized
[root@ha-host1 ~]#
Step 4 Create the cluster
[root@ha-host1 ~]# pcs cluster setup --name linuxha ha-host1 ha-host2 ha-host3
Destroying cluster on nodes: ha-host1, ha-host2, ha-host3...
ha-host3: Stopping Cluster (pacemaker)...
ha-host1: Stopping Cluster (pacemaker)...
ha-host2: Stopping Cluster (pacemaker)...
ha-host2: Successfully destroyed cluster
ha-host1: Successfully destroyed cluster
ha-host3: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'ha-host1', 'ha-host2', 'ha-host3'
ha-host2: successful distribution of the file 'pacemaker_remote authkey'
ha-host1: successful distribution of the file 'pacemaker_remote authkey'
ha-host3: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
ha-host1: Succeeded
ha-host2: Succeeded
ha-host3: Succeeded
Synchronizing pcsd certificates on nodes ha-host1, ha-host2, ha-host3...
ha-host1: Success
ha-host2: Success
ha-host3: Success
Restarting pcsd on the nodes in order to reload the certificates...
ha-host1: Success
ha-host2: Success
ha-host3: Success
Step 5 Start the cluster and enable it at boot
[root@ha-host1 ~]# pcs cluster start --all
ha-host1: Starting Cluster...
ha-host2: Starting Cluster...
ha-host3: Starting Cluster...
[root@ha-host1 ~]# pcs cluster enable --all
ha-host1: Cluster Enabled
ha-host2: Cluster Enabled
ha-host3: Cluster Enabled
Step 6 Check the cluster status
[root@ha-host1 ~]# pcs status
Cluster name: linuxha
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: ha-host3 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Mon Apr 23 03:39:50 2018
Last change: Mon Apr 23 03:38:08 2018 by hacluster via crmd on ha-host3
3 nodes configured
0 resources configured
Online: [ ha-host1 ha-host2 ha-host3 ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The pcs status output shows that:
- The Current DC (Designated Co-ordinator) is ha-host3. This node issues instructions to the other cluster nodes so that each resource is started or stopped according to its definition, which is stored in the CIB (Cluster Information Base).
- All three cluster hosts, ha-host1, ha-host2 and ha-host3, are online, but no resources exist yet.
- There is a warning about stonith devices:
WARNING: no stonith devices and stonith-enabled is not false
stonith devices are covered later. Since none has been created yet, set the stonith-enabled property to false for now. Once it is disabled, the warning goes away:
[root@ha-host1 ~]# pcs property set stonith-enabled=false
[root@ha-host1 ~]# pcs status
Cluster name: linuxha
Stack: corosync
Current DC: ha-host3 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Mon Apr 23 03:40:17 2018
Last change: Mon Apr 23 03:40:15 2018 by root via cibadmin on ha-host1
3 nodes configured
0 resources configured
Online: [ ha-host1 ha-host2 ha-host3 ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
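Reading these fields by eye is fine for three nodes; for scripted health checks the same information can be pulled out of the text. A minimal sketch, assuming the pcs 0.9 / Pacemaker 1.1 output layout shown above. The function names pcs_current_dc, pcs_online_nodes and pcs_has_stonith_warning are hypothetical helpers, not pcs commands.

```shell
#!/usr/bin/env bash
# Sketch: extract fields from `pcs status` text read on stdin.
# Assumes the pcs 0.9 / Pacemaker 1.1 layout shown above (hypothetical helpers).

# Print the Current DC, from a line like
# "Current DC: ha-host3 (version ...) - partition with quorum".
pcs_current_dc() {
  awk '$1 == "Current" && $2 == "DC:" { print $3; exit }'
}

# Print the online nodes, one per line, from "Online: [ ha-host1 ha-host2 ]".
# Fields 3 .. NF-1 are the node names between "[" and "]".
pcs_online_nodes() {
  awk '$1 == "Online:" { for (i = 3; i < NF; i++) print $i; exit }'
}

# Exit 0 if the "no stonith devices" warning is present.
pcs_has_stonith_warning() {
  grep -q 'WARNING: no stonith devices'
}
```

For example, `pcs status | pcs_current_dc` should print ha-host3 for the outputs above, and `pcs status | pcs_has_stonith_warning` should fail once stonith-enabled is false.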
4. Create a resource
Create the simplest kind of resource, a virtual IP:
[root@ha-host1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.56.24 cidr_netmask=24 op monitor interval=30s
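No nic parameter is given here, so IPaddr2 determines the interface from the routing table (as the agent description in section 6 explains). That only works if the VIP falls inside a subnet already configured on the hosts, here the 192.168.56.0/24 host-only network. The sketch below checks that condition with plain bash arithmetic; ip_to_int and ip_in_cidr are hypothetical helpers, and only dotted-quad IPv4 input without leading zeros is assumed.

```shell
#!/usr/bin/env bash
# Sketch: check that an IPv4 address lies inside NETWORK/PREFIX.
# ip_to_int and ip_in_cidr are hypothetical helpers, not part of pcs.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# Usage: ip_in_cidr ADDR NETWORK/PREFIX  (exit status 0 if ADDR is inside)
ip_in_cidr() {
  local addr=$1 net=${2%/*} prefix=${2#*/}
  # Build the netmask, e.g. prefix 24 -> 0xFFFFFF00.
  local mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$addr") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

With the values from the command above, `ip_in_cidr 192.168.56.24 192.168.56.0/24` succeeds, while an address such as 192.168.57.24 would not match the host-only network.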
Check the cluster status:
[root@ha-host1 ~]# pcs status
Cluster name: linuxha
Stack: corosync
Current DC: ha-host3 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Mon Apr 23 03:41:46 2018
Last change: Mon Apr 23 03:41:40 2018 by root via cibadmin on ha-host1
3 nodes configured
1 resource configured
Online: [ ha-host1 ha-host2 ha-host3 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started ha-host1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The output shows the new resource vip, and this IP has been placed on ha-host1.
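In a script, the node that currently runs a resource can be read from its resource line. A small sketch, assuming the `name (class): Started node` layout shown above; pcs_resource_node is a hypothetical helper, not a pcs command.

```shell
#!/usr/bin/env bash
# Sketch: print the node where a resource is Started, reading `pcs status` on stdin.
# Assumes lines like "vip (ocf::heartbeat:IPaddr2): Started ha-host1".
# pcs_resource_node is a hypothetical helper, not a pcs command.
pcs_resource_node() {
  # Match the resource name in field 1 and "Started <node>" at the end of the line.
  awk -v rsc="$1" '$1 == rsc && $(NF - 1) == "Started" { print $NF; exit }'
}
```

Here `pcs status | pcs_resource_node vip` should print ha-host1; a stopped resource prints nothing.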
Checking the interfaces on ha-host1 shows that the new IP 192.168.56.24 has been added to eth1, and an SSH connection to that IP lands on ha-host1:
[root@ha-host1 ~]# ip a
...
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:55:a1:19 brd ff:ff:ff:ff:ff:ff
inet 192.168.56.21/24 brd 192.168.56.255 scope global eth1
valid_lft forever preferred_lft forever
inet 192.168.56.24/24 brd 192.168.56.255 scope global secondary eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe55:a119/64 scope link
valid_lft forever preferred_lft forever
[root@ha-host1 ~]# ssh 192.168.56.24
Last login: Wed Apr 18 05:38:11 2018 from 192.168.56.21
[root@ha-host1 ~]# exit
logout
Connection to 192.168.56.24 closed.
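The VIP is flagged `secondary` because eth1 already has a primary address in the same subnet (192.168.56.21), which matches the IPaddr2 prerequisite that the interface carry at least one static address not managed by the cluster. To verify the assignment from a script, the `ip a` output can be scanned interface by interface. A sketch; ip_on_iface is a hypothetical helper and it only looks at IPv4 `inet` lines.

```shell
#!/usr/bin/env bash
# Sketch: check whether an IPv4 address is assigned to an interface,
# reading `ip a` output on stdin. ip_on_iface is a hypothetical helper.
# Usage: ip a | ip_on_iface eth1 192.168.56.24
ip_on_iface() {
  awk -v ifc="$1" -v addr="$2" '
    # Header lines such as "3: eth1: ..." start a new interface block.
    /^[0-9]+: / { cur = $2; sub(/:$/, "", cur); next }
    # "inet ADDR/PREFIX ..." lines belong to the current interface.
    cur == ifc && $1 == "inet" { split($2, a, "/"); if (a[1] == addr) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```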
5. Trigger a failover
Bring down eth1 on ha-host1, and the IP 192.168.56.24 moves to eth1 on ha-host2.
[root@ha-host1 ~]# ifconfig eth1 down
[root@ha-host2 ~]# pcs status
Cluster name: linuxha
Stack: corosync
Current DC: ha-host3 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Mon Apr 23 03:43:36 2018
Last change: Mon Apr 23 03:41:39 2018 by root via cibadmin on ha-host1
3 nodes configured
1 resource configured
Online: [ ha-host2 ha-host3 ]
OFFLINE: [ ha-host1 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started ha-host2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@ha-host2 ~]# ip a
...
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:bc:94:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.56.22/24 brd 192.168.56.255 scope global eth1
valid_lft forever preferred_lft forever
inet 192.168.56.24/24 brd 192.168.56.255 scope global secondary eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:febc:9442/64 scope link
valid_lft forever preferred_lft forever
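The pcs status output above shows ha-host1 as OFFLINE, because eth1 is also the interface ha-host1 uses for cluster communication. In a real deployment, a virtual IP resource normally carries application traffic and should live on a network separate from the one used for cluster communication.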
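The `partition with quorum` line explains why ha-host2 and ha-host3 may take over. With corosync's votequorum and the default of one vote per node, a partition needs more than half of the expected votes, floor(n/2)+1. For three nodes that is 2, so ha-host2 and ha-host3 together remain quorate, while the isolated ha-host1 holds a single vote and loses quorum; under Pacemaker's default no-quorum-policy=stop it should stop its resources. A sketch of the arithmetic, assuming one vote per node; quorum_votes and has_quorum are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch of the votequorum arithmetic, assuming one vote per node
# (the default). quorum_votes and has_quorum are hypothetical helpers.

# Votes a partition needs: more than half of the expected votes.
quorum_votes() {
  echo "$(( $1 / 2 + 1 ))"
}

# Usage: has_quorum TOTAL_NODES NODES_IN_PARTITION  (exit 0 if quorate)
has_quorum() {
  (( $2 >= $(quorum_votes "$1") ))
}
```

In this failover, `has_quorum 3 2` (ha-host2 and ha-host3) succeeds and `has_quorum 3 1` (ha-host1 alone) fails.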
6. Resource definitions in Pacemaker
Pacemaker identifies a resource type by its standard, its provider (used only when the standard is ocf) and its agent, in the following format:
<standard>:[provider]:<agent>
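To make the format concrete: ocf:heartbeat:IPaddr2 has standard ocf, provider heartbeat and agent IPaddr2, while a type without a provider, such as a systemd unit written as systemd:httpd (used here purely as an illustration), has only two parts. A bash sketch of the split; split_agent_spec is a hypothetical helper, not a pcs command.

```shell
#!/usr/bin/env bash
# Sketch: split a resource type spec <standard>:[provider]:<agent>.
# Prints standard, provider (empty line if absent) and agent, one per line.
# split_agent_spec is a hypothetical helper, not a pcs command.
split_agent_spec() {
  local -a parts
  IFS=: read -r -a parts <<< "$1"
  case ${#parts[@]} in
    3) printf '%s\n%s\n%s\n' "${parts[0]}" "${parts[1]}" "${parts[2]}" ;;
    2) printf '%s\n\n%s\n' "${parts[0]}" "${parts[1]}" ;;
    *) echo "unexpected spec: $1" >&2; return 1 ;;
  esac
}
```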
The pcs resource list command lists the resource types currently available:
[root@ha-host1 ~]# pcs resource list
...
ocf:heartbeat:iface-vlan - Manages VLAN network interfaces.
ocf:heartbeat:IPaddr - Manages virtual IPv4 and IPv6 addresses (Linux specific
version)
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific
version)
ocf:heartbeat:IPsrcaddr - Manages the preferred source address for outgoing IP
packets
ocf:heartbeat:iSCSILogicalUnit - Manages iSCSI Logical Units (LUs)
ocf:heartbeat:iSCSITarget - iSCSI target export agent
ocf:heartbeat:LVM - Controls the availability of an LVM Volume Group
ocf:heartbeat:MailTo - Notifies recipients by email in the event of resource
takeover
ocf:heartbeat:mysql - Manages a MySQL database instance
ocf:heartbeat:nagios - Nagios resource agent
...
The output above includes ocf:heartbeat:IPaddr2, the type of the vip resource created earlier.
Next, use pcs resource describe to see the parameters this resource type accepts:
[root@ha-host1 ~]# pcs resource describe ocf:heartbeat:IPaddr2
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific
version)
This Linux-specific resource manages IP alias IP addresses.
It can add an IP alias, or remove one.
In addition, it can implement Cluster Alias IP functionality
if invoked as a clone resource.
If used as a clone, you should explicitly set clone-node-max >= 2,
and/or clone-max < number of nodes. In case of node failure,
clone instances need to be re-allocated on surviving nodes.
This would not be possible if there is already an instance on those nodes,
and clone-node-max=1 (which is the default).
Resource options:
ip (required): The IPv4 (dotted quad notation) or IPv6 address (colon
hexadecimal notation) example IPv4 "192.168.1.1". example IPv6
"2001:db8:DC28:0:0:FC57:D4C8:1FFF".
nic: The base network interface on which the IP address will be brought
online. If left empty, the script will try and determine this from the
routing table. Do NOT specify an alias interface in the form eth0:1 or
anything here; rather, specify the base interface only. If you want a
label, see the iflabel parameter. Prerequisite: There must be at least
one static IP address, which is not managed by the cluster, assigned to
the network interface. If you can not assign any static IP address on the
interface, modify this kernel parameter: sysctl -w
net.ipv4.conf.all.promote_secondaries=1 # (or per device)
cidr_netmask: The netmask for the interface in CIDR format (e.g., 24 and not
255.255.255.0) If unspecified, the script will also try to
determine this from the routing table.
broadcast: Broadcast address associated with the IP. If left empty, the script
will determine this from the netmask.
iflabel: You can specify an additional label for your IP address here. This
label is appended to your interface name. A label can be specified in
nic parameter but it is deprecated. If a label is specified in nic
name, this parameter has no effect.
lvs_support: Enable support for LVS Direct Routing configurations. In case a
IP address is stopped, only move it to the loopback device to
allow the local node to continue to service requests, but no
longer advertise it on the network. Notes for IPv6: It is not
necessary to enable this option on IPv6. Instead, enable
'lvs_ipv6_addrlabel' option for LVS-DR usage on IPv6.
lvs_ipv6_addrlabel: Enable adding IPv6 address label so IPv6 traffic
originating from the address's interface does not use this
address as the source. This is necessary for LVS-DR health
checks to realservers to work. Without it, the most
recently added IPv6 address (probably the address added by
IPaddr2) will be used as the source address for IPv6
traffic from that interface and since that address exists
on loopback on the realservers, the realserver response to
pings/connections will never leave its loopback. See
RFC3484 for the detail of the source address selection.
See also 'lvs_ipv6_addrlabel_value' parameter.
lvs_ipv6_addrlabel_value: Specify IPv6 address label value used when
'lvs_ipv6_addrlabel' is enabled. The value should be
an unused label in the policy table which is shown
by 'ip addrlabel list' command. You would rarely
need to change this parameter.
mac: Set the interface MAC address explicitly. Currently only used in case of
the Cluster IP Alias. Leave empty to chose automatically.
clusterip_hash: Specify the hashing algorithm used for the Cluster IP
functionality.
unique_clone_address: If true, add the clone ID to the supplied value of IP to
create a unique address to manage
arp_interval: Specify the interval between unsolicited ARP packets in
milliseconds.
arp_count: Number of unsolicited ARP packets to send at resource
initialization.
arp_count_refresh: Number of unsolicited ARP packets to send during resource
monitoring. Doing so helps mitigate issues of stuck ARP
caches resulting from split-brain situations.
arp_bg: Whether or not to send the ARP packets in the background.
arp_mac: MAC address to send the ARP packets to. You really shouldn't be
touching this.
arp_sender: The program to send ARP packets with on start. For infiniband
interfaces, default is ipoibarping. If ipoibarping is not
available, set this to send_arp.
flush_routes: Flush the routing table on stop. This is for applications which
use the cluster IP address and which run on the same physical
host that the IP address lives on. The Linux kernel may force
that application to take a shortcut to the local loopback
interface, instead of the interface the address is really bound
to. Under those circumstances, an application may, somewhat
unexpectedly, continue to use connections for some time even
after the IP address is deconfigured. Set this parameter in
order to immediately disable said shortcut when the IP address
goes away.
run_arping: Whether or not to run arping for IPv4 collision detection check.
preferred_lft: For IPv6, set the preferred lifetime of the IP address. This
can be used to ensure that the created IP address will not be
used as a source address for routing. Expects a value as
specified in section 5.5.4 of RFC 4862.
Default operations:
start: interval=0s timeout=20s
stop: interval=0s timeout=20s
monitor: interval=10s timeout=20s
The resource options used when creating vip (ip, cidr_netmask) and the operation option (monitor interval) all appear in the output above.
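Note that vip was created with `op monitor interval=30s`, which overrides the default monitor interval of 10s listed under Default operations. When writing many resource definitions, the required options and default operations can be extracted from the describe output. The sketch below assumes the layout shown above (`name (required):` and `op: interval=... timeout=...` lines); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `pcs resource describe` output on stdin.
# Both helpers are hypothetical; they assume the layout shown above.

# Print option names marked "(required)", e.g. "ip" for IPaddr2.
describe_required_params() {
  awk '$2 == "(required):" { print $1 }'
}

# Print the default settings for one operation, e.g. "monitor".
describe_default_op() {
  awk -v op="$1" '
    /^Default operations:/ { in_ops = 1; next }
    # Drop the "monitor:" field and print the remaining interval/timeout text.
    in_ops && $1 == (op ":") { $1 = ""; sub(/^ +/, ""); print; exit }
  '
}
```

For IPaddr2, `pcs resource describe ocf:heartbeat:IPaddr2 | describe_default_op monitor` should print interval=10s timeout=20s.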