2020-04-04 Keepalived High-Availability Clusters in Practice (Part 3)

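1. Case study: Nginx load balancing with Keepalived

1.1 Configure Nginx load balancing on lb01 and lb02

Building on the Nginx load-balancing environment introduced earlier, set up Nginx load balancing on the primary load balancer lb01 and the backup load balancer lb02 as shown in the figure below. Both servers use the same base installation.

Figure: high-availability logic of the Nginx load balancers

The Nginx load-balancing configuration used here is as follows: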

[root@lb01 ~]# cd /application/nginx/conf/
[root@lb01 conf]# cat nginx.conf
worker_processes  1;
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    keepalive_timeout  65;
    # Web server pool containing the two Web nodes, 106 and 108
    upstream www_pools {
        server 192.168.9.106:80  weight=1;
        server 192.168.9.108:80  weight=1;
    }
    # Virtual host for the load-balanced (proxied) domain
    server {
        listen       192.168.9.210:80;    # note: listens on a specific IP (the VIP)
        server_name  www.etiantian.org;
        location / {
            # Requests for www.etiantian.org are sent to the nodes in www_pools
            proxy_pass http://www_pools;
            proxy_set_header Host $host;
        }
    }
}

Note: this configuration proxies only the www.etiantian.org domain.
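
Before relying on the proxy, it can help to confirm which backends the pool actually contains. The following is a minimal sketch, not part of the original setup: list_upstream_servers is a hypothetical helper name, and it assumes the upstream block is formatted as in the configuration above (one directive per line, a space before the opening brace).

```shell
#!/bin/bash
# Hypothetical helper: print the address of every "server" line inside the
# named upstream block. Reads an nginx configuration from standard input.
list_upstream_servers() {
    local pool="$1"
    awk -v pool="$pool" '
        $1 == "upstream" && $2 == pool { inside = 1; next }   # block opens
        inside && index($0, "}")       { inside = 0 }          # block closes
        inside && $1 == "server"       { sub(/;$/, "", $2); print $2 }
    '
}

# Possible use on lb01 (paths and domain taken from this article):
#   list_upstream_servers www_pools < /application/nginx/conf/nginx.conf |
#   while read -r backend; do
#       curl -s -o /dev/null -w "$backend %{http_code}\n" \
#            -H "Host: www.etiantian.org" "http://$backend/"
#   done
```

The loop in the comment sends one request per backend with the proxied Host header, so a node that is down shows up before clients hit it through the VIP.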

1.2 Configure the Keepalived service on lb01 and lb02

A single-instance setup is used here as the example. The single-instance master configuration of Keepalived on lb01 is as follows:

[root@lb01 keepalived]# cat keepalived.conf
global_defs {
    router_id lb01
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.9.210/24 dev eth0 label eth0:3
    }
}

Note: the VIP is 192.168.9.210, so in production www.etiantian.org, the domain proxied by the Nginx load balancer, must resolve to this VIP.
The single-instance backup configuration of Keepalived on lb02 is as follows:

[root@lb02 keepalived]# cat keepalived.conf
global_defs {
    router_id lb02
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.9.210/24 dev eth0 label eth0:3
    }
}
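
The two files must agree on virtual_router_id and auth_pass, and in this master/backup layout they carry different priorities. A rough sketch of a consistency check follows; get_directive and check_vrrp_pair are hypothetical helper names, and the sketch assumes each file holds a single vrrp_instance.

```shell
#!/bin/bash
# Hypothetical helper: print the value of a single-valued keepalived
# directive such as "priority" (first occurrence in the file).
get_directive() {
    local file="$1" name="$2"
    awk -v name="$name" '$1 == name { print $2; exit }' "$file"
}

# Succeed only if the two configs look like one VRRP pair as laid out in
# this article: same virtual_router_id and auth_pass, different priority.
check_vrrp_pair() {
    local a="$1" b="$2" key
    for key in virtual_router_id auth_pass; do
        if [ "$(get_directive "$a" "$key")" != "$(get_directive "$b" "$key")" ]; then
            echo "mismatch: $key"
            return 1
        fi
    done
    if [ "$(get_directive "$a" priority)" = "$(get_directive "$b" priority)" ]; then
        echo "same priority on both nodes"
        return 1
    fi
    echo "pair looks consistent"
}

# Possible use, with copies of both files in the current directory:
#   check_vrrp_pair lb01-keepalived.conf lb02-keepalived.conf
```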

1.3 Preparing for user access and simulating real access

Preparation:
1) In the client's hosts file, resolve www.etiantian.org to the VIP 192.168.9.210. A production setup would use DNS instead.

192.168.9.210  www.etiantian.org

2) Both servers must also have the Nginx load-balancing service configured, and the proxied Web nodes behind them must be reachable in a test.

[root@lb01 conf]# tail -1 /etc/hosts
192.168.9.210   www.etiantian.org
[root@lb01 keepalived]# netstat -lntup|grep nginx
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      28553/nginx: master
[root@lb01 keepalived]# ip addr|grep 192.168.9.210
    inet 192.168.9.210/24 scope global secondary eth0:3

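Now simulate real access.
1) Enter "www.etiantian.org" in the client browser and press Ctrl+F5 to refresh a few times. Normally the two results shown in the figures below should alternate.

Figure: client access to the load balancer, test result 1

Figure: client access to the load balancer, test result 2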

2) Now shut down the lb01 server, or stop its Keepalived service, and check whether the service still works:

[root@lb01 conf]# systemctl stop keepalived
[root@lb01 conf]# ip addr|grep 192.168.9.210

Note: strictly speaking, powering off the server simulates a failure most faithfully.
3) Check whether the backup node lb02 has taken over the VIP 192.168.9.210.

[root@lb02 keepalived]# ip addr|grep 192.168.9.210
    inet 192.168.9.210/24 scope global secondary eth0:3

Enter "www.etiantian.org" in the client browser again and press Ctrl+F5 to refresh a few times. The results should be the same as before the switch to lb02.
4) Start the Keepalived service on lb01 again.

[root@lb01 conf]# systemctl start keepalived
[root@lb01 conf]# ip addr|grep 192.168.9.210
    inet 192.168.9.210/24 scope global secondary eth0:3

As shown, lb01 takes the VIP back quickly, and the browser still gets normal results.
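
The grep used in these checks treats the address as a pattern, so searching for 192.168.9.21 would also match 192.168.9.210. The sketch below compares the address exactly; has_vip is a hypothetical helper name, and it assumes the usual "inet ADDRESS/PREFIX ..." layout of ip addr output.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given IPv4 address appears as an
# "inet" entry in `ip addr` output read from standard input.
has_vip() {
    local vip="$1"
    awk -v vip="$vip" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit found ? 0 : 1 }'
}

# Possible use on either node:
#   ip addr | has_vip 192.168.9.210 && echo "VIP is on this node"
```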

2. Fixing a service that listens on an IP address absent from the network interface

If the configuration binds to a specific IP with "listen 192.168.9.210:80;" and no local interface carries 192.168.9.210, Nginx fails with an error:

[root@lb01 conf]# nginx
nginx: [emerg] bind() to 192.168.9.210:80 failed (99: Cannot assign requested address)

In a dual-master setup, where the two nodes serve different services at the same time and each backs up the other, a configuration that listens on a specific IP fails on whichever node does not currently hold that VIP.
The cause is that no physical interface carries the IP named in the listen directive. To let Nginx start without error even when that IP is not present, add the following kernel parameter to /etc/sysctl.conf:

net.ipv4.ip_nonlocal_bind = 1
# Allows processes to bind to an IP address that is not configured locally, so Nginx starts even without the VIP; the same applies to HAProxy.

Finally, run sysctl -p to apply the change.
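
Appending the line by hand a second time would leave a duplicate entry. As a hedged sketch, the hypothetical helper ensure_sysctl_line below rewrites an existing line for the key or appends one if none exists; it assumes the target file already exists and is writable.

```shell
#!/bin/bash
# Hypothetical helper: make sure FILE contains exactly one "KEY = VALUE"
# line. An existing line for KEY is rewritten; otherwise one is appended.
# Commented-out lines are left untouched.
ensure_sysctl_line() {
    local file="$1" key="$2" value="$3" tmp status
    tmp="$(mktemp)" || return 1
    awk -v key="$key" -v value="$value" '
        BEGIN { FS = "=" }
        { k = $1; gsub(/[ \t]/, "", k) }            # key without blanks
        k == key { if (!seen) print key " = " value; seen = 1; next }
        { print }
        END { if (!seen) print key " = " value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Possible use as root:
#   ensure_sysctl_line /etc/sysctl.conf net.ipv4.ip_nonlocal_bind 1 && sysctl -p
```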

3. Making high availability cover the service, not just the physical server

By default, Keepalived takes over only when the peer machine goes down or the peer's Keepalived stops. In practice, though, the business service can stop while Keepalived keeps running, so users reaching the VIP find no service behind it. How can the IP be moved to the backup node as soon as the service itself fails?
Method 1: write a daemon script. When Nginx has a problem, the script stops the local Keepalived service so that the IP moves to the peer, which takes over. An example script of the kind developed and deployed in practice:

[root@lb01 conf]# cd /server/scripts/
[root@lb01 scripts]# vim check_nginx.sh
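#!/bin/bash
# Poll every 5 seconds; once Nginx has no listening socket, stop Keepalived
# so the VIP is released, then leave the loop.
while true
do
    # $(...) runs the pipeline; grep -c counts the Nginx listener lines
    if [ "$(netstat -lntup | grep -c nginx)" -eq 0 ]; then
        systemctl stop keepalived
        break
    fi
    sleep 5
done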

The idea of this script is that if Nginx is no longer listening (nothing on port 80), Keepalived is stopped to release the local VIP.
Run the script in the background and check it:

[root@lb01 scripts]# sh /server/scripts/check_nginx.sh &
[1] 29858
[root@lb01 scripts]# ps -ef|grep check
root      29858  19844  0 16:49 pts/0    00:00:00 sh /server/scripts/check_nginx.sh

Confirm that both Nginx and Keepalived are running normally.

[root@lb01 scripts]# lsof -i :80
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
nginx   29398  root    6u  IPv4 168351      0t0  TCP www.etiantian.org:http (LISTEN)
nginx   29399 nginx    6u  IPv4 168351      0t0  TCP www.etiantian.org:http (LISTEN)
[root@lb01 scripts]# ps -ef|grep keep|grep -v grep
root      29656      1  0 15:29 ?        00:00:00 /usr/sbin/keepalived -D
root      29657  29656  0 15:29 ?        00:00:00 /usr/sbin/keepalived -D
root      29658  29656  0 15:29 ?        00:00:00 /usr/sbin/keepalived -D

Then simulate an Nginx outage and see whether the IP switches over.

[root@lb01 scripts]# nginx -s stop
[root@lb01 scripts]# systemctl status keepalived
---keepalived has stopped
[root@lb01 scripts]# netstat -lntup|grep nginx

At this point the backup node has taken over:

[root@lb02 conf]# ip addr|grep 192.168.9.210
    inet 192.168.9.210/24 scope global secondary eth0:3

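Method 2: let Keepalived run a prepared monitoring script through its configuration file.
First write the monitoring script. Note how it differs from the previous one: it performs a single check with no loop, because Keepalived calls it at a fixed interval.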

[root@lb01 scripts]# vim chk_nginx_proxy.sh
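#!/bin/bash
# Single check, run by Keepalived at each interval:
# stop Keepalived if Nginx has no listening socket.
if [ "$(netstat -lntup | grep -c nginx)" -eq 0 ]; then
    systemctl stop keepalived
fi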
[root@lb01 scripts]# chmod +x chk_nginx_proxy.sh 
[root@lb01 scripts]# ls -l chk_nginx_proxy.sh 
-rwxr-xr-x 1 root root 94 Apr  3 17:21 chk_nginx_proxy.sh

The complete Keepalived configuration is now as follows:

[root@lb01 keepalived]# cat keepalived.conf
global_defs {
    router_id lb01
    script_user root    # user that runs the check script
    enable_script_security
}
# VRRP script that checks the HTTP port
vrrp_script chk_nginx_proxy {
    # Script to run; it stops Keepalived when Nginx has a problem
    script "/server/scripts/chk_nginx_proxy.sh"
    interval 2    # run every 2 seconds
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.9.210/24 dev eth0 label eth0:3
    }
    track_script {
        chk_nginx_proxy    # run the vrrp_script defined above
    }
}

Now test the takeover.

[root@lb01 keepalived]# lsof -i :80
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
nginx   31922  root    6u  IPv4 237693      0t0  TCP www.etiantian.org:http (LISTEN)
nginx   31923 nginx    6u  IPv4 237693      0t0  TCP www.etiantian.org:http (LISTEN)
[root@lb01 keepalived]# ps -ef|grep keep
root      31914      1  0 11:40 ?        00:00:00 /usr/sbin/keepalived -D
root      31915  31914  0 11:40 ?        00:00:00 /usr/sbin/keepalived -D
root      31916  31914  0 11:40 ?        00:00:00 /usr/sbin/keepalived -D
root      31945  31806  0 11:40 pts/0    00:00:00 grep --color=auto keep
[root@lb01 keepalived]# ip addr|grep 192.168.9.210
    inet 192.168.9.210/24 scope global secondary eth0:3
[root@lb01 keepalived]# nginx -s stop
[root@lb01 keepalived]# ip addr|grep 192.168.9.210
[root@lb01 keepalived]# ps -ef|grep keep
root      31945  31806  0 11:40 pts/0    00:00:00 grep --color=auto keep

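When Nginx is stopped, Keepalived is stopped automatically within about 2 seconds, the IP is released, and the peer takes over. This gives IP failover and service switchover even when only the service fails. If Nginx and Keepalived are then both started again, lb01 takes the IP back.
The log on lb01 shows the messages for Keepalived being stopped.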
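
A check script can also just report health through its exit status and leave the reaction to Keepalived, which evaluates the status of scripts it tracks; how it reacts depends on the weight setting. The sketch below is one possible variant, not the method used in this article: port_is_listening is a hypothetical helper, and it assumes the column layout of netstat -lnt (local address in column 4, state in column 6).

```shell
#!/bin/bash
# Hypothetical helper: succeed if the listener table read from standard
# input (for example from `netstat -lnt`) has a LISTEN socket on PORT.
port_is_listening() {
    local port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")        # port is the last ":" field
            if (parts[n] == port) found = 1
        }
        END { exit found ? 0 : 1 }'
}

# As the last line of a check script this yields exit status 0 while
# something listens on port 80 and 1 otherwise:
#   netstat -lnt | port_is_listening 80
```

Matching on the port rather than on the process name also catches the case where an Nginx process exists but no longer listens on port 80.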

4. Resolving conflicts between multiple Keepalived groups on one LAN

When several groups of Keepalived servers are deployed on the same LAN without a dedicated heartbeat link, serious takeover failures can occur. As explained earlier, Keepalived implements high availability through VRRP, and VRRP by default uses IP multicast for communication within each high-availability pair. Every Keepalived group uses the default multicast address 224.0.0.18, so when several pairs share a LAN their multicast traffic collides and takeovers go wrong. The fix is to give each group its own unique multicast address, set in the configuration file of every server in that group:

global_defs {
  router_id LVS_19
  vrrp_mcast_group4 224.0.0.19    # sets the multicast address
}

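Notes
1) Different instances should preferably also use different authentication passwords to keep takeovers correct.
2) Heartbeat, another high-availability package, has the same multicast address conflict problem when its master and backup communicate over multicast.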
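
To see at a glance which group each configuration ends up in, the setting can be read back from the files. This is only a sketch: mcast_group_of is a hypothetical helper name, and it falls back to 224.0.0.18, the default named above, when vrrp_mcast_group4 is not set.

```shell
#!/bin/bash
# Hypothetical helper: print the IPv4 multicast group a keepalived config
# will use; 224.0.0.18 when vrrp_mcast_group4 is not set in the file.
mcast_group_of() {
    awk '
        $1 == "vrrp_mcast_group4" { print $2; found = 1; exit }
        END { if (!found) print "224.0.0.18" }' "$1"
}

# Possible use, with hypothetical copies of each group's config:
#   for f in group1-keepalived.conf group2-keepalived.conf; do
#       echo "$f $(mcast_group_of "$f")"
#   done
```

Two different groups printing the same address are candidates for the conflict described in this section.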

5. Sending Keepalived logs to a dedicated file

By default, Keepalived logs go to the system log /var/log/messages, mixed with everything else, which makes them awkward to read. They can be redirected to a separate file as follows:
1) Edit /etc/sysconfig/keepalived and change KEEPALIVED_OPTIONS="-D" to KEEPALIVED_OPTIONS="-D -d -S 0". A quick way to make the change:

[root@lb01 keepalived]# sed -i '14 s#KEEPALIVED_OPTIONS="-D"#KEEPALIVED_OPTIONS="-D -d -S 0"#g' /etc/sysconfig/keepalived 
[root@lb01 keepalived]# sed -n '14p' /etc/sysconfig/keepalived 
KEEPALIVED_OPTIONS="-D -d -S 0"
Explanation: the comments in /etc/sysconfig/keepalived describe these options:
--dump-conf    -d    dump the configuration data
--log-detail    -D    detailed log messages
--log-facility    -S    set the local syslog facility, 0-7 (default=LOG_DAEMON)
-S 0    selects the local0 facility

2) Edit the rsyslog configuration file /etc/rsyslog.conf and append the following two lines at the end:

#keepalived
local0.*                                                /var/log/keepalived.log

This sends every log message from the local0 facility to /var/log/keepalived.log.
Then append ";local0.none" (note the semicolon) to the end of the first column of the following line:

*.info;mail.none;authpriv.none;cron.none;local0.none /var/log/messages

This stops messages from the local0 facility from being recorded in /var/log/messages.
3) After the configuration is done, restart the rsyslog service.

[root@lb01 keepalived]# systemctl restart rsyslog.service

4) Test the Keepalived logging. After Keepalived is restarted, its log messages go to /var/log/keepalived.log as defined in rsyslog:

[root@lb01 keepalived]# systemctl restart keepalived
[root@lb01 keepalived]# tail /var/log/keepalived.log 
Apr  4 12:15:52 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:52 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:52 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:52 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:57 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:57 lb01 Keepalived_vrrp[32113]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.9.210
Apr  4 12:15:57 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:57 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:57 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210
Apr  4 12:15:57 lb01 Keepalived_vrrp[32113]: Sending gratuitous ARP on eth0 for 192.168.9.210

For stricter requirements, log rotation can also be set up for /var/log/keepalived.log, as is done for /var/log/messages, so that a single log file does not grow too large.
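
One way to do this is a logrotate policy file. The sketch below writes such a file; write_keepalived_logrotate is a hypothetical helper name, and the schedule (weekly, four rotations kept, compression) is an assumed example rather than a value from this article. The postrotate step asks rsyslog to reopen its log files.

```shell
#!/bin/bash
# Hypothetical helper: write a logrotate policy for the Keepalived log to
# the given path. The rotation schedule is an assumed example.
write_keepalived_logrotate() {
    local dest="$1"
    cat > "$dest" <<'EOF'
/var/log/keepalived.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    postrotate
        /usr/bin/systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
    endscript
}
EOF
}

# Possible use as root, followed by a dry run that only prints what
# logrotate would do:
#   write_keepalived_logrotate /etc/logrotate.d/keepalived
#   logrotate -d /etc/logrotate.d/keepalived
```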

6. Writing a script to detect Keepalived split brain

Detection idea: run a script on the backup node. If the master node answers ping and the backup node also holds the VIP, raise an alarm so that staff can check for split brain.
1) Write and run the script on the backup node lb02.

[root@lb02 scripts]# cat check_split_brain.sh 
#!/bin/bash
lb01_vip=192.168.9.210
lb01_ip=192.168.9.81
while true
do
    ping -c 2 -W 3 $lb01_ip &>/dev/null
    if [ $? -eq 0 -a `ip addr|grep "$lb01_vip"|wc -l` -eq 1 ]
    then
        echo "ha is split brain.warning."
    else
        echo "ha is ok."
    fi
    sleep 5
done
[root@lb02 scripts]# sh check_split_brain.sh 
ha is ok.
ha is ok.

Normally the master node is alive and holds the VIP 192.168.9.210, so no alarm is raised and the script prints "ha is ok.".
2) Stop the Keepalived service and watch the script on lb02. On lb01:

[root@lb01 keepalived]# systemctl stop keepalived
[root@lb01 keepalived]# ip addr|grep 192.168.9.210

Just watch on lb02, where the script is already running.

[root@lb02 scripts]# sh check_split_brain.sh 
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.

3) Shut down the lb01 server and watch the script output on lb02 again.

[root@lb02 scripts]# sh check_split_brain.sh 
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is ok.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is split brain.warning.
ha is ok.
ha is ok.

The split-brain alarm has cleared, because lb01 no longer answers ping.
4) This script can be integrated into a monitoring service such as Nagios or Zabbix for alerting.
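
Monitoring systems of this kind usually expect a single check that ends with a status code instead of an endless loop. The following sketch maps the same two observations to the Nagios plugin convention (0 for OK, 2 for CRITICAL); split_brain_status is a hypothetical helper name, and the IP addresses in the usage comment are the ones from this article.

```shell
#!/bin/bash
# Hypothetical helper: turn the two observations into a Nagios-style result.
#   $1: exit status of the ping to the master (0 means reachable)
#   $2: number of times the VIP appears on this backup node
split_brain_status() {
    local ping_status="$1" vip_count="$2"
    if [ "$ping_status" -eq 0 ] && [ "$vip_count" -ge 1 ]; then
        echo "CRITICAL - master reachable but VIP is on the backup"
        return 2
    fi
    echo "OK - no split brain detected"
    return 0
}

# Possible use on lb02 as a one-shot check:
#   ping -c 2 -W 3 192.168.9.81 >/dev/null 2>&1
#   split_brain_status "$?" "$(ip addr | grep -c '192.168.9.210/')"
```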
