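1. Concepts
1.1 Overview
IPVS (IP Virtual Server) is the load-balancing technology that runs as part of LVS (Linux Virtual Server).
As a transport-layer (layer 4) load balancer, it forwards TCP- and UDP-based service requests to real servers and makes the real servers' services appear as a single virtual service on one IP address.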
1.2 Scheduling algorithms
These are simply the LVS scheduling algorithms.
Round robin (rr)
The simplest algorithm: requests are dispatched to the servers one after another in a fixed cycle. Round robin assumes that every server has the same processing capacity, so the scheduler spreads requests evenly across all real servers (RS), regardless of their configuration or actual capacity.
Weighted round robin (wrr)
Adds a weight to rr. Each RS can be given a weight, and the higher the weight, the more requests it receives. ipvsadm accepts weights from 0 to 65535 (default 1). This refines rr by letting LVS account for the capacity of each server: if server A has weight 1 and server B has weight 2, B is scheduled twice as many requests as A.
Least connection (lc)
Decides where to send a request based on the number of connections on each backend RS. For example, if RS1 has fewer connections than RS2, the request goes to RS1 first.
Weighted least connection (wlc)
The same as lc, with a weight added for each RS.
Locality-based least connection (lblc)
Schedules by the destination IP address of the request packet. The scheduler first looks up the server most recently used for that destination IP. If that server is still available and able to handle the request, the scheduler keeps choosing it; otherwise it picks another usable server.
Locality-based least connection with replication (lblcr)
Instead of recording a mapping from a destination IP to a single server, it maintains a mapping from a destination IP to a set of servers, which prevents a single server from being overloaded.
Destination hashing (dh)
Maps a destination IP to a server through a hash function on the destination IP address. As long as that server is available and not overloaded, requests for the destination IP are always sent to it.
Source hashing (sh)
Similar to destination hashing, but it hashes the source address to statically assign a fixed server.
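The selection rules described above can be sketched in a few lines of Python. This is a simplified illustration, not the kernel implementation: the backend tuples and addresses are hypothetical, the weighted round robin here emits each backend in bursts rather than interleaving, the connection counts ignore inactive connections, and the CRC32 hash merely stands in for whatever hash IPVS uses internally.

```python
from itertools import cycle
import zlib

# Hypothetical backends: (address, weight, active_connections).
servers = [("10.0.0.1", 1, 4), ("10.0.0.2", 2, 6)]

def rr(servers):
    """Round robin: cycle through the backends, ignoring weight and load."""
    return cycle(addr for addr, _, _ in servers)

def wrr(servers):
    """Weighted round robin (simplified): a backend appears `weight` times per cycle."""
    expanded = [addr for addr, weight, _ in servers for _ in range(weight)]
    return cycle(expanded)

def lc(servers):
    """Least connection: pick the backend with the fewest active connections."""
    return min(servers, key=lambda s: s[2])[0]

def wlc(servers):
    """Weighted least connection: pick the lowest connections-to-weight ratio."""
    usable = [s for s in servers if s[1] > 0]  # weight 0 receives no new requests
    return min(usable, key=lambda s: s[2] / s[1])[0]

def sh(servers, client_ip):
    """Source hashing: the same client IP always maps to the same backend."""
    index = zlib.crc32(client_ip.encode()) % len(servers)
    return servers[index][0]

picker = wrr(servers)
print([next(picker) for _ in range(6)])   # 10.0.0.2 appears twice as often as 10.0.0.1
print(lc(servers), wlc(servers))          # lc prefers 10.0.0.1, wlc prefers 10.0.0.2
```

With these sample numbers lc and wlc disagree: 10.0.0.1 has fewer connections in absolute terms, but 10.0.0.2 has fewer connections per unit of weight.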
2. The ipvsadm command
2.1 --help
[root@DoM01 ~]# ipvsadm --help
ipvsadm v1.27 2008/5/15 (compiled with popt and IPVS v1.2.1)
Usage:
ipvsadm -A|E -t|u|f service-address [-s scheduler] [-p [timeout]] [-M netmask] [--pe persistence_engine] [-b sched-flags]
ipvsadm -D -t|u|f service-address
ipvsadm -C
ipvsadm -R
ipvsadm -S [-n]
ipvsadm -a|e -t|u|f service-address -r server-address [options]
ipvsadm -d -t|u|f service-address -r server-address
ipvsadm -L|l [options]
ipvsadm -Z [-t|u|f service-address]
ipvsadm --set tcp tcpfin udp
ipvsadm --start-daemon state [--mcast-interface interface] [--syncid sid]
ipvsadm --stop-daemon state
ipvsadm -h
Commands:
Either long or short options are allowed.
--add-service -A add virtual service with options
--edit-service -E edit virtual service with options
--delete-service -D delete virtual service
--clear -C clear the whole table
--restore -R restore rules from stdin
--save -S save rules to stdout
--add-server -a add real server with options
--edit-server -e edit real server with options
--delete-server -d delete real server
--list -L|-l list the table
--zero -Z zero counters in a service or all services
--set tcp tcpfin udp set connection timeout values
--start-daemon start connection sync daemon
--stop-daemon stop connection sync daemon
--help -h display this help message
Options:
--tcp-service -t service-address service-address is host[:port]
--udp-service -u service-address service-address is host[:port]
--fwmark-service -f fwmark fwmark is an integer greater than zero
--ipv6 -6 fwmark entry uses IPv6
--scheduler -s scheduler one of rr|wrr|lc|wlc|lblc|lblcr|dh|sh|sed|nq,
the default scheduler is wlc.
--pe engine alternate persistence engine may be sip,
not set by default.
--persistent -p [timeout] persistent service
--netmask -M netmask persistent granularity mask
--real-server -r server-address server-address is host (and port)
--gatewaying -g gatewaying (direct routing) (default)
--ipip -i ipip encapsulation (tunneling)
--masquerading -m masquerading (NAT)
--weight -w weight capacity of real server
--u-threshold -x uthreshold upper threshold of connections
--l-threshold -y lthreshold lower threshold of connections
--mcast-interface interface multicast interface for connection sync
--syncid sid syncid for connection sync (default=255)
--connection -c output of current IPVS connections
--timeout output of timeout (tcp tcpfin udp)
--daemon output of daemon information
--stats output of statistics information
--rate output of rate information
--exact expand numbers (display exact values)
--thresholds output of thresholds information
--persistent-conn output of persistent connection info
--nosort disable sorting output of service/server entries
--sort does nothing, for backwards compatibility
--ops -o one-packet scheduling
--numeric -n numeric output of addresses and ports
--sched-flags -b flags scheduler flags (comma-separated)
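To show how the options in the help text combine, here is a minimal Python sketch that only builds the argument lists for creating a TCP virtual service and attaching real servers to it. It uses only flags listed above (-A, -a, -t, -r, -s, -g, -i, -m, -w). The helper names and all addresses are hypothetical, and nothing is executed, since running ipvsadm requires root and the IPVS kernel module.

```python
def add_service(vip, port, scheduler="wlc"):
    """Build `ipvsadm -A` for a TCP virtual service (wlc is the default scheduler)."""
    return ["ipvsadm", "-A", "-t", f"{vip}:{port}", "-s", scheduler]

def add_server(vip, port, rip, rport, mode="-m", weight=1):
    """Build `ipvsadm -a`; mode is -g (direct routing), -i (tunneling) or -m (NAT)."""
    if mode not in ("-g", "-i", "-m"):
        raise ValueError(f"unknown forwarding mode: {mode}")
    return ["ipvsadm", "-a", "-t", f"{vip}:{port}",
            "-r", f"{rip}:{rport}", mode, "-w", str(weight)]

# Hypothetical virtual service with two NAT backends, the second weighted higher.
commands = [
    add_service("192.0.2.10", 80, "wrr"),
    add_server("192.0.2.10", 80, "10.0.0.1", 8080),
    add_server("192.0.2.10", 80, "10.0.0.2", 8080, weight=2),
]
for argv in commands:
    print(" ".join(argv))
```

Keeping the commands as lists makes it straightforward to hand them to `subprocess.run` later without shell quoting issues.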
2.2 Common combinations
2.2.1 ipvsadm -ln
[root@DoM01 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 127.0.0.1:30011 rr
-> 10.244.3.5:6380 Masq 1 0 0
-> 10.244.6.83:6380 Masq 1 0 0
-> 10.244.8.140:6380 Masq 1 0 0
TCP 127.0.0.1:30521 rr
-> 10.244.4.34:30521 Masq 1 0 0
TCP 127.0.0.1:30569 rr
-> 10.244.6.165:80 Masq 1 0 0
TCP 127.0.0.1:30572 rr
-> 10.244.9.20:8720 Masq 1 0 0
TCP 172.17.0.1:30006 rr
-> 10.244.5.159:3306 Masq 1 0 0
-> 10.244.8.229:3306 Masq 1 0 0
-> 10.244.10.236:3306 Masq 1 0 0
......
The 30006 entry is used as the example below.
- 172.17.0.1
The IP address being accessed; here it belongs to the docker0 interface:
[root@DoM01 ~]# ip a
......
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:dc:95:0b:b3 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:dcff:fe95:bb3/64 scope link
valid_lft forever preferred_lft forever
......
- 30006
This is the NodePort port. Look it up:
[root@DoM01 ~]# kubectl get service -A|grep 30006
mysql mysqlha-readonly NodePort 10.1.61.20 <none> 3306:30006/TCP 155d
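This points to the mysqlha-readonly service in the mysql namespace.
- rr
The scheduling algorithm (round robin), see section 1.
- 10.244.5.159:3306
This line and the two after it are the ip:port pairs of the three backend pods: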
[root@DoM01 ~]# kubectl get pod -n mysql -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysqld-exporter-657cd49787-bg228 1/1 Running 0 62d 10.244.10.234 don08 <none> <none>
mysqlha-0 2/2 Running 0 3d2h 10.244.10.236 don08 <none> <none>
mysqlha-1 2/2 Running 0 22d 10.244.8.229 don06 <none> <none>
mysqlha-2 2/2 Running 0 21d 10.244.5.159 don03 <none> <none>
phpmyadmin-579d966787-9gcpr 1/1 Running 1 155d 10.244.8.139 don06 <none> <none>
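The walkthrough above reads the `ipvsadm -ln` listing by eye. As a rough sketch, the same structure can be extracted with a small Python parser. The function name and dictionary keys are hypothetical, the sample is the 30006 entry from the listing above, and the parser assumes the column layout shown there (it does not handle fwmark services or the Flags column).

```python
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.0.1:30006 rr
  -> 10.244.5.159:3306 Masq 1 0 0
  -> 10.244.8.229:3306 Masq 1 0 0
  -> 10.244.10.236:3306 Masq 1 0 0
"""

def parse_ipvsadm_ln(text):
    """Group `->` backend lines under the TCP/UDP service line that precedes them."""
    services = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP") and len(fields) >= 3:
            services.append({"proto": fields[0], "address": fields[1],
                             "scheduler": fields[2], "backends": []})
        elif fields[0] == "->" and len(fields) >= 6 and fields[1] != "RemoteAddress:Port":
            if services:  # ignore backend lines that have no service above them
                services[-1]["backends"].append({
                    "address": fields[1], "forward": fields[2],
                    "weight": int(fields[3]), "active": int(fields[4]),
                    "inactive": int(fields[5]),
                })
    return services

for svc in parse_ipvsadm_ln(SAMPLE):
    print(svc["address"], svc["scheduler"], [b["address"] for b in svc["backends"]])
```

The backend addresses returned for 172.17.0.1:30006 can then be compared with the pod IPs reported by `kubectl get pod -o wide`.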
2.2.2 ipvsadm -l --rate
[root@DoM01 ~]# ipvsadm -l --rate
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port CPS InPPS OutPPS InBPS OutBPS
-> RemoteAddress:Port
TCP 10.10.239.100:30018 0 37 29 11695 8651
-> 10.244.4.68:8848 0 32 26 10726 7469
TCP 10.10.239.100:30019 0 0 0 0 0
-> 10.244.4.68:7848 0 0 0 0 0
TCP 10.10.239.100:30020 0 1 1 48 499
-> 10.244.8.143:6379 0 1 1 36 375
......
Notes:
CPS (current connection rate): connections per second
InPPS (current in packet rate): incoming packets per second
OutPPS (current out packet rate): outgoing packets per second
InBPS (current in byte rate): incoming traffic per second, in bytes
OutBPS (current out byte rate): outgoing traffic per second, in bytes
2.2.3 ipvsadm -l --stats
[root@DoM01 ~]# ipvsadm -l --stats
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Conns InPkts OutPkts InBytes OutBytes
-> RemoteAddress:Port
......
TCP 10.10.239.100:30018 1640595 174592K 140093K 54052M 42661M
-> 10.244.4.68:8848 921414 114163K 92214336 35976M 28035M
TCP 10.10.239.100:30019 0 0 0 0 0
-> 10.244.4.68:7848 0 0 0 0 0
TCP 10.10.239.100:30020 197157 3886077 3422894 247816K 2565M
-> 10.244.8.143:6379 197157 3886077 3422894 247816K 2565M
TCP 10.10.239.100:30021 0 0 0 0 0
-> 10.244.8.143:26379 0 0 0 0 0
TCP 10.10.239.100:30022 197165 4075671 3501341 257677K 2500M
-> 10.244.6.80:6379 197166 4075691 3501358 257678K 2500M
......
Notes:
Conns (connections scheduled): number of connections forwarded so far
InPkts (incoming packets): number of incoming packets
OutPkts (outgoing packets): number of outgoing packets
InBytes (incoming bytes): incoming traffic, in bytes
OutBytes (outgoing bytes): outgoing traffic, in bytes
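Large counters in the `--stats` output are abbreviated with K and M suffixes. The following Python sketch turns one row back into approximate integers. It assumes the suffixes are decimal multipliers (K = 1000, M = 1000000), which is not stated in the output itself; the function name is hypothetical. When exact values matter, the `--exact` option from the help text is the more reliable route.

```python
# Assumption: suffixes are decimal multipliers, so the results are approximations.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def expand(value):
    """Convert an abbreviated counter such as '247816K' to an approximate integer."""
    suffix = value[-1]
    if suffix in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[suffix]
    return int(value)

# One service row taken from the listing above.
row = "TCP 10.10.239.100:30020 197157 3886077 3422894 247816K 2565M".split()
names = ("Conns", "InPkts", "OutPkts", "InBytes", "OutBytes")
stats = dict(zip(names, (expand(v) for v in row[2:])))
print(stats)
```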
2.2.4 Timeouts
- View the timeouts
[root@DoM01 ~]# ipvsadm -ln --timeout
Timeout (tcp tcpfin udp): 900 120 300
Notes:
The three values are in seconds. tcp applies to TCP sessions, tcpfin to TCP sessions after a FIN packet has been received, and udp to UDP packets.
- Set the timeouts
# ipvsadm --set 900 60 300
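Because `--set` always takes all three values, changing a single timeout means reading the current ones first. A minimal Python sketch of that read, modify, write cycle follows. The function names are hypothetical, the input line is the one shown above, and the resulting command is only printed, not executed.

```python
import re

def parse_timeouts(line):
    """Parse the line printed by `ipvsadm -ln --timeout` into a dict of seconds."""
    match = re.search(r"Timeout \(tcp tcpfin udp\):\s*(\d+)\s+(\d+)\s+(\d+)", line)
    if match is None:
        raise ValueError(f"unexpected timeout line: {line!r}")
    tcp, tcpfin, udp = (int(v) for v in match.groups())
    return {"tcp": tcp, "tcpfin": tcpfin, "udp": udp}

def set_command(tcp, tcpfin, udp):
    """Build the `ipvsadm --set` argument list; all three values are required."""
    return ["ipvsadm", "--set", str(tcp), str(tcpfin), str(udp)]

current = parse_timeouts("Timeout (tcp tcpfin udp): 900 120 300")
current["tcpfin"] = 60                      # change only the FIN timeout
print(" ".join(set_command(**current)))     # prints: ipvsadm --set 900 60 300
```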