
Redis 5 Cluster Election Mechanism Explained
Redis series:
Redis basics, installation, and usage:http://www.itdecent.cn/p/2a23257af57b
Redis basic data structures and their usage:http://www.itdecent.cn/p/c95c8450c5b6
Redis core principles:http://www.itdecent.cn/p/4e6b7809e10a
Building a high-availability cluster with Redis 5 and later:http://www.itdecent.cn/p/8045b92fafb2
Horizontal scaling of a Redis 5 high-availability cluster:http://www.itdecent.cn/p/6355d0827aea
Redis 5 cluster election mechanism explained:http://www.itdecent.cn/p/e6894713a6d5
Parsing the Redis 5 communication protocol and hand-writing a Jedis client:http://www.itdecent.cn/p/575544f68615
First, a word about this parameter:
cluster-node-timeout
Real-world data center networks are rarely calm; small problems crop up all the time. Network jitter is a very common example: some connections suddenly become unreachable and then quickly recover.
To cope with this, Redis Cluster provides the cluster-node-timeout option (in milliseconds). A node is considered failed, and a master-slave switch becomes necessary, only after it has been unreachable for longer than this timeout. Without the option, network jitter would cause frequent master-slave switches, and with them repeated re-replication of data.
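The idea can be illustrated with a minimal sketch. The function name, the timestamps, and the 15000 ms value below are assumptions chosen for illustration; this is not how Redis implements failure detection internally, only the rule "silent for longer than the timeout" expressed in code.

```python
def is_suspected_failed(last_pong_ms: int, now_ms: int, node_timeout_ms: int) -> bool:
    """Return True only if the node has been silent for longer than the timeout."""
    return now_ms - last_pong_ms > node_timeout_ms


NODE_TIMEOUT_MS = 15000  # hypothetical value for cluster-node-timeout

# A 2-second jitter stays below the timeout, so nothing is triggered.
print(is_suspected_failed(last_pong_ms=0, now_ms=2000, node_timeout_ms=NODE_TIMEOUT_MS))

# 20 seconds of silence exceeds the timeout, so the node becomes a failure candidate.
print(is_suspected_failed(last_pong_ms=0, now_ms=20000, node_timeout_ms=NODE_TIMEOUT_MS))
```

A larger timeout tolerates longer jitter but delays failover; a smaller one reacts faster but risks unnecessary switches.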
Let's start by reproducing a master-slave switch.
First, look at the current cluster node information:
[root@localhost redis-cluster]# /usr/local/redis/redis-5.0.2/src/redis-cli -c -h 192.168.5.100 -p 8001
192.168.5.100:8001> cluster nodes
412b26f846f27a63484594af931b9fb3b612ee9c 192.168.5.100:8003@18003 master - 0 1544973259457 3 connected 10923-16383
6c2000bf49a6e8229e518432c74d222521ff2f41 192.168.5.100:8005@18005 slave 1204327cbb9eaf4c9be0b90880531f6861e65f13 0 1544973258448 5 connected
52c723f6d391bc2975b27a5210451a2cf590d939 192.168.5.100:8002@18002 master - 0 1544973258000 2 connected 5461-10922
1d8f23d4ed7ccdfb5a6f2d8b9b6286cfd25d1d4b 192.168.5.100:8006@18006 slave 52c723f6d391bc2975b27a5210451a2cf590d939 0 1544973260466 6 connected
1204327cbb9eaf4c9be0b90880531f6861e65f13 192.168.5.100:8001@18001 myself,master - 0 1544973257000 1 connected 0-5460
886c96bfdef55b2fbcfee6eceb77fcf294fdfe33 192.168.5.100:8004@18004 slave 412b26f846f27a63484594af931b9fb3b612ee9c 0 1544973257440 4 connected
192.168.5.100:8001>
Now kill one master (8001 in this run) and look at the node information again, running cluster nodes twice a few seconds apart:
[root@localhost redis-cluster]# /usr/local/redis/redis-5.0.2/src/redis-cli -c -h 192.168.5.100 -p 8002
192.168.5.100:8002> cluster nodes
6c2000bf49a6e8229e518432c74d222521ff2f41 192.168.5.100:8005@18005 slave 1204327cbb9eaf4c9be0b90880531f6861e65f13 0 1544973521650 5 connected
1d8f23d4ed7ccdfb5a6f2d8b9b6286cfd25d1d4b 192.168.5.100:8006@18006 slave 52c723f6d391bc2975b27a5210451a2cf590d939 0 1544973521000 6 connected
886c96bfdef55b2fbcfee6eceb77fcf294fdfe33 192.168.5.100:8004@18004 slave 412b26f846f27a63484594af931b9fb3b612ee9c 0 1544973522659 4 connected
412b26f846f27a63484594af931b9fb3b612ee9c 192.168.5.100:8003@18003 master - 0 1544973520000 3 connected 10923-16383
1204327cbb9eaf4c9be0b90880531f6861e65f13 192.168.5.100:8001@18001 master - 1544973509291 1544973508479 1 disconnected 0-5460
52c723f6d391bc2975b27a5210451a2cf590d939 192.168.5.100:8002@18002 myself,master - 0 1544973522000 2 connected 5461-10922
192.168.5.100:8002> cluster nodes
6c2000bf49a6e8229e518432c74d222521ff2f41 192.168.5.100:8005@18005 master - 0 1544973547997 7 connected 0-5460
1d8f23d4ed7ccdfb5a6f2d8b9b6286cfd25d1d4b 192.168.5.100:8006@18006 slave 52c723f6d391bc2975b27a5210451a2cf590d939 0 1544973549010 6 connected
886c96bfdef55b2fbcfee6eceb77fcf294fdfe33 192.168.5.100:8004@18004 slave 412b26f846f27a63484594af931b9fb3b612ee9c 0 1544973546000 4 connected
412b26f846f27a63484594af931b9fb3b612ee9c 192.168.5.100:8003@18003 master - 0 1544973543000 3 connected 10923-16383
1204327cbb9eaf4c9be0b90880531f6861e65f13 192.168.5.100:8001@18001 master,fail - 1544973509291 1544973508479 1 disconnected
52c723f6d391bc2975b27a5210451a2cf590d939 192.168.5.100:8002@18002 myself,master - 0 1544973548000 2 connected 5461-10922
192.168.5.100:8002>
In the second output, 8001 is marked fail and 8005 has been elected master, taking over slots 0-5460. Now restart 8001 and check the node information:
192.168.5.100:8002> cluster nodes
6c2000bf49a6e8229e518432c74d222521ff2f41 192.168.5.100:8005@18005 master - 0 1544973819402 7 connected 0-5460
1d8f23d4ed7ccdfb5a6f2d8b9b6286cfd25d1d4b 192.168.5.100:8006@18006 slave 52c723f6d391bc2975b27a5210451a2cf590d939 0 1544973821440 6 connected
886c96bfdef55b2fbcfee6eceb77fcf294fdfe33 192.168.5.100:8004@18004 slave 412b26f846f27a63484594af931b9fb3b612ee9c 0 1544973822000 4 connected
412b26f846f27a63484594af931b9fb3b612ee9c 192.168.5.100:8003@18003 master - 0 1544973821000 3 connected 10923-16383
1204327cbb9eaf4c9be0b90880531f6861e65f13 192.168.5.100:8001@18001 slave 6c2000bf49a6e8229e518432c74d222521ff2f41 0 1544973822450 7 connected
52c723f6d391bc2975b27a5210451a2cf590d939 192.168.5.100:8002@18002 myself,master - 0 1544973821000 2 connected 5461-10922
192.168.5.100:8002>
8001 is now a slave, and its master is 6c2000bf49a6e8229e518432c74d222521ff2f41, that is, 8005. Redis high availability holds up well here.
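When checking such output by script rather than by eye, each line of cluster nodes can be split into its fields. The following is a small sketch, assuming the whitespace-separated layout seen in the output above (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, slots). The NodeInfo class and the function name are hypothetical helpers, not part of any Redis client library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeInfo:
    node_id: str
    address: str        # ip:port, with the @cluster-bus-port suffix removed
    flags: List[str]    # e.g. ["myself", "master"] or ["master", "fail"]
    master_id: str      # "-" for a master
    link_state: str     # "connected" or "disconnected"
    slots: List[str]    # slot ranges such as "0-5460"; empty for slaves


def parse_cluster_node_line(line: str) -> NodeInfo:
    parts = line.split()
    return NodeInfo(
        node_id=parts[0],
        address=parts[1].split("@")[0],
        flags=parts[2].split(","),
        master_id=parts[3],
        link_state=parts[7],
        slots=parts[8:],
    )


line = ("1204327cbb9eaf4c9be0b90880531f6861e65f13 192.168.5.100:8001@18001 "
        "master,fail - 1544973509291 1544973508479 1 disconnected")
node = parse_cluster_node_line(line)
print(node.address, node.flags, node.link_state)
```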
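How it works:
When a slave finds that its master has entered the FAIL state, it attempts a failover in the hope of becoming the new master. Because the failed master may have several slaves, those slaves compete to become the master. The process is as follows:
1. The slave finds that its master has entered the FAIL state.
2. It increments its recorded cluster currentEpoch by 1 and broadcasts a FAILOVER_AUTH_REQUEST message.
3. The other nodes receive the message, but only masters respond: each checks that the requester is legitimate and sends a FAILOVER_AUTH_ACK, sending at most one ack per epoch.
4. The slave attempting the failover collects the FAILOVER_AUTH_ACK replies.
5. Once it has acks from more than half of the masters, it becomes the new master.
6. It broadcasts a Pong to notify the other cluster nodes.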
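The voting steps above can be sketched as a toy simulation. It models only the "one ack per epoch" rule and the majority check; the class, function names, and epoch numbers are assumptions for illustration, and real Redis applies further legitimacy checks that are omitted here.

```python
from typing import List


class Master:
    """A voting master that grants at most one vote per epoch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_vote_epoch = 0

    def handle_auth_request(self, request_epoch: int) -> bool:
        # Refuse if this master has already voted in this epoch or a newer one.
        if request_epoch <= self.last_vote_epoch:
            return False
        self.last_vote_epoch = request_epoch
        return True  # stands in for sending FAILOVER_AUTH_ACK


def run_election(candidate_epoch: int, reachable_masters: List[Master],
                 total_masters: int) -> bool:
    """Return True if the candidate collects acks from more than half of all masters."""
    acks = sum(1 for m in reachable_masters if m.handle_auth_request(candidate_epoch))
    return acks > total_masters // 2


# Three masters in total; one has failed, two can still vote.
survivors = [Master("8002"), Master("8003")]
TOTAL_MASTERS = 3
current_epoch = 6  # hypothetical starting epoch

first = run_election(current_epoch + 1, survivors, TOTAL_MASTERS)
second = run_election(current_epoch + 1, survivors, TOTAL_MASTERS)
print(first, second)
```

The first candidate wins with two acks out of three masters. A second slave asking in the same epoch gets no acks, because each master has already used its vote for that epoch.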
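A slave does not try to start an election the instant its master enters the FAIL state; it waits for a short delay. The delay gives the FAIL state time to propagate through the cluster. If the slave tried immediately, the other masters might not yet be aware of the FAIL state and could refuse to vote.
Delay formula:
DELAY = 500ms + random(0 ~ 500ms) + SLAVE_RANK * 1000ms
SLAVE_RANK is the rank of this slave by the amount of data it has already replicated from the master. The lower the rank, the more up-to-date its replicated data. This way, the slave holding the most recent data will (in theory) be the first to start the election.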
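As a worked example of the formula, here is a short sketch. The function names are hypothetical, and deriving the rank by counting sibling slaves with a larger replication offset is an assumption about how "rank" is obtained, stated here only to make the example concrete.

```python
import random
from typing import List


def slave_rank(my_offset: int, sibling_offsets: List[int]) -> int:
    """Rank 0 means no sibling slave has replicated more data than this one."""
    return sum(1 for offset in sibling_offsets if offset > my_offset)


def election_delay_ms(rank: int, rng: random.Random) -> int:
    """DELAY = 500ms + random(0 ~ 500ms) + SLAVE_RANK * 1000ms."""
    return 500 + rng.randint(0, 500) + rank * 1000


rng = random.Random()
freshest = slave_rank(my_offset=1200, sibling_offsets=[900])   # rank 0
laggard = slave_rank(my_offset=900, sibling_offsets=[1200])    # rank 1

# Rank 0 waits between 500 and 1000 ms; rank 1 waits between 1500 and 2000 ms.
print(election_delay_ms(freshest, rng), election_delay_ms(laggard, rng))
```

Because each rank step adds a full 1000 ms while the random part is at most 500 ms, a lower-ranked slave always asks for votes before a higher-ranked one.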
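A follow-up on an earlier question:
Redirection
When a client sends a command to the wrong node, that node finds that the slot holding the command's key is not one it manages. It then sends the client a special redirect instruction carrying the address of the node that owns the slot, telling the client to connect to that node to fetch the data. After receiving the instruction, the client not only redirects to the correct node to perform the operation but also updates its locally cached slot mapping table, and all subsequent keys use the updated mapping.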
[root@localhost 8001]# /usr/local/redis/redis-5.0.2/src/redis-cli -c -h 192.168.5.100 -p 8003
192.168.5.100:8003> get name
-> Redirected to slot [5798] located at 192.168.5.100:8002
"xxx"
192.168.5.100:8002>
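The client side of this can be sketched as follows, assuming the redirect arrives as an error of the form "MOVED <slot> <host>:<port>" and that a key maps to a slot via CRC16 modulo 16384. The function names and the dictionary used as a slot cache are hypothetical, and hash tags ({...} in keys) are deliberately ignored to keep the sketch short.

```python
from typing import Dict, Tuple


def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 slots (hash tags not handled)."""
    return crc16_xmodem(key.encode()) % 16384


def parse_moved(error: str) -> Tuple[int, str, int]:
    """Split 'MOVED <slot> <host>:<port>' into (slot, host, port)."""
    _, slot, addr = error.split()
    host, port = addr.rsplit(":", 1)
    return int(slot), host, int(port)


# Hypothetical client-side cache: slot number -> (host, port)
slot_cache: Dict[int, Tuple[str, int]] = {}

slot, host, port = parse_moved("MOVED 5798 192.168.5.100:8002")
slot_cache[slot] = (host, port)  # later commands for this slot go straight to 8002

print(key_slot("name"), slot_cache[key_slot("name")])
```

With the cache updated, a later command on the key name is sent directly to 192.168.5.100:8002 without another redirect.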