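Setting Up a Redis Distributed Cache

I spent two days organizing my earlier notes on setting up and using Redis in standalone and Sentinel modes, added my experience setting up and using Cluster mode, and worked through some of the principles behind the cluster.

1. Installing Redis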
$ wget http://download.redis.io/releases/redis-6.0.3.tar.gz 
$ tar -xzf redis-6.0.3.tar.gz 
$ cd redis-6.0.3 
$ make
$ make install

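Some problems I ran into during installation:

If make fails, gcc or g++ (the compilers) may not be installed. Install them with yum -y install gcc gcc-c++ kernel-devel. If some C files still fail to compile, check the version with gcc -v; if it is older than 5.3, upgrade gcc:

yum -y install centos-release-scl
yum -y install devtoolset-9-gcc devtoolset-9-gcc-c++ devtoolset-9-binutils

Append one line to /etc/profile so the new toolchain is enabled at login: source /opt/rh/devtoolset-9/enable

To enable it in the current shell right away:

scl enable devtoolset-9 bash

Then run make clean and make again.

This time the build succeeds, and the output suggests running make test.

Run make test. If it reports "You need tcl 8.5 or newer in order to run the Redis test",

install or upgrade tcl with yum install tcl.

Run make test again. If there are still errors, delete the directory, extract the tarball again, and rerun make and make test.

"\o/ All tests passed without errors!" means the build is good.

Then run make install.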

2. Starting Redis

Run the command directly: ./redis-server /usr/redis-6.0.3/redis.conf &

[root@VM_0_11_centos src]# ./redis-server /usr/redis-6.0.3/redis.conf &
[1] 4588
[root@VM_0_11_centos src]# 4588:C 22 May 2020 19:45:15.179 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
4588:C 22 May 2020 19:45:15.179 # Redis version=6.0.3, bits=64, commit=00000000, modified=0, pid=4588, just started
4588:C 22 May 2020 19:45:15.179 # Configuration loaded
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.3 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 4588
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

4588:M 22 May 2020 19:45:15.180 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4588:M 22 May 2020 19:45:15.180 # Server initialized
4588:M 22 May 2020 19:45:15.180 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
4588:M 22 May 2020 19:45:15.180 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
4588:M 22 May 2020 19:45:15.180 * Loading RDB produced by version 6.0.3
4588:M 22 May 2020 19:45:15.180 * RDB age 44 seconds
4588:M 22 May 2020 19:45:15.180 * RDB memory usage when created 0.77 Mb
4588:M 22 May 2020 19:45:15.180 * DB loaded from disk: 0.000 seconds
4588:M 22 May 2020 19:45:15.180 * Ready to accept connections

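In redis.conf, bind 0.0.0.0 allows access from other hosts, and requirepass xxxx sets a password.

3. Redis High Availability

There are two high-availability options for Redis:

  • Replication + Sentinel: master-replica replication plus Sentinel

  • Cluster: cluster mode

Common topologies are 1 master with 1 or 2 replicas plus 3 Sentinels monitoring the master, or a 6-node cluster with 3 masters and 3 replicas.

(1) Sentinel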

/usr/redis-6.0.3/src/redis-sentinel /usr/redis-6.0.3/sentinel2.conf &

sentinel2.conf:

# Redis config files only treat # as a comment at the start of a line, so the notes sit on their own lines
# port this Sentinel listens on
port 26380
daemonize yes
# pid file, needed when running daemonized
pidfile "/var/run/redis-sentinel2.pid"
logfile ""
dir "/tmp"
sentinel myid 5736b9ca22cf0899276316e71810566044d75d14
sentinel deny-scripts-reconfig yes
# quorum of 2: at least 2 Sentinels must agree that the master is down
sentinel monitor mymaster 122.xx.xxx.xxx 6379 2
# password the Sentinel uses to connect to the master
sentinel auth-pass mymaster xxxxxxx
# consider the master down after 30 seconds without a valid reply
sentinel down-after-milliseconds mymaster 30000
# a failover that does not complete within 30 seconds is considered failed
sentinel failover-timeout mymaster 30000
sentinel config-epoch mymaster 0
protected-mode no
user default on nopass ~* +@all
sentinel leader-epoch mymaster 0
sentinel current-epoch 0

Pitfall 1: the master becomes a replica after a failover, so it needs masterauth too

After I killed the master process, Sentinel elected a replica as the new master. When the original master was started again, it logged the following errors:

692:S 06 Jun 2020 13:19:35.280 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * Partial resynchronization not possible (no cached master)
692:S 06 Jun 2020 13:19:35.280 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * Retrying with SYNC...
692:S 06 Jun 2020 13:19:35.280 # MASTER aborted replication with an error: NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:36.282 * Connecting to MASTER 127.0.0.1:7001
692:S 06 Jun 2020 13:19:36.282 * MASTER <-> REPLICA sync started
692:S 06 Jun 2020 13:19:36.282 * Non blocking connect for SYNC fired the event.
692:S 06 Jun 2020 13:19:36.282 * Master replied to PING, replication can continue...

The cause is that the old master comes back as a replica and has to authenticate against the new master, so master.conf needs:

# xxxx is the requirepass password set in slave.conf
masterauth xxxx

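Pitfall 2: the Sentinel config must expose a master address that clients can reach

In sentinel.conf, the line sentinel monitor mymaster 122.xx.xxx.xxx 6379 2 sets the name of the monitored master, its address and port, and the quorum, that is, how many Sentinels must agree before the master is considered down. The master address must be chosen from the point of view of whoever accesses Redis (the client) and must be an address the client can reach. For example, if Sentinel and the master run on the same server (122.xx.xxx.xxx), then from Sentinel's point of view the master is on the local machine at 127.0.0.1, and sentinel monitor mymaster 127.0.0.1 6379 2 is logically fine. But when a Spring Boot application on another server accesses Redis through Sentinel using Lettuce, the master address it gets back is 127.0.0.1, which is the Spring Boot server itself, and that obviously fails.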
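To check which address clients will actually be handed, you can ask a Sentinel directly with SENTINEL get-master-addr-by-name. Below is a minimal Java sketch of that query over a raw socket, using only the JDK. The class and method names (SentinelLookup, masterAddress) are made up for illustration, and the sketch assumes the Sentinel accepts unauthenticated connections, as with the nopass setting in the config above. Sentinel-aware clients such as Lettuce perform an equivalent lookup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Illustrative helper (not part of any library): asks a Sentinel where the master is. */
final class SentinelLookup {

    /** Encode a command as a RESP array of bulk strings. */
    static String encode(String... args) {
        StringBuilder sb = new StringBuilder("*").append(args.length).append("\r\n");
        for (String a : args) {
            int byteLength = a.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLength).append("\r\n").append(a).append("\r\n");
        }
        return sb.toString();
    }

    /** Parse the two-element array reply (host, port) into "host:port". */
    static String parseAddress(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (!"*2".equals(header)) {
            throw new IOException("unexpected reply: " + header);
        }
        in.readLine();                 // "$<len>" line for the host
        String host = in.readLine();
        in.readLine();                 // "$<len>" line for the port
        String port = in.readLine();
        return host + ":" + port;
    }

    /** Connect to one Sentinel and return the master address it would give to clients. */
    static String masterAddress(String sentinelHost, int sentinelPort, String masterName)
            throws IOException {
        try (Socket socket = new Socket(sentinelHost, sentinelPort)) {
            OutputStream out = socket.getOutputStream();
            out.write(encode("SENTINEL", "get-master-addr-by-name", masterName)
                    .getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            return parseAddress(in);
        }
    }
}
```

If masterAddress returns 127.0.0.1:6379 when called from the application server, remote clients are being pointed at their own loopback address, which is exactly the symptom described above.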

Spring Boot 2.1 configuration for Redis Sentinel:

spring.redis.sentinel.master=mymaster
spring.redis.sentinel.nodes=122.xx.xxx.xxx:26379,122.xx.xxx.xxx:26380,122.xx.xxx.xxx:26381
# standalone mode:
#spring.redis.host=122.xx.xxx.xxx
#spring.redis.port=6379
spring.redis.timeout=6000
spring.redis.password=xxxxxx
# Lettuce is built on Netty and shares connections, so a pool is usually unnecessary
#spring.redis.lettuce.pool.max-active=16
#spring.redis.lettuce.pool.max-wait=3000
#spring.redis.lettuce.pool.max-idle=12
#spring.redis.lettuce.pool.min-idle=4

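Pitfall 3: Sentinel rewrites the .conf files

With redis-cli -h localhost -p 26379 you can log in to a Sentinel and run the info command to inspect its state.

I once ran into a problem where the output looked roughly like this: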

master0:name=mymaster,status=down,address=127.0.0.1:7001,slaves=2,sentinels=3

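An extra replica had appeared out of nowhere, and the master address, which I had definitely changed to the real public address, was back to 127.0.0.1!
In the end I stopped all 5 Redis processes and went through the config files one by one. It turns out that in master-replica plus Sentinel mode the Redis config files get rewritten: the master's config file had an unexplained replicaof 127.0.0.1 7001 line at the end. I suspect Sentinel added it dynamically back when the configuration was wrong (see Pitfall 2). In practice, keep a close eye on changes to the config files.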

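(2) Cluster

Once the data set grows large enough, say tens or hundreds of GB, Sentinel mode is no longer enough and the data has to be partitioned horizontally. In earlier years this was done with third-party middleware such as Codis or twemproxy, in a client -> middleware -> Redis server arrangement where the middleware used an algorithm such as consistent hashing to decide which shard a key belongs to. Later Redis shipped an official solution, and Redis Cluster became the usual choice.

Redis Cluster divides the key space into 16384 logical hash slots. The sharding function is CRC16(key) mod 16384, which gives the slot for a key; the slot in turn determines which node owns the key.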
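The slot calculation can be reproduced in a few lines of Java. This is a sketch under stated assumptions: the class name HashSlot is made up, the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) described in the Redis Cluster specification, and keys containing {...} hash tags, which Redis treats specially, are deliberately ignored.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch: compute the Redis Cluster hash slot of a key (hash tags ignored). */
final class HashSlot {
    static final int SLOT_COUNT = 16384;

    /** CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                // shift left; when the top bit falls out, fold in the polynomial
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** slot = CRC16(key) mod 16384 */
    static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"name", "foo", "user:1000"}) {
            System.out.println(key + " -> slot " + slot(key));
        }
    }
}
```

With the 3-master layout used later in this article (0-5460, 5461-10922, 10923-16383), comparing the printed slot against those ranges tells you which master serves a key.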

Each master can have one or more replicas; 3 masters with 3 replicas is a common setup.

Resharding moves some hash slots from one or more nodes to another node. The process is transparent to clients and does not interrupt production traffic.

Setting up Redis Cluster

The key settings in redis.conf:

# port, different in each config file (7001-7006)
port 7001
# enable cluster mode
cluster-enabled yes
# node state file, maintained by Redis itself
cluster-config-file nodes-7001.conf
# node timeout in milliseconds
cluster-node-timeout 15000
# enable AOF persistence
appendonly yes
# run in the background
daemonize yes
# adjust to match the port
pidfile /var/run/redis_7001.pid
# where this instance stores its data and node state file
dir /usr/redis-6.0.3/cluster-data/7001

Start the 6 cluster nodes:

/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7001/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7002/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7003/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7004/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7005/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7006/redis.conf &

[root@VM_0_11_centos redis-6.0.3]# ps -ef|grep redis
root 5508 1 0 21:25 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7001 [cluster]
root 6903 1 0 21:32 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7002 [cluster]
root 6939 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7003 [cluster]
root 6966 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7004 [cluster]
root 6993 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7005 [cluster]
root 7015 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7006 [cluster]

At this point the 6 nodes are still independent. To join them into a cluster:

redis-cli -a xxxx --cluster create --cluster-replicas 1 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 127.0.0.1:7006

Notes: -a xxxx is needed because I set requirepass xxxx in redis.conf; the 1 in --cluster-replicas 1 means each master gets 1 replica.

The command then asks: Can I set the above configuration? Answer yes to accept the automatically proposed slot allocation.

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:7005 to 127.0.0.1:7001
Adding replica 127.0.0.1:7006 to 127.0.0.1:7002
Adding replica 127.0.0.1:7004 to 127.0.0.1:7003
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 91143d1715cb3d5234f1ab67559b621ec51475c9 127.0.0.1:7001
   slots:[0-5460] (5461 slots) master
M: 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494 127.0.0.1:7002
   slots:[5461-10922] (5462 slots) master
M: 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf 127.0.0.1:7003
   slots:[10923-16383] (5461 slots) master
S: d8df06b10d75613328b30f38d458b0ca094fc997 127.0.0.1:7004
   replicates 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494
S: 45f8a0253fc5653fdab27dc4a0455e7a92dae88d 127.0.0.1:7005
   replicates 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf
S: 80350bc7927c943ffd4afdc014448be5407c258b 127.0.0.1:7006
   replicates 91143d1715cb3d5234f1ab67559b621ec51475c9
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.......
>>> Performing Cluster Check (using node 127.0.0.1:7001)
M: 91143d1715cb3d5234f1ab67559b621ec51475c9 127.0.0.1:7001
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf 127.0.0.1:7003
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: d8df06b10d75613328b30f38d458b0ca094fc997 127.0.0.1:7004
   slots: (0 slots) slave
   replicates 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494
S: 45f8a0253fc5653fdab27dc4a0455e7a92dae88d 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf
M: 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494 127.0.0.1:7002
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 80350bc7927c943ffd4afdc014448be5407c258b 127.0.0.1:7006
   slots: (0 slots) slave
   replicates 91143d1715cb3d5234f1ab67559b621ec51475c9
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The final All 16384 slots covered. means every one of the 16384 slots is served by a master node, so the cluster has started successfully.
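The three ranges in the output above are simply an even split of 16384 slots across the masters. The sketch below reproduces those ranges; the class name SlotSplit is made up, and the rounding rule is an assumption chosen to match the printed output, so redis-cli's real implementation may differ in detail.

```java
/** Illustrative sketch: split 16384 slots evenly across N masters. */
final class SlotSplit {
    static final int SLOT_COUNT = 16384;

    /** Returns one {first, last} pair per master, covering 0..16383 without gaps. */
    static int[][] split(int masters) {
        double perNode = (double) SLOT_COUNT / masters;
        int[][] ranges = new int[masters][2];
        int first = 0;
        double cursor = 0.0;
        for (int i = 0; i < masters; i++) {
            int last = (int) Math.round(cursor + perNode - 1);
            if (last > SLOT_COUNT - 1 || i == masters - 1) {
                last = SLOT_COUNT - 1;   // the last master takes whatever remains
            }
            ranges[i][0] = first;
            ranges[i][1] = last;
            first = last + 1;
            cursor += perNode;
        }
        return ranges;
    }

    public static void main(String[] args) {
        int[][] ranges = split(3);
        for (int i = 0; i < ranges.length; i++) {
            System.out.println("Master[" + i + "] -> Slots " + ranges[i][0] + " - " + ranges[i][1]);
        }
    }
}
```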

Check the cluster state:

[root@VM_0_11_centos redis-6.0.3]# redis-cli -c -p 7001
127.0.0.1:7001> auth xxxx
OK
127.0.0.1:7001> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:859
cluster_stats_messages_pong_sent:893
cluster_stats_messages_sent:1752
cluster_stats_messages_ping_received:888
cluster_stats_messages_pong_received:859
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:1752

Pitfall 1: the node addresses exposed to clients are wrong

Lettuce could not connect, and the log showed Connection refused: no further information: /127.0.0.1:7002. This is the same mistake as with the master address in sentinel.conf: the addresses passed when creating the cluster must be the ones clients will use.

So the cluster has to be rebuilt: stop the 6 Redis processes, delete the node state files (nodes-7001.conf and so on), delete the persistence files dump.rdb and appendonly.aof, start the 6 processes again, and create the cluster again:

redis-cli -a xxpwdxx --cluster create --cluster-replicas 1 122.xx.xxx.xxx:7001 122.xx.xxx.xxx:7002 122.xx.xxx.xxx:7003 122.xx.xxx.xxx:7004 122.xx.xxx.xxx:7005 122.xx.xxx.xxx:7006

It still would not connect. This time the error was connection timed out: /172.xx.0.xx:7004; the client was being sent to the private network address of the Tencent Cloud server!

The fix is to edit each node's redis.conf. Look for this explanation in the file:

# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following two options are used for this scope, and are:
#
# * cluster-announce-ip
# * cluster-announce-port
# * cluster-announce-bus-port

So add the following settings (with the two port values adjusted for each node):

cluster-announce-ip 122.xx.xxx.xxx
cluster-announce-port 7001
cluster-announce-bus-port 17001
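Since the three lines differ per node, it is easy to mistype one of the six files. Here is a small sketch that prints the lines for each node; the class name AnnounceConfig and the address 203.0.113.10 are placeholders, and the bus port is taken as data port + 10000, which matches both the 7001/17001 pair above and the Redis Cluster default.

```java
/** Illustrative sketch: generate the cluster-announce-* lines for one node. */
final class AnnounceConfig {
    /** Offset between a node's data port and its cluster bus port (7001 -> 17001). */
    static final int BUS_PORT_OFFSET = 10000;

    static String forNode(String publicIp, int port) {
        return "cluster-announce-ip " + publicIp + "\n"
                + "cluster-announce-port " + port + "\n"
                + "cluster-announce-bus-port " + (port + BUS_PORT_OFFSET) + "\n";
    }

    public static void main(String[] args) {
        String publicIp = "203.0.113.10"; // placeholder; use the address clients connect to
        for (int port = 7001; port <= 7006; port++) {
            System.out.println("# append to the redis.conf of node " + port);
            System.out.println(forNode(publicIp, port));
        }
    }
}
```

Both the data port and the bus port must be reachable from outside, so the cloud firewall or security group needs to allow 7001-7006 and 17001-17006.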

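Then rebuild the cluster once more: stop the processes, change the configs, delete the node state and persistence files, start the processes, create the cluster... the whole routine again (exhausting).

Testing with Lettuce again, this time it finally connected!

Pitfall 2: the Lettuce client does not switch to the replica when a master fails

The key name lives on 7002. I killed that process to simulate the master going offline, and Lettuce just kept trying to reconnect. What we want is an automatic switch to its replica, 7006.

After restarting the 7002 process: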

127.0.0.1:7001> cluster nodes
4b66a7d28ebbfa149f7f2ad1dd1d3cbbc1e79659 122.51.112.187:7003@17003 master - 0 1638243049258 3 connected 10923-16383
16a3da4143ee873b9ed82d217db9819c8d945d30 122.51.112.187:7005@17005 slave bfdb90a0b0e3217fad5e5eb44ec253531930a418 0 1638243052264 5 connected
110d047d5b6c827c018dbebf83d9db350f12b931 122.51.112.187:7004@17004 slave 4b66a7d28ebbfa149f7f2ad1dd1d3cbbc1e79659 0 1638243051000 4 connected
e4406cfeb0e6944bfd6c5af82ba5f4f1ab38190d 122.51.112.187:7002@17002 slave c7cd30d3843f9f4b113614672cabce193b1bc7b9 0 1638243054267 7 connected
c7cd30d3843f9f4b113614672cabce193b1bc7b9 122.51.112.187:7006@17006 master - 0 1638243053266 7 connected 5461-10922
bfdb90a0b0e3217fad5e5eb44ec253531930a418 122.51.112.187:7001@17001 myself,master - 0 1638243052000 1 connected 0-5460

7006 has become the new master, 7002 is now its replica, and Lettuce can connect again.
The fix is to change the Lettuce configuration:

import java.time.Duration;
import java.util.Arrays;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;


@Configuration
public class RedisConfig {
    
    @Value("${spring.redis.cluster.nodes:nocluster}")
    private String clusterNodes;
    
    @Value("${spring.redis.password:123456}")
    private String password;

    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {

        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
        redisTemplate.setConnectionFactory(factory);
        
        RedisSerializer<String> stringRedisSerializer = new StringRedisSerializer();
        redisTemplate.setKeySerializer(stringRedisSerializer);
        redisTemplate.setValueSerializer(stringRedisSerializer);
        return redisTemplate;
    }
    
    /**
     * In Lettuce cluster mode, build a custom RedisConnectionFactory and register it with Spring
     * */
    @Bean
    @ConditionalOnProperty(prefix="spring.redis.cluster" , name="nodes")
    public RedisConnectionFactory redisConnectionFactory() {
        
        ClusterTopologyRefreshOptions clusterTopologyRefreshOptions =  ClusterTopologyRefreshOptions.builder()
                .enableAllAdaptiveRefreshTriggers() // enable adaptive refresh; without it, topology changes in the cluster lead to connection errors
                .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30)) // minimum interval between adaptive refreshes (Lettuce default: 30 seconds)
                .enablePeriodicRefresh(Duration.ofSeconds(60))  // periodic refresh is off by default; here it runs every 60 seconds
                .build();
        ClusterClientOptions clientOptions = ClusterClientOptions.builder()
                .topologyRefreshOptions(clusterTopologyRefreshOptions)
                .build();
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .clientOptions(clientOptions)
                .build();
        
        String[] nodes = clusterNodes.split(",");
        RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(nodes));
        redisClusterConfiguration.setPassword(password);
        return new LettuceConnectionFactory(redisClusterConfiguration , clientConfig);
    }
}

I am using Spring Boot 2.1, where spring-boot-starter-data-redis uses Lettuce as its default client. In Redis Cluster mode the RedisConnectionFactory has to be configured with adaptive topology refresh so that the client switches to the promoted replica during a failover.

Retest: stop master 7006, and this time Lettuce switches over to replica 7002 as expected. (The log still fills with connection errors because the client keeps trying to reconnect to 7006, but since 7002 has taken over, the application keeps working.)

Redis does not guarantee strong consistency

Redis does not guarantee strong consistency; in CAP terms it leans toward AP.

  • Replication between master and replica is asynchronous. For performance, the master replies to the client immediately instead of waiting for replicas to confirm. If the master fails after replying but before the write reaches a replica, the write is lost.
  • A network partition can split the cluster into a smaller and a larger partition. Suppose the smaller one contains master A0. The larger partition elects a new master A1 from A's replicas, but A0 is still alive and keeps accepting requests from clients that are also in the smaller partition. For the duration of the partition, shard A has two masters, the so-called "split-brain" situation. When the partition heals, A0 becomes a replica of A1, discards its own data, and resyncs from A1, so the writes accepted in the smaller partition are lost.

Redis Cluster core principles

1. Redis Cluster does not use consistent hashing, and adding or removing nodes requires manually managed slot migration. How does it migrate live without affecting production traffic?

For background on consistent hashing, see the Jianshu article "一致性Hash算法" (jianshu.com).

Redis Cluster uses a hash slot scheme, which differs from consistent hashing. There are a fixed 16384 hash slots, and each key is mapped to a slot (CRC16 of the key, modulo 16384). A key maps to a slot, not to a node. The slot-to-node mapping can be changed by resharding and is propagated to every node via the gossip protocol, from where clients can learn it. Key lookup therefore does not depend on the number of nodes, which also avoids the problem of plain hash-mod-N sharding, where changing the node count remaps a large share of keys and invalidates the cache.

For example, when a node is added and nothing else is done, the new node owns no slots; all slots stay on their original nodes with the mapping unchanged, so nothing is invalidated by the change in node count. After some slots are resharded to the new node, clients learn the new mapping for those slots and address the new node for them, while slots that were not moved are still served by their original nodes.
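The difference from hash-mod-N can be made concrete with a toy slot table. In this sketch the class name SlotTableDemo, the node indexes 0-3, and the choice of moving 1365 slots from each master are all illustrative assumptions; only the initial ranges come from the cluster output earlier in this article.

```java
/** Illustrative sketch: slot-to-node table versus plain hash mod N when a node is added. */
final class SlotTableDemo {
    static final int SLOTS = 16384;

    /** Slot ownership for 3 masters: 0-5460, 5461-10922, 10923-16383. */
    static int[] threeMasters() {
        int[] owner = new int[SLOTS];
        for (int s = 0; s < SLOTS; s++) {
            owner[s] = s <= 5460 ? 0 : (s <= 10922 ? 1 : 2);
        }
        return owner;
    }

    /** Reassign up to count slots from one node to another; returns how many moved. */
    static int reshard(int[] owner, int from, int to, int count) {
        int moved = 0;
        for (int s = 0; s < SLOTS && moved < count; s++) {
            if (owner[s] == from) {
                owner[s] = to;
                moved++;
            }
        }
        return moved;
    }

    /** Number of slots whose owner differs between two tables. */
    static int changed(int[] before, int[] after) {
        int n = 0;
        for (int s = 0; s < SLOTS; s++) {
            if (before[s] != after[s]) n++;
        }
        return n;
    }

    /** For comparison: hash values 0..16383 whose node changes under plain hash mod N. */
    static int changedByModulo(int oldNodes, int newNodes) {
        int n = 0;
        for (int h = 0; h < SLOTS; h++) {
            if (h % oldNodes != h % newNodes) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] before = threeMasters();
        int[] after = before.clone();
        for (int node = 0; node < 3; node++) {
            reshard(after, node, 3, 1365); // node 3 is the newly added master
        }
        System.out.println("slot table: " + changed(before, after) + " of " + SLOTS + " slots changed owner");
        System.out.println("hash mod N: " + changedByModulo(3, 4) + " of " + SLOTS + " hash values changed node");
    }
}
```

With the slot table, only the 4095 slots that were explicitly moved (about 25%) change owner; going from mod 3 to mod 4 remaps 12286 of the 16384 hash values (about 75%).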

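As for live migration, my guess was that Redis first copies the data, then switches the slot-to-node mapping once the copy is complete; until then, requests go to the original node under the old mapping, and afterwards the migrated keys are deleted from the original node. The actual mechanism is more incremental: during a reshard the slot is marked MIGRATING on the source and IMPORTING on the target, keys are moved one at a time with MIGRATE, requests for keys that have already moved are answered with an ASK redirect to the target, and the slot's ownership is switched once the slot is empty on the source.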

2. How does the cluster detect a failed master and elect a new one?

This works much like Sentinel mode. When one node finds that a master is unreachable, it marks it PFAIL (subjectively down) and spreads that via gossip; other nodes make the same check and gossip their own findings. Once a majority of the masters have flagged it as PFAIL, the node is marked FAIL (objectively down). A new master is then chosen from the failed node's replicas: the replica that receives votes from a majority of the masters is promoted.
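The majority rule behind the PFAIL to FAIL transition can be sketched as a simple counter. This is a toy model, not Redis's implementation: the class name FailureVotes and the node labels are made up, and details such as reports expiring after a while are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: a suspect goes from PFAIL to FAIL once a majority of voters has reported it. */
final class FailureVotes {
    private final int voters;
    private final Map<String, Set<String>> reportersBySuspect = new HashMap<>();

    FailureVotes(int voters) {
        this.voters = voters;
    }

    /**
     * Record that reporter considers suspect unreachable (PFAIL).
     * Returns true when a strict majority of voters agrees, i.e. the suspect is FAIL.
     * Repeated reports from the same reporter count only once.
     */
    boolean report(String reporter, String suspect) {
        Set<String> reporters = reportersBySuspect.computeIfAbsent(suspect, k -> new HashSet<>());
        reporters.add(reporter);
        return reporters.size() >= voters / 2 + 1;
    }

    public static void main(String[] args) {
        FailureVotes votes = new FailureVotes(3); // 3 voting masters
        System.out.println("7001 reports 7002: FAIL=" + votes.report("7001", "7002"));
        System.out.println("7003 reports 7002: FAIL=" + votes.report("7003", "7002"));
    }
}
```

With 3 voters the threshold is 2, so a single node's suspicion is never enough to trigger a failover on its own.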

3. Decentralized design and the gossip protocol

Every node holds the cluster metadata. Nodes talk to each other over a binary gossip protocol, exchanging their metadata and handling failure detection, configuration updates, failover authorization, and so on.

This decentralized coordination between nodes (failure detection, failover, master election, and so on) rests on gossip. What makes such a dissemination protocol workable is that every node keeps a complete copy of the cluster metadata, so every node knows the current global state of the cluster.
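To get a feel for how gossip spreads metadata without a central coordinator, here is a toy simulation. It is only a sketch: the class name GossipDemo, the random peer selection, and the "keep the newer version" merge rule are simplifying assumptions, and Redis's real cluster bus messages carry far more state.

```java
import java.util.Random;

/** Toy gossip simulation: nodes exchange what they know with random peers until all agree. */
final class GossipDemo {

    /**
     * Returns the number of rounds until every node has seen every node's metadata,
     * or -1 if that does not happen within maxRounds. Requires at least 2 nodes.
     */
    static int roundsToConverge(int nodes, long seed, int maxRounds) {
        Random rnd = new Random(seed);
        // view[i][j] = newest version of node j's metadata that node i has seen (0 = none)
        long[][] view = new long[nodes][nodes];
        for (int i = 0; i < nodes; i++) {
            view[i][i] = 1; // each node starts out knowing only itself
        }
        for (int round = 1; round <= maxRounds; round++) {
            for (int i = 0; i < nodes; i++) {
                int peer = rnd.nextInt(nodes - 1);
                if (peer >= i) {
                    peer++; // pick a peer other than i
                }
                for (int j = 0; j < nodes; j++) {
                    // exchange in both directions: each side keeps the newer entry
                    long newest = Math.max(view[i][j], view[peer][j]);
                    view[i][j] = newest;
                    view[peer][j] = newest;
                }
            }
            if (converged(view)) {
                return round;
            }
        }
        return -1;
    }

    /** True when no node is missing any other node's metadata. */
    static boolean converged(long[][] view) {
        for (long[] row : view) {
            for (long version : row) {
                if (version == 0) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("rounds to converge: " + roundsToConverge(6, 42L, 1000));
    }
}
```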

