As we all know, cluster nodes lists the members of a cluster. When a machine is taken offline or suffers a hardware failure, it has to be replaced, yet cluster nodes still shows the old, invalid IP afterwards. Fortunately, Redis provides the cluster forget xx command.
On one occasion, though, a few seconds after running cluster forget, the invalid IP showed up again. This time its node state was "handshake", and its nodeId kept changing.
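Investigation later showed the cause: the other nodes in the cluster still held information about that node and kept trying to reconnect to it. The author of Redis has also given a conclusion for this situation: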
There are only two ways this can happen:
1. You fail to send CLUSTER FORGET to all the nodes in the cluster. So eventually there are nodes that still has a clue about this other node, and it will inform the other nodes via gossip. Make sure to send CLUSTER FORGET to every single node in the cluster.
2. Or alternatively, there is an instance running in 10.15.107.150 but you said there is not.
In other words, cluster forget xx has to be executed on every node in the Redis cluster (replicas included) before the invalid node is completely removed from the node list. Doing so resolved the problem.
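Running the command by hand on every node is easy to get wrong, so the step can be scripted. The following is only a minimal sketch, not an official tool: it parses the text returned by cluster nodes and builds one redis-cli invocation per remaining node. The function names (parse_cluster_nodes, build_forget_commands, forget_everywhere) are made up for illustration, and the sketch assumes that redis-cli is on the PATH, that no password is required, and that nodes flagged fail, handshake, or noaddr should be skipped. As far as I know, a node that receives cluster forget only refuses to re-add the forgotten node for about 60 seconds, so the whole loop should finish within that window.

```python
import subprocess
from typing import List, Tuple


def parse_cluster_nodes(output: str) -> List[Tuple[str, str, int, List[str]]]:
    """Parse CLUSTER NODES text into (node_id, host, port, flags) tuples."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        node_id, addr, flags = fields[0], fields[1], fields[2].split(",")
        # addr looks like "ip:port@cport"; older versions print only "ip:port"
        host, _, port = addr.split("@")[0].rpartition(":")
        if not host or not port.isdigit():
            continue  # e.g. a "noaddr" entry printed as ":0@0"
        nodes.append((node_id, host, int(port), flags))
    return nodes


def build_forget_commands(output: str, stale_id: str) -> List[List[str]]:
    """Build one redis-cli command per node that should forget stale_id."""
    skip_flags = {"fail", "handshake", "noaddr"}  # assumption: unreachable
    commands = []
    for node_id, host, port, flags in parse_cluster_nodes(output):
        if node_id == stale_id or skip_flags & set(flags):
            continue
        commands.append(
            ["redis-cli", "-h", host, "-p", str(port),
             "cluster", "forget", stale_id]
        )
    return commands


def forget_everywhere(output: str, stale_id: str, run=subprocess.run) -> int:
    """Send CLUSTER FORGET to every node (masters and replicas alike)."""
    commands = build_forget_commands(output, stale_id)
    for cmd in commands:
        run(cmd, timeout=5)  # keep each call short to stay inside the window
    return len(commands)
```

To use it, capture the output of cluster nodes from any healthy node, pass it in together with the nodeId of the dead machine, and then run cluster nodes again to confirm the entry is gone everywhere. Separating build_forget_commands from the code that executes them makes it possible to print and review the commands before anything is sent.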