問(wèn)題
-
es集群只配置一個(gè)節(jié)點(diǎn),client是否能夠自動(dòng)發(fā)現(xiàn)集群中的所有節(jié)點(diǎn)?是如何發(fā)現(xiàn)的?如下配置了一個(gè)節(jié)點(diǎn):
單個(gè)node配置 - es client如何做到負(fù)載均衡?
- 一個(gè)es node掛掉之后,es client如何摘掉該節(jié)點(diǎn)?
- es client node檢測(cè)分為兩種模式,有什么不同?
核心類
- TransportClient : es client對(duì)外API類
- TransportClientNodesService : 維護(hù)node節(jié)點(diǎn)的類
- ScheduledNodeSampler : 定期維護(hù)正常節(jié)點(diǎn)類
- NettyTransport : 進(jìn)行數(shù)據(jù)傳輸
- NodeSampler : 節(jié)點(diǎn)嗅探類
Client初始化過(guò)程
初始化代碼
Settings.Builder builder = Settings.settingsBuilder()
.put("cluster.name", clusterName)
.put("client.transport.sniff", true);
Settings settings = builder.build();
TransportClient client = TransportClient.builder().settings(settings).build();
for (TransportAddress transportAddress : transportAddresses) {
client.addTransportAddress(transportAddress);
}
- ES 通過(guò)builder模式構(gòu)造了基礎(chǔ)的配置參數(shù);
- 通過(guò)build構(gòu)造了client,這個(gè)時(shí)候包括構(gòu)造client、初始化ThreadPool、構(gòu)造TransportClientNodesService、啟動(dòng)定時(shí)任務(wù)、定制化嗅探類型;
- 添加集群可用地址,比如我只配了集群中的一個(gè)節(jié)點(diǎn);
構(gòu)建client
調(diào)用build API

其中,關(guān)于依賴注入的簡(jiǎn)單說(shuō)明:Guice 是 Google 用于 Java? 開(kāi)發(fā)的開(kāi)放源碼依賴項(xiàng)注入框架(感興趣的可以了解下,這里不做重點(diǎn)講解),具體可參考如下:
初始化TransportClientNodesService
在上一幅圖的modules.createInjector對(duì)TransportClientNodesService進(jìn)行實(shí)例化,在TransportClient進(jìn)行注入,可以看到TransportClient里邊的絕大部分API都是通過(guò)TransportClientNodesService進(jìn)行代理的:

Guice通過(guò)注解進(jìn)行注入

在上圖中:注入了集群名稱、線程池等,重點(diǎn)是如下代碼:該段代碼選擇了節(jié)點(diǎn)嗅探器的類型 嗅探同一集群中的所有節(jié)點(diǎn)SniffNodesSampler或者是只關(guān)注配置文件配置的節(jié)點(diǎn)SimpleNodeSampler
if (this.settings.getAsBoolean("client.transport.sniff", false)) {
this.nodesSampler = new SniffNodesSampler();
} else {
this.nodesSampler = new SimpleNodeSampler();
}
特點(diǎn)
SniffNodesSampler:client會(huì)主動(dòng)發(fā)現(xiàn)集群里的其他節(jié)點(diǎn),會(huì)創(chuàng)建fully connect(什么叫fully connect?后邊說(shuō))
SimpleNodeSampler:ping listedNodes中的所有node,區(qū)別在于這里創(chuàng)建的都是light connect;
其中TransportClientNodesService維護(hù)了三個(gè)節(jié)點(diǎn)存儲(chǔ)數(shù)據(jù)結(jié)構(gòu):
// nodes that are added to be discovered
1 private volatileListlistedNodes= Collections.emptyList();
2 private volatileListnodes= Collections.emptyList();
3 private volatileListfilteredNodes= Collections.emptyList();
- 代表配置文件中主動(dòng)加入的節(jié)點(diǎn);
- 代表參與請(qǐng)求的節(jié)點(diǎn);
- 過(guò)濾掉的不能進(jìn)行請(qǐng)求處理的節(jié)點(diǎn);
Client如何做到負(fù)載均衡

如上圖,我們發(fā)現(xiàn)每次 execute 的時(shí)候,是從 nodes 這個(gè)數(shù)據(jù)結(jié)構(gòu)中獲取節(jié)點(diǎn),然后通過(guò)簡(jiǎn)單的 rouund-robbin 獲取節(jié)點(diǎn)服務(wù)器,核心代碼如下:
private final AtomicInteger randomNodeGenerator = new AtomicInteger();
......
private int getNodeNumber() {
int index = randomNodeGenerator.incrementAndGet();
if (index < 0) {
index = 0;
randomNodeGenerator.set(0);
}
return index;
}
然后通過(guò)netty的channel將數(shù)據(jù)寫入,核心代碼如下:
public void sendRequest(final DiscoveryNode node, final long requestId, final String action, final TransportRequest request, TransportRequestOptions options)
throws IOException, TransportException {
1 Channel targetChannel = nodeChannel(node, options);
if (compress) {
options = TransportRequestOptions.builder(options).withCompress(true).build();
}
byte status = 0;
status = TransportStatus.setRequest(status);
ReleasableBytesStreamOutput bStream = new ReleasableBytesStreamOutput(bigArrays);
boolean addedReleaseListener = false;
try {
bStream.skip(NettyHeader.HEADER_SIZE);
StreamOutput stream = bStream;
// only compress if asked, and, the request is not bytes, since then only
// the header part is compressed, and the "body" can't be extracted as compressed
if (options.compress() && (!(request instanceof BytesTransportRequest))) {
status = TransportStatus.setCompress(status);
stream = CompressorFactory.defaultCompressor().streamOutput(stream);
}
// we pick the smallest of the 2, to support both backward and forward compatibility
// note, this is the only place we need to do this, since from here on, we use the serialized version
// as the version to use also when the node receiving this request will send the response with
Version version = Version.smallest(this.version, node.version());
stream.setVersion(version);
stream.writeString(action);
ReleasablePagedBytesReference bytes;
ChannelBuffer buffer;
// it might be nice to somehow generalize this optimization, maybe a smart "paged" bytes output
// that create paged channel buffers, but its tricky to know when to do it (where this option is
// more explicit).
if (request instanceof BytesTransportRequest) {
BytesTransportRequest bRequest = (BytesTransportRequest) request;
assert node.version().equals(bRequest.version());
bRequest.writeThin(stream);
stream.close();
bytes = bStream.bytes();
ChannelBuffer headerBuffer = bytes.toChannelBuffer();
ChannelBuffer contentBuffer = bRequest.bytes().toChannelBuffer();
buffer = ChannelBuffers.wrappedBuffer(NettyUtils.DEFAULT_GATHERING, headerBuffer, contentBuffer);
} else {
request.writeTo(stream);
stream.close();
bytes = bStream.bytes();
buffer = bytes.toChannelBuffer();
}
NettyHeader.writeHeader(buffer, requestId, status, version);
2 ChannelFuture future = targetChannel.write(buffer);
ReleaseChannelFutureListener listener= new ReleaseChannelFutureListener(bytes);
future.addListener(listener);
addedReleaseListener = true;
transportServiceAdapter.onRequestSent(node, requestId, action, request, options);
} finally {
if (!addedReleaseListener) {
Releasables.close(bStream.bytes());
}
}
}
其中最重要的就是1和2
- 1代表拿到一個(gè)連接;
- 2代表通過(guò)拿到的連接寫數(shù)據(jù);
這時(shí)候就會(huì)有新的問(wèn)題
- nodes的數(shù)據(jù)是何時(shí)寫入的?
- 連接是什么時(shí)候創(chuàng)建的?
Nodes數(shù)據(jù)何時(shí)寫入
核心是調(diào)用doSampler,代碼如下:
protected void doSample() {
// the nodes we are going to ping include the core listed nodes that were added
// and the last round of discovered nodes
SetnodesToPing = Sets.newHashSet();
for (DiscoveryNode node : listedNodes) {
nodesToPing.add(node);
}
for (DiscoveryNode node : nodes) {
nodesToPing.add(node);
}
final CountDownLatch latch = new CountDownLatch(nodesToPing.size());
final ConcurrentMapclusterStateResponses = ConcurrentCollections.newConcurrentMap();
for (final DiscoveryNode listedNode : nodesToPing) {
threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new Runnable() {
@Override
public void run() {
try {
if (!transportService.nodeConnected(listedNode)) {
try {
// if its one of the actual nodes we will talk to, not to listed nodes, fully connect
if (nodes.contains(listedNode)) {
logger.trace("connecting to cluster node [{}]", listedNode);
transportService.connectToNode(listedNode);
} else {
// its a listed node, light connect to it...
logger.trace("connecting to listed node (light) [{}]", listedNode);
transportService.connectToNodeLight(listedNode);
}
} catch (Exception e) {
logger.debug("failed to connect to node [{}], ignoring...", e, listedNode);
latch.countDown();
return;
}
}
//核心是在這里,剛剛開(kāi)始初始化的時(shí)候,可能只有配置的一個(gè)節(jié)點(diǎn),這個(gè)時(shí)候會(huì)通過(guò)這個(gè)地址發(fā)送一個(gè)state狀態(tài)監(jiān)測(cè)
//"cluster:monitor/state"
transportService.sendRequest(listedNode, ClusterStateAction.NAME,
headers.applyTo(Requests.clusterStateRequest().clear().nodes(true).local(true)),
TransportRequestOptions.builder().withType(TransportRequestOptions.Type.STAE).withTimeout(pingTimeout).build(),
new BaseTransportResponseHandler() {
@Override
public ClusterStateResponse newInstance() {
return new ClusterStateResponse();
}
@Override
public String executor() {
return ThreadPool.Names.SAME;
}
@Override
public void handleResponse(ClusterStateResponse response) {
/*通過(guò)回調(diào),會(huì)在這個(gè)地方返回集群中類似下邊所有節(jié)點(diǎn)的信息
{ "version" : 27, "state_uuid" : "YSI9d_HiQJ-FFAtGFCVOlw", "master_node" : "TXHHx-XRQaiXAxtP1EzXMw", "blocks" : { }, "nodes" : { "poxubF0LTVue84GMrZ7rwA" : { "name" : "node1", "transport_address" : "1.1.1.1:8888", "attributes" : { "data" : "false", "master" : "true" } }, "9Cz8m3GkTza7vgmpf3L65Q" : { "name" : "node2", "transport_address" : "1.1.1.2:8889", "attributes" : { "master" : "false" } } }, "metadata" : { "cluster_uuid" : "_na_", "templates" : { }, "indices" : { } }, "routing_table" : { "indices" : { } }, "routing_nodes" : { "unassigned" : [ ], "nodes" : { "lZqD-WExRu-gaSUiCXaJcg" : [ ], "hR6PbFrgQVSY0MHajNDmgA" : [ ], } }}*/
clusterStateResponses.put(listedNode, response);
latch.countDown();
}
@Override
public void handleException(TransportException e) { logger.info("failed to get local cluster state for {}, disconnecting...", e, listedNode); transportService.disconnectFromNode(listedNode); latch.countDown();
}
});} catch (Throwable e) {
logger.info("failed to get local cluster state info for {}, disconnecting...", e, listedNode);
transportService.disconnectFromNode(listedNode); latch.countDown();
}}});}
try {
latch.await();
} catch (InterruptedException e) {
return;
}
HashSetnewNodes = new HashSet<>(); HashSetnewFilteredNodes = new HashSet<>();
for (Map.Entryentry : clusterStateResponses.entrySet()) {
if (!ignoreClusterName &&!clusterName.equals(entry.getValue().getClusterName())) {
logger.warn("node {} not part of the cluster {}, ignoring...",
entry.getValue().getState().nodes().localNode(), clusterName);
newFilteredNodes.add(entry.getKey());
continue;
}
//接下來(lái)在這個(gè)地方拿到所有的data nodes 寫入到nodes節(jié)點(diǎn)里邊
for (ObjectCursorcursor : entry.getValue().getState().nodes().dataNodes().values()){
newNodes.add(cursor.value);}}
nodes = validateNewNodes(newNodes);
filteredNodes = Collections.unmodifiableList(new ArrayList<(newFilteredNodes));
}
其中調(diào)用時(shí)機(jī)分為兩部分:
- client.addTransportAddress(transportAddress);
- ScheduledNodeSampler,默認(rèn)每隔5s會(huì)進(jìn)行一次對(duì)各個(gè)節(jié)點(diǎn)的請(qǐng)求操作;
連接是何時(shí)創(chuàng)建的呢
也是在doSampler調(diào)用,最終由NettryTransport創(chuàng)建

這個(gè)時(shí)候發(fā)現(xiàn),如果是light則創(chuàng)建輕連接,也就是,否則創(chuàng)建fully connect,其中包括:
recovery:做數(shù)據(jù)恢復(fù)recovery,默認(rèn)個(gè)數(shù)2個(gè);
- bulk:用于bulk請(qǐng)求,默認(rèn)個(gè)數(shù)3個(gè);
- med/reg:典型的搜索和單doc索引,默認(rèn)個(gè)數(shù)6個(gè);
- high:如集群state的發(fā)送等,默認(rèn)個(gè)數(shù)1個(gè);
- ping:就是node之間的ping咯。默認(rèn)個(gè)數(shù)1個(gè);
對(duì)應(yīng)的代碼為:
public void start() {
List<Channel> newAllChannels = new ArrayList<>();
newAllChannels.addAll(Arrays.asList(recovery));
newAllChannels.addAll(Arrays.asList(bulk));
newAllChannels.addAll(Arrays.asList(reg));
newAllChannels.addAll(Arrays.asList(state));
newAllChannels.addAll(Arrays.asList(ping));
this.allChannels = Collections.unmodifiableList(newAllChannels);
}
END
