Basic Concepts
A few concepts are core to Elasticsearch. Understanding them from the outset makes the rest of the learning process much easier.
Near Realtime (NRT)
Elasticsearch is a near-real-time search platform. This means there is a slight latency (normally about one second) between the time you index a document and the time it becomes searchable.
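The delay described above can be pictured with a toy model. This is only an illustrative sketch, not how Elasticsearch is implemented: the `ToyIndex` class and its methods are hypothetical names, and the periodic refresh is replaced by an explicit `refresh()` call so the example stays deterministic.

```python
class ToyIndex:
    """Toy model of near-real-time search: writes are buffered until a refresh."""

    def __init__(self):
        self._buffer = []      # documents indexed but not yet searchable
        self._searchable = []  # documents visible to search

    def index(self, doc):
        # Accept the document, but do not expose it to search yet.
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: stands in for the periodic step that makes
        # recently indexed documents searchable.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        # Only refreshed documents can match.
        return [d for d in self._searchable if term in d]


idx = ToyIndex()
idx.index("quick brown fox")
before = idx.search("fox")  # empty: the document has not been refreshed yet
idx.refresh()
after = idx.search("fox")   # now contains the document
print(before, after)
```

The gap between `index()` and `refresh()` in this model corresponds to the roughly one-second window mentioned in the text.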
Cluster
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
Make sure that you don't reuse the same cluster names in different environments; otherwise you might end up with nodes joining the wrong cluster. For instance, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
Note that it is valid and perfectly fine to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters, each with its own unique cluster name.
Node
A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique Identifier (UUID) that is assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch, which means that if you start up a number of nodes on your network and they can discover each other, they will automatically form and join a single cluster named elasticsearch.
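As a rough sketch of how the naming advice might be applied, the snippet below renders the two relevant lines of a node's configuration file for a given environment. `cluster.name` and `node.name` are the Elasticsearch settings for the names discussed here; the environment keys, the `render_config` helper, and the node name `node-1` are assumptions made up for illustration.

```python
# Hypothetical mapping from environment to cluster name, using the
# example names from the text.
CLUSTER_NAMES = {
    "dev": "logging-dev",
    "stage": "logging-stage",
    "prod": "logging-prod",
}


def render_config(environment, node_name):
    """Return the cluster/node naming lines for one node's elasticsearch.yml."""
    cluster = CLUSTER_NAMES[environment]  # raises KeyError for unknown environments
    return f"cluster.name: {cluster}\nnode.name: {node_name}\n"


config_text = render_config("prod", "node-1")
print(config_text)
```

Keeping the mapping in one place makes it harder to start a development node with the production cluster name by accident.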
Index
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
In a single cluster, you can define as many indexes as you want.
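A minimal sketch of the lowercase rule follows. It checks only the constraint stated above; Elasticsearch enforces further naming restrictions that this hypothetical helper does not cover.

```python
def is_lowercase_index_name(name):
    """Return True if name is non-empty and contains no uppercase letters."""
    return bool(name) and name == name.lower()


names = ["customers", "product-catalog", "Orders"]
valid = [n for n in names if is_lowercase_index_name(n)]
print(valid)
```

Here "Orders" is rejected because of its capital letter, while the other two names pass.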
Type
Warning: Deprecated in 6.0.0.
A type used to be a logical category/partition of your index that allowed you to store different types of documents in the same index, e.g. one type for users and another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See "Removal of mapping types" for more.
Document
A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.
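For illustration, a customer document might look like the following once serialized. The field names and values are invented for this example; Elasticsearch does not require any particular fields.

```python
import json

# Hypothetical customer document; the fields are made up for illustration.
customer = {"name": "Jane Doe", "email": "jane@example.com", "orders": 3}

# Serialize to the JSON text that would be sent as the document body.
body = json.dumps(customer, sort_keys=True)
print(body)

# Parsing the text gives back an equal Python dictionary.
restored = json.loads(body)
```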
Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must actually be indexed into (assigned to) a type inside that index.
Shards & Replicas
An index can potentially store a large amount of data that exceeds the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two primary reasons:
    It allows you to horizontally split/scale your content volume.
    It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.
The mechanics of how a shard is distributed, and of how its documents are aggregated back into search responses, are completely managed by Elasticsearch and are transparent to you as the user.
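To make the idea concrete, here is a simplified sketch of distributing documents across shards and gathering search hits back from all of them. It is an illustration under stated assumptions, not Elasticsearch's implementation: `zlib.crc32` is a stand-in hash, and `pick_shard` and `scatter_gather` are hypothetical helpers.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Map a key to a shard number with a stand-in hash (assumption: crc32)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def scatter_gather(shards, predicate):
    """Run the same filter on every shard and merge the per-shard hits."""
    hits = []
    for shard_docs in shards:
        hits.extend(d for d in shard_docs if predicate(d))
    return hits


NUM_SHARDS = 5
shards = [[] for _ in range(NUM_SHARDS)]

# Distribute some document IDs across the shards.
for doc_id in ["order-1", "order-2", "order-3", "customer-7"]:
    shards[pick_shard(doc_id, NUM_SHARDS)].append(doc_id)

# A search touches every shard and combines the results.
orders = sorted(scatter_gather(shards, lambda d: d.startswith("order-")))
print(orders)
```

With a modulo scheme like this one, changing the number of shards would change which shard an existing key maps to, so documents already stored would no longer be found where the lookup expects them.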
In a network/cloud environment where failures can be expected at any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.
Replication is important for two primary reasons:
    It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
    It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact.
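As a sketch of what this looks like in practice, the snippet below builds the JSON bodies for the two requests involved: creating an index with explicit shard and replica counts, and later changing only the replica count. It only constructs the request bodies and sends nothing; the helper names are hypothetical, and the index name `customer` in the comments is an assumption.

```python
import json


def create_index_body(num_shards, num_replicas):
    """Body for creating an index, e.g. PUT /customer (index name assumed)."""
    return {
        "settings": {
            "number_of_shards": num_shards,      # fixed once the index exists
            "number_of_replicas": num_replicas,  # can be changed later
        }
    }


def update_replicas_body(num_replicas):
    """Body for changing replicas later, e.g. PUT /customer/_settings."""
    return {"index": {"number_of_replicas": num_replicas}}


create_json = json.dumps(create_index_body(5, 1))
update_json = json.dumps(update_replicas_body(2))
print(create_json)
print(update_json)
```

Note that the update body carries only `number_of_replicas`, reflecting that the primary shard count is not adjustable on an existing index.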
By default (in the 6.x releases described here), each index in Elasticsearch is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index.
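The arithmetic can be checked with a short sketch. The second helper is a simplification, assuming the only allocation constraint is that copies of the same shard must sit on different nodes; both function names are hypothetical.

```python
def total_shards(primaries, replicas):
    """Primary shards plus one full set of copies per replica."""
    return primaries * (1 + replicas)


def allocatable_shards(primaries, replicas, nodes):
    """Shards that can be placed if each copy of a shard needs its own node."""
    copies_per_shard = min(1 + replicas, nodes)
    return primaries * copies_per_shard


print(total_shards(5, 1))           # 10
print(allocatable_shards(5, 1, 1))  # 5: replicas have nowhere to go on one node
print(allocatable_shards(5, 1, 2))  # 10: a second node can hold the replicas
```

This is why the text specifies at least two nodes: on a single node, the replica shards cannot be placed alongside their primaries.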
Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
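The limit follows directly from Java's Integer.MAX_VALUE, which is 2,147,483,647. The sketch below reproduces the calculation and derives the smallest shard count that could hold a given number of documents. `min_shards_for` is a hypothetical helper that considers only this hard bound and assumes documents are spread evenly; real sizing decisions involve other factors.

```python
JAVA_INTEGER_MAX = 2**31 - 1                # Integer.MAX_VALUE in Java
MAX_DOCS_PER_SHARD = JAVA_INTEGER_MAX - 128


def min_shards_for(doc_count):
    """Smallest shard count that keeps every shard at or under the limit."""
    # Ceiling division without floating point; never fewer than one shard.
    return max(1, -(-doc_count // MAX_DOCS_PER_SHARD))


print(MAX_DOCS_PER_SHARD)              # 2147483519
print(min_shards_for(1_000_000_000))   # 1
print(min_shards_for(10_000_000_000))  # 5
```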