Preface
DSL is short for Domain Specific Language, i.e. a language dedicated to a particular problem domain
1. Cluster-wide operations
1.1 Check cluster health
GET /_cat/health?v    (?v displays the column headers)
Cluster health is one of three states: red, yellow, or green:
Green – everything is fine (the cluster is fully functional)
Yellow – all data is available, but some replicas have not been allocated yet (the cluster is still fully functional)
Red – some data is unavailable (the cluster is only partially functional)
1.2 Check the status of each node
GET /_cat/nodes?v
2. Index operations
2.1 Check the status of each index
GET /_cat/indices?v
ES contains some indexes by default
| health | green (cluster complete), yellow (single node OK, cluster incomplete), red (node not OK) |
|---|---|
| status | whether the index is usable |
| index | index name |
| uuid | unique index identifier |
| pri | number of primary shards |
| rep | number of replicas |
| docs.count | number of documents |
| docs.deleted | number of deleted documents |
| store.size | total storage size |
| pri.store.size | storage size of the primary shards |
2.2 Create an index
API: PUT index_name?pretty
PUT movie_index?pretty
Use PUT to create an index named "movie_index". Appending pretty pretty-prints the JSON response (if any).
Index naming rules:
Lowercase letters only
Cannot contain \, /, *, ?, ", <, >, |, space, comma, #
Colons (:) were allowed before version 7.0, but are discouraged and no longer supported from 7.0 on
Cannot start with -, _, +
Cannot be . or ..
Cannot be longer than 255 bytes
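The naming rules above can be captured in a rough validator. This is a hypothetical helper for illustration (`is_valid_index_name` is not part of any ES client), covering only the rules listed here:

```python
# Hypothetical sketch of the index-name rules above; not an official ES API.
FORBIDDEN = set('\\/*?"<>| ,#:')   # forbidden characters (colon banned since 7.0)

def is_valid_index_name(name: str) -> bool:
    if not name or name in ('.', '..'):          # cannot be . or ..
        return False
    if name != name.lower():                      # lowercase only
        return False
    if name[0] in '-_+':                          # cannot start with -, _, +
        return False
    if any(ch in FORBIDDEN for ch in name):       # no forbidden characters
        return False
    if len(name.encode('utf-8')) > 255:           # at most 255 bytes
        return False
    return True
```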
2.3 Check the shard allocation of an index
API: GET /_cat/shards/index_name
GET /_cat/shards/movie_index
The default is 5 primary shards with 1 replica each, so you see 10 shards in total: 5 primaries, each paired with one replica. Note: a primary shard and its replica are never allocated on the same node.
2.4 Delete an index
API: DELETE /index_name
DELETE /movie_index
3. Document operations
3.1 Create documents
Insert documents with IDs 1, 2, and 3 into the movie_index index
API: PUT /index_name/type_name/document_id
Note: the document ID and the "id" attribute inside the document are two different things
PUT /movie_index/movie/1
{ "id":100,
"name":"operation red sea",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"zhang yi"},
{"id":2,"name":"hai qing"},
{"id":3,"name":"zhang han yu"}
]
}
PUT /movie_index/movie/2
{
"id":200,
"name":"operation meigong river",
"doubanScore":8.0,
"actorList":[
{"id":3,"name":"zhang han yu"}
]
}
PUT /movie_index/movie/3
{
"id":300,
"name":"incident red sea",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"zhang san feng"}
]
}
Note that Elasticsearch does not require an index to exist before documents can be indexed into it: if the specified index does not exist when a document is created, it is created automatically. The auto-created index has 5 shards and 1 replica, and each document we create is stored once on one of the primary shards and once on its replica, which is why the response shows _shards.total: 2.
3.2 Get a document by ID
API: GET /index_name/type_name/document_id
GET /movie_index/movie/1?pretty
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_version" : 2,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
The found field is true, meaning a document with ID 1 was found; the other field, _source, returns the complete JSON document.
3.3 Query all documents
API: GET /index_name/_search
Kibana shows 10 hits by default; this can be controlled with size
GET /movie_index/_search
{
"size":10
}
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
}
]
}
}
took: query execution time in milliseconds
_shards => total: how many shards were searched (here, all 5 shards)
3.4 Delete a document by ID
API: DELETE /index_name/type_name/document_id
DELETE /movie_index/movie/3
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
Note: what is the difference between deleting an index and deleting a document?
Deleting an index frees disk space immediately; there is no "mark as deleted" logic involved.
Deleting (or updating) a document writes the new document and marks the old one as deleted. Whether disk space is actually freed depends on whether the old and new documents live in the same segment file, so the physical removal of old documents may be triggered by ES's background segment merges.
You can also trigger a merge manually with POST /_forcemerge.
3.5 Replace a document
- PUT (idempotent)
When adding a document with PUT index_name/type_name/document_id, if the document ID already exists, running the command again makes Elasticsearch replace the existing document.
PUT /movie_index/movie/3
{
"id":300,
"name":"incident red sea",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"zhang cuishan"}
]
}
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1
}
Document ID 3 already exists, so its content is replaced.
- POST (non-idempotent)
When creating a document, the ID part is optional. If omitted, Elasticsearch generates a random ID and uses it to reference the document.
POST /movie_index/movie/
{
  "id":300,
  "name":"incident red sea",
  "doubanScore":5.0,
  "actorList":[
    {"id":4,"name":"zhang cuishan"}
  ]
}
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "jyVMMHUBFYRAUn5_l-Ap",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 1
}
3.6 Update a document field by ID
Besides creating and replacing documents, ES can also update a single field of a document.
Note that Elasticsearch does not actually perform an in-place update under the hood: it deletes the old document and indexes a new one.
API:
POST /index_name/type_name/document_id/_update?pretty
{
"doc": { "field_name": "new field value" }
}
The "doc" key is fixed syntax.
Requirement: change the name field of document ID 3 to "wudang":
POST /movie_index/movie/3/_update?pretty
{
"doc": {"name":"wudang"}
}
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 1
}
3.7 Update documents by query (for awareness)
POST /movie_index/_update_by_query
{
"query": {
"match":{
"actorList.id":1
}
},
"script": {
"lang": "painless",
"source":"for(int i=0;i<ctx._source.actorList.length;i++){if(ctx._source.actorList[i].id==3){ctx._source.actorList[i].name='tttt'}}"
}
}
{
"took" : 118,
"timed_out" : false,
"total" : 1,
"updated" : 1,
"deleted" : 0,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
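The _update_by_query above selects documents where actorList.id matches 1, and the painless script then renames the actor whose id is 3. As a sketch, the same transformation that the script applies to each document's _source looks like this in plain Python (the sample `doc` below is an assumption for illustration):

```python
def rename_actor(source: dict) -> dict:
    # Equivalent of the painless loop: walk actorList and rename the actor with id 3.
    for actor in source["actorList"]:
        if actor["id"] == 3:
            actor["name"] = "tttt"
    return source

# Hypothetical _source, shaped like the movie documents in this tutorial.
doc = {"id": 100,
       "actorList": [{"id": 1, "name": "zhang yi"},
                     {"id": 3, "name": "zhang han yu"}]}
rename_actor(doc)
```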
3.8 Remove a document attribute (for awareness)
POST /movie_index/movie/1/_update
{
"script" : "ctx._source.remove('name')"
}
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_version" : 4,
"_seq_no" : 3,
"_primary_term" : 1,
"found" : true,
"_source" : {
"doubanScore" : 8.5,
"actorList" : [
{
"name" : "zhang yi",
"id" : 1
},
{
"name" : "hai qing",
"id" : 2
},
{
"name" : "tttt",
"id" : 3
}
],
"id" : 100
}
}
3.9 Delete documents by query (for awareness)
POST /movie_index/_delete_by_query
{
"query": {
"match_all": {}
}
}
{
"took" : 25,
"timed_out" : false,
"total" : 4,
"deleted" : 4,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
3.10 Bulk operations
Besides creating, updating, and deleting single documents, Elasticsearch also provides the _bulk API to batch these operations.
API: POST /index_name/type_name/_bulk?pretty    (_bulk indicates a batch operation)
Note: in Kibana, each action line and document line of a bulk request must be written as a single-line JSON object
Requirement 1: create two documents in the index in one batch
POST /movie_index/movie/_bulk
{"index":{"_id":66}}
{"id":300,"name":"incident red sea","doubanScore":5.0,"actorList":[{"id":4,"name":"zhang cuishan"}]}
{"index":{"_id":88}}
{"id":300,"name":"incident red sea","doubanScore":5.0,"actorList":[{"id":4,"name":"zhang cuishan"}]}
{
"took" : 5,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "66",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "88",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
]
}
Requirement 2: in a single bulk request, first update the first document (ID 66), then delete the second document (ID 88)
POST /movie_index/movie/_bulk
{"update":{"_id":"66"}}
{"doc": { "name": "wudangshanshang" } }
{"delete":{"_id":"88"}}
{
"took" : 8,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "66",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1,
"status" : 200
}
},
{
"delete" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "88",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 200
}
}
]
}
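The bulk body above is NDJSON: one JSON object per line, where a delete action has no document line and the body ends with a newline. A small sketch of building such a payload (the `build_bulk_body` helper is hypothetical, not part of any ES client):

```python
import json

def build_bulk_body(actions):
    """Serialize (action_dict, doc_or_None) pairs into the NDJSON body
    expected by the _bulk endpoint: one JSON object per line."""
    lines = []
    for action, doc in actions:
        lines.append(json.dumps(action))
        if doc is not None:            # delete actions carry no document line
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"     # bulk bodies must end with a newline

# Rebuild the payload of Requirement 2 above.
body = build_bulk_body([
    ({"update": {"_id": "66"}}, {"doc": {"name": "wudangshanshang"}}),
    ({"delete": {"_id": "88"}}, None),
])
```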
4. Search operations
4.1 Two ways to pass search parameters
- Pass search parameters in the URI
GET /index_name/_search?q=*&pretty
Example: GET /movie_index/_search?q=_id:66
This style does not suit complex queries well; awareness is enough
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- Pass search parameters in the request body
GET /movie_index/_search
{
"query": {
"match_all": {}
}
}
4.2 Query by condition (match all)
GET movie_index/movie/_search
{
"query":{
"match_all": {}
}
}
4.3 Match query on analyzed fields (requires the analyzed text type)
Before testing: restore the data in movie_index to the initial 3 documents
GET movie_index/movie/_search
{
"query":{
"match": {"name":"operation red sea"}
}
}
In ES, the name field is analyzed and stored internally as an inverted index; the query text is analyzed as well, and its terms are matched against the indexed terms of name. That is why all 3 documents are hits, though with different relevance scores.
Note: ES stores string data using two types, text and keyword:
text: analyzed
keyword: not analyzed
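The analyzed matching described above can be sketched as a tiny inverted index. This is a simplified illustration (whitespace splitting stands in for the real analyzer, and the OR semantics mirror a basic `match` query, ignoring scoring):

```python
from collections import defaultdict

# The three sample movie titles from this tutorial, keyed by document id.
docs = {1: "operation red sea", 2: "operation meigong river", 3: "incident red sea"}

# Build the inverted index: term -> set of doc ids containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():          # stand-in for the real analyzer
        inverted[term].add(doc_id)

def match(query):
    """OR-match like a basic ES `match` query: hit any document containing any term."""
    hits = set()
    for term in query.split():
        hits |= inverted.get(term, set())
    return hits
```

Querying "operation red sea" hits all three documents because each document contains at least one of the three terms, which is exactly why the search above returns 3 hits with different scores.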
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 0.5753642,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
]
}
}
4.4 Match query on an analyzed sub-field
GET movie_index/movie/_search
{
"query":{
"match": {"actorList.name":"zhang han yu"}
}
}
Returns 3 results
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.970927,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.970927,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.8630463,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
}
]
}
}
4.5 Phrase query (similar to LIKE %phrase%)
A match_phrase query still analyzes the search text, but all of its terms must appear in the field, adjacent and in order, rather than being matched independently.
Find the movies whose actor names contain the phrase zhang han yu:
GET movie_index/movie/_search
{
"query":{
"match_phrase": {"actorList.name":"zhang han yu"}
}
}
Returns 2 results: the movies whose actor names contain the phrase zhang han yu
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.8630463,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
]
}
}
4.6 Exact match with term (requires the keyword type)
GET movie_index/movie/_search
{
"query":{
"term":{
"actorList.name.keyword":"zhang han yu"
}
}
}
Returns 2 results: the movies with an actor name exactly matching zhang han yu
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
]
}
}
4.7 Fuzzy query (error-tolerant matching)
Fuzzy matching tolerates near-miss terms: when no term matches exactly, ES uses an edit-distance-based algorithm to give very close terms some score as well, so they can still be found. This costs extra performance, and for Chinese the results are not particularly good.
GET movie_index/movie/_search
{
"query":{
"fuzzy": {"name":"rad"}
}
}
Returns 2 results: both incident red sea and operation red sea are matched
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.19178805,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.19178805,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 0.19178805,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
}
]
}
}
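Fuzzy matching is based on edit distance: "rad" is one character substitution away from "red", so both "red" titles match. A sketch of the underlying Levenshtein distance computation (an illustrative implementation, not the one ES actually uses internally):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert/delete/substitute)
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```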
4.8 Filtering – match first, then filter
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red"}
},
"post_filter":{
"term": {
"actorList.id": 3
}
}
}
4.9 Filtering – match and filter in one query (recommended)
GET movie_index/movie/_search
{
"query": {
"bool": {
"must": [
{"match": {
"name": "red"
}}
],
"filter": [
{"term": { "actorList.id": "1"}},
{"term": {"actorList.id": "3"}}
]
}
}
}
4.10 Filtering – filter by range
GET movie_index/movie/_search
{
"query": {
"range": {
"doubanScore": {
"gte": 6,
"lte": 8.5
}
}
}
}
Range operators:
| gt | greater than |
|---|---|
| lt | less than |
| gte | greater than or equal |
| lte | less than or equal |
4.11 Sorting
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
},
"sort":
{
"doubanScore": {
"order": "desc"
}
}
}
4.12 Pagination
The from parameter (0-based) specifies the offset of the first document to return
The size parameter specifies how many documents to return
These two parameters are very useful for paginating search results.
Note: from defaults to 0 when not specified.
GET movie_index/movie/_search
{
"query": { "match_all": {} },
"from": 1,
"size": 1
}
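To fetch page n (1-based) with page_size results per page, from is (n - 1) * page_size. A small sketch (the `page_params` helper is hypothetical, for illustration):

```python
def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into ES from/size search parameters."""
    return {"from": (page - 1) * page_size, "size": page_size}
```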
4.13 Selecting the returned fields
GET movie_index/movie/_search
{
"query": { "match_all": {} },
"_source": ["name", "doubanScore"]
}
Only the name and doubanScore fields are returned
4.14 Highlighting
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
},
"highlight": {
"fields": {"name":{} }
}
}
The matched terms are returned highlighted
4.15 Aggregations
Aggregations provide the ability to group and compute statistics over the data, similar to GROUP BY and aggregate functions in SQL. Elasticsearch can return search hits and their aggregation results in a single response, which is very powerful and efficient.
Requirement 1: count how many movies each actor has appeared in
GET movie_index/movie/_search
{
"aggs": {
"myAGG": {
"terms": {
"field": "actorList.name.keyword"
}
}
}
}
aggs: declares an aggregation
myAGG: the name given to this aggregation
terms: a bucket aggregation, equivalent to GROUP BY
field: the field to group by
Requirement 2: compute each actor's average movie score, sorted by score
GET movie_index/movie/_search
{
"aggs": {
"groupby_actor_id": {
"terms": {
"field": "actorList.name.keyword" ,
"order": {
"avg_score": "desc"
}
},
"aggs": {
"avg_score":{
"avg": {
"field": "doubanScore"
}
}
}
}
}
}
.keyword is a sub-field of a string field that stores an unanalyzed copy of the value. Some operations only work on the unanalyzed form,
such as filters (filter) and aggregations (aggs), so in those places the field name must carry the .keyword suffix.
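Conceptually, the terms + avg aggregation of Requirement 2 groups documents by actor name and averages doubanScore within each bucket, like SQL GROUP BY with AVG. A sketch over the tutorial's sample data (the list layout below is an assumption for illustration):

```python
from collections import defaultdict

# The three sample movies, flattened to score + actor names.
movies = [
    {"doubanScore": 8.5, "actors": ["zhang yi", "hai qing", "zhang han yu"]},
    {"doubanScore": 8.0, "actors": ["zhang han yu"]},
    {"doubanScore": 5.0, "actors": ["zhang san feng"]},
]

# terms aggregation: bucket scores by actor name.
buckets = defaultdict(list)
for movie in movies:
    for actor in movie["actors"]:
        buckets[actor].append(movie["doubanScore"])

# avg sub-aggregation: average score per bucket, then order buckets by it descending.
avg_score = {actor: sum(scores) / len(scores) for actor, scores in buckets.items()}
ranked = sorted(avg_score, key=avg_score.get, reverse=True)
```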
5. Analysis (tokenization)
5.1 Default analysis of English text
GET _analyze
{
"text":"hello world"
}
The text is split into words on whitespace
{
"tokens" : [
{
"token" : "hello",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "world",
"start_offset" : 6,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
5.2 Default analysis of Chinese text
GET _analyze
{
"text":"小米手機"
}
By default, Chinese text is split into individual characters
{
"tokens" : [
{
"token" : "小",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "米",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "手",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "機",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
}
]
}
5.3 Chinese analyzers
As the queries above show, ES's built-in Chinese analysis simply splits Chinese text character by character, with no notion of words at all.
In practice, however, users search by words, so segmenting text into proper words both matches user queries more accurately and makes querying faster.
Below is a comparison of some common open-source Chinese analyzers; we use the IK analyzer.
| Analyzer | Strengths | Weaknesses |
|---|---|---|
| Smart Chinese Analysis | official plugin | very poor Chinese segmentation quality |
| IKAnalyzer | simple to use, supports custom and remote dictionaries | dictionaries must be maintained manually; no part-of-speech tagging |
| jieba (結(jié)巴分詞) | new-word detection | no part-of-speech tagging |
| Ansj | good segmentation accuracy, part-of-speech tagging | smaller dictionary than HanLP; steeper learning curve |
| HanLP | the most complete dictionary, very feature-rich | segmentation quality needs tuning; steep learning curve |
5.4 Installing and using the IK analyzer
- Download
https://github.com/medcl/elasticsearch-analysis-ik
Upload the archive to /opt/software
- Unzip the archive
unzip elasticsearch-analysis-ik-6.6.0.zip -d /opt/module/elasticsearch/plugins/ik
Notes:
Use unzip to extract the archive
-d specifies the target directory
It must be placed under the ES plugins directory, in its own subdirectory
Look at the files under /opt/module/elasticsearch/plugins/ik/conf: the dictionaries are simply word lists stored in files
- Distribute to the other nodes
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik root@node04:/opt/module/elasticsearch/plugins/ik
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik root@node05:/opt/module/elasticsearch/plugins/ik
- Restart ES
es-cluster.sh stop
es-cluster.sh start
- Test
ik_smart
GET movie_index/_analyze
{
"analyzer": "ik_smart",
"text": "我是中國人"
}
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中國人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
ik_max_word
GET movie_index/_analyze
{
"analyzer": "ik_max_word",
"text": "我是中國人"
}
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中國人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中國",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "國人",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 4
}
]
}
5.5 Custom dictionary – local file
Sometimes the stock dictionary does not cover domain-specific terms or newly coined internet slang used in a project, so the dictionary needs to be extended.
Steps:
- Point the configuration at a local custom dictionary
Edit IKAnalyzer.cfg.xml under /opt/module/elasticsearch/plugins/ik/config/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- custom extension dictionary -->
<entry key="ext_dict">./myword.txt</entry>
<!-- custom extension stopword dictionary -->
<entry key="ext_stopwords"></entry>
<!-- remote extension dictionary -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- remote extension stopword dictionary -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
- Create myword.txt in the same /opt/module/elasticsearch/plugins/ik/config/ directory
[root@node03 config]# vim myword.txt
藍瘦
藍瘦香菇
- Distribute the configuration file and myword.txt
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml root@node04:/opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml root@node05:/opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/myword.txt root@node04:/opt/module/elasticsearch/plugins/ik/config/myword.txt
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/myword.txt root@node05:/opt/module/elasticsearch/plugins/ik/config/myword.txt
- Restart the ES service
es-cluster.sh stop
es-cluster.sh start
- Test the segmentation
GET movie_index/_analyze
{
"analyzer": "ik_smart",
"text": "藍瘦香菇"
}
5.6 Custom dictionary – remote
A remote dictionary is served over HTTP and fetched by the IK plugin; here we simulate the setup simply with nginx.
- Edit IKAnalyzer.cfg.xml under /opt/module/elasticsearch/plugins/ik/config/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- custom extension dictionary -->
<!--<entry key="ext_dict"> </entry>-->
<!-- custom extension stopword dictionary -->
<!--<entry key="ext_stopwords"></entry>-->
<!-- remote extension dictionary -->
<entry key="remote_ext_dict">http://node03/fenci/myword.txt</entry>
<!-- remote extension stopword dictionary -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
Note: comment out the local dictionary entries
Distribute the configuration file to the other nodes
Configure the static resource path in nginx.conf
pwd
/opt/module/nginx/conf
[atguigu@node03 conf]$ vim nginx.conf
location /fenci{
root es;
}
- Create an es/fenci directory under /opt/module/nginx/, and create myword.txt inside es/fenci
pwd
/opt/module/nginx/es/fenci
vim myword.txt
藍瘦
藍瘦香菇
- Start nginx
/opt/module/nginx/sbin/nginx
- Restart the ES service and verify that nginx is reachable
es-cluster.sh stop
es-cluster.sh start
- Test the segmentation
After a dictionary update, ES only applies the new words to newly indexed data; historical data is not re-analyzed. To re-analyze existing documents, run:
POST movies_index_chn/_update_by_query?conflicts=proceed
6 About mapping
A Type can be thought of as a table in a relational database; so how is the data type of each field defined?
The data type of every field in a Type is defined by the mapping. If we do not set a mapping when creating an Index, the system infers each field's type from the format of the incoming data, as follows:
true/false → boolean
1020 → long
20.1 → float
"2018-02-01" → date
"hello world" → text + keyword
By default only text fields are analyzed; keyword is the unanalyzed string type. Besides this automatic definition, a mapping can also be defined manually, but only for newly added fields that hold no data yet; once a field contains data, its mapping can no longer be changed.
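The inference rules listed above can be sketched in a few lines. This is a simplified illustration only (real ES dynamic mapping also distinguishes integer widths, applies configurable date detection, and more); `infer_type` is a hypothetical helper:

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # simplified date detection

def infer_type(value):
    """Rough approximation of ES dynamic-mapping type inference."""
    if isinstance(value, bool):        # must check bool before int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_RE.match(value) else "text + keyword"
    return "object"
```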
6.1 Index with Chinese analysis – automatic mapping
- Create the documents directly
The index does not exist yet, so it is created automatically along with the documents, and the mapping is defined automatically
PUT /movie_chn_1/movie/1
{ "id":1,
"name":"紅海行動",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"張譯"},
{"id":2,"name":"海清"},
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_1/movie/2
{
"id":2,
"name":"湄公河行動",
"doubanScore":8.0,
"actorList":[
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_1/movie/3
{
"id":3,
"name":"紅海事件",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"張三豐"}
]
}
- Test query
GET /movie_chn_1/movie/_search
{
  "query": {
    "match": {
      "name": "海行"
    }
  }
}
{
  "took" : 23,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "movie_chn_1", "_type" : "movie", "_id" : "1", "_score" : 0.5753642,
        "_source" : { "id" : 1, "name" : "紅海行動", "doubanScore" : 8.5,
          "actorList" : [ {"id" : 1, "name" : "張譯"}, {"id" : 2, "name" : "海清"}, {"id" : 3, "name" : "張涵予"} ] }
      },
      {
        "_index" : "movie_chn_1", "_type" : "movie", "_id" : "2", "_score" : 0.2876821,
        "_source" : { "id" : 2, "name" : "湄公河行動", "doubanScore" : 8.0,
          "actorList" : [ {"id" : 3, "name" : "張涵予"} ] }
      },
      {
        "_index" : "movie_chn_1", "_type" : "movie", "_id" : "3", "_score" : 0.2876821,
        "_source" : { "id" : 3, "name" : "紅海事件", "doubanScore" : 5.0,
          "actorList" : [ {"id" : 4, "name" : "張三豐"} ] }
      }
    ]
  }
}
- Conclusion
The query for "海行" matched all three documents because no analyzer was specified when the Index was defined, so the default analyzer was used, which splits Chinese text character by character.
6.2 Index with Chinese analysis – manual mapping
- Define the Index with an explicit mapping
PUT movie_chn_2
{
  "mappings": {
    "movie": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "text", "analyzer": "ik_smart" },
        "doubanScore": { "type": "double" },
        "actorList": {
          "properties": {
            "id": { "type": "long" },
            "name": { "type": "keyword" }
          }
        }
      }
    }
  }
}
Put documents into the Index
PUT /movie_chn_2/movie/1
{ "id":1,
"name":"紅海行動",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"張譯"},
{"id":2,"name":"海清"},
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_2/movie/2
{
"id":2,
"name":"湄公河行動",
"doubanScore":8.0,
"actorList":[
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_2/movie/3
{
"id":3,
"name":"紅海事件",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"張三豐"}
]
}
- View the manually defined mapping
GET movie_chn_2/_mapping
{
"movie_chn_2" : {
"mappings" : {
"movie" : {
"properties" : {
"actorList" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "keyword"
}
}
},
"doubanScore" : {
"type" : "double"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"analyzer" : "ik_smart"
}
}
}
}
}
}
- Conclusion
The same query ("海行") finds no hits in this index, because we specified the ik analyzer when creating the Index, so the text is segmented into words rather than single characters
6.3 Copying index data
Powerful as it is, Elasticsearch cannot modify an existing mapping dynamically, so when the structure has to change we are forced to create a new index.
For this, Elasticsearch provides the reindex command, which copies a snapshot of one index's data into another. By default, documents with the same _id are overwritten (this normally does not happen, unless two indexes are copied into one). Use POST _reindex to copy the index snapshot:
POST _reindex
{
"source": {
"index": "my_index_name"
},
"dest": {
"index": "my_index_name_new"
}
}
7. Index aliases (_aliases)
An index alias is like a shortcut or a symbolic link: it can point to one or more indexes, and it can be used in any API that expects an index name.
7.1 Create an index alias
- Declare it when creating the Index
PUT index_name
{
"aliases": {
"index_alias": {}
}
}
#create the index with a manual mapping and declare an alias
PUT movie_chn_3
{
"aliases": {
"movie_chn_3_aliase": {}
},
"mappings": {
"movie":{
"properties": {
"id":{
"type": "long"
},
"name":{
"type": "text",
"analyzer": "ik_smart"
},
"doubanScore":{
"type": "double"
},
"actorList":{
"properties": {
"id":{
"type":"long"
},
"name":{
"type":"keyword"
}
}
}
}
}
}
}
- Add an alias to an existing index
POST _aliases
{
"actions": [
{ "add":{ "index": "index_name", "alias": "index_alias" }}
]
}
#add an alias to movie_chn_3
POST _aliases
{
"actions": [
{ "add":{ "index": "movie_chn_3", "alias": "movie_chn_3_a2" }}
]
}
7.2 List aliases
GET _cat/aliases?v
alias index filter routing.index routing.search
movie_chn_3_a2 movie_chn_3 - - -
movie_chn_3_aliase movie_chn_3 - - -
.kibana .kibana_1 - - -
7.3 Query through an alias
No different from querying a normal index
GET index_alias/_search
7.4 Remove an alias from an index
POST _aliases
{
"actions": [
{ "remove": { "index": "index_name", "alias": "index_alias" }}
]
}
POST _aliases
{
"actions": [
{ "remove": { "index": "movie_chn_3", "alias": "movie_chn_3_aliase" }}
]
}
7.5 Use cases
- Group multiple indexes (e.g., last_three_months)
POST _aliases
{
"actions": [
{ "add": { "index": "movie_chn_1", "alias": "movie_chn_query" }},
{ "add": { "index": "movie_chn_2", "alias": "movie_chn_query" }}
]
}
GET movie_chn_query/_search
- Create a view over a subset of an index
Equivalent to attaching filter conditions to the Index, narrowing the query scope
POST _aliases
{
"actions": [
{
"add":
{
"index": "movie_chn_1",
"alias": "movie_chn_1_sub_query",
"filter": {
"term": { "actorList.id": "4"}
}
}
}
]
}
GET movie_chn_1_sub_query/_search
- Switch seamlessly from one index to another on a running cluster
POST /_aliases
{
"actions": [
{ "remove": { "index": "movie_chn_1", "alias": "movie_chn_query" }},
{ "remove": { "index": "movie_chn_2", "alias": "movie_chn_query" }},
{ "add": { "index": "movie_chn_3", "alias": "movie_chn_query" }}
]
}
The whole operation is atomic, so there is no risk of losing or duplicating data
8 Index templates
8.1 Create an index template
PUT _template/template_movie2020
{
"index_patterns": ["movie_test*"],
"settings": {
"number_of_shards": 1
},
"aliases" : {
"{index}-query": {},
"movie_test-query":{}
},
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "keyword"
},
"movie_name": {
"type": "text",
"analyzer": "ik_smart"
}
}
}
}
}
Here "index_patterns": ["movie_test*"] means: whenever data is written to an index whose name starts with movie_test and that index does not exist yet, ES creates it automatically from this template.
In "aliases", {index} is replaced with the name of the index actually created. Two aliases are thus created: one derived from the concrete index name, and one fixed global alias.
8.2 Testing
- Write data into an index
POST movie_test_202011/_doc
{
"id":"333",
"movie_name":"zhang3"
}
- Check the Index's mapping: it was created from our index template
GET movie_test_202011-query/_mapping
- Query the data through the alias defined in the template
GET movie_test-query/_search
8.3 List the templates defined in the system
GET _cat/templates
8.4 View a template's details
GET _template/template_movie2020
or
GET _template/template_movie*
8.5 Use cases
- Index splitting
Index splitting means dividing one business index into multiple indexes by time interval,
e.g. turning order_info into order_info_20200101, order_info_20200102, ...
This has two benefits:
- Flexibility for structural changes
ES does not allow the structure of indexed data to be modified, yet in practice an index's structure and settings inevitably have to change. With split indexes, only the index of the next interval needs to change while the existing ones stay as they are, which provides a degree of flexibility.
To achieve this, simply re-create the template on the day the index needs to change.
- Narrower query ranges
Queries usually do not span the full time range of the data, so splitting the index physically reduces the amount of data scanned, which is also a performance optimization.
8.6 Caveat
With an index template, an index is usually created when its first document is inserted. If the cluster already has a very large number of shards, index creation may become slow; if that latency is unacceptable, skip the template and have a scheduled script create the next day's index one day in advance.