Elasticsearch Series - RESTful API (DSL)

Preface

DSL stands for Domain Specific Language, that is, a language dedicated to one specific domain.

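1. Global operations

1.1 Checking cluster health

GET /_cat/health?v

The ?v parameter adds the column headers to the output.
Cluster health is one of three states, red, yellow, or green:
Green: everything is fine (the cluster is fully functional)
Yellow: all data is available, but some replicas are not yet allocated (the cluster is fully functional)
Red: some data is unavailable (the cluster is only partially functional)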

1.2 Checking the status of each node

GET /_cat/nodes?v

2. Index operations

2.1 Checking the status of each index

GET /_cat/indices?v

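Some indices exist in ES by default.

health: green (all primary and replica shards are allocated), yellow (all primary shards are allocated, but some replicas are not), red (at least one primary shard is not allocated)
status: whether the index can be used
index: index name
uuid: unique identifier of the index
pri: number of primary shards
rep: number of replicas
docs.count: number of documents
docs.deleted: number of deleted documents
store.size: total disk space used
pri.store.size: disk space used by the primary shards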

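2.2 Creating an index

API: PUT index_name?pretty
PUT movie_index?pretty

This uses PUT to create an index named "movie_index". Appending pretty makes ES pretty-print the JSON response (if there is one).

Index naming rules:

  • Lowercase letters only

  • Must not contain \, /, *, ?, ", <, >, |, space, comma, or #

  • A colon (:) was allowed before version 7.0 but was discouraged, and it is no longer supported from 7.0 on

  • Must not start with -, _, or +

  • Must not be . or ..

  • Must not be longer than 255 bytes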
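
The rules above can be checked before an index is created. The following is a minimal sketch rather than the check Elasticsearch itself runs; the function name index_name_problems and the allow_colon flag are made up for this example.

```python
# Characters the rules above forbid anywhere in an index name.
FORBIDDEN_CHARS = set('\\/*?"<>|, #')


def index_name_problems(name, allow_colon=False):
    """Return the naming rules that `name` breaks (an empty list means it passes)."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(c for c in set(name) if c in FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(repr(c) for c in bad))
    if ":" in name and not allow_colon:
        problems.append("contains a colon (not supported from 7.0 on)")
    if name[:1] in ("-", "_", "+"):
        problems.append("starts with -, _ or +")
    if name in (".", ".."):
        problems.append("must not be . or ..")
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


print(index_name_problems("movie_index"))   # passes every rule
print(index_name_problems("_Movie index"))  # breaks several rules at once
```

The length is measured on the UTF-8 encoding, so a name made of multi-byte characters reaches the limit sooner than its character count suggests.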

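2.3 Checking the shards of an index

API: GET /_cat/shards/index_name
GET /_cat/shards/movie_index

In ES 6.x, the version used here, an index gets 5 primary shards and 1 replica by default (from 7.0 on the default is 1 primary shard), so 10 shards are listed: 5 primaries, each with one replica. Note that a primary shard and its replica are never placed on the same node.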

2.4 Deleting an index

API: DELETE /index_name
DELETE /movie_index

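3. Document operations

3.1 Creating documents

Put three documents into the movie_index index, with document IDs 1, 2, and 3.

API: PUT /index_name/type_name/doc_id
Note: the document ID is not the same thing as the "id" property inside the document.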

PUT /movie_index/movie/1
{ "id":100,
  "name":"operation red sea",
  "doubanScore":8.5,
  "actorList":[  
{"id":1,"name":"zhang yi"},
{"id":2,"name":"hai qing"},
{"id":3,"name":"zhang han yu"}
]
}
PUT /movie_index/movie/2
{
  "id":200,
  "name":"operation meigong river",
  "doubanScore":8.0,
  "actorList":[  
{"id":3,"name":"zhang han yu"}
]
}

PUT /movie_index/movie/3
{
  "id":300,
  "name":"incident red sea",
  "doubanScore":5.0,
  "actorList":[  
{"id":4,"name":"zhang san feng"}
]
}

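Note that Elasticsearch does not require the index to exist before a document is indexed into it: if the target index does not exist, it is created automatically. In ES 6.x the auto-created index has 5 primary shards and 1 replica. Each document is stored on one of the primary shards and on that shard's replica, which is why the response shows _shards.total: 2.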

3.2 Getting a document by ID

API: GET /index_name/type_name/doc_id
GET /movie_index/movie/1?pretty
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : 100,
    "name" : "operation red sea",
    "doubanScore" : 8.5,
    "actorList" : [
      {
        "id" : 1,
        "name" : "zhang yi"
      },
      {
        "id" : 2,
        "name" : "hai qing"
      },
      {
        "id" : 3,
        "name" : "zhang han yu"
      }
    ]
  }
}

The found field is true, meaning that a document with ID 1 was found; the _source field returns the complete JSON document.

3.3 Querying all documents

API: GET /index_name/_search
By default a search returns 10 hits; use size to change that.
GET /movie_index/_search
{
    "size":10
}
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 200,
          "name" : "operation meigong river",
          "doubanScore" : 8.0,
          "actorList" : [
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 100,
          "name" : "operation red sea",
          "doubanScore" : 8.5,
          "actorList" : [
            {
              "id" : 1,
              "name" : "zhang yi"
            },
            {
              "id" : 2,
              "name" : "hai qing"
            },
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "id" : 300,
          "name" : "incident red sea",
          "doubanScore" : 5.0,
          "actorList" : [
            {
              "id" : 4,
              "name" : "zhang san feng"
            }
          ]
        }
      }
    ]
  }
}

took: time spent executing the query, in milliseconds

_shards.total: number of shards searched (here, all 5 shards)

3.4 Deleting a document by ID

API: DELETE /index_name/type_name/doc_id
DELETE /movie_index/movie/3
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "3",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}

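Note: how does deleting an index differ from deleting a document?

  • Deleting an index frees its disk space immediately; there is no "mark as deleted" step.

  • Deleting a document only marks it as deleted (an update likewise writes a new document and marks the old one as deleted). The disk space is reclaimed later, when the background segment merge rewrites the segment files and physically drops the marked documents.

  • A merge can also be triggered manually with POST /_forcemerge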

3.5 Replacing a document

  1. PUT (idempotent)

When PUT index_name/type_name/doc_id is executed for a document ID that already exists, Elasticsearch replaces the existing document.

PUT /movie_index/movie/3
{
  "id":300,
  "name":"incident red sea",
  "doubanScore":5.0,
  "actorList":[  
{"id":4,"name":"zhang cuishan"}
]
}
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "3",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 6,
  "_primary_term" : 1
}

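Document ID 3 already exists, so its content is replaced.

  2. POST (not idempotent)

    The ID is optional when creating a document. If none is given, Elasticsearch generates a random ID and uses it to refer to the document.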

    POST /movie_index/movie/
    {
      "id":300,
      "name":"incident red sea",
      "doubanScore":5.0,
      "actorList":[  
    {"id":4,"name":"zhang cuishan"}
    ]
    }
    
    {
      "_index" : "movie_index",
      "_type" : "movie",
      "_id" : "jyVMMHUBFYRAUn5_l-Ap",
      "_version" : 1,
      "result" : "created",
      "_shards" : {
        "total" : 2,
        "successful" : 2,
        "failed" : 0
      },
      "_seq_no" : 7,
      "_primary_term" : 1
    }
    
    

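3.6 Updating a document by ID

Besides creating and replacing documents, ES can also update individual fields of a document.

Note that Elasticsearch does not update in place under the hood: it deletes the old document and then indexes a new one.

API (the doc key is fixed syntax):
POST /index_name/type_name/doc_id/_update?pretty
{
  "doc": { "field_name": "new_value" }
}
Requirement: change the name field of the document with ID 3 to "wudang":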
POST /movie_index/movie/3/_update?pretty
{
  "doc": {"name":"wudang"}
}
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "3",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}

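3.7 Updating documents by query (for reference)

The query selects the documents that contain an actor with id 1; the script then renames the actor with id 3 to tttt in each of them.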

POST /movie_index/_update_by_query
{
    "query": {
      "match":{
        "actorList.id":1
      }  
    },
    "script": {
      "lang": "painless",
      "source":"for(int i=0;i<ctx._source.actorList.length;i++){if(ctx._source.actorList[i].id==3){ctx._source.actorList[i].name='tttt'}}"
    }
}
{
  "took" : 118,
  "timed_out" : false,
  "total" : 1,
  "updated" : 1,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

3.8 Removing a field from a document (for reference)

POST /movie_index/movie/1/_update
{
    "script" : "ctx._source.remove('name')"
}
Fetching document 1 again (GET /movie_index/movie/1) shows that the name field is gone:
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "1",
  "_version" : 4,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "doubanScore" : 8.5,
    "actorList" : [
      {
        "name" : "zhang yi",
        "id" : 1
      },
      {
        "name" : "hai qing",
        "id" : 2
      },
      {
        "name" : "tttt",
        "id" : 3
      }
    ],
    "id" : 100
  }
}

3.9 Deleting documents by query (for reference)

POST /movie_index/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
{
  "took" : 25,
  "timed_out" : false,
  "total" : 4,
  "deleted" : 4,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

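3.10 Bulk operations

Besides creating, updating, and deleting single documents, Elasticsearch can run these operations in batches through the _bulk API.

API: POST /index_name/type_name/_bulk?pretty
Note: the bulk API takes newline-delimited JSON, so each action line and each document must be written on a single line.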
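
When a bulk body is generated by a program rather than typed into Kibana, the one-object-per-line layout is easy to get wrong. The sketch below builds such a body with the standard library only; the helper name bulk_body and the triple layout of its input are assumptions of this example, and nothing is sent to a cluster.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, document) triples as newline-delimited JSON.

    `document` is None for actions that carry no body line, such as delete.
    """
    lines = []
    for action, meta, doc in actions:
        lines.append(json.dumps({action: meta}))
        if doc is not None:
            lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_id": "66"}, {"id": 300, "name": "incident red sea"}),
    ("update", {"_id": "66"}, {"doc": {"name": "wudangshanshang"}}),
    ("delete", {"_id": "88"}, None),
])
print(body, end="")
```

json.dumps never emits a raw line break inside an object, which is what keeps every action and every document on its own line.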

Requirement 1: create two documents in the index in one batch

POST /movie_index/movie/_bulk
{"index":{"_id":66}}
{"id":300,"name":"incident red sea","doubanScore":5.0,"actorList":[{"id":4,"name":"zhang cuishan"}]}
{"index":{"_id":88}}
{"id":300,"name":"incident red sea","doubanScore":5.0,"actorList":[{"id":4,"name":"zhang cuishan"}]}
{
  "took" : 5,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "66",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 5,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "88",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

Requirement 2: in one bulk request, first update the first document (ID 66), then delete the second document (ID 88)

POST /movie_index/movie/_bulk
{"update":{"_id":"66"}}
{"doc": { "name": "wudangshanshang" } }
{"delete":{"_id":"88"}}
{
  "took" : 8,
  "errors" : false,
  "items" : [
    {
      "update" : {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "66",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 6,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "delete" : {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "88",
        "_version" : 2,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}

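4. Query operations

4.1 Two ways to pass search parameters

  1. Send the search parameters in the URI (here, query all data)
GET /index_name/_search?q=*&pretty
Example: GET /movie_index/_search?q=_id:66
This style does not suit complex queries; it is enough to know that it exists.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
  2. Send the search parameters in the request body (here, query all data)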
GET /movie_index/_search
{
  "query": {
    "match_all": {}
  }
}

4.2 Query by condition (match all)

GET movie_index/movie/_search
{
  "query":{
    "match_all": {}
  }
}

4.3 Full-text query on analyzed terms (the field must be of the analyzed text type)

Before testing, restore the data in movie_index to the initial 3 documents.

GET movie_index/movie/_search
{
  "query":{
    "match": {"name":"operation red sea"}
  }
}

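In ES, the name field is analyzed (split into terms) and stored as an inverted index. The query text is analyzed too, and its terms are matched against the terms of each document's name. All 3 documents therefore match, each with a different score.

Note: ES stores string data as one of two types, text and keyword:

text: analyzed

keyword: not analyzed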

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "id" : 100,
          "name" : "operation red sea",
          "doubanScore" : 8.5,
          "actorList" : [
            {
              "id" : 1,
              "name" : "zhang yi"
            },
            {
              "id" : 2,
              "name" : "hai qing"
            },
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "3",
        "_score" : 0.5753642,
        "_source" : {
          "id" : 300,
          "name" : "incident red sea",
          "doubanScore" : 5.0,
          "actorList" : [
            {
              "id" : 4,
              "name" : "zhang san feng"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 200,
          "name" : "operation meigong river",
          "doubanScore" : 8.0,
          "actorList" : [
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      }
    ]
  }
}
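
To make the inverted-index explanation concrete, here is a toy model of why the query for operation red sea hits all three documents. It is only a sketch: the analyzer is a plain whitespace split, and the "score" is a count of matching terms, whereas ES ranks hits with a relevance formula (BM25 by default in 6.x).

```python
from collections import defaultdict

docs = {
    1: "operation red sea",
    2: "operation meigong river",
    3: "incident red sea",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Inverted index: term -> set of IDs of the documents that contain it.
inverted = defaultdict(set)
for doc_id, name in docs.items():
    for term in analyze(name):
        inverted[term].add(doc_id)


def match(query):
    """Return {doc_id: number of query terms found}, an OR over the query terms."""
    hits = defaultdict(int)
    for term in analyze(query):
        for doc_id in inverted.get(term, ()):
            hits[doc_id] += 1
    return dict(hits)


print(match("operation red sea"))
```

Document 1 contains all three query terms, document 3 two of them, and document 2 only one, which is the same order as the scores in the response above.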

4.4 Full-text query on a sub-property

GET movie_index/movie/_search
{
  "query":{
    "match": {"actorList.name":"zhang han yu"}
  }
}

Returns 3 results

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.970927,
    "hits" : [
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "1",
        "_score" : 0.970927,
        "_source" : {
          "id" : 100,
          "name" : "operation red sea",
          "doubanScore" : 8.5,
          "actorList" : [
            {
              "id" : 1,
              "name" : "zhang yi"
            },
            {
              "id" : 2,
              "name" : "hai qing"
            },
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "2",
        "_score" : 0.8630463,
        "_source" : {
          "id" : 200,
          "name" : "operation meigong river",
          "doubanScore" : 8.0,
          "actorList" : [
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 300,
          "name" : "incident red sea",
          "doubanScore" : 5.0,
          "actorList" : [
            {
              "id" : 4,
              "name" : "zhang san feng"
            }
          ]
        }
      }
    ]
  }
}

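4.5 Phrase query (similar to LIKE %phrase%)

A phrase query still analyzes the query text, but a document matches only if it contains all of the terms, adjacent and in the same order, rather than any one of them.

Find the movies whose actor names contain zhang han yu: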

GET movie_index/movie/_search
{
  "query":{
    "match_phrase": {"actorList.name":"zhang han yu"}
  }
}

Returns 2 results: the movies with an actor whose name contains zhang han yu.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "2",
        "_score" : 0.8630463,
        "_source" : {
          "id" : 200,
          "name" : "operation meigong river",
          "doubanScore" : 8.0,
          "actorList" : [
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "id" : 100,
          "name" : "operation red sea",
          "doubanScore" : 8.5,
          "actorList" : [
            {
              "id" : 1,
              "name" : "zhang yi"
            },
            {
              "id" : 2,
              "name" : "hai qing"
            },
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      }
    ]
  }
}
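
The difference from the plain match query in 4.4 can be sketched as a check for consecutive terms. This is a simplified model with a whitespace analyzer, and the helper name phrase_in is made up for the example.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_in(value, phrase):
    """True if the terms of `phrase` occur in `value` adjacent and in order."""
    terms, wanted = analyze(value), analyze(phrase)
    return any(
        terms[i:i + len(wanted)] == wanted
        for i in range(len(terms) - len(wanted) + 1)
    )


actors = {
    1: ["zhang yi", "hai qing", "zhang han yu"],
    2: ["zhang han yu"],
    3: ["zhang san feng"],
}

# Each array element is checked on its own, so a phrase cannot span two names.
hits = [doc_id for doc_id, names in actors.items()
        if any(phrase_in(name, "zhang han yu") for name in names)]
print(hits)
```

Document 3 shares the term zhang with the query, which is enough for match but not for match_phrase.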

4.6 Exact match with term (requires a keyword field)

GET movie_index/movie/_search
{
  "query":{
    "term":{
    "actorList.name.keyword":"zhang han yu"
    }
  }
}

Returns 2 results: the movies with an actor whose name is exactly zhang han yu.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 200,
          "name" : "operation meigong river",
          "doubanScore" : 8.0,
          "actorList" : [
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 100,
          "name" : "operation red sea",
          "doubanScore" : 8.5,
          "actorList" : [
            {
              "id" : 1,
              "name" : "zhang yi"
            },
            {
              "id" : 2,
              "name" : "hai qing"
            },
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      }
    ]
  }
}

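4.7 Fuzzy query (error-tolerant matching)

A fuzzy query corrects near misses: when a term has no exact match, ES uses edit distance to also match terms that are very close to it and gives them a score. It costs more than an exact query, and it does not work very well for Chinese.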

GET movie_index/movie/_search
{
    "query":{
      "fuzzy": {"name":"rad"}
    }
}

Returns 2 results: both incident red sea and operation red sea match.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.19178805,
    "hits" : [
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "1",
        "_score" : 0.19178805,
        "_source" : {
          "id" : 100,
          "name" : "operation red sea",
          "doubanScore" : 8.5,
          "actorList" : [
            {
              "id" : 1,
              "name" : "zhang yi"
            },
            {
              "id" : 2,
              "name" : "hai qing"
            },
            {
              "id" : 3,
              "name" : "zhang han yu"
            }
          ]
        }
      },
      {
        "_index" : "movie_index",
        "_type" : "movie",
        "_id" : "3",
        "_score" : 0.19178805,
        "_source" : {
          "id" : 300,
          "name" : "incident red sea",
          "doubanScore" : 5.0,
          "actorList" : [
            {
              "id" : 4,
              "name" : "zhang san feng"
            }
          ]
        }
      }
    ]
  }
}
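
The closeness that the fuzzy query relies on is edit distance. The sketch below computes plain Levenshtein distance to show why rad reaches red; it is an illustration, not the Lucene implementation (which by default also counts a swap of two adjacent characters as a single edit). With the default fuzziness setting (AUTO), a term of 3 to 5 characters may differ by one edit.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


# Distance from the misspelled query term to every term in the name field.
for term in ["operation", "red", "sea", "incident", "meigong", "river"]:
    print(term, edit_distance("rad", term))
```

Only red is within one edit of rad, so only the two documents whose name contains red are returned.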

4.8 Filtering: match first, then filter

GET movie_index/movie/_search
{
    "query":{
      "match": {"name":"red"}
    },
    "post_filter":{
      "term": {
        "actorList.id": 3
      }
    }
}

4.9 Filtering: match and filter together (recommended)

GET movie_index/movie/_search
{
  "query": {
    "bool": {
       "must": [
        {"match": {
          "name": "red"
        }}
      ],
      "filter": [
        {"term": { "actorList.id": "1"}},
        {"term": {"actorList.id": "3"}}
      ]
    }
  }
}

4.10 Filtering: by range

GET movie_index/movie/_search
{
  "query": {
    "range": {
      "doubanScore": {
        "gte": 6,
        "lte": 8.5
      }
    }
  }
}

Range operators:

gt   greater than
lt   less than
gte  greater than or equal to
lte  less than or equal to

4.11 Sorting

GET movie_index/movie/_search
{
  "query":{
    "match": {"name":"red sea"}
  },
 "sort":
    {
      "doubanScore": {
        "order": "desc"
      }
    }
}

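4.12 Pagination

The from parameter (0-based) specifies the offset of the first document to return.

The size parameter specifies how many documents to return.

Together they make paging through search results straightforward.

Note that from defaults to 0 when it is not specified.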

GET movie_index/movie/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}
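
User interfaces usually think in page numbers, while the search API takes an offset. A small helper can do the conversion; page_params is a name invented for this sketch, and pages are assumed to be numbered from 1.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 2 with one hit per page is the request shown above: from=1, size=1.
body = {"query": {"match_all": {}}, **page_params(page=2, page_size=1)}
print(body)
```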

4.13 Selecting the fields to return

GET movie_index/movie/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "doubanScore"]
}

Only the name and doubanScore fields are returned.

4.14 Highlighting

GET movie_index/movie/_search
{
    "query":{
      "match": {"name":"red sea"}
    },
    "highlight": {
      "fields": {"name":{} }
    }
}

The matched terms are highlighted in the results (wrapped in <em> tags by default).

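4.15 Aggregations

Aggregations group data and compute statistics over it, much like GROUP BY and the aggregate functions in SQL. Elasticsearch can return the search hits and the aggregation results in a single response, which is both powerful and efficient.

Requirement 1: count how many movies each actor has appeared in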

GET movie_index/movie/_search
{
  "aggs": {
    "myAGG": {
      "terms": {
        "field": "actorList.name.keyword"
      }
    }
  }
}

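aggs: declares the aggregations

myAGG: the name given to this aggregation

terms: groups by value, like GROUP BY

field: the field to group by

Requirement 2: compute the average score of each actor's movies, sorted by that average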

GET movie_index/movie/_search
{ 
  "aggs": {
    "groupby_actor_id": {
      "terms": {
        "field": "actorList.name.keyword" ,
        "order": {
          "avg_score": "desc"
          }
      },
      "aggs": {
        "avg_score":{
          "avg": {
            "field": "doubanScore" 
          }
        }
       }
    } 
  }
}

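.keyword is a sub-field of a string field that stores a non-analyzed copy of the value. Some operations, such as exact-value filters and aggregations, need the non-analyzed form, so the field name takes the .keyword suffix there.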
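
What requirement 2 computes can be reproduced in a few lines over the three sample movies. This is a plain-Python model of a terms aggregation with an avg sub-aggregation, meant only to show the grouping logic; the variable names are made up and no cluster is involved.

```python
from collections import defaultdict

movies = [
    {"doubanScore": 8.5, "actors": ["zhang yi", "hai qing", "zhang han yu"]},
    {"doubanScore": 8.0, "actors": ["zhang han yu"]},
    {"doubanScore": 5.0, "actors": ["zhang san feng"]},
]

# terms aggregation: one bucket per distinct (non-analyzed) actor name.
buckets = defaultdict(list)
for movie in movies:
    for actor in movie["actors"]:
        buckets[actor].append(movie["doubanScore"])

# avg sub-aggregation per bucket, then order the buckets by it, descending.
result = sorted(
    ((actor, len(scores), sum(scores) / len(scores)) for actor, scores in buckets.items()),
    key=lambda row: row[2],
    reverse=True,
)
for actor, doc_count, avg_score in result:
    print(actor, doc_count, avg_score)
```

Grouping on the whole name is the reason the keyword form is needed: on an analyzed text field the buckets would be the individual terms (zhang, han, yu, and so on) rather than full names.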

5. Analysis (tokenization)

5.1 How English text is tokenized by default

GET _analyze
{
  "text":"hello world"
}

The text is split into words at the spaces.

{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "world",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

5.2 How Chinese text is tokenized by default

GET _analyze
{
  "text":"小米手機"
}

By default, Chinese text is split into individual characters.

{
  "tokens" : [
    {
      "token" : "小",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "米",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "手",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "機",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    }
  ]
}

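5.3 Chinese analyzers

The queries above show that the Chinese tokenization built into ES simply splits the text character by character and has no notion of words.

In practice users search by words. If the text is split into words, it matches the user's query terms more closely and queries also run faster.

Below is a comparison of some common open-source analyzers; we use the IK analyzer.

Analyzer | Strengths | Weaknesses
Smart Chinese Analysis | Official plugin | Very poor Chinese segmentation quality
IKAnalyzer | Simple to use; supports custom and remote dictionaries | Dictionary has to be maintained by hand; no part-of-speech tagging
Jieba | New-word recognition | No part-of-speech tagging
Ansj | Good segmentation accuracy; supports part-of-speech tagging | Smaller dictionary than HanLP; steep learning curve
HanLP | Most complete dictionary at present; very rich feature set | Segmentation quality leaves room for improvement; steep learning curve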

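5.4 Installing and using the IK analyzer

  • Download

https://github.com/medcl/elasticsearch-analysis-ik

  • Upload the downloaded zip to /opt/software

  • Unzip the zip file

    unzip elasticsearch-analysis-ik-6.6.0.zip -d /opt/module/elasticsearch/plugins/ik

    Notes

    Use unzip to extract the archive

    -d specifies the target directory

    The plugin must go under the ES plugins directory, in a subdirectory of its own

  • Look at the files under /opt/module/elasticsearch/plugins/ik/config: these are the dictionaries, that is, files listing the words the analyzer segments against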

  • Distribute the plugin to the other nodes

    [root@node03 elasticsearch]# scp -r  /opt/module/elasticsearch/plugins/ik root@node04:/opt/module/elasticsearch/plugins/ik
    [root@node03 elasticsearch]# scp -r  /opt/module/elasticsearch/plugins/ik root@node05:/opt/module/elasticsearch/plugins/ik
    
  • Restart ES

    es-cluster.sh stop
    es-cluster.sh start
    
  • Test it

    ik_smart

    GET movie_index/_analyze
    {  
      "analyzer": "ik_smart", 
      "text": "我是中國人"
    }
    
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中國人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

ik_max_word

GET movie_index/_analyze
{  
  "analyzer": "ik_max_word", 
  "text": "我是中國人"
}
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中國人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中國",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "國人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

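5.5 Custom dictionary: local file

Sometimes the bundled dictionary lacks the domain terms or new internet slang that a project uses, so it has to be extended.

Steps

  1. Point the configuration at a custom dictionary in a local directory

    Edit IKAnalyzer.cfg.xml in /opt/module/elasticsearch/plugins/ik/config/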

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
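            <comment>IK Analyzer extension configuration</comment>
            <!-- Users can configure their own extension dictionary here -->
            <entry key="ext_dict">./myword.txt</entry>
             <!-- Users can configure their own extension stopword dictionary here -->
            <entry key="ext_stopwords"></entry>
            <!-- Users can configure a remote extension dictionary here -->
            <!-- <entry key="remote_ext_dict">words_location</entry> -->
            <!-- Users can configure a remote extension stopword dictionary here -->
            <!-- <entry key="remote_ext_stopwords">words_location</entry> -->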
    </properties>
    
  2. Create myword.txt in the /opt/module/elasticsearch/plugins/ik/config/ directory

    [root@node03 config]# vim myword.txt
    
    藍瘦
    藍瘦香菇
    
  3. Distribute the configuration file and myword.txt

    [root@node03 elasticsearch]# scp -r  /opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml root@node04:/opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml 
    [root@node03 elasticsearch]# scp -r  /opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml root@node05:/opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml 
    
    [root@node03 elasticsearch]# scp -r  /opt/module/elasticsearch/plugins/ik/config/myword.txt root@node04:/opt/module/elasticsearch/plugins/ik/config/myword.txt
    [root@node03 elasticsearch]# scp -r  /opt/module/elasticsearch/plugins/ik/config/myword.txt root@node05:/opt/module/elasticsearch/plugins/ik/config/myword.txt 
    
    
  4. Restart the ES service

es-cluster.sh stop
es-cluster.sh start
  5. Test the tokenization
GET movie_index/_analyze
{  
  "analyzer": "ik_smart", 
  "text": "藍瘦香菇"
}

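5.6 Custom dictionary: remote file

A remote dictionary setup generally follows the flow in the figure; here we simulate it in a simple way with nginx.

[Figure: remote custom dictionary workflow]
  1. Edit IKAnalyzer.cfg.xml in /opt/module/elasticsearch/plugins/ik/config/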

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
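            <comment>IK Analyzer extension configuration</comment>
            <!-- Users can configure their own extension dictionary here -->
            <!--<entry key="ext_dict"> </entry>-->
             <!-- Users can configure their own extension stopword dictionary here -->
            <!--<entry key="ext_stopwords"></entry>-->
            <!-- Users can configure a remote extension dictionary here -->
            <entry key="remote_ext_dict">http://node03/fenci/myword.txt</entry>
            <!-- Users can configure a remote extension stopword dictionary here -->
            <!-- <entry key="remote_ext_stopwords">words_location</entry> -->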
    </properties>
    

    Note: comment out the local dictionary entries

  2. Distribute the configuration file

  3. Configure the static resource path in nginx.conf

pwd
/opt/module/nginx/conf
[atguigu@node03 conf]$ vim nginx.conf
location /fenci{
     root es;
}
  4. Create the es/fenci directory under /opt/module/nginx/, and create myword.txt in es/fenci
pwd
/opt/module/nginx/es/fenci

vim myword.txt
藍瘦
藍瘦香菇
  5. Start nginx
/opt/module/nginx/sbin/nginx
  6. Restart the ES service and check that the dictionary served by nginx can be reached
es-cluster.sh stop
es-cluster.sh start
  7. Test the tokenization

After the dictionary is updated, ES applies the new words only to newly indexed data; existing data is not re-analyzed. To re-analyze existing data, run:

POST movies_index_chn/_update_by_query?conflicts=proceed

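6. About mapping

A Type can be thought of as a table in a relational database. So how is the data type of each field defined?

The data type of each field in a Type is defined by the mapping. If no mapping is set when the index is created, ES infers the type of each field from the format of the data it receives, as follows:

  • true/false → boolean

  • 1020 → long

  • 20.1 → float

  • "2018-02-01" → date

  • "hello world" → text + keyword

By default only text is analyzed; keyword is a string that is not analyzed. Besides being inferred automatically, a mapping can also be defined by hand, but only for new fields that hold no data yet; once a field has data, its mapping can no longer be changed.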
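
The inference list above can be mirrored by a small function, which is handy for predicting what an automatically created mapping will look like. This is a rough sketch and not the logic ES runs: only the yyyy-MM-dd date form is recognized here, nested objects and arrays are left out, and the released field in the sample document is hypothetical.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess the mapping type for a JSON value, following the list above."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Simplification: only the yyyy-MM-dd form is treated as a date here.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text + keyword"
    raise TypeError(f"unsupported value: {value!r}")


# "released" is a hypothetical field added to show the date case.
doc = {"id": 100, "name": "operation red sea", "doubanScore": 8.5, "released": "2018-02-01"}
print({field: infer_field_type(value) for field, value in doc.items()})
```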

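6.1 Building an index with Chinese text: automatic mapping

  1. Create the documents directly

The index does not exist yet, so it is created when the first document is indexed, and the mapping is inferred automatically.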

PUT /movie_chn_1/movie/1
{ "id":1,
  "name":"紅海行動",
  "doubanScore":8.5,
  "actorList":[  
  {"id":1,"name":"張譯"},
  {"id":2,"name":"海清"},
  {"id":3,"name":"張涵予"}
 ]
}
PUT /movie_chn_1/movie/2
{
  "id":2,
  "name":"湄公河行動",
  "doubanScore":8.0,
  "actorList":[  
{"id":3,"name":"張涵予"}
]
}

PUT /movie_chn_1/movie/3
{
  "id":3,
  "name":"紅海事件",
  "doubanScore":5.0,
  "actorList":[  
{"id":4,"name":"張三豐"}
]
}
  2. Run a test query

    GET /movie_chn_1/movie/_search
    {
      "query": {
        "match": {
          "name": "海行"
        }
      }
    }
    
    {
      "took" : 23,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 3,
        "max_score" : 0.5753642,
        "hits" : [
          {
            "_index" : "movie_chn_1",
            "_type" : "movie",
            "_id" : "1",
            "_score" : 0.5753642,
            "_source" : {
              "id" : 1,
              "name" : "紅海行動",
              "doubanScore" : 8.5,
              "actorList" : [
                {
                  "id" : 1,
                  "name" : "張譯"
                },
                {
                  "id" : 2,
                  "name" : "海清"
                },
                {
                  "id" : 3,
                  "name" : "張涵予"
                }
              ]
            }
          },
          {
            "_index" : "movie_chn_1",
            "_type" : "movie",
            "_id" : "2",
            "_score" : 0.2876821,
            "_source" : {
              "id" : 2,
              "name" : "湄公河行動",
              "doubanScore" : 8.0,
              "actorList" : [
                {
                  "id" : 3,
                  "name" : "張涵予"
                }
              ]
            }
          },
          {
            "_index" : "movie_chn_1",
            "_type" : "movie",
            "_id" : "3",
            "_score" : 0.2876821,
            "_source" : {
              "id" : 3,
              "name" : "紅海事件",
              "doubanScore" : 5.0,
              "actorList" : [
                {
                  "id" : 4,
                  "name" : "張三豐"
                }
              ]
            }
          }
        ]
      }
    }
    
    
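  3. Conclusion

    The query for "海行" hits all three documents because no analyzer was specified when the index was defined, so the default analyzer was used, which splits Chinese text into individual characters.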
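
The effect is easy to reproduce outside ES. The sketch below models the default analyzer on Chinese text as one token per character, which is a simplification of what the standard analyzer does, and lists which query characters each movie name shares.

```python
names = {1: "紅海行動", 2: "湄公河行動", 3: "紅海事件"}


def per_character(text):
    """Mimic the default analyzer on Chinese text: one token per character."""
    return list(text)


query_tokens = per_character("海行")
matches = {
    doc_id: [token for token in query_tokens if token in per_character(name)]
    for doc_id, name in names.items()
}
print(matches)
```

Every name shares at least one character with the query, so every document is a hit; document 1 shares both, which is consistent with it having the highest score in the response.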

6.2 Building an index with Chinese text: manual mapping

  1. Define the index and specify the mapping

    PUT movie_chn_2
    {
      "mappings": {
        "movie":{
          "properties": {
            "id":{
              "type": "long"
            },
            "name":{
              "type": "text", 
              "analyzer": "ik_smart"
            },
            "doubanScore":{
              "type": "double"
            },
            "actorList":{
              "properties": {
                "id":{
                  "type":"long"
                },
                "name":{
                  "type":"keyword"
                }
              }
            }
          }
        }
      }
    }
    
  2. Put documents into the index

PUT /movie_chn_2/movie/1
{ "id":1,
  "name":"紅海行動",
  "doubanScore":8.5,
  "actorList":[  
  {"id":1,"name":"張譯"},
  {"id":2,"name":"海清"},
  {"id":3,"name":"張涵予"}
 ]
}

PUT /movie_chn_2/movie/2
{
  "id":2,
  "name":"湄公河行動",
  "doubanScore":8.0,
  "actorList":[  
{"id":3,"name":"張涵予"}
]
}

PUT /movie_chn_2/movie/3
{
  "id":3,
  "name":"紅海事件",
  "doubanScore":5.0,
  "actorList":[  
{"id":4,"name":"張三豐"}
]
}
  3. View the manually defined mapping
GET movie_chn_2/_mapping
{
  "movie_chn_2" : {
    "mappings" : {
      "movie" : {
        "properties" : {
          "actorList" : {
            "properties" : {
              "id" : {
                "type" : "long"
              },
              "name" : {
                "type" : "keyword"
              }
            }
          },
          "doubanScore" : {
            "type" : "double"
          },
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text",
            "analyzer" : "ik_smart"
          }
        }
      }
    }
  }
}
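  4. Conclusion

Running the same match query for "海行" against movie_chn_2 returns no hits, because the index was created with the IK analyzer, which splits the names into words rather than single characters.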

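6.3 Copying index data

Powerful as Elasticsearch is, the mapping of an existing field cannot be changed, so changing the structure sometimes means creating a new index.

For this, Elasticsearch provides the reindex command, which copies a snapshot of the data in one index into another index. By default a document with the same _id in the destination is overwritten (this rarely happens unless two indices are copied into one). Use POST _reindex to copy the data: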

POST _reindex
    {
      "source": {
        "index": "my_index_name"
      },
      "dest": {
        "index": "my_index_name_new"
      }
    }
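
The overwrite behaviour mentioned above can be controlled in the request body. Below is a minimal sketch in plain Python; the helper name build_reindex_body is made up for illustration and is not part of any client library. It only builds the JSON body for POST _reindex. With overwrite=False it sets op_type to create in dest, so that only documents missing from the destination are created, and conflicts to proceed, so that the copy continues past version conflicts instead of aborting.

```python
import json


def build_reindex_body(source_index, dest_index, overwrite=True):
    """Build the JSON body for POST _reindex (hypothetical helper)."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if not overwrite:
        # Only create documents that do not exist in the destination yet.
        body["dest"]["op_type"] = "create"
        # Keep going when a document already exists instead of aborting.
        body["conflicts"] = "proceed"
    return body


if __name__ == "__main__":
    # Print the body so it can be pasted after "POST _reindex" in the console.
    print(json.dumps(
        build_reindex_body("my_index_name", "my_index_name_new", overwrite=False),
        indent=2,
    ))
```

The printed JSON can be sent as the body of POST _reindex; how it is sent (Kibana console, curl, a client library) is left open here.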

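7. Index aliases (_aliases)

An index alias works like a shortcut or a symbolic link: it can point to one or more indexes, and it can be passed to any API that expects an index name.

7.1 Creating an index alias

  1. Declare the alias when creating the index
PUT <index_name>
{
 "aliases": {
      "<alias_name>": {}
  }
}
# Create the index with a manually defined mapping and an alias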

PUT movie_chn_3
{
  "aliases": {
      "movie_chn_3_aliase": {}
  },
  "mappings": {
    "movie":{
      "properties": {
        "id":{
          "type": "long"
        },
        "name":{
          "type": "text", 
          "analyzer": "ik_smart"
        },
        "doubanScore":{
          "type": "double"
        },
        "actorList":{
          "properties": {
            "id":{
              "type":"long"
            },
            "name":{
              "type":"keyword"
            }
          }
        }
      }
    }
  }
}
  2. Add an alias to an existing index
POST  _aliases
{
    "actions": [
        { "add":{ "index": "<index_name>", "alias": "<alias_name>" }}
    ]
}
# Add an alias to movie_chn_3

POST  _aliases
{
    "actions": [
        { "add":{ "index": "movie_chn_3", "alias": "movie_chn_3_a2" }}
    ]
}

7.2 Listing aliases

GET  _cat/aliases?v
alias              index       filter routing.index routing.search
movie_chn_3_a2     movie_chn_3 -      -             -
movie_chn_3_aliase movie_chn_3 -      -             -
.kibana            .kibana_1   -      -             -

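7.3 Querying through an index alias

This works exactly like querying a regular index.

GET <alias_name>/_search

7.4 Removing an alias from an index

POST  _aliases
{
    "actions": [
        { "remove":    { "index": "<index_name>", "alias": "<alias_name>" }}
    ]
}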
POST  _aliases
{
    "actions": [
        { "remove":    { "index": "movie_chn_3", "alias": "movie_chn_3_aliase" }}
    ]
}

7.5 Use cases

  1. Group several indexes under one name (for example, last_three_months)
POST  _aliases
{
    "actions": [
        { "add":    { "index": "movie_chn_1", "alias": "movie_chn_query" }},
        { "add":    { "index": "movie_chn_2", "alias": "movie_chn_query" }}
    ]
}
GET movie_chn_query/_search
  2. Create a view over a subset of an index

This is like attaching filter conditions to the index, which narrows what a query sees.

POST  _aliases
{
    "actions": [
        { 
          "add":    
          { 
            "index": "movie_chn_1", 
            "alias": "movie_chn_1_sub_query",
            "filter": {
                "term": {  "actorList.id": "4"}
            }
          }
        }
    ]
}
GET movie_chn_1_sub_query/_search
  3. Switch seamlessly from one index to another on a running cluster
POST /_aliases
{
    "actions": [
        { "remove": { "index": "movie_chn_1", "alias": "movie_chn_query" }},
        { "remove": { "index": "movie_chn_2", "alias": "movie_chn_query" }},
        { "add":    { "index": "movie_chn_3", "alias": "movie_chn_query" }}
    ]
}
The whole operation is atomic, so there is no need to worry about data being lost or duplicated.
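
The swap request follows a simple pattern: one remove action per old index, then one add action for the new index, all in a single _aliases call. A rough sketch of a helper that builds that body is shown below; build_alias_swap is a hypothetical name used only for this illustration, and sending the request is left out.

```python
def build_alias_swap(alias, old_indices, new_index):
    """Build the body for POST /_aliases that moves `alias` to `new_index`.

    Hypothetical helper: all actions go into one request so that
    Elasticsearch applies them together.
    """
    # One "remove" action for every index the alias currently points to.
    actions = [{"remove": {"index": idx, "alias": alias}} for idx in old_indices]
    # A single "add" action pointing the alias at the new index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    print(build_alias_swap("movie_chn_query", ["movie_chn_1", "movie_chn_2"], "movie_chn_3"))
```

Called with the values from the example above, it produces the same three actions in the same order.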

8 Index templates

8.1 Creating an index template

PUT _template/template_movie2020
{
  "index_patterns": ["movie_test*"],                  
  "settings": {                                               
    "number_of_shards": 1
  },
  "aliases" : { 
    "{index}-query": {},
    "movie_test-query":{}
  },
  "mappings": {                                          
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "movie_name": {
          "type": "text",
          "analyzer": "ik_smart"
        }
      }
    }
  }
}

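In "index_patterns": ["movie_test*"], the pattern means that whenever data is written to an index whose name starts with movie_test and that index does not exist yet, ES automatically creates it from this template.

In "aliases", {index} stands for the name of the index that is actually created. Two aliases are created: one derived from the current index name, and one fixed, global alias.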
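
To make the two rules concrete, here is a small Python illustration. It is only an approximation of the semantics described above, not how Elasticsearch implements them: fnmatchcase understands more wildcards than the simple trailing * used here, and both function names are invented for this sketch.

```python
from fnmatch import fnmatchcase


def template_applies(index_name, index_patterns):
    """Return True if any pattern (e.g. "movie_test*") matches the index name."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


def expand_aliases(index_name, alias_templates):
    """Replace the {index} placeholder with the name of the created index."""
    return [alias.replace("{index}", index_name) for alias in alias_templates]


if __name__ == "__main__":
    name = "movie_test_202011"
    if template_applies(name, ["movie_test*"]):
        # Aliases the new index would receive under the template above.
        print(expand_aliases(name, ["{index}-query", "movie_test-query"]))
```

For movie_test_202011 this gives one alias specific to that index and the fixed movie_test-query alias.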

8.2 Testing

  1. Add data to an index
POST movie_test_202011/_doc
{
  "id":"333",
  "name":"zhang3"
}
  2. Check the index's mapping, which was created from our index template
GET movie_test_202011-query/_mapping
  3. Query the data through the alias defined in the template
GET movie_test-query/_search

8.3 Listing the templates that already exist

GET  _cat/templates

8.4 Viewing the details of a template

GET  _template/template_movie2020
or
GET  _template/template_movie*

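8.5 Use cases

  1. Splitting indexes

Splitting an index means dividing one business index into several indexes by time interval.

For example, order_info becomes order_info_20200101, order_info_20200102, …

This has two benefits:

  • Flexibility when the structure changes

    ES does not allow the mapping of existing fields to be modified, yet in practice an index's structure and settings inevitably change. With split indexes you only change the index for the next interval and leave the earlier indexes as they are, which gives you some flexibility.

    To get this effect, simply recreate the template on the day the index needs to change.

  • Narrower query scope

    Queries usually do not cover the whole time range, so splitting the index physically reduces the amount of data scanned, which is also a performance gain.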
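
As a sketch of the second benefit, the snippet below lists only the daily indexes a query actually needs. It assumes the order_info_YYYYMMDD naming from the example above; daily_indices is a hypothetical helper, and joining the names with commas relies on Elasticsearch accepting a comma-separated list of index names in the search path.

```python
from datetime import date, timedelta


def daily_indices(prefix, start, end):
    """Return names like order_info_20200101 for each day from start to end (inclusive)."""
    day_count = (end - start).days + 1
    return [
        f"{prefix}_{(start + timedelta(days=offset)).strftime('%Y%m%d')}"
        for offset in range(day_count)
    ]


if __name__ == "__main__":
    names = daily_indices("order_info", date(2020, 1, 1), date(2020, 1, 3))
    # Search path that touches three daily indexes instead of the full history.
    print("GET " + ",".join(names) + "/_search")
```

An end date earlier than the start date simply yields an empty list.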

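8.6 Note

With an index template, the index is normally created when the first document is written to it. If ES holds a very large number of shards, creating the index can be slow. If that delay is unacceptable, you can skip the template and use a scheduled script to create the next day's index one day in advance.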
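
The core of such a scheduled script could look roughly like this. It assumes the date-suffixed naming from section 8.5 (order_info_YYYYMMDD); next_day_index_request is a hypothetical helper that only works out which request to send. The request body (mappings, settings, aliases) and the HTTP call itself are deliberately left out.

```python
from datetime import date, timedelta


def next_day_index_request(prefix, today):
    """Return (method, path) for creating tomorrow's index ahead of time."""
    tomorrow = today + timedelta(days=1)
    index_name = f"{prefix}_{tomorrow.strftime('%Y%m%d')}"
    # Creating an index is a PUT on the index name.
    return "PUT", f"/{index_name}"


if __name__ == "__main__":
    method, path = next_day_index_request("order_info", date.today())
    print(method, path)
```

Run once a day (for example from cron), this means the index already exists before the first write of the following day arrives.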
