Preface
DSL is short for Domain Specific Language, i.e. a language dedicated to a particular problem domain
1. Cluster-wide operations
1.1 Check cluster health
GET /_cat/health?v    (?v displays the column headers)
Cluster health is one of three states: red, yellow, or green:
Green – everything is fine (the cluster is fully functional)
Yellow – all data is available, but some replicas have not been allocated yet (the cluster is still fully functional)
Red – some data is unavailable (the cluster is only partially functional)
1.2 Check the status of each node
GET /_cat/nodes?v
2. Index operations
2.1 Check the status of each index
GET /_cat/indices?v
ES contains some indexes by default
| health | green (cluster complete), yellow (single node OK, cluster incomplete), red (node not OK) |
|---|---|
| status | whether the index is usable |
| index | index name |
| uuid | unique index identifier |
| pri | number of primary shards |
| rep | number of replicas |
| docs.count | number of documents |
| docs.deleted | number of deleted documents |
| store.size | total storage size |
| pri.store.size | storage size of the primary shards |
2.2 Create an index
API: PUT index_name?pretty
PUT movie_index?pretty
Use PUT to create an index named "movie_index". Appending pretty pretty-prints the JSON response (if any).
Index naming rules:
Lowercase letters only
Cannot contain \, /, *, ?, ", <, >, |, space, comma, #
Colons (:) were allowed before version 7.0, but are discouraged and no longer supported from 7.0 on
Cannot start with -, _, +
Cannot be . or ..
Cannot be longer than 255 bytes
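The naming rules above can be captured in a rough validator. This is a hypothetical helper for illustration (`is_valid_index_name` is not part of any ES client), covering only the rules listed here:

```python
# Hypothetical sketch of the index-name rules above; not an official ES API.
FORBIDDEN = set('\\/*?"<>| ,#:')   # forbidden characters (colon banned since 7.0)

def is_valid_index_name(name: str) -> bool:
    if not name or name in ('.', '..'):          # cannot be . or ..
        return False
    if name != name.lower():                      # lowercase only
        return False
    if name[0] in '-_+':                          # cannot start with -, _, +
        return False
    if any(ch in FORBIDDEN for ch in name):       # no forbidden characters
        return False
    if len(name.encode('utf-8')) > 255:           # at most 255 bytes
        return False
    return True
```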
2.3 Check the shard allocation of an index
API: GET /_cat/shards/index_name
GET /_cat/shards/movie_index
The default is 5 primary shards with 1 replica each, so you see 10 shards in total: 5 primaries, each paired with one replica. Note: a primary shard and its replica are never allocated on the same node.
2.4 Delete an index
API: DELETE /index_name
DELETE /movie_index
3. Document operations
3.1 Create documents
Insert documents with IDs 1, 2, and 3 into the movie_index index
API: PUT /index_name/type_name/document_id
Note: the document ID and the "id" attribute inside the document are two different things
PUT /movie_index/movie/1
{ "id":100,
"name":"operation red sea",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"zhang yi"},
{"id":2,"name":"hai qing"},
{"id":3,"name":"zhang han yu"}
]
}
PUT /movie_index/movie/2
{
"id":200,
"name":"operation meigong river",
"doubanScore":8.0,
"actorList":[
{"id":3,"name":"zhang han yu"}
]
}
PUT /movie_index/movie/3
{
"id":300,
"name":"incident red sea",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"zhang san feng"}
]
}
Note that Elasticsearch does not require an index to exist before documents can be indexed into it: if the specified index does not exist when a document is created, it is created automatically. The auto-created index has 5 shards and 1 replica, and each document we create is stored once on one of the primary shards and once on its replica, which is why the response shows _shards.total: 2.
3.2 Get a document by ID
API: GET /index_name/type_name/document_id
GET /movie_index/movie/1?pretty
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_version" : 2,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
The found field is true, meaning a document with ID 1 was found; the other field, _source, returns the complete JSON document.
3.3 Query all documents
API: GET /index_name/_search
Kibana shows 10 hits by default; this can be controlled with size
GET /movie_index/_search
{
"size":10
}
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
}
]
}
}
took: query execution time in milliseconds
_shards => total: how many shards were searched (here, all 5 shards)
3.4 Delete a document by ID
API: DELETE /index_name/type_name/document_id
DELETE /movie_index/movie/3
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
Note: what is the difference between deleting an index and deleting a document?
Deleting an index frees disk space immediately; there is no "mark as deleted" logic involved.
Deleting (or updating) a document writes the new document and marks the old one as deleted. Whether disk space is actually freed depends on whether the old and new documents live in the same segment file, so the physical removal of old documents may be triggered by ES's background segment merges.
You can also trigger a merge manually with POST /_forcemerge.
3.5 Replace a document
- PUT (idempotent)
When adding a document with PUT index_name/type_name/document_id, if the document ID already exists, running the command again makes Elasticsearch replace the existing document.
PUT /movie_index/movie/3
{
"id":300,
"name":"incident red sea",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"zhang cuishan"}
]
}
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1
}
Document ID 3 already exists, so its content is replaced.
- POST (non-idempotent)
When creating a document, the ID part is optional. If omitted, Elasticsearch generates a random ID and uses it to reference the document.
POST /movie_index/movie/
{
  "id":300,
  "name":"incident red sea",
  "doubanScore":5.0,
  "actorList":[
    {"id":4,"name":"zhang cuishan"}
  ]
}
{
  "_index" : "movie_index",
  "_type" : "movie",
  "_id" : "jyVMMHUBFYRAUn5_l-Ap",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 1
}
3.6 Update a document field by ID
Besides creating and replacing documents, ES can also update a single field of a document.
Note that Elasticsearch does not actually perform an in-place update under the hood: it deletes the old document and indexes a new one.
API:
POST /index_name/type_name/document_id/_update?pretty
{
"doc": { "field_name": "new field value" }
}
The "doc" key is fixed syntax.
Requirement: change the name field of document ID 3 to "wudang":
POST /movie_index/movie/3/_update?pretty
{
"doc": {"name":"wudang"}
}
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 1
}
3.7 Update documents by query (for awareness)
POST /movie_index/_update_by_query
{
"query": {
"match":{
"actorList.id":1
}
},
"script": {
"lang": "painless",
"source":"for(int i=0;i<ctx._source.actorList.length;i++){if(ctx._source.actorList[i].id==3){ctx._source.actorList[i].name='tttt'}}"
}
}
{
"took" : 118,
"timed_out" : false,
"total" : 1,
"updated" : 1,
"deleted" : 0,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
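The _update_by_query above selects documents where actorList.id matches 1, and the painless script then renames the actor whose id is 3. As a sketch, the same transformation that the script applies to each document's _source looks like this in plain Python (the sample `doc` below is an assumption for illustration):

```python
def rename_actor(source: dict) -> dict:
    # Equivalent of the painless loop: walk actorList and rename the actor with id 3.
    for actor in source["actorList"]:
        if actor["id"] == 3:
            actor["name"] = "tttt"
    return source

# Hypothetical _source, shaped like the movie documents in this tutorial.
doc = {"id": 100,
       "actorList": [{"id": 1, "name": "zhang yi"},
                     {"id": 3, "name": "zhang han yu"}]}
rename_actor(doc)
```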
3.8 Remove a document attribute (for awareness)
POST /movie_index/movie/1/_update
{
"script" : "ctx._source.remove('name')"
}
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_version" : 4,
"_seq_no" : 3,
"_primary_term" : 1,
"found" : true,
"_source" : {
"doubanScore" : 8.5,
"actorList" : [
{
"name" : "zhang yi",
"id" : 1
},
{
"name" : "hai qing",
"id" : 2
},
{
"name" : "tttt",
"id" : 3
}
],
"id" : 100
}
}
3.9 Delete documents by query (for awareness)
POST /movie_index/_delete_by_query
{
"query": {
"match_all": {}
}
}
{
"took" : 25,
"timed_out" : false,
"total" : 4,
"deleted" : 4,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
3.10 Bulk operations
Besides creating, updating, and deleting single documents, Elasticsearch also provides the _bulk API to batch these operations.
API: POST /index_name/type_name/_bulk?pretty    (_bulk indicates a batch operation)
Note: in Kibana, each action line and document line of a bulk request must be written as a single-line JSON object
Requirement 1: create two documents in the index in one batch
POST /movie_index/movie/_bulk
{"index":{"_id":66}}
{"id":300,"name":"incident red sea","doubanScore":5.0,"actorList":[{"id":4,"name":"zhang cuishan"}]}
{"index":{"_id":88}}
{"id":300,"name":"incident red sea","doubanScore":5.0,"actorList":[{"id":4,"name":"zhang cuishan"}]}
{
"took" : 5,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "66",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "88",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
]
}
Requirement 2: in a single bulk request, first update the first document (ID 66), then delete the second document (ID 88)
POST /movie_index/movie/_bulk
{"update":{"_id":"66"}}
{"doc": { "name": "wudangshanshang" } }
{"delete":{"_id":"88"}}
{
"took" : 8,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "66",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1,
"status" : 200
}
},
{
"delete" : {
"_index" : "movie_index",
"_type" : "movie",
"_id" : "88",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 200
}
}
]
}
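The bulk body above is NDJSON: one JSON object per line, where a delete action has no document line and the body ends with a newline. A small sketch of building such a payload (the `build_bulk_body` helper is hypothetical, not part of any ES client):

```python
import json

def build_bulk_body(actions):
    """Serialize (action_dict, doc_or_None) pairs into the NDJSON body
    expected by the _bulk endpoint: one JSON object per line."""
    lines = []
    for action, doc in actions:
        lines.append(json.dumps(action))
        if doc is not None:            # delete actions carry no document line
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"     # bulk bodies must end with a newline

# Rebuild the payload of Requirement 2 above.
body = build_bulk_body([
    ({"update": {"_id": "66"}}, {"doc": {"name": "wudangshanshang"}}),
    ({"delete": {"_id": "88"}}, None),
])
```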
4. Search operations
4.1 Two ways to pass search parameters
- Pass search parameters in the URI
GET /index_name/_search?q=*&pretty
Example: GET /movie_index/_search?q=_id:66
This style does not suit complex queries well; awareness is enough
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- Pass search parameters in the request body
GET /movie_index/_search
{
"query": {
"match_all": {}
}
}
4.2 Query by condition (match all)
GET movie_index/movie/_search
{
"query":{
"match_all": {}
}
}
4.3 Match query on analyzed fields (requires the analyzed text type)
Before testing: restore the data in movie_index to the initial 3 documents
GET movie_index/movie/_search
{
"query":{
"match": {"name":"operation red sea"}
}
}
In ES, the name field is analyzed and stored internally as an inverted index; the query text is analyzed as well, and its terms are matched against the indexed terms of name. That is why all 3 documents are hits, though with different relevance scores.
Note: ES stores string data using two types, text and keyword:
text: analyzed
keyword: not analyzed
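The analyzed matching described above can be sketched as a tiny inverted index. This is a simplified illustration (whitespace splitting stands in for the real analyzer, and the OR semantics mirror a basic `match` query, ignoring scoring):

```python
from collections import defaultdict

# The three sample movie titles from this tutorial, keyed by document id.
docs = {1: "operation red sea", 2: "operation meigong river", 3: "incident red sea"}

# Build the inverted index: term -> set of doc ids containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():          # stand-in for the real analyzer
        inverted[term].add(doc_id)

def match(query):
    """OR-match like a basic ES `match` query: hit any document containing any term."""
    hits = set()
    for term in query.split():
        hits |= inverted.get(term, set())
    return hits
```

Querying "operation red sea" hits all three documents because each document contains at least one of the three terms, which is exactly why the search above returns 3 hits with different scores.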
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 0.5753642,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
]
}
}
4.4 Match query on an analyzed sub-field
GET movie_index/movie/_search
{
"query":{
"match": {"actorList.name":"zhang han yu"}
}
}
Returns 3 results
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.970927,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.970927,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.8630463,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
}
]
}
}
4.5 Phrase query (similar to LIKE %phrase%)
A match_phrase query still analyzes the search text, but all of its terms must appear in the field, adjacent and in order, rather than being matched independently.
Find the movies whose actor names contain the phrase zhang han yu:
GET movie_index/movie/_search
{
"query":{
"match_phrase": {"actorList.name":"zhang han yu"}
}
}
Returns 2 results: the movies whose actor names contain the phrase zhang han yu
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.8630463,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
]
}
}
4.6 Exact match with term (requires the keyword type)
GET movie_index/movie/_search
{
"query":{
"term":{
"actorList.name.keyword":"zhang han yu"
}
}
}
Returns 2 results: the movies with an actor name exactly matching zhang han yu
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"id" : 200,
"name" : "operation meigong river",
"doubanScore" : 8.0,
"actorList" : [
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
}
]
}
}
4.7 Fuzzy query (error-tolerant matching)
Fuzzy matching tolerates near-miss terms: when no term matches exactly, ES uses an edit-distance-based algorithm to give very close terms some score as well, so they can still be found. This costs extra performance, and for Chinese the results are not particularly good.
GET movie_index/movie/_search
{
"query":{
"fuzzy": {"name":"rad"}
}
}
Returns 2 results: both incident red sea and operation red sea are matched
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.19178805,
"hits" : [
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "1",
"_score" : 0.19178805,
"_source" : {
"id" : 100,
"name" : "operation red sea",
"doubanScore" : 8.5,
"actorList" : [
{
"id" : 1,
"name" : "zhang yi"
},
{
"id" : 2,
"name" : "hai qing"
},
{
"id" : 3,
"name" : "zhang han yu"
}
]
}
},
{
"_index" : "movie_index",
"_type" : "movie",
"_id" : "3",
"_score" : 0.19178805,
"_source" : {
"id" : 300,
"name" : "incident red sea",
"doubanScore" : 5.0,
"actorList" : [
{
"id" : 4,
"name" : "zhang san feng"
}
]
}
}
]
}
}
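Fuzzy matching is based on edit distance: "rad" is one character substitution away from "red", so both "red" titles match. A sketch of the underlying Levenshtein distance computation (an illustrative implementation, not the one ES actually uses internally):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert/delete/substitute)
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```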
4.8 Filtering – match first, then filter
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red"}
},
"post_filter":{
"term": {
"actorList.id": 3
}
}
}
4.9 Filtering – match and filter in one query (recommended)
GET movie_index/movie/_search
{
"query": {
"bool": {
"must": [
{"match": {
"name": "red"
}}
],
"filter": [
{"term": { "actorList.id": "1"}},
{"term": {"actorList.id": "3"}}
]
}
}
}
4.10 Filtering – filter by range
GET movie_index/movie/_search
{
"query": {
"range": {
"doubanScore": {
"gte": 6,
"lte": 8.5
}
}
}
}
Range operators:
| gt | greater than |
|---|---|
| lt | less than |
| gte | greater than or equal |
| lte | less than or equal |
4.11 Sorting
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
},
"sort":
{
"doubanScore": {
"order": "desc"
}
}
}
4.12 Pagination
The from parameter (0-based) specifies the offset of the first document to return
The size parameter specifies how many documents to return
These two parameters are very useful for paginating search results.
Note: from defaults to 0 when not specified.
GET movie_index/movie/_search
{
"query": { "match_all": {} },
"from": 1,
"size": 1
}
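To fetch page n (1-based) with page_size results per page, from is (n - 1) * page_size. A small sketch (the `page_params` helper is hypothetical, for illustration):

```python
def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into ES from/size search parameters."""
    return {"from": (page - 1) * page_size, "size": page_size}
```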
4.13 Selecting the returned fields
GET movie_index/movie/_search
{
"query": { "match_all": {} },
"_source": ["name", "doubanScore"]
}
Only the name and doubanScore fields are returned
4.14 Highlighting
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
},
"highlight": {
"fields": {"name":{} }
}
}
The matched terms are returned highlighted
4.15 Aggregations
Aggregations provide the ability to group and compute statistics over the data, similar to GROUP BY and aggregate functions in SQL. Elasticsearch can return search hits and their aggregation results in a single response, which is very powerful and efficient.
Requirement 1: count how many movies each actor has appeared in
GET movie_index/movie/_search
{
"aggs": {
"myAGG": {
"terms": {
"field": "actorList.name.keyword"
}
}
}
}
aggs: declares an aggregation
myAGG: the name given to this aggregation
terms: a bucket aggregation, equivalent to GROUP BY
field: the field to group by
Requirement 2: compute each actor's average movie score, sorted by score
GET movie_index/movie/_search
{
"aggs": {
"groupby_actor_id": {
"terms": {
"field": "actorList.name.keyword" ,
"order": {
"avg_score": "desc"
}
},
"aggs": {
"avg_score":{
"avg": {
"field": "doubanScore"
}
}
}
}
}
}
.keyword is a sub-field of a string field that stores an unanalyzed copy of the value. Some operations only work on the unanalyzed form,
such as filters (filter) and aggregations (aggs), so in those places the field name must carry the .keyword suffix.
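Conceptually, the terms + avg aggregation of Requirement 2 groups documents by actor name and averages doubanScore within each bucket, like SQL GROUP BY with AVG. A sketch over the tutorial's sample data (the list layout below is an assumption for illustration):

```python
from collections import defaultdict

# The three sample movies, flattened to score + actor names.
movies = [
    {"doubanScore": 8.5, "actors": ["zhang yi", "hai qing", "zhang han yu"]},
    {"doubanScore": 8.0, "actors": ["zhang han yu"]},
    {"doubanScore": 5.0, "actors": ["zhang san feng"]},
]

# terms aggregation: bucket scores by actor name.
buckets = defaultdict(list)
for movie in movies:
    for actor in movie["actors"]:
        buckets[actor].append(movie["doubanScore"])

# avg sub-aggregation: average score per bucket, then order buckets by it descending.
avg_score = {actor: sum(scores) / len(scores) for actor, scores in buckets.items()}
ranked = sorted(avg_score, key=avg_score.get, reverse=True)
```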
5. Analysis (tokenization)
5.1 Default analysis of English text
GET _analyze
{
"text":"hello world"
}
The text is split into words on whitespace
{
"tokens" : [
{
"token" : "hello",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "world",
"start_offset" : 6,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
5.2 Default analysis of Chinese text
GET _analyze
{
"text":"小米手機"
}
By default, Chinese text is split into individual characters
{
"tokens" : [
{
"token" : "小",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "米",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "手",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "機",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
}
]
}
5.3 Chinese analyzers
As the queries above show, ES's built-in Chinese analysis simply splits Chinese text character by character, with no notion of words at all.
In practice, however, users search by words, so segmenting text into proper words both matches user queries more accurately and makes querying faster.
Below is a comparison of some common open-source Chinese analyzers; we use the IK analyzer.
| Analyzer | Strengths | Weaknesses |
|---|---|---|
| Smart Chinese Analysis | official plugin | very poor Chinese segmentation quality |
| IKAnalyzer | simple to use, supports custom and remote dictionaries | dictionaries must be maintained manually; no part-of-speech tagging |
| jieba (結(jié)巴分詞) | new-word detection | no part-of-speech tagging |
| Ansj | good segmentation accuracy, part-of-speech tagging | smaller dictionary than HanLP; steeper learning curve |
| HanLP | the most complete dictionary, very feature-rich | segmentation quality needs tuning; steep learning curve |
5.4 Installing and using the IK analyzer
- Download
https://github.com/medcl/elasticsearch-analysis-ik
Upload the archive to /opt/software
- Unzip the archive
unzip elasticsearch-analysis-ik-6.6.0.zip -d /opt/module/elasticsearch/plugins/ik
Notes:
Use unzip to extract the archive
-d specifies the target directory
It must be placed under the ES plugins directory, in its own subdirectory
Look at the files under /opt/module/elasticsearch/plugins/ik/conf: the dictionaries are simply word lists stored in files
- Distribute to the other nodes
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik root@node04:/opt/module/elasticsearch/plugins/ik
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik root@node05:/opt/module/elasticsearch/plugins/ik
- Restart ES
es-cluster.sh stop
es-cluster.sh start
- Test
ik_smart
GET movie_index/_analyze
{
"analyzer": "ik_smart",
"text": "我是中國人"
}
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中國人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
ik_max_word
GET movie_index/_analyze
{
"analyzer": "ik_max_word",
"text": "我是中國人"
}
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中國人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中國",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "國人",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 4
}
]
}
5.5 Custom dictionary – local file
Sometimes the stock dictionary does not cover domain-specific terms or newly coined internet slang used in a project, so the dictionary needs to be extended.
Steps:
- Point the configuration at a local custom dictionary
Edit IKAnalyzer.cfg.xml under /opt/module/elasticsearch/plugins/ik/config/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- custom extension dictionary -->
<entry key="ext_dict">./myword.txt</entry>
<!-- custom extension stopword dictionary -->
<entry key="ext_stopwords"></entry>
<!-- remote extension dictionary -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- remote extension stopword dictionary -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
- Create myword.txt in the same /opt/module/elasticsearch/plugins/ik/config/ directory
[root@node03 config]# vim myword.txt
藍瘦
藍瘦香菇
- Distribute the configuration file and myword.txt
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml root@node04:/opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml root@node05:/opt/module/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/myword.txt root@node04:/opt/module/elasticsearch/plugins/ik/config/myword.txt
[root@node03 elasticsearch]# scp -r /opt/module/elasticsearch/plugins/ik/config/myword.txt root@node05:/opt/module/elasticsearch/plugins/ik/config/myword.txt
- Restart the ES service
es-cluster.sh stop
es-cluster.sh start
- Test the segmentation
GET movie_index/_analyze
{
"analyzer": "ik_smart",
"text": "藍瘦香菇"
}
5.6 Custom dictionary – remote
A remote dictionary is served over HTTP and fetched by the IK plugin; here we simulate the setup simply with nginx.
- Edit IKAnalyzer.cfg.xml under /opt/module/elasticsearch/plugins/ik/config/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- custom extension dictionary -->
<!--<entry key="ext_dict"> </entry>-->
<!-- custom extension stopword dictionary -->
<!--<entry key="ext_stopwords"></entry>-->
<!-- remote extension dictionary -->
<entry key="remote_ext_dict">http://node03/fenci/myword.txt</entry>
<!-- remote extension stopword dictionary -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
Note: comment out the local dictionary entries
Distribute the configuration file to the other nodes
Configure the static resource path in nginx.conf
pwd
/opt/module/nginx/conf
[atguigu@node03 conf]$ vim nginx.conf
location /fenci{
root es;
}
- Create an es/fenci directory under /opt/module/nginx/, and create myword.txt inside es/fenci
pwd
/opt/module/nginx/es/fenci
vim myword.txt
藍瘦
藍瘦香菇
- Start nginx
/opt/module/nginx/sbin/nginx
- Restart the ES service and verify that nginx is reachable
es-cluster.sh stop
es-cluster.sh start
- Test the segmentation
After a dictionary update, ES only applies the new words to newly indexed data; historical data is not re-analyzed. To re-analyze existing documents, run:
POST movies_index_chn/_update_by_query?conflicts=proceed
6 About mapping
A Type can be thought of as a table in a relational database; so how is the data type of each field defined?
The data type of every field in a Type is defined by the mapping. If we do not set a mapping when creating an Index, the system infers each field's type from the format of the incoming data, as follows:
true/false → boolean
1020 → long
20.1 → float
"2018-02-01" → date
"hello world" → text + keyword
By default only text fields are analyzed; keyword is the unanalyzed string type. Besides this automatic definition, a mapping can also be defined manually, but only for newly added fields that hold no data yet; once a field contains data, its mapping can no longer be changed.
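The inference rules listed above can be sketched in a few lines. This is a simplified illustration only (real ES dynamic mapping also distinguishes integer widths, applies configurable date detection, and more); `infer_type` is a hypothetical helper:

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # simplified date detection

def infer_type(value):
    """Rough approximation of ES dynamic-mapping type inference."""
    if isinstance(value, bool):        # must check bool before int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_RE.match(value) else "text + keyword"
    return "object"
```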
6.1 Index with Chinese analysis – automatic mapping
- Create the documents directly
The index does not exist yet, so it is created automatically along with the documents, and the mapping is defined automatically
PUT /movie_chn_1/movie/1
{ "id":1,
"name":"紅海行動",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"張譯"},
{"id":2,"name":"海清"},
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_1/movie/2
{
"id":2,
"name":"湄公河行動",
"doubanScore":8.0,
"actorList":[
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_1/movie/3
{
"id":3,
"name":"紅海事件",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"張三豐"}
]
}
- Test query
GET /movie_chn_1/movie/_search
{
  "query": {
    "match": {
      "name": "海行"
    }
  }
}
{
  "took" : 23,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "movie_chn_1", "_type" : "movie", "_id" : "1", "_score" : 0.5753642,
        "_source" : { "id" : 1, "name" : "紅海行動", "doubanScore" : 8.5,
          "actorList" : [ {"id" : 1, "name" : "張譯"}, {"id" : 2, "name" : "海清"}, {"id" : 3, "name" : "張涵予"} ] }
      },
      {
        "_index" : "movie_chn_1", "_type" : "movie", "_id" : "2", "_score" : 0.2876821,
        "_source" : { "id" : 2, "name" : "湄公河行動", "doubanScore" : 8.0,
          "actorList" : [ {"id" : 3, "name" : "張涵予"} ] }
      },
      {
        "_index" : "movie_chn_1", "_type" : "movie", "_id" : "3", "_score" : 0.2876821,
        "_source" : { "id" : 3, "name" : "紅海事件", "doubanScore" : 5.0,
          "actorList" : [ {"id" : 4, "name" : "張三豐"} ] }
      }
    ]
  }
}
- Conclusion
The query for "海行" matched all three documents because no analyzer was specified when the Index was defined, so the default analyzer was used, which splits Chinese text character by character.
6.2 Index with Chinese analysis – manual mapping
- Define the Index with an explicit mapping
PUT movie_chn_2
{
  "mappings": {
    "movie": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "text", "analyzer": "ik_smart" },
        "doubanScore": { "type": "double" },
        "actorList": {
          "properties": {
            "id": { "type": "long" },
            "name": { "type": "keyword" }
          }
        }
      }
    }
  }
}
Put documents into the Index
PUT /movie_chn_2/movie/1
{ "id":1,
"name":"紅海行動",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"張譯"},
{"id":2,"name":"海清"},
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_2/movie/2
{
"id":2,
"name":"湄公河行動",
"doubanScore":8.0,
"actorList":[
{"id":3,"name":"張涵予"}
]
}
PUT /movie_chn_2/movie/3
{
"id":3,
"name":"紅海事件",
"doubanScore":5.0,
"actorList":[
{"id":4,"name":"張三豐"}
]
}
- View the manually defined mapping
GET movie_chn_2/_mapping
{
"movie_chn_2" : {
"mappings" : {
"movie" : {
"properties" : {
"actorList" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "keyword"
}
}
},
"doubanScore" : {
"type" : "double"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"analyzer" : "ik_smart"
}
}
}
}
}
}
- Conclusion
The same query ("海行") finds no hits in this index, because we specified the ik analyzer when creating the Index, so the text is segmented into words rather than single characters
6.3 Copying index data
Powerful as it is, Elasticsearch cannot modify an existing mapping dynamically, so when the structure has to change we are forced to create a new index.
For this, Elasticsearch provides the reindex command, which copies a snapshot of one index's data into another. By default, documents with the same _id are overwritten (this normally does not happen, unless two indexes are copied into one). Use POST _reindex to copy the index snapshot:
POST _reindex
{
"source": {
"index": "my_index_name"
},
"dest": {
"index": "my_index_name_new"
}
}
7. Index aliases (_aliases)
An index alias is like a shortcut or a symbolic link: it can point to one or more indexes, and it can be used in any API that expects an index name.
7.1 Create an index alias
- Declare it when creating the Index
PUT index_name
{
"aliases": {
"index_alias": {}
}
}
#create the index with a manual mapping and declare an alias
PUT movie_chn_3
{
"aliases": {
"movie_chn_3_aliase": {}
},
"mappings": {
"movie":{
"properties": {
"id":{
"type": "long"
},
"name":{
"type": "text",
"analyzer": "ik_smart"
},
"doubanScore":{
"type": "double"
},
"actorList":{
"properties": {
"id":{
"type":"long"
},
"name":{
"type":"keyword"
}
}
}
}
}
}
}
- Add an alias to an existing index
POST _aliases
{
"actions": [
{ "add":{ "index": "index_name", "alias": "index_alias" }}
]
}
#add an alias to movie_chn_3
POST _aliases
{
"actions": [
{ "add":{ "index": "movie_chn_3", "alias": "movie_chn_3_a2" }}
]
}
7.2 List aliases
GET _cat/aliases?v
alias index filter routing.index routing.search
movie_chn_3_a2 movie_chn_3 - - -
movie_chn_3_aliase movie_chn_3 - - -
.kibana .kibana_1 - - -
7.3 Query through an alias
No different from querying a normal index
GET index_alias/_search
7.4 Remove an alias from an index
POST _aliases
{
"actions": [
{ "remove": { "index": "index_name", "alias": "index_alias" }}
]
}
POST _aliases
{
"actions": [
{ "remove": { "index": "movie_chn_3", "alias": "movie_chn_3_aliase" }}
]
}
7.5 Use cases
- Group multiple indexes (e.g., last_three_months)
POST _aliases
{
"actions": [
{ "add": { "index": "movie_chn_1", "alias": "movie_chn_query" }},
{ "add": { "index": "movie_chn_2", "alias": "movie_chn_query" }}
]
}
GET movie_chn_query/_search
- Create a view over a subset of an index
Equivalent to attaching filter conditions to the Index, narrowing the query scope
POST _aliases
{
"actions": [
{
"add":
{
"index": "movie_chn_1",
"alias": "movie_chn_1_sub_query",
"filter": {
"term": { "actorList.id": "4"}
}
}
}
]
}
GET movie_chn_1_sub_query/_search
- Switch seamlessly from one index to another on a running cluster
POST /_aliases
{
"actions": [
{ "remove": { "index": "movie_chn_1", "alias": "movie_chn_query" }},
{ "remove": { "index": "movie_chn_2", "alias": "movie_chn_query" }},
{ "add": { "index": "movie_chn_3", "alias": "movie_chn_query" }}
]
}
The whole operation is atomic, so there is no risk of losing or duplicating data
8 Index templates
8.1 Create an index template
PUT _template/template_movie2020
{
"index_patterns": ["movie_test*"],
"settings": {
"number_of_shards": 1
},
"aliases" : {
"{index}-query": {},
"movie_test-query":{}
},
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "keyword"
},
"movie_name": {
"type": "text",
"analyzer": "ik_smart"
}
}
}
}
}
Here "index_patterns": ["movie_test*"] means: whenever data is written to an index whose name starts with movie_test and that index does not exist yet, ES creates it automatically from this template.
In "aliases", {index} is replaced with the name of the index actually created. Two aliases are thus created: one derived from the concrete index name, and one fixed global alias.
8.2 Testing
- Write data into an index
POST movie_test_202011/_doc
{
"id":"333",
"movie_name":"zhang3"
}
- Check the Index's mapping: it was created from our index template
GET movie_test_202011-query/_mapping
- Query the data through the alias defined in the template
GET movie_test-query/_search
8.3 List the templates defined in the system
GET _cat/templates
8.4 View a template's details
GET _template/template_movie2020
or
GET _template/template_movie*
8.5 Use cases
- Index splitting
Index splitting means dividing one business index into multiple indexes by time interval,
e.g. turning order_info into order_info_20200101, order_info_20200102, ...
This has two benefits:
- Flexibility for structural changes
ES does not allow the structure of indexed data to be modified, yet in practice an index's structure and settings inevitably have to change. With split indexes, only the index of the next interval needs to change while the existing ones stay as they are, which provides a degree of flexibility.
To achieve this, simply re-create the template on the day the index needs to change.
- Narrower query ranges
Queries usually do not span the full time range of the data, so splitting the index physically reduces the amount of data scanned, which is also a performance optimization.
8.6 Caveat
With an index template, an index is usually created when its first document is inserted. If the cluster already has a very large number of shards, index creation may become slow; if that latency is unacceptable, skip the template and have a scheduled script create the next day's index one day in advance.