Basics

ES query URLs are organized as /index/type/document. An example query URL:

GET /product/book/1?name=xxx

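An index is like a database in SQL, a type is like a table, and a document is like a row.

ykuang: Personally, I think it is more apt to compare an index to a dictionary (lookup) table in a database, such as create table dict(id, type, item, name, ..., other fields). The type is the column that tells the dictionaries in that table apart, a document is one row (one dictionary entry), and the JSON properties of the document are the table's other columns.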

Index

An index is the structure that stores documents; its role is similar to that of a database in a relational database system.

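Type

A type is also a logical structure for storing documents. A type sits one level below an index, so when modeling real-world data the index usually represents the broad category and the type the subcategory. For example, if book is the broad category used to create an index, then the kinds of book (fiction, literature, IT, and so on) can serve as types.

A type is only a logical grouping; it does not physically partition the data.
In SQL terms, think of an employee table in which one column of each row records which department the employee belongs to.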

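Document

Documents are formatted as JSON.
A document carries several main pieces of identifying metadata:

_index (the index the document is stored in),
_type (the type the document is stored in),
_id (the id of the document),
_version: the version, which counts how many times the document with this id has been modified

The first three identify a document when it is created. When _type is not supplied it defaults to _doc, and when _id is omitted from a POST indexing request Elasticsearch generates one automatically.

Elasticsearch is not completely schemaless, so do not lump it together with certain NoSQL databases, even though its structure is very flexible (it is JSON-oriented and fields can be added freely). Each index also has a mapping, which manages the properties of every field in the index; in other words, it defines the structure of the documents in that index.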

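Obviously, querying through URL parameters is not very flexible, so ES also provides a query DSL. The DSL syntax is introduced below.

Exact queries

Equality query

This can be thought of as the = operator in SQL.

term is mainly used for exact matching on values such as numbers, dates, booleans, or strings that have not been analyzed.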

GET  test1/_doc/_search
{
  "query": {
    "term": {
      "phone": "12345678909"
    }
  }
}


To match several values on one field, use terms, which is equivalent to the SQL IN syntax.

GET  test1/_doc/_search
{
  "query": {
    "terms": {
       "uid": [ 1234, 12345, 123456 ] 
    }
  }
}

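term usage (compared with match)
term is generally used on fields that are not analyzed, because it is an exact-match query: if the target field is analyzed, its content is split into separate tokens at index time, and the whole query value no longer corresponds to any of them.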
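
The difference can be sketched with a toy inverted index in Python. This is only an illustration of the idea, not how Elasticsearch is implemented: the analyze function is an assumed stand-in for a real analyzer, and build_inverted_index, term_query, and match_query are hypothetical helper names.

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_inverted_index(docs, analyzed=True):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        terms = analyze(text) if analyzed else [text]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, value):
    """term: look the value up as-is, with no analysis of the query."""
    return index.get(value, set())

def match_query(index, text):
    """match: analyze the query text, then OR the per-term lookups together."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

docs = {1: "PHILIPS toothbrush HX6730/02", 2: "Braun toothbrush"}
text_index = build_inverted_index(docs, analyzed=True)
keyword_index = build_inverted_index(docs, analyzed=False)

print(term_query(text_index, "PHILIPS toothbrush HX6730/02"))     # set()
print(term_query(keyword_index, "PHILIPS toothbrush HX6730/02"))  # {1}
print(match_query(text_index, "PHILIPS toothbrush"))              # {1, 2}
```

On the analyzed index the full string is not stored as a single term, so the term lookup finds nothing, while the same lookup succeeds on the index that kept the value whole.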

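Range query

range can be thought of as the > and < operators in SQL: gt is greater than, lt is less than, gte is greater than or equal to, and lte is less than or equal to.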

GET  test1/_doc/_search
{
  "query": {
   "range": { 
      "uid": { 
        "gt": 1234,
        "lte": 12345
      } 
    } 
  }
}
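
As a rough illustration of how the four bounds combine, here is a small Python sketch that applies a range clause to plain numbers. The in_range helper and the sample uid values are assumptions made for this example only.

```python
import operator

# Maps each range keyword to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def in_range(value, bounds):
    """Return True when value satisfies every bound in a range clause."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())

bounds = {"gt": 1234, "lte": 12345}
uids = [1234, 1235, 12345, 12346]
print([uid for uid in uids if in_range(uid, bounds)])  # [1235, 12345]
```

The lower bound is exclusive (gt) and the upper bound inclusive (lte), so 1234 is dropped and 12345 is kept.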

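Exists query

exists checks whether a document contains the given field (it tests for the presence of the field, not for a particular value), which makes it closer to IS NOT NULL in SQL than to the EXISTS operator.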

GET  test1/_doc/_search
{
  "query": {
   "exists": { 
       "field":"msgcode" 
    } 
  }
}

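Fuzzy matching queries

Prefix query

prefix performs a prefix search (performance is poor, because it scans the entire inverted index).
For example, suppose product_name is a field that is not analyzed and two docs hold the values iphone-6 and iphone-7. Searching for the prefix keyword iphone finds both.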

GET /product_index/product/_search
{
  "query": {
    "prefix": {
      "product_name": {
        "value": "iphone"
      }
    }
  }
}

Wildcard query

The wildcard query is the counterpart of the LIKE syntax in SQL, except that the search value uses the * symbol as the wildcard (where SQL uses %).

GET /test1/_search
{
  "query": {
   "wildcard": { 
       "message":"*wu*" 
    } 
  }
}

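Regexp query

regexp supports regular-expression queries, for example finding the verification code in SMS message content.

The example below matches terms that consist of xu followed by one digit from 0 to 9. The pattern is anchored to the whole term, so a longer term needs something like xu[0-9]+ or xu[0-9].* in order to match.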

GET /test1/_search
{
  "query": {
   "regexp": { 
       "message":"xu[0-9]" 
    } 
  }
}
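
The anchoring behavior can be imitated in Python with re.fullmatch. Note the assumption: Python's re module is a different regular-expression dialect from the one Elasticsearch uses, so this sketch only illustrates whole-term matching, and the sample terms are made up.

```python
import re

def regexp_matches(pattern, term):
    """Emulate whole-term anchoring by using fullmatch instead of search."""
    return re.fullmatch(pattern, term) is not None

terms = ["xu1", "xu12", "axu1", "xu"]
print([t for t in terms if regexp_matches("xu[0-9]", t)])      # ['xu1']
print([t for t in terms if regexp_matches("xu[0-9]+", t)])     # ['xu1', 'xu12']
print([t for t in terms if regexp_matches(".*xu[0-9].*", t)])  # ['xu1', 'xu12', 'axu1']
```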

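fuzzy (typo-correcting) query

The fuzziness parameter is the maximum number of single-character edits that may be corrected. A value of 2 allows up to two corrections and is also the largest value accepted, since larger distances would match too loosely to be useful. AUTO picks the distance automatically from the length of the term.
The keyword below is missing one letter o.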

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "PHILIPS tothbrush",
        "fuzziness": "AUTO",
        "operator": "and"
      }
    }
  }
}
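
To make the idea of edit distance concrete, here is a minimal Python sketch. It uses plain Levenshtein distance, which is an assumption for simplicity: Elasticsearch by default also counts swapping two adjacent characters as a single edit, which this version does not. The auto_fuzziness helper follows the documented default AUTO thresholds (lengths 0 to 2, 3 to 5, and above 5).

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Edits allowed by AUTO with its default length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

print(edit_distance("tothbrush", "toothbrush"))  # 1
print(auto_fuzziness("tothbrush"))               # 2
```

The misspelled keyword is one insertion away from toothbrush, which is within the two edits that AUTO allows for a nine-letter term.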

Full-text search

match_all with no index in the URL returns documents from every index in the cluster, including some hidden indices.

GET _search
{   
  "query": {
    "match_all": {}
  }
}

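full-text search and the inverted index

A document appears in the results as long as it matches any one of the terms produced by analyzing the query; documents that match better simply rank higher.
For example, the query PHILIPS toothbrush is split into two words, PHILIPS and toothbrush. Any document whose product_name contains either word is returned, and documents that contain both words are ranked first.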

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": "PHILIPS toothbrush"
    }
  }
}

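phrase search

A document appears in the results only if it matches all of the terms produced by analyzing the query, in the same order and next to each other.
For example, the query PHILIPS toothbrush is split into two words, PHILIPS and toothbrush, and only documents in which the two appear together as that phrase are returned.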

GET /product_index/product/_search
{
  "query": {
    "match_phrase": {
      "product_name": "PHILIPS toothbrush"
    }
  }
}

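match usage (compared with term):
The query text is analyzed, and any document that contains at least one of the resulting terms is returned.

To make match require all of the analyzed terms instead of any one of them, set operator to and. (This is a common need, so it is worth remembering.)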

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "PHILIPS toothbrush",
        "operator": "and"
      }
     }
   }
}

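match has one more variant: requiring a percentage of the analyzed terms to match. Suppose the search text is analyzed into java 程序员 书 推荐 (java, programmer, book, recommendation), which is 4 terms. The same requirement can also be expressed by combining must, must_not, and should clauses on the same field, but to return documents that hit 50% of the terms, that is, any two of them, minimum_should_match is simpler:

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "java 程序员 书 推荐",
        "minimum_should_match": "50%"
      }
    }
  }
}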
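
The arithmetic behind a percentage value can be sketched in Python as follows. The required_matches helper is hypothetical and handles only positive percentages such as "50%"; it relies on the documented rule that the number computed from a percentage is rounded down.

```python
def required_matches(term_count, percentage):
    """Number of terms that must match for a positive percentage such as "50%"."""
    percent = int(percentage.rstrip("%"))
    # Integer division rounds the computed number down.
    return term_count * percent // 100

print(required_matches(4, "50%"))  # 2
print(required_matches(3, "75%"))  # 2
```

With 4 terms and 50%, two matching terms are enough; with 3 terms and 75%, the result of 2.25 is rounded down to 2.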

multi_match usage:
Return documents in which either the product_name or the product_desc field contains the keyword toothbrush.

GET /product_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "toothbrush",
      "fields": [
        "product_name",
        "product_desc"
      ]
    }
  }
}

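multi_match with type cross_fields searches across several fields as if they were one combined field: with operator and, every query term must appear in at least one of the listed fields, but the terms do not all have to be in the same field.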

GET /product_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "PHILIPS toothbrush",
      "type": "cross_fields",
      "operator": "and",
      "fields": [
        "product_name",
        "product_desc"
      ]
    }
  }
}

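match_phrase usage (phrase search) (compared with match):
The analyzed query terms are not matched independently: a document is returned only if it contains the whole phrase, that is, all the terms in the same order and adjacent to each other.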

GET /product_index/product/_search
{
  "query": {
    "match_phrase": {
      "product_name": "PHILIPS toothbrush"
    }
  }
}

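match_phrase + slop (compared with match_phrase):
Before explaining slop, note that the source data is PHILIPS toothbrush HX6730/02, which after analysis yields at least three terms: PHILIPS, toothbrush, and HX6730.
As described above, match_phrase normally requires the query to match the phrase exactly, and PHILIPS HX6730 clearly does not.
Sometimes, though, this kind of inexact match is exactly what we want: the terms only need to be as close together as possible, and a few words in between are acceptable. That is what slop is for. slop = 2 means that terms separated by up to 2 words still count as a match.
Strictly speaking slop is not a gap but a number of moves: it is how many positions the analyzed query terms may be shifted in order to line up with the doc content. So HX6730 PHILIPS can also match the doc, but only with a larger slop; the figure given for this example is slop = 5.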

GET /product_index/product/_search
{
  "query": {
    "match_phrase": {
      "product_name" : {
          "query" : "PHILIPS HX6730",
          "slop" : 1
      }
    }
  }
}
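
The "number of moves" idea can be sketched in Python for queries whose terms are in document order. This is a simplified model written for illustration, not the actual Lucene algorithm: required_slop is a hypothetical helper, and the document is assumed to be analyzed into the three lowercase terms shown.

```python
from itertools import product

def required_slop(query_terms, doc_terms):
    """Smallest slop at which the query terms line up with the document, or None.

    Simplified model: each query term's document position is shifted back by its
    position in the query; the spread of the shifted values is the slop needed.
    """
    candidates = []
    for term in query_terms:
        positions = [i for i, doc_term in enumerate(doc_terms) if doc_term == term]
        if not positions:
            return None  # a query term is missing from the document entirely
        candidates.append(positions)
    best = None
    for combo in product(*candidates):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best

doc = ["philips", "toothbrush", "hx6730"]
print(required_slop(["philips", "toothbrush"], doc))  # 0
print(required_slop(["philips", "hx6730"], doc))      # 1
```

Under this model the exact phrase needs no slop, and PHILIPS HX6730 needs a slop of 1 because one word sits between the two terms, which agrees with the slop value used in the request above.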

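match + match_phrase + slop combined, to make results both more precise and more numerous
match_phrase is slower than match, so the usual approach is to filter with match first and then apply match_phrase to the remaining hits and rescore them. This is where rescore comes in; window_size set to 10 means that the top 10 hits are rescored.
Of the two requests below, the first does not rescore and the second does.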

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "product_name": {
            "query": "PHILIPS HX6730"
          }
        }
      },
      "should": {
        "match_phrase": {
          "product_name": {
            "query": "PHILIPS HX6730",
            "slop": 10
          }
        }
      }
    }
  }
}

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": "PHILIPS HX6730"
    }
  },
  "rescore": {
    "window_size": 10,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "product_name": {
            "query": "PHILIPS HX6730",
            "slop": 10
          }
        }
      }
    }
  }
}
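
What rescoring does to a result list can be sketched in Python. The rescore function below is a hypothetical illustration: it assumes the two scores are simply added with equal weight, and the hit ids and scores are invented for the example.

```python
def rescore(hits, rescore_scores, window_size):
    """Re-rank only the top window_size hits by adding a secondary score.

    hits: list of (doc_id, score), sorted by score with the highest first.
    rescore_scores: doc_id -> score from the secondary (phrase) query.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, score + rescore_scores.get(doc_id, 0.0))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Hits outside the window keep their original order and score.
    return rescored + rest

hits = [("a", 3.0), ("b", 2.5), ("c", 2.0), ("d", 1.0)]
phrase_scores = {"b": 1.0, "d": 5.0}
print(rescore(hits, phrase_scores, window_size=3))
# [('b', 3.5), ('a', 3.0), ('c', 2.0), ('d', 1.0)]
```

Document d would score highly on the phrase query, but it lies outside the window and is never rescored, which is the trade-off that keeps the expensive query cheap.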

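match_phrase_prefix usage (not common); typically used for keyword suggestions as the user types, like the Google search box
max_expansions limits how many terms the prefix may expand to, which helps performance
Overall performance is still poor, because the inverted index is still scanned. edge_ngram is the recommended way to implement this feature

GET /product_index/product/_search
{
  "query": {
    "match_phrase_prefix": {
      "product_name": {
        "query": "PHILIPS HX",
        "slop": 5,
        "max_expansions": 20
      }
    }
  }
}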

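Compound queries

bool combines the results of several query conditions with boolean logic. It has the following operators:

must: all of the conditions must match, equivalent to and.
must_not: none of the conditions may match, equivalent to not.
should: at least one of the conditions matches, equivalent to or. When a must clause is also present, should clauses are optional and only raise the score of documents that match them.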

GET /test1/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "phone": "12345678909"
        }
      },
      "must_not": {
        "term": {
          "uid": 12345
        }
      },
      "should": [
        {
          "term": {
            "uid": 1234
          }
        },
        {
          "term": {
            "uid": 123456
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}
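
The matching logic of the three operators can be sketched in Python against in-memory documents. This is a simplified model for illustration: term and bool_matches are hypothetical helpers, scoring and the filter clause are not modeled, and the sample documents are invented.

```python
def term(field, value):
    """Build a predicate that behaves like a term clause on one field."""
    return lambda doc: doc.get(field) == value

def bool_matches(doc, must=(), must_not=(), should=()):
    """Evaluate must / must_not / should predicates against one document."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must clause, at least one should clause has to match;
    # otherwise should clauses only influence scoring (not modeled here).
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

docs = [
    {"phone": "12345678909", "uid": 1234},
    {"phone": "12345678909", "uid": 12345},
    {"phone": "00000000000", "uid": 123456},
]
query = dict(
    must=[term("phone", "12345678909")],
    must_not=[term("uid", 12345)],
    should=[term("uid", 1234), term("uid", 123456)],
)
print([doc["uid"] for doc in docs if bool_matches(doc, **query)])  # [1234]
```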

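Filtering

When query and filter are used together, the filter runs first.

In terms of search results:
filter only selects the data that satisfies the condition and does not compute a relevance score
query selects the data that satisfies the condition, computes a relevance score, and sorts by score in descending order

In terms of performance:
filter (faster, no sorting) computes no relevance score and therefore needs no sorting, and the results of the most frequently used filters are cached automatically
What is cached is not the document content but a bitset; for details see https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_finding_exact_values.html#_internal_filter_operation
query (slower, sorted) has to compute relevance scores and sort by score in descending order, and its results are not cached

Using filter and query together gives you the characteristics of both, so choose according to your business requirements.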

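In Elasticsearch 5.0 and later, a filter is attached through the filter clause of a bool query (the older filtered query was removed):

GET /store/products/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "price": 200
        }
      }
    }
  }
}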
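
The bitset caching mentioned above can be pictured with a small Python sketch. It is only a mental model under stated assumptions: the bitset_cache dictionary, the filter_bitset helper, and the sample documents are all hypothetical, and a real bitset is far more compact than a Python list.

```python
docs = [
    {"price": 200, "in_stock": True},
    {"price": 150, "in_stock": True},
    {"price": 200, "in_stock": False},
]

bitset_cache = {}

def filter_bitset(name, predicate):
    """Return the cached bitset for a filter, computing it on first use."""
    if name not in bitset_cache:
        # One bit per document: 1 where the filter matches, 0 elsewhere.
        bitset_cache[name] = [1 if predicate(doc) else 0 for doc in docs]
    return bitset_cache[name]

price_bits = filter_bitset("price=200", lambda doc: doc["price"] == 200)
stock_bits = filter_bitset("in_stock", lambda doc: doc["in_stock"])

# Combining two filters is a bitwise AND over their bitsets.
both = [a & b for a, b in zip(price_bits, stock_bits)]
print(price_bits, stock_bits, both)  # [1, 0, 1] [1, 1, 0] [1, 0, 0]
```

Because a bitset records only which documents match, it can be reused by any later request that applies the same filter, without touching the documents again.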

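Sorting

Explicit sorting is often unnecessary, because ES is mostly used for full-text search, where results are generally returned in descending order of relevance.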

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_name": "PHILIPS toothbrush"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

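Boost

boost (default 1) adjusts the weight of a clause. It covers another relevance-tuning need: for example, when searching for toothbrush, PHILIPS toothbrush results should rank ahead of Braun toothbrush results. This can be done as follows: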

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_name": "toothbrush"
          }
        }
      ],
      "should": [
        {
          "match": {
            "product_name": {
              "query": "PHILIPS",
              "boost": 4
            }
          }
        },
        {
          "match": {
            "product_name": {
              "query": "Braun",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

Highlight

Wraps each matched query term in highlighting HTML tags, producing a result such as: <em>PHILIPS</em> <em>toothbrush</em> HX6730/02

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": "PHILIPS toothbrush"
    }
  },
  "highlight": {
    "fields": {
      "product_name": {}
    }
  }
}
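
The effect of highlighting can be imitated with a short Python function. This is a rough sketch, not the Elasticsearch highlighter: the highlight helper is hypothetical, it matches whole alphanumeric words case-insensitively, and the em tags are passed in as parameters so other tags could be substituted.

```python
import re

def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every word of text that equals a query term, ignoring case."""
    query_terms = {term.lower() for term in query.split()}

    def wrap(match):
        word = match.group(0)
        if word.lower() in query_terms:
            return f"{pre_tag}{word}{post_tag}"
        return word

    return re.sub(r"[0-9A-Za-z]+", wrap, text)

print(highlight("PHILIPS toothbrush HX6730/02", "PHILIPS toothbrush"))
# <em>PHILIPS</em> <em>toothbrush</em> HX6730/02
```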

References

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_queries_and_filters.html

Blog posts:

https://www.cnblogs.com/xuwujing/p/11567053.html
https://www.cnblogs.com/sddai/p/11061412.html

Several articles from the 随风行云 blog:

(1) An introduction to ElasticSearch
(2) Understanding ElasticSearch: indices, types, documents
(3) Understanding ElasticSearch: searching, filtering, sorting
(4) Understanding ElasticSearch: supplementary notes on the underlying principles
