Elasticsearch Part 7: Search in Detail


1. Getting started with search

1.1 Search syntax basics

query phase

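  • A search request arrives at a coordinating node, which builds a priority queue whose length is from + size, as set by the paging parameters (10 by default).
  • The coordinating node forwards the request to every shard; each shard searches locally and builds its own local priority queue.
  • Each shard returns its priority queue to the coordinating node, which merges them into one global priority queue.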
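
The three steps above can be sketched in a few lines of Python. This is only a simulation under stated assumptions: the shards, scores, and document ids are made up, and `heapq.nlargest` stands in for the priority queues; it is not how Elasticsearch is implemented internally.

```python
import heapq

def shard_top_k(shard_docs, k):
    """Local priority queue: the k best (score, doc_id) pairs on one shard."""
    return heapq.nlargest(k, shard_docs)

def query_phase(shards, from_=0, size=10):
    """Merge the per-shard queues into a global queue and cut out one page."""
    k = from_ + size  # every shard must supply from + size candidates
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_top_k(shard_docs, k))
    global_queue = heapq.nlargest(k, candidates)
    return global_queue[from_:from_ + size]

# Hypothetical data: three shards holding (score, doc_id) pairs.
shards = [
    [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    [(0.8, "d"), (0.7, "e")],
    [(0.1, "f"), (0.6, "g")],
]
page = query_phase(shards, from_=0, size=3)
print(page)
```

Only scores and ids travel in this phase; the document bodies are fetched later, in the fetch phase.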

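How replica shards increase search throughput

A request has to reach one copy (primary or replica) of every shard. If each shard has several replicas, concurrent search requests can be served by different copies at the same time.

query string search

Search conditions are supplied as query-string parameters in the URL, as in an ordinary HTTP request.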

GET [/index_name/type_name/]_search[?parameter_name=parameter_value&...]

GET /book/_search

Response:

{
  "took" : 969,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "Bootstrap開發",
          "description" : "Bootstrap是由Twitter推出的一個前臺頁面開發css框架,是一個非常流行的開發框架,此框架集成了多種頁面效果。此開發框架包含了大量的CSS、JS程序代碼,可以幫助開發者(尤其是不擅長css頁面開發的程序人員)輕松的實現一個css,不受瀏覽器限制的精美界面css效果。",
          "studymodel" : "201002",
          "price" : 38.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "bootstrap",
            "dev"
          ]
        }
      },
      {
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "java編程思想",
          "description" : "java語言是世界第一編程語言,在軟件開發(fā)領域使用人數最多。",
          "studymodel" : "201001",
          "price" : 68.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "java",
            "dev"
          ]
        }
      },
      {
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "spring開發基礎",
          "description" : "spring 在java領域非常流行,java程序員都在用。",
          "studymodel" : "201001",
          "price" : 88.6,
          "timestamp" : "2019-08-24 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "spring",
            "java"
          ]
        }
      }
    ]
  }
}

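Explanation

took: how long the request took, in milliseconds.

timed_out: whether the search timed out; here it did not.

_shards: how many shards were searched, and how many of them succeeded, were skipped, or failed.

hits.total: the number of matching documents, here 3.

hits.max_score: the highest _score among the matching documents. The _score is the relevance score of a document for this search: the more relevant the document, the higher the score.

hits.hits: the full data of the documents that matched the search.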

Passing parameters

Parameters are passed the same way as in any HTTP request.

GET /book/_search?q=name:java&sort=price:desc

SQL analogy: select * from book where name like '%java%' order by price desc

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "java編程思想",
          "description" : "java語言是世界第一編程語言,在軟件開發(fā)領域使用人數最多。",
          "studymodel" : "201001",
          "price" : 68.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "java",
            "dev"
          ]
        },
        "sort" : [
          68.6
        ]
      }
    ]
  }
}

timeout

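The timeout parameter sets a time limit: the maximum time each shard on each node may spend executing the search. It does not prevent the response from being returned normally; it only affects how much data the response contains.

For example: index a holds 1 billion documents in 5 shards, 200 million per shard, and a search over all the data takes 1000 ms. With timeout set to 10 ms, each shard searches for 10 ms and returns whatever it has found by then.

GET /book/_search?timeout=10ms

Global setting: set search.default_search_timeout: 100ms in the configuration file. By default there is no timeout.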

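{
  "took": 144, # how long the request took, in milliseconds
  "timed_out": false, # whether the search timed out. By default there is no timeout: the client waits until Elasticsearch finishes the search, however long that takes. With a timeout, Elasticsearch processes the search for the given duration and then returns whatever results it has, whether or not the search is complete. The timeout is passed as a parameter; units: ms (milliseconds), s (seconds), m (minutes).
  "_shards": {
    "total": 1, # number of shards the request was sent to
    "successful": 1, # shards that returned results successfully
    "skipped": 0, # shards that skipped the request, for example because they could not contain any match
    "failed": 0 # shards that failed
  },
  "hits": {
    "total": 1, # number of matching documents
    "max_score": 1, # highest relevance score in the results; the higher the _score, the more relevant the document and the higher it ranks
    "hits": [ # the result set; the first 10 documents by default
      {
        "_index": "test_index", # index the document lives in
        "_type": "test_type", # type of the document
        "_id": "1", # id of the document
        "_score": 1, # relevance score of the document
        "_source": { # the document content
          "field": "value"
        }
      }
    ]
  }
}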

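1.2 Multi-index search

Multi-index search patterns

Multi-index search means searching data across several indices. It is used relatively rarely, mostly when composite data has to be searched. When such a search really is needed, _all is generally used.

/_search: search all data in all indices
/index1/_search: search all data in one named index
/index1,index2/_search: search two indices at once
/index*/_search: match several indices with a wildcard

Use case: in production, log indices can be split by date.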

log_to_es_20190910

log_to_es_20190911

log_to_es_20180910

1.3 Paginated search

Pagination syntax

By default an Elasticsearch search returns 10 results, starting at offset 0.

GET /book/_search?size=10
GET /book/_search?size=10&from=0
GET /book/_search?size=10&from=20
GET /book/_search?from=0&size=3

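+/- search

GET index_name/_search?q=field_name:term
GET index_name/_search?q=+field_name:term
GET index_name/_search?q=-field_name:term

+ : same meaning as no prefix at all; matches documents whose field contains the keywords

- : the opposite of +; matches documents whose field does not contain the keywords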

deep paging

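What is deep paging

Results are sorted by relevance score in descending order, so to serve a page deep in the result set every shard must return its top from + size entries, and the coordinating node has to collect and sort all of them.

Performance problems of deep paging

  1. Network bandwidth: when the page is deep, every shard sends a large amount of data to the coordinating node.

  2. Memory: the data sent back by the shards is held in memory on the coordinating node, which can consume a lot of it.

  3. CPU: the coordinating node has to sort everything that comes back, and that sorting is CPU-intensive.
    Given these performance problems, deep paging should be avoided as far as possible.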
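
To make the cost concrete, here is a small back-of-the-envelope sketch. The function name and the 5-shard setup are assumptions for illustration; the only rule it encodes is that each shard contributes from + size entries.

```python
def deep_paging_cost(num_shards, from_, size):
    """Count the entries moved and sorted for one page of results."""
    per_shard = from_ + size  # each shard returns its top from + size entries
    return {
        "per_shard": per_shard,
        "at_coordinator": num_shards * per_shard,  # held and sorted in memory
        "returned": size,  # what the client actually receives
    }

# Hypothetical index with 5 shards: first page vs. a page 10,000 entries deep.
shallow = deep_paging_cost(num_shards=5, from_=0, size=10)
deep = deep_paging_cost(num_shards=5, from_=10000, size=10)
print(shallow["at_coordinator"], deep["at_coordinator"])
```

Both requests return 10 documents, but the deep one makes the coordinating node handle roughly a thousand times more entries. This is also why Elasticsearch caps from + size through the index.max_result_window setting (10000 by default).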

1.4 Query string basic syntax

Basic query string syntax

GET /book/_search?q=name:java

GET /book/_search?q=+name:java

GET /book/_search?q=-name:java

How the _all metadata field works and what it is for

GET /book/_search?q=java

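This searches every field: a document is returned if any of its fields contains the keyword. Does Elasticsearch run a separate search against each field of the document? No.

Elasticsearch has the _all metadata field. When a document is indexed, Elasticsearch takes the values of all its fields, analyzes them together, and stores the resulting terms in the _all field. A search that names no field runs against _all. (The _all field was deprecated in Elasticsearch 6.0 and removed in 7.0; there, a query that names no field is run against all eligible fields instead.)

Example

{
    "name": "jack",
    "email": "123@qq.com",
    "address": "beijing"
}

_all : jack,123@qq.com,beijing becomes the value of this document's _all field; it is then analyzed and an inverted index is built from the terms.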

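1.5 Getting started with Query DSL

DSL

DSL stands for Domain Specific Language.

Request parameters are passed in the request body. In Elasticsearch the request body is UTF-8 by default.

As the parameters after the query string multiply and the search conditions grow more complex, the query string can no longer meet the need:

GET /book/_search?q=name:java&size=10&from=0&sort=price:desc

The Query DSL is Elasticsearch's own search language: the search conditions are carried in the request body, which makes it far more expressive.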

Match all: GET /book/_search

GET /book/_search
{
  "query": { "match_all": {} }
}

Sorting: GET /book/_search?sort=price:desc

GET /book/_search 
{
    "query" : {
        "match" : {
            "name" : "java"
        }
    },
    "sort": [
        { "price": "desc" }
    ]
}

Pagination: GET /book/_search?size=10&from=0

GET  /book/_search 
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 1
}

Selecting the returned fields: GET /book/_search?_source=name,studymodel

GET /book/_search 
{
  "query": { "match_all": {} },
  "_source": ["name", "studymodel"]
}

Complex queries are built by combining the query types above.

Query DSL syntax

{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

GET /test_index/_search
{
  "query": {
    "match": {
      "test_field": "test"
    }
  }
}

Combining multiple search conditions

Requirement: title must contain elasticsearch, content may or may not contain elasticsearch, and author_id must not be 111.

Initial data:

POST /website/_doc/1
{
          "title": "my hadoop article",
          "content": "hadoop is very bad",
          "author_id": 111
}

POST /website/_doc/2
{
          "title": "my elasticsearch  article",
          "content": "es is very bad",
          "author_id": 112
}
POST /website/_doc/3
{
          "title": "my elasticsearch article",
          "content": "es is very goods",
          "author_id": 111
}

Search:

GET /website/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ],
      "should": [
        {
          "match": {
            "content": "elasticsearch"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "author_id": 111
          }
        }
      ]
    }
  }
}

Response:

{
  "took" : 488,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.47000363,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.47000363,
        "_source" : {
          "title" : "my elasticsearch  article",
          "content" : "es is very bad",
          "author_id" : 112
        }
      }
    ]
  }
}
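
The way must, should, and must_not interact in this example can be sketched with a toy evaluator. Everything here is an illustrative assumption: `matches` is a naive whitespace-token stand-in for a match query, and `bool_filter` is a hypothetical helper, not an Elasticsearch API. It only mirrors the filtering logic, not the real scoring.

```python
def matches(doc, field, text):
    """Naive stand-in for a match query: any query word occurs in the field."""
    field_words = str(doc.get(field, "")).lower().split()
    return any(word in field_words for word in str(text).lower().split())

def bool_filter(docs, must=(), should=(), must_not=(), minimum_should_match=0):
    """Return ids of documents that satisfy a bool query, best match first."""
    hits = []
    for doc_id, doc in docs.items():
        if not all(matches(doc, f, t) for f, t in must):
            continue  # every must clause is required
        if any(matches(doc, f, t) for f, t in must_not):
            continue  # any must_not clause excludes the document
        should_hits = sum(matches(doc, f, t) for f, t in should)
        if should_hits < minimum_should_match:
            continue
        hits.append((should_hits, doc_id))
    # should clauses are optional here; they only push a document up the ranking
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: -h[0])]

docs = {
    "1": {"title": "my hadoop article", "content": "hadoop is very bad", "author_id": 111},
    "2": {"title": "my elasticsearch  article", "content": "es is very bad", "author_id": 112},
    "3": {"title": "my elasticsearch article", "content": "es is very goods", "author_id": 111},
}
result = bool_filter(
    docs,
    must=[("title", "elasticsearch")],
    should=[("content", "elasticsearch")],
    must_not=[("author_id", 111)],
)
print(result)
```

Document 1 fails the must clause, document 3 is excluded by must_not, and document 2 is kept even though its content does not match the should clause.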

A more complex requirement:

select * from test_index where name='tom' and (hired = true or (personality = 'good' and rude != true))

GET /test_index/_search
{
    "query": {
            "bool": {
                "must": { "match":{ "name": "tom" }},
                "should": [
                    { "match":{ "hired": true }},
                    { "bool": {
                        "must":{ "match": { "personality": "good" }},
                        "must_not": { "match": { "rude": true }}
                    }}
                ],
                "minimum_should_match": 1
            }
    }
}

1.6 Full-text search

Full-text search

Recreate the book index

PUT /book/
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "description":{
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "studymodel":{
        "type": "keyword"
      },
      "price":{
        "type": "double"
      },
      "timestamp": {
         "type": "date",
         "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "pic":{
        "type":"text",
        "index":false
      }
    }
  }
}

Insert data

PUT /book/_doc/1
{
"name": "Bootstrap開發",
"description": "Bootstrap是由Twitter推出的一個前臺頁面開發css框架,是一個非常流行的開發框架,此框架集成了多種頁面效果。此開發框架包含了大量的CSS、JS程序代碼,可以幫助開發者(尤其是不擅長css頁面開發的程序人員)輕松的實現一個css,不受瀏覽器限制的精美界面css效果。",
"studymodel": "201002",
"price":38.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "bootstrap", "dev"]
}

PUT /book/_doc/2
{
"name": "java編程思想",
"description": "java語言是世界第一編程語言,在軟件開發(fā)領域使用人數最多。",
"studymodel": "201001",
"price":68.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "java", "dev"]
}

PUT /book/_doc/3
{
"name": "spring開發基礎",
"description": "spring 在java領域非常流行,java程序員都在用。",
"studymodel": "201001",
"price":88.6,
"timestamp":"2019-08-24 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "spring", "java"]
}

Search

GET  /book/_search 
{
    "query" : {
        "match" : {
            "description" : "java程序員"
        }
    }
}

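1.7 Scoring: TF/IDF

Algorithm overview

The relevance score algorithm, put simply, computes how closely a text in the index matches the search text.

Elasticsearch's scoring is built on the term frequency/inverse document frequency algorithm, TF/IDF for short: TF is term frequency, IDF is inverse document frequency. Since Elasticsearch 5.0 the default similarity is BM25, a refinement of TF/IDF, which is why the explain output later in this section shows idf and tf factors computed with the BM25 formulas.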

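Term frequency: how many times each term of the search text appears in the field text. The more occurrences, the more relevant.

Example: search request: hello world

doc1 : hello you and me,and world is very good.

doc2 : hello,how are you

doc1 is more relevant: it contains both hello and world, while doc2 contains only hello.

Inverse document frequency: how often each term of the search text appears across all documents of the index. The more common the term, the less it contributes to relevance.

Example: search request: hello world

doc1 : hello ,today is very good

doc2 : hi world ,how are you

The index holds 100 million documents; 1,000 of them contain hello and 100 contain world.

doc2 is more relevant: world is the rarer term, so a match on it carries more weight.

Field-length norm: the longer the field, the weaker the relevance.

Example: search request: hello world

doc1 : {"title":"hello article","content":"balabalabal (10,000 words)"}

doc2 : {"title":"my article","content":"balabalabal (10,000 words),world"}

doc1 is more relevant: hello matches in the short title field, while world matches only inside doc2's very long content field.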

How _score is computed

GET /book/_search?explain=true
{
  "query": {
    "match": {
      "description": "java程序員"
    }
  }
}

Response

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.137549,
    "hits" : [
      {
        "_shard" : "[book][0]",
        "_node" : "MDA45-r6SUGJ0ZyqyhTINA",
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.137549,
        "_source" : {
          "name" : "spring開發基礎",
          "description" : "spring 在java領域非常流行,java程序員都在用。",
          "studymodel" : "201001",
          "price" : 88.6,
          "timestamp" : "2019-08-24 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "spring",
            "java"
          ]
        },
        "_explanation" : {
          "value" : 2.137549,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.7936629,
              "description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.7936629,
                  "description" : "score(freq=2.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 2,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.7675597,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 2.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 12.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 35.333332,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 1.3438859,
              "description" : "weight(description:程序員 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 1.3438859,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.98082924,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.6227967,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 12.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 35.333332,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[book][0]",
        "_node" : "MDA45-r6SUGJ0ZyqyhTINA",
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.57961315,
        "_source" : {
          "name" : "java編程思想",
          "description" : "java語言是世界第一編程語言,在軟件開發(fā)領域使用人數最多。",
          "studymodel" : "201001",
          "price" : 68.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [
            "java",
            "dev"
          ]
        },
        "_explanation" : {
          "value" : 0.57961315,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.57961315,
              "description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.57961315,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 2,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.56055,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 19.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 35.333332,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
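
The numbers in this explain output can be reproduced by hand. The sketch below simply plugs the values from the response into the two formulas printed in its description fields; the function names are made up for illustration, and small differences in the last digits are expected because Elasticsearch computes with 32-bit floats.

```python
import math

def bm25_idf(N, n):
    """idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def term_score(freq, n, N, dl, avgdl, boost=2.2):
    """Per-term score: the product of boost, idf and tf."""
    return boost * bm25_idf(N, n) * bm25_tf(freq, dl, avgdl)

AVGDL = 35.333332  # "avgdl, average length of field" from the response

# Document 3 matches both terms, so its score is the sum of two term scores.
java_doc3 = term_score(freq=2.0, n=2, N=3, dl=12.0, avgdl=AVGDL)
programmer_doc3 = term_score(freq=1.0, n=1, N=3, dl=12.0, avgdl=AVGDL)
doc3_score = java_doc3 + programmer_doc3

# Document 2 matches only "java", in a longer field (dl = 19).
doc2_score = term_score(freq=1.0, n=2, N=3, dl=19.0, avgdl=AVGDL)

print(round(doc3_score, 4), round(doc2_score, 4))
```

Document 3 outranks document 2 for two reasons visible in the arithmetic: it matches the rarer second term (higher idf), and its description field is shorter than average, which raises tf.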

Analyzing how a single document was matched

GET /book/_explain/3
{
  "query": {
    "match": {
      "description": "java程序員"
    }
  }
}

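1.8 Doc values

Searching relies on the inverted index. Sorting relies on a forward index, which exposes every field of every document so that the values can be compared. This forward index is what Elasticsearch calls doc values.

At index time Elasticsearch builds both: the inverted index for searching, and the forward index (doc values) for sorting, aggregations, filtering, and similar operations.

Doc values are stored on disk. If there is enough memory, the OS caches them in memory automatically and performance stays high; if there is not, they are read from disk.

Inverted index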

doc1: hello world you and me

doc2: hi, world, how are you

term    doc1    doc2
hello   *
world   *       *
you     *       *
and     *
me      *
hi              *
how             *
are             *

At search time:

hello you --> hello, you

hello --> doc1

you --> doc1,doc2

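Sorting is where this structure falls short: the inverted index maps terms to documents, so it cannot efficiently supply the field value of each document that a sort by needs.

Forward index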

doc1: { "name": "jack", "age": 27 }

doc2: { "name": "tom", "age": 30 }

document    name    age
doc1        jack    27
doc2        tom     30
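
The difference between the two structures can be sketched with plain dictionaries. This is a conceptual model only, reusing the two sample documents above; real doc values are a compressed on-disk column format, and the helper names are made up.

```python
from collections import defaultdict

docs = {
    "doc1": {"name": "jack", "age": 27},
    "doc2": {"name": "tom", "age": 30},
}

# Inverted index: term -> set of document ids (answers "which docs contain X?").
inverted = defaultdict(set)
for doc_id, source in docs.items():
    inverted[source["name"]].add(doc_id)

# Forward index (doc values): one column per field, document id -> value
# (answers "what is the value of field F in doc D?").
doc_values = {"age": {doc_id: source["age"] for doc_id, source in docs.items()}}

def search(term):
    """Look a term up in the inverted index."""
    return inverted.get(term, set())

def sort_by(doc_ids, field, reverse=False):
    """Sort document ids by a field value read from the forward index."""
    column = doc_values[field]
    return sorted(doc_ids, key=lambda doc_id: column[doc_id], reverse=reverse)

ranked = sort_by(docs.keys(), "age", reverse=True)
print(search("tom"), ranked)
```

Sorting only ever touches the age column, which is why a column-oriented layout suits sorting and aggregations.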

1.9 fetch phase

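Fetch phase workflow

  • Once the coordinating node has built the priority queue, it sends mget requests to the relevant shards to retrieve the corresponding documents.

  • Each shard returns its documents to the coordinating node.

  • The coordinating node merges the documents and returns the result to the client.

An ordinary search without from and size returns the top 10 results, sorted by _score.

Phrase search: a document matches only if it contains all the terms of the query, adjacent to each other and in the same order. The query text is still analyzed into terms; the difference from match is that the term positions recorded in the inverted index are compared as well, so this is stricter than plain full-text matching.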

GET index_name/_search
{
  "query": {
    "match_phrase": {
      "field_name": "search phrase"
    }
  }
}

1.10 Summary of search parameters

preference

Determines which shards are used to execute the search.

_primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, _shards:2,3

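The bouncing results problem: two documents have the same value in the sort field; different copies of a shard may order them differently; requests are sent round-robin to different replica shards; so each time the page shows the results in a different order. These are bouncing results.

Search requests are distributed round-robin over the copies of a shard (the primary and its replicas), but the order of documents may differ between copies.

The fix is to set preference to a string such as the user_id, so that each user's searches always run on the same shard copy and the user no longer sees bouncing results.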
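
Why a fixed preference string helps can be shown with a toy selector. This is a sketch under assumptions: `pick_copy` is a hypothetical helper and `zlib.crc32` is only a stand-in for whatever hash Elasticsearch uses internally; the point is just that the same string always maps to the same copy.

```python
import zlib

def pick_copy(preference, shard_copies):
    """Map a preference string to one shard copy, the same one every time."""
    index = zlib.crc32(preference.encode("utf-8")) % len(shard_copies)
    return shard_copies[index]

# Hypothetical copies of one shard.
copies = ["primary", "replica-1", "replica-2"]

first = pick_copy("user_42", copies)
second = pick_copy("user_42", copies)
print(first, second)
```

Because repeated searches by the same user land on the same copy, ties in the sort order are broken the same way each time.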

timeout

Limits the search to a given time and returns whatever has been collected by then, so that a query cannot run for too long.

routing

Document routing. By default the _id is used for routing; with routing=user_id, all data belonging to the same user is placed on the same shard.
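
Routing boils down to a formula of the form shard = hash(routing) % number_of_primary_shards. The sketch below illustrates it with made-up documents; `shard_for` is a hypothetical helper and `zlib.crc32` is a stand-in for the hash function Elasticsearch actually uses.

```python
import zlib

def shard_for(routing, number_of_primary_shards):
    """Pick a primary shard from the routing value (stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Hypothetical documents as (doc_id, user_id), routed by user_id over 5 shards.
documents = [("order-1", "user_1"), ("order-2", "user_2"), ("order-3", "user_1")]
placement = {doc_id: shard_for(user_id, 5) for doc_id, user_id in documents}
print(placement)
```

Both documents of user_1 end up on the same shard, so a search that passes the same routing value only needs to visit that one shard.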

search_type

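default: query_then_fetch

dfs_query_then_fetch: can improve the accuracy of relevance sorting, because term statistics are first collected from all shards and scores are then computed with global rather than per-shard frequencies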
