Elasticsearch如何實(shí)現(xiàn) SQL語(yǔ)句中 Group By 和 Limit 的功能

有 SQL 背景的同學(xué)在學(xué)習(xí) Elasticsearch 時(shí),面對(duì)一個(gè)查詢(xún)需求,不由自主地會(huì)先思考如何用 SQL 來(lái)實(shí)現(xiàn),然后再去想 Elasticsearch 的 Query DSL 如何實(shí)現(xiàn)。那么本篇就給大家講一條常見(jiàn)的 SQL 語(yǔ)句如何用 Elasticsearch 的查詢(xún)語(yǔ)言實(shí)現(xiàn)。

1. SQL語(yǔ)句

假設(shè)我們有一個(gè)汽車(chē)的數(shù)據(jù)集,每個(gè)汽車(chē)都有車(chē)型、顏色等字段,我希望獲取顏色種類(lèi)大于1個(gè)的前2車(chē)型。假設(shè)汽車(chē)的數(shù)據(jù)模型如下:

{
    "model":"modelA",
    "color":"red"
}

假設(shè)我們有一個(gè) cars 表,通過(guò)如下語(yǔ)句創(chuàng)建測(cè)試數(shù)據(jù)。

INSERT INTO cars (model,color) VALUES ('A','red'); 
INSERT INTO cars (model,color) VALUES ('A','white'); 
INSERT INTO cars (model,color) VALUES ('A','black'); 
INSERT INTO cars (model,color) VALUES ('A','yellow'); 
INSERT INTO cars (model,color) VALUES ('B','red'); 
INSERT INTO cars (model,color) VALUES ('B','white'); 
INSERT INTO cars (model,color) VALUES ('C','black'); 
INSERT INTO cars (model,color) VALUES ('C','red'); 
INSERT INTO cars (model,color) VALUES ('C','white'); 
INSERT INTO cars (model,color) VALUES ('C','yellow'); 
INSERT INTO cars (model,color) VALUES ('C','blue'); 
INSERT INTO cars (model,color) VALUES ('D','red');
INSERT INTO cars (model,color) VALUES ('A','red'); 

那么實(shí)現(xiàn)我們需求的 SQL 語(yǔ)句也比較簡(jiǎn)單,實(shí)現(xiàn)如下:

SELECT model,COUNT(DISTINCT color) color_count FROM cars GROUP BY model HAVING color_count > 1 ORDER BY color_count desc LIMIT 2;

這條查詢(xún)語(yǔ)句中 Group By 是按照 model 做分組, Having color_count>1 限定了車(chē)型顏色種類(lèi)大于1,**ORDER BY color_count desc ** 限定結(jié)果按照顏色種類(lèi)倒序排列,而 LIMIT 2 限定只返回前3條數(shù)據(jù)。

那么在 Elasticsearch 中如何實(shí)現(xiàn)這個(gè)需求呢?

2. 在 Elasticsearch 模擬測(cè)試數(shù)據(jù)

首先我們需要先在 elasticsearch 中插入測(cè)試的數(shù)據(jù),這里我們使用 bulk 接口 ,如下所示:

POST _bulk
{"index":{"_index":"cars","_type":"doc","_id":"1"}}
{"model":"A","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"2"}}
{"model":"A","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"3"}}
{"model":"A","color":"black"}
{"index":{"_index":"cars","_type":"doc","_id":"4"}}
{"model":"A","color":"yellow"}
{"index":{"_index":"cars","_type":"doc","_id":"5"}}
{"model":"B","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"6"}}
{"model":"B","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"7"}}
{"model":"C","color":"black"}
{"index":{"_index":"cars","_type":"doc","_id":"8"}}
{"model":"C","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"9"}}
{"model":"C","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"10"}}
{"model":"C","color":"yellow"}
{"index":{"_index":"cars","_type":"doc","_id":"11"}}
{"model":"C","color":"blue"}
{"index":{"_index":"cars","_type":"doc","_id":"12"}}
{"model":"D","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"13"}}
{"model":"A","color":"red"}

其中 index 為 cars,type 為 doc,所有數(shù)據(jù)與mysql 數(shù)據(jù)保持一致。大家可以在 Kibana 的 Dev Tools 中執(zhí)行上面的命令,然后執(zhí)行下面的查詢(xún)語(yǔ)句驗(yàn)證數(shù)據(jù)是否已經(jīng)成功存入。

GET cars/_search

3. Group By VS Terms/Metric Aggregation

SQL 中 Group By 語(yǔ)句在 Elasticsearch 中對(duì)應(yīng)的是 Terms Aggregation,即分桶聚合,對(duì)應(yīng) Group By color 的語(yǔ)句如下所示:

GET cars/_search
{
  "size":0,
  "aggs":{
    "models":{
      "terms":{
        "field":"model.keyword"
      }
    }
  }
}

結(jié)果如下:

{
  "took": 161,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 13,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "A",
          "doc_count": 5
        },
        {
          "key": "C",
          "doc_count": 5
        },
        {
          "key": "B",
          "doc_count": 2
        },
        {
          "key": "D",
          "doc_count": 1
        }
      ]
    }
  }
}

我們看 aggregations 這個(gè) key 下面的即為返回結(jié)果。

SQL 語(yǔ)句中還有一項(xiàng)是 COUNT(DISTINCT color) color_count 用于計(jì)算每個(gè) model 的顏色數(shù),在 Elasticsearch 中我們需要使用一個(gè)指標(biāo)類(lèi)聚合 Cardinality ,進(jìn)行不同值計(jì)數(shù)。語(yǔ)句如下:

GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "color_count": {
          "cardinality": {
            "field": "color.keyword"
          }
        }
      }
    }
  }
}

其返回結(jié)果如下:

{
  "took": 74,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 13,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "A",
          "doc_count": 5,
          "color_count": {
            "value": 4
          }
        },
        {
          "key": "C",
          "doc_count": 5,
          "color_count": {
            "value": 5
          }
        },
        {
          "key": "B",
          "doc_count": 2,
          "color_count": {
            "value": 2
          }
        },
        {
          "key": "D",
          "doc_count": 1,
          "color_count": {
            "value": 1
          }
        }
      ]
    }
  }
}

結(jié)果中 color_count 即為每個(gè) model 的顏色數(shù),但這里所有的模型都返回了,我們只想要顏色數(shù)大于1的模型,因此這里還要加一個(gè)過(guò)濾條件。

4. Having Condition VS Bucket Filter Aggregation

Having color_count > 1 在 Elasticsearch 中對(duì)應(yīng)的是 Bucket Filter 聚合,語(yǔ)句如下所示:

GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "color_count": {
          "cardinality": {
            "field": "color.keyword"
          }
        },
        "color_count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "colorCount": "color_count"
            },
            "script": "params.colorCount>1"
          }
        }
      }
    }
  }
}

返回結(jié)果如下:

{
  "took": 39,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 13,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "A",
          "doc_count": 5,
          "color_count": {
            "value": 4
          }
        },
        {
          "key": "C",
          "doc_count": 5,
          "color_count": {
            "value": 5
          }
        },
        {
          "key": "B",
          "doc_count": 2,
          "color_count": {
            "value": 2
          }
        }
      ]
    }
  }
}

此時(shí)返回結(jié)果只包含顏色數(shù)大于1的模型,但大家會(huì)發(fā)現(xiàn)顏色數(shù)多的 C 不是在第一個(gè)位置,我們還需要做排序處理。

5. Order By Limit VS Bucket Sort Aggregation

ORDER BY color_count desc LIMIT 3 在 Elasticsearch 中可以使用 Bucket Sort 聚合實(shí)現(xiàn),語(yǔ)句如下所示:

GET cars/_search
{
  "size": 0,
  "aggs": {
    "models": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "color_count": {
          "cardinality": {
            "field": "color.keyword"
          }
        },
        "color_count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "colorCount": "color_count"
            },
            "script": "params.colorCount>1"
          }
        },
        "color_count_sort": {
          "bucket_sort": {
            "sort": {
              "color_count": "desc"
            },
            "size": 2
          }
        }
      }
    }
  }
}

返回結(jié)果如下:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 13,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "models": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 5,
          "color_count": {
            "value": 5
          }
        },
        {
          "key": "A",
          "doc_count": 5,
          "color_count": {
            "value": 4
          }
        }
      ]
    }
  }
}

至此我們便將 SQL 語(yǔ)句實(shí)現(xiàn)的功能用 Elasticsearch 查詢(xún)語(yǔ)句實(shí)現(xiàn)了。對(duì)比 SQL 語(yǔ)句與 Elasticsearch 的查詢(xún)語(yǔ)句,大家會(huì)發(fā)現(xiàn)后者復(fù)雜了很多,但并非無(wú)章可循,隨著大家對(duì)常見(jiàn)語(yǔ)法越來(lái)越熟悉,相信一定會(huì)越寫(xiě)越得心應(yīng)手!

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容