有 SQL 背景的同學(xué)在學(xué)習(xí) Elasticsearch 時(shí),面對(duì)一個(gè)查詢(xún)需求,不由自主地會(huì)先思考如何用 SQL 來(lái)實(shí)現(xiàn),然后再去想 Elasticsearch 的 Query DSL 如何實(shí)現(xiàn)。那么本篇就給大家講一條常見(jiàn)的 SQL 語(yǔ)句如何用 Elasticsearch 的查詢(xún)語(yǔ)言實(shí)現(xiàn)。
1. SQL語(yǔ)句
假設(shè)我們有一個(gè)汽車(chē)的數(shù)據(jù)集,每個(gè)汽車(chē)都有車(chē)型、顏色等字段,我希望獲取顏色種類(lèi)大于1個(gè)的前2車(chē)型。假設(shè)汽車(chē)的數(shù)據(jù)模型如下:
{
"model":"modelA",
"color":"red"
}
假設(shè)我們有一個(gè) cars 表,通過(guò)如下語(yǔ)句創(chuàng)建測(cè)試數(shù)據(jù)。
INSERT INTO cars (model,color) VALUES ('A','red');
INSERT INTO cars (model,color) VALUES ('A','white');
INSERT INTO cars (model,color) VALUES ('A','black');
INSERT INTO cars (model,color) VALUES ('A','yellow');
INSERT INTO cars (model,color) VALUES ('B','red');
INSERT INTO cars (model,color) VALUES ('B','white');
INSERT INTO cars (model,color) VALUES ('C','black');
INSERT INTO cars (model,color) VALUES ('C','red');
INSERT INTO cars (model,color) VALUES ('C','white');
INSERT INTO cars (model,color) VALUES ('C','yellow');
INSERT INTO cars (model,color) VALUES ('C','blue');
INSERT INTO cars (model,color) VALUES ('D','red');
INSERT INTO cars (model,color) VALUES ('A','red');
那么實(shí)現(xiàn)我們需求的 SQL 語(yǔ)句也比較簡(jiǎn)單,實(shí)現(xiàn)如下:
SELECT model,COUNT(DISTINCT color) color_count FROM cars GROUP BY model HAVING color_count > 1 ORDER BY color_count desc LIMIT 2;
這條查詢(xún)語(yǔ)句中 Group By 是按照 model 做分組, Having color_count>1 限定了車(chē)型顏色種類(lèi)大于1,**ORDER BY color_count desc ** 限定結(jié)果按照顏色種類(lèi)倒序排列,而 LIMIT 2 限定只返回前3條數(shù)據(jù)。
那么在 Elasticsearch 中如何實(shí)現(xiàn)這個(gè)需求呢?
2. 在 Elasticsearch 模擬測(cè)試數(shù)據(jù)
首先我們需要先在 elasticsearch 中插入測(cè)試的數(shù)據(jù),這里我們使用 bulk 接口 ,如下所示:
POST _bulk
{"index":{"_index":"cars","_type":"doc","_id":"1"}}
{"model":"A","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"2"}}
{"model":"A","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"3"}}
{"model":"A","color":"black"}
{"index":{"_index":"cars","_type":"doc","_id":"4"}}
{"model":"A","color":"yellow"}
{"index":{"_index":"cars","_type":"doc","_id":"5"}}
{"model":"B","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"6"}}
{"model":"B","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"7"}}
{"model":"C","color":"black"}
{"index":{"_index":"cars","_type":"doc","_id":"8"}}
{"model":"C","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"9"}}
{"model":"C","color":"white"}
{"index":{"_index":"cars","_type":"doc","_id":"10"}}
{"model":"C","color":"yellow"}
{"index":{"_index":"cars","_type":"doc","_id":"11"}}
{"model":"C","color":"blue"}
{"index":{"_index":"cars","_type":"doc","_id":"12"}}
{"model":"D","color":"red"}
{"index":{"_index":"cars","_type":"doc","_id":"13"}}
{"model":"A","color":"red"}
其中 index 為 cars,type 為 doc,所有數(shù)據(jù)與mysql 數(shù)據(jù)保持一致。大家可以在 Kibana 的 Dev Tools 中執(zhí)行上面的命令,然后執(zhí)行下面的查詢(xún)語(yǔ)句驗(yàn)證數(shù)據(jù)是否已經(jīng)成功存入。
GET cars/_search
3. Group By VS Terms/Metric Aggregation
SQL 中 Group By 語(yǔ)句在 Elasticsearch 中對(duì)應(yīng)的是 Terms Aggregation,即分桶聚合,對(duì)應(yīng) Group By color 的語(yǔ)句如下所示:
GET cars/_search
{
"size":0,
"aggs":{
"models":{
"terms":{
"field":"model.keyword"
}
}
}
}
結(jié)果如下:
{
"took": 161,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 13,
"max_score": 0,
"hits": []
},
"aggregations": {
"models": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "A",
"doc_count": 5
},
{
"key": "C",
"doc_count": 5
},
{
"key": "B",
"doc_count": 2
},
{
"key": "D",
"doc_count": 1
}
]
}
}
}
我們看 aggregations 這個(gè) key 下面的即為返回結(jié)果。
SQL 語(yǔ)句中還有一項(xiàng)是 COUNT(DISTINCT color) color_count 用于計(jì)算每個(gè) model 的顏色數(shù),在 Elasticsearch 中我們需要使用一個(gè)指標(biāo)類(lèi)聚合 Cardinality ,進(jìn)行不同值計(jì)數(shù)。語(yǔ)句如下:
GET cars/_search
{
"size": 0,
"aggs": {
"models": {
"terms": {
"field": "model.keyword"
},
"aggs": {
"color_count": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
}
}
其返回結(jié)果如下:
{
"took": 74,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 13,
"max_score": 0,
"hits": []
},
"aggregations": {
"models": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "A",
"doc_count": 5,
"color_count": {
"value": 4
}
},
{
"key": "C",
"doc_count": 5,
"color_count": {
"value": 5
}
},
{
"key": "B",
"doc_count": 2,
"color_count": {
"value": 2
}
},
{
"key": "D",
"doc_count": 1,
"color_count": {
"value": 1
}
}
]
}
}
}
結(jié)果中 color_count 即為每個(gè) model 的顏色數(shù),但這里所有的模型都返回了,我們只想要顏色數(shù)大于1的模型,因此這里還要加一個(gè)過(guò)濾條件。
4. Having Condition VS Bucket Filter Aggregation
Having color_count > 1 在 Elasticsearch 中對(duì)應(yīng)的是 Bucket Filter 聚合,語(yǔ)句如下所示:
GET cars/_search
{
"size": 0,
"aggs": {
"models": {
"terms": {
"field": "model.keyword"
},
"aggs": {
"color_count": {
"cardinality": {
"field": "color.keyword"
}
},
"color_count_filter": {
"bucket_selector": {
"buckets_path": {
"colorCount": "color_count"
},
"script": "params.colorCount>1"
}
}
}
}
}
}
返回結(jié)果如下:
{
"took": 39,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 13,
"max_score": 0,
"hits": []
},
"aggregations": {
"models": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "A",
"doc_count": 5,
"color_count": {
"value": 4
}
},
{
"key": "C",
"doc_count": 5,
"color_count": {
"value": 5
}
},
{
"key": "B",
"doc_count": 2,
"color_count": {
"value": 2
}
}
]
}
}
}
此時(shí)返回結(jié)果只包含顏色數(shù)大于1的模型,但大家會(huì)發(fā)現(xiàn)顏色數(shù)多的 C 不是在第一個(gè)位置,我們還需要做排序處理。
5. Order By Limit VS Bucket Sort Aggregation
ORDER BY color_count desc LIMIT 3 在 Elasticsearch 中可以使用 Bucket Sort 聚合實(shí)現(xiàn),語(yǔ)句如下所示:
GET cars/_search
{
"size": 0,
"aggs": {
"models": {
"terms": {
"field": "model.keyword"
},
"aggs": {
"color_count": {
"cardinality": {
"field": "color.keyword"
}
},
"color_count_filter": {
"bucket_selector": {
"buckets_path": {
"colorCount": "color_count"
},
"script": "params.colorCount>1"
}
},
"color_count_sort": {
"bucket_sort": {
"sort": {
"color_count": "desc"
},
"size": 2
}
}
}
}
}
}
返回結(jié)果如下:
{
"took": 32,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 13,
"max_score": 0,
"hits": []
},
"aggregations": {
"models": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "C",
"doc_count": 5,
"color_count": {
"value": 5
}
},
{
"key": "A",
"doc_count": 5,
"color_count": {
"value": 4
}
}
]
}
}
}
至此我們便將 SQL 語(yǔ)句實(shí)現(xiàn)的功能用 Elasticsearch 查詢(xún)語(yǔ)句實(shí)現(xiàn)了。對(duì)比 SQL 語(yǔ)句與 Elasticsearch 的查詢(xún)語(yǔ)句,大家會(huì)發(fā)現(xiàn)后者復(fù)雜了很多,但并非無(wú)章可循,隨著大家對(duì)常見(jiàn)語(yǔ)法越來(lái)越熟悉,相信一定會(huì)越寫(xiě)越得心應(yīng)手!