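1. Introduction to aggregation analysis
Aggregation analysis is a core database feature: it computes aggregate values over the data set returned by a query, for example the maximum, minimum, sum, or average of a field (or of a computed expression). Elasticsearch (ES), being both a search engine and a data store, offers powerful aggregation capabilities as well.
Aggregations that compute a metric such as the maximum, minimum, sum, or average over a data set are called metric aggregations in ES.
Relational databases, besides aggregate functions, can also group the queried rows with GROUP BY and then compute metrics per group. In ES the equivalent of GROUP BY is bucketing, that is, bucket aggregations.
ES also provides matrix aggregations and pipeline aggregations, but these are still being refined.
Where aggregated values come from:
An aggregation can take its values from a field or from the result of a script.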
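As a minimal sketch of the script case, the request below computes the maximum of a value derived from sell_price instead of the raw field. The 0.8 multiplier and the aggregation name max_discounted_price are made up for illustration, and the script assumes every document has a sell_price value.

```console
# Sketch only: the multiplier and the aggregation name are hypothetical.
GET /goods_index/goods_type/_search
{
  "size": 0,
  "aggs": {
    "max_discounted_price": {
      "max": {
        "script": {
          "source": "doc['sell_price'].value * 0.8"
        }
      }
    }
  }
}
```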
2. Metric aggregations
- Find the highest product price (max)
GET /goods_index/goods_type/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "sell_price"
      }
    }
  }
}
- Find the lowest product price (min)
GET /goods_index/goods_type/_search
{
  "size": 0,
  "aggs": {
    "min_price": {
      "min": {
        "field": "sell_price"
      }
    }
  }
}
- Sum the prices of all products (sum)
GET /goods_index/goods_type/_search
{
  "size": 0,
  "aggs": {
    "total_price": {
      "sum": {
        "field": "sell_price"
      }
    }
  }
}
- Compute the average product price (avg)
GET /goods_index/goods_type/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "sell_price"
      }
    }
  }
}
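The four metrics above do not need four round trips: sibling aggregations can be placed side by side in one aggs object. A minimal sketch against the same index follows; the aggregation names are chosen for illustration only.

```console
# Sketch: aggregation names are illustrative, the field is the one used above.
GET /goods_index/goods_type/_search
{
  "size": 0,
  "aggs": {
    "max_price": { "max": { "field": "sell_price" } },
    "min_price": { "min": { "field": "sell_price" } },
    "total_price": { "sum": { "field": "sell_price" } },
    "avg_price": { "avg": { "field": "sell_price" } }
  }
}
```

Each result appears under its own name in the aggregations section of the response.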
- Document count (_count)
Count the documents whose price is greater than 500:
GET /goods_index/goods_type/_count
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "sell_price": {
            "gt": 500
          }
        }
      }
    }
  }
}
- value_count: count the values of a field (for a single-valued field, the number of documents that have a value)
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "sell_count": {
      "value_count": {
        "field": "sell_price"
      }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sell_count": {
      "value": 7
    }
  }
}
- cardinality: count the distinct values of a field (an approximate count)
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "sell_count": {
      "cardinality": {
        "field": "sell_price"
      }
    }
  }
}
Result:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sell_count": {
      "value": 6
    }
  }
}
- stats: return five values at once: count, min, max, avg, and sum
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "sell_stats": {
      "stats": {
        "field": "sell_price"
      }
    }
  }
}
Result:
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sell_stats": {
      "count": 7,
      "min": 398,
      "max": 980,
      "avg": 692.1428571428571,
      "sum": 4845
    }
  }
}
- extended_stats: the stats values plus sum of squares, variance, standard deviation, and standard deviation bounds
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "sell_stats": {
      "extended_stats": {
        "field": "sell_price"
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sell_stats": {
      "count": 7,
      "min": 398,
      "max": 980,
      "avg": 692.1428571428571,
      "sum": 4845,
      "sum_of_squares": 3565461,
      "variance": 30289.836734693898,
      "std_deviation": 174.03975619005533,
      "std_deviation_bounds": {
        "upper": 1040.2223695229677,
        "lower": 344.06334476274645
      }
    }
  }
}
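The std_deviation_bounds in this result are the average plus and minus two standard deviations (692.14 ± 2 × 174.04, which gives roughly 1040.22 and 344.06); two is the default multiplier. As a minimal sketch, extended_stats accepts a sigma parameter to change that multiplier; the value 3 below is only an example.

```console
# Sketch: sigma = 3 is an illustrative choice, the default is 2.
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "sell_stats": {
      "extended_stats": {
        "field": "sell_price",
        "sigma": 3
      }
    }
  }
}
```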
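- percentiles: the values found at given percentiles
The values of the specified field (or script) are sorted in ascending order, and for each value ES accumulates the share of documents at or below it (as a percentage of all matching documents); the aggregation returns the value that corresponds to each requested percentage. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The 50.0 entry in the result below can be read as: 50% of the documents have a sell_price <= 696, or the other way round: the documents with sell_price <= 696 make up 50% of all matching documents.
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "sell_price"
      }
    }
  }
}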
Result:
{
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_percents": {
      "values": {
        "1.0": 398,
        "5.0": 398,
        "25.0": 603.75,
        "50.0": 696,
        "75.0": 819,
        "95.0": 980,
        "99.0": 980
      }
    }
  }
}
Specifying which percentiles to return:
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "sell_price",
        "percents": [95, 99, 99.9]
      }
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_percents": {
      "values": {
        "95.0": 980,
        "99.0": 980,
        "99.9": 980
      }
    }
  }
}
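Percentile values are approximations: by default ES computes them with the TDigest algorithm, whose compression setting trades memory for accuracy. With only 7 documents the difference is unlikely to show, so the following is just a sketch of where the setting goes; the value 200 is illustrative (the default is 100).

```console
# Sketch: compression = 200 is an arbitrary example value.
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "sell_price",
        "percents": [95, 99, 99.9],
        "tdigest": {
          "compression": 200
        }
      }
    }
  }
}
```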
- percentile_ranks: the percentage of documents whose value is less than or equal to a given value
Compute the share of documents whose price is at most 500 and at most 800:
GET /goods_index/goods_type/_search?size=0
{
  "aggs": {
    "gge_perc_rank": {
      "percentile_ranks": {
        "field": "sell_price",
        "values": [
          500,
          800
        ]
      }
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "gge_perc_rank": {
      "values": {
        "500.0": 14.417379855167873,
        "800.0": 73.64185110663985
      }
    }
  }
}
1. Count the products under each tag
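A minimal sketch of such a request, assuming the ecommerce index, the product type, and the tags field of the sample document shown in section 5; the aggregation name group_by_tags is made up for illustration. A terms aggregation creates one bucket per distinct tag and reports its doc_count.

```console
# Sketch: index, type and field names are assumed from the sample document; the aggregation name is hypothetical.
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}
```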

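Aggregating on a text field requires fielddata, which is disabled by default, so set the fielddata property of the text field to true before running the aggregation.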
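A minimal sketch of that mapping update, written for the typed mapping API of the ES 6.x line that the requests in this article appear to target; the index, type, and field names are assumed from the sample document in section 5.

```console
# Sketch: assumes the tags field is mapped as text in index ecommerce, type product.
PUT /ecommerce/_mapping/product
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
```

If the index was created with default dynamic mapping, aggregating on the tags.keyword sub-field is an alternative that does not need fielddata.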

2. For products whose name contains yagao, count the products under each tag
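A minimal sketch: a query restricts the documents that the aggregation sees, so adding a match query on name limits the tag buckets to the matching products. Field names are assumed from the sample document in section 5, and the aggregation name is illustrative.

```console
# Sketch: the aggregation only runs over documents matched by the query.
GET /ecommerce/product/_search
{
  "size": 0,
  "query": {
    "match": {
      "name": "yagao"
    }
  },
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}
```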

3. Group first, then average within each group: compute the average price of the products under each tag
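A minimal sketch: nesting an avg metric inside the terms bucket aggregation yields one average per tag. The price and tags fields are assumed from the sample document in section 5; the aggregation names are illustrative.

```console
# Sketch: a metric aggregation nested under a bucket aggregation is computed per bucket.
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
```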

4. Compute the average price of the products under each tag, sorted by average price in descending order
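A minimal sketch: the order option of the terms aggregation can reference a nested single-value metric by name, here the hypothetical avg_price sub-aggregation. Field names are assumed from the sample document in section 5.

```console
# Sketch: buckets are sorted by the nested avg_price metric, highest first.
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags",
        "order": {
          "avg_price": "desc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
```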

5. Group by specified price ranges, then group by tag within each range, and finally compute the average price of each group
Add one more product to make the analysis easier:
PUT ecommerce/product/4
{
  "name": "shiwang yagao",
  "desc": "gaoxiao meibai fangzhu",
  "price": 30,
  "producer": "shiwang producer",
  "tags": [
    "meibai",
    "fangzhu"
  ]
}
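A minimal sketch of the three-level aggregation: a range aggregation on price, a terms aggregation on tags inside each range, and an avg metric inside each tag bucket. The range boundaries and aggregation names are hypothetical; in each range, from is inclusive and to is exclusive.

```console
# Sketch: the price ranges below are illustrative, not taken from the original post.
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 20 },
          { "from": 20, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": {
            "field": "tags"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
```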
