介紹Hive查詢(xún)中數(shù)值累加的思路的方法
1. 需求分析
現(xiàn)有 hive 表 record, 內(nèi)容如下:
hive> select * from record;
OK
A 2015-01 5
A 2015-01 15
B 2015-01 5
A 2015-01 8
B 2015-01 25
A 2015-01 5
A 2015-02 4
A 2015-02 6
B 2015-02 10
B 2015-02 5
A 2015-03 16
A 2015-03 22
B 2015-03 23
B 2015-03 10
B 2015-03 11
其中字段意義:
userid(string) month(string) count(int)
分別代表:
用戶(hù)id 月份 該月訪(fǎng)問(wèn)次數(shù)
需求:
統(tǒng)計(jì)每個(gè)用戶(hù)截止到當(dāng)月為止的最大單月訪(fǎng)問(wèn)次數(shù)和累計(jì)到該月的總訪(fǎng)問(wèn)次數(shù)
最終結(jié)果為:
用戶(hù) 月份 本月訪(fǎng)問(wèn)次數(shù) 截止到當(dāng)月總訪(fǎng)問(wèn)次數(shù) 截止到當(dāng)月最大訪(fǎng)問(wèn)次數(shù)
A 2015-01 33 33 33
A 2015-02 10 43 33
A 2015-03 38 81 38
B 2015-01 30 30 30
B 2015-02 15 45 30
B 2015-03 44 89 44
2. 方法一
--(1)
# 先求出每個(gè)用戶(hù)每個(gè)月總訪(fǎng)問(wèn)量
CREATE TABLE record_2 AS
SELECT userid, month, sum(count) as count
FROM record
GROUP BY userid, month;
# record_2 表中內(nèi)容為:
A 2015-01 33
A 2015-02 10
A 2015-03 38
B 2015-01 30
B 2015-02 15
B 2015-03 44
--(2)
SELECT t1.userid, t1.month, t1.count, sum(t2.count) sum_count, max(t2.count) max_count
FROM record_2 t1 INNER JOIN record_2 t2
ON t1.userid = t2.userid
WHERE t1.month >= t2.month
GROUP BY t1.userid, t1.month, t1.count
ORDER BY t1.userid, t1.month;
# 最終結(jié)果:
A 2015-01 33 33 33
A 2015-02 10 43 33
A 2015-03 38 81 38
B 2015-01 30 30 30
B 2015-02 15 45 30
B 2015-03 44 89 44
3. 方法二:使用Hive窗口函數(shù)max()、sum()
select userid, month, count,
sum(count) over(partition by userid order by month) as sum_count,
max(count) over(partition by userid order by month) as max_count
from record_2;
結(jié)果:
A 2015-01 33 33 33
A 2015-02 10 43 33
A 2015-03 38 81 38
B 2015-01 30 30 30
B 2015-02 15 45 30
B 2015-03 44 89 44