1)相關(guān)函數(shù)說明
OVER():指定分析函數(shù)工作的數(shù)據(jù)窗口大小,這個數(shù)據(jù)窗口大小可能會隨著行的變而變化。
CURRENT ROW:當(dāng)前行
n PRECEDING:往前 n 行數(shù)據(jù)
n FOLLOWING:往后 n 行數(shù)據(jù)
UNBOUNDED:起點,
UNBOUNDED PRECEDING 表示從前面的起點,
UNBOUNDED FOLLOWING 表示到后面的終點
LAG(col,n,default_val):往前第 n 行數(shù)據(jù)
LEAD(col,n, default_val):往后第 n 行數(shù)據(jù)
NTILE(n):把有序窗口的行分發(fā)到指定數(shù)據(jù)的組中,各個組有編號,編號從 1 開始,對
于每一行,NTILE 返回此行所屬的組的編號。注意:n 必須為 int 類型。
2)數(shù)據(jù)準(zhǔn)備:name,orderdate,cost
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
3)需求
(1)查詢在 2017 年 4 月份購買過的顧客及總?cè)藬?shù)
(2)查詢顧客的購買明細(xì)及月購買總額
(3)上述的場景, 將每個顧客的 cost 按照日期進(jìn)行累加
(4)查詢每個顧客上次的購買時間
(5)查詢前 20%時間的訂單信息
4)創(chuàng)建本地 business.txt,導(dǎo)入數(shù)據(jù)
[atguigu@hadoop102 datas]$ vi business.txt
5)創(chuàng)建 hive 表并導(dǎo)入數(shù)據(jù)
create table business(
name string,
orderdate string,
cost int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath "/opt/module/data/business.txt" into table
business;
6)按需求查詢數(shù)據(jù)
(1) 查詢在 2017 年 4 月份購買過的顧客及總?cè)藬?shù)
select name,count(*) over ()
from business
where substring(orderdate,1,7) = '2017-04'
group by name;
(2) 查詢顧客的購買明細(xì)及月購買總額
select name,orderdate,cost,sum(cost) over(partition by month(orderdate))
from business;
(3) 將每個顧客的 cost 按照日期進(jìn)行累加
select name,orderdate,cost,
sum(cost) over() as sample1,--所有行相加
sum(cost) over(partition by name) as sample2,--按 name 分組,組內(nèi)數(shù)據(jù)相加
sum(cost) over(partition by name order by orderdate) as sample3,--按 name
分組,組內(nèi)數(shù)據(jù)累加
sum(cost) over(partition by name order by orderdate rows between
UNBOUNDED PRECEDING and current row ) as sample4 ,--和 sample3 一樣,由起點到
當(dāng)前行的聚合
sum(cost) over(partition by name order by orderdate rows between 1
PRECEDING and current row) as sample5, --當(dāng)前行和前面一行做聚合
sum(cost) over(partition by name order by orderdate rows between 1
PRECEDING AND 1 FOLLOWING ) as sample6,--當(dāng)前行和前邊一行及后面一行
sum(cost) over(partition by name order by orderdate rows between current
row and UNBOUNDED FOLLOWING ) as sample7 --當(dāng)前行及后面所有行
from business;
rows 必須跟在 order by 子句之后,對排序的結(jié)果進(jìn)行限制,使用固定的行數(shù)來限制分
區(qū)中的數(shù)據(jù)行數(shù)量
(4) 查看顧客上次的購買時間
select name,orderdate,cost,
lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate )
as time1, lag(orderdate,2) over (partition by name order by orderdate) as
time2
from business;
(5)查詢前 20%時間的訂單信息
select * from (
select name,orderdate,cost, ntile(5) over(order by orderdate) sorted
from business
) t
where sorted = 1;
Rank
1)函數(shù)說明
RANK() 排序相同時會重復(fù),總數(shù)不會變
DENSE_RANK() 排序相同時會重復(fù),總數(shù)會減少
ROW_NUMBER() 會根據(jù)順序計算
2)數(shù)據(jù)準(zhǔn)備

3)需求
計算每門學(xué)科成績排名。
4)創(chuàng)建本地 score.txt,導(dǎo)入數(shù)據(jù)
[atguigu@hadoop102 datas]$ vi score.txt
5)創(chuàng)建 hive 表并導(dǎo)入數(shù)據(jù)
create table score(
name string,
subject string,
score int)
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/data/score.txt' into table score;
6)按需求查詢數(shù)據(jù)
select name,
subject,
score,
rank() over(partition by subject order by score desc) rp,
dense_rank() over(partition by subject order by score desc) drp,
row_number() over(partition by subject order by score desc) rmp
from score;
name subject score rp drp rmp
孫悟空 數(shù)學(xué) 95 1 1 1
宋宋 數(shù)學(xué) 86 2 2 2
婷婷 數(shù)學(xué) 85 3 3 3
大海 數(shù)學(xué) 56 4 4 4
宋宋 英語 84 1 1 1
大海 英語 84 1 1 2
婷婷 英語 78 3 2 3
孫悟空 英語 68 4 3 4
大海 語文 94 1 1 1
孫悟空 語文 87 2 2 2
婷婷 語文 65 3 3 3
宋宋 語文 64 4 4 4