窗口函數(shù) over ,rank

1)相關(guān)函數(shù)說明

OVER():指定分析函數(shù)工作的數(shù)據(jù)窗口大小,這個數(shù)據(jù)窗口大小可能會隨著行的變而變化。
CURRENT ROW:當(dāng)前行
n PRECEDING:往前 n 行數(shù)據(jù)
n FOLLOWING:往后 n 行數(shù)據(jù)
UNBOUNDED:起點,
UNBOUNDED PRECEDING 表示從前面的起點,
UNBOUNDED FOLLOWING 表示到后面的終點
LAG(col,n,default_val):往前第 n 行數(shù)據(jù)
LEAD(col,n, default_val):往后第 n 行數(shù)據(jù)
NTILE(n):把有序窗口的行分發(fā)到指定數(shù)據(jù)的組中,各個組有編號,編號從 1 開始,對
于每一行,NTILE 返回此行所屬的組的編號。注意:n 必須為 int 類型。

2)數(shù)據(jù)準(zhǔn)備:name,orderdate,cost

jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94

3)需求

(1)查詢在 2017 年 4 月份購買過的顧客及總?cè)藬?shù)
(2)查詢顧客的購買明細(xì)及月購買總額
(3)上述的場景, 將每個顧客的 cost 按照日期進(jìn)行累加
(4)查詢每個顧客上次的購買時間
(5)查詢前 20%時間的訂單信息

4)創(chuàng)建本地 business.txt,導(dǎo)入數(shù)據(jù)

[atguigu@hadoop102 datas]$ vi business.txt

5)創(chuàng)建 hive 表并導(dǎo)入數(shù)據(jù)

create table business(
name string,
orderdate string,
cost int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath "/opt/module/data/business.txt" into table 
business;

6)按需求查詢數(shù)據(jù)

(1) 查詢在 2017 年 4 月份購買過的顧客及總?cè)藬?shù)

select name,count(*) over () 
from business
where substring(orderdate,1,7) = '2017-04'
group by name;

(2) 查詢顧客的購買明細(xì)及月購買總額

select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) 
from business;

(3) 將每個顧客的 cost 按照日期進(jìn)行累加

select name,orderdate,cost,
sum(cost) over() as sample1,--所有行相加
sum(cost) over(partition by name) as sample2,--按 name 分組,組內(nèi)數(shù)據(jù)相加
sum(cost) over(partition by name order by orderdate) as sample3,--按 name
分組,組內(nèi)數(shù)據(jù)累加
sum(cost) over(partition by name order by orderdate rows between 
UNBOUNDED PRECEDING and current row ) as sample4 ,--和 sample3 一樣,由起點到
當(dāng)前行的聚合
sum(cost) over(partition by name order by orderdate rows between 1 
PRECEDING and current row) as sample5, --當(dāng)前行和前面一行做聚合
sum(cost) over(partition by name order by orderdate rows between 1 
PRECEDING AND 1 FOLLOWING ) as sample6,--當(dāng)前行和前邊一行及后面一行
sum(cost) over(partition by name order by orderdate rows between current 
row and UNBOUNDED FOLLOWING ) as sample7 --當(dāng)前行及后面所有行
from business;

rows 必須跟在 order by 子句之后,對排序的結(jié)果進(jìn)行限制,使用固定的行數(shù)來限制分
區(qū)中的數(shù)據(jù)行數(shù)量

(4) 查看顧客上次的購買時間

select name,orderdate,cost,
lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate ) 
as time1, lag(orderdate,2) over (partition by name order by orderdate) as 
time2 
from business;

(5)查詢前 20%時間的訂單信息

select * from (
 select name,orderdate,cost, ntile(5) over(order by orderdate) sorted
 from business
) t
where sorted = 1;

Rank

1)函數(shù)說明

RANK() 排序相同時會重復(fù),總數(shù)不會變
DENSE_RANK() 排序相同時會重復(fù),總數(shù)會減少
ROW_NUMBER() 會根據(jù)順序計算

2)數(shù)據(jù)準(zhǔn)備

測試數(shù)據(jù)

3)需求

計算每門學(xué)科成績排名。

4)創(chuàng)建本地 score.txt,導(dǎo)入數(shù)據(jù)

[atguigu@hadoop102 datas]$ vi score.txt

5)創(chuàng)建 hive 表并導(dǎo)入數(shù)據(jù)

create table score(
name string,
subject string, 
score int) 
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/data/score.txt' into table score;

6)按需求查詢數(shù)據(jù)

select name,
subject,
score,
rank() over(partition by subject order by score desc) rp,
dense_rank() over(partition by subject order by score desc) drp,
row_number() over(partition by subject order by score desc) rmp
from score;

name subject score rp drp rmp
孫悟空 數(shù)學(xué) 95 1 1 1
宋宋 數(shù)學(xué) 86 2 2 2
婷婷 數(shù)學(xué) 85 3 3 3
大海 數(shù)學(xué) 56 4 4 4
宋宋 英語 84 1 1 1
大海 英語 84 1 1 2
婷婷 英語 78 3 2 3
孫悟空 英語 68 4 3 4
大海 語文 94 1 1 1
孫悟空 語文 87 2 2 2
婷婷 語文 65 3 3 3
宋宋 語文 64 4 4 4
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容