課堂目標
- 掌握sum(),avg()用于累計計算的窗口函數(shù)
- 掌握row_number(),rank()用于排序的窗口函數(shù)【最高頻※※※】
- 掌握ntile()用于分組查詢的窗口函數(shù)【很低頻】
- 掌握lag(),lead()偏移分析窗口函數(shù)【很低頻】
1.2018年每月的支付總額和當年累計支付總額
'''sum后面加需要累計求和的字段
over后面是按照月份的升序進行的【排到某月份前面的所有都進行匯總】'''
select a.month,
a.pay_amount,
sum(a.pay_amount) over(order by a.month)
from
(select month(dt) month,
sum(pay_amount) pay_amount
from user_trade
where year(dt)='2018'
group by month(dt)) a;

image.png
2.2017-2018年每月的支付總額和當年累計支付總額
- 先找出每年每月的支付金額
- 是按年分組的前提下進行月份升序求和的
partition by 起到分組的作用
order by 按照什么順序進行累加,默認升序,降序desc
select a.year,
a.month,
a.pay_amount,
sum(a.pay_amount) over(partition by a.year order by a.month)
from
(select year(dt) year,
month(dt) month,
sum(pay_amount) pay_amount
from user_trade
where year(dt) in (2017,2018)
group by year(dt),month(dt)) a;

image.png
3.2018年每個月的近三月移動平均支付金額
移動范圍:rows between 2 preceding and current row

image.png
select a.month,
a.pay_amount,
avg(a.pay_amount) over(order by a.month rows between 2 preceding and current row)
from
(select month(dt) month,
sum(pay_amount) pay_amount
from user_trade
where year(dt)='2018'
group by month(dt)) a;

image.png
方法總結(jié)

image.png
拓展:
max(......) over(parttition by ......order by ......rows between ......and......)
min(......) over(parttition by ......order by ......rows between ......and......)
row_number() over(......)
rank() over(......)
dense_rand() over(......)
1.2019年1月,用戶購買商品品類數(shù)量的排名
select user_name,
count(distinct goods_category),
row_number() over(order by count(distinct goods_category)),
rank() over(order by count(distinct goods_category)),
dense_rank() over(order by count(distinct goods_category))
from user_trade
where substr(dt,1,7)='2019-01'
group by user_name;

image.png
row_number:名額是固定的,挑選一定數(shù)量的人
rank:成績排名
dense_number:三種類型的獎牌,同樣分數(shù)獲得
2.選出2019年支付金額排名在第 10、20、30 名的用戶
select a.user_name,
a.pay_amount,
a.rank
from
(select user_name,
sum(pay_amount) pay_amount,
rank() over(order by sum(pay_amount) desc) rank
from user_trade
where year(dt)='2019'
group by user_name) a
where a.rank in (10,20,30);

image.png
分組窗口函數(shù)
ntile(n) over(partition by ... order by ...)
1.將2019年1月的支付用戶,按照支付金額分成5組
select user_name,
sum(pay_amount) pay_amount,
ntile(5) over(order by sum(pay_amount) desc)
from user_trade
where dt between'2019-01-01' and '2019-01-31'
group by user_name;
'''
dt between'2019-01-01' and '2019-01-31'
等價于
substr(dt,1,7)='2019-01'
'''

image.png
2.選出2019年退款金額排名前10%的用戶
select a.user_name,
a.refund_amount,
a.rank
from
(select user_name,
sum(refund_amount) refund_amount,
ntile(10) over(order by sum(refund_amount) desc) rank
from user_refund
where year(dt)='2019'
group by user_name) a
where a.rank=1;

image.png
偏移分析窗口函數(shù)
- 這一次和上一次,今天和明天,這個月和上個月等
lag(exp_str,offset,defval) over(partition by ... order by ...)【向前】
lead(exp_str,offset,defval) over(partition by ... order by ...)【向后】
exp_str:字段名稱
offset:偏移量(默認值是1)
defval:默認值,超出范圍
1.Alice和Alexander的各種時間偏移量
【場景】購買用戶購買的時間間隔分布
select user_name,
dt,
lag(dt,1,dt) over(partition by user_name order by dt),
lag(dt) over(partition by user_name order by dt),
lag(dt,2,dt) over(partition by user_name order by dt),
lag(dt,2) over(partition by user_name order by dt)
from user_trade
where dt>'0'
and user_name in ('Alice','Alexander');

image.png
2.支付時間間隔超過100天的用戶數(shù)【※】
select count(distinct user_name)
from
(select user_name,
dt,
lead(dt) over(partition by user_name order by dt) lead_dt
from user_trade
where dt>'0') a
where a.lead_dt is not null
and datediff(a.lead_dt,a.dt)>100;
'''結(jié)果:180'''
重點練習
1.每個城市,不同性別,2018年支付金額最高的TOP3用戶
【user_trade和user_info表】
'''
1.每個用戶的支付金額匯總
2.每個用戶的城市和性別(直接左連接user_info表)
3.每個城市不同性別的用戶的支付金額排名
4.找出TOP3
'''
select c.city,
c.sex,
c.user_name,
c.pay_amount,
c.rank
from
(select b.city,
b.sex,
a.user_name,
a.pay_amount,
row_number() over(partition by b.city,b.sex order by a.pay_amount desc) rank
from
(select user_name,
sum(pay_amount) pay_amount
from user_trade
where year(dt)='2018'
group by user_name) a
left join user_info b
on a.user_name=b.user_name
-- 這里不要 group by 了,因為在排序中已經(jīng)指定了
) c
where c.rank<=3;

image.png
2.2.每個手機品牌退款金額前25%的用戶【desc千萬不要忘記】【user_refund和user_info表】
ntile(4) over()
'''
自己做錯原因:
1.將user_info 表左連接user_trade表
而應該user_trade表左連接user_info 表
2.可以直接 left join user_info b 表示可以去全部字段
a.user_name=b.user_name
而當left join (select ... from ...) b 時,就一定要寫select user_name,否則就不能
a.user_name=b.user_name
3.get_json_object(extra1,'$.phonebrand') phonebrand
等價于extra2['phonebrand'] phonebrand
'''
'''
1.每個用戶的手機品牌
2.每個用戶的退款金額
'''
select c.user_name,
c.phonebrand,
c.refund_amount,
c.level
from
(select a.user_name,
b.phonebrand,
a.refund_amount,
ntile(4) over(partition by b.phonebrand order by a.refund_amount desc) level
from
(select user_name,
sum(refund_amount) refund_amount
from user_refund
group by user_name) a
left join
(select user_name,
get_json_object(extra1,'$.phonebrand') phonebrand
from user_info) b
on a.user_name=b.user_name) c
where c.level=1;

image.png
3.計算每12個月的用戶累計支付金額
select a.month,
a.pay_amount,
sum(a.pay_amount) over(order by a.month rows between 11 preceding and current row)
from
(select substr(dt,1,7) month,
sum(pay_amount) pay_amount
from user_trade
where dt>'0'
group by substr(dt,1,7)) a;

image.png
4.計算出每4個月的最大退款金額
select a.month,
a.refund_amount,
max(a.refund_amount) over(order by a.month rows between 3 preceding and current row)
from
(select substr(dt,1,7) month,
sum(refund_amount) refund_amount
from user_refund
where dt>'0'
group by substr(dt,1,7)) a;

image.png
5.退款時間間隔最長的用戶
- 用戶的退款時間間隔(天),然后order by 就好,真搞不懂當初為啥要用max()
select a.user_name,
datediff(a.lead_dt,a.dt) date_diff
from
(select user_name,
dt,
lead(dt) over(partition by user_name order by dt) lead_dt
from user_refund
where dt>'0') a
where a.lead_dt is not null;

image.png
- 退款時間間隔最長的用戶【錯誤】:要用max()必須要用 group by
select a.user_name,
max(a.date_diff)
from
(select a.user_name,
datediff(a.lead_dt,a.dt) date_diff
from
(select user_name,
dt,
lead(dt) over(partition by user_name order by dt) lead_dt
from user_refund
where dt>'0') a
where a.lead_dt is not null) a;
- 正確寫法:
select b.user_name,
b.date_diff
from
(select a.user_name,
datediff(a.lead_dt,a.dt) date_diff
from
(select user_name,
dt,
lead(dt) over(partition by user_name order by dt) lead_dt
from user_refund
where dt>'0') a
where a.lead_dt is not null) b
order by b.date_diff desc limit 1;

image.png