相關(guān)精彩專題鏈接: 數(shù)據(jù)成就更好的你
一、項(xiàng)目需求
本案例的數(shù)據(jù)為小程序運(yùn)營(yíng)數(shù)據(jù),以行業(yè)常見(jiàn)指標(biāo)對(duì)用戶行為進(jìn)行分析,包括UV、PV、新增用戶分析、留存分析、復(fù)購(gòu)分析等內(nèi)容。
項(xiàng)目需求如下:
1.日訪問(wèn)量分析,并觀察其走勢(shì)
2.不同行為類型的訪問(wèn)量分析
3.一天中不同時(shí)間段的訪問(wèn)量分析(時(shí)間段按小時(shí)劃分)
4.每日新增用戶情況分析
5.用戶留存分析
6.復(fù)購(gòu)分析
7.商品排行榜分析
8.利用sqoop將數(shù)據(jù)分析結(jié)果導(dǎo)入mysql存儲(chǔ)
二、數(shù)據(jù)介紹
1.用戶行為信息表

2.查看具體的數(shù)據(jù)格式
a.用戶信息:head -n 3 behavior.txt

b.去除首行,首行為標(biāo)題行,hive導(dǎo)入數(shù)據(jù)時(shí)不需要此行:
sed -i "1d" behavior.txt
三、創(chuàng)建表
創(chuàng)建用戶行為表(需結(jié)合數(shù)據(jù)格式)
#創(chuàng)建用戶行為表
create table if not exists behavior(
user_id int comment "用戶id",
goods_id int comment "商品id",
cat int comment "商品類別id",
behavior string comment "用戶行為",
time1 Timestamp comment "訪問(wèn)時(shí)間",
addr string comment "地域",
price double comment "單價(jià)",
amount double comment "數(shù)量")
row format delimited
fields terminated by "\t"
lines terminated by "\n";
#向表中導(dǎo)入數(shù)據(jù)
load data local inpath "/home/software/hive/data/behavior.txt" into table behavior;
四、用戶行為分析:pv/uv
1.日訪問(wèn)量分析,并觀察其走勢(shì)
select date(time1) date1,count(1) pv, count(distinct(user_id)) uv from behavior group by date(time1);

2.不同行為類型的訪問(wèn)量分析
select
date(time1) date1,
sum(if(behavior="buy",1,0)) buy_pv, count(distinct(if(behavior="buy",user_id,null))) buy_uv,
sum(if(behavior="cart",1,0)) cart_pv, count(distinct(if(behavior="cart",user_id,null))) cart_uv,
sum(if(behavior="fav",1,0)) fav_pv, count(distinct(if(behavior="fav",user_id,null))) fav_uv,
sum(if(behavior="pv",1,0)) view_pv, count(distinct(if(behavior="pv",user_id,null))) view_uv
from behavior group by date(time1);

3.一天中不同時(shí)間段的訪問(wèn)量分析(時(shí)間段按小時(shí)劃分)
select date(time1) date1,hour(time1) hour1,count(1) pv,count(distinct(user_id)) uv from behavior group by date(time1),hour(time1);

五、獲客分析
獲客分析:觀察每日新增用戶情況。新用戶的定義:第一次訪問(wèn)網(wǎng)站
select a.date1,count(1) new_visitor
from (select user_id,min(date(time1)) date1 from behavior group by user_id)a
group by a.date1;

六、用戶留存分析
留存定義:
1月1日,新增用戶200人;
次日留存:第2天,1月2日,這200人里面有100人活躍,則次日留存率為:100 / 200 = 50%
2日留存:第3天,1月3日,這200名新增用戶里面有80人活躍, 第3日新增留存率為:80/200 = 40%; 以此類推
#計(jì)算次日留存和3日留存
select
a.date1,
count(distinct(a.user_id)) new_user_num,
count(distinct(if(abs(datediff(a.date1,date(b.time1)))=1,b.user_id,null))) retention_one_num,
concat(round(count(distinct(if(abs(datediff(a.date1,date(b.time1)))=1,b.user_id,null)))*100/count(distinct(a.user_id)),2),"%") retention_one_rate,
count(distinct(if(abs(datediff(a.date1,date(b.time1)))=3,b.user_id,null))) retention_three_num,
concat(round(count(distinct(if(abs(datediff(a.date1,date(b.time1)))=3,b.user_id,null)))*100/count(distinct(a.user_id)),2),"%") retention_three_rate
from
(select user_id,min(date(time1)) date1 from behavior group by user_id)a
left join behavior b
on a.user_id=b.user_id
group by a.date1
留存分析結(jié)果如下:
例:2019-11-28日的新增7610個(gè)用戶,次日這些新增用戶有6026個(gè)再次訪問(wèn)網(wǎng)頁(yè),留存率為79.19%,第4天,有5980個(gè)用戶再次訪問(wèn),留存率為78.58%

七、復(fù)購(gòu)分析
指在單位時(shí)間段內(nèi),重復(fù)購(gòu)買(mǎi)率=再次購(gòu)買(mǎi)人數(shù)/總購(gòu)買(mǎi)人數(shù)。
例如在一個(gè)月內(nèi),有100個(gè)客戶成交,其中有20個(gè)是回頭客,則重復(fù)購(gòu)買(mǎi)率為20%。
此處的回頭客定義為:按天去重,即一個(gè)客戶一天產(chǎn)生多筆交易付款,則算一次購(gòu)買(mǎi),除非在統(tǒng)計(jì)周期內(nèi)另外一天也有購(gòu)買(mǎi)的客戶才是回頭客。
1.用戶的購(gòu)買(mǎi)次數(shù)統(tǒng)計(jì)
create table user_buy as
select m.user_id as user_id,count(1) as num
from
(select user_id,date(time1) date1 from behavior where behavior="buy" group by user_id,date(time1))m
group by m.user_id
2.復(fù)購(gòu)率計(jì)算
select concat(round(count(if(num>1,user_id,null))*100/count(1),2),"%") rebuy_rate from user_buy
#本案列的復(fù)購(gòu)率為42.06%

八、商品排行榜信息
1.商品的銷售數(shù)量top10,排名需考慮并列排名的情況
select * from
(select a.goods_id as goods_id,a.sale_amount as sale_amount,dense_rank() over(order by a.sale_amount desc ) as rank1
from
(select goods_id,sum(amount) sale_amount from behavior where behavior="buy" group by goods_id)a
)b
where b.rank1<=10;

2.商品的瀏覽次數(shù)top10,排名需考慮并列排名的情況
select * from
(select a.goods_id as goods_id,a.pv as pv,dense_rank() over(order by a.pv desc ) as rank1
from
(select goods_id,count(1) pv from behavior where behavior="pv" group by goods_id)a
)b
where b.rank1<=10;

3.商品的收藏次數(shù)top10,排名需考慮并列排名的情況
select * from
(select a.goods_id as goods_id,a.fav as fav,dense_rank() over(order by a.fav desc ) as rank1
from
(select goods_id,count(1) fav from behavior where behavior="fav" group by goods_id)a
)b
where b.rank1<=10;

4.城市購(gòu)買(mǎi)力排名
create table city_rank as
select addr as city ,round(sum(price*amount),2) as money from behavior group by addr
order by money desc;
select * from city_rank;

九、利用sqoop將數(shù)據(jù)分析結(jié)果導(dǎo)入mysql存儲(chǔ)
1.在mysql創(chuàng)建一張表,字段類型、順序都和hive中的表一樣
create table city_rank (city varchar(20), money double);
2.測(cè)試sqoop連接mysql是否成功
sqoop list-tables --connect jdbc:mysql://localhost:3306/ljh --username root --password root
#結(jié)果可以看到ljh數(shù)據(jù)庫(kù)下的表信息,所以連接成功

3.利用sqoop將數(shù)據(jù)分析結(jié)果導(dǎo)入mysql存儲(chǔ)
sqoop export --connect jdbc:mysql://localhost:3306/ljh --username root --password root --table city_rank --fields-terminated-by '\001' --export-dir '/user/hive/warehouse/ljh.db/city_rank';
4.mysql中查詢導(dǎo)入結(jié)果,看結(jié)果是否正確
