HIVE大數(shù)據(jù)實(shí)戰(zhàn)項(xiàng)目---用戶行為分析

相關(guān)精彩專題鏈接: 數(shù)據(jù)成就更好的你

一、項(xiàng)目需求
本案例的數(shù)據(jù)為小程序運(yùn)營(yíng)數(shù)據(jù),以行業(yè)常見(jiàn)指標(biāo)對(duì)用戶行為進(jìn)行分析,包括UV、PV、新增用戶分析、留存分析、復(fù)購(gòu)分析等內(nèi)容。

項(xiàng)目需求如下:
1.日訪問(wèn)量分析,并觀察其走勢(shì)
2.不同行為類型的訪問(wèn)量分析
3.一天中不同時(shí)間段的訪問(wèn)量分析(時(shí)間段按小時(shí)劃分)
4.每日新增用戶情況分析
5.用戶留存分析
6.復(fù)購(gòu)分析
7.商品排行榜分析
8.利用sqoop將數(shù)據(jù)分析結(jié)果導(dǎo)入mysql存儲(chǔ)

二、數(shù)據(jù)介紹
1.用戶行為信息表


image.png

2.查看具體的數(shù)據(jù)格式
a.用戶信息:head -n 3 behavior.txt


image.png

b.去除首行,首行為標(biāo)題行,hive導(dǎo)入數(shù)據(jù)時(shí)不需要此行:
sed -i "1d" behavior.txt

三、創(chuàng)建表
創(chuàng)建用戶行為表(需結(jié)合數(shù)據(jù)格式)

#創(chuàng)建用戶行為表
create table if not exists  behavior(
user_id int comment "用戶id",
goods_id int comment "商品id",
cat int comment "商品類別id",
behavior string comment "用戶行為",
time1 Timestamp  comment "訪問(wèn)時(shí)間",
addr string comment "地域",
price double comment "單價(jià)",
amount double  comment "數(shù)量")
row format delimited 
fields terminated by "\t"
lines terminated by "\n";

#向表中導(dǎo)入數(shù)據(jù)
load data local inpath "/home/software/hive/data/behavior.txt" into table behavior;

四、用戶行為分析:pv/uv
1.日訪問(wèn)量分析,并觀察其走勢(shì)

 select date(time1) date1,count(1) pv, count(distinct(user_id)) uv from behavior group by date(time1);
image.png

2.不同行為類型的訪問(wèn)量分析

select 
    date(time1) date1,
    sum(if(behavior="buy",1,0)) buy_pv, count(distinct(if(behavior="buy",user_id,null))) buy_uv,
    sum(if(behavior="cart",1,0)) cart_pv, count(distinct(if(behavior="cart",user_id,null))) cart_uv,
    sum(if(behavior="fav",1,0)) fav_pv, count(distinct(if(behavior="fav",user_id,null))) fav_uv,
    sum(if(behavior="pv",1,0)) view_pv, count(distinct(if(behavior="pv",user_id,null))) view_uv
from behavior group by date(time1);
image.png

3.一天中不同時(shí)間段的訪問(wèn)量分析(時(shí)間段按小時(shí)劃分)

select date(time1) date1,hour(time1) hour1,count(1) pv,count(distinct(user_id)) uv from behavior group by date(time1),hour(time1);
image.png

五、獲客分析
獲客分析:觀察每日新增用戶情況。新用戶的定義:第一次訪問(wèn)網(wǎng)站

select a.date1,count(1) new_visitor 
from (select user_id,min(date(time1)) date1 from behavior group by user_id)a
group by a.date1;
image.png

六、用戶留存分析
留存定義:
1月1日,新增用戶200人;
次日留存:第2天,1月2日,這200人里面有100人活躍,則次日留存率為:100 / 200 = 50%
2日留存:第3天,1月3日,這200名新增用戶里面有80人活躍, 第3日新增留存率為:80/200 = 40%; 以此類推

#計(jì)算次日留存和3日留存
select 
    a.date1,
    count(distinct(a.user_id)) new_user_num,
    count(distinct(if(abs(datediff(a.date1,date(b.time1)))=1,b.user_id,null))) retention_one_num,
    concat(round(count(distinct(if(abs(datediff(a.date1,date(b.time1)))=1,b.user_id,null)))*100/count(distinct(a.user_id)),2),"%") retention_one_rate,
    count(distinct(if(abs(datediff(a.date1,date(b.time1)))=3,b.user_id,null))) retention_three_num,
    concat(round(count(distinct(if(abs(datediff(a.date1,date(b.time1)))=3,b.user_id,null)))*100/count(distinct(a.user_id)),2),"%") retention_three_rate
from 
   (select user_id,min(date(time1)) date1 from behavior group by user_id)a
left join behavior b
on a.user_id=b.user_id
group by a.date1

留存分析結(jié)果如下:
例:2019-11-28日的新增7610個(gè)用戶,次日這些新增用戶有6026個(gè)再次訪問(wèn)網(wǎng)頁(yè),留存率為79.19%,第4天,有5980個(gè)用戶再次訪問(wèn),留存率為78.58%


image.png

七、復(fù)購(gòu)分析
指在單位時(shí)間段內(nèi),重復(fù)購(gòu)買(mǎi)率=再次購(gòu)買(mǎi)人數(shù)/總購(gòu)買(mǎi)人數(shù)。
例如在一個(gè)月內(nèi),有100個(gè)客戶成交,其中有20個(gè)是回頭客,則重復(fù)購(gòu)買(mǎi)率為20%。
此處的回頭客定義為:按天去重,即一個(gè)客戶一天產(chǎn)生多筆交易付款,則算一次購(gòu)買(mǎi),除非在統(tǒng)計(jì)周期內(nèi)另外一天也有購(gòu)買(mǎi)的客戶才是回頭客。

1.用戶的購(gòu)買(mǎi)次數(shù)統(tǒng)計(jì)

create table user_buy as
select m.user_id as user_id,count(1) as num
from
   (select user_id,date(time1) date1 from behavior where behavior="buy" group by user_id,date(time1))m
group by m.user_id

2.復(fù)購(gòu)率計(jì)算

select concat(round(count(if(num>1,user_id,null))*100/count(1),2),"%") rebuy_rate from user_buy
#本案列的復(fù)購(gòu)率為42.06%
image.png

八、商品排行榜信息
1.商品的銷售數(shù)量top10,排名需考慮并列排名的情況

select * from
   (select a.goods_id as goods_id,a.sale_amount as sale_amount,dense_rank() over(order by a.sale_amount desc ) as rank1
   from 
      (select goods_id,sum(amount) sale_amount from behavior where behavior="buy" group by goods_id)a
   )b
where b.rank1<=10;
image.png

2.商品的瀏覽次數(shù)top10,排名需考慮并列排名的情況

select * from
    (select a.goods_id as goods_id,a.pv as pv,dense_rank() over(order by a.pv desc ) as rank1
    from 
       (select goods_id,count(1) pv from behavior where behavior="pv" group by goods_id)a
    )b
where b.rank1<=10;
image.png

3.商品的收藏次數(shù)top10,排名需考慮并列排名的情況

select * from
    (select a.goods_id as goods_id,a.fav as fav,dense_rank() over(order by a.fav desc ) as rank1
    from 
       (select goods_id,count(1) fav from behavior where behavior="fav" group by goods_id)a
    )b
where b.rank1<=10;
image.png

4.城市購(gòu)買(mǎi)力排名

create table city_rank as
select addr as city ,round(sum(price*amount),2) as money from behavior group by addr 
order by money desc;

select * from city_rank;
image.png

九、利用sqoop將數(shù)據(jù)分析結(jié)果導(dǎo)入mysql存儲(chǔ)

1.在mysql創(chuàng)建一張表,字段類型、順序都和hive中的表一樣

create table city_rank (city varchar(20), money double);

2.測(cè)試sqoop連接mysql是否成功

sqoop list-tables --connect jdbc:mysql://localhost:3306/ljh --username root --password root
#結(jié)果可以看到ljh數(shù)據(jù)庫(kù)下的表信息,所以連接成功
image.png

3.利用sqoop將數(shù)據(jù)分析結(jié)果導(dǎo)入mysql存儲(chǔ)

sqoop export --connect jdbc:mysql://localhost:3306/ljh --username root --password root --table city_rank --fields-terminated-by '\001' --export-dir '/user/hive/warehouse/ljh.db/city_rank';

4.mysql中查詢導(dǎo)入結(jié)果,看結(jié)果是否正確


image.png
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。
禁止轉(zhuǎn)載,如需轉(zhuǎn)載請(qǐng)通過(guò)簡(jiǎn)信或評(píng)論聯(lián)系作者。

友情鏈接更多精彩內(nèi)容