五期題目解答

題目一

image.png

導(dǎo)入excel數(shù)據(jù)至navicat,查看表信息如下:


image.png

求相機(jī)活躍用戶數(shù)、次日留存用戶數(shù)、三日留存用戶數(shù)、七日留存用戶數(shù)思路:
1.通過(guò)where篩選相機(jī)
2.通過(guò)groupby分組用戶id、日期,(日期格式需調(diào)整和需求一致)
3.用戶活躍率是前后日對(duì)比所得,所以需要將表命名兩個(gè)不同名字的表,當(dāng)做不同的集合,實(shí)現(xiàn)自連接
4.分組日期,聚合用戶人數(shù),求出每日活躍用戶數(shù)
5.case when 實(shí)現(xiàn)次日留存用戶數(shù)、三日留存用戶數(shù)、七日留存用戶數(shù)
6.case when 實(shí)現(xiàn)次日留存率、三日留存率、七日留存率

SQL查詢代碼
1.通過(guò)where篩選相機(jī)
SELECT * FROM sheet1
where app_name='相機(jī)';
2.通過(guò)groupby分組用戶id、日期(日期格式需調(diào)整和需求一致)
SELECT *, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno
image.png
3.用戶活躍率是前后日對(duì)比所得,所以需要將表命名兩個(gè)不同名字的表,當(dāng)做不同的集合,實(shí)現(xiàn)自連接
select *
from 
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a 

left join 

(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b

on a.uid =b.uid 
where a.day1<b.day2;
4.分組日期,聚合用戶人數(shù),求出每日活躍用戶數(shù)
select day1,count(distinct a.uid) as '活躍用戶數(shù)'
from 
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a 

left join 

(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b

on a.uid =b.uid and a.day1<b.day2

group by a.day1 ;
image.png
5.case when 實(shí)現(xiàn)次日留存用戶數(shù)、三日留存用戶數(shù)、七日留存用戶數(shù)
select day1,count(distinct a.uid) as '活躍用戶數(shù)',
sum(case when b.day2-a.day1 =1 then 1 else 0 end) as '次日留存用戶數(shù)',
sum(case when b.day2-a.day1 =3 then 1 else 0 end) as '三日留存用戶數(shù)',
sum(case when b.day2-a.day1 =7 then 1 else 0 end) as '七日留存用戶數(shù)'
from 
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a 

left join 

(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b

on a.uid =b.uid and a.day1<b.day2

group by a.day1 ;
image.png
6.case when 實(shí)現(xiàn)次日留存率、三日留存率、七日留存率
select day1,count(distinct a.uid) as '活躍用戶數(shù)',
sum(case when b.day2-a.day1 =1 then 1 else 0 end) as '次日留存用戶數(shù)',
sum(case when b.day2-a.day1 =3 then 1 else 0 end) as '三日留存用戶數(shù)',
sum(case when b.day2-a.day1 =7 then 1 else 0 end) as '七日留存用戶數(shù)',
concat(round((sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '次日留存率',
concat(round((sum(case when b.day2-a.day1 =3 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '三日留存率',
concat(round((sum(case when b.day2-a.day1 =7 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '七日留存率'
from 
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a 

left join 

(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b

on a.uid =b.uid and a.day1<b.day2

group by a.day1 ;
image.png

題目二

image.png
1、分別用Python、sql導(dǎo)入附件(相機(jī).xlsx)的數(shù)據(jù)進(jìn)入自己的Mysql當(dāng)中,要求字段符合、表名符合(可加后綴區(qū)分)。
SQL部分:

直接導(dǎo)入到navicat,根據(jù)導(dǎo)入向?qū)б来蜗乱徊讲僮?,主要操作如下?/p>

image.png

image.png

image.png
Python部分:

導(dǎo)入模塊:

import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine

讀取數(shù)據(jù):
df=pd.read_excel


image.png
2、分析pd.to_sql中的if_exist參數(shù)可用哪些,分別是什么,有什么細(xì)節(jié)區(qū)別?認(rèn)為用哪種更好
存儲(chǔ)到數(shù)據(jù)庫(kù)代碼:
engine = create_engine('mysql://root:mysql111@localhost:3306/test')
df.to_sql('sheet1',con = engine,if_exists='replace', index=False)

分別是fail, append, replace。

fail當(dāng)數(shù)據(jù)庫(kù)里存在同名表時(shí),什么也不做。append當(dāng)數(shù)據(jù)庫(kù)里存在同名表時(shí),把數(shù)據(jù)追加在表后面。replace會(huì)刪除原來(lái)的同名表,再新建表后插入數(shù)據(jù)。
具體哪種比較好要看情況,如果你的數(shù)據(jù)每天追加就用append。每天都更新,并且不需要保留之前數(shù)據(jù)就用replace。fail的話,可以避免覆蓋原有表格的失誤發(fā)生

3、總結(jié)MYSQL的date_format、concat的用法,并以此數(shù)據(jù)舉例說(shuō)明。

date_format可以將把一個(gè)日期轉(zhuǎn)換為各種樣式的字符串。格式為date_format(原日期格,新日期格式)
代碼示例與結(jié)果如下:

SELECT *, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno
image.png

concat拼接用法,例如上述,留存率需要以百分比表現(xiàn),拼接一個(gè)‘%’的用法如下:

select day1,count(distinct a.uid) as '活躍用戶數(shù)',
round(sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid),2) as '次日留存率',
round(sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid),2) as '三日留存率',
round(sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid),2) as '七日留存率',
concat(round((sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '次日留存率',
concat(round((sum(case when b.day2-a.day1 =3 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '三日留存率',
concat(round((sum(case when b.day2-a.day1 =7 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '七日留存率'
from 
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a 

left join 

(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b

on a.uid =b.uid and a.day1<b.day2

group by a.day1 ;
image.png
4、利用MYSQL語(yǔ)句解題,分步驟達(dá)到目的再一次性運(yùn)行,達(dá)到題目要求。

(1)求活躍度計(jì)算
(2)次日留存
(3)延伸 次日、三日、七日留存
解題思路與答案見(jiàn)題目一

5、用Python解題,要求函數(shù)封裝,利用調(diào)用函數(shù)計(jì)算

解題思路與答案借鑒Vayne老哥
作者:vayne233
鏈接:http://www.itdecent.cn/p/8a03ff5d99c8

import datetime
import pandas as pd
import numpy as np

def jisuan():
    # 讀取數(shù)據(jù)
    df=pd.read_excel(r'C:\Users\admin\Desktop\xiangji.xlsx',sheet_name='Sheet1')
    # 通過(guò)處理,得到后續(xù)所有計(jì)算所需的df2表
    df=df[df.app_name=='相機(jī)'].drop_duplicates(subset=['dayno','uid'])
    df=df[['uid','dayno']]
    df2=pd.merge(df,df,on='uid',how='left')
    df2=df2[df2.dayno_x<=df2.dayno_y]
    df2['difference']=df2.apply(lambda x: x[2]-x[1], axis=1)
    # 得到基礎(chǔ)活躍人數(shù)表d1,方便以后做連接
    d1=df2.groupby('dayno_x').uid.nunique().reset_index().rename(columns={'dayno_x':'日期','uid':'活躍用戶數(shù)'})
    d2=df2[df2['difference']==datetime.timedelta(1)].groupby('dayno_x').uid.count().reset_index().rename(columns={'dayno_x':'日期','uid':'次日留存用戶數(shù)'})
    d1=pd.merge(d1,d2,how='left')
    d2=df2[df2['difference']==datetime.timedelta(2)].groupby('dayno_x').uid.count().reset_index().rename(columns={'dayno_x':'日期','uid':'三日留存用戶數(shù)'})
    d1=pd.merge(d1,d2,how='left')
    d2=df2[df2['difference']==datetime.timedelta(6)].groupby('dayno_x').uid.count().reset_index().rename(columns={'dayno_x':'日期','uid':'七日留存用戶數(shù)'})
    # 填充缺失值
    d1=pd.merge(d1,d2,how='left').fillna(0)
    # 浮點(diǎn)數(shù)轉(zhuǎn)為整數(shù)
    d1[['次日留存用戶數(shù)','三日留存用戶數(shù)','七日留存用戶數(shù)']]=d1[['次日留存用戶數(shù)','三日留存用戶數(shù)','七日留存用戶數(shù)']].astype(int)
    d1['次日留存率']=d1['次日留存用戶數(shù)']/d1['活躍用戶數(shù)']
    d1['三日留存率']=d1['三日留存用戶數(shù)']/d1['活躍用戶數(shù)']
    d1['七日留存率']=d1['七日留存用戶數(shù)']/d1['活躍用戶數(shù)']
    # 浮點(diǎn)數(shù)轉(zhuǎn)為百分比
    d1['次日留存率']=d1['次日留存率'].apply(lambda x: format(x,'.2%'))
    d1['三日留存率']=d1['三日留存率'].apply(lambda x: format(x,'.2%'))
    d1['七日留存率']=d1['七日留存率'].apply(lambda x: format(x,'.2%'))
    return d1
image.png
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容