題目一

導(dǎo)入excel數(shù)據(jù)至navicat,查看表信息如下:

求相機(jī)活躍用戶數(shù)、次日留存用戶數(shù)、三日留存用戶數(shù)、七日留存用戶數(shù)思路:
1.通過(guò)where篩選相機(jī)
2.通過(guò)groupby分組用戶id、日期,(日期格式需調(diào)整和需求一致)
3.用戶活躍率是前后日對(duì)比所得,所以需要將表命名兩個(gè)不同名字的表,當(dāng)做不同的集合,實(shí)現(xiàn)自連接
4.分組日期,聚合用戶人數(shù),求出每日活躍用戶數(shù)
5.case when 實(shí)現(xiàn)次日留存用戶數(shù)、三日留存用戶數(shù)、七日留存用戶數(shù)
6.case when 實(shí)現(xiàn)次日留存率、三日留存率、七日留存率
SQL查詢代碼
1.通過(guò)where篩選相機(jī)
SELECT * FROM sheet1
where app_name='相機(jī)';
2.通過(guò)groupby分組用戶id、日期(日期格式需調(diào)整和需求一致)
SELECT *, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno

3.用戶活躍率是前后日對(duì)比所得,所以需要將表命名兩個(gè)不同名字的表,當(dāng)做不同的集合,實(shí)現(xiàn)自連接
select *
from
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a
left join
(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b
on a.uid =b.uid
where a.day1<b.day2;
4.分組日期,聚合用戶人數(shù),求出每日活躍用戶數(shù)
select day1,count(distinct a.uid) as '活躍用戶數(shù)'
from
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a
left join
(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b
on a.uid =b.uid and a.day1<b.day2
group by a.day1 ;

5.case when 實(shí)現(xiàn)次日留存用戶數(shù)、三日留存用戶數(shù)、七日留存用戶數(shù)
select day1,count(distinct a.uid) as '活躍用戶數(shù)',
sum(case when b.day2-a.day1 =1 then 1 else 0 end) as '次日留存用戶數(shù)',
sum(case when b.day2-a.day1 =3 then 1 else 0 end) as '三日留存用戶數(shù)',
sum(case when b.day2-a.day1 =7 then 1 else 0 end) as '七日留存用戶數(shù)'
from
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a
left join
(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b
on a.uid =b.uid and a.day1<b.day2
group by a.day1 ;

6.case when 實(shí)現(xiàn)次日留存率、三日留存率、七日留存率
select day1,count(distinct a.uid) as '活躍用戶數(shù)',
sum(case when b.day2-a.day1 =1 then 1 else 0 end) as '次日留存用戶數(shù)',
sum(case when b.day2-a.day1 =3 then 1 else 0 end) as '三日留存用戶數(shù)',
sum(case when b.day2-a.day1 =7 then 1 else 0 end) as '七日留存用戶數(shù)',
concat(round((sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '次日留存率',
concat(round((sum(case when b.day2-a.day1 =3 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '三日留存率',
concat(round((sum(case when b.day2-a.day1 =7 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '七日留存率'
from
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a
left join
(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b
on a.uid =b.uid and a.day1<b.day2
group by a.day1 ;

題目二

1、分別用Python、sql導(dǎo)入附件(相機(jī).xlsx)的數(shù)據(jù)進(jìn)入自己的Mysql當(dāng)中,要求字段符合、表名符合(可加后綴區(qū)分)。
SQL部分:
直接導(dǎo)入到navicat,根據(jù)導(dǎo)入向?qū)б来蜗乱徊讲僮?,主要操作如下?/p>



Python部分:
導(dǎo)入模塊:
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
讀取數(shù)據(jù):
df=pd.read_excel

2、分析pd.to_sql中的if_exist參數(shù)可用哪些,分別是什么,有什么細(xì)節(jié)區(qū)別?認(rèn)為用哪種更好
存儲(chǔ)到數(shù)據(jù)庫(kù)代碼:
engine = create_engine('mysql://root:mysql111@localhost:3306/test')
df.to_sql('sheet1',con = engine,if_exists='replace', index=False)
分別是fail, append, replace。
fail當(dāng)數(shù)據(jù)庫(kù)里存在同名表時(shí),什么也不做。append當(dāng)數(shù)據(jù)庫(kù)里存在同名表時(shí),把數(shù)據(jù)追加在表后面。replace會(huì)刪除原來(lái)的同名表,再新建表后插入數(shù)據(jù)。
具體哪種比較好要看情況,如果你的數(shù)據(jù)每天追加就用append。每天都更新,并且不需要保留之前數(shù)據(jù)就用replace。fail的話,可以避免覆蓋原有表格的失誤發(fā)生
3、總結(jié)MYSQL的date_format、concat的用法,并以此數(shù)據(jù)舉例說(shuō)明。
date_format可以將把一個(gè)日期轉(zhuǎn)換為各種樣式的字符串。格式為date_format(原日期格,新日期格式)
代碼示例與結(jié)果如下:
SELECT *, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno

concat拼接用法,例如上述,留存率需要以百分比表現(xiàn),拼接一個(gè)‘%’的用法如下:
select day1,count(distinct a.uid) as '活躍用戶數(shù)',
round(sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid),2) as '次日留存率',
round(sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid),2) as '三日留存率',
round(sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid),2) as '七日留存率',
concat(round((sum(case when b.day2-a.day1 =1 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '次日留存率',
concat(round((sum(case when b.day2-a.day1 =3 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '三日留存率',
concat(round((sum(case when b.day2-a.day1 =7 then 1 else 0 end)/count(distinct a.uid))*100,2),'%') as '七日留存率'
from
(SELECT uid, date_format(dayno, '%Y%m%d') as day1
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as a
left join
(SELECT uid, date_format(dayno, '%Y%m%d') as day2
FROM sheet1
where app_name='相機(jī)'
group by uid,dayno) as b
on a.uid =b.uid and a.day1<b.day2
group by a.day1 ;

4、利用MYSQL語(yǔ)句解題,分步驟達(dá)到目的再一次性運(yùn)行,達(dá)到題目要求。
(1)求活躍度計(jì)算
(2)次日留存
(3)延伸 次日、三日、七日留存
解題思路與答案見(jiàn)題目一
5、用Python解題,要求函數(shù)封裝,利用調(diào)用函數(shù)計(jì)算
解題思路與答案借鑒Vayne老哥
作者:vayne233
鏈接:http://www.itdecent.cn/p/8a03ff5d99c8
import datetime
import pandas as pd
import numpy as np
def jisuan():
# 讀取數(shù)據(jù)
df=pd.read_excel(r'C:\Users\admin\Desktop\xiangji.xlsx',sheet_name='Sheet1')
# 通過(guò)處理,得到后續(xù)所有計(jì)算所需的df2表
df=df[df.app_name=='相機(jī)'].drop_duplicates(subset=['dayno','uid'])
df=df[['uid','dayno']]
df2=pd.merge(df,df,on='uid',how='left')
df2=df2[df2.dayno_x<=df2.dayno_y]
df2['difference']=df2.apply(lambda x: x[2]-x[1], axis=1)
# 得到基礎(chǔ)活躍人數(shù)表d1,方便以后做連接
d1=df2.groupby('dayno_x').uid.nunique().reset_index().rename(columns={'dayno_x':'日期','uid':'活躍用戶數(shù)'})
d2=df2[df2['difference']==datetime.timedelta(1)].groupby('dayno_x').uid.count().reset_index().rename(columns={'dayno_x':'日期','uid':'次日留存用戶數(shù)'})
d1=pd.merge(d1,d2,how='left')
d2=df2[df2['difference']==datetime.timedelta(2)].groupby('dayno_x').uid.count().reset_index().rename(columns={'dayno_x':'日期','uid':'三日留存用戶數(shù)'})
d1=pd.merge(d1,d2,how='left')
d2=df2[df2['difference']==datetime.timedelta(6)].groupby('dayno_x').uid.count().reset_index().rename(columns={'dayno_x':'日期','uid':'七日留存用戶數(shù)'})
# 填充缺失值
d1=pd.merge(d1,d2,how='left').fillna(0)
# 浮點(diǎn)數(shù)轉(zhuǎn)為整數(shù)
d1[['次日留存用戶數(shù)','三日留存用戶數(shù)','七日留存用戶數(shù)']]=d1[['次日留存用戶數(shù)','三日留存用戶數(shù)','七日留存用戶數(shù)']].astype(int)
d1['次日留存率']=d1['次日留存用戶數(shù)']/d1['活躍用戶數(shù)']
d1['三日留存率']=d1['三日留存用戶數(shù)']/d1['活躍用戶數(shù)']
d1['七日留存率']=d1['七日留存用戶數(shù)']/d1['活躍用戶數(shù)']
# 浮點(diǎn)數(shù)轉(zhuǎn)為百分比
d1['次日留存率']=d1['次日留存率'].apply(lambda x: format(x,'.2%'))
d1['三日留存率']=d1['三日留存率'].apply(lambda x: format(x,'.2%'))
d1['七日留存率']=d1['七日留存率'].apply(lambda x: format(x,'.2%'))
return d1
