Adventure Works Cycles是AdventureWorks樣本數(shù)據(jù)庫(kù)所虛構(gòu)的公司,這是一家大型跨國(guó)制造公司。本次分析數(shù)據(jù)庫(kù)使用datafrog提供的線(xiàn)上數(shù)據(jù)庫(kù),模擬真實(shí)業(yè)務(wù)場(chǎng)景。
公司背景
Adventure Works Cycle是國(guó)內(nèi)一家制造公司,該公司生產(chǎn)和銷(xiāo)售金屬和復(fù)合材料自行車(chē)在全國(guó)各個(gè)市場(chǎng)。銷(xiāo)售方式主要有兩種,前期主要是分銷(xiāo)商模式,但是2018年公司實(shí)現(xiàn)財(cái)政收入目標(biāo)后,2019就開(kāi)始通過(guò)公司自有網(wǎng)站獲取線(xiàn)上商戶(hù)進(jìn)一步擴(kuò)大市場(chǎng)。
項(xiàng)目背景
2019年12月5日,線(xiàn)上業(yè)務(wù)經(jīng)理需要向公司CEO匯報(bào)2019年11月自行車(chē)銷(xiāo)售情況,所以數(shù)據(jù)部門(mén)要提供11月份線(xiàn)上自行業(yè)務(wù)數(shù)據(jù)分析報(bào)告。
分析目的與思路
此報(bào)告目的在于匯報(bào)2019年11月綜合銷(xiāo)售情況,因此主要從如下5個(gè)角度展開(kāi)分析。
1.整體銷(xiāo)售表現(xiàn):分析2019.1—2019.11自行車(chē)整體銷(xiāo)售表現(xiàn)
2.地域的角度:分析11月每個(gè)區(qū)域銷(xiāo)售量表現(xiàn)、11月TOP10城市銷(xiāo)售量表現(xiàn)
3.產(chǎn)品類(lèi)別的角度:分析11月各類(lèi)別產(chǎn)品銷(xiāo)售量表現(xiàn)、11月細(xì)分產(chǎn)品銷(xiāo)售量表現(xiàn)
4.熱銷(xiāo)產(chǎn)品角度:分析11月TOP10產(chǎn)品銷(xiāo)量榜、11月TOP10銷(xiāo)量增速榜
5.用戶(hù)行為分析:分析11月用戶(hù)年齡分布及每個(gè)年齡段產(chǎn)品購(gòu)買(mǎi)喜好、11月男女用戶(hù)數(shù)量分布及男女消費(fèi)偏好
此次分析分2個(gè)步驟:數(shù)據(jù)預(yù)處理(python)+可視化(powerbi),本文主要描述數(shù)據(jù)預(yù)處理的步驟,可視化報(bào)告后續(xù)進(jìn)行更新
導(dǎo)入模塊
#導(dǎo)入模塊
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
from datetime import datetime #使用datetime庫(kù)
pd.set_option('display.float_format', lambda x: '%.6f' % x)#不顯示科學(xué)計(jì)數(shù)法,將顯示方法變?yōu)閿?shù)字
一、整體銷(xiāo)售表現(xiàn)
1.1、從數(shù)據(jù)庫(kù)讀取源數(shù)據(jù):dw_customer_order
#讀取源數(shù)據(jù)。不同城市,每天產(chǎn)品銷(xiāo)售信息
#創(chuàng)建數(shù)據(jù)庫(kù)引擎
engine = create_engine('mysql://frogdata001:密碼保密@106.13.128.83/adventure_ods?charset=gbk')
sql='select * from dw_customer_order'
gather_customer_order=pd.read_sql_query(sql,con = engine)
#查看源數(shù)據(jù)前5行,觀(guān)察數(shù)據(jù),判斷數(shù)據(jù)是否正常識(shí)別
gather_customer_order.head()

gather_customer_order.info()#查看數(shù)據(jù)源類(lèi)型

原始數(shù)據(jù)共有473050個(gè)字段,字段解釋如下:
create_date 訂單日期
product_name 產(chǎn)品名
cpzl_zw 產(chǎn)品子類(lèi)
cplb_zw 產(chǎn)品類(lèi)別
order_num 產(chǎn)品銷(xiāo)售數(shù)量
customer_num 購(gòu)買(mǎi)客戶(hù)數(shù)
sum_amount 產(chǎn)品銷(xiāo)售金額
is_current_year 是否當(dāng)前年(1:是,0:否)
is_last_year 是否上一年(1:是,0:否)
is_yesterday 是否昨天(1:是,0:否)
is_today 是否今天(1:是,0:否)
is_current_month 是否當(dāng)前余額(1:是,0:否)
is_current_quarter 是否當(dāng)前季度(1:是,0:否)
chinese_province 所在省份
chinese_city 所在城市
chinese_territory 所在區(qū)域
增加create_year_month月份字段。按月維度分析時(shí)使用
gather_customer_order['create_year_month']=gather_customer_order.create_date.apply(lambda x:x.strftime('%Y-%m'))
篩選產(chǎn)品類(lèi)別為自行車(chē)的數(shù)據(jù)
gather_customer_order = gather_customer_order[gather_customer_order['cplb_zw']=='自行車(chē)']
gather_customer_order

1.2 自行車(chē)整體銷(xiāo)售量表現(xiàn)
每月訂單數(shù)量和銷(xiāo)售金額
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
#每月訂單數(shù)量和銷(xiāo)售金額
overall_sales_performance.head()
#按日期降序排序,方便計(jì)算環(huán)比

求環(huán)比:新增一列order_num_diff,此為每月自行車(chē)銷(xiāo)售訂單量環(huán)比,本月與上月對(duì)比
overall_sales_performance=overall_sales_performance.sort_values('create_year_month',ascending=True)
#銷(xiāo)量環(huán)比
overall_sales_performance['order_num_diff']=(overall_sales_performance.order_num.diff().fillna(0)/overall_sales_performance.order_num.shift()).fillna(0)
#銷(xiāo)售金額環(huán)比
overall_sales_performance['sum_amount_diff']=(overall_sales_performance.sum_amount.diff()/overall_sales_performance.sum_amount.shift()).fillna(0)
overall_sales_performance

將最終的overall_sales_performance的DataFrame存入Mysql的當(dāng)中,后續(xù)使用powerbi讀取
字段注釋?zhuān)?/strong>
create_year_month:時(shí)間,order_num:本月累計(jì)銷(xiāo)售數(shù)量,sum_amount:本月累計(jì)銷(xiāo)售金額,order_diff:本月銷(xiāo)售數(shù)量環(huán)比,sum_amount_diff:本月銷(xiāo)售金額環(huán)比,dw_customer_order:用戶(hù)訂單表
表名:pt_overall_sale_performance_1_sam
#將數(shù)據(jù)存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:密碼保密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
overall_sales_performance.to_sql('pt_overall_sale_performance_1_sam',con=engine,index=False,if_exists='append')
二、2019年11月自行車(chē)地域銷(xiāo)售表現(xiàn)
2.1、數(shù)據(jù)清洗篩選10月11月數(shù)據(jù)
#gather_customer_order在分析自行車(chē)整體表現(xiàn)時(shí)已從數(shù)據(jù)庫(kù)導(dǎo)入表(dw_customer_order),并篩選僅自行車(chē)數(shù)據(jù),這里不再導(dǎo)入
gather_customer_order.head()


上面是一張表,太長(zhǎng)了,分2段截圖。
篩選10月11月自行車(chē)數(shù)據(jù)
#篩選10月11月自行車(chē)數(shù)據(jù)
gather_customer_order_10_11 = gather_customer_order[gather_customer_order['create_year_month'].isin(['2019-10','2019-11'])]
len(gather_customer_order_10_11)#10月11月自行車(chē)訂單數(shù)據(jù)共6266條
2.2、2019年11月自行車(chē)區(qū)域銷(xiāo)售量表現(xiàn)
由于需要求環(huán)比,因此也需要關(guān)注10月的銷(xiāo)售情況。
10月、11月銷(xiāo)售金額總和
#按照區(qū)域、月分組,訂單量求和,銷(xiāo)售金額求和
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory','create_year_month'])[['order_num','sum_amount']].sum().reset_index()
gather_customer_order_10_11_group.head()

10月11月的環(huán)比
#將區(qū)域存為列表
region_list=list(gather_customer_order_10_11.chinese_territory.unique())
region_list
#pct_change()當(dāng)前元素與先前元素的相差百分比,求不同區(qū)域10月11月環(huán)比
order_x=pd.Series([])
amount_x=pd.Series([])
for i in region_list:
a=gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory']==i].order_num.pct_change()
b=gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory']==i].sum_amount.pct_change()
order_x=order_x.append(a).sort_index().fillna(0)
amount_x=amount_x.append(b).sort_index().fillna(0)
gather_customer_order_10_11_group['order_diff']=order_x
gather_customer_order_10_11_group['amount_diff']=amount_x
#10月11月各個(gè)區(qū)域自行車(chē)銷(xiāo)售數(shù)量、銷(xiāo)售金額環(huán)比
gather_customer_order_10_11_group.head()

字段注釋?zhuān)?/strong>
chinese_territory:區(qū)域,create_year_month:時(shí)間,order_num:區(qū)域銷(xiāo)售數(shù)量,sum_amount:區(qū)域銷(xiāo)售金額,order_diff:本月銷(xiāo)售數(shù)量環(huán)比,amount_diff:本月銷(xiāo)售金額環(huán)比
將數(shù)據(jù)存入數(shù)據(jù)庫(kù)
表名:pt_bicy_november_territory_2_sam
#將數(shù)據(jù)存入數(shù)據(jù)庫(kù),表名:pt_bicy_november_territory_2_sam
engine = create_engine('mysql://frogdata05:密碼保密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_sam',con=engine,index=False,if_exists='replace')
2.3、2019年11月自行車(chē)銷(xiāo)售量TOP10城市環(huán)比
(1)篩選11月自行車(chē)交易數(shù)據(jù)
#篩選11月自行車(chē)交易數(shù)據(jù)
gather_customer_order_11 = gather_customer_order[gather_customer_order['create_year_month']=='2019-11']
(2)查看銷(xiāo)量前十的城市銷(xiāo)售數(shù)量
gather_customer_order_city_11 = gather_customer_order_11.groupby('chinese_city')['order_num'].sum().reset_index()
# 11月自行車(chē)銷(xiāo)售數(shù)量前十城市
gather_customer_order_city_head = gather_customer_order_city_11.sort_values(by=['order_num'],ascending=False).head(10)
#查看11月自行車(chē)銷(xiāo)售數(shù)量前十城市
gather_customer_order_city_head

(3)分組計(jì)算前十城市,自行車(chē)銷(xiāo)售數(shù)量銷(xiāo)售金額
#11月銷(xiāo)售前十城市,全部自行車(chē)銷(xiāo)售數(shù)據(jù)
a=pd.merge(gather_customer_order_city_head['chinese_city'],gather_customer_order,on='chinese_city',how='left')
#篩選銷(xiāo)售前十城市,10月11月自行車(chē)銷(xiāo)售數(shù)據(jù)
gather_customer_order_10_11_head =a[a['create_year_month'].isin(['2019-10','2019-11'])]
#分組計(jì)算前十城市,自行車(chē)銷(xiāo)售數(shù)量銷(xiāo)售金額
gather_customer_order_city_10_11 = gather_customer_order.groupby(['chinese_city','create_year_month']).sum().reset_index()[(gather_customer_order.groupby(['chinese_city','create_year_month']).sum().reset_index().chinese_city.isin(gather_customer_order_city_head.chinese_city)) & (gather_customer_order.groupby(['chinese_city','create_year_month']).sum().reset_index().create_year_month.isin(['2019-10','2019-11']))].drop(labels=['customer_num'],axis=1).reset_index(drop=True)
gather_customer_order_city_10_11

計(jì)算10月11月top10城市銷(xiāo)售數(shù)量和銷(xiāo)售金額環(huán)比
#計(jì)算前十城市環(huán)比
city_top_list = list(gather_customer_order_city_10_11.drop_duplicates(['chinese_city']).chinese_city)
order_top_x = pd.Series([])
amount_top_x = pd.Series([])
for i in city_top_list:
#print(i)
a=gather_customer_order_city_10_11[gather_customer_order_city_10_11.chinese_city==i].order_num.pct_change()
b=gather_customer_order_city_10_11[gather_customer_order_city_10_11.chinese_city==i].sum_amount.pct_change()
order_top_x=order_top_x.append(a).fillna(0)
amount_top_x = amount_top_x.append(b).fillna(0)
#order_diff銷(xiāo)售數(shù)量環(huán)比,amount_diff銷(xiāo)售金額環(huán)比
gather_customer_order_city_10_11['order_diff']=order_top_x
gather_customer_order_city_10_11['amount_diff']=amount_top_x
gather_customer_order_city_10_11

字段注釋
chinese_city:城市,create_year_month:時(shí)間,order_num:本月銷(xiāo)售數(shù)量,sum_amount:本月銷(xiāo)售金額,order_diff:本月銷(xiāo)售數(shù)量環(huán)比,amount_diff:本月銷(xiāo)售金額環(huán)比
表名:pt_bicy_november_october_city_3_sam
#存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:保密@106.15.121.232/datafrog05_adventure?charset=gbk')
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_sam',con=engine,index=False,if_exists='replace')
三、2019年11月自行車(chē)各類(lèi)別產(chǎn)品銷(xiāo)售表現(xiàn)
3.1、細(xì)分市場(chǎng)銷(xiāo)量表現(xiàn)
#求每個(gè)月自行車(chē)?yán)塾?jì)銷(xiāo)售數(shù)量
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
#合并自行車(chē)銷(xiāo)售信息表+自行車(chē)每月累計(jì)銷(xiāo)售數(shù)量表,pd.merge
order_num_proportion = pd.merge(gather_customer_order,gather_customer_order_group_month,how='left',on='create_year_month')
#計(jì)算自行車(chē)銷(xiāo)量/自行車(chē)每月銷(xiāo)量占比
order_num_proportion['order_proportion'] = order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
#重命名sum_month_order:自行車(chē)每月銷(xiāo)售量
order_num_proportion = order_num_proportion.rename(columns={'order_num_y':'sum_month_order'})
將每月自行車(chē)銷(xiāo)售信息存入數(shù)據(jù)庫(kù)
表名:pt_bicycle_product_sales_month_4_sam
字段注釋
create_date:時(shí)間, product_name:產(chǎn)品名, cpzl_zw:產(chǎn)品類(lèi)別, cplb_zw:產(chǎn)品大類(lèi), order_num_x:產(chǎn)品當(dāng)天銷(xiāo)售數(shù)量,customer_num:當(dāng)天用戶(hù)購(gòu)買(mǎi)人數(shù),
sum_amount:產(chǎn)品當(dāng)天銷(xiāo)售金額, chinese_province:省份, chinese_city:城市, chinese_territory:區(qū)域,create_year_month:月份, sum_month_order:本月累計(jì)銷(xiāo)量, order_proportion:產(chǎn)品銷(xiāo)量占比
#將每月自行車(chē)銷(xiāo)售信息存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:保密@106.15.121.232/datafrog05_adventure?charset=gbk')
order_num_proportion.to_sql('pt_bicycle_product_sales_month_4_sam',con=engine,index=False,if_exists='replace')
3.2公路/山地/旅游自行車(chē)細(xì)分市場(chǎng)表現(xiàn)
(1)公路自行車(chē)細(xì)分市場(chǎng)銷(xiāo)量表現(xiàn)
求每月公路自行車(chē)中各個(gè)型號(hào)自行車(chē)銷(xiāo)售數(shù)量與公路自行車(chē)總銷(xiāo)量,為后續(xù)求各個(gè)型號(hào)的比率做鋪墊
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行車(chē)']
#1.求公路自行車(chē)不同型號(hào)產(chǎn)品銷(xiāo)售數(shù)量
gather_customer_order_road_month =gather_customer_order_road.groupby(by = ['create_year_month','product_name']).sum().order_num.reset_index()
gather_customer_order_road_month['cpzl_zw'] = '公路自行車(chē)'
#2.每個(gè)月公路自行車(chē)?yán)塾?jì)銷(xiāo)售數(shù)量
gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
#3.合并公路自行車(chē)gather_customer_order_road_month與每月累計(jì)銷(xiāo)售數(shù)量
#用于計(jì)算不同型號(hào)產(chǎn)品的占比
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on='create_year_month',how='outer')
gather_customer_order_road_month.head()

(2)山地自行車(chē)細(xì)分市場(chǎng)銷(xiāo)量表現(xiàn)
與公路自行車(chē)處理方式一致
# 篩選
gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行車(chē)']
#求山地自行車(chē)不同型號(hào)產(chǎn)品銷(xiāo)售數(shù)量
gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行車(chē)'
#每個(gè)月公路自行車(chē)?yán)塾?jì)銷(xiāo)售數(shù)量
gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby('create_year_month').sum().reset_index()
gather_customer_order_road_month_sum.head()
#合并山地自行車(chē)hz_customer_order_Mountain_month與每月累計(jì)銷(xiāo)售數(shù)量
#用于計(jì)算不同型號(hào)產(chǎn)品的占比
gather_customer_order_Mountain_month = pd.merge(gather_customer_order_Mountain_month,gather_customer_order_Mountain_month_sum,on='create_year_month')
gather_customer_order_Mountain_month.head()

(3)旅游自行車(chē)細(xì)分市場(chǎng)銷(xiāo)量表現(xiàn)
與公路自行車(chē)處理方式一致
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行車(chē)']
#求旅游自行車(chē)不同型號(hào)產(chǎn)品銷(xiāo)售數(shù)量
gather_customer_order_tour_month = gather_customer_order_tour.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month['cpzl_zw'] = '旅游自行車(chē)'
gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby('create_year_month').sum().reset_index()
gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,on='create_year_month')
(4)將山地自行車(chē)、旅游自行車(chē)、公路自行車(chē)每月銷(xiāo)量信息合并
#將山地自行車(chē)、旅游自行車(chē)、公路自行車(chē)每月銷(xiāo)量信息合并
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_Mountain_month,gather_customer_order_tour_month],axis=0)
(5)總銷(xiāo)售量比率
求出每種一級(jí)子類(lèi)自行車(chē)(公路、山地、旅行)中,各品類(lèi)的各月的銷(xiāo)量占比
#各類(lèi)自行車(chē),銷(xiāo)售量占每月自行車(chē)總銷(xiāo)售量比率
gather_customer_order_month['order_num_proportio'] =gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
gather_customer_order_month.head()
#改名
#order_month_product當(dāng)月產(chǎn)品累計(jì)銷(xiāo)量
#sum_order_month當(dāng)月自行車(chē)總銷(xiāo)量
gather_customer_order_month.rename(columns={'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace=True)

將細(xì)分市場(chǎng)表現(xiàn)數(shù)據(jù)存入數(shù)據(jù)庫(kù)
表名:pt_bicycle_product_sales_order_month_4_sam
字段注釋?zhuān)?br> create_year_month:時(shí)間,product_name:產(chǎn)品名,order_month_product:本月產(chǎn)品累計(jì)銷(xiāo)量,sum_order_month:當(dāng)月自行車(chē)總銷(xiāo)量,order_num_proportio:本月產(chǎn)品銷(xiāo)量占比
#將數(shù)據(jù)存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:secret@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_sam',con=engine,index=False,if_exists='replace')
3.3、公路/山地/旅游自行車(chē)2019年11月銷(xiāo)量環(huán)比
#計(jì)算11月環(huán)比,先篩選10月11月數(shù)據(jù)
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
#排序。將10月11月自行車(chē)銷(xiāo)售信息排序
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month']).reset_index(drop=True)
product_name = list(gather_customer_order_month_10_11.product_name.drop_duplicates())
#計(jì)算自行車(chē)銷(xiāo)售數(shù)量環(huán)比
order_num_diff=pd.Series([])
for i in product_name:
a=gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']==i].order_month_product.pct_change()
order_num_diff=order_num_diff.append(a).fillna(0)
gather_customer_order_month_10_11['order_num_diff'] = order_num_diff
#篩選出11月自行車(chē)數(shù)據(jù)
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2019-11']
gather_customer_order_month_11.head(10)

3.4、2019年1月至11月產(chǎn)品累計(jì)銷(xiāo)量
month_1_11=list(gather_customer_order_month.create_year_month.drop_duplicates()[:11])
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(month_1_11)]
#計(jì)算2019年1月至11月自行車(chē)?yán)塾?jì)銷(xiāo)量
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').order_month_product.sum().reset_index()
#重命名sum_order_1_11:1-11月產(chǎn)品累計(jì)銷(xiāo)量
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})

3.5 2019年11月自行車(chē)產(chǎn)品銷(xiāo)量、環(huán)比、累計(jì)銷(xiāo)量(結(jié)合3.2-3.4)
累計(jì)銷(xiāo)量在gather_customer_order_month_1_11_sum中已計(jì)算好,11月自行車(chē)環(huán)比、及產(chǎn)品銷(xiāo)量占比在gather_customer_order_month_11已計(jì)算好,把2張表merge一下
#按相同字段product_name產(chǎn)品名,合并兩張表
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on='product_name')
#觀(guān)察表數(shù)據(jù)
gather_customer_order_month_11.head()

存入數(shù)據(jù)庫(kù)
表名:pt_bicycle_product_sales_order_month_11_sam
字段注釋?zhuān)?br> create_year_month:時(shí)間;product_name:產(chǎn)品,order_month_product:產(chǎn)品本月累計(jì)銷(xiāo)量,cpzl_zw:產(chǎn)品類(lèi)別,sum_order_month:當(dāng)月自行車(chē)總銷(xiāo)量,order_num_proportio:產(chǎn)品本月占比,order_num_diff:產(chǎn)品本月環(huán)比,sum_order_1_11:1-11月產(chǎn)品累計(jì)銷(xiāo)量
#存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:Frogdata!1321@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_sam',index=False,con=engine,if_exists='replace')
四、2019年11月熱品銷(xiāo)售分析
4.1、11月產(chǎn)品銷(xiāo)量TOP10產(chǎn)品,銷(xiāo)量及環(huán)比
2019年11月自行車(chē)產(chǎn)品銷(xiāo)售表現(xiàn)”時(shí)已計(jì)算出11月所有產(chǎn)品的銷(xiāo)量及環(huán)比,這里不在重復(fù)計(jì)算,直接使用gather_customer_order_month_10_11、gather_customer_order_month_11
取top10產(chǎn)品
#計(jì)算產(chǎn)品銷(xiāo)售數(shù)量,\ 為換行符
#按照銷(xiāo)量降序,取TOP10產(chǎn)品
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').\
order_num.count().reset_index().sort_values(by = 'order_num',ascending = False).head(10)
計(jì)算TOP10銷(xiāo)量及環(huán)比
只需要五個(gè)字段:create_year_month月份,product_name產(chǎn)品名,order_month_product本月銷(xiāo)量,cpzl_zw產(chǎn)品類(lèi)別,order_num_diff本月產(chǎn)品銷(xiāo)量環(huán)比
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
#找出top10商品
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].isin(list(customer_order_11_top10['product_name']))]
customer_order_month_10_11['category'] = '本月TOP10銷(xiāo)量'
customer_order_month_10_11.head()

4.2、11月產(chǎn)品增速TOP10產(chǎn)品,銷(xiāo)售數(shù)量及環(huán)比
#11月增長(zhǎng)率前十的產(chǎn)品信息
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].sort_values(by = 'order_num_diff',ascending = False).head(10)
#11月top10增長(zhǎng)率產(chǎn)品在10月和11月的信息
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].isin(list(customer_order_month_11['product_name']))]
篩選需要的四個(gè)字段:create_year_month月份,product_name產(chǎn)品名,order_month_product本月銷(xiāo)量,cpzl_zw產(chǎn)品類(lèi)別,order_num_diff本月產(chǎn)品銷(xiāo)量環(huán)比
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'
customer_order_month_11_top10_seep.head()

合并TOP10銷(xiāo)量表和TOP10增速表
#axis = 0按照行維度合并,axis = 1按照列維度合并
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
hot_products_11

將11月Top10銷(xiāo)量產(chǎn)品、增速產(chǎn)品表存入數(shù)據(jù)庫(kù)
表名:pt_hot_products_november_sam
字段注釋?zhuān)?br> create_year_month:月份,product_name:產(chǎn)品名,order_month_product:本月產(chǎn)品銷(xiāo)量,order_num_diff:本月產(chǎn)品環(huán)比,category:分類(lèi)
#存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:秘密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
datafrog=engine
hot_products_11.to_sql('pt_hot_products_november_sam',con = datafrog,if_exists='replace', index=False)
五、用戶(hù)行為分析
這里需要使用訂單明細(xì)表ods_sales_orders,用戶(hù)表信息表ods_customer
#讀取數(shù)據(jù)庫(kù)客戶(hù)信息表
engine = create_engine('mysql://frogdata001:秘密@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)
#讀取數(shù)據(jù)庫(kù)銷(xiāo)售訂單表
engine = create_engine('mysql://frogdata001:秘密@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_sales_orders_11 = pd.read_sql_query("select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'",con = datafrog)
銷(xiāo)售訂單表中僅客戶(hù)編號(hào),無(wú)客戶(hù)年齡性別等信息,需要將銷(xiāo)售訂單表和客戶(hù)信息表合并
#pd.merge
sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how='left')
sales_customer_order_11.head()

獲取客人的年份作為新的一列 birth_year
sales_customer_order_11.birth_date.fillna('0000-00-00',inplace=True)#有空值
sales_customer_order_11['birth_date'].str.split('-').apply(lambda x:x[0])
sales_customer_order_11['birth_year']=sales_customer_order_11['birth_date'].str.split('-').apply(lambda x:x[0])
sales_customer_order_11.head(3)

5.1、用戶(hù)年齡分析
#修改出生年為int數(shù)據(jù)類(lèi)型
sales_customer_order_11['birth_year'] = sales_customer_order_11['birth_year'].astype(int)
# 計(jì)算用戶(hù)年齡
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
對(duì)年齡進(jìn)行分層0-34,35-39,40-44,45-49,50-54,55-59,60-64
#年齡分層1
bins=[0,34,39,44,49,54,59,64]
labels=['0-34','35-39','40-44','45-49','50-54','55-59','60-64']
#新增'age_level'分層區(qū)間列
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'],bins=bins,labels=labels)
篩選銷(xiāo)售訂單為自行車(chē)的訂單信息
#篩選銷(xiāo)售訂單為自行車(chē)的訂單信息
df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行車(chē)']
#查看總數(shù):一共3438個(gè)
df_customer_order_bycle.count()
# 計(jì)算年齡比率
df_customer_order_bycle['age_level_rate'] = 1/3438
df_customer_order_bycle

將年齡分為3個(gè)層次,分別為'<=29'、'30-39'、'>=40'
def cut_age(x):
'''年齡分層函數(shù)'''
if x>=40:
return '>=40'
elif x>=30 & x<=39:
return '30-39'
elif x<=29:
return '<=29'
else:
return 'nan'
'''將年齡分為3個(gè)層次'''
df_customer_order_bycle.age_level2=df_customer_order_bycle.customer_age
df_customer_order_bycle['age_level2']=df_customer_order_bycle.customer_age.apply(cut_age)
df_customer_order_bycle.head(3)

查看每個(gè)年齡段人數(shù)
# 求每個(gè)年齡段人數(shù)
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').customer_key.count().reset_index()
age_level2_count.head()

將自行車(chē)銷(xiāo)售訂單表和每個(gè)年齡段人數(shù)表合并,為后續(xù)算年齡段比例做準(zhǔn)備
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'customer_key_y':'age_level2_count'})

5.2、用戶(hù)性別分析
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
gender_count

將性別個(gè)數(shù)和自行車(chē)銷(xiāo)售訂單表+每個(gè)年齡段人數(shù)表合并,形成一個(gè)匯總信息表,并計(jì)算性別比率。匯總成了用戶(hù)行為信息總表。
#合并2表
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
#性別比率
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
將11月用戶(hù)行為信息總表保存到數(shù)據(jù)庫(kù)
表名:pt_user_behavior_november_sam
#df_customer_order_bycle 將11月自行車(chē)用戶(hù)存入數(shù)據(jù)庫(kù)
#存入數(shù)據(jù)庫(kù)
engine = create_engine('mysql://frogdata05:秘密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
datafrog=engine
df_customer_order_bycle.to_sql('pt_user_behavior_november_sam',con = datafrog,if_exists='replace', index=False)
至此數(shù)據(jù)預(yù)處理的工作已經(jīng)完成,后續(xù)會(huì)更新使用powerBI進(jìn)行可視化的部分
以下為分析報(bào)告
image.png
image.png
image.png
image.png
image.png
image.pngimage.png
image.png
image.png
image.png
image.png
image.png
image.png
image.png
image.png














