2019年11月自行車業(yè)務(wù)銷售分析報告
分析思路
- 自行車整體銷售表現(xiàn)
- 2019年11月自行車地域銷售表現(xiàn)
- 2019年11月自行車產(chǎn)品銷售表現(xiàn)
- 用戶行為分析
- 2019年11月熱品銷售分析
計算結(jié)果存入數(shù)據(jù)庫,對應(yīng)表名:
- 自行車整體銷售表現(xiàn):pt_overall_performance_1
- 2019年11月自行車地域銷售表現(xiàn):pt_bicy_november_territory_2、pt_bicy_novenber_october_city_3
- 2019年11月自行車產(chǎn)品銷售表現(xiàn):pt_bicycle_product_sales_month_4、pt_bicycle_sales_order_month_4、pt_bicycle_product_sales_order_month_11
- 用戶行為分析:pt_user_behavior_november
- 2019年11月熱品銷售分析:pt_hot_products_november
導入模塊
import pandas as pd
import numpy as np
import pymysql
# pymysql.install_as_MySQLdb() # 為了兼容mysqldb;若create_engine()里mysql→mysql+pymysql,則不需要MySQLdb模塊。
from sqlalchemy import create_engine
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
1. 自行車整體銷售表現(xiàn)
engine=create_engine('mysql+pymysql://用戶名:密碼@ip地址:端口號/數(shù)據(jù)庫名稱')
sql='select * from dw_customer_order'
df=pd.read_sql_query(sql,engine)
gather_customer_order=df.copy() # 復制一份DataFrame,df原始數(shù)據(jù)在后面可以賦予其他變量。
gather_customer_order.head()

查看數(shù)據(jù)類型和是否有空值
gather_customer_order.info()

數(shù)據(jù)比較純凈,沒有空值,所以不用做數(shù)據(jù)清洗。
為按月維度分析,增加create_year_month字段
gather_customer_order['create_year_month']=gather_customer_order['create_date'].apply(lambda x :x.strftime('%Y-%m')) # str.strftime();網(wǎng)上基本找到的都是time.strftime(),這里需要著重加深印象。
篩選產(chǎn)品類別為自行車的訂單
gather_customer_order=gather_customer_order[gather_customer_order['cplb_zw']=='自行車']
gather_customer_order.head()

1.1 自行車整體銷售表現(xiàn)(訂單量,金額)
overall_sales_performance=gather_customer_order.groupby('create_year_month')[['order_num','sum_amount']].sum().reset_index().sort_values('create_year_month',ascending=False)
pd.set_option('display.float_format', lambda x: '%.3f' % x) # 科學計數(shù)法顯示小數(shù)點后3位
overall_sales_performance=overall_sales_performance[overall_sales_performance['create_year_month']<'2019-12']
overall_sales_performance

x=overall_sales_performance['create_year_month']
y1=overall_sales_performance['order_num']
y2=overall_sales_performance['sum_amount']
fig=plt.figure(figsize=(10,10))
ax1=fig.add_subplot(211)
plt.plot(x,y1)
ax1.set_xlabel('月份')
ax1.set_ylabel('訂單量')
ax1.set_title('月份-訂單')
ax2=fig.add_subplot(212)
plt.plot(x,y2)
ax2.set_xlabel('月份')
ax2.set_ylabel('金額')
ax2.set_title('月份-金額')

從月份-金額和月份-訂單量折線圖可以看出:
- 2019年自行車每月的訂單量和金額波動較大,銷售表現(xiàn)不穩(wěn)定,這并不是一件好事情。
- 除6-8月月份-訂單和月份-金額折線圖整體趨勢基本相同,除6月-8月客單價相對穩(wěn)定;
- 2019年2月、4月和9月銷售狀況最差,下跌嚴重,需分析這幾個月前后是否有促銷活動;2月是否是由于過年導致銷售量低等原因。
- 2019年11月銷售表現(xiàn)為整年最好,需分析是否是年底各銷售人員沖業(yè)績、月前后是否有促銷、是否在2019年9月有新的產(chǎn)品銷售等原因。
1.2 訂單量、銷售金額環(huán)比
# 訂單量環(huán)比
overall_sales_performance['order_diff']=overall_sales_performance['order_num'].pct_change(-1)
# 銷售金額環(huán)比
overall_sales_performance['sum_amount_diff']=overall_sales_performance['sum_amount'].pct_change(-1)
# 排序
overall_sales_performance.sort_values('create_year_month',inplace=True)
# 填充空值
overall_sales_performance.fillna(0,inplace=True)
overall_sales_performance

x=overall_sales_performance['create_year_month']
y1=overall_sales_performance['order_diff']
y2=overall_sales_performance['sum_amount_diff']
fig=plt.figure(figsize=(10,10))
fig.add_subplot(211)
plt.plot(x,y1)
plt.xlabel('月份')
plt.ylabel('訂單環(huán)比')
plt.title('2019年每月訂單環(huán)比')
fig.add_subplot(212)
plt.plot(x,y2)
plt.xlabel('月份')
plt.ylabel('銷售金額環(huán)比')
plt.title('2019年每月銷售金額環(huán)比')

趨勢同訂單量、銷售金額一致。
2. 2019年11月自行車地域銷售表現(xiàn)
篩選10-11月數(shù)據(jù)
gather_customer_order_10_11=gather_customer_order[(gather_customer_order['create_year_month']=='2019-10')|(gather_customer_order['create_year_month']=='2019-11')]
gather_customer_order_10_11.head()

按區(qū)域每月分組
gather_customer_order_10_11_group=gather_customer_order_10_11.groupby(['chinese_territory','create_year_month'])['order_num','sum_amount'].sum().reset_index()
gather_customer_order_10_11_group.head()

2.1 計算區(qū)域每月訂單量和銷售額環(huán)比
# 獲取區(qū)域
region_list=list(gather_customer_order_10_11_group['chinese_territory'].unique())
# 創(chuàng)建新series,用于存儲不同區(qū)域的訂單量、銷售金額環(huán)比
order_x=pd.Series([])
amount_x=pd.Series([])
# 遍歷區(qū)域
for i in region_list:
a=gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory']==i]['order_num'].pct_change(1)
b=gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory']==i]['sum_amount'].pct_change(1)
order_x=order_x.append(a)
amount_x=amount_x.append(b)
# 創(chuàng)建新列
gather_customer_order_10_11_group['order_diff']=order_x
gather_customer_order_10_11_group['amount_diff']=amount_x
# 填充空值
gather_customer_order_10_11_group.fillna(0,inplace=True)
gather_customer_order_10_11_group.head()

x=gather_customer_order_10_11_group[gather_customer_order_10_11_group['create_year_month']=='2019-11']['chinese_territory']
y1=gather_customer_order_10_11_group[gather_customer_order_10_11_group['create_year_month']=='2019-11']['order_diff']
y2=gather_customer_order_10_11_group[gather_customer_order_10_11_group['create_year_month']=='2019-11']['amount_diff']
fig=plt.figure(figsize=(10,10))
fig.add_subplot(211)
plt.plot(x,y1)
plt.xlabel('區(qū)域')
plt.ylabel('訂單環(huán)比')
plt.title('2019年11月區(qū)域訂單環(huán)比')
fig.add_subplot(212)
plt.plot(x,y2,color='green')
plt.xlabel('區(qū)域')
plt.ylabel('銷售金額環(huán)比')
plt.title('2019年11月區(qū)域銷售金額環(huán)比')

在11月中,西北和西南地區(qū)訂單量和銷售金額增長率過低;臺港澳地區(qū)訂單量和銷售金額環(huán)比為負,且基數(shù)較小。
2.2 2019年11月自行車銷量Top10城市
# 篩選11月自行車交易數(shù)據(jù)
gather_customer_order_11=gather_customer_order_10_11[gather_customer_order_10_11['create_year_month']=='2019-11']
# 按城市分組求總訂單量
gather_customer_order_city_11=gather_customer_order_11.groupby('chinese_city')[['order_num']].sum().sort_values('order_num',ascending=False).reset_index()
# 11月自行車銷售數(shù)量前10城市
gather_customer_order_city_head=gather_customer_order_city_11.head(10)
gather_customer_order_city_head

可以看出11月自行車銷量前10的城市基本分布在超一線城市例如北上廣或省會城市中。其中可以觀察下常德市的具體原因,是否是公司政策、政府政策或競爭相對較低等原因具體分析。
2.3 11月銷量top10城市在10-11月的訂單量、銷售金額環(huán)比
# 篩選銷售前10城市,10月、11月自行車銷售數(shù)據(jù)
gather_customer_order_10_11_head=gather_customer_order_10_11[gather_customer_order_10_11['chinese_city'].isin(gather_customer_order_city_head['chinese_city'])]
# 分組計算10-11月前10城市,自行車銷售數(shù)量和銷售金額
gather_customer_order_city_10_11=gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month'])[['order_num','sum_amount']].sum().reset_index()
# 計算前10城市10、11月銷售金額與銷售量環(huán)比
city_top_list=gather_customer_order_city_10_11['chinese_city'].unique()
top_x=pd.DataFrame()
for i in city_top_list:
a=gather_customer_order_city_10_11[gather_customer_order_city_10_11['chinese_city']==i][['order_num','sum_amount']].pct_change(1)
top_x=top_x.append(a)
top_x=top_x.fillna(0)
# order_diff銷售數(shù)量環(huán)比,amount_diff銷售金額環(huán)比
gather_customer_order_city_10_11=pd.concat([gather_customer_order_city_10_11,top_x],axis=1) # axis=1為列
gather_customer_order_city_10_11

北京、南京、常德、鄭州、長沙的11月銷售金額增長率較高,成都、武漢11月銷售金額增長率為負。
3. 2019年11月自行車產(chǎn)品銷售表現(xiàn)
# 求每個月自行車銷售數(shù)量
gather_customer_order_group_month=gather_customer_order.groupby('create_year_month')['order_num'].sum().reset_index()
# 合并自行車銷售信息表+自行車每月銷售數(shù)量表
order_num_proportion=pd.merge(left=gather_customer_order,right=gather_customer_order_group_month,on='create_year_month')
# 計算自行車銷量/自行車每月銷量,形成新列order_proportion
order_num_proportion['order_proportion']=order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
# 重命名自行車每月銷量為sum_month_order
order_num_proportion.rename(columns={'order_num_y':'sum_month_order'},inplace=True)
order_num_proportion.head()

3.1 公路/山地/旅游自行車細分市場表現(xiàn)
3.1.1 公路自行車
# 篩選公路自行車字段
gather_customer_order_road=gather_customer_order[gather_customer_order['cpzl_zw']=='公路自行車']
# 求公路自行車不同型號產(chǎn)品銷售數(shù)量
gather_customer_order_road_month=gather_customer_order_road.groupby(['create_year_month','product_name'])['order_num'].sum().reset_index()
# 添加一列公路自行車字段
gather_customer_order_road_month['cpzl_zw']='公路自行車'
gather_customer_order_road_month

每月公路自行車銷售數(shù)量
# 每個月公路自行車銷售數(shù)量
gather_customer_order_road_month_sum=gather_customer_order_road_month.groupby('create_year_month').sum().reset_index()
gather_customer_order_road_month_sum.head()

# 合并公路自行車gather_customer_order_road_month與每月公路自行車銷售數(shù)量
# 用于計算不同型號產(chǎn)品的占比
gather_customer_order_road_month=pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on='create_year_month')
gather_customer_order_road_month.head()

3.1.2 山地自行車
# 篩選山地自行車
gather_customer_order_Moutain=gather_customer_order[gather_customer_order['cpzl_zw']=='山地自行車']
# 求山地自行車不同產(chǎn)品銷售數(shù)量
gather_customer_order_Moutain_month=gather_customer_order_Moutain.groupby(['create_year_month','product_name'])['order_num'].sum().reset_index()
# 新增一列為山地自行車
gather_customer_order_Moutain_month['cpzl_zw']='山地自行車'
# 每個月公路自行車銷售數(shù)量
gather_customer_order_Moutain_month_sum=gather_customer_order_Moutain_month.groupby('create_year_month').sum().reset_index()
# 合并山地自行車gather_customer_order_Moutain_month與每月公路自行車銷售數(shù)量
gather_customer_order_Moutain_month=pd.merge(gather_customer_order_Moutain_month,gather_customer_order_Moutain_month_sum,on='create_year_month')
gather_customer_order_Moutain_month.head()

3.1.3 旅游自行車
# 篩選旅游自行車
gather_customer_order_tour=gather_customer_order[gather_customer_order['cpzl_zw']=='旅游自行車']
# 旅游自行車每月不同產(chǎn)品銷售數(shù)量
gather_customer_order_tour_month=gather_customer_order_tour.groupby(['create_year_month','product_name'])['order_num'].sum().reset_index()
# 新增一列旅游自行車
gather_customer_order_tour_month['cpzl_zw']='旅游自行車'
# 每月旅游自行車銷售數(shù)量
gather_customer_order_tour_month_sum=gather_customer_order_tour_month.groupby('create_year_month').sum().reset_index()
# 合并旅游自行車gather_customer_order_tour_month與每月旅游自行車銷售數(shù)量
gather_customer_order_tour_month=pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,on='create_year_month')
gather_customer_order_tour_month.head()

3.1.4 數(shù)據(jù)合并
# 將山地自行車、旅游自行車、公路自行車每月銷售信息合并
gather_customer_order_month=pd.concat([gather_customer_order_road_month,gather_customer_order_Moutain_month,gather_customer_order_tour_month],axis=0) # axis=0:index
# 新增一列,每月各產(chǎn)品銷售量占每月各類自行車總銷售量比率
gather_customer_order_month['order_num_proportio']=gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
# 重命名
gather_customer_order_month.rename(columns={'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace=True)
gather_customer_order_month.head()

3.2 2019年11月自行車環(huán)比
3.2.1 2019年11月產(chǎn)品銷售數(shù)量環(huán)比
# 計算11月環(huán)比,先篩選10、11月數(shù)據(jù)
gather_customer_order_month_10_11=gather_customer_order_month[gather_customer_order_month['create_year_month'].isin(['2019-10','2019-11'])]
# 排序
gather_customer_order_month_10_11.sort_values(['product_name','create_year_month'],inplace=True)
# 計算產(chǎn)品銷售數(shù)量環(huán)比
product_name=gather_customer_order_month_10_11['product_name'].unique()
order_top_x=pd.Series()
for i in product_name:
a=gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']==i]['order_month_product'].pct_change(1)
order_top_x=order_top_x.append(a)
order_top_x.fillna(0,inplace=True)
# 新建列
gather_customer_order_month_10_11['order_num_diff']=order_top_x
gather_customer_order_month_10_11.head()

# 篩選出11月自行車數(shù)據(jù)
gather_customer_order_month_11=gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month']=='2019-11']
gather_customer_order_month_11.head()

3.3 2019年1至11月產(chǎn)品總銷量
# 篩選2019年1月至11月自行車數(shù)據(jù)
gather_customer_order_month_1_11=gather_customer_order_month[gather_customer_order_month['create_year_month']<'2019-12']
# 計算2019年1月至11月產(chǎn)品總銷量
gather_customer_order_month_1_11_sum=gather_customer_order_month_1_11.groupby('product_name')['order_month_product'].sum().reset_index()
# 重命名sum_order_1_11:1-11月產(chǎn)品總銷量
gather_customer_order_month_1_11_sum.rename(columns={'order_month_product':'sum_order_1_11'},inplace=True)
gather_customer_order_month_1_11_sum.head()

3.4 2019年11月產(chǎn)品銷量占各類自行車總銷量比例+產(chǎn)品全年銷售量
# 按相同字段product_name,合并兩張表
gather_customer_order_month_1_11_sum=pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on='product_name')
gather_customer_order_month_1_11_sum.head()

4. 用戶行為分析
# 讀取數(shù)據(jù)庫客戶信息表(2019-1-11客戶)
sql1="select customer_key,birth_date,gender,marital_status from ods_customer where create_date<'2019-12-1'"
df2=pd.read_sql_query(sql1,engine)
# 讀取數(shù)據(jù)庫銷售訂單表(2019-11訂單)
sql2="select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'"
df3=pd.read_sql_query(sql2,engine)
df_customer=df2.copy()
df_sales_orders_11=df3.copy()
# 合并2張表
sales_customer_order_11=pd.merge(df_sales_orders_11,df_customer,on='customer_key',how='left')
sales_customer_order_11.head()

# 新建列出生年
birth_year=sales_customer_order_11['birth_date'].str.split('-',expand=True).drop([1,2],axis=1).rename(columns={0:'birth_year'}) # expand=True:series→dataframe
# 合并
sales_customer_order_11=pd.concat([sales_customer_order_11,birth_year],axis=1)
sales_customer_order_11.head()

4.1 用戶年齡分析
4.1.1 年齡分組
# 填充缺失客戶數(shù)據(jù)
sales_customer_order_11.fillna('0',inplace=True)
# 修改出生年為int數(shù)據(jù)類型
sales_customer_order_11['birth_year']=sales_customer_order_11['birth_year'].astype('int64')
# 計算用戶年齡
sales_customer_order_11['customer_age']=2020-sales_customer_order_11['birth_year']
# 年齡分層
bins=[30,34,39,44,49,54,59,64]
sales_customer_order_11['age_level']=pd.cut(sales_customer_order_11['customer_age'],bins,labels=['30-34','35-39','40-44','45-49','50-54','55-59','60-64'])
sales_customer_order_11.head()

4.1.2 計算年齡組數(shù)和占比
# 篩選銷售訂單為自行車的訂單
df_customer_order_bycle=sales_customer_order_11[sales_customer_order_11['cplb_zw']=='自行車']
# 計算年齡比例
age_group_count=df_customer_order_bycle.groupby('age_level')['customer_age'].count().reset_index().rename(columns={'customer_age':'age_group_count'})
# 合并
df_customer_order_bycle=pd.merge(df_customer_order_bycle,age_group_count,on='age_level')
# 每位用戶的年齡占該年齡組的比例
df_customer_order_bycle['age_level_rate']=1/df_customer_order_bycle['age_group_count']
df_customer_order_bycle.head()

4.1.3 年齡分層
# 將年齡分為3個層次
df_customer_order_bycle.loc[df_customer_order_bycle['customer_age']>=40,'age_level2']='>=40'
df_customer_order_bycle.loc[df_customer_order_bycle['customer_age']<=29,'age_level2']='<=29'
df_customer_order_bycle.loc[(df_customer_order_bycle['customer_age']>=30)&(df_customer_order_bycle['customer_age']<=39),'age_level2']='30-39'
df_customer_order_bycle.head()

4.1.4 各年齡分層人數(shù)
# 求每個年齡段人數(shù)
age_level2_count=pd.DataFrame(df_customer_order_bycle['age_level2'].value_counts()).reset_index()
# 重命名
age_level2_count.rename(columns={'index':'age_level2','age_level2':'sales_order_key'},inplace=True)
age_level2_count

4.2 用戶性別
gender_count=pd.DataFrame(df_customer_order_bycle['gender'].value_counts()).reset_index()
gender_count.rename(columns={'index':'gender','gender':'cplb_zw'},inplace=True)
gender_count

df_customer_order_bycle=pd.merge(df_customer_order_bycle,age_level2_count,on='age_level2').rename(columns={'sales_order_key_y':'age_level2_count'})
df_customer_order_bycle['age_level2_rate']=1/df_customer_order_bycle['age_level2_count']
df_customer_order_bycle=pd.merge(df_customer_order_bycle,gender_count,on='gender').rename(columns={'cplb_zw_y':'gender_count'})
df_customer_order_bycle['gender_rate']=1/df_customer_order_bycle['gender_count']
df_customer_order_bycle.head()

5. 2019年11月熱品銷售分析
5.1 11月產(chǎn)品銷量top10產(chǎn)品銷量與環(huán)比
我們在分析2019年11月自行車產(chǎn)品銷售表現(xiàn)時已計算出11月所有產(chǎn)品的銷量與環(huán)比,這里不重復計算,直接使用gather_customer_order_month_10_11、gather_customer_order_month_11
5.1.1 計算top10產(chǎn)品
# 計算產(chǎn)品銷售數(shù)量
# 按照銷量降序,取TOP10產(chǎn)品
customer_order_11_top10=gather_customer_order_11.groupby('product_name')['order_num'].sum().reset_index().\
sort_values('order_num',ascending=False).head(10)
customer_order_11_top10

5.1.2 計算top10產(chǎn)品銷量及環(huán)比
這里我們只需要四個字段:create_year_month月份、product_name產(chǎn)品名、order_month_product本月銷量、cpzl_zw產(chǎn)品類別、order_num_diff本月產(chǎn)品銷量環(huán)比(上月)
customer_order_month_10_11=gather_customer_order_month_10_11.drop(['sum_order_month','order_num_proportio'],axis=1)
customer_order_month_10_11=customer_order_month_10_11[customer_order_month_10_11['product_name'].\
isin(customer_order_11_top10['product_name'].unique())]
customer_order_month_10_11['category']='本月TOP10銷量'
customer_order_month_10_11

5.2 11月增速top10產(chǎn)品銷售數(shù)量與環(huán)比
customer_order_month_11=gather_customer_order_month_11.sort_values('order_num_diff',ascending=False).head(10)
customer_order_month_11_top10_seep=gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']\
.isin(customer_order_month_11['product_name'])]
篩選我們需要的4個字段:create_year_month月份、product_name產(chǎn)品名、order_month_product本月銷量、cpzl_zw產(chǎn)品類別、order_num_diff本月產(chǎn)品銷量環(huán)比
customer_order_month_11_top10_seep.drop(['sum_order_month','order_num_proportio'],axis=1,inplace=True)
customer_order_month_11_top10_seep['category']='本月Top10增速'
customer_order_month_11_top10_seep.head()

5.3 合并TOP10銷量表customer_order_month_10_11,TOP增速customer_order_month_11_top10_seep
# axis=0按行合并,axis=1按列合并
hot_product_11=pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis=0)
hot_product_11.tail()

# axis=0按行合并,axis=1按列合并
hot_product_11=pd.merge(customer_order_month_10_11,customer_order_month_11_top10_seep,on=['product_name','create_year_month'])
hot_product_11

最后這些變量都可以批量存入數(shù)據(jù)庫,用power bi 畫圖。