Dataset overview

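The dataset records member purchases at a gym from March 2019 to February 2020. It contains four fields: user ID, purchase date, number of items purchased, and purchase amount. This makes it a very typical consumer-behavior dataset.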

Data import

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rc('font', family='SimHei', size=18)  # CJK font so the Chinese segment labels render
plt.style.use('ggplot')  # plotting style

# Load the data and parse order_dt as dates
df = pd.read_excel(r"D:\PycharmProjects\data\cuscapi.xls", parse_dates=['order_dt'])
df.head(10)

	user_id	order_dt	order_products	order_amount
0	vs30033073	2020-01-17	1	20
1	vs30026748	2019-12-04	1	20
2	vs10000716	2019-07-05	1	20
3	vs30032785	2019-08-21	2	0
4	vs10000716	2019-10-24	1	20
5	vs30033073	2019-11-29	2	20
6	vs10000621	2019-07-19	2	20
7	vs30029475	2019-05-17	1	20
8	vs30030664	2019-11-11	1	20
9	vs10000773	2019-11-25	1	20

pd.set_option('display.float_format', lambda x: '%.2f' % x)
df.describe()

	order_products	order_amount
count	2013.00	2013.00
mean	1.47	22.90
std	0.91	94.94
min	1.00	0.00
25%	1.00	20.00
50%	1.00	20.00
75%	2.00	20.00
max	12.00	2650.00

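Analysis:
1. Members buy about 1.5 items per order on average. The standard deviation is about 0.9, so variation is small. The median is 1 item and the 75th percentile is 2, so the vast majority of orders contain only a few items.
2. The average order amount is 22.9 yuan, with a standard deviation of about 95 and a median of 20, so the mean is above the median. Most orders are small, and a small number of users account for the large purchases. This matches the 80/20 pattern common in consumption data.
3. In general, consumption data tends to follow a long-tail distribution.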
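Comparing the mean with the median is a quick skew check. pandas can also report the sample skewness directly with `Series.skew()`. The minimal sketch below uses a small made-up series (not the gym data) that mimics the pattern above: most orders are 20 yuan and one order is very large.

```python
import pandas as pd

# Hypothetical order amounts: mostly 20 yuan, one zero-amount order, one large outlier
amounts = pd.Series([20, 20, 20, 0, 20, 20, 20, 2650])

mean = amounts.mean()      # pulled up by the single large order
median = amounts.median()  # unaffected by the outlier
skewness = amounts.skew()  # positive for a right (positive) skew

print(f"mean={mean:.2f}, median={median:.2f}, skew={skewness:.2f}")
```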

user_group = df.groupby('user_id')[['order_products', 'order_amount']].sum()  # sum numeric columns only
user_group.head(10)

user_id	order_products	order_amount
vs10000005	9	189
vs10000621	214	5704
vs10000627	2	0
vs10000716	250	2616
vs10000743	1	20
vs10000757	75	1104
vs10000773	23	460
vs10000775	8	2730
vs10000788	7	144
vs10000794	1	0

user_group.describe()

	order_products	order_amount
count	247.00	247.00
mean	11.97	186.59
std	36.70	641.12
min	1.00	0.00
25%	2.00	0.00
50%	2.00	0.00
75%	3.00	66.00
max	277.00	5704.00

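Analysis: members bought about 12 items each on average, and the largest buyer bought 277. Average spend per member is about 187 yuan, with a standard deviation of 641 and a median of 0. Together with the quantiles and the maximum, this shows a strongly right-skewed (positively skewed) distribution: a small number of members buy many items and spend heavily.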

df.info()  # check column data types

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2013 entries, 0 to 2012
Data columns (total 4 columns):
user_id           2013 non-null object
order_dt          2013 non-null datetime64[ns]
order_products    2013 non-null int64
order_amount      2013 non-null int64
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 63.0+ KB

Analysis: every column has 2013 non-null values, so the dataset has no missing values.
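The non-null counts from `info()` can also be turned into an explicit count of missing values per column with `isna().sum()`. Below is a minimal sketch on a small hypothetical frame with the same four columns, where one amount is deliberately missing. Given the output above, the same call on the real `df` should return all zeros.

```python
import pandas as pd

# Hypothetical sample with the dataset's four columns; one amount is missing
sample = pd.DataFrame({
    "user_id": ["vs1", "vs2", "vs3"],
    "order_dt": pd.to_datetime(["2019-07-05", "2019-08-21", "2019-10-24"]),
    "order_products": [1, 2, 1],
    "order_amount": [20, None, 20],
})

missing_per_column = sample.isna().sum()  # number of missing values in each column
print(missing_per_column)
print("any missing:", bool(missing_per_column.any()))
```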

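Data processing

Data type conversion

# Drop any time-of-day part while keeping order_dt as datetime64
df['order_dt'] = df['order_dt'].dt.normalize()
# Month of each order, represented by the first day of that month
df['month'] = df['order_dt'].dt.to_period('M').dt.to_timestamp()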
df.head(10)

	user_id	order_dt	order_products	order_amount	month
0	vs30033073	2020-01-17	1	20	2020-01-01
1	vs30026748	2019-12-04	1	20	2019-12-01
2	vs10000716	2019-07-05	1	20	2019-07-01
3	vs30032785	2019-08-21	2	0	2019-08-01
4	vs10000716	2019-10-24	1	20	2019-10-01
5	vs30033073	2019-11-29	2	20	2019-11-01
6	vs10000621	2019-07-19	2	20	2019-07-01
7	vs30029475	2019-05-17	1	20	2019-05-01
8	vs30030664	2019-11-11	1	20	2019-11-01
9	vs10000773	2019-11-25	1	20	2019-11-01
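Converting each date to a monthly period and back yields the first day of its month, which gives the `month` values shown in the table above without a `datetime64[M]` cast. A minimal sketch on three dates taken from that table (the series is built by hand for illustration):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-17", "2019-12-04", "2019-07-05"]))

# Monthly period, then back to a timestamp at the start of that period
month_start = dates.dt.to_period("M").dt.to_timestamp()
print(month_start.dt.strftime("%Y-%m-%d").tolist())
```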

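Data analysis: overall monthly trends

df.groupby('month').order_amount.sum().plot()
plt.xlabel('Month')
plt.ylabel('Amount spent (yuan)')
plt.title('Amount spent per month', fontsize=20)

[Figure: Amount spent per month]

Analysis: summing spend by month shows that the monthly totals fluctuate considerably.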

df.groupby('month').order_products.sum().plot()
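plt.xlabel('Month')
plt.ylabel('Items purchased')
plt.title('Items purchased per month', fontsize=20)

[Figure: Items purchased per month]

Note: monthly item purchases rise quickly over the first seven months and then decline overall over the last five.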

df.groupby('month').user_id.count().plot()
plt.xlabel('Month')
plt.ylabel('Number of purchases')
plt.title('Purchases per month', fontsize=20)

[Figure: Purchases per month]

Note: the number of purchases climbs past 250 by July and declines in the following months.

df.groupby('month').user_id.nunique().plot()
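plt.xlabel('Month')
plt.ylabel('Number of purchasers')
plt.title('Purchasers per month', fontsize=20)

[Figure: Purchasers per month]

Note: each month has fewer purchasers than purchases, because a member can buy more than once in a month. Purchasers reach 90 in July and decline in the following months.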
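The gap between the two curves comes from `count()` counting order rows, while `nunique()` counts distinct members. A toy example with hypothetical orders, where member "a" buys twice in July:

```python
import pandas as pd

# Hypothetical orders: member "a" buys twice in July
orders = pd.DataFrame({
    "user_id": ["a", "a", "b", "c"],
    "month": pd.to_datetime(["2019-07-01", "2019-07-01", "2019-07-01", "2019-08-01"]),
})

by_month = orders.groupby("month")["user_id"]
purchases = by_month.count()     # number of orders per month
purchasers = by_month.nunique()  # number of distinct members per month
print(pd.DataFrame({"purchases": purchases, "purchasers": purchasers}))
```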

Data analysis: individual user behavior

df.groupby('user_id')[['order_products', 'order_amount']].sum().head()

user_id	order_products	order_amount
vs10000005	9	189
vs10000621	214	5704
vs10000627	2	0
vs10000716	250	2616
vs10000743	1	20

user_consume = df.groupby('user_id')[['order_products', 'order_amount']].sum()
plt.scatter(user_consume['order_products'], user_consume['order_amount'])
plt.xlabel('Items purchased')
plt.ylabel('Amount spent (yuan)')
plt.title('Amount spent vs. items purchased per user', fontsize=20)

[Figure: Amount spent vs. items purchased per user]

Note: a user's total spend does not grow linearly with the number of items bought. Spending patterns are irregular, and there are many extreme values.

consume_products = user_consume['order_products']
consume_amount = user_consume['order_amount']

fig = plt.figure(figsize=(10, 6))
fig.add_subplot(1, 2, 1)
consume_products.hist(bins=10)
plt.title('Items purchased per user')
plt.xlabel('Items purchased')
plt.ylabel('Users')

fig.add_subplot(1, 2, 2)
consume_amount.hist(bins=10)
plt.title('Amount spent per user')
plt.xlabel('Amount spent (yuan)')
plt.ylabel('Users')

[Figure: Histograms of items purchased and amount spent per user]

Note: most users spend modestly. Over the whole period they bought fewer than 50 items and spent less than 1,000 yuan.

df.groupby('user_id').month.min().value_counts()

2019-08-01   62
2019-07-01    53
2019-09-01    43
2019-10-01    22
2019-11-01    16
2019-03-01    13
2020-01-01    11
2019-06-01     9
2019-05-01     8
2019-12-01     5
2020-02-01     3
2019-04-01     2
Name: month, dtype: int64

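df.groupby('user_id').month.min().value_counts().sort_index().plot()  # order by month, not by count
plt.title('Members by month of first purchase')
plt.xlabel('Month of first purchase')
plt.ylabel('Members')

[Figure: Members by month of first purchase]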

df.groupby('user_id').month.max().value_counts()

2019-08-01    65
2019-09-01    52
2019-07-01    39
2019-10-01    22
2020-01-01    21
2020-02-01    17
2019-11-01    16
2019-12-01     8
2019-03-01     3
2019-06-01     3
2019-05-01     1
Name: month, dtype: int64

df.groupby('user_id').month.max().value_counts().sort_index().plot()  # order by month, not by count
plt.title('Members by month of last purchase')
plt.xlabel('Month of last purchase')
plt.ylabel('Members')

[Figure: Members by month of last purchase]

# Gap between each member's first and last purchase month
first_last = df.groupby('user_id')['month'].agg(first='min', last='max')
(first_last['last'] - first_last['first']).value_counts()

0 days      177
31 days      24
61 days       6
92 days       6
122 days      6
337 days      5
30 days       4
306 days      3
153 days      3
184 days      3
62 days       2
123 days      2
215 days      2
245 days      2
275 days      1
276 days      1
dtype: int64

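Notes:
1. groupby groups the orders by member. The minimum month is each member's first purchase month, and the maximum is the last.
2. Most first purchases fall in July and August (115 of 247 members), followed by September. Comparing first and last purchase months, 177 members (about 72%) made all their purchases within a single calendar month. A further 28 made their last purchase in the following month, so 205 members (about 83%) stopped within one month of their first purchase.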
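The "within one month" share can be computed directly from each member's first and last purchase months. The sketch below numbers months consecutively (year × 12 + month) so that differences are whole months, and treats a gap of 0 or 1 months as "within one month". That threshold is an assumption; the data are hypothetical.

```python
import pandas as pd

# Hypothetical orders for four members
orders = pd.DataFrame({
    "user_id": ["a", "a", "b", "c", "c", "d", "d"],
    "order_dt": pd.to_datetime([
        "2019-07-05", "2019-07-20",  # a: both orders in July
        "2019-08-11",                # b: a single order
        "2019-08-30", "2019-09-02",  # c: last order in the following month
        "2019-03-14", "2019-12-27",  # d: last order nine months later
    ]),
})

# Number the months consecutively so that differences are whole months
month_index = orders["order_dt"].dt.year * 12 + orders["order_dt"].dt.month
by_user = month_index.groupby(orders["user_id"])
gap_months = by_user.max() - by_user.min()

# Treat a gap of 0 or 1 calendar months as "within one month" (an assumption)
share_within_one_month = (gap_months <= 1).mean()
print(gap_months.to_dict(), share_within_one_month)
```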

Data analysis: repeat-purchase and return-purchase rates

Pivot: purchases per member per month

# Count each member's purchases per month
pivoted_counts=df.pivot_table(index='user_id',columns='month',values='order_dt',aggfunc='count').fillna(0)
columns_month=df.month.dt.date.sort_values().unique()
pivoted_counts.columns=columns_month
pivoted_counts.head()

user_id	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
vs10000005	2.00	0.00	3.00	0.00	0.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
vs10000621	6.00	17.00	19.00	20.00	17.00	5.00	2.00	18.00	18.00	21.00	16.00	10.00
vs10000627	0.00	0.00	0.00	0.00	2.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
vs10000716	0.00	0.00	0.00	0.00	14.00	19.00	24.00	12.00	30.00	15.00	12.00	5.00
vs10000743	1.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00

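Repeat-purchase rate analysis

The repeat-purchase rate is the share of purchasing users who bought two or more times within a time window. Here the window is one month. A user who placed two orders on the same day also counts as a repeat purchaser.
Months with two or more purchases are coded 1, months with exactly one purchase 0, and months with no purchase NaN. The NaN months therefore drop out of the denominator.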
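With this coding, each month's rate is the number of 1-cells divided by the number of non-NaN cells in that month's column. Below is a minimal vectorized sketch on a hypothetical count pivot (rows are members, columns are months):

```python
import pandas as pd

# Hypothetical monthly purchase counts per member
counts = pd.DataFrame(
    {"2019-07": [2, 1, 0, 3], "2019-08": [0, 1, 1, 0]},
    index=["a", "b", "c", "d"],
)

# 1 = two or more purchases, 0 = exactly one, NaN = no purchase that month
flags = (counts > 1).astype(float).where(counts > 0)
repeat_rate = flags.sum() / flags.count()  # count() skips NaN, i.e. non-buyers
print(repeat_rate.to_dict())
```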

# 1 = two or more purchases, 0 = exactly one, NaN = none (DataFrame.map was applymap before pandas 2.1)
pivoted_counts_transf = pivoted_counts.map(lambda x: 1 if x > 1 else np.nan if x == 0 else 0)
pivoted_counts_transf.head()

user_id	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
vs10000005	1.00	nan	1.00	nan	nan	nan	nan	nan	nan	0.00	nan	nan
vs10000621	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
vs10000627	nan	nan	nan	nan	1.00	nan	nan	nan	nan	nan	nan	nan
vs10000716	nan	nan	nan	nan	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
vs10000743	0.00	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan

# Repeat purchasers / purchasers in each month (count() skips NaN, i.e. non-purchasers)
month_counts_reorder_rate = pivoted_counts_transf.sum() / pivoted_counts_transf.count()
plt.plot(month_counts_reorder_rate)
plt.title('Monthly repeat-purchase rate')
plt.xlabel('Month')
plt.ylabel('Rate')

[Figure: Monthly repeat-purchase rate]

Note: few new users joined from March to June, which kept the repeat-purchase rate high. In August, when many new users joined and then left, the rate was low. In the later months, the remaining users are the loyal customers who survived the shake-out, and the rate keeps rising.

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(pivoted_counts_transf.count())
ax.plot(pivoted_counts_transf.sum())
ax.legend(['Purchasers', 'Users with two or more purchases'])
plt.title('Monthly purchasers and users with two or more purchases')
plt.xlabel('Month')
plt.ylabel('Users')

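[Figure: Monthly purchasers and users with two or more purchases]

Return-purchase rate analysis

The return-purchase rate is the share of users who purchased in one time window and purchased again in the next window. For example, if 1,000 users purchased in January and 300 of them purchased again in February, January's return-purchase rate is 30%.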
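Given a 0/1 purchase matrix (members × months), the rate for month t is the number of members who bought in both t and t+1, divided by the number who bought in t. Below is a vectorized sketch that uses the hypothetical 1,000 / 300 example from the definition as its data:

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 purchase flags: 1,000 members bought in January,
# and 300 of them bought again in February
bought = pd.DataFrame({
    "2020-01": np.ones(1000, dtype=int),
    "2020-02": np.r_[np.ones(300, dtype=int), np.zeros(700, dtype=int)],
})

flags = bought.to_numpy() == 1
# Pair month t with month t+1; the last month has no following month, so no rate
bought_both = flags[:, :-1] & flags[:, 1:]
return_rate = pd.Series(bought_both.sum(axis=0) / flags[:, :-1].sum(axis=0),
                        index=bought.columns[:-1])
print(return_rate.to_dict())
```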

pivoted_amount=df.pivot_table(index='user_id',columns='month',values='order_amount',aggfunc='mean').fillna(0)
columns_month=df.month.dt.date.sort_values().unique()
pivoted_amount.columns=columns_month
pivoted_amount.head()

user_id	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
vs10000005	25.00	0.00	19.67	0.00	0.00	0.00	0.00	0.00	0.00	80.00	0.00	0.00
vs10000621	414.00	20.00	20.00	20.00	17.65	20.00	20.00	20.00	20.00	20.00	20.00	20.00
vs10000627	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
vs10000716	0.00	0.00	0.00	0.00	20.00	41.84	10.83	15.00	15.33	20.00	20.00	20.20
vs10000743	20.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00

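# Flag whether the member made a paid purchase in each month (mean amount > 1 yuan);
# months with only zero-amount orders therefore count as no purchase
pivoted_purchase = (pivoted_amount > 1).astype(int)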
pivoted_purchase.head()

user_id	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
vs10000005	1	0	1	0	0	0	0	0	0	1	0	0
vs10000621	1	1	1	1	1	1	1	1	1	1	1	1
vs10000627	0	0	0	0	0	0	0	0	0	0	0	0
vs10000716	0	0	0	0	1	1	1	1	1	1	1	1
vs10000743	1	0	0	0	0	0	0	0	0	0	0	0

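def purchase_return(data):
    """For each month: 1 if the member bought this month and the next month,
    0 if only this month, NaN if not this month (and for the last month)."""
    status = []
    for i in range(len(data) - 1):
        if data.iloc[i] >= 1:
            status.append(1 if data.iloc[i + 1] >= 1 else 0)
        else:
            status.append(np.nan)
    status.append(np.nan)  # the last month has no following month
    return pd.Series(status)

pivoted_purchase_return = pivoted_purchase.apply(purchase_return, axis=1)
pivoted_purchase_return.columns = columns_month
pivoted_purchase_return.head()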

user_id	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
vs10000005	0.00	nan	0.00	nan	nan	nan	nan	nan	nan	0.00	nan	nan
vs10000621	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	nan
vs10000627	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
vs10000716	nan	nan	nan	nan	1.00	1.00	1.00	1.00	1.00	1.00	1.00	nan
vs10000743	0.00	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan

pivoted_purchase_return_rate=pivoted_purchase_return.sum()/pivoted_purchase_return.count()
plt.plot(pivoted_purchase_return_rate)
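plt.title('Monthly return-purchase rate')
plt.xlabel('Month')
plt.ylabel('Rate')
plt.xticks(rotation=90)

[Figure: Monthly return-purchase rate]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(pivoted_purchase_return.count())
ax.plot(pivoted_purchase_return.sum())
ax.legend(['Purchasers per month', 'Purchasers who bought again the next month'])
plt.title('Monthly purchasers and return purchasers')
plt.xlabel('Month')
plt.ylabel('Users')

[Figure: Monthly purchasers and return purchasers]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(pivoted_purchase_return_rate)
ax.plot(month_counts_reorder_rate)
ax.legend(['Return-purchase rate', 'Repeat-purchase rate'])
plt.title('Monthly return-purchase and repeat-purchase rates')
plt.xlabel('Month')
plt.ylabel('Rate')

[Figure: Monthly return-purchase and repeat-purchase rates]

Note: overall, the monthly repeat-purchase rate is higher than the return-purchase rate and fluctuates more. New users' return-purchase rate is around 30%, not very different from that of existing customers.

Data analysis: user segmentation

RFM segmentation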

# Per member: latest order date (for R), number of orders (F), total amount (M)
user_rfm = df.pivot_table(index='user_id',
                          values=['order_dt', 'order_products', 'order_amount'],
                          aggfunc={'order_dt': 'max', 'order_products': 'count', 'order_amount': 'sum'})
user_rfm.head()

user_id	order_amount	order_dt	order_products
vs10000005	189	2019-12-27	6
vs10000621	5704	2020-02-28	169
vs10000627	0	2019-07-23	2
vs10000716	2616	2020-02-28	131
vs10000743	20	2019-03-15	1

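# R = days since the member's last purchase, counted back from the latest order date in the data
user_rfm['period'] = (user_rfm.order_dt.max() - user_rfm.order_dt) / np.timedelta64(1, 'D')
user_rfm = user_rfm.rename(columns={'period': 'R', 'order_products': 'F', 'order_amount': 'M'})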
user_rfm.head()

user_id	M	order_dt	F	R
vs10000005	189	2019-12-27	6	63.00
vs10000621	5704	2020-02-28	169	0.00
vs10000627	0	2019-07-23	2	220.00
vs10000716	2616	2020-02-28	131	0.00
vs10000743	20	2019-03-15	1	350.00

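# Segmentation: compare each of R, F, M with its mean ('1' = at or above the mean).
# Note: R is days since the last purchase, so '1' in the R position means LESS recent,
# the opposite of the usual RFM convention; the labels below keep the original mapping.
def rfm_func(x):
    level = x.apply(lambda v: '1' if v >= 0 else '0')
    label = level.R + level.F + level.M
    d = {'111': '高價值客戶',    # high-value customers
         '011': '重點保持客戶',  # key customers to retain
         '101': '重點發展客戶',  # key customers to develop
         '001': '重點挽留客戶',  # key customers to win back
         '110': '一般價值客戶',  # ordinary-value customers
         '010': '一般保持客戶',  # ordinary customers to retain
         '100': '一般發展客戶',  # ordinary customers to develop
         '000': '潛在客戶'}      # potential customers
    return d[label]

# Centre R, F, M on their means, then label each member
rfm = user_rfm[['R', 'F', 'M']]
user_rfm['label'] = (rfm - rfm.mean()).apply(rfm_func, axis=1)
user_rfm.head()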

user_id	M	order_dt	F	R	label
vs10000005	189	2019-12-27	6	63.00	重點挽留客戶
vs10000621	5704	2020-02-28	169	0.00	重點保持客戶
vs10000627	0	2019-07-23	2	220.00	一般發展客戶
vs10000716	2616	2020-02-28	131	0.00	重點保持客戶
vs10000743	20	2019-03-15	1	350.00	一般發展客戶

user_rfm.groupby('label').count()

label	M	order_dt	F	R
一般保持客戶	3	3	3	3
一般發展客戶	146	146	146	146
潛在客戶	63	63	63	63
重點保持客戶	24	24	24	24
重點發展客戶	2	2	2	2
重點挽留客戶	2	2	2	2
高價值客戶	7	7	7	7

user_rfm.groupby('label')[['M', 'F', 'R']].sum()

label	M	F	R
一般保持客戶	352	34	98.00
一般發展客戶	2653	272	28793.00
潛在客戶	1723	125	6377.00
重點保持客戶	32494	1416	846.00
重點發展客戶	2091	5	575.00
重點挽留客戶	2919	9	165.00
高價值客戶	3856	152	1429.00

from matplotlib import font_manager as fm  # font manager
from matplotlib import cm  # colormaps
proptease = fm.FontProperties()
proptease.set_size('medium')

# Number of members in each segment (index sorted by label)
s = user_rfm.groupby('label')['M'].count()
labels = s.index
sizes = s.values

explode = (0, 0, 0, 0, 0.1, 0.1, 0.2)  # pull the last three slices out of the pie
fig, axes = plt.subplots(1, 2, figsize=(10, 6))
ax1, ax2 = axes.ravel()  # ravel() flattens the axes array into the two subplots

colors = cm.rainbow(np.arange(len(sizes)) / len(sizes))  # evenly spaced colors from the rainbow colormap
# patches: pie wedges; texts: category labels; autotexts: percentage labels
patches, texts, autotexts = ax1.pie(sizes, labels=labels, autopct='%1.0f%%', explode=explode,
                                    shadow=False, startangle=170, colors=colors,
                                    labeldistance=1.2, pctdistance=1.05, radius=0.4)
ax1.axis('equal')  # draw the pie as a circle
plt.setp(texts, fontproperties=proptease)

# Style the percentage labels
for t in autotexts:
    t.set_size('large')
ax1.set_title('User segment composition', loc='center')
ax2.axis('off')  # hide axis lines, ticks and labels
ax2.legend(patches, labels, loc='center left', fontsize=10)
plt.tight_layout()  # adjust subplot parameters so the figure fills the area

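[Figure: User segment composition pie chart]

Analysis: the segmentation shows that ordinary customers to develop (一般發展客戶) make up the largest share, 59% of members. Potential customers (潛在客戶) come second at 26%.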
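These percentages are each segment's member count divided by the 247 members. Below is a minimal check that uses the counts copied from the `groupby('label').count()` table above:

```python
import pandas as pd

# Member counts per segment, copied from the table above
segment_counts = pd.Series({
    "一般保持客戶": 3, "一般發展客戶": 146, "潛在客戶": 63, "重點保持客戶": 24,
    "重點發展客戶": 2, "重點挽留客戶": 2, "高價值客戶": 7,
})

shares = segment_counts / segment_counts.sum()  # share of all members
print(shares.round(2).sort_values(ascending=False))
```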

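Overall status segmentation

Based on purchase behavior, each member is given one status per month: new, active, inactive, or returning.
A new user (new) makes their first purchase in the window.
An active user (active) is an existing customer who purchased in both the previous window and the current one.
An inactive user (unactive) is an existing customer who made no purchase in the current window.
A returning user (return) is an existing customer who made no purchase in the previous window but purchased in the current one.
Members who have not yet made a first purchase are marked unreg. All windows are calendar months.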
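These rules form a small state machine over each member's monthly 0/1 purchase flags. The sketch below restates them in plain Python on a single sequence. The input is member vs10000005's row from the purchase-flag table above, and the result should match that member's row in the status table.

```python
def month_statuses(bought):
    """Map a list of monthly 0/1 purchase flags to new/active/unactive/return/unreg."""
    statuses = []
    for flag in bought:
        prev = statuses[-1] if statuses else "unreg"
        if flag == 0:
            # No purchase: still unregistered, or an existing customer gone quiet
            statuses.append("unreg" if prev == "unreg" else "unactive")
        elif prev == "unreg":
            statuses.append("new")       # first purchase ever
        elif prev == "unactive":
            statuses.append("return")    # back after a month without purchases
        else:
            statuses.append("active")    # also bought last month (new, active or return)
    return statuses

print(month_statuses([1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]))
```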

def active_status(data):
    status = []
    for i in range(len(data)):
        if data.iloc[i] == 0:
            # No purchase this month
            if len(status) > 0 and status[i - 1] != 'unreg':
                status.append('unactive')  # existing customer who did not buy
            else:
                status.append('unreg')     # no purchase yet ("unregistered")
        else:
            # Purchase this month
            if len(status) == 0 or status[i - 1] == 'unreg':
                status.append('new')
            elif status[i - 1] == 'unactive':
                status.append('return')
            else:
                status.append('active')
    return pd.Series(status)

pivoted_purchase_status = pivoted_purchase.apply(active_status, axis=1)
pivoted_purchase_status.columns = columns_month
pivoted_purchase_status.head()

user_id	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
vs10000005	new	unactive	return	unactive	unactive	unactive	unactive	unactive	unactive	return	unactive	unactive
vs10000621	new	active	active	active	active	active	active	active	active	active	active	active
vs10000627	unreg	unreg	unreg	unreg	unreg	unreg	unreg	unreg	unreg	unreg	unreg	unreg
vs10000716	unreg	unreg	unreg	unreg	new	active	active	active	active	active	active	active
vs10000743	new	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive

# Count members in each status per month, ignoring members who have not purchased yet
pivoted_status_counts = pivoted_purchase_status.replace('unreg', np.nan).apply(lambda col: col.value_counts())
pivoted_status_counts

	2019-03-01	2019-04-01	2019-05-01	2019-06-01	2019-07-01	2019-08-01	2019-09-01	2019-10-01	2019-11-01	2019-12-01	2020-01-01	2020-02-01
active	nan	7.00	9	16.00	20.00	18.00	11	12	15	14	16	10
new	13.00	2.00	8	9.00	17.00	17.00	10	9	6	7	7	1
return	nan	nan	1	nan	nan	nan	4	5	1	4	2	3
unactive	nan	6.00	5	7.00	12.00	31.00	51	59	69	73	80	92

plt.plot(pivoted_status_counts.T)
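plt.title('Users in each status per month')
plt.legend(pivoted_status_counts.index)
plt.xlabel('Month')
plt.ylabel('Users')

[Figure: Users in each status per month]

Analysis: inactive users (unactive) make up the largest group. Active users are fairly stable. Each month's purchasers are the active, returning and new users combined.

Returning and active user analysis

# Share of each status among the month's purchasers (new, active, return)
purchasers = pivoted_status_counts.drop(index='unactive')
return_rate = purchasers.div(purchasers.sum(axis=0), axis=1)
plt.plot(return_rate.loc[['active', 'return']].T)
plt.legend(['active', 'return'])
plt.title('Share of active and returning users among monthly purchasers')
plt.xlabel('Month')
plt.ylabel('Share')
plt.xticks(rotation=90)

[Figure: Share of active and returning users among monthly purchasers]

Note: the status counts table above shows how purchasers break down in the later months (October 2019 to February 2020). Active users account for roughly 46% to 71% of each month's purchasers, and returning users for roughly 5% to 21%. Most purchasers therefore also bought in the previous month, so overall user quality is reasonable.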
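Because each month's purchasers are exactly the new, active and returning users, these shares can be checked directly from the status counts. The sketch below uses the October 2019 to February 2020 columns copied from the status counts table above:

```python
import pandas as pd

# Status counts for 2019-10 .. 2020-02, copied from the status counts table above
counts = pd.DataFrame(
    {"active": [12, 15, 14, 16, 10], "new": [9, 6, 7, 7, 1], "return": [5, 1, 4, 2, 3]},
    index=["2019-10", "2019-11", "2019-12", "2020-01", "2020-02"],
)

purchasers = counts.sum(axis=1)                   # new + active + return per month
shares = counts.div(purchasers, axis=0).round(3)  # share of each status among purchasers
print(shares[["active", "return"]])
```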

Data analysis: user quality

Overall quality analysis

user_amount=df.groupby('user_id').order_amount.sum().sort_values().reset_index()
user_amount['amount_cumsum']=user_amount.order_amount.cumsum()
user_amount.tail()

	user_id	order_amount	amount_cumsum
242	vs10000716	2616	29735
243	vs10000775	2730	32465
244	vs30026748	3296	35761
245	vs30029475	4623	40384
246	vs10000621	5704	46088

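amount_total = user_amount.amount_cumsum.max()
user_amount['prop'] = user_amount.amount_cumsum / amount_total  # cumulative share of total spend
plt.plot(user_amount.prop)
plt.title('Cumulative share of total spend by user')
plt.xlabel('Users (sorted by spend, ascending)')
plt.ylabel('Cumulative share')

[Figure: Cumulative share of total spend by user]

Note: the dataset has 247 users in total. The curve shows that 47 of them (about 19% of all users) account for more than 80% of total spend.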
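The number of top spenders needed to reach 80% of revenue can also be found programmatically. Sort users by spend in descending order, accumulate their spend, and find the first position where the cumulative share reaches the threshold. A sketch on hypothetical per-user totals:

```python
import pandas as pd

# Hypothetical total spend per user
spend = pd.Series({"u1": 5000, "u2": 2900, "u3": 1000, "u4": 500, "u5": 300, "u6": 200})

cum_share = spend.sort_values(ascending=False).cumsum() / spend.sum()
# Smallest number of top users whose combined spend reaches 80%
n_top = int((cum_share < 0.8).sum()) + 1
print(n_top, f"{n_top / len(spend):.0%} of users")
```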

Data analysis: user lifecycle

Overall lifecycle

# Time between each member's first and last purchase
order_dt_min=df.groupby('user_id').order_dt.min()
order_dt_max=df.groupby('user_id').order_dt.max()
life_time=(order_dt_max-order_dt_min).reset_index()
life_time.head()

	user_id	order_dt
0	vs10000005	273 days
1	vs10000621	351 days
2	vs10000627	1 days
3	vs10000716	238 days
4	vs10000743	0 days

life_time.describe()

	order_dt
count	247
mean	32 days 03:59:01.700404
std	73 days 19:15:10.251372
min	0 days 00:00:00
25%	0 days 00:00:00
50%	1 days 00:00:00
75%	13 days 00:00:00
max	351 days 00:00:00

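Analysis: the average lifecycle across all users is about 32 days, but the median is only 1 day. Half of the members therefore stopped purchasing within a day of their first purchase, and the 25th percentile is 0 days.
The maximum of 351 days covers essentially the whole period of the data, so some high-quality members kept purchasing from start to finish.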
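Lifecycle here is the span between a member's first and last purchase dates. Converting the timedelta to a day count makes the statistics numeric. Below is a minimal sketch on hypothetical orders for three members:

```python
import pandas as pd

# Hypothetical orders for three members
orders = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "c", "c"],
    "order_dt": pd.to_datetime([
        "2019-03-10", "2019-05-02", "2019-12-08",
        "2019-07-01",
        "2019-08-21", "2019-08-22",
    ]),
})

by_user = orders.groupby("user_id")["order_dt"]
life_days = (by_user.max() - by_user.min()).dt.days  # 0 means all purchases on one day
print(life_days.to_dict())
print("median:", life_days.median(), "single-day share:", (life_days == 0).mean())
```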

((order_dt_max-order_dt_min)/np.timedelta64(1,'D')).hist(bins=15)
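plt.title('User lifecycle')
plt.xlabel('Days')
plt.ylabel('Users')

[Figure: User lifecycle histogram]

Lifecycle of users with two or more purchases

life_time['life_time'] = life_time.order_dt / np.timedelta64(1, 'D')
# Exclude users whose first and last purchases fall on the same day
life_time[life_time.life_time > 0].life_time.hist(bins=15)
plt.title('Lifecycle of users with two or more purchases')
plt.xlabel('Days')
plt.ylabel('Users')

[Figure: Lifecycle histogram, users with two or more purchases]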

life_time[life_time.life_time>0].life_time.describe()

count   155.00
mean     51.26
std      87.84
min       1.00
25%       2.00
50%       7.00
75%      53.50
max     351.00
Name: life_time, dtype: float64

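Analysis: the average lifecycle of users with two or more purchases is about 51 days, well above the 32-day average for all users (median 7 days vs. 1 day). This suggests that, after a member's first purchase, the gym should encourage a second one.

Data analysis: user retention

# Retention: the share of users who purchase again after their first purchase. Unlike the
# return-purchase rate, retention is anchored on the first purchase and uses several time windows.
user_purchase_retention = pd.merge(left=df, right=order_dt_min.reset_index(), how='inner',
                                   on='user_id', suffixes=('', '_min'))
# Days between each order and the member's first order
user_purchase_retention['date_diff'] = (user_purchase_retention.order_dt
                                        - user_purchase_retention.order_dt_min) / np.timedelta64(1, 'D')
bins = [0, 30, 60, 90, 120, 150, 180, 365]
# Right-closed bins: a gap of 0 days (the first order itself) falls outside every bin
user_purchase_retention['date_diff_bin'] = pd.cut(user_purchase_retention['date_diff'], bins=bins)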

user_purchase_retention.head(10)

user_id	order_dt	order_products	order_amount	month	order_dt_min	date_diff	date_diff_bin
vs30033073	2020-01-17	1	20	2020-01-01	2019-09-23	116.00	(90, 120]
vs30033073	2019-11-29	2	20	2019-11-01	2019-09-23	67.00	(60, 90]
vs30033073	2019-11-13	2	20	2019-11-01	2019-09-23	51.00	(30, 60]
vs30033073	2019-12-24	2	20	2019-12-01	2019-09-23	92.00	(90, 120]
vs30033073	2019-10-29	2	20	2019-10-01	2019-09-23	36.00	(30, 60]
vs30033073	2020-01-07	2	20	2020-01-01	2019-09-23	106.00	(90, 120]
vs30033073	2019-12-09	2	20	2019-12-01	2019-09-23	77.00	(60, 90]
vs30033073	2020-01-06	1	20	2020-01-01	2019-09-23	105.00	(90, 120]
vs30033073	2019-11-01	2	20	2019-11-01	2019-09-23	39.00	(30, 60]
vs30033073	2019-10-21	2	20	2019-10-01	2019-09-23	28.00	(0, 30]

pivoted_retention = user_purchase_retention.pivot_table(index='user_id', columns='date_diff_bin',
                                                        values='order_amount', aggfunc='sum', dropna=False)
pivoted_retention.head()

user_id	(0,30]	(30,60]	(60,90]	(90,120]	(120,150]	(150,180]	(180,365]
vs10000005	nan	59	nan	nan	nan	nan	80
vs10000621	240	300	420	400	200	40	1700
vs10000627	0	nan	nan	nan	nan	nan	nan
vs10000716	280	795	240	220	440	280	341
vs10000743	nan	nan	nan	nan	nan	nan	nan

pivoted_retention.mean()

date_diff_bin
(0, 30]       52.70
(30, 60]     148.62
(60, 90]     171.52
(90, 120]    307.59
(120, 150]   112.90
(150, 180]   111.60
(180, 365]   700.36
dtype: float64

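# 1 if the member made a paid purchase in the window, else 0; the mean over members is the retention rate
retained = (pivoted_retention.fillna(0) > 0).astype(int)
retained.mean().plot.bar()
plt.title('Retention rate by time since first purchase')
plt.xlabel('Days since first purchase')
plt.ylabel('Retention rate')

[Figure: Retention rate by time since first purchase]

Analysis: retention in the first month (days 1 to 30) is just over 17.5% and drops to about 15% in the second month. In the following windows it settles at around 6%, which points to heavy churn later on.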
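`pd.cut` uses right-closed intervals by default. A day gap of 0 (the first purchase itself) therefore falls outside (0, 30] and becomes NaN, so only later purchases count toward retention. A sketch on hypothetical day gaps:

```python
import pandas as pd

bins = [0, 30, 60, 90, 120, 150, 180, 365]
# Hypothetical days since each order's first purchase (0 = the first purchase itself)
days_since_first = pd.Series([0, 12, 30, 31, 95, 200])

binned = pd.cut(days_since_first, bins=bins)  # right-closed: (0, 30], (30, 60], ...
print(list(binned))
```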

Data analysis: decision support

Average purchase interval: the time between two consecutive purchases by the same user.
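The same per-user gaps can also be computed with `groupby().diff()`. This variant puts the NaN on each user's first order rather than the last, and leaves the summary statistics unchanged. A sketch on hypothetical orders:

```python
import pandas as pd

# Hypothetical orders for two members
orders = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b"],
    "order_dt": pd.to_datetime(["2019-07-01", "2019-07-02", "2019-07-06",
                                "2019-08-01", "2019-08-01"]),
})

orders = orders.sort_values(["user_id", "order_dt"])
# Days since the same user's previous order (NaN for each user's first order)
gap_days = orders.groupby("user_id")["order_dt"].diff().dt.days
print(gap_days.tolist(), "mean gap:", gap_days.mean())
```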

# Days from each order to the same member's next order (NaN after the member's last order)
def diff(group):
    return group.shift(-1) - group

last_diff = (user_purchase_retention.sort_values('order_dt').reset_index()
             .groupby('user_id')['date_diff'].apply(diff))
last_diff.head(10)

user_id         
vs10000005  31       0.00
            34      42.00
            158      1.00
            160      0.00
            161    230.00
            1715      nan
vs10000621  2        0.00
            3       11.00
            22       1.00
            26       1.00
Name: date_diff, dtype: float64

last_diff.describe()

count   1766.00
mean       4.50
std       14.03
min        0.00
25%        1.00
50%        1.00
75%        4.00
max      230.00
Name: date_diff, dtype: float64

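Note: the average interval between purchases is 4.5 days (the median is 1 day, and 75% of intervals are 4 days or less). To win users back, contacting them around 4.5 days after their last purchase seems a reasonable choice.

last_diff.hist(bins=15)
plt.title('Distribution of purchase intervals')
plt.xlabel('Days between purchases')
plt.ylabel('Number of intervals')

[Figure: Distribution of purchase intervals]

Note: this is a typical long-tail distribution; most purchase intervals are short.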

Analysis of Member Purchase Behavior on a Fitness Platform
