Black Friday Sales Data Analysis

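1. Asking the Questions

Data analysis is not done for its own sake; it serves a purpose. The goal of analysing the Black Friday sales data is to better understand customers' purchasing behaviour.

Based on the information in the data, the analysis covers these aspects:

Age
Gender
City
Years in current city
Occupation
Marital status
Products and categories

The first six describe the user profile; the last looks at the data from the product side.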

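2. Understanding the Data

2.1 Data Source

The dataset is the Black-Friday sales dataset from the Kaggle platform, a sample of transactions made in a retail store.

2.2 Field Descriptions

The dataset contains 12 fields in total. The table below describes the first nine; the remaining three, Product_Category_2, Product_Category_3, and Purchase, appear in the column rename step in section 3.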

No.	Field	Data type	Description	Notes
1	User_ID	String	User ID
2	Product_ID	String	Product ID
3	Gender	String	Gender	F: female, M: male
4	Age	String	Age	7 age brackets
5	Occupation	String	Occupation	Coded 0-20
6	City_Category	String	City category	A, B, C
7	Stay_In_Current_City_Years	String	Years in current city	0, 1, 2, 3, 4+
8	Marital_Status	Integer	Marital status	0: married, 1: unmarried
9	Product_Category_1	Integer	Product category 1

2.3 Data Exploration

Read the data

import pandas as pd

df = pd.read_csv('BlackFriday.csv')

Check the number of rows and columns

df.shape

Output: (537577, 12), that is, 537,577 rows and 12 columns.

Inspect the index, data types, and memory usage

df.info()

Product_Category_2 and Product_Category_3 contain null values.

View summary statistics

df.describe()

View the first 10 rows

df.head(10)

3. Data Processing

Renaming columns

For easier reading, the column names are first renamed to Chinese; the later code refers to columns by these names.

df = df.rename(columns={
    'User_ID': '用戶ID', 'Product_ID': '商品ID', 'Gender': '性別', 'Age': '年齡',
    # Occupation uses the same label that the later code looks up
    'Occupation': '職業(yè)', 'City_Category': '城市類別',
    'Stay_In_Current_City_Years': '居住城市年數(shù)', 'Marital_Status': '婚姻狀況',
    'Product_Category_1': '產(chǎn)品類別1', 'Product_Category_2': '產(chǎn)品類別2',
    'Product_Category_3': '產(chǎn)品類別3', 'Purchase': '采購(gòu)額',
})

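Handling missing values

(df.shape[0]- df.dropna(how='any').shape[0])/df.shape[0]

Product category 2 and product category 3 have missing data: about 69% of rows contain at least one missing value. That is too much data to drop, and product categories are hard to impute. Since neither field is used in the analysis, the missing values are left as they are.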
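The expression above measures the share of rows with at least one null. A minimal sketch of how that row-level share relates to the per-column shares, using a tiny made-up frame with the dataset's original English column names (the values are illustrative, not taken from BlackFriday.csv):

```python
import numpy as np
import pandas as pd

# Made-up sample; only the column names come from the dataset.
sample = pd.DataFrame({
    "Product_Category_1": [3, 1, 8, 5],
    "Product_Category_2": [np.nan, 6.0, 2.0, 14.0],
    "Product_Category_3": [np.nan, 14.0, np.nan, np.nan],
})

# Share of rows with at least one null: the same quantity as the dropna expression.
row_share = sample.isnull().any(axis=1).mean()

# Share of nulls within each column, which shows where the gaps come from.
col_share = sample.isnull().mean()

print(row_share)
print(col_share)
```

The row-level share can never be smaller than the largest per-column share, so a single sparse column is enough to produce a high overall figure.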

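# One row per user with that user's attributes, sorted by user ID
df_dd = df.drop_duplicates(subset=['用戶ID'])[['用戶ID', '性別', '年齡', '職業(yè)', '城市類別', '居住城市年數(shù)', '婚姻狀況']].sort_values(by='用戶ID')
# Total purchase amount per user; sort_index() matches the user ID order above
df_dd['采購(gòu)額'] = df.groupby('用戶ID')['采購(gòu)額'].sum().sort_index().values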
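The two statements above build a per-user table by deduplicating and then attaching summed purchases by position. As an alternative sketch, a single groupby with named aggregation produces the same kind of table without relying on both sides being sorted the same way. It assumes user attributes never change between a user's rows; the sample data and the name `per_user` are made up for illustration.

```python
import pandas as pd

# Made-up transactions using the dataset's original English column names.
sample = pd.DataFrame({
    "User_ID": [1002, 1001, 1001, 1003, 1002],
    "Gender": ["M", "F", "F", "M", "M"],
    "Age": ["26-35", "0-17", "0-17", "55+", "26-35"],
    "Purchase": [500, 100, 250, 300, 700],
})

# Attributes are constant per user, so "first" picks them up; purchases are summed.
per_user = sample.groupby("User_ID", as_index=False).agg(
    Gender=("Gender", "first"),
    Age=("Age", "first"),
    Purchase=("Purchase", "sum"),
)

print(per_user)
```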

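4. Data Analysis

4.1 Gender

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Share of buyers by gender; labels follow the order returned by value_counts()
gender_counts = df_dd['性別'].value_counts()
gender_labels = [{'M': 'Male', 'F': 'Female'}[g] for g in gender_counts.index]
explode = (0.1, 0)
fig1, ax1 = plt.subplots(figsize=(10, 7))
patches, texts, autotexts = ax1.pie(gender_counts, explode=explode, labels=gender_labels, autopct='%1.1f%%',
        shadow=True, startangle=90, colors=sns.color_palette("Blues_d", 2))

ax1.axis('equal')
plt.tight_layout()
for t in texts:
    t.set_size('xx-large')
for at in autotexts:
    at.set_size('xx-large')

plt.legend(fontsize=16)
plt.show()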

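# Total purchase amount by gender, sorted ascending
s_gender = df_dd.groupby('性別')['采購(gòu)額'].sum().sort_values()

plt.figure(figsize=(12, 6))
sc = sns.color_palette("Blues_d", 2)
sns.barplot(x=s_gender.index, y=s_gender.values, palette=sc)
plt.xlabel('Gender', fontsize=16)
plt.ylabel('Total purchase amount', fontsize=16)
# Tick labels follow the sorted index instead of a hard-coded order
plt.xticks(np.arange(2), [{'M': 'Male', 'F': 'Female'}[g] for g in s_gender.index])

plt.title('Total purchase amount by gender', fontsize=18)
plt.show()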

In both number of buyers and total spending, men far exceed women. This differs markedly from consumption patterns in China, which is somewhat unexpected.

4.2 Marital Status

# Share of buyers by marital status (0: married, 1: unmarried, as in the field table)
marital_counts = df_dd['婚姻狀況'].value_counts()
marital_labels = [{0: 'Married', 1: 'Unmarried'}[m] for m in marital_counts.index]
explode = (0.1, 0)
fig1, ax1 = plt.subplots(figsize=(10, 7))
patches, texts, autotexts = ax1.pie(marital_counts, explode=explode, labels=marital_labels, autopct='%1.1f%%',
        shadow=True, startangle=90, colors=sns.color_palette("Blues_d", 2))

ax1.axis('equal')
plt.tight_layout()
for t in texts:
    t.set_size('xx-large')
for at in autotexts:
    at.set_size('xx-large')

plt.legend(fontsize=16)
plt.show()

The chart shows more married than unmarried buyers. Married life probably involves more purchases, with more household spending. Marketing should therefore lean toward married customers.

fig1, ax1 = plt.subplots(figsize=(12, 7))
sc = sns.color_palette("Blues", 2)
# Take both x and hue from df_dd so that every user is counted exactly once
sns.countplot(data=df_dd, x='婚姻狀況', hue='性別', palette=sc)

plt.xticks(np.arange(2), ('Married', 'Unmarried'))
plt.xlabel('Marital status', fontsize=16)
plt.ylabel('Number of buyers', fontsize=16)
plt.legend(fontsize=16)
plt.show()

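Looking at gender as well, men outnumber women among both married and unmarried buyers, so the gap does not depend on marital status. Without knowing local conditions, we cannot conclude that women lack purchasing power. It is worth investigating further: if the female market simply has not been opened up, efforts to stimulate women's interest in buying might pay off.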
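The comparison read off the chart can also be checked numerically with a cross-tabulation. The sketch below runs on a made-up per-user table with the dataset's original English column names; the values are illustrative only, and the 0/1 coding follows the field table in section 2.2.

```python
import pandas as pd

# Made-up per-user rows; 0 and 1 are the marital status codes.
per_user = pd.DataFrame({
    "Marital_Status": [0, 0, 0, 1, 1, 1],
    "Gender": ["M", "M", "F", "M", "F", "M"],
})

# Buyer counts for each marital status and gender combination.
counts = pd.crosstab(per_user["Marital_Status"], per_user["Gender"])

# Gender split within each marital status (each row sums to 1).
shares = pd.crosstab(per_user["Marital_Status"], per_user["Gender"], normalize="index")

print(counts)
print(shares)
```

If the male share is similar in both rows of `shares`, the gender gap does not depend on marital status.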

4.3 Age

fig1, ax1 = plt.subplots(figsize=(12, 7))
sc = sns.color_palette("Blues", 2)
# Take both x and hue from df_dd so that every user is counted exactly once
sns.countplot(data=df_dd, x='年齡', hue='性別', order=['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+'], palette=sc)
plt.xlabel('Age group', fontsize=16)
plt.ylabel('Number of buyers', fontsize=16)
plt.legend(fontsize=16)
plt.show()

By age, buyers of both genders are concentrated between 18 and 45. The 26-35 bracket is the largest, and this group has strong spending power.

# Total purchase amount by age group
s_age = df_dd.groupby('年齡')['采購(gòu)額'].sum()

plt.figure(figsize=(10, 6))
sc = sns.color_palette("Blues_r", 7)
sns.barplot(x=s_age.index, y=s_age.values, order=['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+'], palette=sc)

plt.xlabel('Age group', fontsize=16)
plt.ylabel('Total purchase amount', fontsize=16)
plt.grid(axis='x')
plt.title('Total purchase amount by age group', fontsize=18)
plt.show()

The distribution of spending matches the distribution of buyers: both are concentrated in the 18-45 range, where purchasing power is relatively high.
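One way to put a number on "concentrated in the 18-45 range" is to compute the share of buyers and of spending that falls in those three brackets. This is a minimal sketch on made-up per-user data with the original English column names; the names `core`, `buyer_share`, and `spend_share` are illustrative.

```python
import pandas as pd

age_order = ["0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+"]
core = ["18-25", "26-35", "36-45"]  # the brackets covering ages 18-45

# Made-up per-user rows.
per_user = pd.DataFrame({
    "Age": ["0-17", "18-25", "26-35", "26-35", "36-45", "46-50", "51-55", "55+"],
    "Purchase": [100, 300, 500, 400, 300, 200, 100, 100],
})

# Buyers and total spending per bracket, in natural age order.
by_age = (
    per_user.groupby("Age")["Purchase"]
    .agg(buyers="count", total="sum")
    .reindex(age_order)
)

buyer_share = by_age.loc[core, "buyers"].sum() / by_age["buyers"].sum()
spend_share = by_age.loc[core, "total"].sum() / by_age["total"].sum()

print(by_age)
print(buyer_share, spend_share)
```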

4.4 City

# Share of buyers by city category; labels come from the same index as the values
city_counts = df_dd['城市類別'].value_counts()
explode = (0.1, 0, 0)
fig1, ax1 = plt.subplots(figsize=(10, 7))
patches, texts, autotexts = ax1.pie(city_counts, explode=explode, labels=list(city_counts.index), autopct='%1.1f%%',
        shadow=True, startangle=90, colors=sns.color_palette("Blues_r", 3))

ax1.axis('equal')
plt.tight_layout()
for t in texts:
    t.set_size('xx-large')
for at in autotexts:
    at.set_size('xx-large')

plt.legend(fontsize=16)
plt.show()

By number of buyers, city C has the most and city A the fewest.

# Share of total purchase amount by city category; labels come from the groupby index
city_totals = df_dd.groupby('城市類別')['采購(gòu)額'].sum()
explode = (0.1, 0, 0)
fig1, ax1 = plt.subplots(figsize=(10, 7))
patches, texts, autotexts = ax1.pie(city_totals, explode=explode, labels=list(city_totals.index), autopct='%1.1f%%',
        shadow=True, startangle=90, colors=sns.color_palette("Blues_r", 3))

ax1.axis('equal')
plt.tight_layout()
for t in texts:
    t.set_size('xx-large')
for at in autotexts:
    at.set_size('xx-large')

plt.legend(fontsize=16)
plt.show()

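By total spending, city A is the lowest. City C accounts for more than half of all buyers but less than a third of total spending.

City B therefore has the highest purchasing power: it has relatively few buyers, but each of them spends more than buyers in the other two cities. City A comes next. City C has the lowest purchasing power: it has more buyers than either of the other cities, yet its share of total spending is far smaller than its share of buyers.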
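The reasoning here compares each city's share of buyers with its share of spending, which amounts to looking at average spending per buyer. A minimal sketch of that calculation on a made-up per-user table (original English column names, illustrative values chosen so that C has half the buyers):

```python
import pandas as pd

# Made-up per-user rows: one row per buyer with that buyer's total purchases.
per_user = pd.DataFrame({
    "City_Category": ["A", "A", "B", "B", "C", "C", "C", "C"],
    "Purchase": [400, 600, 900, 1100, 200, 300, 250, 250],
})

by_city = per_user.groupby("City_Category")["Purchase"].agg(buyers="count", total="sum")

# Purchasing power taken here as average spending per buyer.
by_city["avg_per_buyer"] = by_city["total"] / by_city["buyers"]
by_city["buyer_share"] = by_city["buyers"] / by_city["buyers"].sum()
by_city["spend_share"] = by_city["total"] / by_city["total"].sum()

print(by_city)
```

A city whose `spend_share` is well below its `buyer_share` has many buyers who each spend little, which is the pattern described for city C.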

hue_order = ['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+']
order = ['A', 'B', 'C']
fig1, ax1 = plt.subplots(figsize=(12, 7))
sc = sns.color_palette("Blues_d", 7)
# Take both x and hue from df_dd so that every user is counted exactly once
sns.countplot(data=df_dd, x='城市類別', hue='年齡', order=order, hue_order=hue_order, palette=sc)
plt.xlabel('City category', fontsize=16)
plt.ylabel('Number of buyers', fontsize=16)
plt.legend(fontsize=16)
plt.show()

Looking at the age distribution across the three cities, city A has the fewest buyers in every age group, and city C has comparatively many older buyers.

4.5 Years in Current City

# Map each raw value to a display label so that slices and labels cannot drift apart
stay_labels = {'0': 'Visitor', '1': '1 year', '2': '2 years', '3': '3 years', '4+': '4+ years'}
stay_counts = df_dd['居住城市年數(shù)'].value_counts()
labels = [stay_labels[str(k)] for k in stay_counts.index]
explode = (0.1, 0.1, 0, 0, 0)
fig1, ax1 = plt.subplots(figsize=(10, 7))
patches, texts, autotexts = ax1.pie(stay_counts, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90, colors=sns.color_palette("Blues_d", 5))
ax1.axis('equal')
plt.tight_layout()
plt.legend(fontsize=16)
for t in texts:
    t.set_size('xx-large')
for at in autotexts:
    at.set_size('xx-large')
plt.show()

# groupby returns the index in sorted order (0, 1, 2, 3, 4+), so labels are derived from it
stay_labels = {'0': 'Visitor', '1': '1 year', '2': '2 years', '3': '3 years', '4+': '4+ years'}
stay_totals = df_dd.groupby('居住城市年數(shù)')['采購(gòu)額'].sum()
labels = [stay_labels[str(k)] for k in stay_totals.index]
explode = (0.1, 0.1, 0, 0, 0)
fig1, ax1 = plt.subplots(figsize=(10, 7))
patches, texts, autotexts = ax1.pie(stay_totals, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90, colors=sns.color_palette("Blues_d", 5))
ax1.axis('equal')
plt.tight_layout()
plt.legend(fontsize=16)
for t in texts:
    t.set_size('xx-large')
for at in autotexts:
    at.set_size('xx-large')
plt.show()

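By number of buyers, people in their first year in the city are the largest group. By total spending, people in their second year spend the most, even though there are fewer of them than first-year residents. The number of buyers falls as years of residence increase.

Residents in their second year have the highest total spending, while the other groups are lower. Targeted marketing to two-year residents could help improve retention.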
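Conclusions like these depend on each pie slice carrying the right label. `value_counts()` orders its result by frequency, while `groupby(...).sum()` orders it by key, so one fixed label list cannot fit both. The sketch below, on made-up data with the original English column name, derives the labels from each result's own index; `label_map` is a hypothetical helper, and the "Visitor" label for 0 follows the charts above.

```python
import pandas as pd

# Made-up per-user rows; the values are strings because of the "4+" bucket.
per_user = pd.DataFrame({
    "Stay_In_Current_City_Years": ["1", "1", "2", "0", "4+", "1", "3", "2"],
    "Purchase": [100, 200, 900, 50, 80, 100, 70, 400],
})

label_map = {"0": "Visitor", "1": "1 year", "2": "2 years", "3": "3 years", "4+": "4+ years"}

# Ordered by frequency, most common first.
counts = per_user["Stay_In_Current_City_Years"].value_counts()
# Ordered by key: "0", "1", "2", "3", "4+".
totals = per_user.groupby("Stay_In_Current_City_Years")["Purchase"].sum()

# Labels built from each index always line up with the values they describe.
count_labels = [label_map[k] for k in counts.index]
total_labels = [label_map[k] for k in totals.index]

print(list(zip(count_labels, counts)))
print(list(zip(total_labels, totals)))
```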

4.6 Occupation

fig1, ax1 = plt.subplots(figsize=(12, 7))
# Number of buyers per occupation code, sorted ascending
occ_counts = df_dd['職業(yè)'].value_counts().sort_values()
sns.barplot(x=occ_counts.index, y=occ_counts.values, order=occ_counts.index, palette="Blues_d")
plt.xlabel('Occupation code', fontsize=16)
plt.ylabel('Number of buyers', fontsize=16)
plt.show()

fig1, ax1 = plt.subplots(figsize=(12, 7))
# Total purchase amount per occupation code, sorted ascending
occ_totals = df_dd.groupby('職業(yè)')['采購(gòu)額'].sum().sort_values()
sns.barplot(x=occ_totals.index, y=occ_totals.values, order=occ_totals.index, palette="Blues_d")
plt.xlabel('Occupation code', fontsize=16)
plt.ylabel('Total purchase amount', fontsize=16)
plt.show()

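The rankings of occupations by number of buyers and by total spending are broadly the same: the top three in both are 4, 0, and 7, which have many buyers and high total spending. Based on buyer counts, more products should be targeted at the occupations with the most buyers. Because the dataset does not say what the occupation codes stand for, nothing more can be inferred.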
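The claim that both rankings share the same top three can be checked directly rather than by reading two bar charts. A minimal sketch on made-up per-user data (original English column names; the occupation codes and amounts are illustrative, not results from the dataset):

```python
import pandas as pd

# Made-up per-user rows.
per_user = pd.DataFrame({
    "Occupation": [4, 4, 4, 4, 0, 0, 0, 7, 7, 1],
    "Purchase": [500, 400, 300, 100, 450, 350, 100, 400, 300, 200],
})

by_occ = per_user.groupby("Occupation")["Purchase"].agg(buyers="count", total="sum")

# Top three occupation codes under each measure.
top_by_buyers = by_occ["buyers"].nlargest(3).index.tolist()
top_by_total = by_occ["total"].nlargest(3).index.tolist()

same_top_three = set(top_by_buyers) == set(top_by_total)
print(top_by_buyers, top_by_total, same_top_three)
```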

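5. Conclusions

Men far exceed women in both number of buyers and total spending, which differs from purchasing patterns in China.
Married buyers outnumber unmarried buyers.
Buyers are concentrated in the 18-45 age range, which has relatively high purchasing power.
City B has the highest purchasing power; the city with the most buyers is not necessarily the one with the highest purchasing power.
The number of buyers falls as years of residence increase, but two-year residents have the highest total spending.
Total spending per occupation tracks the number of buyers, and buyer counts differ considerably between occupations.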
