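Analysis of the Google Play Store App Dataset

Google Play Store App Analysis Report

Google Play Store analysis

The dataset comes from Kaggle and contains app data scraped from the Google Play Store.
Today we explore the data and look at which factors may influence the user rating (Rating).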

Environment: Python 3.6, Windows 10, Jupyter Notebook

First, import the analysis packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Load the dataset
data = pd.read_csv('googleplaystore.csv')

Explore the data

# First look at the head of the data
data.head()
image.png
# Look at the overall summary
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
App               10841 non-null object
Category          10841 non-null object
Rating            9367 non-null float64
Reviews           10841 non-null object
Size              10841 non-null object
Installs          10841 non-null object
Type              10840 non-null object
Price             10841 non-null object
Content Rating    10840 non-null object
Genres            10841 non-null object
Last Updated      10841 non-null object
Current Ver       10833 non-null object
Android Ver       10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB

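The dataset has 10841 rows and 13 fields, including app name, category, rating, install count, review count, free/paid type, price, last-updated date, and version.

First convert the data into the formats we need: Reviews, Size, and Price should become numeric, and Last Updated should become a datetime.

# Convert to numeric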
#data.Reviews.value_counts()
pd.to_numeric(data['Reviews'])
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

pandas\src\inference.pyx in pandas.lib.maybe_convert_numeric (pandas\lib.c:55708)()


ValueError: Unable to parse string "3.0M"


During handling of the above exception, another exception occurred:


ValueError                                Traceback (most recent call last)

<ipython-input-5-e509e4352e56> in <module>()
      1 # Convert to numeric
      2 #data.Reviews.value_counts()
----> 3 pd.to_numeric(data['Reviews'])


C:\Users\renhl1\Anaconda3\lib\site-packages\pandas\tools\util.py in to_numeric(arg, errors, downcast)
    193             coerce_numeric = False if errors in ('ignore', 'raise') else True
    194             values = lib.maybe_convert_numeric(values, set(),
--> 195                                                coerce_numeric=coerce_numeric)
    196 
    197     except Exception:


pandas\src\inference.pyx in pandas.lib.maybe_convert_numeric (pandas\lib.c:56097)()


ValueError: Unable to parse string "3.0M" at position 10472
# Row 10472 is the problem; look at why
data.loc[10472]
App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object
# The values in this row are shifted one column to the left (Category holds 1.9, Rating holds 19), so drop it
data.drop(10472, inplace=True)
data['Reviews'] = data['Reviews'].astype(int)
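
Instead of reading the position out of the traceback, the unparseable rows can also be located directly. The sketch below is not part of the original notebook: it uses a small made-up Series in place of `data['Reviews']` and relies on `pd.to_numeric(..., errors='coerce')`, which turns strings it cannot parse into NaN rather than raising.

```python
import pandas as pd

# Hypothetical miniature of the Reviews column with one malformed entry.
reviews = pd.Series(['159', '967', '3.0M', '215644'])

# errors='coerce' converts unparseable strings to NaN instead of raising.
parsed = pd.to_numeric(reviews, errors='coerce')

# Rows that had a value originally but failed to parse.
bad_rows = reviews[parsed.isna() & reviews.notna()]
print(bad_rows)
```

On the real data the same filter should list row 10472, which can then be inspected and dropped as above.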
# Convert Size to numeric
data.Size.unique()
array(['19M', '14M', '8.7M', '25M', '2.8M', '5.6M', '29M', '33M', '3.1M',
       '28M', '12M', '20M', '21M', '37M', '2.7M', '5.5M', '17M', '39M',
       '31M', '4.2M', '7.0M', '23M', '6.0M', '6.1M', '4.6M', '9.2M',
       '5.2M', '11M', '24M', 'Varies with device', '9.4M', '15M', '10M',
       '1.2M', '26M', '8.0M', '7.9M', '56M', '57M', '35M', '54M', '201k',
       '3.6M', '5.7M', '8.6M', '2.4M', '27M', '2.5M', '16M', '3.4M',
       '8.9M', '3.9M', '2.9M', '38M', '32M', '5.4M', '18M', '1.1M', '2.2M',
       '4.5M', '9.8M', '52M', '9.0M', '6.7M', '30M', '2.6M', 
       ……
       '892k', '154k', '860k', '364k', '387k', '626k', '161k', '879k',
       '39k', '970k', '170k', '141k', '160k', '144k', '143k', '190k',
       '376k', '193k', '246k', '73k', '658k', '992k', '253k', '420k',
       '404k', '470k', '226k', '240k', '89k', '234k', '257k', '861k',
       '467k', '157k', '44k', '676k', '67k', '552k', '885k', '1020k',
       '582k', '619k'], dtype=object)
# Replace the placeholder value 'Varies with device' with NaN
data['Size'] = data['Size'].replace('Varies with device', np.nan)
data['Size'].isnull().sum()  # number of missing values
1695
# Size values end in 'k' or 'M', so use a regular expression to split the number from the unit
import re  # regular expression module
# Convert a size string to a number: k means 1,000 and M means 1,000,000
def change(i):
    if pd.isnull(i):
        return np.nan
    number, unit = re.match(r'([0-9.]+)([kM])', i).groups()
    if unit == 'M':
        return float(number) * 1000000
    return float(number) * 1000
# Convert the Size column to numeric
data['Size'] = data['Size'].apply(change)
# Fill missing values with the mean Size of the same Category
data['Size'] = data['Size'].fillna(data.groupby('Category')['Size'].transform('mean'))
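
The row-by-row `change` function can also be written as a vectorized operation. The following is a minimal sketch under the assumption that every valid size is a number followed by `k` or `M`; the sample values are taken from the `unique()` output above, and the names `parts`, `factor`, and `size_bytes` are illustrative only.

```python
import numpy as np
import pandas as pd

# A few sample values in the same format as the Size column.
size = pd.Series(['19M', '8.7M', '201k', 'Varies with device', '1020k'])

# Split each value into a number and a unit; rows that do not match become NaN.
parts = size.str.extract(r'^([0-9.]+)([kM])$')

# Map the unit to a multiplier and scale the numeric part.
factor = parts[1].map({'k': 1_000, 'M': 1_000_000})
size_bytes = parts[0].astype(float) * factor
print(size_bytes)
```

With this approach 'Varies with device' becomes NaN automatically, so the separate replace step is not needed.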
#data['Price'].value_counts()   
# Check which values Price actually contains
# Convert Price to float; x[1:] strips the leading currency symbol from paid prices
data['Price'] = data['Price'].apply(lambda x: float(x[1:]) if x != '0' else 0)
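
The lambda above assumes every non-zero price has exactly one leading symbol. A slightly more defensive sketch, assuming the symbol is `$` (the sample values here are made up for illustration), strips it explicitly and lets `pd.to_numeric` do the parsing:

```python
import pandas as pd

# Hypothetical sample of the raw Price column.
price = pd.Series(['0', '$4.99', '$399.99', '$0.99'])

# Strip a leading '$' if present, then parse; anything unparseable becomes NaN.
price_num = pd.to_numeric(price.str.lstrip('$'), errors='coerce')
print(price_num)
```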
# First see how many distinct apps there are
len(data.App.unique())
9659
# This is fewer than the number of rows, so there are duplicates; see which apps they are
data.App.value_counts()
ROBLOX                                                9
CBS Sports App - Scores, News, Stats & Watch Live     8
Candy Crush Saga                                      7
ESPN                                                  7
Duolingo: Learn Languages Free                        7
……
# Look at the rows for the first app
data[data['App']=='ROBLOX']
image.png
# The duplicate rows differ in Reviews; drop the duplicates
# For apps listed under several categories, keep only one category (100+ apps)
data = data.drop_duplicates(subset=['App'])
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9659 entries, 0 to 10840
Data columns (total 13 columns):
App               9659 non-null object
Category          9659 non-null object
Rating            8196 non-null float64
Reviews           9659 non-null int32
Size              9659 non-null float64
Installs          9659 non-null object
Type              9658 non-null object
Price             9659 non-null float64
Content Rating    9659 non-null object
Genres            9659 non-null object
Last Updated      9659 non-null object
Current Ver       9651 non-null object
Android Ver       9657 non-null object
dtypes: float64(3), int32(1), object(9)
memory usage: 1018.7+ KB
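
`drop_duplicates(subset=['App'])` keeps the first row it encounters for each app. Since the duplicates differ in Reviews, one alternative is to keep the row with the most reviews, on the assumption that it is the most recent scrape. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical duplicate rows; the review counts are illustrative only.
apps = pd.DataFrame({
    'App': ['ROBLOX', 'ROBLOX', 'ESPN', 'ROBLOX', 'ESPN'],
    'Reviews': [4447388, 4449910, 521138, 4443407, 521140],
})

# Sort so the largest review count comes last within each app, keep that row,
# then restore the original row order.
latest = (apps.sort_values('Reviews')
              .drop_duplicates(subset=['App'], keep='last')
              .sort_index())
print(latest)
```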

Analyze each field in detail

# Analyze Category
cate= data['Category'].groupby(data['Category']).count()
cate=cate.sort_values(ascending=False)
plt.figure(figsize=(15,10))
sns.barplot(x=cate.index,y=cate.values)
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('App qty')
plt.title("App qty by category")
<matplotlib.text.Text at 0x1b76369a2e8>
image.png
labels=data['Category'].value_counts().index
sizes= data['Category'].value_counts().values
# Pie chart of each category's share
plt.figure(figsize = (10,10))
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title('App qty by category',color = 'blue',fontsize = 15)
<matplotlib.text.Text at 0x1b7638f8128>
image.png

Conclusion: by number of apps, the top 3 categories are FAMILY (19.6%), GAME (9.9%), and TOOLS (8.5%), clearly ahead of the categories that follow.
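
The percentages read off the pie chart can also be computed as numbers. A small sketch on a made-up category column (the real call would use `data['Category']`):

```python
import pandas as pd

# Hypothetical category column with ten apps.
category = pd.Series(['FAMILY', 'FAMILY', 'GAME', 'TOOLS', 'FAMILY',
                      'GAME', 'TOOLS', 'FAMILY', 'BUSINESS', 'GAME'])

# normalize=True returns fractions instead of counts; convert to percent.
share = category.value_counts(normalize=True).mul(100).round(1)
top3 = share.head(3)
print(top3)
```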

# Analyze Genres
len(data.Genres.value_counts())
118
# There are 118 genres in total
genr = data['Genres'].groupby(data['Genres']).count()
genr = genr.sort_values(ascending=False)
genr.index[:15]  # top 15 genres
Index(['Tools', 'Entertainment', 'Education', 'Business', 'Medical',
       'Personalization', 'Productivity', 'Lifestyle', 'Finance', 'Sports',
       'Communication', 'Action', 'Health & Fitness', 'Photography',
       'News & Magazines'],
      dtype='object', name='Genres')
plt.figure(figsize=(15,10))
sns.barplot(x=genr.index[:15],y=genr.values[:15])
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('App qty')
plt.title("App qty by Genres")
<matplotlib.text.Text at 0x1b764a7c278>
image.png
data.describe()
image.png
# Look at the distribution of Rating
fig=plt.figure(figsize=(15,6))
ax1 = fig.add_subplot(131)
ax2 = fig.add_subplot(132)
ax3 = fig.add_subplot(133)
sns.violinplot(y=data['Rating'],data=data,ax=ax1)
sns.kdeplot(data.Rating,ax=ax2,shade=True)
sns.boxplot(y=data.Rating,ax=ax3)
image.png

Conclusion: 50% of apps are rated between 4 and 4.5, and the mean is 4.17.

# Look at the Reviews data
#data['Reviews'].value_counts()
fig=plt.figure(figsize=(12,8))
sns.kdeplot(data.Reviews,shade=True)  # density distribution of Reviews
image.png

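Review counts are heavily skewed toward the low end: the density piles up near zero, and the most common counts are single digits.

# Look at the distribution for apps with fewer than 200 reviews
bins = list(range(0, 200, 5))

fig=plt.figure(figsize=(15,8))
plt.hist(data['Reviews'], bins, histtype="bar", rwidth=0.8, alpha=0.4)
plt.xticks(np.arange(0, 200, step=5))
image.png
# Find the top 10 review counts
b=data['Reviews'].value_counts()
b.sort_index(ascending=False)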
78158306      1
69119316      1
66577313      1
56642847      1
44891723      1
42916526      1
27722264      1
25655305      1
24900999      1
23133508      1
22426677      1
 ……
16           35
15           30
14           41
13           49
12           58
11           52
10           62
9            64
8            72
7            88
6            945           
4           137
3           170
2           213
1           272
0           593
Name: Reviews, dtype: int64
data[data['Reviews']>20000000]
image.png

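Among the most-reviewed apps, apart from 4 games, most are surprisingly from the Facebook family. YouTube is the only Google app on the list, and the last two are from Cheetah Mobile.

Next, analyze the effect of pricing, which involves the Type and Price fields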


a=data.Type.value_counts()

labels=data['Type'].value_counts().index
explode = [0.2,0]  # offset of each wedge from the center
sizes= data['Type'].value_counts().values
#colors = ['grey','blue','red','yellow','green','brown']

plt.figure(figsize = (9,9))
plt.pie(sizes, labels=labels, autopct='%1.1f%%',explode=explode)
plt.rcParams.update({'font.size': 10})
plt.title('App qty by type',color = 'blue',fontsize = 20)
<matplotlib.text.Text at 0x1b765455208>
image.png

92.2% of the apps are free and 7.8% are paid.

# Analyze Price
data['Price'].value_counts()
0.00      8903
0.99       145
2.99       124
1.99        73
4.99        70
3.99        57
1.49        46
5.99        26
2.49        25
9.99        19
399.99      12
6.99        11
14.99        9
4.49         9  
          ... 
Name: Price, dtype: int64

price = data['Price'].value_counts()
price.drop(0,inplace=True)  # drop free apps and analyze the paid ones
price=price.sort_values(ascending=False)
fig = plt.figure(figsize=(15,10))
sns.kdeplot(data[data['Price']!=0]['Price'])  # density plot of paid app prices
image.png

Most apps are priced below $30, but there is a bump around $400; select those apps to see what is going on.

data[data['Price']==399.99]
image.png

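A quick search shows that this is a joke app with no real function. On Play it really does have a few thousand reviews and 100,000 downloads, but I do not understand why it has so many downloads. If anyone knows, please tell me.

These outliers can be dropped in the price analysis later.

# Look at all the price points in more detail
#num = str(a.tolist()).count("1")
#num
# Most apps are priced at 0.99, 1.99, 2.99 and so on. For a better analysis, drop the price points that occur only once (only one app uses that price), 63 values in total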
price =price[price>1]
#a=data['Price'].value_counts().values
fig = plt.figure(figsize=(12,10))
sns.kdeplot(price.values,shade=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1b765bcbb38>
image.png

fig = plt.figure(figsize=(18,10))
sns.barplot(x=price.index, y=price.values)
image.png

Most paid apps cost under $10; the top 5 price points, in order, are $0.99, $2.99, $1.99, $4.99, and $3.99.

# Convert Last Updated to datetime
data['Last Updated']=pd.to_datetime(data['Last Updated'])
fig = plt.figure(figsize=(10,7))
plt.plot(data['Last Updated'],'.')
image.png
# Look at Installs
data['Installs'].value_counts()
1,000,000+        1417
100,000+          1112
10,000+           1031
10,000,000+        937
1,000+             888
100+               710
5,000,000+         607
500,000+           505
50,000+            469
5,000+             468
10+                385
500+               328
50+                204
50,000,000+        202
100,000,000+       188
5+                  82
1+                  67
500,000,000+        24
1,000,000,000+      20
0+                  14
0                    1
Name: Installs, dtype: int64
install=data['Installs'].groupby(data['Installs']).count()
install =install.sort_values(ascending=False)
fig = plt.figure(figsize=(9,12))
sns.barplot(x=install.values,y=install.index)
plt.ylabel('installed times')
plt.xlabel('App qty')
plt.title("App qty by installed times")
image.png

The most common install bucket is 1M+. Another interesting detail: buckets that start with 5 contain noticeably fewer apps than buckets that start with 1.
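
Because Installs is stored as text, sorting it gives alphabetical rather than numeric order. A possible way to get a numeric column is sketched below; the sample values come from the `value_counts()` output above, and the name `installs_num` is an illustrative choice, not part of the original notebook.

```python
import pandas as pd

# Sample of the Installs labels, including the bare '0' entry.
installs = pd.Series(['1,000,000+', '500+', '0', '10,000+', '0+'])

# Remove thousands separators and the trailing '+', then convert to int.
installs_num = (installs.str.replace(',', '', regex=False)
                        .str.rstrip('+')
                        .astype(int))
print(installs_num)
```

Each number is the lower bound of its bucket, so it is suitable for ordering the buckets but not for estimating exact download totals.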

# Apps with more than 1 billion installs
data[data['Installs']=='1,000,000,000+']
image.png

Most of the apps with over 1 billion installs are Google products.

# Check whether installs relate to the number of reviews
reviews=data['Reviews'].groupby(data['Installs']).mean()
fig = plt.figure(figsize=(15,9))
sns.barplot(x=reviews.values,y=reviews.index)
plt.ylabel('installed times')
plt.xlabel('reviews')
plt.title("avg.reivew by installed times")
plt.xscale('log')  # use a log scale
image.png

Install counts and review counts are indeed positively correlated.
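
The bar chart shows the trend visually. To put a number on it, a rank correlation is a reasonable choice because both variables span many orders of magnitude. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, on made-up values; it assumes Installs has already been converted to a numeric lower bound.

```python
import pandas as pd

# Hypothetical (installs, reviews) pairs for five apps.
df = pd.DataFrame({
    'installs': [100, 1000, 10000, 100000, 1000000],
    'reviews': [3, 20, 15, 900, 40000],
})

# Spearman's rho is the Pearson correlation computed on ranks.
rho = df['installs'].rank().corr(df['reviews'].rank())
print(round(rho, 3))
```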

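Explore which variables may relate to the rating

First drop the rows that have no rating (NaN) and assign the result to a new dataset

#data['Rating'].value_counts()
newdata=data[data['Rating'].notnull()].copy()  # drop rows without a rating; copy so new columns can be added safely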
newdata.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8196 entries, 0 to 10840
Data columns (total 13 columns):
App               8196 non-null object
Category          8196 non-null object
Rating            8196 non-null float64
Reviews           8196 non-null int32
Size              8196 non-null float64
Installs          8196 non-null object
Type              8196 non-null object
Price             8196 non-null float64
Content Rating    8196 non-null object
Genres            8196 non-null object
Last Updated      8196 non-null datetime64[ns]
Current Ver       8192 non-null object
Android Ver       8194 non-null object
dtypes: datetime64[ns](1), float64(3), int32(1), object(8)
memory usage: 864.4+ KB
# Check whether Last Updated relates to Rating
fig = plt.figure(figsize=(10,7))
plt.plot(newdata['Last Updated'],newdata['Rating'],'.')
[<matplotlib.lines.Line2D at 0x1b7670ec5f8>]
image.png
# Extract the year into a new column
newdata['updated_year']=newdata['Last Updated'].dt.year

fig = plt.figure(figsize=(15,9))
sns.boxplot(x=newdata['updated_year'], y=newdata['Rating'])
plt.xlabel('updated year')
plt.ylabel('rating')
plt.title('rating with different updated year')
image.png

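The median rating rises with the year of the last update, and 2018 is the first year in which more than 75% of apps score above 4. One reading is that as the mobile ecosystem matures, low-quality apps largely lose their market; since the grouping is by last update year, it may also mean that actively maintained apps are rated higher.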
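
The "more than 75% above 4" statement corresponds to the lower edge of the box in the boxplot, which can be checked numerically. A minimal sketch on made-up dates and ratings (the real call would use `newdata`):

```python
import pandas as pd

# Hypothetical apps with a last-updated date and a rating.
df = pd.DataFrame({
    'Last Updated': pd.to_datetime([
        '2016-03-01', '2016-07-15', '2016-11-30', '2016-12-24',
        '2018-01-10', '2018-05-02', '2018-07-20', '2018-08-01']),
    'Rating': [3.0, 3.8, 4.2, 4.5, 4.1, 4.3, 4.4, 4.7],
})
df['updated_year'] = df['Last Updated'].dt.year

# First quartile per year: 75% of that year's apps score at or above this value.
q1 = df.groupby('updated_year')['Rating'].quantile(0.25)

# Share of apps per year with a rating above 4.
share_above_4 = df['Rating'].gt(4).groupby(df['updated_year']).mean()
print(q1)
print(share_above_4)
```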

plt.figure(figsize=(12,9))
sns.boxplot(x=newdata['Type'],y=newdata['Rating'],data=newdata)
image.png

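Paid apps are rated higher than free apps.

# Check whether Reviews and Rating are correlated
# Pearson correlation ranges from -1 to +1: +1 is a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear correlation
# jointplot creates its own figure; height replaces the older size parameter (seaborn 0.9+)
sns.jointplot(x=newdata['Reviews'], y=newdata['Rating'], kind='reg', height=7)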
image.png
# Check whether Size is correlated
sns.jointplot(x=newdata['Size'], y=newdata['Rating'], kind='reg', height=7)
image.png

Conclusion: Rating shows no clear correlation with Reviews or Size.

# Category vs. Rating
fig =plt.figure(figsize=(15,12))
sns.boxplot(y=newdata['Category'],x=newdata['Rating'],data=newdata)
#plt.xticks(rotation=90)
plt.ylabel('category')
plt.xlabel('rating')
plt.title('rating distribution by category')
<matplotlib.text.Text at 0x1b768aabb00>
image.png

The lowest-rated category is DATING :), while the higher-rated categories include ART_AND_DESIGN, EVENTS, PERSONALIZATION, and PARENTING.

# Installs vs. Rating
installrate =newdata['Rating'].groupby(newdata['Installs']).count()
installrate
Installs
1+                   3
1,000+             697
1,000,000+        1415
1,000,000,000+      20
10+                 69
10,000+            987
10,000,000+        937
100+               303
100,000+          1094
100,000,000+       188
5+                   9
5,000+             425
5,000,000+         607
50+                 56
50,000+            457
50,000,000+        202
500+               199
500,000+           504
500,000,000+        24
Name: Rating, dtype: int64
# Drop apps with too few installs and keep only those with 100+ installs
selected = newdata[~newdata['Installs'].isin(['1+', '5+', '10+', '50+'])]
# Installs vs. Rating
fig =plt.figure(figsize=(15,9))
sns.boxplot(x=selected['Installs'],y=selected['Rating'])
plt.xticks(rotation=45)
plt.xlabel('Installed qty')
plt.ylabel('rating')
plt.title('rating distribution by installs')
image.png
# Ratings cluster between 4 and 4.5; Rating has no strong correlation with Installs
# Now look at Price. Type above compared free (0) against paid (> 0); here we break paid apps down by amount
# Drop the free apps and the abnormal $399.99 "I'm Rich" apps
selected =newdata.loc[(newdata['Price']!=0) & (newdata['Price']<200)]
# Price vs. Rating
g = sns.jointplot(x=selected['Price'], y=selected['Rating'], kind='reg')
# Label the joint axes through the JointGrid; plt.xlabel would target a marginal axis
g.set_axis_labels('Price', 'rating')
g.fig.suptitle('rating distribution vs. price')
image.png

The Pearson coefficient is -0.029, so Price and Rating are essentially uncorrelated.
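
For reference, the coefficient shown on the plot can be reproduced without seaborn. The sketch below computes Pearson's r from its definition and compares it with `Series.corr`; the price and rating values are made up, so the result will not match the -0.029 above.

```python
import numpy as np
import pandas as pd

# Hypothetical paid apps: price in dollars and rating.
price = pd.Series([0.99, 1.99, 2.99, 4.99, 9.99])
rating = pd.Series([4.5, 4.0, 4.2, 4.1, 4.3])

# Pearson's r: covariance divided by the product of the standard deviations.
dx = price - price.mean()
dy = rating - rating.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity by default.
r_pandas = price.corr(rating)
print(r_manual, r_pandas)
```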

# Look at Category and Genres together
data['App'].groupby([data['Category'],data['Genres']]).count()
Category             Genres                               
ART_AND_DESIGN       Art & Design                              57
                     Art & Design;Action & Adventure            1
                     Art & Design;Creativity                    5
                     Art & Design;Pretend Play                  1
AUTO_AND_VEHICLES    Auto & Vehicles                           85
BEAUTY               Beauty                                    53
BOOKS_AND_REFERENCE  Books & Reference                        222
BUSINESS             Business                                 420
COMICS               Comics                                    55
                     Comics;Creativity                          1
COMMUNICATION        Communication                            315
DATING               Dating                                   171
EDUCATION            Education                                 99
                     Education;Action & Adventure               1
                     Education;Brain Games                      3
                     Education;Creativity                       3
                     Education;Education                        8
                     Education;Music & Video                    1
                     Education;Pretend Play                     4
ENTERTAINMENT        Entertainment                             92
                     Entertainment;Brain Games                  2
                     Entertainment;Creativity                   1
                     Entertainment;Music & Video                7
EVENTS               Events                                    64
FAMILY               Action;Action & Adventure                  9
                     Adventure;Action & Adventure               4
                     Adventure;Brain Games                      1
                     Adventure;Education                        1
                     Arcade;Action & Adventure                 14
                     Arcade;Pretend Play                        1
                                                             ... 
GAME                 Simulation;Education                       1
                     Sports                                     6
                     Strategy                                  17
                     Trivia                                    38
                     Word                                      23
HEALTH_AND_FITNESS   Health & Fitness                         288
HOUSE_AND_HOME       House & Home                              74
LIBRARIES_AND_DEMO   Libraries & Demo                          84
LIFESTYLE            Lifestyle                                368
                     Lifestyle;Pretend Play                     1
MAPS_AND_NAVIGATION  Maps & Navigation                        131
MEDICAL              Medical                                  395
NEWS_AND_MAGAZINES   News & Magazines                         254
PARENTING            Parenting                                 46
                     Parenting;Brain Games                      1
                     Parenting;Education                        7
                     Parenting;Music & Video                    6
PERSONALIZATION      Personalization                          376
PHOTOGRAPHY          Photography                              281
PRODUCTIVITY         Productivity                             374
SHOPPING             Shopping                                 202
SOCIAL               Social                                   239
SPORTS               Sports                                   325
TOOLS                Tools                                    826
                     Tools;Education                            1
TRAVEL_AND_LOCAL     Travel & Local                           218
                     Travel & Local;Action & Adventure          1
VIDEO_PLAYERS        Video Players & Editors                  162
                     Video Players & Editors;Music & Video      1
WEATHER              Weather                                   79
Name: App, dtype: int64
# Paid vs. free app counts by category
a=data['App'].groupby([data['Category'],data['Type']]).count()
# Flatten the (Category, Type) index into columns
typedata = a.reset_index(name='values')
paid = typedata[typedata['Type']=='Paid']
free = typedata[typedata['Type']=='Free']
fig =plt.figure(figsize=(15,12))
sns.barplot(y=paid['Category'],x=paid['values'],color='yellow',alpha=0.8,label='Paid')
sns.barplot(y=free['Category'],x=free['values'],color='green',alpha = 0.2,label='Free')
<matplotlib.axes._subplots.AxesSubplot at 0x1b76968add8>
image.png

From the chart, the categories that rank highest for paid apps are ENTERTAINMENT, LIBRARIES_AND_DEMO, BEAUTY, and SHOPPING.
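
The overlaid bars show counts, which makes the paid share within each category hard to read. A possible cross-check is to compute the share directly; the sketch below uses a made-up table in place of `data`, and the name `paid_share` is illustrative.

```python
import pandas as pd

# Hypothetical apps with a category and a free/paid type.
apps = pd.DataFrame({
    'Category': ['MEDICAL', 'MEDICAL', 'MEDICAL', 'MEDICAL', 'BEAUTY',
                 'BEAUTY', 'GAME', 'GAME', 'GAME', 'GAME'],
    'Type': ['Paid', 'Free', 'Free', 'Paid', 'Free',
             'Free', 'Paid', 'Free', 'Free', 'Free'],
})

# Count apps per (Category, Type); missing combinations become 0.
counts = pd.crosstab(apps['Category'], apps['Type'])

# Fraction of each category's apps that are paid, highest first.
paid_share = (counts['Paid'] / counts.sum(axis=1)).sort_values(ascending=False)
print(paid_share)
```

Ranking by this fraction rather than by bar length separates categories that are large from categories that are unusually paid-heavy.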

# Android version vs. Rating
newdata['Android Ver'].value_counts()
4.1 and up            1811
4.0.3 and up          1141
4.0 and up            1042
Varies with device     947
4.4 and up             713
2.3 and up             547
5.0 and up             447
4.2 and up             316
2.3.3 and up           232
2.2 and up             203
3.0 and up             201
4.3 and up             185
2.1 and up             112
1.6 and up              87
6.0 and up              42
7.0 and up              41
3.2 and up              31
2.0 and up              27
5.1 and up              16
1.5 and up              16
3.1 and up               8
2.0.1 and up             7
4.4W and up              5
8.0 and up               5
7.1 and up               3
4.0.3 - 7.1.1            2
1.0 and up               2
5.0 - 8.0                2
4.1 - 7.1.1              1
7.0 - 7.1.1              1
5.0 - 6.0                1
Name: Android Ver, dtype: int64
fig = plt.figure(figsize=(15,9))
sns.boxplot(x=newdata['Rating'],y=newdata['Android Ver'])
plt.xlabel('rating')
plt.ylabel('android ver')
<matplotlib.text.Text at 0x1b76a0e3240>
image.png

The supported Android version shows no particular correlation with Rating.

# Content rating vs. Rating
data['Content Rating'].value_counts()
Everyone           7903
Teen               1036
Mature 17+          393
Everyone 10+        322
Adults only 18+       3
Unrated               2
Name: Content Rating, dtype: int64
fig = plt.figure(figsize=(15,9))
sns.boxplot(x=newdata['Content Rating'],y=newdata['Rating'])
plt.xlabel('content rating')
plt.ylabel('rating')
<matplotlib.text.Text at 0x1b76b76c198>
image.png

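Conclusion

This post analyzed the Google Play Store app dataset, 9659 apps in total.

The mean Rating is 4.17, and 50% of apps score between 4 and 4.5.

By number of apps, the top 3 categories are FAMILY (19.6%), GAME (9.9%), and TOOLS (8.5%).

Paid apps make up 7.8% of the total, and the ENTERTAINMENT, LIBRARIES_AND_DEMO, BEAUTY, and SHOPPING categories rank highest for paid apps. Most paid apps cost under $10; the top 5 price points, in order, are $0.99, $2.99, $1.99, $4.99, and $3.99.

Most apps support Android 4.0 and up; apps that still support Android 2.0 or 3.0 are now relatively few.

Most of the apps with over 1 billion installs are Google products, but the most-reviewed apps are mainly from the Facebook family.

The factors that appear to affect Rating are Type, Category, and updated year.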
