# Import the data
import graphlab
song_data = graphlab.SFrame("song_data.gl/")
# Inspect the data structure
song_data.head()
The data structure is as follows:

The data consists of the columns user_id, song_id, listen_count, title, artist, and song.
1. Which of the artists below have had the most unique users listening to their songs? ('Kanye West', 'Foo Fighters', 'Taylor Swift', 'Lady GaGa')
print song_data[song_data['artist'] == 'Kanye West']
This selects every row whose artist is Kanye West, giving the following data:

Next, count the users (user_id). Here we use the unique() function, which returns the distinct user IDs with duplicates removed.
print song_data[song_data['artist'] == 'Kanye West']['user_id'].unique()
This lists all the distinct users. The output is as follows:

len(song_data[song_data['artist'] == 'Kanye West']['user_id'].unique())
Output: 2522
Repeat the same operation for the remaining three artists:
len(song_data[song_data['artist'] == 'Foo Fighters']['user_id'].unique())
len(song_data[song_data["artist"] == "Taylor Swift"]["user_id"].unique())
len(song_data[song_data["artist"] == "Lady GaGa"]["user_id"].unique())
Output: 2055, 3246, 2928
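The four lookups above share one pattern: filter by artist, then count distinct user_id values. As a rough illustration of that logic only, here is a minimal plain-Python sketch on a tiny made-up list of rows. The `rows` data and the `unique_listeners` helper are hypothetical and are not part of GraphLab.

```python
# Toy rows standing in for song_data; the values are invented for illustration.
rows = [
    {"user_id": "u1", "artist": "Kanye West", "listen_count": 2},
    {"user_id": "u1", "artist": "Kanye West", "listen_count": 1},
    {"user_id": "u2", "artist": "Kanye West", "listen_count": 5},
    {"user_id": "u2", "artist": "Taylor Swift", "listen_count": 3},
    {"user_id": "u3", "artist": "Taylor Swift", "listen_count": 1},
    {"user_id": "u4", "artist": "Taylor Swift", "listen_count": 4},
]


def unique_listeners(rows, artist):
    """Count distinct user_id values among the rows for one artist."""
    return len({row["user_id"] for row in rows if row["artist"] == artist})


artists = ["Kanye West", "Foo Fighters", "Taylor Swift", "Lady GaGa"]

# One count per artist, then pick the artist with the largest count.
counts = {artist: unique_listeners(rows, artist) for artist in artists}
most_listened = max(counts, key=counts.get)
```

On the real data, comparing the four counts reported above (2522, 2055, 3246, 2928) in the same way puts Taylor Swift first.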
2. Which of the artists below is the most popular artist, the one with the highest total listen_count, in the data set?
3. Which of the artists below is the least popular artist, the one with the smallest total listen_count, in the data set?
Both questions can be answered with groupby(key_columns, operations, *args),
which groups the rows by the given key columns and aggregates each group. It takes:
i. key_columns, which takes the column we want to group by, in our case 'artist'.
ii. operations, where we define the aggregation operation we are using; in our case, we want to sum over 'listen_count'.
data = song_data.groupby(key_columns='artist', operations={'total_count': graphlab.aggregate.SUM('listen_count')}).sort('total_count', ascending=False)
print data[0]
print data[-1]
The output is as follows:
{'total_count': 43218, 'artist': 'Kings Of Leon'}
{'total_count': 14, 'artist': 'William Tabbert'}
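To make the group-then-sum step concrete, the sketch below does the same kind of aggregation in plain Python. It is not GraphLab code: the `plays` list holds invented values, and a `defaultdict` stands in for the grouped result.

```python
from collections import defaultdict

# Toy (artist, listen_count) pairs; invented values, not taken from song_data.
plays = [
    ("Kings Of Leon", 10),
    ("William Tabbert", 1),
    ("Kings Of Leon", 7),
    ("Taylor Swift", 5),
    ("William Tabbert", 2),
]

# Group by artist and sum listen_count within each group.
totals = defaultdict(int)
for artist, listen_count in plays:
    totals[artist] += listen_count

# Sort by total in descending order, so the most popular artist comes first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

most_popular = ranked[0]    # first entry: highest total
least_popular = ranked[-1]  # last entry: smallest total
```

Sorting once in descending order is what lets a single result answer both questions: the first row is the most popular artist and the last row is the least popular.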