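User information from link requests made to a website/server. The dataset includes fields such as the computer type and the browser (user agent) string. Below, Python is used for some simple processing and a plot-based analysis.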
# INPUT (Python 3.6)
import json
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
path = 'usagov_bitly_data2012-03-16-1331923249.txt'
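# Each line of the file is one JSON record
with open(path) as f:
    records = [json.loads(line) for line in f]
frame = pd.DataFrame(records)
# First token of each user-agent string
results = pd.Series([x.split()[0] for x in frame.a.dropna()])
# print(results[:5])
# Keep only the rows that have a user-agent string
cframe = frame[frame.a.notnull()]
operating_systems = np.where(cframe['a'].str.contains('Windows'),
                             'Windows', 'Not Windows')
# Count records per (time zone, operating system) pair
by_tz_os = cframe.groupby(['tz', operating_systems])
agg_counts = by_tz_os.size().unstack().fillna(0)
# Sort time zones by total count and keep the 10 largest
indexer = agg_counts.sum(axis=1).argsort()
count_subset = agg_counts.take(indexer)[-10:]
print(count_subset)
# Normalize each row to sum to 1, then draw stacked horizontal bars
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)
normed_subset.plot(kind='barh', stacked=True)
plt.show()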
# OUT
                     Not Windows  Windows
tz
America/Sao_Paulo           13.0     20.0
Europe/Madrid               16.0     19.0
Pacific/Honolulu             0.0     36.0
Asia/Tokyo                   2.0     35.0
Europe/London               43.0     31.0
America/Denver             132.0     59.0
America/Los_Angeles        130.0    252.0
America/Chicago            115.0    285.0
                           245.0    276.0
America/New_York           339.0    912.0

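Figure (operating_char.png): stacked horizontal bar chart of the normalized Windows / Not Windows share for the ten time zones listed above.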
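For anyone without the original log file, the same steps can be tried on a few made-up records. This is only a rough sketch: the records below are hypothetical stand-ins I wrote for illustration (they are not taken from the dataset), and only the `tz` and `a` fields are used. It keeps the two busiest time zones instead of ten, and skips the plot.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records standing in for the real log; invented for
# illustration only. 'tz' is the time zone, 'a' the user-agent string.
records = [
    {"tz": "America/New_York", "a": "Mozilla/5.0 (Windows NT 6.1)"},
    {"tz": "America/New_York", "a": "Mozilla/5.0 (Windows NT 5.1)"},
    {"tz": "America/New_York", "a": "Mozilla/5.0 (Macintosh)"},
    {"tz": "Europe/London", "a": "Mozilla/5.0 (X11; Linux)"},
    {"tz": "Europe/London", "a": "Mozilla/5.0 (Windows NT 6.1)"},
    {"tz": "Asia/Tokyo", "a": "Mozilla/5.0 (Windows NT 6.1)"},
    {"tz": "Asia/Tokyo"},  # no user agent: filtered out below
]
frame = pd.DataFrame(records)
cframe = frame[frame["a"].notna()]

# Label each row, then count rows per (time zone, OS label) pair.
os_label = np.where(cframe["a"].str.contains("Windows"),
                    "Windows", "Not Windows")
agg_counts = cframe.groupby(["tz", os_label]).size().unstack().fillna(0)

# Sort time zones by total count (ascending) and keep the largest two.
indexer = agg_counts.sum(axis=1).argsort()
count_subset = agg_counts.take(indexer)[-2:]

# Divide each row by its own total so the two columns sum to 1.
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)

print(count_subset)
print(normed_subset)
```

The `fillna(0)` step matters for time zones where one label never occurs (Asia/Tokyo here has no "Not Windows" rows), and the row-wise `div` is what turns raw counts into the shares drawn in the stacked bar chart.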
2018.7.16
Study notes on "Python for Data Analysis". Not original work; archived for study purposes only. It sat in my drafts folder for so long that I have half forgotten it.