Problem

Scraped data

API result

JSON data
https://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=direct&pcevaname=pc4.1&qt=s&c=1&wd=%E7%81%AB%E9%94%85&da_src=pcmappg.map&on_gel=1&l=5&gr=2&b=(9677951.714231128,1730014.8154582584;13993877.97804421,4767148.112215612)&pn=0&auth=W5zNFT6g0%40%40TZT24DwfQCHL0B35%40g6SWuxLBNERBNLHtBalTBnlcAZzvYgP1PcGCgYvjPuVtvYgPMGvgWv%40uVtvYgPPxRYuVtvYgP%40vYZcvWPCuVtvYgP%40ZPcPPuVtvYgPhPPyheuVtvhgMuxVVty1uVtCGYuNtJLmCUZIdbTbNdB9A1cv3uVtGccZcuVtPWv3GuBtR9KxXwPYIUvhgMZSguxzBEHLNRTVtcEWe1GD8zv7u%40ZPuzteL1wWveuBt0iyfixAN152T1N51wquTTGLFfy9GUIsxC2wvaaZyY&seckey=6VKbNl%2BEMipcFlMIHjKIybzZLU9kk4sn5QN4bSBp2q0%3D%2CWG3XhzBOLtVO6h67UdvVmni8%2B3Bln8Ck4hk87v17WepaJJGH6dhpwmT6%2BDFFFxM0PzMMjvdyWI3GVS3KZF3h7VAmP8Uf%2FgHuvAYxhdl%2Bg3tTRJP61I1dKdR8CStOr3FOveukTMBWS1MCb27ID9RcqkXg%2FooVgmwzXbrwkgpjDldMXr3TwWmVV01n8v032nwY&device_ratio=2&tn=B_NORMAL_MAP&nn=0&u_loc=11869559,3030727&ie=utf-8&t=1637483768544&newfrom=zhuzhan_webmap
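The `wd` parameter in the URL above is the search keyword, percent-encoded as UTF-8. A quick check with the standard library shows what it decodes to; nothing here is specific to Baidu, it is just a sketch of the encoding round trip.

```python
from urllib.parse import quote, unquote

encoded = "%E7%81%AB%E9%94%85"   # value of the wd parameter in the URL above
keyword = unquote(encoded)        # decode the percent-encoded UTF-8 bytes
print(keyword)

# Encoding the keyword again gives back the original parameter value
round_trip = quote(keyword)
print(round_trip == encoded)
```

To search for a different keyword, the same `quote()` call would produce the value to put in `wd`, although the `auth` and `seckey` parameters in the URL may still tie the request to the original session.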
Python crawler
The crawler this time is actually very simple, so I'll keep it short.
import requests
import pandas as pd
url = "https://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=direct&pcevaname=pc4.1&qt=s&c=1&wd=%E7%81%AB%E9%94%85&da_src=pcmappg.map&on_gel=1&l=3&gr=1&b=(1174135.838217428,-5261081.159856323;14879172.105572648,10545685.908882445)&pn=0&auth=RLVSxNUz96XX1SYEcDCQSa4cR9aNvG96uxLBNExRNBTtA%3Dk6Amkbz8yvYgP1PcGCgYvjPuVtvYgPMGvgWv%40uxtw8055yS8v7uvYgP%40vYZcvWPCuVtvYgP%40ZPcPPuVtvYgPhPPyheuVtvhgMuxVVty1uVtCGYuxtE20w5V198P8J9v7u1cv3uztexZFTHrwzDv5ooioGdFPWv3GuVtPYIuVtPYIUvhgMZSguxzBEHLNRTVtcEWe1aDYyuVt%40ZPuzteL1wWveuztghxehwz4DPGz6DB4vjnOOAGzu%3D%3D8xC&seckey=7W5%2BIhO%2B9UrQ2WX2V%2BE0KmrfRnfNjQ0Xg0YEc5Iu%2Fz8%3D%2CszNX4bkys1gfYd65mAyKk1JSwTAQrk4A7iOGDLHckvKsmeFsawFeIWwGdwoAZuFMnHE%2BuWYgg3T%2FtZUz1hiMdKD0LrtR1TZBjQm%2Fyt5c7mduGqYLkdWpXM0c%2FJ%2FgiJFFtmSDOxvMAiCeWh%2BUqQNFnJfZBvzsRINMb8JuZYiO%2Fiq3KhhPoNAWt%2BYfDFOJdd3u&device_ratio=2&tn=B_NORMAL_MAP&nn=0&u_loc=11871969,3028287&ie=utf-8&t=1637418793652&newfrom=zhuzhan_webmap"
# Send a plain GET request with the Baidu cookie header
headers = {
    'Cookie': 'BAIDUID=C715F06E5DE06ABAF85A3CE841D57766:FG=1'
}
response = requests.get(url, headers=headers, timeout=10)
# print(response.json())
# Keep only the 'more_city' part of the JSON response
data = response.json()['more_city']
df = pd.DataFrame(data)
print(df)
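The step above can be tried offline without hitting the API. This sketch assumes `more_city` is a list of per-province records; the field names match the columns used later in this post (`province`, `province_id`, `num`, `city`), but every value below is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for response.json(); all values are invented
fake_json = {
    "more_city": [
        {"province": "四川省", "province_id": 32, "num": 12000,
         "city": [{"code": 75, "name": "成都市", "num": 8000}]},
        {"province": "上海市", "province_id": 289, "num": 5000,
         "city": [{"code": 289, "name": "上海市", "num": 5000}]},
    ]
}

# .get() avoids a KeyError if the key is missing from the response
records = fake_json.get("more_city", [])
df_demo = pd.DataFrame(records)
print(df_demo[["province", "num"]])
```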

Result
Processing the data with pandas
We need to process the city column.

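The column to process
We need to keep a clear head here. This kind of task used to take me quite a while, but this time it came together quickly. The approach: split the column, reshape rows and columns, then extract the data.
Import modules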
import numpy as np
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Geo
from pyecharts.globals import ChartType,SymbolType
%matplotlib inline
Read the data
data = pd.read_excel('2021-11-20-23-44-7-79835853400-火鍋數據.xlsx')
df = pd.DataFrame(data['city'])
df.head()

Data
Splitting the column
df = df['city'].str.split(',', expand=True)
df.shape
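To see what `str.split(..., expand=True)` does, here is a small sketch on invented strings. Rows with fewer pieces are padded with missing values, which is why the shape printed above matters for the later steps.

```python
import pandas as pd

# Invented sample: each cell holds a comma-separated string of varying length
s = pd.Series(["a1,b1,c1,d1", "a2,b2"], name="city")

# expand=True spreads the pieces across columns; short rows are padded with missing values
wide = s.str.split(",", expand=True)
print(wide.shape)
print(wide)
```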

Data shape, which will be useful later
Reshaping rows and columns
Flatten the table into a single column
df_stack = df.stack()
df_stack = pd.DataFrame(df_stack)
print(df_stack.head(10))
print(df_stack.shape)
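The flattening step relies on the padding cells disappearing when the table is stacked. Whether `stack()` drops missing values by itself can depend on the pandas version and options, so this sketch (on invented data) calls `dropna()` explicitly to get the same result either way.

```python
import pandas as pd

# Invented 2×4 table with padding in the second row
wide = pd.DataFrame([["a1", "b1", "c1", "d1"],
                     ["a2", "b2", None, None]])

# stack() moves the columns into the row index; dropna() removes the padding cells
long = wide.stack().dropna()
long_df = pd.DataFrame(long)
print(long_df.shape)
```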

Single-column data

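Data shape
Reshaping rows and columns
To explain: the stacked data is 2513×1, and we want to reshape it into 359×7 (359 × 7 = 2513), so that each row holds the 7 fields of one city.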
df_temp =pd.DataFrame(np.reshape(df_stack.to_numpy(), (359,7)))
df_course = df_temp[[0,5,6]]
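The 359 above is hardcoded for this particular export. A minimal sketch of the same reshape that derives the row count from the data, assuming only that every record has 7 fields; the 21×1 input is an invented stand-in for the stacked column.

```python
import numpy as np
import pandas as pd

n_fields = 7  # fields per city record, as in the post
flat = pd.DataFrame({0: [f"v{i}" for i in range(21)]})  # invented 21×1 stand-in

# Derive the row count instead of hardcoding it, and fail early if it does not divide evenly
n_rows, remainder = divmod(len(flat), n_fields)
if remainder != 0:
    raise ValueError("length is not a multiple of the record width")

table = pd.DataFrame(np.reshape(flat.to_numpy(), (n_rows, n_fields)))
print(table.shape)
```

With the real data, `divmod(2513, 7)` gives `(359, 0)`, which matches the shape used above.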
Extraction with regular expressions
df_effect = df_course[5].str.extract('([\u4e00-\u9fa5]+)', expand=True)
df_effect['Name'] = df_effect[0]
df_effect.drop(0, axis=1, inplace=True)
df_effect['Code'] = df_course[0].str.extract('([0-9]*$)', expand=True)
df_effect['Num'] = df_course[6].str.extract(r'(\d+.*\d+)', expand=True)
df_effect.head()
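Here is a small sketch of the three extractions on invented fragments, assuming the split pieces look roughly like the text of a Python dict. The patterns are slightly simplified versions of the ones above, not the originals: `([0-9]+)$` for the trailing code and `(\d+)` for the count, which also matches a single-digit count (the `\d+.*\d+` pattern needs at least two digits).

```python
import pandas as pd

# Invented fragments shaped like the pieces left after splitting on commas
parts = pd.DataFrame({
    0: ["[{'code': 75", "{'code': 289"],
    5: [" 'name': '成都市'", " 'name': '上海市'"],
    6: [" 'num': 8000}", " 'num': 5000}]"],
})

out = pd.DataFrame({
    "Name": parts[5].str.extract(r"([\u4e00-\u9fa5]+)", expand=False),  # run of Chinese characters
    "Code": parts[0].str.extract(r"([0-9]+)$", expand=False),           # digits at the end of the field
    "Num": parts[6].str.extract(r"(\d+)", expand=False),                # first run of digits
})
print(out)
```

The extracted values are still strings; a call such as `out["Num"].astype(int)` would be needed before doing arithmetic on them.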

Result
Export the data
df_effect.to_excel('2021-11-20-23-44-7-79835853400-火鍋數據-效果.xlsx')
Visualizing province data
Extracting province data
data.head()

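The data to visualize
# .copy() gives an independent frame, so the column edits later do not trigger SettingWithCopyWarning
df_province = data[['province','province_id','num']].copy()
df_province.head()
df_province.to_excel('火鍋_省份數據.xlsx')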

Province data
Data visualization with pyecharts
We use pyecharts for the visualization this time. Note that province names must match the names pyecharts expects: for example, 上海市 is not recognized, only 上海 is.
df_province['province'] = df_province['province'].str.replace('省','')
df_province['province'] = df_province['province'].str.replace('市','')
# Map autonomous regions to their short names; assign the result back instead of using inplace on a column
df_province['province'] = df_province['province'].replace({'內蒙古自治區':'內蒙古','廣西壯族自治區':'廣西','新疆維吾爾自治區':'新疆',
'寧夏回族自治區':'寧夏','西藏自治區':'西藏'})
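The name cleanup can be checked on a few invented values before running it on the real column. This sketch is a variation on the code above rather than the same calls: it applies the explicit overrides first and then strips 省 or 市 only when it is the last character, so a name containing one of those characters elsewhere is left alone.

```python
import pandas as pd

# Invented sample of full administrative names
names = pd.Series(["四川省", "上海市", "西藏自治區", "廣西壯族自治區"])

# Explicit overrides for names that are not just "short name + suffix"
special = {"廣西壯族自治區": "廣西", "西藏自治區": "西藏"}

# Apply the overrides first, then strip a trailing 省 or 市 only at the end of the name
short = names.replace(special).str.replace(r"[省市]$", "", regex=True)
print(short.tolist())
```

The keys in the override dictionary have to match the names exactly as they appear in the data, including the script (Simplified or Traditional) they are written in.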
Also, pyecharts only accepts list data, so we need to convert the data types.
province = df_province['province'].tolist()
num = df_province['num'].tolist()
print('地區', province)
print('數量', num)

Data
We need to pair each province with its count, one to one.
geo_test_data = list(zip(province,num))
print(geo_test_data)
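One thing to watch with `zip()` is that it silently truncates to the shorter input. A minimal sketch with invented values that checks the lengths before pairing:

```python
province = ["四川", "上海", "西藏"]   # invented sample values
num = [12000, 5000, 300]

# zip() stops at the shorter input without warning, so check the lengths first
if len(province) != len(num):
    raise ValueError("province and num must have the same length")

pairs = list(zip(province, num))
print(pairs)
```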
Drawing the map
from pyecharts import options as opts
from pyecharts.charts import Map
# Named china_map rather than map to avoid shadowing the built-in map()
china_map = Map()
china_map.add('頻數', geo_test_data, 'china')
china_map.set_global_opts(
    title_opts=opts.TitleOpts(title='全國火鍋省份分布圖'),
    # is_piecewise=True splits the legend into discrete segments
    visualmap_opts=opts.VisualMapOpts(min_=100, max_=15000, split_number=5, is_piecewise=True))
china_map.render_notebook()
china_map.render("全國火鍋分布圖.html")
Let's look at the result.

National hot pot distribution map
Data visualization with Tableau
Honestly, data visualization in Python is fairly painful, so we built the same visualization in Tableau, which was painful too.

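China hot pot dashboard
Summary
This was painful, and I would not have done it if not for the money. Now that the money has arrived, I should of course write it all down; otherwise I will forget it later, and a write-up can also be shared with others. Note that the data has gaps and is not necessarily accurate; it comes from Baidu. I strongly protest OSM drawing Taiwan outside our country; that depiction does not represent my own views. Long live the motherland.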