title: Mobike Crawler + GIS Test
date: 2018-02-20 17:06:25
tags:
It all started with a chance packet capture. I was using Packet Capture on my phone to sniff the traffic of a WeChat mini program when I stumbled on the following data:
```json
{
    "code": 0,
    "message": "",
    "biketype": 0,
    "autoZoom": true,
    "radius": 150,
    "object": [
        {
            "distId": "8630779582",
            "distX": 116.28002632187257,
            "distY": 39.91022600618757,
            "distNum": 1,
            "distance": "62",
            "bikeIds": "8630779582#",
            "biketype": 1,
            "type": 0
        },
        {
            "distId": "8630367786",
            "distX": 116.27939881379473,
            "distY": 39.91056046398812,
            "distNum": 1,
            "distance": "117",
            "bikeIds": "8630367786#",
            "biketype": 1,
            "type": 0
        },
        {
            "distId": "0106005017",
            "distX": 116.27949896034224,
            "distY": 39.91104357270857,
            "distNum": 1,
            "distance": "131",
            "bikeIds": "0106005017#",
            "biketype": 2,
            "type": 0
        }
    ]
}
```
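To go from a response like this to coordinates, a minimal sketch using only the standard `json` module could look like this. It assumes the body is complete JSON with the fields shown above (`code`, `object`, `distId`, `distX`, `distY`); `parse_bikes` is a hypothetical helper, not part of the original code. Judging by the sample values, `distX` is the longitude and `distY` the latitude. Keeping `distId` as a string preserves leading zeros such as `0106005017`.

```python
import json

def parse_bikes(body):
    """Return (distId, longitude, latitude) tuples from a nearbyBikesInfo response.

    body may be str or bytes (json.loads accepts both). Assumes the layout
    captured above: a top-level "object" list whose items carry distId,
    distX (longitude) and distY (latitude).
    """
    payload = json.loads(body)
    if payload.get("code") != 0:  # assumption: a non-zero code means the request failed
        return []
    return [(item["distId"], item["distX"], item["distY"])
            for item in payload.get("object", [])]

# Two entries taken from the captured response above
sample = '''{"code": 0, "message": "", "object": [
  {"distId": "8630779582", "distX": 116.28002632187257, "distY": 39.91022600618757},
  {"distId": "0106005017", "distX": 116.27949896034224, "distY": 39.91104357270857}
]}'''

for dist_id, lon, lat in parse_bikes(sample):
    print(dist_id, lon, lat)
```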
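Naturally I was over the moon: plain, standard JSON. So easy!

I borrowed from the crawler I wrote earlier for Xueersi (I scraped a whole pile of material that time, hehe).

The approach: keep the API address and the request parameters in two separate variables, pull the longitude and latitude out of the parameters into variables of their own, and then simply concatenate everything back together.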
```python
# Fetch the bikes around one point
def bike(longitude, latitude):
    url = 'https://mwx.mobike.com/mobike-api/rent/nearbyBikesInfo.do'
    latitude = str(latitude)
    longitude = str(longitude)
    # Parameters copied from the captured request; only the coordinates change.
    # wxcode is the session code from my own capture.
    data = ("verticalAccuracy=0&latitude=" + latitude
            + "&errMsg=getLocation:ok&accuracy=30&horizontalAccuracy=30&speed=-1"
            + "&longitude=" + longitude
            + "&citycode=010&wxcode=003GVtgj2jeyBF0cxQgj229Igj2GVtg1")
    return getHtml(url, data)
```
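String concatenation works, but the same query can be built with `urllib.parse.urlencode`, which handles escaping and keeps the parameters readable as a dict. A minimal sketch; `build_query` and its `wxcode` argument are hypothetical, and the fixed values are the ones from the captured request. Note that `urlencode` percent-encodes the colon in `getLocation:ok` as `%3A`; whether the server accepts that form is untested.

```python
from urllib.parse import urlencode

def build_query(longitude, latitude, wxcode):
    """Build the nearbyBikesInfo query string from a dict instead of "+".

    The fixed fields mirror the captured request; wxcode is a session code
    from your own capture (hypothetical parameter).
    """
    params = {
        "verticalAccuracy": 0,
        "latitude": latitude,
        "errMsg": "getLocation:ok",
        "accuracy": 30,
        "horizontalAccuracy": 30,
        "speed": -1,
        "longitude": longitude,
        "citycode": "010",  # kept as a string so the leading zero survives
        "wxcode": wxcode,
    }
    return urlencode(params)

print(build_query(116.28, 39.91, "YOUR_WXCODE"))
```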
Since I didn't know any other way of crawling yet, I went with the simplest option, `urllib.request`. Here's the code:
```python
def getHtml(url, data):
    # Headers copied from the captured mini-program request
    headers = {
        'charset': 'utf-8',
        'platform': '4',
        'referer': 'https://servicewechat.com/wx40f112341ae33edb/1/',
        'content-type': 'application/x-www-form-urlencoded',
        'user-agent': 'MicroMessenger/6.5.4.1000 NetType/WIFI Language/zh_CN',
        'host': 'mwx.mobike.com',
        'connection': 'Keep-Alive',
        # 'accept-encoding': 'gzip' is left out: urllib does not decompress
        # responses, so read() would return gzip bytes instead of JSON
        'cache-control': 'no-cache',
    }
    print(url + '?' + data)
    # Send the captured headers with the GET request
    request = urllib.request.Request(url + '?' + data, headers=headers)
    # Give up after 1 second so one stalled request doesn't hang the whole crawl
    with urllib.request.urlopen(request, timeout=1) as response:
        return response.read()
```
The headers were also taken straight from the packet capture; it's just copy and paste anyway~
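A caveat when copying headers wholesale: the captured request advertises `accept-encoding: gzip`, and `urllib.request` does not decompress responses on its own, so if that header is sent, `read()` may return gzip-compressed bytes rather than JSON. A minimal sketch for handling both cases; `decode_body` is a hypothetical helper, and it assumes the server reports compression through the `Content-Encoding` response header and that the body is UTF-8.

```python
import gzip

def decode_body(raw, content_encoding):
    """Return the response body as text, gunzipping it if the server compressed it.

    raw is what response.read() returned; content_encoding is the value of
    response.headers.get('Content-Encoding') (None if the header is absent).
    """
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")

# With a real response object this would be:
#   with urllib.request.urlopen(request, timeout=1) as response:
#       text = decode_body(response.read(), response.headers.get("Content-Encoding"))
compressed = gzip.compress('{"code": 0}'.encode("utf-8"))
print(decode_body(compressed, "gzip"))
```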
Last comes the longitude/latitude range. I didn't have any clever way to handle it, so I simply marked out an area and swept it with two nested loops, which gave the following code:
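```python
# Sweep a rectangle around Beijing (citycode=010): longitude 116.0 to 116.8,
# latitude 39.6 to 40.3 (end values excluded), in steps of 0.008 degrees
for lon in range(1160000, 1168000, 80):
    longitude = lon / 10000
    for lat in range(396000, 403000, 80):
        latitude = lat / 10000
        bike(longitude, latitude)
```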
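For the GIS side it helps to know how coarse this grid is on the ground. A rough sketch converting the 0.008° step into metres, assuming a spherical Earth with a mean radius of 6,371 km (fine for a ballpark figure, not for survey-grade work): a degree of latitude spans the same distance everywhere, while a degree of longitude shrinks with the cosine of the latitude. `grid_spacing_m` is an illustrative name.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius; spherical approximation

def grid_spacing_m(step_deg, latitude_deg):
    """Approximate ground length of one grid step as (east-west, north-south) metres."""
    north_south = EARTH_RADIUS_M * math.radians(step_deg)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south

# Step of 80 / 10000 = 0.008 degrees, evaluated near the middle of the crawl area
ew, ns = grid_spacing_m(0.008, 39.95)
print(f"east-west ≈ {ew:.0f} m, north-south ≈ {ns:.0f} m")
```

Near 40°N this comes out at roughly 680 m east-west by 890 m north-south between neighbouring query points.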
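The coordinates are scaled up by 10000 because `range()` only accepts integers (a float step raises a `TypeError`), so they have to be turned into integers and divided back afterwards. Sigh.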
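If you'd rather not scale by hand, the grid can also be generated directly in degrees. Below is a minimal sketch of a hypothetical `frange` helper that mimics `range()` for floats (positive steps only): it derives each value from an integer index (`start + i * step`) instead of adding the step over and over, so rounding errors don't accumulate along the sweep. (`numpy.arange` also takes float steps, but its documentation suggests `numpy.linspace` for non-integer steps.)

```python
import math

def frange(start, stop, step):
    """Yield start, start + step, ... while below stop; range() for floats, step > 0.

    Each value is computed from an integer index, so rounding errors don't pile
    up the way they would with repeated `value += step`.
    """
    # Round the step count slightly so e.g. 99.9999999999996 counts as 100
    count = math.ceil(round((stop - start) / step, 9))
    for i in range(count):
        yield start + i * step

# The same sweep as the integer-scaled loops, written directly in degrees
points = [(longitude, latitude)
          for longitude in frange(116.0, 116.8, 0.008)
          for latitude in frange(39.6, 40.3, 0.008)]
print(len(points))  # 100 longitudes x 88 latitudes = 8800 query points
```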
Finally, the libraries needed:

```python
import urllib.parse
import urllib.request
import json
import csv
import os
import time
import socket
```
The code is on GitHub: https://github.com/w1109790800/Mobike-Reptile
The GIS analysis hasn't made much progress; who rides a bike in the middle of winter anyway...
Finally, here is part of the data (the full dataset is too large to include; download it from GitHub if you need it):
| distId | distX (longitude) | distY (latitude) |
|---|---|---|
| 106162799 | 116.0003669 | 39.60155466 |
| 106741325 | 116.0027729 | 39.59998057 |
| 106190593 | 116.0053821 | 39.60265709 |
| 106003828 | 116.0063136 | 39.59663351 |
| 106162799 | 116.0003669 | 39.60155466 |
| 106190593 | 116.0053821 | 39.60265709 |
| 106741325 | 116.0027729 | 39.59998057 |
| 106029871 | 116.0005932 | 39.61392113 |
| 106622939 | 116.0118808 | 39.60652963 |
| 106133660 | 115.9955102 | 39.61525816 |
| 106003828 | 116.0063136 | 39.59663351 |
| 106029871 | 116.0005932 | 39.61392113 |
| 106006181 | 116.0018439 | 39.6175667 |
| 106029871 | 116.0005932 | 39.61392113 |

...
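The sample above contains the same `distId` more than once (106029871 shows up three times), probably because neighbouring query points return some of the same bikes, so the raw data should likely be deduplicated before any GIS analysis to avoid counting a bike several times. A minimal sketch with the standard `csv` module, assuming a CSV file with the same three columns as the table; the function name and the keep-the-first-row policy are assumptions, not taken from the repository.

```python
import csv
import io

def dedupe_bikes(lines):
    """Keep the first row for each distId from a CSV with distId,distX,distY columns."""
    seen = set()
    unique = []
    for row in csv.DictReader(lines):
        if row["distId"] in seen:
            continue
        seen.add(row["distId"])
        unique.append((row["distId"], float(row["distX"]), float(row["distY"])))
    return unique

# A few rows from the sample above, in CSV form. For a real file (name hypothetical):
#   with open("bikes.csv", newline="", encoding="utf-8") as f:
#       bikes = dedupe_bikes(f)
sample = io.StringIO(
    "distId,distX,distY\n"
    "106162799,116.0003669,39.60155466\n"
    "106029871,116.0005932,39.61392113\n"
    "106162799,116.0003669,39.60155466\n"
    "106029871,116.0005932,39.61392113\n"
)
print(dedupe_bikes(sample))
```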
PS: My blog: http://blog.wangyuyang.top/