requests

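The first step in building a web crawler: fetching the data.

At some point it became a given that any conversation about Python turns to web crawlers.

And any conversation about crawlers inevitably brings up Python.

So what does Python offer for fetching web pages?

This article shows how to fetch web page content with Requests, an HTTP library for Python.

With a good tool in hand, the fun can begin.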

Installing `Requests`

Install with pip

pip install requests

Install from source

git clone https://github.com/psf/requests.git
cd requests
python -m pip install .

Using Requests

A quick look at how Requests is used:

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}

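This is the example from the official documentation. The code is straightforward and should be easy to follow:

requests.get(): sends an HTTP GET request to the URL
r.status_code: the HTTP status code of the response
r.headers: the HTTP response headers
r.encoding: the encoding used to decode the body
r.text: the page content as text
r.json(): the body parsed as JSON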
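To try the attributes above without relying on a remote API, the sketch below runs a small local HTTP server from the standard library (a hypothetical stand-in for a real endpoint; the `/user` path and the JSON payload are made up) and inspects the response. The `Session` with `trust_env = False` is only there so that proxy settings from environment variables do not interfere with the connection to `127.0.0.1`; `raise_for_status()` is an extra not mentioned above that turns a 4xx/5xx status into an exception.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class UserHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint that answers every GET with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"type": "User", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 lets the operating system pick a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), UserHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/user"

with requests.Session() as session:
    session.trust_env = False  # keep proxy environment variables away from the local server
    r = session.get(url, timeout=5)

server.shutdown()
server.server_close()

r.raise_for_status()              # would raise requests.HTTPError for a 4xx/5xx status
print(r.status_code)              # 200
print(r.headers["content-type"])  # application/json; charset=utf-8
print(r.encoding)                 # utf-8, taken from the charset in Content-Type
print(r.text)                     # the body decoded with r.encoding
print(r.json())                   # {'type': 'User', 'path': '/user'}
```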

The sections below go through the corresponding Requests methods, first for sending a request and then for reading the response.

Making requests

Importing the module

import requests

GET request

r = requests.get('https://api.github.com/events')

POST request

r = requests.post("http://httpbin.org/post")

Uploading a file with POST

url = 'http://httpbin.org/post'
# Open the file in binary mode; the with block closes it after the upload
with open('report.xls', 'rb') as f:
    r = requests.post(url, files={'file': f})
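With `files=`, Requests encodes the upload as `multipart/form-data`. One way to look at what would be sent, without contacting a server, is to build a `requests.Request` and call `prepare()`. This sketch passes an in-memory `(filename, content, content_type)` tuple instead of an open file; the bytes and the MIME type are placeholders.

```python
import requests

url = "http://httpbin.org/post"
# (filename, content, content type): placeholder in-memory data standing in for open('report.xls', 'rb')
files = {"file": ("report.xls", b"fake spreadsheet bytes", "application/vnd.ms-excel")}

# Build the request without sending it, so the encoded body can be inspected.
prepared = requests.Request("POST", url, files=files).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)                               # multipart/form-data; boundary=<random boundary>
print(b'filename="report.xls"' in prepared.body)  # True: the body carries the file name and bytes
```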

Other HTTP methods

r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")

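Passing parameters

# Query-string parameters: appended to the URL as ?key1=value1&key2=value2
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)

# Form data: sent form-encoded in the request body
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)

# A JSON body serialised by hand (no Content-Type header is set this way; json=payload sets it for you)
import json
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))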
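The three ways of passing data above end up in different parts of the HTTP request. Preparing the requests without sending them makes the difference visible: `params` goes into the URL, a `data` dict becomes a form-encoded body, and `json=` (an alternative to calling `json.dumps` yourself) serialises the object and sets `Content-Type: application/json`. A minimal sketch, reusing the URLs from the examples above:

```python
import requests

payload = {"key1": "value1", "key2": "value2"}

# params=: encoded into the query string of the URL
get_req = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()
print(get_req.url)  # http://httpbin.org/get?key1=value1&key2=value2

# data= with a dict: form-encoded request body
form_req = requests.Request("POST", "http://httpbin.org/post", data=payload).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # key1=value1&key2=value2

# json=: serialises the object and sets the JSON Content-Type header
json_req = requests.Request(
    "POST", "https://api.github.com/some/endpoint", json={"some": "data"}
).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json_req.body)                     # b'{"some": "data"}'
```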

Setting a timeout

r = requests.get('https://github.com', timeout=5)
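`timeout` is how long Requests waits for the server to respond, not a limit on the total download time; a `(connect, read)` tuple sets the two phases separately, and when it expires an exception is raised instead of a response being returned. Without `timeout`, Requests waits indefinitely. The sketch below assumes a hypothetical local server that deliberately takes two seconds to answer, so a 0.5-second read timeout triggers `requests.exceptions.ReadTimeout` (a subclass of `requests.exceptions.Timeout`).

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint that waits 2 seconds before it starts answering."""

    def do_GET(self):
        time.sleep(2)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client has already given up and closed the connection

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

timed_out = False
start = time.monotonic()
with requests.Session() as session:
    session.trust_env = False  # keep proxy environment variables away from the local server
    try:
        # up to 3 s to connect, but only 0.5 s to wait for the response
        session.get(url, timeout=(3, 0.5))
    except requests.exceptions.ReadTimeout:
        timed_out = True
elapsed = time.monotonic() - start

server.shutdown()
server.server_close()  # may wait for the sleeping handler thread to finish

print(timed_out)  # True
print(f"gave up after about {elapsed:.1f} s")
```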

Disabling redirects

r = requests.get('http://github.com', allow_redirects=False)
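By default Requests follows redirects and records the intermediate responses in `r.history`; with `allow_redirects=False` it returns the redirect response itself, whose target is in the `Location` header. The sketch below assumes a hypothetical local server where `/old` answers with a 302 pointing at `/new`:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /new, every other path returns a page."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False  # keep proxy environment variables away from the local server
    followed = session.get(base + "/old", timeout=5)
    not_followed = session.get(base + "/old", timeout=5, allow_redirects=False)

server.shutdown()
server.server_close()

print(followed.status_code, followed.url)           # 200, ending in /new
print([resp.status_code for resp in followed.history])  # [302]
print(not_followed.status_code, not_followed.headers["Location"])  # 302 /new
print(not_followed.is_redirect)                      # True
```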

Setting request headers

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
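When no `User-Agent` is given, Requests sends its own (`python-requests/<version>`), which some sites treat differently from a browser. Headers can also be attached to a `Session` so that every request carries them, with per-request headers merged on top. This sketch uses `Session.prepare_request` to show the merged headers without sending anything; the `Accept` value is an arbitrary example.

```python
import requests

url = "https://api.github.com/some/endpoint"

# Requests' own default User-Agent, used when you do not set one
print(requests.utils.default_headers()["User-Agent"])  # python-requests/<version>

with requests.Session() as session:
    # Headers set on the session are sent with every request it makes (example value)
    session.headers.update({"Accept": "application/json"})

    # Per-request headers are merged over the session headers; prepare_request
    # builds the final request without sending it.
    req = requests.Request("GET", url, headers={"user-agent": "my-app/0.0.1"})
    prepared = session.prepare_request(req)

# Header lookups are case-insensitive, so "User-Agent" finds "user-agent"
print(prepared.headers["User-Agent"])  # my-app/0.0.1
print(prepared.headers["Accept"])      # application/json
```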

Setting a proxy

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

Sending cookies

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')

r = requests.get(url, cookies=cookies)
print(r.json())
# {'cookies': {'cookies_are': 'working'}}
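A plain dict is enough for simple cases. For finer control, a `RequestsCookieJar` can scope each cookie to a domain and path, and only matching cookies are attached to a request. This mirrors an example in the Requests documentation; preparing the request instead of sending it shows which `Cookie` header would go out:

```python
import requests

# A cookie jar lets each cookie be limited to a domain and a path.
jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

url = "http://httpbin.org/cookies"
# Preparing (not sending) the request shows which cookies match this URL.
prepared = requests.Request("GET", url, cookies=jar).prepare()
print(prepared.headers.get("Cookie"))  # tasty_cookie=yum
```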

import requests
params = {'username': 'Ryan', 'password': 'password'}
# Log in: the form fields go in the POST body and the server answers with a login cookie
r = requests.post("http://pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to:")
print(r.cookies.get_dict())
print("-----------")
print("Going to profile page...")
# Without a session, the login cookie has to be passed along by hand
r = requests.get("http://pythonscraping.com/pages/cookies/profile.php",
                 cookies=r.cookies)
print(r.text)

Persisting a session

import requests
s = requests.Session()

s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get("http://httpbin.org/cookies")

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

import requests
session = requests.Session()
params = {'username': 'username', 'password': 'password'}
# The session stores the login cookie and sends it with every later request
r = session.post("http://pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to:")
print(r.cookies.get_dict())
print("-----------")
print("Going to profile page...")
r = session.get("http://pythonscraping.com/pages/cookies/profile.php")
print(r.text)
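The difference between the two login examples above is that a `Session` keeps its own cookie jar, so a cookie set by one response is sent back automatically on later requests. The sketch below shows this against a hypothetical local server (standing in for the login and profile pages) where `/login` sets a cookie and `/profile` echoes the `Cookie` header it receives; a second, fresh session has no cookie to send.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /login sets a cookie, other paths echo the Cookie header received."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "sessioncookie=123456789; Path=/")
            body = b"logged in"
        else:
            body = self.headers.get("Cookie", "no cookie").encode("utf-8")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False  # keep proxy environment variables away from the local server
    session.get(base + "/login", timeout=5)              # the server sets sessioncookie
    profile = session.get(base + "/profile", timeout=5)  # the session sends it back automatically
    stored = session.cookies.get_dict()

with requests.Session() as fresh:
    fresh.trust_env = False
    anonymous = fresh.get(base + "/profile", timeout=5)  # a new session starts with an empty jar

server.shutdown()
server.server_close()

print(stored)          # {'sessioncookie': '123456789'}
print(profile.text)    # sessioncookie=123456789
print(anonymous.text)  # no cookie
```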

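Skipping SSL certificate verification

# Insecure: disables certificate checks, and urllib3 emits an InsecureRequestWarning
requests.get('https://kennethreitz.com', verify=False)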

Reading the response

Status code

r = requests.get('http://httpbin.org/get')
r.status_code

URL

r.url

Response headers

r.headers
# Header names are case-insensitive, so both lookups below work
r.headers['Content-Type']
r.headers.get('content-type')

Cookie

r.cookies
r.cookies['example_cookie_name']

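Text encoding

# Guessed from the response headers; can be changed before reading r.text
r.encoding

Text content

# str, decoded using r.encoding
r.text

Binary content

# bytes, exactly as received
r.content

JSON content

# Raises an exception if the body is not valid JSON
r.json()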

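Raw response content

# r.raw is the underlying urllib3 response; request with stream=True to read from it
r = requests.get('https://api.github.com/events', stream=True)
r.raw.read(10)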
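`r.raw` is only useful when the request is made with `stream=True`, which defers downloading the body until it is read. For saving large downloads, `iter_content()` is usually the better choice because it also decodes gzip/deflate content. The sketch assumes a hypothetical local server serving 20 kB of placeholder bytes, and writes to an in-memory `io.BytesIO` as a stand-in for a file opened with `open(filename, 'wb')`.

```python
import io
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

PAYLOAD = b"x" * 20_000  # hypothetical 20 kB file of placeholder bytes


class FileHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint that serves PAYLOAD as a binary download."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        try:
            self.wfile.write(PAYLOAD)
        except OSError:
            pass  # the client may close early after reading only part of the body

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/file.bin"

with requests.Session() as session:
    session.trust_env = False  # keep proxy environment variables away from the local server

    # r.raw: the underlying urllib3 stream, read directly (no content decoding)
    with session.get(url, stream=True, timeout=5) as r:
        first_bytes = r.raw.read(10)

    # iter_content(): save a streamed download chunk by chunk
    buffer = io.BytesIO()  # stand-in for open(filename, "wb")
    with session.get(url, stream=True, timeout=5) as r:
        for chunk in r.iter_content(chunk_size=4096):
            buffer.write(chunk)

server.shutdown()
server.server_close()

downloaded = buffer.getvalue()
print(first_bytes)      # b'xxxxxxxxxx'
print(len(downloaded))  # 20000
```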

These are the most commonly used parts of Requests. Some advanced features are not covered here; see the official documentation for those.

Web Scraping Techniques: Fetching Web Page Content with Requests
