What requests does
Sends network requests and returns the response data.
requests documentation (Chinese)
Sending GET requests
Sending requests with headers
Sending requests with parameters
Sending GET requests
【demo01】Fetch the Baidu home page
import requests
# target url
url = 'https://www.baidu.com'
# send a GET request to the target url
response = requests.get(url)
# print the response body
print(response.text)
Common response attributes:
- response.text: response body, str type
- response.content: response body, bytes type
- response.status_code: HTTP status code
- response.request.headers: headers of the request that produced this response
- response.headers: response headers
- response.request.cookies: cookies of the corresponding request
- response.cookies: cookies set by the response (i.e. after Set-Cookie has been applied)
General ways to get the page source:
- response.content.decode()
- response.content.decode("GBK")
- response.text
【demo02】Save an image from the web
import requests
# url of the image
url = 'https://www.baidu.com/img/bd_logo1.png'
# the response body is the image itself, as binary data
response = requests.get(url)
# print(response.content)
# open the file in binary write mode
with open('baidu.png', 'wb') as f:
    # write response.content (bytes)
    f.write(response.content)
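The key point above is the `'wb'` mode: binary data must be written without any encoding step. A minimal offline sketch (the byte string below is a stand-in for `response.content`, not a real download):

```python
import os
import tempfile

# Stand-in for response.content: a PNG file starts with these magic bytes.
fake_image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# 'wb' writes raw bytes straight through -- no text encoding is applied,
# which would otherwise corrupt image data.
path = os.path.join(tempfile.mkdtemp(), "demo.png")
with open(path, "wb") as f:
    f.write(fake_image_bytes)

# Reading it back in 'rb' mode returns the bytes unchanged.
with open(path, "rb") as f:
    saved = f.read()
print(saved == fake_image_bytes)  # True
```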
Sending requests with headers
Why send headers:
To imitate a browser and convince the server to return the same content it would give a real browser.
Header format:
a dict
Usage:
requests.get(url, headers=headers)
【demo03】Fetch the Baidu home page as a browser would
# fetch the Baidu home page
import requests
url = 'https://www.baidu.com'
# include a User-Agent in the request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
response = requests.get(url, headers=headers)
# print the request headers that were actually sent
print(response.request.headers)
Sending requests with parameters
Parameter format:
a dict
kw = {'wd': '長(zhǎng)城'}
Usage:
requests.get(url, params=kw)
【demo04】Send a request with parameters
# send a request with parameters
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
url = 'https://www.baidu.com/s?'
kw = {'wd': 'python'}
# send the request with the parameters attached
response = requests.get(url, headers=headers, params=kw)
print(response.content)
【Exercise】Fetch the Sina home page and compare response.text with response.content.decode()
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
url = 'https://www.sina.com.cn/'
response = requests.get(url, headers=headers)
print(response.text)
print(response.content.decode())
Result:
response.text comes back garbled
response.content.decode() does not
Conclusion:
response.text guesses the encoding from the response headers; if the server declares no charset, requests falls back to ISO-8859-1, which garbles non-Latin text. response.content.decode() lets you choose the encoding yourself (UTF-8 by default).
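The garbling can be reproduced without any network call: decode UTF-8 bytes with ISO-8859-1 (the fallback) versus with UTF-8. The byte string below is a hard-coded stand-in for `response.content`:

```python
# Stand-in for response.content from a UTF-8 page.
content = "新浪首頁(yè)".encode("utf-8")

# What response.text effectively does when no charset is declared:
wrong = content.decode("iso-8859-1")  # every byte maps to some Latin-1 char
# What response.content.decode() does (UTF-8 by default):
right = content.decode()

print(wrong)  # prints garbled characters (mojibake)
print(right)  # 新浪首頁(yè)
```

ISO-8859-1 never raises a decode error because every byte value is a valid character in it, which is exactly why the mistake shows up as mojibake rather than an exception.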
【Exercise】A crawler for any Tieba forum that saves the pages locally
import requests

class Tieba(object):
    def __init__(self, name, pn):
        self.name = name
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
        }
        self.url = 'http://tieba.baidu.com/f?kw={}&pn='.format(self.name)
        # each page holds 50 posts, so pn advances in steps of 50
        self.url_list = [self.url + str(i * 50) for i in range(pn)]

    def get_data(self, url):
        response = requests.get(url, headers=self.headers)
        return response.content

    def save_data(self, data, index):
        filename = self.name + "_{}.html".format(index)
        with open(filename, 'wb') as f:
            f.write(data)

    def run(self):
        # iterate over the url list
        for index, url in enumerate(self.url_list):
            # send the request
            data = self.get_data(url)
            # save the page
            self.save_data(data, index)

if __name__ == '__main__':
    name = input("Forum name: ")
    pn = int(input("Number of pages: "))
    tieba = Tieba(name, pn)
    tieba.run()
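The pagination scheme can be checked in isolation: Tieba's `pn` parameter is an offset that grows by 50 per page, which is what `i * 50` in `url_list` encodes. A quick sketch (the `kw=python` value is assumed for illustration):

```python
# Build the first three page URLs the way Tieba.__init__ does.
base = "http://tieba.baidu.com/f?kw=python&pn="
url_list = [base + str(i * 50) for i in range(3)]
for u in url_list:
    print(u)
# pn=0 is page 1, pn=50 is page 2, pn=100 is page 3
```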