
urllib is a package that ships with Python, mainly used for web scraping (at least in what I've encountered so far). Crawlers, also called web spiders, mainly fetch web page data. urllib contains four modules: request simulates sending requests, error handles exceptions, parse provides URL-handling utilities, and robotparser parses a site's robots.txt. Today we'll just look at request — I'm working overtime, after all.
Fetching page data
Use request to fetch the content of the Python homepage:
import urllib.request
response = urllib.request.urlopen('https://www.python.org')
print(response.read().decode('utf-8'))
Two lines of code and we have the page source. (read() returns bytes, hence the decode('utf-8').)

print(type(response)) shows the result is an HTTPResponse. So what can we pull out of an HTTPResponse? Open the HTTPResponse source:
self.headers = self.msg = None
# from the Status-Line of the response
self.version = _UNKNOWN # HTTP-Version
self.status = _UNKNOWN # Status-Code
self.reason = _UNKNOWN # Reason-Phrase
self.chunked = _UNKNOWN # is "chunked" being used?
self.chunk_left = _UNKNOWN # bytes left to read in current chunk
self.length = _UNKNOWN # number of bytes left in response
self.will_close = _UNKNOWN # conn will close at end of response
As the source above shows, we can access attributes such as headers, version, status, reason, chunked, chunk_left, and length. Print them if you're curious.
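To see these attributes filled in without touching the network, we can feed http.client.HTTPResponse a canned response through a fake socket. This is just a sketch for illustration — FakeSocket is our own helper, not part of the library:

```python
import io
import http.client

# Hypothetical helper: a minimal stand-in for a socket whose makefile()
# hands HTTPResponse a canned byte stream instead of a live connection.
class FakeSocket:
    def __init__(self, raw):
        self._raw = raw

    def makefile(self, *args, **kwargs):
        return io.BytesIO(self._raw)

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")

resp = http.client.HTTPResponse(FakeSocket(raw))
resp.begin()                          # parse the status line and headers
print(resp.version)                   # 11, meaning HTTP/1.1
print(resp.status, resp.reason)       # 200 OK
print(resp.headers['Content-Type'])   # text/plain
print(resp.read())                    # b'hello'
```

Before begin() is called, the attributes really are the _UNKNOWN placeholders from the source snippet above; begin() parses the status line and headers and fills them in.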
In the code above, what if we want to add other parameters, such as a request body via data? Take a look at request's urlopen() method:
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False, context=None):
Besides the url we can also pass request data, a timeout, certificates, and more. Let's try it against Taobao's IP lookup API:
import urllib.parse
import urllib.request
data = bytes(urllib.parse.urlencode({'ip': '63.223.108.42'}), encoding='utf8')
response = urllib.request.urlopen('http://ip.taobao.com/service/getIpInfo.php', data=data)
print(response.read())
Result (read() returns raw bytes, so the Chinese fields appear as UTF-8 escape sequences; decode('utf-8') would render them):
{
    "code": 0,
    "data": {
        "ip": "63.223.108.42",
        "country": "\xe7\xbe\x8e\xe5\x9b\xbd",
        "area": "",
        "region": "\xe5\x8d\x8e\xe7\x9b\x9b\xe9\xa1\xbf",
        "city": "\xe8\xa5\xbf\xe9\x9b\x85\xe5\x9b\xbe",
        "county": "XX",
        "isp": "\xe7\x94\xb5\xe8\xae\xaf\xe7\x9b\x88\xe7\xa7\x91",
        "country_id": "US",
        "area_id": "",
        "region_id": "US_WA",
        "city_id": "US_1107",
        "county_id": "xx",
        "isp_id": "3000107"
    }
}
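The timeout parameter mentioned above deserves a quick demo. Here is a sketch using a throwaway local server (the slow handler and free-port trick are ours, purely for illustration): the server sleeps longer than our timeout, so urlopen gives up and raises.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test server: sleeps longer than our timeout before replying.
class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass

class QuietServer(HTTPServer):
    def handle_error(self, request, client_address):
        pass  # the client hangs up on timeout; ignore the broken pipe

server = QuietServer(('127.0.0.1', 0), SlowHandler)  # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.handle_request, daemon=True).start()

try:
    urllib.request.urlopen('http://127.0.0.1:%d/' % port, timeout=0.5)
    timed_out = False
except urllib.error.URLError as e:
    timed_out = isinstance(e.reason, socket.timeout)
except socket.timeout:
    # depending on the Python version, a read timeout may surface directly
    timed_out = True

print(timed_out)  # True
server.server_close()
```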

We got the data we wanted. But what if we want to add request headers? urlopen() has no parameter for that; instead we can build a Request, which lets us attach header information. Read on.
Building a Request
First the basic usage of Request. As you can see, we no longer pass the url to urlopen directly; we wrap it in a Request:
import urllib.request
request = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
Now look at Request's constructor:
def __init__(self, url, data=None, headers={},
             origin_req_host=None, unverifiable=False,
             method=None):
- data: the request body
- headers: the request headers
- origin_req_host: the host of the originating request; it is used for cookie handling, not for disguising your own IP
- unverifiable: whether the request is "unverifiable", e.g. a resource fetched by a page the user did not explicitly request; defaults to False
- method: the request method, e.g. GET, POST, PUT...
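These parameters are easy to verify offline by inspecting a built Request; the header value and POST method below are just placeholders for illustration:

```python
from urllib import parse, request

data = bytes(parse.urlencode({'ip': '63.223.108.42'}), encoding='utf-8')
req = request.Request('http://ip.taobao.com/service/getIpInfo.php',
                      data=data,
                      headers={'User-Agent': 'my-spider/0.1'},
                      method='POST')

print(req.get_method())              # POST
print(req.host)                      # ip.taobao.com
print(req.get_header('User-agent'))  # my-spider/0.1 (header keys are capitalized)
print(req.data)                      # b'ip=63.223.108.42'
```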
Still using the Taobao API (too lazy to write my own endpoint), here's how it works:
from urllib import request, parse

url = 'http://ip.taobao.com/service/getIpInfo.php'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Host': 'ip.taobao.com'
}
dict1 = {
    'ip': '63.223.108.42'
}
data = bytes(parse.urlencode(dict1), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers, method='GET')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
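Headers don't have to go through the constructor: a Request also supports add_header() after construction, which is handy when headers are decided later.

```python
from urllib import request

req = request.Request('http://ip.taobao.com/service/getIpInfo.php')
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')

# Request normalizes stored header names with str.capitalize()
print(req.has_header('User-agent'))  # True
print(req.get_header('User-agent'))  # Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)
```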
Again we get the same data as before, and now we can add header information. But if we want authentication, cookie handling, and so on, the approach above can't do it. Keep going.
Handler
urllib uses handlers for these jobs. Each capability has its own handler — cookies, authentication, proxies, and so on — and they all inherit from the base class BaseHandler.
- HTTPDefaultErrorHandler: handles HTTP error responses
- HTTPRedirectHandler: handles redirects
- HTTPCookieProcessor: handles cookies
- ProxyHandler: handles proxies
- HTTPPasswordMgr: manages passwords (strictly a password store used by the auth handlers, not a handler itself)
- HTTPBasicAuthHandler: handles HTTP Basic authentication
There are more; browse the subclasses of BaseHandler in the source.
Login
Handlers hold the logic; to actually send requests we use an OpenerDirector: wrap the handler into an OpenerDirector, then fire the request at the server. Straight to the code:
from urllib.request import HTTPPasswordMgrWithDefaultRealm,HTTPBasicAuthHandler,build_opener
from urllib.error import URLError
username = 'username'
password = 'password'
url = 'https://jenkins.labradors.work/login?from=%2F'
p = HTTPPasswordMgrWithDefaultRealm()
p.add_password(None,url,username,password)
auth_handler = HTTPBasicAuthHandler(p)
opener = build_opener(auth_handler)
try:
    result = opener.open(url)
    html = result.read().decode('utf-8')
    print(html)
except URLError as e:
    print(e.reason)
We wrap the username and password into an HTTPPasswordMgrWithDefaultRealm, wrap that into an HTTPBasicAuthHandler, and finally send the request through the OpenerDirector that build_opener returns.
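One more trick worth knowing: build_opener returns an OpenerDirector, and install_opener makes it the global default, so that plain urlopen() calls also go through your handlers.

```python
import urllib.request

# Chain whatever handlers you need; build_opener fills in the defaults.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
print(isinstance(opener, urllib.request.OpenerDirector))  # True

# After this, urllib.request.urlopen(...) uses our opener under the hood.
urllib.request.install_opener(opener)
```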
Proxies
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener
proxy_handler = ProxyHandler({
    'http': 'http://127.0.0.1:1086',
    'https': 'https://127.0.0.1:1086'
})
opener = build_opener(proxy_handler)
try:
    response = opener.open('https://www.baidu.com')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
Run a local proxy listening on port 1086, wrap it with ProxyHandler, then send the request through the resulting opener...
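Note that ProxyHandler called with no argument falls back to the proxies in environment variables (http_proxy, https_proxy, ...), read via urllib.request.getproxies(); passing a dict, as above, overrides that. A quick offline check — this peeks at a CPython implementation detail (the dynamically added <scheme>_open methods), so treat it as illustration only:

```python
from urllib.request import ProxyHandler

handler = ProxyHandler({'http': 'http://127.0.0.1:1086'})

# For every scheme in the dict, ProxyHandler grows a matching <scheme>_open
# method, which is how build_opener knows it can route those requests.
print(hasattr(handler, 'http_open'))   # True
print(hasattr(handler, 'https_open'))  # False — we only configured http
```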
Cookies
import http.cookiejar, urllib.request
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + "=" + item.value)
We build an opener with HTTPCookieProcessor, send the request, and finally print out all the cookie values.
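CookieJar keeps cookies in memory only. If you want them to survive between runs, http.cookiejar also provides file-backed jars — MozillaCookieJar writes the classic cookies.txt format (the path below is just an example):

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Example path for the cookie file
path = os.path.join(tempfile.gettempdir(), 'cookies_demo.txt')

cookie = http.cookiejar.MozillaCookieJar(path)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)

# ... after opener.open(...) has filled the jar, persist it to disk:
cookie.save(ignore_discard=True, ignore_expires=True)
print(os.path.exists(path))  # True

# Next run, load it back with:
# cookie.load(ignore_discard=True, ignore_expires=True)
```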