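Simulated Login

Simulated login is commonly used when scraping data at scale. By simulating a login, the crawler obtains the valid cookies that the site issues to a signed-in user. Sending those cookies with later requests makes the site more likely to treat the crawler as a trusted user, which improves scraping results.

Prerequisites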

Requests
BeautifulSoup
re
cookielib (http.cookiejar in Python 3)

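Getting Started

Simulating a Login to Guokr

Approach:

Visit the Guokr login page in a browser, open the developer tools, and inspect the form's HTML elements (email, password, and captcha are required).
Work out the captcha URL and write Python code to fetch the captcha.
Use requests' session() so that successive requests share state (HTTP itself is stateless).
Parse the pages that requests returns with BeautifulSoup, find the form to POST, then work out and fill in each field.
On the first login, use cookielib to save the cookies the site assigns to the user.
After the first successful login, later runs can log in with the saved cookies alone.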
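
Step 3 depends on a cookie jar to carry state between otherwise independent HTTP requests. The sketch below illustrates the idea offline, using the standard library's http.cookiejar and urllib.request instead of requests. The helper names, the cookie name session_id, and its value are made up for illustration and are not taken from Guokr.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a session cookie (no expiry) scoped to the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would attach to a request for url."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)  # no network traffic happens here
    return request.get_header('Cookie')


jar = CookieJar()
# Hypothetical cookie standing in for what a site sets after login.
jar.set_cookie(make_cookie('session_id', 'abc123', '.guokr.com'))

# The jar attaches the cookie only to requests for a matching domain.
print(cookie_header_for(jar, 'https://account.guokr.com/sign_in/'))
print(cookie_header_for(jar, 'https://example.com/'))
```

A requests session keeps a jar like this internally, which is why every request made through the same session carries the cookies received earlier.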

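Getting the Form Contents

Open the browser's developer tools (shortcut: Ctrl+Shift+C), fill in the form with arbitrary values, and click the login button on the Guokr page:

[Figure: Guokr login page]

In the Network tab of the developer tools, the sign_in/ request submits the form with the POST method, and its Form Data is as follows:

[Figure: Guokr form data]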

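Field	Description
csrf_token	Random string that guards against CSRF (cross-site request forgery) attacks
username	User name
password	Password
captcha	Captcha text
captcha_rand	Random value used to fetch the captcha
permanent	y (fixed value)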
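
As a rough illustration of how these six fields travel in the POST body, the sketch below assembles them into a dict and form-encodes it with the standard library. The function name build_login_form and every value passed to it are placeholders, not real credentials or tokens.

```python
from urllib.parse import parse_qs, urlencode


def build_login_form(csrf_token, username, password, captcha, captcha_rand):
    """Assemble the six fields that the sign-in form submits."""
    return {
        'csrf_token': csrf_token,
        'username': username,
        'password': password,
        'captcha': captcha,
        'captcha_rand': captcha_rand,
        'permanent': 'y',  # fixed value
    }


# Placeholder values for illustration only.
form = build_login_form('token123', 'user@example.com', 'secret',
                        'ab3d', '1940664610')
body = urlencode(form)  # application/x-www-form-urlencoded body
print(body)

# Decoding the body gives the fields back.
print(parse_qs(body)['captcha_rand'])
```

Passing a dict like this as data= to a requests POST form-encodes it in the same way.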

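The page's HTML source shows that both csrf_token and captcha_rand can be found in the page:

[Figure: locating the form fields in the HTML source]

From top to bottom these are csrf_token, captcha_rand, and the captcha URL. In the captcha URL https://account.guokr.com/captcha/1940664610/, the prefix https://account.guokr.com/captcha/ is fixed and the trailing number is random; that number is captcha_rand. Partial code: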

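import re
import requests
from bs4 import BeautifulSoup

session = requests.session()  # one session shared by every request, so cookies persist

def get_csrf_captcha_rand(url):
    # headers is the User-Agent dict defined in the complete listing at the end
    response = session.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    csrf_token = soup.select('input#csrf_token')[0]
    captcha_rand = soup.select('input#captchaRand')[0]
    # Non-greedy match, so only the value attribute is captured
    match_cs = re.search(r'value="(.*?)"', str(csrf_token)).group(1)
    match_rand = re.search(r'value="(.*?)"', str(captcha_rand)).group(1)
    return match_cs, match_rand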

Code notes: request the Guokr login URL through the shared session, parse the page with BeautifulSoup, extract csrf_token and captcha_rand, and return them.
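
To try the extraction step without touching the network, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's html.parser so that it has no third-party dependencies, and it runs on hypothetical markup modeled on the two hidden inputs described above; the real page's markup may differ.

```python
from html.parser import HTMLParser


class InputValueParser(HTMLParser):
    """Collect the value attribute of every <input> that has an id."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        if 'id' in attributes:
            self.values[attributes['id']] = attributes.get('value', '')


# Hypothetical markup, not copied from the real login page.
SAMPLE_HTML = '''
<form method="post">
  <input id="csrf_token" name="csrf_token" type="hidden" value="token123">
  <input id="captchaRand" name="captcha_rand" type="hidden" value="1940664610">
</form>
'''

parser = InputValueParser()
parser.feed(SAMPLE_HTML)
print(parser.values['csrf_token'], parser.values['captchaRand'])
```

Reading the attribute from the parsed tag avoids depending on the order in which attributes appear in the tag's string form.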

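Getting the Captcha Image (Download It and Open It for the User to Enter)

Appending the captcha_rand returned by get_csrf_captcha_rand to the fixed prefix gives https://account.guokr.com/captcha/1940664610/. The number is a random 10-digit value, usually generated from the current time. The code: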

def get_captcha(rand):  # save the image as captcha.png and ask the user to read it
    import time
    from PIL import Image
    timestamp = str(int(time.time() * 1000))  # current time in milliseconds
    captcha_url = 'https://account.guokr.com/captcha/{}/?v={}'.format(rand, timestamp)
    response = session.get(captcha_url, headers=headers)
    with open('captcha.png', 'wb') as f:  # the with block closes the file
        f.write(response.content)
    try:
        captcha_image = Image.open('captcha.png')
        captcha_image.show()
        captcha_image.close()
    except IOError:
        print('captcha.png could not be opened!')
    code = input('please check the captcha code and enter it:')
    return code

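Code notes: convert the current time to a string and append it to the captcha image URL, request that URL, save the image locally, show it to the user with the PIL imaging library, and return whatever the user types.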
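
The URL construction can be checked on its own, without downloading anything. In this small sketch the function name build_captcha_url and the now parameter are introduced only for illustration; now lets a fixed time be passed in so the result is reproducible.

```python
import time

CAPTCHA_BASE = 'https://account.guokr.com/captcha/'


def build_captcha_url(rand, now=None):
    """Return the captcha URL for rand, with a millisecond timestamp in v."""
    if now is None:
        now = time.time()
    timestamp_ms = int(now * 1000)
    return '{}{}/?v={}'.format(CAPTCHA_BASE, rand, timestamp_ms)


# A fixed time makes the output deterministic.
print(build_captcha_url('1940664610', now=1500000000.0))
```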

Submitting the Form and Getting the Cookies

With all of the form fields above collected, we can log in for real:

def guokr_login(account, password):  # perform the actual login
    url = 'https://account.guokr.com/sign_in/'
    csrf_token, captcha_rand = get_csrf_captcha_rand(url)
    post_data = {
        'csrf_token': csrf_token,
        'username': account,
        'password': password,
        'captcha': get_captcha(captcha_rand),
        'captcha_rand': captcha_rand,
        'permanent': 'y'
    }
    session.post(url, data=post_data, headers=headers)
    # session.cookies must be the file-backed LWPCookieJar set up in the complete listing
    session.cookies.save()

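Code notes: assemble the form, submit it through the session, and save the cookies for later logins. The key outcome of this code is the set of post-login cookies, whose contents are shown below:

[Figure: saved cookies]

Judging from this cookie text, the cookies are valid for roughly one month.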
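
Rather than reading the expiry by eye, the saved file can be loaded and inspected. The sketch below is self-contained: it first writes a hypothetical cookie that expires in 30 days to a temporary LWP-format file, then loads the file and reports the remaining lifetime. The cookie name, value, and the helper days_until_expiry are assumptions made for illustration.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, LWPCookieJar


def days_until_expiry(cookie, now=None):
    """Return the days left before cookie expires, or None for a session cookie."""
    if cookie.expires is None:
        return None
    if now is None:
        now = time.time()
    return (cookie.expires - now) / 86400.0


now = int(time.time())
# Hypothetical cookie that expires 30 days from now.
cookie = Cookie(
    version=0, name='session_id', value='abc123',
    port=None, port_specified=False,
    domain='.guokr.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=False, expires=now + 30 * 86400, discard=False,
    comment=None, comment_url=None, rest={},
)

path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
saved = LWPCookieJar(path)
saved.set_cookie(cookie)
saved.save()  # writes the LWP text format

loaded = LWPCookieJar(path)
loaded.load()
for item in loaded:
    print(item.name, round(days_until_expiry(item, now)))
```

A check like this could run before each crawl to decide whether a fresh login is needed.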

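Checking Whether the Cookies Are Valid

Once you have the cookies, check whether later logins with them actually work. A helper function that reports the login state makes this easy. In the browser, find a URL that can only be accessed when logged in: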

def is_login():  # check whether the session is logged in
    personal_url = 'http://www.guokr.com/user/feeds/'
    # Do not follow redirects: a redirect to the sign-in page could otherwise end in a 200
    response = session.get(personal_url, headers=headers, allow_redirects=False)
    return response.status_code == 200

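Code notes: request a URL that is only accessible when logged in. If response.status_code is 200, the cookies are valid and can be used from then on to access Guokr and fetch the data you want to scrape; if not, more debugging is needed. At the time this write-up was published, this simulated-login approach still worked; if Guokr changes its site, the code will need to change accordingly.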
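
When the check fails, it helps to know whether the cookies were rejected or something else went wrong. The sketch below classifies a response fetched with redirects disabled. It assumes, without confirmation from the site, that a members-only URL answers 200 for a valid login and redirects to a URL containing sign_in otherwise; the function name and return labels are hypothetical.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_login_response(status_code, location=''):
    """Classify a response to a members-only URL fetched without following redirects.

    Assumption: 200 means the cookies were accepted, and a redirect whose
    Location contains 'sign_in' means they were rejected.
    """
    if status_code == 200:
        return 'logged_in'
    if status_code in REDIRECT_CODES and 'sign_in' in location:
        return 'logged_out'
    # Server errors, rate limiting, or unexpected redirects land here.
    return 'unknown'


print(classify_login_response(200))
print(classify_login_response(302, 'https://account.guokr.com/sign_in/'))
print(classify_login_response(503))
```

An 'unknown' result suggests retrying or inspecting the response before concluding that the cookies have expired.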

The complete code:

# -*- coding:utf-8 -*-
import http.cookiejar as cookielib  # Python 3 name of the cookielib module
import re
import time

import requests
from bs4 import BeautifulSoup
from PIL import Image

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 UBrowser/6.1.3397.16 Safari/537.36',
}
session = requests.session()
session.cookies = cookielib.LWPCookieJar('cookies.txt')
try:  # try to load cookies saved by an earlier run
    session.cookies.load(ignore_discard=True)
except IOError:
    print('cookies failed to load!')
else:
    print('cookies loaded!')

def get_csrf_captcha_rand(url):  # find csrf_token and captcha_rand in the page
    response = session.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    csrf_token = soup.select('input#csrf_token')[0]
    captcha_rand = soup.select('input#captchaRand')[0]
    # Non-greedy match, so only the value attribute is captured
    match_cs = re.search(r'value="(.*?)"', str(csrf_token)).group(1)
    match_rand = re.search(r'value="(.*?)"', str(captcha_rand)).group(1)
    return match_cs, match_rand

def get_captcha(rand):  # save the image as captcha.png and ask the user to read it
    timestamp = str(int(time.time() * 1000))  # current time in milliseconds
    captcha_url = 'https://account.guokr.com/captcha/{}/?v={}'.format(rand, timestamp)
    response = session.get(captcha_url, headers=headers)
    with open('captcha.png', 'wb') as f:  # the with block closes the file
        f.write(response.content)
    try:
        captcha_image = Image.open('captcha.png')
        captcha_image.show()
        captcha_image.close()
    except IOError:
        print('captcha.png could not be opened!')
    code = input('please check the captcha code and enter it:')
    return code

def guokr_login(account, password):  # perform the actual login
    url = 'https://account.guokr.com/sign_in/'
    csrf_token, captcha_rand = get_csrf_captcha_rand(url)
    post_data = {
        'csrf_token': csrf_token,
        'username': account,
        'password': password,
        'captcha': get_captcha(captcha_rand),
        'captcha_rand': captcha_rand,
        'permanent': 'y'
    }
    session.post(url, data=post_data, headers=headers)
    session.cookies.save()  # save the cookies to cookies.txt

def is_login():  # check whether the session is logged in
    personal_url = 'http://www.guokr.com/user/feeds/'
    # Do not follow redirects: a redirect to the sign-in page could otherwise end in a 200
    response = session.get(personal_url, headers=headers, allow_redirects=False)
    return response.status_code == 200

if not is_login():  # log in only when the saved cookies are missing or no longer valid
    guokr_login('your_account', 'your_password')
print(is_login())
