Python crawler for Baidu Images: list thumbnails

1. Crawler preparation

Language: Python
Browser: Google Chrome
Tool: the requests module

First, enter the keyword you want to search for (for example 明星, "celebrity") on the Baidu Images search page. The result page looks like this:

image.png

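Press F12 to open the developer tools and inspect any image in the list. You can find the image's address there; copy the URL from its src attribute and keep it for later:
https://ss1.bdstatic.com/70cFvXSh_Q1YnxGkpoWK1HF6hhy/it/u=371978350,138525231&fm=26&gp=0.jpg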

image.png

Select Network > All and refresh the page. You will see the same request the browser sent for the page itself, with type document.

image.png

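This is the page returned to the browser. Click the request and press Ctrl+F to search for the image URL copied earlier. The URL turns up inside the JS code, which means the image addresses on this page are not part of the static HTML but are generated dynamically by JS. As a result, you cannot fetch the page with requests.get(url) and read the image addresses out of the page elements. You could still pull the addresses out of the JS code with a regular expression, but I do not recommend that approach.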
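For reference, the regular-expression approach mentioned above might look roughly like the sketch below. The sample text and the example.com addresses are hypothetical stand-ins for a fragment of page source; only the "thumbURL" key comes from the code later in this post.

```python
import re

# Hypothetical fragment of page source; a real page embeds far more data.
sample_js = (
    '{"thumbURL":"https://example.com/it/u=1,2&fm=26&gp=0.jpg","width":500},'
    '{"thumbURL":"https://example.com/it/u=3,4&fm=26&gp=0.jpg","width":640}'
)


def extract_thumb_urls(text):
    """Return every value that follows a "thumbURL" key in the text."""
    # Non-greedy match so each URL stops at its own closing quote.
    return re.findall(r'"thumbURL":"(.*?)"', text)


print(extract_thumb_urls(sample_js))
```

This depends on the exact spelling and quoting of the key in the page source, which is one reason it is fragile.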

image.png

So where do the image addresses come from? Switch the filter in the Network panel from All to XHR and you will see this request: https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E6%98%8E%E6%98%9F&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=&hd=&latest=&copyright=&word=%E6%98%8E%E6%98%9F&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&pn=30&rn=30&gsm=1e&1552975216767=

image.png

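This looks like the list data we want, and opening the response confirms it.

image.png

So we can get the data by requesting the URL above. Looking closely at its parameters: queryWord and word are the URL-encoded search keyword (leaving the keyword unencoded also works), rn is the number of images per page (30 by default), and pn is the index of the first image to return, normally a multiple of rn. All other parameters stay fixed, so changing the keyword, rn, and pn is enough to fetch the list images.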
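As a rough sketch of the parameter handling described above, the URL could be assembled like this. The fixed part is copied from the request shown earlier; the names FIXED_QUERY and build_list_url are my own for illustration, and I have not checked whether the server cares about parameter order.

```python
from urllib.parse import urlencode

BASE_URL = "https://image.baidu.com/search/acjson"

# Parameters that stay the same for every request, copied from the captured URL.
FIXED_QUERY = (
    "tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&cl=2&lm=-1"
    "&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=&hd=&latest=&copyright="
    "&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr="
    "&expermode=&force=&gsm=1e&1552975216767="
)


def build_list_url(word, page, rn=30):
    """Build the list-data URL for a zero-based page index."""
    # urlencode percent-encodes the keyword as UTF-8; pn is the offset of
    # the first image on the requested page.
    variable = urlencode({"queryWord": word, "word": word, "pn": page * rn, "rn": rn})
    return BASE_URL + "?" + FIXED_QUERY + "&" + variable


print(build_list_url("明星", 1))
```

With page=1 and the default rn, this yields pn=30, the same offset as in the captured request.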

Notes

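Some image addresses do not load when pasted straight into the browser, which means a plain requests.get(image_url) cannot fetch them either. It turned out that a normal browser request carries a Referer header pointing to the search page URL.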
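A minimal sketch of sending the Referer header follows. To keep it free of third-party dependencies it uses the standard library's urllib.request rather than the requests module used in the rest of this post; the function names are hypothetical, and the two URLs are the sample image address and the flip search URL that appear elsewhere in this post.

```python
import urllib.request


def build_image_request(image_url, search_url):
    """Create a request for image_url that names search_url as the Referer."""
    return urllib.request.Request(image_url, headers={"Referer": search_url})


def download_image(image_url, search_url, path):
    """Fetch the image with the Referer header set and write its bytes to path."""
    request = build_image_request(image_url, search_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)


# Building the request does not touch the network; only download_image does.
req = build_image_request(
    "https://ss1.bdstatic.com/70cFvXSh_Q1YnxGkpoWK1HF6hhy/it/u=371978350,138525231&fm=26&gp=0.jpg",
    "http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=%E6%98%8E%E6%98%9F&pn=0",
)
print(req.get_header("Referer"))
```

With requests, the equivalent is passing headers={"Referer": search_url} to requests.get.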

image.png

The Python implementation is as follows:

import os
import re
import urllib.parse

import requests

page_num = 30    # images per page (the rn parameter)
max_pages = 50   # number of pages to request per keyword
photo_dir = "D:\\data\\pic\\face\\photo"


def save_images(pic_urls, page_url, word_dir):
    # Download each thumbnail, sending the search page URL as the Referer.
    headers = {"Referer": page_url}
    for pic_url in pic_urls:
        print(pic_url)
        # Use the last path segment of the URL as the file name.
        name = pic_url.split('/')[-1]
        response = requests.get(pic_url, headers=headers)
        with open(os.path.join(word_dir, name), 'wb') as f:
            f.write(response.content)


def getThumbImage(word, word_dir):
    # Variant 1: pull thumbURL values out of the paginated "flip" HTML page.
    url = "http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word={0}&pn={1}"
    for num in range(max_pages):
        page_url = url.format(urllib.parse.quote(word), num * page_num)
        print(page_url)
        response = requests.get(page_url)
        pic_urls = re.findall('"thumbURL":"(.*?)",', response.text, re.S)
        save_images(pic_urls, page_url, word_dir)


def getThumb2Image(word, word_dir):
    # Variant 2: pull thumbURL values out of the XHR list-data response.
    url = "https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord={0}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=&hd=&latest=&copyright=&word={0}&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&pn={1}&rn={2}&gsm=1e&1552975216767="
    for num in range(max_pages):
        # pn is the offset of the first image on this page.
        page_url = url.format(urllib.parse.quote(word), num * page_num, page_num)
        print(page_url)
        response = requests.get(page_url)
        pic_urls = re.findall('"thumbURL":"(.*?)",', response.text, re.S)
        save_images(pic_urls, page_url, word_dir)


if __name__ == "__main__":
    word = input("Enter a search keyword (a person's name, a place name, etc.): ")
    word_dir = os.path.join(photo_dir, word)
    # makedirs also creates any missing parent directories.
    os.makedirs(word_dir, exist_ok=True)
    getThumb2Image(word, word_dir)
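
The code above extracts the addresses with a regular expression. If the XHR response parses as JSON, the json module could be used instead. The sketch below assumes the entries sit in a top-level "data" list and that some entries may be empty; that layout, the sample body, and the function name are assumptions for illustration, not something confirmed above.

```python
import json

# Hypothetical response body; the "data" list layout is an assumption.
sample_body = (
    '{"data": [{"thumbURL": "https://example.com/a.jpg"},'
    ' {"thumbURL": "https://example.com/b.jpg"}, {}]}'
)


def thumb_urls_from_json(body):
    """Collect thumbURL values from the entries of the top-level "data" list."""
    payload = json.loads(body)
    return [
        item["thumbURL"]
        for item in payload.get("data", [])
        # Skip empty or malformed entries that carry no thumbURL.
        if isinstance(item, dict) and item.get("thumbURL")
    ]


print(thumb_urls_from_json(sample_body))
```

json.loads raises ValueError when the body is not valid JSON, in which case the regular expression remains a workable fallback.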
