1. Open the Baidu Images site and enter a search term
Open https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E7%BE%8E%E5%A5%B3&oq=%E7%BE%8E%E5%A5%B3&rsp=-1
2. Analyze the page
Press F12 to open the developer tools.


3. Use Python to simulate a browser, send a request to the server, and get the response
Module used: requests. Import it before use (import requests).
For installation, see http://www.itdecent.cn/p/d4262c8d8af8
Open the URL, press F12 to enter developer mode, then go to Network -- All -- top250?start=0&filter= -- Headers to get the request URL and the request method.
The response contains HTML + CSS + JS + data, which the browser parses and executes.
Code:
# Import the module
import requests
# Send the request (url and headers are the values copied from the Headers panel)
resp = requests.get(url, headers=headers)
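The snippet above leaves url and headers undefined. The following is a minimal sketch of one way to fill them in, not the author's exact setup: the User-Agent string is a placeholder to be replaced with the one shown in your own browser's Headers panel, build_request and fetch are hypothetical helper names, and it is an assumption that the three query parameters kept here (tn, ie, word, all taken from the URL in step 1) are enough for the search page.

```python
import requests

# Hypothetical header set: a desktop User-Agent like the one shown in the
# browser's Headers panel; replace it with the value your browser sends.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
}

SEARCH_URL = "https://image.baidu.com/search/index"


def build_request(word):
    """Prepare (but do not send) a GET request for the given search term."""
    # Assumption: this subset of the original query parameters is sufficient.
    params = {"tn": "baiduimage", "ie": "utf-8", "word": word}
    request = requests.Request("GET", SEARCH_URL, params=params, headers=HEADERS)
    return request.prepare()


def fetch(word, timeout=10):
    """Send the prepared request and return the response object."""
    with requests.Session() as session:
        return session.send(build_request(word), timeout=timeout)


# Building the request does not touch the network, so the final URL can be
# inspected before anything is sent.
prepared = build_request("美女")
print(prepared.url)
```

Calling fetch("美女") would then send the request; passing the search term through params lets requests do the percent-encoding instead of pasting an already encoded URL.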
4. Use an online JSON parser to inspect the JSON data in the response
Open the correct URL, press Ctrl+A to select all and Ctrl+C to copy, then paste the text into an online JSON parser. Each object is one image.
5. Parse the data: convert the response to JSON
url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12117865351080430388&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E7%BE%8E%E5%A5%B3&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=%E7%BE%8E%E5%A5%B3&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn=30&rn=30&gsm=1e&1612964334559='
resp = requests.get(url, headers=headers)
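The parsed result shows that the data we want sits inside a dictionary. The objects are in a list, the list holds many dictionaries, and each image link is inside one of those dictionaries.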
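The structure just described can be walked as in the sketch below. The sample text is a hypothetical, heavily trimmed stand-in for the real response (the example.com links are placeholders), the key name "data" is assumed from the description above, and extract_thumb_urls is an illustrative helper name rather than part of requests.

```python
import json

# Hypothetical, trimmed sample of the response shape: a dictionary whose
# "data" key holds a list of dictionaries, ending with an empty dictionary.
sample_text = """
{
  "data": [
    {"thumbURL": "https://example.com/a.jpg"},
    {"thumbURL": "https://example.com/b.jpg"},
    {}
  ]
}
"""


def extract_thumb_urls(payload):
    """Collect the thumbURL value from every object in payload['data']."""
    urls = []
    for item in payload.get("data", []):
        # The trailing object is empty, so skip anything without a thumbURL.
        if item.get("thumbURL"):
            urls.append(item["thumbURL"])
    return urls


payload = json.loads(sample_text)
lst = extract_thumb_urls(payload)
print(lst)
```

With a live response, resp.json() would take the place of json.loads(sample_text), assuming the server returns valid JSON.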
url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12117865351080430388&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E7%BE%8E%E5%A5%B3&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=%E7%BE%8E%E5%A5%B3&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn=30&rn=30&gsm=1e&1612964334559='
resp = requests.get(url, headers=headers)
# Continue extracting: iterate over the list and read the thumbURL value by key
lst = []
for item in resp.json()['data']:
    # The last object has no data, so add a check here
    if item:
        lst.append(item['thumbURL'])
7. Request each image's URL, fetch the data, and save it
url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12117865351080430388&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E7%BE%8E%E5%A5%B3&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=%E7%BE%8E%E5%A5%B3&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn=30&rn=30&gsm=1e&1612964334559='
resp = requests.get(url, headers=headers)
lst = []
for item in resp.json()['data']:
    if item:
        lst.append(item['thumbURL'])
for count, item in enumerate(lst):
    resp = requests.get(item, headers=headers)
    # Create the img folder beforehand; wb: write binary data
    with open('img/'+str(count)+'.jpg', 'wb') as file:
        file.write(resp.content)
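The storage step can be isolated into a small helper, sketched below under a few assumptions: save_image is a hypothetical name, the folder is created automatically instead of by hand, and the demo writes placeholder bytes to a temporary folder so that nothing is downloaded.

```python
import os
import tempfile


def save_image(data, count, folder="img"):
    """Write raw image bytes to <folder>/<count>.jpg and return the path."""
    # Create the folder if it does not exist yet.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, str(count) + ".jpg")
    # 'wb' opens the file for writing binary data.
    with open(path, "wb") as file:
        file.write(data)
    return path


# Demo with placeholder bytes in a temporary folder.
demo_folder = os.path.join(tempfile.mkdtemp(), "img")
saved = save_image(b"fake image bytes", 0, folder=demo_folder)
print(saved)
```

Inside the download loop, the call would be save_image(resp.content, count), where resp is the response for a single image URL.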
response.status_code: the HTTP status code, used to check whether the request succeeded
response.content: the response body as binary data (bytes)
response.text: the response body as string data
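As a rough illustration of how these attributes combine when downloading images, the sketch below checks the status code before using the binary body. image_bytes is a hypothetical helper, and the two SimpleNamespace objects are stand-ins that carry only the attributes named above, so the example runs without any network access. A real requests response object could be passed in the same way.

```python
from types import SimpleNamespace


def image_bytes(response):
    """Return the body as bytes if the request succeeded, otherwise None."""
    # A status code of 200 means the server handled the request successfully.
    if response.status_code != 200:
        return None
    # content is the raw body as bytes, which is what an image file needs;
    # text is the same body decoded into a str, suited to HTML or JSON.
    return response.content


# Stand-in objects for a successful and a failed response.
ok = SimpleNamespace(status_code=200, content=b"\xff\xd8", text="")
missing = SimpleNamespace(status_code=404, content=b"", text="Not Found")

print(image_bytes(ok))
print(image_bytes(missing))
```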


