I. Preparation
1. Install BeautifulSoup4
The quickest way is to install it directly with pip:
pip install beautifulsoup4
2. BeautifulSoup4 basics
Link to the basic usage documentation:
https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/
3. Notes on commonly used methods
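As a quick illustration of the calls used most often, here is a minimal sketch run against a small made-up HTML snippet. The snippet and its values are assumptions for demonstration only and are not taken from the practice site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only to demonstrate the calls below.
html = """
<div id="main">
  <p class="intro">Hello <b>world</b></p>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")                 # first matching tag, or None
all_links = soup.find_all("a")              # list of every matching tag
intro = soup.find("p", class_="intro")      # filter by CSS class
hrefs = [a["href"] for a in all_links]      # read an attribute value
link_texts = [a.string for a in all_links]  # .string: text of a tag with one text child
intro_text = intro.get_text()               # get_text(): all nested text joined
css_hits = soup.select("div#main > a")      # lookup by CSS selector

print(hrefs)
print(link_texts)
print(intro_text)
```

Note that `.string` is `None` when a tag has more than one child (as `intro` does here), which is why `get_text()` is often the safer choice for mixed content.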

II. Hands-on project practice
1. Practice site: http://www.chineseidcard.com/

2. Analyze the data returned by the request endpoint
http://www.chineseidcard.com/?region=110101&birthday=19900307&sex=1&num=5&r=30
The data we want is the individual ID card numbers in the response.
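To see which parameters the request carries, the URL above can be taken apart with the standard library. This is only a sketch; judging from the names and from the code later in this post, `region`, `birthday`, `sex` and `num` look like the region code, birth date, sex and the number of IDs to generate, while the meaning of `r` is not stated anywhere in the source.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.chineseidcard.com/?region=110101&birthday=19900307&sex=1&num=5&r=30"

# Split the URL and decode its query string; parse_qs returns a list per key,
# so keep the first (and here only) value of each.
query = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in query.items():
    print(f"{key} = {value}")
```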

Inspecting the response shows that this key information is stored under the following table tag:
<table class="table" style="margin-bottom:0;">
<tbody>
<tr>
<th style="text-align:right;width:20%;vertical-align: middle;"></th>
<td style="vertical-align: middle;">110101199003072631</td>
</tr>
<tr>
<th style="text-align:right;width:20%;vertical-align: middle;"></th>
<td style="vertical-align: middle;">110101199003070492</td>
</tr>
<tr>
<th style="text-align:right;width:20%;vertical-align: middle;"></th>
<td style="vertical-align: middle;">110101199003075314</td>
</tr>
<tr>
<th style="text-align:right;width:20%;vertical-align: middle;"></th>
<td style="vertical-align: middle;">110101199003078398</td>
</tr>
<tr>
<th style="text-align:right;width:20%;vertical-align: middle;"></th>
<td style="vertical-align: middle;">110101199003071532</td>
</tr>
</tbody>
</table>
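Before touching the network, the extraction can be tried offline on the markup itself. The sketch below parses a shortened copy of the table above (two rows instead of five) and collects the text of each `td`; it assumes the live page keeps this same structure.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table shown above.
html = """
<table class="table" style="margin-bottom:0;">
<tbody>
<tr><th></th><td style="vertical-align: middle;">110101199003072631</td></tr>
<tr><th></th><td style="vertical-align: middle;">110101199003070492</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="table")

# Each <td> holds exactly one ID number as its text.
ids = [td.get_text(strip=True) for td in table.find_all("td")]
print(ids)
```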
3. First simulate the request and fetch the data the page returns
# coding: utf-8
from bs4 import BeautifulSoup
import requests
import json


def gethtml(IDnum):
    url = "http://www.chineseidcard.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    params = {
        "region": "110101",
        "birthday": "19900307",
        "sex": "1",
        "num": IDnum,  # how many ID numbers to request
        "r": 30,
    }
    res = requests.get(url, headers=headers, params=params)
    # Decode the JSON response body; the decoded value is the HTML parsed in the next step.
    # (json.loads() takes no encoding argument on Python 3.9+.)
    data = json.loads(res.text)
    return data
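To confirm that the `params` dict reproduces the URL analysed in step 2, the request can be built without being sent. This sketch uses `requests.Request(...).prepare()`, which only assembles the request locally; no network access is involved.

```python
import requests

url = "http://www.chineseidcard.com/"
params = {"region": "110101", "birthday": "19900307", "sex": "1", "num": 5, "r": 30}

# Build (but do not send) the request, then inspect the final URL.
prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.url)
```

If the printed URL matches the one copied from the browser, the parameters are being passed the same way the page passes them.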
4. Use BeautifulSoup4 to locate the tags
# coding: utf-8
from bs4 import BeautifulSoup
import requests
import json


def gethtml(IDnum):
    url = "http://www.chineseidcard.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    params = {
        "region": "110101",
        "birthday": "19900307",
        "sex": "1",
        "num": IDnum,
        "r": 30,
    }
    res = requests.get(url, headers=headers, params=params)
    data = json.loads(res.text)
    soup = BeautifulSoup(data, "html.parser")
    # Get the second table tag with class "table" (index 1)
    table = soup.find_all('table', class_='table')[1]
    # Get a single ID number: the text of the first td in that table
    cardID = table.find_all('td')[0].string
    return cardID
5. Iterate over the results and output every ID card number
table = soup.find_all('table',class_='table')[1]
Index [1] is used because, among all the tables in the returned HTML, the ID numbers are stored in the second one.
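The indexing can be checked offline with a made-up page that contains two tables of the same class. The markup and the `extract_ids` helper below are hypothetical; the helper also returns an empty list instead of raising `IndexError` when the page has fewer tables than expected.

```python
from bs4 import BeautifulSoup

# Made-up page: two tables share class "table"; only the second holds ID numbers.
html = """
<table class="table"><tr><td>some other data</td></tr></table>
<table class="table">
<tr><td>110101199003072631</td></tr>
<tr><td>110101199003070492</td></tr>
</table>
"""


def extract_ids(html, table_index=1):
    """Return the text of every <td> in the table at table_index (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", class_="table")
    if len(tables) <= table_index:
        return []  # page layout differs from what was assumed
    return [td.get_text(strip=True) for td in tables[table_index].find_all("td")]


print(extract_ids(html))
print(extract_ids(html, table_index=0))
```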

# coding: utf-8
from bs4 import BeautifulSoup
import requests
import json


def gethtml(IDnum):
    url = "http://www.chineseidcard.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    params = {
        "region": "110101",
        "birthday": "19900407",
        "sex": "1",
        "num": IDnum,
        "r": 30,
    }
    res = requests.get(url, headers=headers, params=params)
    data = json.loads(res.text)
    soup = BeautifulSoup(data, "html.parser")
    # Get the second table tag with class "table" (index 1)
    table = soup.find_all('table', class_='table')[1]
    # Iterate over every td node in that table
    for td_label in table.find_all('td'):
        # Get the text inside the td tag
        cardID = td_label.string
        print(cardID)


if __name__ == "__main__":
    gethtml(5)
The output looks like this:
110101199004070873
110101199004077979
110101199004076853
110101199004079552
110101199004076634
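As a sanity check on the scraped values, each number can be split into its fields and compared with the request parameters. This sketch assumes the standard 18-digit layout (6-digit region code, 8-digit birth date, 3-digit sequence number, 1 check character) and the usual weighted mod-11 check character; `split_id` and `check_char` are hypothetical helper names.

```python
# The five numbers printed above.
ids = [
    "110101199004070873",
    "110101199004077979",
    "110101199004076853",
    "110101199004079552",
    "110101199004076634",
]

WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)


def split_id(card_id):
    """Split an 18-character ID into region, birthday, sequence and check character."""
    return {
        "region": card_id[:6],
        "birthday": card_id[6:14],
        "sequence": card_id[14:17],
        "check": card_id[17],
    }


def check_char(card_id):
    """Compute the expected check character from the first 17 digits."""
    total = sum(int(digit) * weight for digit, weight in zip(card_id[:17], WEIGHTS))
    return CHECK_CHARS[total % 11]


for card_id in ids:
    fields = split_id(card_id)
    matches_request = fields["region"] == "110101" and fields["birthday"] == "19900407"
    checksum_ok = check_char(card_id) == fields["check"]
    print(card_id, fields["sequence"], matches_request, checksum_ok)
```

The region and birthday fields should match the `region` and `birthday` values sent in `params`, which gives a cheap way to notice if the wrong table was scraped.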