Python Study Notes: Extracting Data with BeautifulSoup4 + Extracting Random ID Card Numbers

I. Preparation

1. Install BeautifulSoup4

The quickest way is to install it directly with pip:

pip install beautifulsoup4

2. BeautifulSoup4 basics

Link to the basic usage documentation:
https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

3. Notes on commonly used methods
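
As a minimal sketch of the methods the rest of these notes rely on, the snippet below runs `find`, `find_all`, `.string`, `get_text` and attribute access against a small HTML string. The HTML and all variable names are made up for illustration and are not taken from the original notes.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration.
html = """
<div id="box">
  <p class="title">Hello</p>
  <a href="/a">first</a>
  <a href="/b">second</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None if nothing matches).
first_p = soup.find("p", class_="title")
# .string returns the tag's text when the tag has exactly one text child.
title_text = first_p.string

# find_all() returns a list of every matching tag.
links = soup.find_all("a")
hrefs = [a["href"] for a in links]           # attribute access by key
link_texts = [a.get_text() for a in links]   # text content of each tag

box_id = soup.find("div")["id"]

print(title_text, hrefs, link_texts, box_id)
```

Note that `class_` carries a trailing underscore because `class` is a reserved word in Python.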


II. Hands-on practice project

1. Practice site: http://www.chineseidcard.com/


2. Request the endpoint and analyze the returned data

http://www.chineseidcard.com/?region=110101&birthday=19900307&sex=1&num=5&r=30
The data we want is the actual ID card numbers.
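
To make the request easier to reproduce, the URL above can be split into a base address and its query parameters. The sketch below does this with the standard library only. Reading `region`, `birthday` and `sex` as region code, birth date and sex, and `num` as the number of results, is an interpretation based on the parameter names and the results shown in these notes; the meaning of `r` is not stated in the source.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.chineseidcard.com/?region=110101&birthday=19900307&sex=1&num=5&r=30"

parts = urlsplit(url)

# parse_qs maps each key to a list of values; each key appears once here,
# so keep only the first value.
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Rebuild the address without the query string.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

print(base_url)
print(query)
```

The resulting dictionary has the same keys that are later passed to `requests.get` through its `params` argument.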


Analysis shows that this key information is stored under the following table tag:

<table class="table" style="margin-bottom:0;">
    <tbody>
        <tr>
            <th style="text-align:right;width:20%;vertical-align: middle;"></th>
            <td style="vertical-align: middle;">110101199003072631</td>
        </tr>
        <tr>
            <th style="text-align:right;width:20%;vertical-align: middle;"></th>
            <td style="vertical-align: middle;">110101199003070492</td>
        </tr>
        <tr>
            <th style="text-align:right;width:20%;vertical-align: middle;"></th>
            <td style="vertical-align: middle;">110101199003075314</td>
        </tr>
        <tr>
            <th style="text-align:right;width:20%;vertical-align: middle;"></th>
            <td style="vertical-align: middle;">110101199003078398</td>
        </tr>
        <tr>
            <th style="text-align:right;width:20%;vertical-align: middle;"></th>
            <td style="vertical-align: middle;">110101199003071532</td>
        </tr>
    </tbody>
</table>
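
Before sending any request, the extraction can be tried offline against a copy of this markup. The following sketch, which assumes a shortened copy of the table (two of the five rows) pasted into a string, collects the text of every `td` into a list.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table above (two of the five rows).
html = """
<table class="table" style="margin-bottom:0;">
  <tbody>
    <tr><th></th><td style="vertical-align: middle;">110101199003072631</td></tr>
    <tr><th></th><td style="vertical-align: middle;">110101199003070492</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="table")

# get_text(strip=True) returns the cell text with surrounding whitespace removed.
ids = [td.get_text(strip=True) for td in table.find_all("td")]
print(ids)
```

The empty `th` cells are skipped automatically because only `td` tags are searched.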

3. First simulate the request and fetch the data the page returns

# coding: utf-8
from bs4 import BeautifulSoup
import requests
import json

def gethtml(IDnum):
    """Request IDnum ID numbers and return the decoded response body."""
    url = "http://www.chineseidcard.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    params = {
        "region": "110101",
        "birthday": "19900307",
        "sex": "1",
        "num": IDnum,
        "r": 30
    }
    res = requests.get(url, headers=headers, params=params)
    # json.loads() no longer accepts an encoding argument (removed in Python 3.9)
    data = json.loads(res.text)
    return data
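
The `json.loads` call above suggests that, when the `X-Requested-With` header is sent, the endpoint wraps its HTML in a JSON string. That is an inference from the code, not something verified here. Under that assumption, the sketch below uses a made-up payload to show that decoding it yields an ordinary `str`, which is why the result can be passed straight to `BeautifulSoup`.

```python
import json

# Hypothetical response body: an HTML fragment encoded as a JSON string.
# This shape is an assumption inferred from the json.loads call in the notes.
response_text = json.dumps(
    '<table class="table"><tr><td>110101199003072631</td></tr></table>'
)

# No encoding argument: json.loads() stopped accepting it in Python 3.9.
data = json.loads(response_text)

print(type(data).__name__)
print(data)
```

If the real response turned out to be a JSON object rather than a string, the HTML would have to be picked out of it by key first.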

4. Use BeautifulSoup4 to find the tags

# coding: utf-8
from bs4 import BeautifulSoup
import requests
import json

def gethtml(IDnum):
    """Request IDnum ID numbers and return the first one found in the page."""
    url = "http://www.chineseidcard.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    params = {
        "region": "110101",
        "birthday": "19900307",
        "sex": "1",
        "num": IDnum,
        "r": 30
    }
    res = requests.get(url, headers=headers, params=params)
    # json.loads() no longer accepts an encoding argument (removed in Python 3.9)
    data = json.loads(res.text)
    soup = BeautifulSoup(data, "html.parser")

    # Get the data under the second table tag
    table = soup.find_all('table', class_='table')[1]
    # Get a single ID card number (search within table, not the builtin id)
    cardID = table.find_all('td')[0].string
    return cardID

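5. Iterate over the results and return all the ID card numbers
table = soup.find_all('table',class_='table')[1]
Index [1] is used because, in the full response, the ID card information is stored in the second table.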
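
Indexing with `[1]` raises an `IndexError` if the page ever returns fewer than two matching tables. A small sketch of a guarded version follows; the two-table HTML is a made-up stand-in for the real response, and the error message is only one possible way to handle the case.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two tables share class "table", only the second holds IDs.
html = """
<table class="table"><tr><td>form controls</td></tr></table>
<table class="table"><tr><td>110101199003072631</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table", class_="table")

# Check the count before indexing so a layout change fails with a clear message.
if len(tables) < 2:
    raise ValueError(f"expected at least 2 tables, found {len(tables)}")

id_table = tables[1]
first_id = id_table.find("td").get_text(strip=True)
print(first_id)
```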


# coding: utf-8
from bs4 import BeautifulSoup
import requests
import json

def gethtml(IDnum):
    """Request IDnum ID numbers and print each one."""
    url = "http://www.chineseidcard.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    params = {
        "region": "110101",
        "birthday": "19900407",
        "sex": "1",
        "num": IDnum,
        "r": 30
    }
    res = requests.get(url, headers=headers, params=params)
    # json.loads() no longer accepts an encoding argument (removed in Python 3.9)
    data = json.loads(res.text)
    soup = BeautifulSoup(data, "html.parser")

    # Get the data under the second table tag
    table = soup.find_all('table', class_='table')[1]
    # Get a single ID card number
    # cardID = table.find_all('td')[0].string

    # Iterate over every td node
    for td_label in table.find_all('td'):
        # Get the text inside the td tag
        cardID = td_label.string
        print(cardID)

if __name__ == "__main__":
    gethtml(5)

The output is as follows:

110101199004070873
110101199004077979
110101199004076853
110101199004079552
110101199004076634
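
As a quick sanity check on the scraped values, each number can be compared against the request parameters. The sketch below only checks what is visible in the output above: every ID has 18 characters and begins with the `region` value followed by the `birthday` value. The helper name `matches_request` is hypothetical.

```python
region = "110101"
birthday = "19900407"

# Values copied from the output listed above.
card_ids = [
    "110101199004070873",
    "110101199004077979",
    "110101199004076853",
    "110101199004079552",
    "110101199004076634",
]

def matches_request(card_id, region, birthday):
    """Return True if the ID has 18 characters and starts with region + birthday."""
    return len(card_id) == 18 and card_id.startswith(region + birthday)

all_ok = all(matches_request(cid, region, birthday) for cid in card_ids)
print(all_ok)
```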