
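1 Environment

Windows 7 x64

Python 3.7

2 Workflow

i) Import the required libraries

ii) Fetch the page's HTML source

iii) Use the find functions to extract text from specific tags, selected by different arguments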

3 Code

3.1 Importing the libraries (urllib.request and BS4)

Input

from urllib.request import urlopen  # urlopen sends the request and opens the page
from bs4 import BeautifulSoup  # BeautifulSoup parses the page

Output

The scraping libraries are imported.

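3.2 Fetching the page source

Input

html=urlopen("http://www.pythonscraping.com/pages/warandpeace.html") # fetch the page's HTML structure and content
bs0bj=BeautifulSoup(html, "html.parser") # parse the HTML into a searchable BeautifulSoup object

Output

Fetches the structure and content of the page www.pythonscraping.com/pages/warandpeace.html
BeautifulSoup parses the page source into a tree of objects that can be searched

Notes

BeautifulSoup(html, "html.parser") parses the whole document; it does not pick out name attributes. Naming the parser explicitly avoids a warning and keeps results consistent across machines.

For reference, the HTML name attribute identifies form data submitted to the server, or lets client-side JavaScript refer to form data.

Only form elements that have a name attribute pass their values when the form is submitted.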
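
The fetch step in 3.2 can fail if the server returns an error or cannot be reached. The following is a minimal sketch of one way to guard against that. The helper name fetch_html and the 10-second timeout are assumptions made for illustration, not part of the original code, and a data: URL stands in for the real page so the example runs offline.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url):
    """Return the raw bytes of a page, or None if the request fails.

    fetch_html is a hypothetical helper name used only in this sketch.
    """
    try:
        with urlopen(url, timeout=10) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # The request could not be completed (bad host, unknown scheme, no network).
        print(f"Could not open {url}: {err.reason}")
    return None


# A data: URL keeps the demonstration offline; a real page URL is passed the same way.
page = fetch_html("data:text/html,%3Ch1%3EWar%20and%20Peace%3C%2Fh1%3E")
print(page)
```

The returned bytes can be passed to BeautifulSoup(page, "html.parser") in the same way as the html object above. HTTPError is caught first because it is a subclass of URLError.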

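3.3 Extracting text from specific tags using different arguments

BeautifulSoup's find() and findAll() functions locate a single tag or a group of tags by tag name, attributes, or text content. findAll() is the older spelling of find_all() and behaves the same way.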
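
To make the difference between the two functions concrete, here is a small sketch on an inline document. The sample markup is made up for illustration and is not the War and Peace page; find_all is used as the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example runs without a network call.
sample = """
<html><body>
<h1>War and Peace</h1>
<p>First paragraph</p>
<p>Second paragraph</p>
</body></html>
"""
soup = BeautifulSoup(sample, "html.parser")

first_p = soup.find("p")             # the first matching tag
all_p = soup.find_all("p")           # every matching tag, in document order
missing = soup.find("table")         # None when nothing matches
none_found = soup.find_all("table")  # an empty list when nothing matches

print(first_p.get_text())
print(len(all_p))
print(missing, len(none_found))
```

Because find() returns None when there is no match, it is worth checking the result before calling get_text() on it.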

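3.3.1 The text argument

Input

nameList=bs0bj.findAll(text="the prince") # find the text nodes whose content is exactly "the prince"
print(len(nameList)) # count how many such text nodes were found

Output

Finds the text nodes in the page whose content is exactly "the prince"
Prints the number of matches

Notes

text matches against the text content of tags rather than their attributes; newer versions of BeautifulSoup call this argument string
len() returns the length of a string or the number of items in a container such as a list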
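
A plain string passed to this argument must equal the whole text node, which is easy to overlook. The sketch below contrasts an exact match with a regular-expression match. The sample sentence is invented for illustration, and the string keyword assumes a BeautifulSoup version that accepts it in place of text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup: two spans hold exactly "the prince",
# and the text between them merely contains the phrase.
sample = (
    '<p><span class="green">the prince</span> said that the prince was late. '
    '<span class="green">the prince</span></p>'
)
soup = BeautifulSoup(sample, "html.parser")

# A plain string matches only text nodes equal to it.
exact = soup.find_all(string="the prince")

# A compiled pattern matches any text node the pattern is found in.
partial = soup.find_all(string=re.compile("the prince"))

print(len(exact), len(partial))
```

So the count printed in 3.3.1 is the number of text nodes that consist solely of "the prince", not the number of times the phrase appears in the page.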

3.3.2 Keyword arguments

Input

allText = bs0bj.findAll(id="text") # a keyword argument selects tags that have the given attribute value
print(allText[0].get_text())

Output

Prints the text inside the first tag whose id is "text"

Notes

Keyword arguments select tags that have a specified attribute value
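
As a short illustration of keyword arguments, the sketch below searches by id and by class. The markup is a made-up stand-in for the real page. Note that class is a reserved word in Python, so BeautifulSoup spells that keyword class_.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the real page.
sample = """
<div id="text">
  <span class="green">Anna Pavlovna</span> greeted
  <span class="red">Well, Prince</span>
</div>
<div id="footer">End</div>
"""
soup = BeautifulSoup(sample, "html.parser")

by_id = soup.find_all(id="text")       # tags whose id attribute is "text"
green = soup.find_all(class_="green")  # class_ because class is reserved in Python

print(by_id[0].name)
print([tag.get_text() for tag in green])
```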

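3.3.3 The tag argument

Input

tagList=bs0bj.findAll(["h1","h2"]) # return a list of all h1 and h2 heading tags in the HTML document
print(tagList[0].get_text())

Output

Builds a list of the h1 and h2 heading tags in the HTML document and prints the text of the first one

Notes

The tag argument accepts a single tag name or a Python list of tag names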
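
The sketch below shows how a list of tag names behaves on a small invented document: matches for all listed names come back in a single list, in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with headings at two levels.
sample = "<h1>War and Peace</h1><h2>Chapter 1</h2><p>Text</p><h2>Chapter 2</h2>"
soup = BeautifulSoup(sample, "html.parser")

headings = soup.find_all(["h1", "h2"])  # a list of names matches any of them
only_h2 = soup.find_all("h2")           # a single name matches just that tag

names = [tag.name for tag in headings]
print(names)
print(len(only_h2))
```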

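3.3.4 The attributes argument

Input

nameList=bs0bj.findAll("span",{"class":"green"}) # find every span tag whose class is "green"
for name in nameList: # loop over every match in the list
    print(name.get_text()) # strip the tags and print each character name

Output

bs0bj.findAll(tagName, tagAttributes) extracts only the text inside <span class="green"></span> tags, giving the list of character names in War and Peace

Notes

bs0bj.tagName returns only the first matching tag on the page, whereas bs0bj.findAll(tagName, tagAttributes) returns all of them
name.get_text() removes all tags, hyperlinks, and paragraph markup from the HTML and returns plain text, so it is usually called only at the end, when printing, storing, or processing the data. Until then, keep the tag structure so that the BeautifulSoup object can still be searched.
The for loop visits every name in the list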
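
The points in these notes can be checked on a small invented fragment. The markup and names below are assumptions for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two green spans and one red span.
sample = """
<p><span class="green">Anna Pavlovna</span> spoke to
<span class="green">Prince Vasili</span> about
<span class="red">the war</span>.</p>
"""
soup = BeautifulSoup(sample, "html.parser")

first_span = soup.span  # attribute access returns only the first span
green_spans = soup.find_all("span", {"class": "green"})  # every green span

# The tags still carry their attributes; get_text() keeps only the text.
names = [span.get_text() for span in green_spans]

print(first_span.get_text())
print(green_spans[0]["class"])
print(names)
```

Keeping green_spans as tag objects until the last step means their attributes remain available for further filtering.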

4 Full code

The complete code is as follows:


###############################################################################
# Crawler
# Author: Lenox
# Date: 2019.05.15
# License: BSD 3.0
###############################################################################

# Import the libraries
from urllib.request import urlopen  # urlopen sends the request and opens the page
from bs4 import BeautifulSoup  # BeautifulSoup parses the page

# Fetch the page source
html=urlopen("http://www.pythonscraping.com/pages/warandpeace.html") # fetch the page's HTML structure and content
bs0bj=BeautifulSoup(html, "html.parser") # parse the HTML into a searchable BeautifulSoup object

# Extract text from specific tags using different arguments
# The text argument
nameList=bs0bj.findAll(text="the prince") # find the text nodes whose content is exactly "the prince"
print(len(nameList)) # count how many such text nodes were found

# Keyword arguments
allText = bs0bj.findAll(id="text") # a keyword argument selects tags that have the given attribute value
print(allText[0].get_text())

# The tag argument
tagList=bs0bj.findAll(["h1","h2"]) # return a list of all h1 and h2 heading tags in the HTML document
print(tagList[0].get_text())

# The attributes argument
nameList=bs0bj.findAll("span",{"class":"green"}) # find every span tag whose class is "green"
for name in nameList: # loop over every match in the list
    print(name.get_text()) # strip the tags and print each character name


5 References

[1] Ryan Mitchell, Web Scraping with Python (Chinese edition: 《Python网络数据采集》), translated by Tao Junjie and Chen Xiaoli

Programming | Simple Web Page Tag Scraping with Python
