Beautiful Soup library

Installation

Press Win+X and choose Command Prompt (open the console with administrator privileges)
Enter the install command

pip install beautifulsoup4

Quick test of the Beautiful Soup installation

Demo HTML page URL: http://python123.io/ws/demo.html

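import requests
from bs4 import BeautifulSoup

# Fetch the demo page, then parse and pretty-print it
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.prettify())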

Basic elements of the BeautifulSoup library

Importing the Beautiful Soup library
The Beautiful Soup library is also known as beautifulsoup4 or bs4

from bs4 import BeautifulSoup

from bs4 import BeautifulSoup
soup = BeautifulSoup("<html>data</html>","html.parser")
soup2 = BeautifulSoup(open("D://demo.html"), "html.parser")

A BeautifulSoup object corresponds to the entire content of one HTML/XML document
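The two construction styles above (from a string and from an open file) can be compared in a small sketch. The temporary file and the markup string here are illustrative assumptions, not part of the original notes; a `with` block is used so the file handle is closed after parsing.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
markup = "<html><body><p>data</p></body></html>"

# 1) Build the soup directly from a string
soup_from_str = BeautifulSoup(markup, "html.parser")

# 2) Build the soup from an open file handle
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)
    with open(path, encoding="utf-8") as f:
        soup_from_file = BeautifulSoup(f, "html.parser")

# Both objects represent the same whole document
print(soup_from_str.p.string == soup_from_file.p.string)
```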

Beautiful Soup library parsers

Parser	Usage	Requirement
bs4's HTML parser	BeautifulSoup(mk, 'html.parser')	Install the bs4 library
lxml's HTML parser	BeautifulSoup(mk, 'lxml')	pip install lxml
lxml's XML parser	BeautifulSoup(mk, 'xml')	pip install lxml
html5lib's parser	BeautifulSoup(mk, 'html5lib')	pip install html5lib
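Since lxml and html5lib are optional installs, one possible way to use the table above is to try the parsers in order and fall back to the built-in one. This is only a sketch: `make_soup` and its preference order are hypothetical, and it relies on bs4 raising `FeatureNotFound` when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) using the first parser that is installed."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser is not installed; try the next one
            continue
    raise RuntimeError("no usable parser found")


soup, used = make_soup("<p class='title'><b>demo</b></p>")
print(used, soup.b.string)
```

Because 'html.parser' ships with Python, the last entry should always succeed. Different parsers may repair broken markup differently, so results on malformed pages can vary with the parser that ends up being chosen.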

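Basic elements of the Beautiful Soup class

Element	Description
Tag	A tag, the most basic unit of information; its start and end are marked with <> and </>
Name	The name of the tag; the name of <p>...</p> is 'p'. Format: <tag>.name
Attributes	The attributes of the tag, organized as a dictionary. Format: <tag>.attrs
NavigableString	The non-attribute string inside a tag, i.e. the string in <>...</>. Format: <tag>.string
Comment	The comment portion of a string inside a tag; a special kind of NavigableString (the Comment type)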
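The five elements in the table can be inspected on a tiny document. The markup below is made up for illustration; the sketch assumes the built-in 'html.parser'.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup: one tag holding a comment, one holding plain text
html = '<b><!--This is a comment--></b><p class="title">This is not a comment</p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p
print(isinstance(tag, Tag))   # the <p> element is a Tag
print(tag.name)               # tag name
print(tag.attrs)              # attributes as a dict; class is multi-valued, so a list
print(type(tag.string))       # plain text is a NavigableString

# .string does not mark comments specially, so check the type explicitly
print(isinstance(soup.b.string, Comment))
print(isinstance(soup.b.string, NavigableString))
```

Both `soup.b.string` and `soup.p.string` print as plain text, so a type check like the one above is needed when comments must be filtered out.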

Traversing HTML content with the bs4 library

Revisiting demo.html

>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> demo = r.text
>>> demo
'<html><head><title>This is a python demo page</title></head><body><p class="title"><b>The demo python introduces several python courses.</b></p><p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:<a  class="py1" id="link1">Basic Python</a> and <a  class="py2" id="link2">Advanced Python</a>.</p></body></html>'

Basic HTML structure

<html>
    <head>
        <title>This is a python demo page</title>
    </head>
    <body>
        <p class="title">
            <b>The demo python introduces several python courses.</b>
        </p>
        <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
            <a  class="py1" id="link1">Basic Python</a>
             and 
            <a  class="py2" id="link2">Advanced Python</a>
            .
        </p>
    </body>
</html>

Downward traversal of the tag tree

Attribute	Description
.contents	List of child nodes; stores all direct children of <tag> in a list
.children	Iterator over child nodes; similar to .contents, used to loop over direct children
.descendants	Iterator over descendant nodes; includes all descendants, used for looping

for child in soup.body.children:
    print(child)
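To see how the three downward attributes differ, the following sketch uses a shortened, made-up version of the demo page (an assumption, so that no network access is needed) and counts the nodes each attribute yields.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened stand-in for demo.html (hypothetical content)
html = (
    '<html><head><title>T</title></head><body>'
    '<p class="title"><b>Bold</b></p>'
    '<p class="course">Intro <a id="link1">Basic</a> and '
    '<a id="link2">Advanced</a>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# .contents is a list, so it supports len() and indexing
print(len(soup.body.contents))
print(soup.body.contents[1].name)

# .children iterates over the same direct children
child_names = [child.name for child in soup.body.children]
print(child_names)

# .descendants walks the whole subtree, including text nodes
descendants = list(soup.body.descendants)
tag_names = [node.name for node in descendants if isinstance(node, Tag)]
print(len(descendants), tag_names)
```

Text between tags counts as a node, which is why the descendant count is larger than the number of tags.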

Upward traversal of the tag tree

Attribute	Description
.parent	The parent tag of the node
.parents	Iterator over the node's ancestor tags, used to loop over ancestors

>>> soup = BeautifulSoup(demo, "html.parser")
>>> for parent in soup.a.parents:
...     if parent is None:
...         print(parent)
...     else:
...         print(parent.name)
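A small self-contained sketch of the upward walk, using made-up markup in place of the demo page. It collects the ancestor names of the first <a> tag; the outermost ancestor is the BeautifulSoup object itself, whose name is '[document]' and whose own parent is None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for demo.html
html = (
    '<html><head><title>T</title></head><body>'
    '<p class="course">Intro <a id="link1">Basic</a></p>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# Direct parent of the first <a> tag
print(soup.a.parent.name)

# All ancestors, from nearest to outermost
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)

# The document object sits at the top and has no parent
print(soup.parent)
```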

Sibling (parallel) traversal of the tag tree

Attribute	Description
.next_sibling	Returns the next sibling node in HTML text order
.previous_sibling	Returns the previous sibling node in HTML text order
.next_siblings	Iterator; returns all following sibling nodes in HTML text order
.previous_siblings	Iterator; returns all preceding sibling nodes in HTML text order

for sibling in soup.a.next_siblings:
    print(sibling)
for sibling in soup.a.previous_siblings:
    print(sibling)
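One detail worth checking when traversing siblings is that a sibling is not necessarily a tag: text between tags is a NavigableString node too. The sketch below, on made-up markup, shows this and filters the siblings down to tags only.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical paragraph modeled on the "course" paragraph of the demo page
html = '<p class="course">Intro <a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>'
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a

# The immediate neighbors of the first <a> are text nodes, not tags
print(repr(first_link.next_sibling))
print(repr(first_link.previous_sibling))
print(isinstance(first_link.next_sibling, NavigableString))

# Keep only the siblings that are real tags
following_tags = [s for s in first_link.next_siblings if isinstance(s, Tag)]
print([t["id"] for t in following_tags])
```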

Formatted HTML output with the bs4 library

The prettify() method of the bs4 library
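As a minimal sketch of prettify(), the example below formats a one-line fragment (made up for illustration) so that each tag and string appears on its own indented line. The method can be called on the whole soup or on a single tag.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line fragment
soup = BeautifulSoup('<p class="title"><b>Bold</b></p>', "html.parser")

# Format the whole document
pretty = soup.prettify()
print(pretty)

# Format only one tag
print(soup.b.prettify())

# Collect the non-empty lines without their indentation
lines = [line.strip() for line in pretty.splitlines() if line.strip()]
```

The added line breaks and indentation are meant for reading; they introduce whitespace, so the prettified text is better suited to display than to further parsing.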
