

1. Parsing the data

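bs_object = BeautifulSoup(text_to_parse, 'parser')
Note:
The parentheses take two arguments. Argument 0 is the text to parse, and it must be the markup itself: a string such as res.text (an open file handle also works), not the Response object returned by requests.get().
Argument 1 names the parser. We use html.parser, which is built into Python's standard library.
(It isn't the only parser; lxml and html5lib are alternatives that must be installed separately. html.parser is simply the easiest one to start with.)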

```python
import requests
# Import the BeautifulSoup library
from bs4 import BeautifulSoup
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
html = res.text
soup = BeautifulSoup(html, 'html.parser')  # Parse the page into a BeautifulSoup object
print(type(soup))  # <class 'bs4.BeautifulSoup'>
print(soup)        # When printed, it looks essentially the same as res.text
```

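So what does BeautifulSoup actually do, and what is the difference?
res.text and soup look the same when printed, but they belong to different classes:
(1) <class 'str'> versus <class 'bs4.BeautifulSoup'>. The first is a plain string. The second is a BeautifulSoup object that has already been parsed.
(2) When you print a BeautifulSoup object, print() calls the object's __str__ method. What you see is therefore a string rendering of the parsed document.
(3) From here on we use BeautifulSoup to extract data. Its attributes and methods, such as find() and find_all(), exist only on a BeautifulSoup object and not on a plain string.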
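The small sketch below checks the three points above offline. It parses a hypothetical HTML string instead of downloading the page. The markup and the variable names are illustrative assumptions and are not part of the original tutorial.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for res.text
html = '<html><body><h2>Books</h2><p class="books">Python 101</p></body></html>'

soup = BeautifulSoup(html, "html.parser")

# (1) Different classes: a plain string vs. a parsed BeautifulSoup object
print(type(html))  # <class 'str'>
print(type(soup))  # <class 'bs4.BeautifulSoup'>

# (2) print() calls str(soup), which renders the parse tree back to markup.
# Here it matches the input exactly; in general bs4 may normalize markup slightly.
print(soup)

# (3) Only the parsed object offers the search API.
# A str has its own unrelated .find() (substring search), but no find_all().
print(hasattr(html, "find_all"))  # False
print(soup.find("h2").text)       # Books
```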


2. Extracting the data

(1) find() and find_all()

[Figure: usage of find() and find_all()]

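Look at the class_ inside the parentheses in the figure above. The trailing underscore is there because class is a reserved keyword in Python, and writing class= as an argument would be a syntax error. You are not limited to the class attribute: you can also match on other attributes, such as style.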
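Here is a minimal sketch of how the two methods differ, run on a hypothetical snippet rather than the tutorial page. find() returns only the first matching Tag, or None if nothing matches. find_all() returns every match as a list-like ResultSet, which is empty if nothing matches. The sketch also shows the attrs={...} alternative to class_, and matching on style.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two "books" blocks and one styled paragraph
html = """
<div class="books"><h2>Fiction</h2></div>
<div class="books"><h2>Science</h2></div>
<p style="color:red">On sale</p>
"""
soup = BeautifulSoup(html, "html.parser")

# find(): only the first match, as a Tag
first = soup.find(class_="books")
print(first.h2.text)                   # Fiction

# find_all(): every match in document order, as a list-like ResultSet
all_books = soup.find_all(class_="books")
print([b.h2.text for b in all_books])  # ['Fiction', 'Science']

# class is a Python keyword: use class_ or pass an attrs dict instead
same = soup.find_all("div", attrs={"class": "books"})
print(len(same))                       # 2

# Other attributes work as plain keyword arguments, e.g. style
sale = soup.find("p", style="color:red")
print(sale.text)                       # On sale

# No match: find() gives None, find_all() gives an empty ResultSet
print(soup.find(class_="missing"))            # None
print(len(soup.find_all(class_="missing")))   # 0
```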

```python
import requests  # Import the requests library
from bs4 import BeautifulSoup  # Import the BeautifulSoup library
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')  # Returns a Response object, assigned to res
html = res.text  # The Response content as a string
soup = BeautifulSoup(html, 'html.parser')  # Parse the page into a BeautifulSoup object
items = soup.find_all(class_='books')  # Locate the data we want by its tag attribute
for item in items:
    print('All the data we want is in here:\n', item)  # Print each item
```

(2) Tag objects


```python
import requests
from bs4 import BeautifulSoup
res = requests.get("https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html")
html = res.text
soup = BeautifulSoup(html, 'html.parser')  # BeautifulSoup object
items = soup.find_all(class_="books")  # A ResultSet; each element is a Tag, so from here on we call Tag methods
for item in items:
    # print(item)
    kind = item.find('h2')  # Category heading
    title = item.find(class_="title")  # Title link (carries the href)
    content = item.find('p')  # Description
    print(kind.text, "\n", title['href'], "\n", title.text, "\n", content.text, "\n")
```
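Note: when you read an attribute by name, as in title['href'], that attribute must belong to the tag itself. In the same way, item.find() searches only inside item. Neither one reaches up into a parent tag.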
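The scoping rule above can be illustrated with hypothetical markup loosely modeled on the book list. The href lives on the <a> tag, so only that Tag can read it. find() searches downward only. Moving up requires explicit navigation, for example with .parent, which is a standard Tag attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the href belongs to <a>, not to the surrounding <div>
html = '<div class="books"><a class="title" href="/book/1">Dune</a><p>Sci-fi</p></div>'
soup = BeautifulSoup(html, "html.parser")

item = soup.find(class_="books")   # the outer <div> Tag
title = item.find(class_="title")  # search happens inside item only

print(title["href"])               # /book/1 (an attribute of <a> itself)
print(title.text)                  # Dune

# The <div> has no href of its own, even though its child does
print(item.get("href"))            # None: .get() returns None for a missing attribute
try:
    item["href"]                   # [] raises KeyError for a missing attribute
except KeyError:
    print("KeyError: <div> has no href attribute")

# find() only looks downward: from <a> it cannot find the enclosing <div>
print(title.find(class_="books"))  # None
# To go upward, navigate explicitly
print(title.parent["class"])       # ['books'] (class is a multi-valued attribute)
```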

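We fetch the data with requests, parse it with BeautifulSoup, and then extract it with BeautifulSoup again. At every step, what changes is the type of the object we are working with. Every operation acts on an object, so always know which type you are holding.
One figure to sum it up: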

[Figure: how the object type changes at each step]
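You can print the type changes step by step. To keep the sketch offline, a hypothetical HTML string stands in for res.text, so the requests.Response step appears only as a comment.

```python
from bs4 import BeautifulSoup

# Step 1 (online, not run here): res = requests.get(url) gives a requests.Response
# Step 2: res.text is a str; a hypothetical stand-in is used instead
html = '<div class="books"><h2>Fiction</h2></div><div class="books"><h2>Science</h2></div>'

soup = BeautifulSoup(html, "html.parser")  # str -> BeautifulSoup
items = soup.find_all(class_="books")      # BeautifulSoup -> ResultSet
item = items[0]                            # ResultSet -> Tag
kind = item.find("h2")                     # Tag -> Tag
text = kind.text                           # Tag -> str

# Print one line per step: the variable name and the class of the object it holds
for label, obj in [("html", html), ("soup", soup), ("items", items),
                   ("item", item), ("kind", kind), ("text", text)]:
    print(f"{label:6} {type(obj).__name__}")
```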

```python
import requests
# Import the requests library
from bs4 import BeautifulSoup
# Import the BeautifulSoup library

res_foods = requests.get('http://www.xiachufang.com/explore/')
# Fetch the data
bs_foods = BeautifulSoup(res_foods.text, 'html.parser')
# Parse the data

tag_name = bs_foods.find_all('p', class_='name')
# Find the <p> tags that contain the dish name and URL
tag_ingredients = bs_foods.find_all('p', class_='ing ellipsis')
# Find the <p> tags that contain the ingredients
list_all = []
# Create an empty list to store the results
for p_name, p_ing in zip(tag_name, tag_ingredients):
    # Walk the dish names and ingredients in parallel (zip stops at the shorter list)
    list_food = [p_name.text.strip(), p_name.find('a')['href'], p_ing.text.strip()]
    # Pack the fields into a list. strip() removes the surrounding whitespace and
    # newlines; fixed slices such as [18:-14] break if the page's indentation changes
    list_all.append(list_food)
    # Add the record to list_all
print(list_all)
# Print the result


# An alternative solution follows


list_foods = bs_foods.find_all('div', class_='info pure-u')
# Find the smallest parent tag that holds one recipe

list_all = []
# Create an empty list to store the results

for food in list_foods:

    tag_a = food.find('a')
    # The <a> tag inside the current parent tag
    name = tag_a.text.strip()
    # Dish name, with the surrounding whitespace removed
    URL = 'http://www.xiachufang.com' + tag_a['href']
    # Build the absolute URL
    tag_p = food.find('p', class_='ing ellipsis')
    # The ingredients <p> tag inside the current parent tag
    ingredients = tag_p.text.strip()
    # Ingredients, with the surrounding whitespace removed
    list_all.append([name, URL, ingredients])
    # Pack name, URL and ingredients into a list and add it to list_all

print(list_all)
# Print the result
```
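Below is a hedged offline sketch of the second approach, which searches inside each smallest parent tag. The markup is hypothetical. It only imitates the class names used above, and the real page may be structured differently. Two substitutions do not appear in the original: get_text(strip=True) replaces fixed slices, and urllib.parse.urljoin builds the absolute URL. The second card deliberately has no ingredients <p>. The per-parent lookup still keeps each recipe's fields together, whereas two parallel find_all() lists would drift out of step.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: one recipe card per <div class="info pure-u">.
# The whitespace padding is what the original fixed slices were cutting off.
html = """
<div class="info pure-u">
  <p class="name">
    <a href="/recipe/1/">
        Tomato Egg
    </a>
  </p>
  <p class="ing ellipsis">
    tomato, egg
  </p>
</div>
<div class="info pure-u">
  <p class="name"><a href="/recipe/2/">Plain Rice</a></p>
</div>
"""
BASE = "http://www.xiachufang.com"

soup = BeautifulSoup(html, "html.parser")
recipes = []
for card in soup.find_all("div", class_="info pure-u"):
    tag_a = card.find("a")
    tag_p = card.find("p", class_="ing ellipsis")
    recipes.append([
        tag_a.get_text(strip=True),                   # no magic slice indices
        urljoin(BASE, tag_a["href"]),                 # relative href -> absolute URL
        tag_p.get_text(strip=True) if tag_p else "",  # a card may lack ingredients
    ])

print(recipes)
```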

Python Summary: BeautifulSoup
