Note 7: Parsing a local web page, scraping its data, and running a simple statistical analysis

from bs4 import BeautifulSoup

path = r'E:\index.html'
data_list = []

# Parse the local HTML file with the lxml parser
with open(path, 'r') as file:
    Soup = BeautifulSoup(file, 'lxml')

# Each selector returns one list; the lists line up item by item
brices = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
levels = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
counts = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')
imgs = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
# print(counts)

for brice, title, level, count, img in zip(brices, titles, levels, counts, imgs):
    data = {
        'brice': brice.get_text(),
        'title': title.get_text(),
        'level': level.get_text(),
        # Use find_all(attrs={attribute: value}) to match the star icons and count them
        'count': len(count.find_all(attrs={'class': 'glyphicon glyphicon-star'})),
        'img': img.get('src')
    }
    data_list.append(data)

# Print the items ordered by star count, highest first
for i in sorted(data_list, key=lambda x: x['count'], reverse=True):
    print('title {} --count is {} -- brice is {}'.format(i['title'], i['count'], i['brice']))

Output:

title EarPod --count is 5 -- brice is $24.99

title New Pocket --count is 4 -- brice is $64.99

title New sunglasses --count is 4 -- brice is $74.99

title iphone gamepad --count is 4 -- brice is $94.99

title Best Bed --count is 4 -- brice is $214.5

title iWatch --count is 4 -- brice is $500

title Park tickets --count is 4 -- brice is $15.5

title Art Cup --count is 3 -- brice is $84.99
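
The ordering in the output above comes from sorted() with key and reverse=True. Python's sort is stable, so items with the same star count keep the order they had on the page. The sketch below illustrates this on a few made-up rows (the titles A to D, the prices, and the helper price_value are hypothetical and not taken from the page), and also shows one possible way to break ties by price:

```python
# Hypothetical rows in "page order"; only the field names match the note's code.
rows = [
    {"title": "A", "count": 4, "brice": "$64.99"},
    {"title": "B", "count": 5, "brice": "$24.99"},
    {"title": "C", "count": 4, "brice": "$15.5"},
    {"title": "D", "count": 3, "brice": "$84.99"},
]


def price_value(row):
    """Hypothetical helper: turn a string like '$24.99' into a float."""
    return float(row["brice"].lstrip("$"))


# Same call as in the note: highest star count first.
# The sort is stable, so A stays ahead of C (both have 4 stars).
by_count = sorted(rows, key=lambda x: x["count"], reverse=True)
by_count_titles = [r["title"] for r in by_count]

# Variant: highest star count first, then cheapest first within a tie.
by_count_then_price = sorted(rows, key=lambda x: (-x["count"], price_value(x)))
by_count_then_price_titles = [r["title"] for r in by_count_then_price]

print(by_count_titles)
print(by_count_then_price_titles)
```

Negating the count in the tuple key sorts stars in descending order while the price stays ascending, which a single reverse=True flag cannot express.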


Summary:

BeautifulSoup reference documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#find-all-tag
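
To see how the find_all(attrs={...}) match used for the star count behaves, here is a minimal sketch on an inline snippet. The markup is hypothetical, modeled on the selectors in this note rather than copied from the real page, and it uses the built-in html.parser instead of lxml so that it runs without extra dependencies:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three filled stars and two empty ones.
html = """
<div class="ratings">
  <p class="pull-right">15 reviews</p>
  <p>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The second <p> inside div.ratings holds the star icons.
stars_p = soup.select_one("div.ratings > p:nth-of-type(2)")

# Match on the full class string, as in the note's code.
full_match = stars_p.find_all(attrs={"class": "glyphicon glyphicon-star"})

# Alternative: match on the single class name with class_.
single_match = stars_p.find_all("span", class_="glyphicon-star")

star_count = len(full_match)
print(star_count, len(single_match))
```

Matching on the full class string depends on the order in which the classes are written in the HTML, so matching on the single class glyphicon-star is likely the more robust choice if the markup changes.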
