
title: "Python Review Day 16: The XPath Web Page Parser"
date: 2020-04-06 23:00:24
tags:
- python
- web scraping
categories: Python review
top: 17

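XPath is a language (a set of rules) for finding information in an XML document. It navigates the document through its elements and attributes.
Recommended tutorial: https://www.runoob.com/xpath/xpath-syntax.html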

XPath development tools

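Open-source expression editor: XMLQuire
Chrome extension: XPath Helper
Chrome's DevTools can also copy an element's XPath directly (right-click the node in the Elements panel, then Copy > Copy XPath), but the copied path is tied to that page's exact layout and may not generalize well.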

Usage

Install lxml

pip install lxml
conda install lxml

Import etree

from lxml import etree

Build the HTML tree

from lxml import etree

text = """
<!DOCTYPE html>
<html lang="zh">
    <head>
        <meta charset="UTF-8">
        <title>這是標題</title>
    </head>
    <body>
    <div style="color:#FF0000"><p>這是段落1</p> </div>
    <div style="color:#FFFF00"><p>這是段落2</p> </div>
    <div style="color:#000000"><p>這是段落2</p> </div>
    </body>
</html>
"""
html = etree.HTML(text)
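The `etree.HTML` call above returns the root `<html>` element, not the string. A small sketch, using a hypothetical fragment `"<p>hello</p>"`, shows that lxml's HTML parser also adds missing `<html>` and `<body>` wrappers. This is why paths starting at `/html` still work on partial markup.

```python
from lxml import etree

# Hypothetical input: a bare fragment with no <html> or <body> wrapper.
fragment = "<p>hello</p>"
root = etree.HTML(fragment)

# etree.HTML returns the root element; the HTML parser has wrapped the fragment.
print(root.tag)              # html
print(etree.tostring(root))  # b'<html><body><p>hello</p></body></html>'

# Because the wrappers exist, an absolute path from /html finds the paragraph text.
print(root.xpath('/html/body/p/text()'))  # ['hello']
```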

Selecting nodes

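nodename: selects all child elements of the current node that have this name
/: at the start of an expression, selects from the root node; between two steps, selects the children of the previous step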

result = html.xpath('/html')
print(result)
"""
[<Element html at 0x7f2448926b40>]
"""

//: selects matching nodes anywhere in the document, regardless of their position

result = html.xpath('//div')
print(result)
"""
[<Element div at 0x7f6de30e7960>, <Element div at 0x7f6de30e7910>, <Element div at 0x7f6de30e78c0>]
"""

.: selects the current node

result = html.xpath('//div')
for r in result:
    s = r.xpath('./p')  # select the p children of the current div
    print(s)
"""
[<Element p at 0x7fd70835c780>]
[<Element p at 0x7fd70835c730>]
[<Element p at 0x7fd70835c780>]
"""

..: selects the parent of the current node

result = html.xpath('//div')  # select the div nodes
for r in result:
    s = r.xpath('..')  # select the parent of the current node; each div's parent is body
    print(s)
"""
[<Element body at 0x7f5d65f6f820>]
[<Element body at 0x7f5d65f6f820>]
[<Element body at 0x7f5d65f6f820>]
"""

@: selects attributes

result = html.xpath('//div[@style="color:#FF0000"]')  # select the div whose style attribute is "color:#FF0000"
print(result)
"""
[<Element div at 0x7f6673d7b9b0>]
"""

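/: follows the path one level at a time and selects direct children
//: selects descendants at any depth (children, grandchildren, and so on)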
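To make the child-versus-descendant distinction concrete, here is a minimal sketch on a small hypothetical document where each `<p>` sits inside a `<div>`, two levels below `<body>`. It also shows a bare name such as `div` being evaluated relative to the current node.

```python
from lxml import etree

# Hypothetical sample: each <p> is nested inside a <div>, not directly under <body>.
doc = etree.HTML("""
<html><body>
  <div><p>first</p></div>
  <div><p>second</p></div>
</body></html>
""")

# '/' only steps to direct children: no <p> is a child of <body>, so nothing matches.
children = doc.xpath('/html/body/p')
# '//' searches all descendants of <body>, so both nested <p> elements match.
descendants = doc.xpath('/html/body//p')

# A bare name is relative to the context node: 'div' means "div children of <body>".
body = doc.xpath('/html/body')[0]
divs = body.xpath('div')

print(len(children), len(descendants), len(divs))  # 0 2 2
```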

Extracting attributes or text

text(): extracts the text nodes directly inside the current node (text inside child elements is not included)

result = html.xpath('//div/p/text()')  # select the text directly inside each p under a div
print(result)
"""
['這是段落1', '這是段落2', '這是段落2']
"""

string(.): returns the text of the current node and all of its descendants, concatenated into one string

result = html.xpath('string(.)')
print(result)
"""

    
        
        這是標題
    
    
    這是段落1 
    這是段落2 
    這是段落2 
    


"""

@: extracts an attribute value. Example: collect every absolute URL linked from the Baidu home page

import requests
from lxml import etree
from pprint import pprint
import re
url = 'https://www.baidu.com'
headers = {
    # adjacent string literals are joined, so the User-Agent has no stray spaces
    'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/80.0.3987.149 Safari/537.36')
}
response = requests.get(url, headers=headers)
text = response.text
html = etree.HTML(text)
result = html.xpath('//a/@href')  # select the href attribute of every a element
p = re.compile(r'https?://')  # regex for URLs that start with http:// or https://
list2 = []
for r in result:
    result2 = p.match(r)  # check whether the URL starts with the pattern
    if result2:  # keep it if it matches
        list2.append(r)
pprint(list2)
"""
['https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F&sms=5',
 'https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_1',
 'http://news.baidu.com',
 'https://www.hao123.com',
 'http://map.baidu.com',
 'http://v.baidu.com',
 'http://tieba.baidu.com',
 'http://xueshu.baidu.com',
 'https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F&sms=5',
 'http://www.baidu.com/gaoji/preferences.html',
 'http://www.baidu.com/more/',
 'http://ir.baidu.com',
 'http://e.baidu.com/?refer=888',
 'http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000001',
 'http://tieba.baidu.com/f?kw=&fr=wwwt',
 'http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&word=&fr=wwwt',
 'http://music.taihe.com/search?fr=ps&ie=utf-8&key=',
 'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=',
 'http://v.baidu.com/v?ct=301989888&rn=20&pn=0&db=0&s=25&ie=utf-8&word=',
 'http://map.baidu.com/m?word=&fr=ps01000',
 'http://wenku.baidu.com/search?word=&lm=0&od=0&ie=utf-8']
"""

Predicates

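/School/Student[1]: selects the first Student child of School
/School/Student[last()]: selects the last Student child of School
/School/Student[position()<3]: selects the first two Student children of School
//Student[@score="99"]: selects every Student element whose score attribute equals "99"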
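The four predicate paths above can be tried against a small hypothetical `School` document. The student names and scores below are invented for illustration, and `names` is a hypothetical helper that returns the text of each matched element. Note that `position()<3` yields two students, not three.

```python
from lxml import etree

# Hypothetical School document matching the paths above.
school = etree.fromstring("""
<School>
  <Student score="88">Ann</Student>
  <Student score="99">Bob</Student>
  <Student score="75">Cai</Student>
  <Student score="99">Dee</Student>
</School>
""".strip())


def names(path):
    """Hypothetical helper: text of each element matched by an XPath expression."""
    return [e.text for e in school.xpath(path)]


print(names('/School/Student[1]'))             # ['Ann']
print(names('/School/Student[last()]'))        # ['Dee']
print(names('/School/Student[position()<3]'))  # ['Ann', 'Bob']
print(names('//Student[@score="99"]'))         # ['Bob', 'Dee']
```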

XPath operators

Operator	Description	Example	Result
|	Union of two node-sets	//book | //cd	A node-set containing all book and cd elements
+	Addition	6 + 4	10
-	Subtraction	6 - 4	2
*	Multiplication	6 * 4	24
div	Division	8 div 4	2
=	Equal	price=9.80	true if price is 9.80; false if price is 9.90
!=	Not equal	price!=9.80	true if price is 9.90; false if price is 9.80
<	Less than	price<9.80	true if price is 9.00; false if price is 9.90
<=	Less than or equal	price<=9.80	true if price is 9.00; false if price is 9.90
>	Greater than	price>9.80	true if price is 9.90; false if price is 9.80
>=	Greater than or equal	price>=9.80	true if price is 9.90; false if price is 9.70
or	Logical or	price=9.80 or price=9.70	true if price is 9.80; false if price is 9.50
and	Logical and	price>9.00 and price<9.90	true if price is 9.80; false if price is 8.50
mod	Remainder of division	5 mod 2	1
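Several of these operators can be tried directly through lxml. The sketch below uses a hypothetical catalog with `book` and `cd` items, so the element names and prices are assumptions. It shows a union, a few arithmetic expressions, and comparison plus `and` inside a predicate. In lxml, numeric XPath results come back as Python floats.

```python
from lxml import etree

# Hypothetical catalog with <book> and <cd> items, each carrying a price.
doc = etree.fromstring("""
<catalog>
  <book><price>9.80</price></book>
  <cd><price>9.90</price></cd>
  <book><price>8.50</price></book>
</catalog>
""".strip())

# Union: all book and cd elements, returned in document order.
print([e.tag for e in doc.xpath('//book | //cd')])  # ['book', 'cd', 'book']

# Arithmetic operators evaluate to XPath numbers (floats in Python).
print(doc.xpath('6 + 4'), doc.xpath('8 div 4'), doc.xpath('5 mod 2'))  # 10.0 2.0 1.0

# Comparison and boolean operators inside a predicate: only 9.80 is in (9.00, 9.90).
cheap = doc.xpath('//*[price>9.00 and price<9.90]/price/text()')
print(cheap)  # ['9.80']
```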
