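Using the requests library with lxml for parsing, this script fetches every movie that Douban Movies lists as now playing. It does not follow the links to each movie's detail page, so it stays fairly simple.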

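Results
Each now-playing movie is stored as a dictionary, and all of the dictionaries are collected in a single list. After tidying up, the format looks like this: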
[
    {
        'title': '反貪風暴4',
        'score': '6.3',
        'duration': '100分鐘',
        'actors': '古天樂/鄭嘉穎/林峯',
        'thumbnail': 'https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2551353482.jpg'
    },
    {...}, {...}, ....
]
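Since every value in the list is a string, including the score, a little conversion is needed before sorting or filtering. The snippet below is a minimal sketch of that, not part of the original script. The first record is the one shown above; the second record and the helper name score_of are hypothetical, and treating a missing or empty score as 0.0 is an assumption.

```python
# Sample data shaped like the list above.
movies = [
    {
        'title': '反貪風暴4',
        'score': '6.3',
        'duration': '100分鐘',
        'actors': '古天樂/鄭嘉穎/林峯',
        'thumbnail': 'https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2551353482.jpg',
    },
    # Hypothetical placeholder record with no score, for illustration only.
    {'title': 'Placeholder', 'score': '', 'duration': '', 'actors': '', 'thumbnail': ''},
]


def score_of(movie):
    """Return the score as a float; fall back to 0.0 (an assumption) if it is not numeric."""
    try:
        return float(movie.get('score', ''))
    except ValueError:
        return 0.0


# Highest-rated first.
ranked = sorted(movies, key=score_of, reverse=True)
titles = [m['title'] for m in ranked]

# Keep only movies rated 6.0 or higher.
well_rated = [m for m in movies if score_of(m) >= 6.0]

print(titles)
print(len(well_rated))
```

Keeping the conversion in one small helper means the scraped dictionaries can stay exactly as they came from the page.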
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sun Apr 7 22:58:18 2019
@author: ericariel
"""
import requests
from lxml import etree

movies = []
# Request headers carrying a browser User-Agent string.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
}
url = "https://movie.douban.com/cinema/nowplaying/wuhan/"
resp = requests.get(url, headers=headers)
html = etree.HTML(resp.text)
# Take the first <ul class="lists">, which this script treats as the now-playing list.
ul = html.xpath('//ul[@class="lists"]')[0]
# print(etree.tostring(ul, encoding='utf-8').decode('utf-8'))
lis = ul.xpath("./li")
for li in lis:
    # Each <li> carries the movie's details in data-* attributes.
    title = li.xpath('@data-title')[0]
    score = li.xpath('@data-score')[0]
    duration = li.xpath('@data-duration')[0]
    actors = li.xpath('@data-actors')[0]
    # The poster thumbnail is the src of the <img> nested inside the <li>.
    thumbnail = li.xpath('.//img/@src')[0]
    movie = {
        'title': title,
        'score': score,
        'duration': duration,
        'actors': actors,
        'thumbnail': thumbnail,
    }
    movies.append(movie)
print(movies)
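One fragile spot in the script is the [0] index after each xpath call: if an <li> lacks an attribute or has no <img>, it raises IndexError. The sketch below shows one possible way to tolerate that, using .get() and .find(), which lxml elements share with the standard library's ElementTree. To stay runnable offline it parses a small hand-written snippet with xml.etree.ElementTree rather than lxml. The snippet is a simplified, hypothetical stand-in for the real page, and the names SAMPLE, FIELDS and li_to_movie, as well as the empty-string default, are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page markup (the real page is much larger).
SAMPLE = """
<ul class="lists">
  <li data-title="反貪風暴4" data-score="6.3" data-duration="100分鐘" data-actors="古天樂/鄭嘉穎/林峯">
    <img src="https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2551353482.jpg"/>
  </li>
  <li data-title="Placeholder"></li>
</ul>
"""

FIELDS = ("title", "score", "duration", "actors")


def li_to_movie(li):
    """Build one movie dict, using '' for any attribute or image that is missing."""
    movie = {name: li.get("data-" + name, "") for name in FIELDS}
    img = li.find(".//img")
    movie["thumbnail"] = img.get("src", "") if img is not None else ""
    return movie


ul = ET.fromstring(SAMPLE)
movies = [li_to_movie(li) for li in ul.findall("./li")]
print(movies)
```

With the original script, the same helper could be applied to each element returned by ul.xpath("./li"), since lxml elements also provide .get() and .find().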