I. Approach

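The NetEase Cloud Music hot song chart page loads the chart into an inline frame (iframe). Suppose a crawler requests the chart through the site's normal entry URL,
http://music.163.com/#/discover/toplist?id=3778678. It cannot reach the data inside the iframe. Everything after "#" is a URL fragment, and fragments are never sent to the server, so the request only fetches the site's outer page. A different way in is therefore needed. The simplest option is to drive Chrome with Selenium and enter the frame with switch_to.frame (many tutorials online cover this).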

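The second option is to analyze the requests the page makes (its API). The embedded frame is itself just another web page, so once we find its link and fetch that page, we get the data directly. This article takes this approach.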
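
For this chart, the frame link that the DevTools analysis below arrives at (url1 in the program) is simply the entry URL with "#/" removed. The following is a minimal sketch of that observation, using only urllib.parse from the standard library. It shows that the fragment, including the route and its query string, is split off and never reaches the path the server sees. It then rebuilds a direct URL from the route. frame_url is a hypothetical helper, and it is an untested assumption that the same rewrite works for other NetEase pages.

```python
from urllib.parse import urlsplit


def frame_url(page_url):
    """Hypothetical helper: turn a '#/'-routed page URL into a direct URL.

    Assumes (untested beyond this chart) that the route after '#/' is
    served as an ordinary page on the same host.
    """
    parts = urlsplit(page_url)
    route = parts.fragment.lstrip('/')  # e.g. 'discover/toplist?id=3778678'
    return '%s://%s/%s' % (parts.scheme, parts.netloc, route)


entry = 'http://music.163.com/#/discover/toplist?id=3778678'
parts = urlsplit(entry)
# The '?id=...' belongs to the fragment, so path and query seen by the server are bare
print(parts.path, repr(parts.query))  # / ''
print(frame_url(entry))  # http://music.163.com/discover/toplist?id=3778678
```
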

II. Requirements

Get the title of every song on the hot song chart, together with its artist, duration, album and similar details.

III. Analyzing the page with Chrome DevTools

1. Open the NetEase Cloud Music hot song chart page and press F12 to open Chrome DevTools. Locate the iframe element. Its src attribute is empty, and the content inside the iframe is generated by JavaScript:

[Screenshot: the iframe element with an empty src in the Elements panel]

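2. We therefore need to analyze the page's network requests to find the link we need. Open the Network panel, choose the Doc filter and reload the page. Two documents are listed: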

[Screenshot: the two documents listed under Network > Doc]

3. Search the Response tab of each document for any song on the chart. The chart data turns out to be in the second document:

[Screenshot: the song found in the second document's response]

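4. Right-click this document and choose Copy > Copy link address to get the link of the embedded frame.
5. Opening this link in a separate tab just jumps back to the original page. Instead, right-click inside the original page and choose "View frame source". Searching that source shows that the chart data is stored as JSON inside a <textarea> tag: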

[Screenshot: the chart JSON inside <textarea> in the frame source]

6. Examine the JSON with the Chrome extension json handle:

[Screenshot: the chart JSON opened in json handle]
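
In code, steps 5 and 6 come down to reading the text of the <textarea> and passing it to json.loads. The sketch below does this with the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. TextareaReader and extract_chart are hypothetical names. The sample page is an invented, heavily trimmed stand-in for the real frame source.

```python
import json
from html.parser import HTMLParser


class TextareaReader(HTMLParser):
    """Collects the text of the first <textarea> element in a page."""

    def __init__(self):
        super().__init__()
        self.inside = False  # currently between <textarea> and </textarea>
        self.done = False    # the first <textarea> has already been read
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'textarea' and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'textarea' and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)  # text may arrive in several pieces


def extract_chart(html):
    """Return the JSON data stored in the page's first <textarea>."""
    reader = TextareaReader()
    reader.feed(html)
    reader.close()
    return json.loads(''.join(reader.chunks))


# Invented, heavily trimmed stand-in for the frame source
sample = ('<html><body><h2>Hot songs</h2>'
          '<textarea>[{"name": "Song A", "duration": 215000}]</textarea>'
          '</body></html>')
songs = extract_chart(sample)
print(songs[0]['name'])  # Song A
```

In the program below, BeautifulSoup's soup.find('textarea').text plays the same role.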

This gives us the data we want.

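Each element of the JSON array describes one song. The program below reads these fields from it:

- name
- duration, in milliseconds
- artists, a list from which only the first name is used
- album.name
- alias
- publishTime, a millisecond timestamp

The following is a minimal sketch that collects them into one dictionary per song. parse_song and the sample record are hypothetical. The record carries only those keys, whereas real entries are assumed to contain many more.

```python
import time


def parse_song(song):
    """Collect the fields the program prints for one chart entry."""
    # duration is in milliseconds; show it as m:ss
    minutes, seconds = divmod(song['duration'] // 1000, 60)
    # publishTime is a millisecond timestamp; format it in local time
    released = time.strftime('%Y-%m-%d %H:%M:%S',
                             time.localtime(song['publishTime'] / 1000))
    return {
        'name': song['name'],
        'artist': song['artists'][0]['name'],  # first artist only
        'album': song['album']['name'],
        'duration': '%d:%02d' % (minutes, seconds),
        'alias': (song.get('alias') or [None])[0],  # first alias, or None
        'released': released,
    }


# Hypothetical record holding only the keys read above
record = {
    'name': 'Song A',
    'duration': 215000,
    'artists': [{'name': 'Artist X'}, {'name': 'Artist Y'}],
    'album': {'name': 'Album Z'},
    'alias': [],
    'publishTime': 1514764800000,
}
print(parse_song(record)['duration'])  # 3:35
```
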
IV. Writing the program

The code is as follows (it requires the requests, beautifulsoup4 and lxml packages):

import json
import time

import requests
from bs4 import BeautifulSoup

url1 = 'http://music.163.com/discover/toplist?id=3778678'  # Cloud Music hot song chart (the iframe's own page)
# The User-Agent must be set; without it the returned page is incomplete
headers = {
    'Cookie': '__e_=1515461191756; _ntes_nnid=af802a7dd2cafc9fef605185da6e73fb,1515461190617; _ntes_nuid=af802a7dd2cafc9fef605185da6e73fb; JSESSIONID-WYYY=HMyeRdf98eDm%2Bi%5CRnK9iB%5ChcSODhA%2Bh4jx5t3z20hhwTRsOCWhBS5Cpn%2B5j%5CVfMIu0i4bQY9sky%5CsvMmHhuwud2cDNbFRD%2FHhWHE61VhovnFrKWXfDAp%5CqO%2B6cEc%2B%2BIXGz83mwrGS78Goo%2BWgsyJb37Oaqr0IehSp288xn5DhgC3Cobe%3A1515585307035; _iuqxldmzr_=32; __utma=94650624.61181594.1515583507.1515583507.1515583507.1; __utmc=94650624; __utmz=94650624.1515583507.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmb=94650624.4.10.1515583507',
    'Host': 'music.163.com',
    'Referer': 'http://music.163.com/',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}

response = requests.get(url1, headers=headers)
print(response.status_code)

html = response.text
soup = BeautifulSoup(html, 'lxml')
update_time = soup.find('span', attrs={'class': 'sep s-fc3'}).text
print(update_time)

# Find the JSON data: the chart is a JSON array inside <textarea>
textarea = soup.find('textarea').text
contents = json.loads(textarea)

# Write the results to wangyi.log
with open('wangyi.log', 'w', encoding='utf-8') as fo:
    for i, song in enumerate(contents, start=1):
        # Release time (publishTime is a millisecond timestamp)
        t1 = time.localtime(song.get('publishTime') / 1000)
        t2 = time.strftime("%Y-%m-%d %H:%M:%S", t1)
        # Duration (duration is in milliseconds), shown as m:ss
        minutes, seconds = divmod(song.get('duration') // 1000, 60)
        # Artist (only the first one is kept)
        artist = song.get('artists')[0].get('name')
        # Song title
        music_name = song.get('name')
        # Album
        album = song.get('album').get('name')
        print(i, '.', music_name, ' Duration:', '%d:%02d' % (minutes, seconds), file=fo)
        print('Artist:', artist, file=fo)
        print('Album:', album, file=fo)
        # Other information: the first alias, if any
        if song.get('alias'):
            print(song.get('alias')[0], file=fo)
        print('Release time:', t2, file=fo)
        print('--------------------------------------------------------------------', file=fo)

輸出結(jié)果：

[Screenshot: the program's output]

Scraping the NetEase Cloud Music hot song chart with Python (getting data from an iframe whose src is empty)
