A brief outline of the scraping approach:

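Search Google for One Piece and pick the 風之動漫 (fzdm.com) site as the scraping target, e.g. http://manhua.fzdm.com/2/846/index_1.html
Look at the URL pattern of each page: 846 is the chapter number, and the N in index_N.html is the page number.
Inspect the image tag on the page and find a CSS selector that uniquely identifies the manga image.
Use requests to GET each page and BeautifulSoup to parse the response.
The original plan was to store the image URLs in MongoDB, but since I'm not yet fluent with MongoDB, I simply collect them in a list.
Use urllib to download the images to the local disk.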
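The URL pattern from the second step can be captured in a small helper. This is a minimal sketch assuming the `http://manhua.fzdm.com/2/{chapter}/index_{page}.html` layout observed above still holds and that pages are numbered from 1; `page_url` and `chapter_page_urls` are hypothetical names, not part of the original script.

```python
# Assumed URL layout, taken from the example URL above (may change if the site changes)
BASE_URL = 'http://manhua.fzdm.com/2/{chapter}/index_{page}.html'


def page_url(chapter: int, page: int) -> str:
    """Return the URL of one page of a chapter (pages are assumed to start at 1)."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return BASE_URL.format(chapter=chapter, page=page)


def chapter_page_urls(chapter: int, page_count: int) -> list:
    """Return the URLs of pages 1..page_count of a chapter, in reading order."""
    return [page_url(chapter, page) for page in range(1, page_count + 1)]


if __name__ == '__main__':
    for url in chapter_page_urls(846, 3):
        print(url)
```

Keeping the pattern in one place means that if the site changes its URL scheme, only `BASE_URL` needs to be updated.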

from bs4 import BeautifulSoup
import requests
import time
import pymongo
import urllib.request
import os
path = '/Users/meixuhong/OnePiece/'

# ================================== Set up the database (currently unused; see insert_one below) ====================================
client = pymongo.MongoClient('localhost',27017)
onepiece = client['onepiece']
onepiece_pic = onepiece['onepiece_pic']

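# ================================== Scrape multiple pages ==================================
def parseMultiplePages(chapter, page_count):
    """Return the image URLs found on pages 1..page_count of the given chapter."""
    img_urls = []
    for page in range(1, page_count + 1):
        time.sleep(4)  # pause between requests so the site is not hammered
        wb_data = requests.get('http://manhua.fzdm.com/2/{}/index_{}.html'.format(chapter, page))
        if wb_data.status_code != 200:
            # The request failed (e.g. the page does not exist); skip this page
            continue
        soup = BeautifulSoup(wb_data.text, 'lxml')
        imgs = soup.select('div#mh > li > a > img')

        for img in imgs:
            data = {
                'img': img.get('src')
            }
            print(data)
            # onepiece_pic.insert_one(data)
            if data['img']:  # skip <img> tags without a src attribute
                img_urls.append(data['img'])
    print('img_urls is a list as:', img_urls)
    return img_urls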

# First 16 pages of chapter 837
# parseMultiplePages(837,16)

# ================================== Download and name the images ==================================
def dl_images(chapter, img_urls):
    # == Check for and create the chapter directory ==
    subPath = path + str(chapter) + '/'
    if not os.path.exists(subPath):
        print('create the path: {}...'.format(subPath))
        os.makedirs(subPath)  # also creates missing parent directories such as OnePiece/
    else:
        print('the path already exists ...')

    for i, img_url in enumerate(img_urls, start=1):
        # Save each image as '<index>_<original file name>' so the pages stay in order
        file_name = subPath + str(i) + '_' + img_url.split('/')[-1]
        # urllib.request.urlretrieve(url, file_path) downloads the file
        urllib.request.urlretrieve(img_url, file_name)
        print('\n{} downloaded and has been named as {}.\n'.format(img_url, file_name))

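# ================================== Download multiple chapters ==================================
def dl_chapters(chapter_from_, chapter_to_):
    # Download chapters chapter_from_..chapter_to_ (inclusive), fetching up to 18 pages each
    for i in range(chapter_from_, chapter_to_ + 1):
        dl_images(i, parseMultiplePages(i, 18))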

dl_chapters(800,848)
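Two parts of the script above are hard-coded: every chapter is assumed to have 18 pages, and file names are built by string concatenation. The sketch below shows one possible way to address both under stated assumptions: it assumes that a page past the end of a chapter yields no matching `<img>` tags (not verified against the site), and it takes the page-fetching function as a parameter so the stop logic can be tried without network access. `collect_chapter_images`, `local_filename` and the `fetch_images` callable are hypothetical names, not part of the original script.

```python
import os
from urllib.parse import urlparse


def local_filename(sub_path: str, index: int, img_url: str) -> str:
    """Build '<sub_path>/<index>_<basename>' from an image URL, ignoring any query string."""
    basename = os.path.basename(urlparse(img_url).path)
    return os.path.join(sub_path, '{}_{}'.format(index, basename))


def collect_chapter_images(fetch_images, chapter, max_pages=50):
    """Collect image URLs page by page, stopping at the first page that has none.

    fetch_images(chapter, page) must return a list of image URLs for that page.
    An empty list is treated as "past the last page" (an assumption about the site).
    max_pages is a hypothetical safety cap so a misbehaving site cannot loop forever.
    """
    img_urls = []
    for page in range(1, max_pages + 1):
        urls = fetch_images(chapter, page)
        if not urls:
            break
        img_urls.extend(urls)
    return img_urls


if __name__ == '__main__':
    # Fake fetcher standing in for the requests + BeautifulSoup code: the chapter has 3 pages
    def fake_fetch(chapter, page):
        return ['http://example.com/{}/{}.jpg'.format(chapter, page)] if page <= 3 else []

    urls = collect_chapter_images(fake_fetch, 846)
    for i, url in enumerate(urls, start=1):
        print(local_filename('/tmp/OnePiece/846', i, url))
```

In the real script, `fetch_images` would wrap the `requests.get` and `soup.select('div#mh > li > a > img')` lines from `parseMultiplePages`, returning the `src` values for a single page.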

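The program only aims to get the job done, with no further refinements. Still, from now on, whenever One Piece updates I won't have to hunt around for sources and wait; it meets my personal needs.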

Using Python to Scrape One Piece for Offline Manga Reading