Scraping Images with Scrapy

Goal: scrape the images from the image site http://hunter-its.com

1. Create the project beauty

scrapy startproject beauty

2. cd into the project directory and create a new spider using the basic template

cd beauty

scrapy genspider hunter hunter-its.com

3. Open the project in PyCharm and write the item first

Open items.py and define the name and address fields

import scrapy

class BeautyItem(scrapy.Item):

    name = scrapy.Field()
    address = scrapy.Field()

4. Write the spider file

Import the BeautyItem class defined earlier, along with the Request class

from beauty.items import BeautyItem
from scrapy.http import Request

Use XPath to select all of the image nodes
pics = response.xpath('//div[@class="pic"]/ul/li')
Loop over the li nodes and extract each image's name and address

        for pic in pics:
            item = BeautyItem()
            name = pic.xpath('./a/img/@alt').extract()[0]
            address = pic.xpath('./a/img/@src').extract()[0]

            item['name'] = name
            item['address'] = address

            yield item
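
The loop above stores the src attribute as-is. If the site serves relative paths in src (an assumption; the post does not say), the address has to be resolved against the page URL before it can be downloaded. A minimal sketch using only the standard library, with hypothetical src values:

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; the real src attributes may differ.
page_url = "http://hunter-its.com/m/1.html"
srcs = ["/uploads/a.jpg", "b.jpg", "http://hunter-its.com/img/c.jpg"]


def absolute_urls(base, candidates):
    """Resolve each src against the page URL; absolute URLs pass through unchanged."""
    return [urljoin(base, src) for src in candidates]


resolved = absolute_urls(page_url, srcs)
for url in resolved:
    print(url)
```

Inside parse, response.urljoin(src) performs the same resolution relative to the response's own URL.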

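Request the remaining pages with parse as the callback, so that multiple pages are scraped

        # Runs once per response (outside the image loop) rather than once per image;
        # Scrapy's duplicate filter drops requests it has already seen
        for i in range(2, 8):
            url = 'http://hunter-its.com/m/'+str(i)+'.html'
            print(url)
            yield Request(url, callback=self.parse)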
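
The page URLs follow a fixed pattern, so they can also be built up front. The sketch below is one possible arrangement, not the post's method: page_urls and BASE are hypothetical names, and the page range 2 to 7 is taken from range(2, 8) above.

```python
# Hypothetical helper; BASE mirrors the URL prefix used in the spider.
BASE = "http://hunter-its.com/m/"


def page_urls(first, last):
    """Return the listing-page URLs from `first` to `last`, inclusive."""
    return [f"{BASE}{i}.html" for i in range(first, last + 1)]


urls = page_urls(2, 7)
for url in urls:
    print(url)
```

Assigning page_urls(1, 7) to start_urls would let Scrapy schedule every page itself, in which case parse would not need to yield any Request objects.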

Complete code

# -*- coding: utf-8 -*-
import scrapy
from beauty.items import BeautyItem
from scrapy.http import Request


class HunterSpider(scrapy.Spider):
    name = 'hunter'
    allowed_domains = ['hunter-its.com']
    start_urls = ['http://hunter-its.com/m/1.html']

    def parse(self, response):
        # Select all of the image nodes
        pics = response.xpath('//div[@class="pic"]/ul/li')

        for pic in pics:
            item = BeautyItem()
            name = pic.xpath('./a/img/@alt').extract()[0]
            address = pic.xpath('./a/img/@src').extract()[0]

            item['name'] = name
            item['address'] = address

            yield item

        # Request pages 2-7 once per response rather than once per image;
        # Scrapy's duplicate filter drops requests it has already seen
        for i in range(2, 8):
            url = 'http://hunter-its.com/m/'+str(i)+'.html'
            print(url)
            yield Request(url, callback=self.parse)

5. Write the item-processing script pipelines.py and import the requests module

import requests

class BeautyPipeline(object):
    def process_item(self, item, spider):

        # Imitate a browser
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
        # Send a GET request with the requests module
        r = requests.get(url=item['address'], headers=headers, timeout=4)

        print(item['address'])
        # Download the image and store it in a local directory
        with open(r'/Users/vincentwen/Downloads/hunter/'+ item['name'] + '.jpg', 'wb') as f:
            f.write(r.content)

        # Return the item so that any later pipeline still receives it
        return item
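
The pipeline builds the file name directly from item['name'], which comes from the alt attribute. If that text can contain slashes, colons, or whitespace (an assumption about the site's markup), open() may fail or write to an unexpected location. A minimal sketch of a sanitizing helper; image_path is a hypothetical name, not part of Scrapy or requests:

```python
import re
from pathlib import Path


def image_path(directory, name, suffix=".jpg"):
    """Build a target path, replacing characters that are unsafe in file names."""
    # Collapse path separators, reserved punctuation, and whitespace into "_"
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    # Fall back to a fixed name if nothing usable remains
    return Path(directory) / f"{safe or 'unnamed'}{suffix}"


print(image_path("downloads", "a/b: c"))
```

The returned Path can be passed straight to open(..., 'wb') in place of the concatenated string.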

6. Edit ITEM_PIPELINES in settings.py

ITEM_PIPELINES = {
   'beauty.pipelines.BeautyPipeline': 100,
}
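
The integer assigned to each pipeline sets its position in the chain: items pass through lower-valued pipelines first, and values are customarily chosen in the 0-1000 range. The sketch below only illustrates that ordering. The second entry is hypothetical and does not exist in this project.

```python
# The second entry is hypothetical, added only to show how ordering works.
ITEM_PIPELINES = {
    "beauty.pipelines.HypotheticalExportPipeline": 300,
    "beauty.pipelines.BeautyPipeline": 100,
}

# Sort the class paths by their assigned value, lowest first
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
for path in run_order:
    print(ITEM_PIPELINES[path], path)
```

With a single pipeline, as in this project, the exact value has no effect on behavior.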

7. Run the spider

scrapy crawl hunter
