Minimal Scrapy Spider 4: Wrapping with Items

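Runtime environment:
* Python 2.7.12
* Scrapy 1.2.2
* Mac OS X 10.10.3 Yosemite

We continue crawling the practice site provided by the Scrapy 1.2.2 documentation:

"http://quotes.toscrape.com"

The site is intended for beginner scraping practice, so for now we can ignore the possibility of the crawler being blocked.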

Goal

Use items to wrap the content we scrape. Managing all of the content through items.py makes it easy to pass scraped content to pipelines for post-processing. It also decouples the content from the spider file (spider.py) and makes responsibilities clearer: the spider sends requests and parses pages, while items.py manages the scraped content.
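To make the hand-off to pipelines more concrete, here is a minimal sketch of what a post-processing pipeline could look like. The class name StripQuoteMarksPipeline and the cleaning rule are illustrative assumptions, not part of the original project; the sketch also assumes the quote text arrives wrapped in curly quotation marks. In a real project the class would live in pipelines.py and be enabled through the ITEM_PIPELINES setting.

```python
# -*- coding: utf-8 -*-


class StripQuoteMarksPipeline(object):
    """Hypothetical pipeline: trims whitespace and curly quote marks."""

    def process_item(self, item, spider):
        # Scrapy calls process_item() once for every item the spider yields.
        text = item.get('quote')
        if text:
            item['quote'] = text.strip().strip(u'\u201c\u201d')
        # Return the item so that any later pipeline can keep processing it.
        return item


# Quick check with a plain dict standing in for an item (placeholder data).
sample = {'quote': u' \u201cplaceholder text\u201d ', 'author': u'nobody', 'tags': []}
cleaned = StripQuoteMarksPipeline().process_item(sample, spider=None)
print(cleaned['quote'])
```

Because the pipeline only uses dict-style access, it works the same way whether the spider yields plain dicts or item objects.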

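Rewriting the first spider

Step 1: Declare the items

First, we rewrite the first spider.

The project directory contains a file named items.py. This is where items live, that is, where the scraped content is defined. In items.py we need to tell Scrapy the names of the fields we want to scrape; in other words, we need to declare the items.

Rewrite items.py as follows: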

import scrapy

class QuotesItem(scrapy.Item):
    quote = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()

This declares that the content we want to scrape is: quote, author, and tags.

Step 2: Import the class from items.py

Create a new spider file, quotes_2_4.py, and copy the contents of the first spider file into it:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes_2_1'
    start_urls = [
        'http://quotes.toscrape.com'
    ]
    allowed_domains = [
        'toscrape.com'
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'quote': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }

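First, change two things:

* Import the item class at the top of the file: from quotes_2.items import QuotesItem. Here quotes_2.items refers to the items file in the quotes_2 project, and import QuotesItem imports the QuotesItem class, which is the class in which we declared the items in the previous step. (If several classes are declared, from <project name>.items import * imports every class in items.py.)
* Change the spider name: name = 'quotes_2_4'.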

Step 3: Rewrite the parse() function

Next, replace the yield{} dictionary in parse(). The rewritten parse() function is:

    def parse(self, response):
        for quote in response.css('div.quote'):
            # Create one QuotesItem per quote block on the page.
            item = QuotesItem()
            item['quote'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            yield item

Specifically:

* Instantiate the item: item = QuotesItem().
* Assign values to the item's fields.
* yield item.

The scraped content is now wrapped in items, so the fields are managed in items.py as intended.

The final spider file is:

import scrapy
from quotes_2.items import QuotesItem

class QuotesSpider(scrapy.Spider):
    name = 'quotes_2_4'
    start_urls = [
        'http://quotes.toscrape.com',
    ]
    allowed_domains = [
        'toscrape.com',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            item = QuotesItem()
            item['quote'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            yield item

Running the spider

$ scrapy crawl quotes_2_4 -o results_2_4_01.json

This produces the same result as the first spider.
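To confirm that the output really carries the three declared fields, a small script along the following lines could load the JSON file and check each record. The function name check_results is a hypothetical helper, and the sketch assumes the file is the JSON array written by the -o option.

```python
import io
import json
import os

EXPECTED_FIELDS = {'quote', 'author', 'tags'}


def check_results(path):
    """Return the number of records, raising if a record lacks a declared field."""
    with io.open(path, encoding='utf-8') as f:
        records = json.load(f)
    for record in records:
        missing = EXPECTED_FIELDS - set(record)
        if missing:
            raise ValueError('record is missing fields: %s' % sorted(missing))
    return len(records)


# Only run the check if the crawl output is present in the current directory.
if os.path.exists('results_2_4_01.json'):
    print(check_results('results_2_4_01.json'))
```

Running it after the crawl prints the number of scraped quotes, or raises an error naming the missing fields.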
