1. Scrapy Crawlers, Static Page Scraping, Part 3: spider.py Exercises

Exercise 1: Scrape the content of a single page
URL: http://stackoverflow.com/questions?sort=votes

Note: run a standalone spider with `scrapy runspider stackoverflow.py`.
To write the scraped items to a file, use `scrapy runspider stackoverflow.py -o stackoverflow.csv`.

```python
# -*- coding: utf-8 -*-
import scrapy

class StackOverFlowSpider(scrapy.Spider):
    name = "stackoverflow"  # the name you refer to when running the spider in a project
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    # parse() is the default callback that parses each downloaded response
    def parse(self, response):
        for question in response.xpath('//div[@class="question-summary"]'):
            title = question.xpath('.//div[@class="summary"]/h3/a/text()').extract_first()
            # hrefs here are relative, so make them absolute against the page URL
            links = response.urljoin(question.xpath('.//div[@class="summary"]/h3/a/@href').extract_first())
            # default='' guards against extract_first() returning None before .strip()
            content = question.xpath('.//div[@class="excerpt"]/text()').extract_first(default='').strip()
            votes = question.xpath('.//span[@class="vote-count-post high-scored-post"]/strong/text()').extract_first()
            # votes = question.xpath('.//strong/text()').extract_first()  # looser alternative
            answers = question.xpath('.//div[@class="status answered-accepted"]/strong/text()').extract_first()

            yield {
                'title': title,
                'links': links,
                'content': content,
                'votes': votes,
                'answers': answers,
            }
```
The output written to the file looks like this:

![2](http://upload-images.jianshu.io/upload_images/5076126-29c8906471d5346a.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
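Besides `scrapy runspider`, a spider like this can also be driven from a plain Python script. The sketch below is not part of the original exercise; it assumes Scrapy ≥ 2.1 (for the `FEEDS` setting) and that `StackOverFlowSpider` from above is in scope:

```python
from scrapy.crawler import CrawlerProcess

# assumption: StackOverFlowSpider is the class defined above
process = CrawlerProcess(settings={
    # FEEDS replaces the older FEED_URI/FEED_FORMAT pair (Scrapy >= 2.1)
    'FEEDS': {'stackoverflow.csv': {'format': 'csv'}},
})
process.crawl(StackOverFlowSpider)
process.start()  # blocks until the crawl finishes
```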

**Exercise 2: Give the spider a list of URLs**
This is the page-number kind of pagination (i.e., you hand the spider a list of URLs to crawl): every page number appears in the URL, so the full list of pages can be generated up front.
URL: http://www.cnblogs.com/pick/#p1

![3](http://upload-images.jianshu.io/upload_images/5076126-0f1730c18b9ddb41.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

```python
# -*- coding: utf-8 -*-
import scrapy

class CnblogSpider(scrapy.Spider):
    name = "cnblogs"
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ["www.cnblogs.com"]
    start_urls = ['http://www.cnblogs.com/pick/#p%s' % p for p in range(1, 3)]

    def parse(self, response):
        for article in response.xpath('//div[@class="post_item"]'):
            title = article.xpath('.//div[@class="post_item_body"]/h3/a/text()').extract_first()
            # if a link is relative, complete it with response.urljoin()
            # (see the urljoin sketch after this block)
            title_link = article.xpath('.//div[@class="post_item_body"]/h3/a/@href').extract_first()
            content = article.xpath('.//p[@class="post_item_summary"]/text()').extract_first()
            author = article.xpath('.//div[@class="post_item_foot"]/a/text()').extract_first()
            author_link = article.xpath('.//div[@class="post_item_foot"]/a/@href').extract_first()
            # default='' guards against extract_first() returning None before .strip()
            comment = article.xpath('.//span[@class="article_comment"]/a/text()').extract_first(default='').strip()
            view = article.xpath('.//span[@class="article_view"]/a/text()').extract_first()

            print(title)
            print(title_link)
            print(content)
            print(author)
            print(author_link)
            print(comment)
            print(view)

            yield {
                'title': title,
                'title_link': title_link,
                'content': content,
                'author': author,
                'author_link': author_link,
                'comment': comment,
                'view': view,
            }
```
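A note on the `response.urljoin()` comment above: Scrapy's `Response.urljoin(href)` is simply `urllib.parse.urljoin(response.url, href)`, so a relative href is resolved against the URL of the page it was found on. A small stand-alone sketch with hypothetical paths:

```python
from urllib.parse import urljoin  # response.urljoin(href) == urljoin(response.url, href)

base = 'http://www.cnblogs.com/pick/'
print(urljoin(base, '/u/12345'))            # -> http://www.cnblogs.com/u/12345
print(urljoin(base, 'article.html'))        # -> http://www.cnblogs.com/pick/article.html
print(urljoin(base, 'http://example.com'))  # absolute URLs pass through unchanged
```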
The output written to the file looks like this (screenshots omitted):

Key technique: put an attribute predicate on the outermost element for precise, id-like targeting; on nested child tags a predicate is often unnecessary, and whether you select an attribute or text() depends on whether you need the attribute value or the text content.
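As a concrete (hypothetical) illustration of that tip, using the cnblogs markup from Exercise 2 as if inside a `parse()` callback:

```python
# precise, id-like targeting: the attribute predicate pins down the outer node
item = response.xpath('//div[@class="post_item_body"]')

# child tags often need no predicate; pick @href or text() depending on
# whether you want the attribute value or the text content
href = item.xpath('.//h3/a/@href').extract_first()   # attribute value
text = item.xpath('.//h3/a/text()').extract_first()  # text content
```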

Exercise 3: Pagination again, but this time there is only a "next" link, for when the URL contains no page numbers such as 1, 2, and so on.

```python
# -*- coding: utf-8 -*-
import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quote'
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.xpath('//div[@class="quote"]'):
            content = quote.xpath('.//span[@class="text"]/text()').extract_first()
            author = quote.xpath('.//small[@class="author"]/text()').extract_first()

            yield {
                'content': content,
                'author': author,
            }

        # extract the link to the next page
        next_page = response.xpath('//li[@class="next"]/a/@href').extract_first()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            # request the next page, reusing parse() as the callback
            yield scrapy.Request(next_page, callback=self.parse)
```
To move on to the next page, next_page holds the extracted link and a new Request is yielded with parse() itself as the callback, so each response is handled by the same method and the crawl proceeds recursively until no "next" link remains.
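On Scrapy 1.4 and later, the same tail of `parse()` can be written with `response.follow()`, which accepts the relative href directly and performs the urljoin step itself; a minimal sketch:

```python
# inside parse(), after the for-loop over quotes
next_page = response.xpath('//li[@class="next"]/a/@href').extract_first()
if next_page is not None:
    # response.follow() resolves the relative link itself (Scrapy >= 1.4)
    yield response.follow(next_page, callback=self.parse)
```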



