
Environment:
* Python 2.7.12  
* Scrapy 1.2.2
* Mac OS X 10.10.3 Yosemite

The Scrapy 1.2.2 documentation provides a site for practice:

"http://quotes.toscrape.com"

It is meant for beginner scraping practice, so you don't need to worry about the spider getting blocked for now.

Goal

Scrape the quotes, authors and tags from the site.

Complete code

Step 1: Create the project

In the directory where you want to keep the project, run the following on the command line:

scrapy startproject quotes_2

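Here scrapy startproject is the command and quotes_2 is the project name. You can choose the name freely, but Scrapy requires it to be a valid Python module name: letters, digits and underscores, not starting with a digit.

Step 2: Write the spider

To start, aim for one small goal: scrape only the first page.

The project contains a spiders folder (in this example /quotes_2/quotes_2/spiders/). Create a new spider file quotes_2_1.py there with the following content: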

```
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes_2_1'
    start_urls = [
        'http://quotes.toscrape.com'
    ]
    allowed_domains = [
        'toscrape.com'
    ]

    def parse(self, response):
        # Each quote on the page sits in its own <div class="quote">
        for quote in response.css('div.quote'):
            yield {
                'quote': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                # ::text keeps only the link text (e.g. "humor"), not the whole <a> element
                'tags': quote.css('div.tags a.tag::text').extract(),
            }
```

Analysis

import scrapy

Import the scrapy package.
The three essentials: name, start_urls, parse()
```
class QuotesSpider(scrapy.Spider):
  name = 'quotes_2_1'
  start_urls = [
      'http://quotes.toscrape.com'
  ]

  def parse(self,response):
```
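A spider needs these three items.
- name: the spider's name, given as a string. Single and double quotes are equivalent in Python, so either works here.
- start_urls: the starting URLs. It is a list, so the square brackets are required.
- parse(): the parsing function that handles the response returned by the server. Its parameters are (self, response). For crawling a whole site, a CrawlSpider with rules can be used instead.
In addition, allowed_domains limits which domains may be crawled. Use this optional attribute if you don't want the spider to follow links to external sites.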
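The check implied by allowed_domains can be sketched in plain Python using only the standard library. The sketch is written for Python 3, not the 2.7 used above. A URL is kept when its host equals an allowed domain or is a subdomain of one. This is a simplified approximation of Scrapy's offsite filtering, not Scrapy's own code, and the function name is_allowed is hypothetical.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Hypothetical helper: True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # Exact match, or a subdomain such as quotes.toscrape.com for toscrape.com
        if host == domain or host.endswith("." + domain):
            return True
    return False


allowed = ["toscrape.com"]
print(is_allowed("http://quotes.toscrape.com/page/2/", allowed))  # True
print(is_allowed("http://example.com/", allowed))                 # False
print(is_allowed("http://nottoscrape.com/", allowed))             # False: the "." blocks a bare suffix match
```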
The parse() function
```
      for quote in response.css('div.quote'):
```
response.css is Scrapy's CSS selector. You put a condition in the parentheses to locate the content you want to scrape. Here 'div.quote' means "every div whose class is quote".

The page source shows that each quote sits inside a div with class "quote":
```
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“A day without sunshine is like, you know, night.”</span>
        <span>by <small class="author" itemprop="author">Steve Martin</small>
        <a href="/author/Steve-Martin">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" itemprop="keywords" content="humor,obvious,simile"> 
            
            <a class="tag" href="/tag/humor/page/1/">humor</a>
            
            <a class="tag" href="/tag/obvious/page/1/">obvious</a>
            
            <a class="tag" href="/tag/simile/page/1/">simile</a>
            
        </div>
</div>
```
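You can check what the three selectors pick out of this markup without running Scrapy. Below is a rough stand-in built on Python 3's standard-library html.parser. It is a swapped-in technique, not how Scrapy's selectors work internally. The hypothetical QuoteParser class walks a trimmed copy of the sample above. For each div with class "quote", it records the text of span.text, small.author and every a.tag. Because it collects only text nodes, its result corresponds to the ::text versions of the selectors.

```python
from html.parser import HTMLParser

# Trimmed copy of the sample markup above (itemprop/meta attributes removed)
SAMPLE = """
<div class="quote">
    <span class="text">\u201cA day without sunshine is like, you know, night.\u201d</span>
    <span>by <small class="author">Steve Martin</small>
    <a href="/author/Steve-Martin">(about)</a>
    </span>
    <div class="tags">
        Tags:
        <a class="tag" href="/tag/humor/page/1/">humor</a>
        <a class="tag" href="/tag/obvious/page/1/">obvious</a>
        <a class="tag" href="/tag/simile/page/1/">simile</a>
    </div>
</div>
"""


class QuoteParser(HTMLParser):
    """Hypothetical stdlib stand-in for the div.quote / span.text / small.author / a.tag selectors."""

    def __init__(self):
        super().__init__()
        self.items = []       # finished quotes, one dict each
        self.current = None   # dict for the div.quote being read
        self.depth = 0        # <div> nesting depth inside the current div.quote
        self.field = None     # field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.current is None:
            if tag == "div" and "quote" in classes:
                self.current = {"quote": None, "author": None, "tags": []}
                self.depth = 1
            return
        if tag == "div":
            self.depth += 1
        elif tag == "span" and "text" in classes:
            self.field = "quote"
        elif tag == "small" and "author" in classes:
            self.field = "author"
        elif tag == "a" and "tag" in classes:
            self.field = "tags"

    def handle_endtag(self, tag):
        self.field = None
        if self.current is not None and tag == "div":
            self.depth -= 1
            if self.depth == 0:  # the outer div.quote just closed
                self.items.append(self.current)
                self.current = None

    def handle_data(self, data):
        if self.current is None or self.field is None:
            return
        if self.field == "tags":
            self.current["tags"].append(data.strip())
        elif self.current[self.field] is None:  # keep only the first text node
            self.current[self.field] = data.strip()


parser = QuoteParser()
parser.feed(SAMPLE)
parser.close()
print(parser.items)
```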
Once each quote is located, a Python for … in … loop can iterate over them.
```
          yield{
              'quote': quote.css('span.text::text').extract_first(),
              'author': quote.css('small.author::text').extract_first(),
              'tags': quote.css('div.tags a.tag::text').extract(),
          }
```
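For each quote, yield {} produces the fields we need. The syntax is that of a dictionary, so entries are separated by commas. Each field still has to be located, but because we are inside the loop, quote.css() searches only within the current quote before the value is extracted.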
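yield is used rather than return because parse() is a generator. Scrapy iterates over whatever it yields and treats each dict as one scraped item. The plain-Python sketch below shows the same pattern. No Scrapy is involved, the rows are hard-coded from the sample output, and this hypothetical parse() takes rows rather than a response.

```python
def parse(rows):
    """Hypothetical stand-in: yield one dict per (text, author, tags) row, like one item per quote."""
    for text, author, tags in rows:
        yield {
            'quote': text,
            'author': author,
            'tags': tags,
        }


rows = [
    ("\u201cA day without sunshine is like, you know, night.\u201d",
     "Steve Martin",
     ["humor", "obvious", "simile"]),
]

items = parse(rows)   # a generator object: the loop body has not run yet
for item in items:    # the consumer (Scrapy's engine, in the real spider) pulls items one by one
    print(item['author'], item['tags'])
```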

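There are two key points:
- ::text: a pseudo-element supported by Scrapy's CSS selectors (not part of standard CSS) that selects the text content of the element.
- .extract(): extracts all matches as a list. To take only the first match, use .extract_first() or .extract()[0]. The former is recommended. When nothing matches, .extract_first() returns None, whereas .extract()[0] raises an IndexError.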
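The difference between the two ways of taking the first match can be mimicked with an ordinary list. The extract_first() function below is a hypothetical helper that mirrors the behaviour described above; it is not Scrapy's implementation. Scrapy's own .extract_first() also accepts an optional default argument.

```python
def extract_first(values, default=None):
    """Hypothetical helper mirroring .extract_first(): first match, or `default` if nothing matched."""
    return values[0] if values else default


matches = ["humor", "obvious", "simile"]  # e.g. what a tag selector's .extract() could return
no_matches = []                           # .extract() on a selector that matched nothing

print(extract_first(matches))     # humor
print(extract_first(no_matches))  # None

try:
    no_matches[0]                 # the .extract()[0] style
except IndexError as exc:
    print("IndexError:", exc)     # IndexError: list index out of range
```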

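Step 3: Run the spider

Open a command line and go to the project directory (here /quotes_2/), then:

List the spiders: run scrapy list. If the spider code is fine, this prints the spider's name (name). If something is wrong, it reports an error.
Run the spider: scrapy crawl quotes_2_1 -o result_2_1_01.json produces the following result (middle part omitted):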
```
[
{"quote": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d", "author": "Albert Einstein", "tags": ["change", "deep-thoughts", "thinking", "world"]},
......
{"quote": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin", "tags": ["humor", "obvious", "simile"]}
]
```
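Where:

scrapy crawl: the command that runs a spider.

quotes_2_1: the spider's name, as set by name = 'quotes_2_1' in quotes_2_1.py.

-o result_2_1_01.json: writes the output to a JSON file. -o is optional. The file name result_2_1_01.json can be anything you like, and the format is typically json, jl or csv.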
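To illustrate the json and jl formats, the sketch below uses Python's standard json module, not Scrapy's exporters, to serialize two items from the output above both ways. A .json file holds a single array. A .jl (JSON Lines) file puts one object per line, which makes it easy to append to or read line by line. json.dumps escapes non-ASCII characters by default, which matches the \u201c ... \u201d escapes seen in result_2_1_01.json.

```python
import json

# Two items taken from the sample output above
items = [
    {"quote": "\u201cThe world as we have created it is a process of our thinking. "
              "It cannot be changed without changing our thinking.\u201d",
     "author": "Albert Einstein",
     "tags": ["change", "deep-thoughts", "thinking", "world"]},
    {"quote": "\u201cA day without sunshine is like, you know, night.\u201d",
     "author": "Steve Martin",
     "tags": ["humor", "obvious", "simile"]},
]

# .json style: the whole result is one JSON array
as_json = json.dumps(items)

# .jl (JSON Lines) style: one JSON object per line
as_jl = "\n".join(json.dumps(item) for item in items)

# ensure_ascii=True (the default) turns the curly quotes into \u201c ... \u201d escapes
print(as_jl)

# Reading each format back
from_json = json.loads(as_json)
from_jl = [json.loads(line) for line in as_jl.splitlines()]
print(from_json == items, from_jl == items)  # True True
```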

Minimal Scrapy Spider 1: Crawling a Single Page
