Architecture overview

<h3>Architecture overview</h3>
This document describes the architecture of Scrapy and how its components interact.


<h3>Data flow</h3>

The data flow in Scrapy is controlled by the execution engine and goes like this:

  1. The Engine gets the initial requests to crawl from the Spider.
  2. The Engine schedules the requests in the Scheduler and asks for the next requests to crawl.
  3. The Scheduler returns the next requests to the Engine.
  4. The Engine sends the requests to the Downloader, passing through the Downloader Middlewares (see process_request()).
  5. Once the page finishes downloading, the Downloader generates a response (with that page) and sends it to the Engine, passing through the Downloader Middlewares (see process_response()).
  6. The Engine receives the response from the Downloader and sends it to the Spider for processing, passing through the Spider Middleware (see process_spider_input()).
  7. The Spider processes the response and returns the scraped items and new requests to the Engine, passing through the Spider Middleware (see process_spider_output()).
  8. The Engine sends the processed items to the Item Pipelines, then sends the processed requests to the Scheduler and asks for the next requests to crawl.
  9. The process repeats (from step 1) until there are no more requests from the Scheduler.
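
The loop described in the steps above can be sketched as a small, synchronous toy model. This is not how Scrapy is implemented (the real engine is asynchronous and the middlewares are left out here); every name below (PAGES, ToyScheduler, toy_download, toy_spider_parse, run_engine) is a hypothetical stand-in chosen for illustration, and the "pages" are an in-memory dictionary rather than real downloads.

```python
from collections import deque

# Fake "web": URL -> page data. Purely illustrative, nothing is downloaded.
PAGES = {
    "http://example.com/": {"title": "Home", "links": ["http://example.com/a"]},
    "http://example.com/a": {"title": "Page A", "links": []},
}


class ToyScheduler:
    """Hypothetical FIFO queue standing in for the Scheduler."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, url):
        self.queue.append(url)

    def next_request(self):
        # Return the next queued URL, or None when the queue is empty.
        return self.queue.popleft() if self.queue else None


def toy_download(url):
    # Steps 4-5: "download" the page and build a response for it.
    return {"url": url, **PAGES[url]}


def toy_spider_parse(response):
    # Step 7: yield one scraped item, then any new requests to follow.
    yield ("item", {"url": response["url"], "title": response["title"]})
    for link in response["links"]:
        yield ("request", link)


def run_engine(start_urls):
    scheduler = ToyScheduler()
    collected = []  # stands in for the Item Pipelines
    for url in start_urls:  # steps 1-2: initial requests get scheduled
        scheduler.enqueue(url)
    while True:
        url = scheduler.next_request()  # step 3
        if url is None:  # step 9: no more requests, so stop
            break
        response = toy_download(url)  # steps 4-6
        for kind, value in toy_spider_parse(response):
            if kind == "item":
                collected.append(value)  # step 8: items go to the pipelines
            else:
                scheduler.enqueue(value)  # step 8: requests go to the scheduler
    return collected


items = run_engine(["http://example.com/"])
print(items)
```

The toy pages contain no link cycles, so the loop terminates on its own; a real crawl also needs some way to avoid revisiting the same URL.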

<h3>Components</h3>
<h4>Scrapy Engine</h4>
The engine is responsible for controlling the data flow between all components of the system and for triggering events when certain actions occur. See the data flow above for more details.
<h4>Scheduler</h4>
The Scheduler receives requests from the engine and enqueues them, then feeds them back to the engine later when the engine asks for them.
<h4>Downloader</h4>
The Downloader is responsible for fetching web pages and feeding them to the engine, which in turn feeds them to the spiders.
<h4>Spiders</h4>
Spiders are custom classes written by Scrapy users to parse responses and extract items from them, or additional requests to follow.
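
To make "parse responses and extract items or additional requests" concrete, here is a minimal sketch of what a spider callback does, written with the standard library only so it runs without Scrapy installed. In Scrapy itself this logic would live in a method of a Spider subclass that receives a response object; here a plain URL and HTML string stand in for the response, and the names parse, LinkAndTitleParser, and the "type" key on the yielded dictionaries are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every <a href=...> value from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse(url, html):
    """Yield one item for the page, then one follow-up request per link."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    yield {"type": "item", "url": url, "title": parser.title.strip()}
    for href in parser.links:
        # Resolve relative links against the URL of the page they came from.
        yield {"type": "request", "url": urljoin(url, href)}


results = list(parse(
    "http://example.com/index.html",
    "<html><head><title> Demo </title></head>"
    "<body><a href='/next'>next</a></body></html>",
))
print(results)
```

The callback is a generator because a single page can produce any mix of items and follow-up requests, and the caller decides where each one goes.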
<h4>Item Pipeline</h4>
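The Item Pipeline is responsible for processing items once the spiders have extracted them. Typical tasks include cleansing, validation, and persistence.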
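
The three typical pipeline tasks (cleansing, validation, persistence) can be sketched in one small class. Scrapy pipelines are plain classes with a process_item(item, spider) method, and items are usually discarded by raising scrapy.exceptions.DropItem; to keep this sketch runnable without Scrapy, a local DropItem stand-in is defined instead. The class name CleanValidateStorePipeline, the "title" field, and the in-memory list used as storage are assumptions for illustration only.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (assumption for this sketch)."""


class CleanValidateStorePipeline:
    def __init__(self):
        # Stand-in for a database or file; a real pipeline would persist elsewhere.
        self.stored = []

    def process_item(self, item, spider):
        # Cleansing: trim surrounding whitespace from every string field.
        cleaned = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in item.items()
        }
        # Validation: reject items with a missing or empty title.
        if not cleaned.get("title"):
            raise DropItem("missing title")
        # Persistence: keep the cleaned item.
        self.stored.append(cleaned)
        return cleaned


pipeline = CleanValidateStorePipeline()
kept = pipeline.process_item({"title": "  Demo  ", "price": 3}, spider=None)

try:
    pipeline.process_item({"title": "   "}, spider=None)
    dropped = False
except DropItem:
    dropped = True

print(kept, dropped, pipeline.stored)
```

Returning the cleaned item lets a later pipeline stage keep working on it, while raising the exception stops processing for that item.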
