- Run a Scrapy spider
scrapy crawl kaili_spider
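- It is best to always indent code with spaces rather than tabs
- Methods on a Scrapy spider class take self as their first parameter, as with any Python instance method
- Write the items Scrapy scrapes to a file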
yield item
scrapy crawl kaili_spider -o kaili_spider.json
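- In a CrawlSpider, do not use parse as the callback that returns items, because CrawlSpider uses parse internally; a separately named callback such as parse_item can return a list of items. (In a plain Spider, any callback, including parse, may return an iterable of items.)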
- Write Scrapy logs to a file
scrapy crawl tencent_crawl --logfile 'ten.log' -L INFO
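
The same two options can also be set once in the project's settings.py instead of on every command line. A small sketch, reusing the file name and level from the command above:

```python
# settings.py (sketch): logging options as project settings
LOG_FILE = "ten.log"   # equivalent to --logfile 'ten.log'
LOG_LEVEL = "INFO"     # equivalent to -L INFO
```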
- The Scrapy scheduler handles its request queue in last-in, first-out (LIFO) order by default, so the most recently queued request is the next one handed out (Zhihu really is a great site)
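
If LIFO (effectively depth-first) order is not wanted, the Scrapy FAQ describes switching to FIFO (breadth-first) order with three settings. A sketch of what that looks like in settings.py:

```python
# settings.py (sketch): switch from the default LIFO order to FIFO order
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```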
- To keep calling xpath on the objects that xpath returns, do not call extract on them first
- Answers to common Python questions
- Setting up scheduled tasks for Scrapy
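
One possible way to schedule crawls is to let APScheduler launch `scrapy crawl` in a child process at a fixed interval. This is only a sketch, assuming APScheduler 3.x; the function names and the one-hour interval are illustrative choices, not part of the original notes.

```python
import subprocess


def crawl_command(spider_name):
    # Build the same command line used to run a spider by hand.
    return ["scrapy", "crawl", spider_name]


def run_crawl(spider_name):
    # A child process gives every run a fresh Twisted reactor.
    subprocess.run(crawl_command(spider_name), check=False)


def start_scheduler(spider_name="kaili_spider", hours=1):
    # Imported here so the helpers above work without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    scheduler.add_job(run_crawl, "interval", hours=hours, args=[spider_name])
    scheduler.start()  # blocks until the process is interrupted
```

Calling `start_scheduler()` from the project root directory would then run the spider once per hour until the process is stopped.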
- Configuring a pipeline for a Scrapy spider
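
As a rough sketch of what the configuration involves: a pipeline is a class with a `process_item(self, item, spider)` method, and it is switched on through the `ITEM_PIPELINES` setting. The class name and the `myproject` package name below are placeholders.

```python
# pipelines.py (sketch)
class StripWhitespacePipeline:
    def process_item(self, item, spider):
        # Strip surrounding whitespace from every string field.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item passes it on to the next pipeline.
        return item


# settings.py (sketch): enable the pipeline; lower numbers run first.
# "myproject" is a placeholder for the real project package name.
ITEM_PIPELINES = {
    "myproject.pipelines.StripWhitespacePipeline": 300,
}
```

To enable a pipeline for one spider only, the same dictionary can be placed under `ITEM_PIPELINES` in that spider's `custom_settings` class attribute.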
- deploy spider to scrapyd
python c:\Python27\Scripts\scrapyd-deploy <target> -p <project>
<target>: the name that follows [deploy: in scrapy.cfg, that is, the section [deploy:<target>]
<project>: the project name
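
To make the mapping between scrapy.cfg and the two arguments concrete, the sketch below parses a made-up scrapy.cfg with Python's configparser and lists the deploy targets it defines. The target name `local`, the URL, and the project name `myproject` are assumptions for illustration.

```python
import configparser

# Hypothetical scrapy.cfg content; "local" and "myproject" are placeholders.
SCRAPY_CFG = """
[settings]
default = myproject.settings

[deploy:local]
url = http://localhost:6800/
project = myproject
"""

parser = configparser.ConfigParser()
parser.read_string(SCRAPY_CFG)

# Every section named [deploy:<target>] defines one deploy target.
targets = {
    section.split(":", 1)[1]: dict(parser[section])
    for section in parser.sections()
    if section.startswith("deploy:")
}
```

With a file like this, the deploy command would take `local` as <target> and `myproject` as <project>.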
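- The scrapyd command has to be run from the project root directory in order to start (drawback: it cannot run jobs on a schedule)
- With apscheduler, RotatingFileHandler can be used to split the log by file size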
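
A minimal sketch of that setup: APScheduler logs under the `apscheduler` logger name, so a RotatingFileHandler attached to that logger rotates its output by size. The function name, size limit, and backup count are illustrative defaults.

```python
import logging
from logging.handlers import RotatingFileHandler


def add_rotating_log(path, max_bytes=1_000_000, backups=3):
    # Start a new file once the current one reaches max_bytes,
    # keeping at most `backups` old files (path.1, path.2, ...).
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    # APScheduler logs under the "apscheduler" logger name.
    logger = logging.getLogger("apscheduler")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return handler
```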
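- logger.exception logs the error together with its stack trace
import logging
logger = logging.getLogger(__name__)
try:
    ...
except Exception:
    # Logs the message at ERROR level and appends the traceback.
    logger.exception('error')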
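- Passing dont_filter=True to scrapy.Request allows a URL to be requested again (especially useful for retrying after a failed login); by default Scrapy visits each URL only once. (For very specific problems you still have to read the official documentation, even if it is in English!)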