Author: 黃成 Date: April 9, 2018
1. Install hanziconv
Install a package that converts between Simplified and Traditional Chinese:
$ pip install hanziconv
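To confirm the installation worked, a quick round trip through the two conversion functions can help. This is only a minimal sketch: check_hanziconv is a hypothetical helper name, the sample text is arbitrary, and the guarded import is there so the snippet reports a missing package instead of crashing.

```python
try:
    from hanziconv import HanziConv
except ImportError:  # hanziconv is not installed in this environment
    HanziConv = None


def check_hanziconv(sample="汉字"):
    """Return (traditional, simplified) for sample, or None if hanziconv is missing."""
    if HanziConv is None:
        return None
    traditional = HanziConv.toTraditional(sample)
    simplified = HanziConv.toSimplified(traditional)
    return traditional, simplified


if __name__ == "__main__":
    print(check_hanziconv())
```

If this prints None, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.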
2. Define a custom item pipeline
- Find the pipelines.py file in your project
- Add the custom pipeline:
from hanziconv import HanziConv


class HanziconvPipeline(object):
    def process_item(self, item, spider):
        project_info = item['project_info']
        for key, value in project_info.items():
            if value is None:
                # A None value is initialized to an empty string
                project_info[key] = ""
            elif isinstance(value, str):
                # Text values are converted to Traditional Chinese
                project_info[key] = HanziConv.toTraditional(value)
            # Non-text values are left unchanged
        return item

This code comes from my own project: if a value is a text string (str in Python 3), it is converted to Traditional Chinese.
To convert Traditional to Simplified instead, replace toTraditional with toSimplified.
3. Configure the project pipeline
- Find ITEM_PIPELINES in settings.py
- Add the custom pipeline:
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 400,
    '<project_name>.pipelines.HanziconvPipeline': 300,
}
:warning: <project_name> must be replaced manually with the name of your own project!
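The integers in ITEM_PIPELINES decide the order in which pipelines run: Scrapy passes each item through them from the lowest value to the highest. With the values above, HanziconvPipeline (300) runs before RedisPipeline (400), so the text is presumably already converted by the time it is pushed to Redis. The sketch below uses plain Python rather than any Scrapy API to make that order visible; execution_order is a hypothetical helper name.

```python
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 400,
    "<project_name>.pipelines.HanziconvPipeline": 300,
}


def execution_order(pipelines):
    """Return the pipeline paths sorted by their value, lowest first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


order = execution_order(ITEM_PIPELINES)
for position, path in enumerate(order, start=1):
    print(position, path)
```

If the conversion pipeline were given a value above 400, the Redis pipeline would see the item first and store the unconverted text.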
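As a closing sketch for the pipeline in step 2: the per-field logic can be pulled into a small helper that takes the conversion function as a parameter, so switching direction means passing HanziConv.toSimplified instead of HanziConv.toTraditional. The helper name convert_text_fields is an assumption, not part of the original project, and the code assumes Python 3, where text values are str. The demo uses str.upper as a stand-in converter so it runs without hanziconv installed.

```python
def convert_text_fields(fields, convert):
    """Return a copy of fields with str values converted and None replaced by ""."""
    result = {}
    for key, value in fields.items():
        if value is None:
            result[key] = ""  # missing values become empty strings
        elif isinstance(value, str):
            result[key] = convert(value)  # only text is converted
        else:
            result[key] = value  # numbers and other types are left untouched
    return result


if __name__ == "__main__":
    # Stand-in converter; in a real project pass HanziConv.toTraditional
    # or HanziConv.toSimplified here.
    sample = {"name": "abc", "floors": 12, "note": None}
    print(convert_text_fields(sample, str.upper))
```

Inside process_item, the call would look like item['project_info'] = convert_text_fields(item['project_info'], HanziConv.toTraditional). Building a new dict also avoids modifying project_info while iterating over it.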