When using the CARD database with metagenomic data, you also need to download the WildCARD data. After downloading, the data must be preprocessed before it can be used.
Preprocessing failed with the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
wget -O wildcard_data.tar.bz2 https://card.mcmaster.ca/latest/variants
mkdir -p wildcard
tar -xvf wildcard_data.tar.bz2 -C wildcard
rgi wildcard_annotation -i wildcard --card_json ./card.json -v 3.0.8 > wildcard_annotation.log 2>&1
- The command fails, and the log contains the following error:
ERROR: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1:
invalid start byte in codecs.py line 322
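- After some searching, this turned out not to be a genuine character-encoding problem: gzip files start with the magic bytes 0x1f 0x8b, so a 0x8b at position 1 means the file being read as UTF-8 text is still gzip-compressed.
- Inside the wildcard folder, the data files (FASTA data included) are indeed still compressed, for example card-genomes.txt.gz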
- Solution: decompress the files, then return to the original directory and rerun the annotation:
cd wildcard
gunzip *.gz
cd ..
rgi wildcard_annotation -i wildcard --card_json ./card.json -v 3.0.8 > wildcard_annotation.log 2>&1
- WELL DONE
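
The diagnosis above (byte 0x8b at position 1) can also be checked directly before running rgi. The following is a minimal sketch, assuming bash with the standard `head`, `od`, `tr`, and `gunzip` tools; the function names `is_gzip` and `gunzip_dir` are hypothetical helpers written for this note and are not part of RGI.

```shell
#!/usr/bin/env bash

# is_gzip FILE (hypothetical helper): succeed if FILE starts with the
# gzip magic bytes 0x1f 0x8b, i.e. it is still compressed.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# gunzip_dir DIR (hypothetical helper): decompress every *.gz file in DIR
# that really contains gzip data, and report any that does not.
gunzip_dir() {
    local f
    for f in "$1"/*.gz; do
        [ -f "$f" ] || continue   # pattern did not match any file
        if is_gzip "$f"; then
            gunzip -- "$f"
        else
            echo "skipping $f: not gzip data" >&2
        fi
    done
}

# Example use from the directory that contains the wildcard folder:
#   gunzip_dir wildcard
```

Running `gunzip_dir wildcard` from the parent directory should have the same effect as the `cd wildcard; gunzip *.gz; cd ..` sequence, without changing the working directory.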