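Assembling Chloroplast Genomes with GetOrganelle

GetOrganelle is a plastome assembly tool developed by Jian-Jun Jin and Wen-Bin Yu of the Kunming Institute of Botany, Chinese Academy of Sciences. It is mainly used to assemble complete organelle genomes from genome sequencing data, and it is particularly good at plant plastid genomes.

The software it calls includes SPAdes, Bowtie2, BLAST+, and Bandage. See the official GetOrganelle site for more details.

Installation

I am not used to installing with conda, so I followed the non-conda installation procedure:

## Download the GetOrganelle package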

curl -L https://github.com/Kinggerm/GetOrganelle/archive/1.7.4.1.tar.gz | tar zx

## Download the dependencies

curl -L https://github.com/Kinggerm/GetOrganelleDep/releases/download/v1.7.0/v1.7.0-linux.tar.gz | tar zx

The dependencies are SPAdes, Bowtie2, and BLAST.

## Try installing:

cd GetOrganelle-1.7.4.1

python setup.py install

This produced the following error:

The following error occurred while trying to add or remove files in the installation directory:

[Errno 13] Permission denied: '/build/Cellar/anaconda2/lib/python2.7/site-packages/test-easy-install-367240.write-test'

The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was: /build/Cellar/anaconda2/lib/python2.7/site-packages/

## No write permission in the default directory, so install into my own folder instead:

python setup.py install --prefix=/my/file

This produced the following error:

error: bad install directory or PYTHONPATH

* You can choose a different installation directory, i.e., one that is on PYTHONPATH or supports .pth files

* You can add the installation directory to the PYTHONPATH environment variable. (It must then also be on PYTHONPATH whenever you run Python and want to use the package(s) you are installing.)

* You can set up the installation directory to support ".pth" files by using one of the approaches described here:

https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations

Add the installation directory to the PYTHONPATH environment variable:

export PYTHONPATH="$PYTHONPATH:/my/file/"

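Then install again:

python setup.py install --prefix=/my/file

This completed without problems. Afterwards, remember to add the bin directories of the dependencies and of GetOrganelle itself to your .bashrc file.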
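The .bashrc step might look like the following sketch. The directory names are assumptions: GETORG_HOME is a placeholder, and the two sub-paths simply mirror the archive name downloaded earlier and the SPAdes path that appears in the error log further down. `path_prepend` is a hypothetical helper, not part of GetOrganelle.

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is already there.
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present, leave PATH unchanged
        *) PATH="$1:${PATH}" ;;
    esac
}

# Placeholder locations; replace with the real install directories.
GETORG_HOME="$HOME/software/GetOrganelle"
path_prepend "$GETORG_HOME/GetOrganelle-1.7.4.1/bin"
path_prepend "$GETORG_HOME/GetOrganelleDep/linux/SPAdes/bin"
export PATH
```

Guarding against duplicates keeps PATH from growing each time .bashrc is sourced again.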

Test run

# Download the example files:

## Download the reference sequence databases:

get_organelle_config.py --add embplant_pt,embplant_mt

## Download the resequencing data (fq files):

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz

## Assemble the chloroplast genome

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10

Parameter details:

# -1 Arabidopsis_simulated.1.fq.gz Input file with the forward paired-end reads (*.fq/.gz/.tar.gz)

# -2 Arabidopsis_simulated.2.fq.gz Input file with the reverse paired-end reads (*.fq/.gz/.tar.gz)

# -t 1 Maximum threads to use. Default: 1

# -o Arabidopsis_simulated.plastome Output directory

# -F embplant_pt Target organelle genome type(s)

# -R 10 Maximum extension rounds
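To reuse the same flags on other samples, the call can be assembled from variables. This is only a sketch: `build_getorg_cmd` is a hypothetical wrapper, it uses only the flags listed above, and it prints the command instead of running it so the line can be checked first.

```shell
# Hypothetical wrapper: print the get_organelle_from_reads.py call for one sample.
# Arguments: forward reads, reverse reads, output dir, threads, max rounds.
build_getorg_cmd() {
    fq1="$1"; fq2="$2"; outdir="$3"
    threads="${4:-1}"     # -t, defaults to 1 as in the help text above
    rounds="${5:-10}"     # -R, 10 is the value used in this walkthrough
    printf '%s\n' "get_organelle_from_reads.py -1 $fq1 -2 $fq2 -t $threads -o $outdir -F embplant_pt -R $rounds"
}

build_getorg_cmd Arabidopsis_simulated.1.fq.gz Arabidopsis_simulated.2.fq.gz Arabidopsis_simulated.plastome
```

Once the printed line looks right, it can be pasted into the terminal. File names containing spaces would need extra quoting, which this sketch does not handle.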

The assembly failed with errors:

......

2024-06-08 19:04:25,434 - ERROR: sympy/scipy not available! Disentangling disabled!!

......

2024-06-08 17:47:03,893 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq.spades/K45/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 17:47:03,894 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.

......

2024-06-08 17:47:17,892 - WARNING: Compression after read correction will be skipped for lack of 'pigz'

2024-06-08 17:47:17,893 - INFO: spades.py -t 1 --disable-gzip-output --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades

2024-06-08 17:47:18,805 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-hammer', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 17:47:18,806 - ERROR: Assembling failed.
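With this much output, it can help to pull out only the ERROR and WARNING lines. The sketch below assumes the run's output has been saved to a file (the name `run.log` is a placeholder) and that lines follow the `timestamp - LEVEL: message` pattern seen above. `log_problems` is a hypothetical helper.

```shell
# Hypothetical helper: print only ERROR and WARNING lines from a saved log.
log_problems() {
    grep -E ' - (ERROR|WARNING): ' "$1"
}

# Example: capture the run's output, then filter it (run.log is a placeholder name).
# get_organelle_from_reads.py ... 2>&1 | tee run.log
if [ -f run.log ]; then
    log_problems run.log || echo "no errors or warnings found"
fi
```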

## Install sympy and scipy

pip install sympy scipy --prefix=/my/folder2

Requirement already satisfied: sympy in /build/Cellar/anaconda2/lib/python2.7/site-packages (1.3)

Requirement already satisfied: scipy in /build/Cellar/anaconda2/lib/python2.7/site-packages (1.2.1)

pip reports that both libraries are already installed, yet the run still prints: 2024-06-08 19:04:25,434 - ERROR: sympy/scipy not available! Disentangling disabled!!

This may be because PYTHONPATH was changed earlier. If the earlier export PYTHONPATH is undone, a new error appears instead:

Traceback (most recent call last):

  File "/mnt/ge-jbod/zhanghongxiang/software/GetOrganelle/GetOrganelle-1.7.4.1/bin/get_organelle_from_reads.py", line 12, in <module>

    import GetOrganelleLib

ImportError: No module named GetOrganelleLib

The fix is to put both directories on PYTHONPATH at the same time:

export PYTHONPATH="/path/to/folder1:/path/to/folder2"
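One way to confirm that both directories are picked up is to try the imports directly before launching a full run. This is a sketch under assumptions: `check_py_modules` is a hypothetical helper, and `python` is assumed to be the same interpreter that runs get_organelle_from_reads.py.

```shell
# Hypothetical helper: try importing each module with the given interpreter
# and report the ones that fail. Returns non-zero if any import fails.
check_py_modules() {
    py="$1"; shift
    status=0
    for mod in "$@"; do
        if "$py" -c "import $mod" 2>/dev/null; then
            echo "ok: $mod"
        else
            echo "MISSING: $mod"
            status=1
        fi
    done
    return "$status"
}

# The three modules named in the error messages above.
check_py_modules python GetOrganelleLib sympy scipy || echo "fix PYTHONPATH before running"
```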

## Install pigz

wget https://github.com/madler/pigz/archive/refs/heads/master.zip

unzip master.zip

cd pigz-master

make
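Because `make` only builds the pigz binary inside the source directory, that directory still has to be on PATH before GetOrganelle can find it. The sketch below checks which external tools are visible. `missing_tools` is a hypothetical helper, and the executable names are the usual ones for these packages.

```shell
# Hypothetical helper: print every named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# If pigz was built in the current directory, expose it first:
# export PATH="$PWD:$PATH"
missing_tools spades.py bowtie2 blastn pigz
```

An empty result means every listed tool was found.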

I ran it again, and it still failed:

......

2024-06-08 18:44:42,484 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq.spades/K45/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 18:44:42,485 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.

......

2024-06-08 18:44:57,031 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-hammer', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 18:44:57,032 - ERROR: Assembling failed.

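After some searching, this looks like it may be a SPAdes problem. Someone on GitHub reported that changing one command was enough:

I asked server administrator and showed him my scripts, then it run successfully by removimg "srun" out from my code.

See the discussion on GitHub for details.

That seemed like too much trouble, so I replaced SPAdes 3.15.4 with 3.15.3, and the next run no longer reported errors.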
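After swapping versions, it is worth confirming which SPAdes is actually first on PATH. The sketch below assumes that `spades.py --version` prints a line containing a version such as `v3.15.3`. `extract_version` is a hypothetical helper that pulls the first version-like token out of that text.

```shell
# Hypothetical helper: extract the first X.Y.Z version number from stdin.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

if command -v spades.py >/dev/null 2>&1; then
    # Merge stderr into stdout in case the version is printed there.
    ver="$(spades.py --version 2>&1 | extract_version)"
    echo "SPAdes on PATH: ${ver:-unknown}"
else
    echo "spades.py not found on PATH"
fi
```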

2024-06-08 19:05:28,351 - INFO: Slimming Arabidopsis_simulated.plastome/extended_spades/K115/assembly_graph.fastg finished!

2024-06-08 19:05:28,352 - INFO: Slimming assembly graphs finished.




These are casual notes I put together while learning, and I hope they help. If they do, a like or a follow would be appreciated as encouragement.
