「Genome」Determining the ploidy of a species

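Methods for determining genome ploidy:

1.smudgeplot
2.PloidyFrost
3.Survey software: GenomeScope2 (judge from the peaks of the genome-survey k-mer spectrum)
4.Comparison with a closely related diploid: genome size, genome synteny and gene collinearity. For plants (e.g. embryophytes), the BUSCO D (duplicated) value can also be checked, together with the karyotype and the Hi-C heat map.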
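For item 4, the BUSCO D value is the percentage of complete BUSCO genes found in more than one copy. The bash sketch below pulls that value out of BUSCO's one-line summary (`C:..%[S:..%,D:..%],F:..%,M:..%,n:..`). The helper name `busco_dup_pct` is hypothetical, and the sketch assumes the standard summary format.

```shell
#!/usr/bin/env bash
# busco_dup_pct (hypothetical helper): print the D (duplicated) percentage from a
# BUSCO short summary file, or from stdin when the argument is "-" or omitted.
busco_dup_pct() {
  # the first "D:<number>%" token in the summary line is the duplicated percentage
  grep -o 'D:[0-9.]*%' "${1:--}" | head -n 1 | tr -d 'D:%'
}

# example on a summary line in BUSCO's format (illustrative values only)
echo 'C:98.5%[S:80.1%,D:18.4%],F:0.5%,M:1.0%,n:1614' | busco_dup_pct -
```

Compare the value with a close diploid relative run against the same lineage dataset. A clearly higher D is consistent with retained duplicate copies. However, an uncollapsed heterozygous assembly also raises D, so treat it as supporting evidence rather than proof.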

1.smudgeplot

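Download: https://github.com/KamilSJaron/smudgeplot
Install with conda; it pulls in a very large number of dependencies.
Note: the current release is v0.4.0, which differs considerably from the older 0.1.2.5. The k-mer counting software has changed and memory use is high. It is slow on large genomes, and runs on large genomes may be interrupted.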
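As background for reading the plot: smudgeplot takes pairs of k-mers that differ at a single position, with coverages A and B (B the smaller). It plots the minor fraction B/(A+B) against the total coverage A+B. A genotype such as AAB therefore lands near 1/3 on the x axis and near 3× the monoploid (1n) coverage on the y axis. The bash/awk sketch below only computes those expected smudge centres for a genotype string. `smudge_position` is a hypothetical helper, not part of smudgeplot.

```shell
#!/usr/bin/env bash
# smudge_position (hypothetical helper): expected smudge centre for a k-mer-pair genotype.
# $1: genotype string made of A and B, e.g. AB, AAB, AABB
# prints: minor fraction B/(A+B) and total coverage in units of the 1n coverage
smudge_position() {
  awk -v g="$1" 'BEGIN {
    n = length(g)                 # number of copies = total coverage / 1n coverage
    b = gsub(/B/, "B", g)         # gsub returns the number of B copies
    a = n - b
    minor = ((a < b) ? a : b)     # the minor allele is the one with fewer copies
    printf "%.3f %d\n", minor / n, n
  }'
}

for gt in AB AAB AAAB AABB; do
  printf '%s\t' "$gt"; smudge_position "$gt"
done
```

AB and AABB share the same x position (0.5) and are separated only by total coverage. This is why a good 1n coverage estimate matters when reading the plot.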

2.PloidyFrost

Download: https://github.com/CMB-BNU/PloidyFrost
PloidyFrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs

2.1 NGS data analysis:

mkdir -p kmc_tmp   # temporary directory for KMC
# PloidyFrost install prefix (bundles KMC and Bifrost)
PF_DIR=/share/nas1/pengzw/software/PloidyFrost
export PATH=${PF_DIR}/KMC/bin/:$PATH
export PATH=${PF_DIR}/main/PloidyFrost-main/bin/:$PATH
export PATH=${PF_DIR}/main/PloidyFrost-main/bifrost/bin/:$PATH
PF_SCRIPT=${PF_DIR}/main/PloidyFrost-main/script

thread=20
# fq.list lists one read file per line, e.g.
#/share/nas1/pengzw/project/01.data/Unknown_good_1.fq
#/share/nas1/pengzw/project/01.data/Unknown_good_2.fq
fq=fq.list
# count 25-mers in all reads (-ci1 keeps every k-mer, -cs10000 caps the counter value)
kmc -ci1 -cs10000 -k25 -t${thread} @${fq} kmc_db kmc_tmp
# k-mer coverage histogram, and PloidyFrost's lower coverage cutoff derived from it
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
# filter the reads against the k-mer database with that cutoff (-hm: hard-mask mode)
kmc_tools -t${thread} filter -hm kmc_db @${fq} -ci${lower_threshold} sample_filtered.fq

# compacted de Bruijn graph (-i: clip tips, -d: delete isolated contigs, -r: use input as-is)
Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}

# allele frequencies from the graph, then filtering, plotting and Gaussian mixture model fitting
PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o multi -h hist
cd PloidyFrost_output
Rscript ${PF_SCRIPT}/Filter.R -i multi -o multi-filtered -n 6 -s 11 -q 0.05
Rscript ${PF_SCRIPT}/Drawfreq.R -f multi-filtered_allele_frequency.txt -t title -p 2 -o histogram
PloidyFrost model -g multi-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm

2.2 TGS data analysis

mkdir -p kmc_tmp   # temporary directory for KMC
# PloidyFrost install prefix (bundles KMC and Bifrost)
PF_DIR=/share/nas1/pengzw/software/PloidyFrost
export PATH=${PF_DIR}/KMC/bin/:$PATH
export PATH=${PF_DIR}/main/PloidyFrost-main/bin/:$PATH
export PATH=${PF_DIR}/main/PloidyFrost-main/bifrost/bin/:$PATH
PF_SCRIPT=${PF_DIR}/main/PloidyFrost-main/script

thread=20
# PacBio CCS reads in FASTA format
fq=/share/nas1/pengzw/project/01.rawdata/ccs/.ccs.fasta.gz
# count 25-mers (-fa: FASTA input instead of the default FASTQ)
kmc -ci1 -cs10000 -k25 -t${thread} -fa "$fq" kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
kmc_tools -t${thread} filter -hm kmc_db "$fq" -ci${lower_threshold} sample_filtered.fq

Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}

PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o single -h hist
cd PloidyFrost_output
Rscript ${PF_SCRIPT}/Filter.R -i single -o single-filtered -n 6 -s 11 -q 0.05
Rscript ${PF_SCRIPT}/Drawfreq.R -f single-filtered_allele_frequency.txt -t title -p 2 -o histogram
PloidyFrost model -g single-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm

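Judge the ploidy by comparing the output with the results reported in the PloidyFrost paper:

[Figure: reference results from the paper]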
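The comparison with the paper's results rests on a simple model, which nQuire also uses. At a biallelic site in an autopolyploid of ploidy p, allele frequencies fall at k/p for k = 1..p-1:

- 0.5 for a diploid
- 1/3 and 2/3 for a triploid
- 0.25, 0.5 and 0.75 for a tetraploid

The sketch below only lists those expected peak positions to hold against the allele-frequency histogram and the GMM output. `expected_af_peaks` is a hypothetical helper, and allopolyploids with divergent subgenomes may not follow this simple pattern.

```shell
#!/usr/bin/env bash
# expected_af_peaks (hypothetical helper): expected allele-frequency peaks k/p,
# k = 1..p-1, for biallelic sites at ploidy p (simple autopolyploid model).
expected_af_peaks() {
  awk -v p="$1" 'BEGIN {
    p += 0; out = ""
    for (k = 1; k < p; k++)
      out = out (k > 1 ? " " : "") sprintf("%.2f", k / p)
    print out
  }'
}

for p in 2 3 4 5 6; do
  printf 'ploidy %d: ' "$p"; expected_af_peaks "$p"
done
```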

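3.Survey results

Polyploids produce several peaks in the k-mer spectrum. Autopolyploids and allopolyploids give different profiles, and haploids and other cases have their own patterns as well.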
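To see how many peaks a survey histogram has before fitting GenomeScope2, a crude local-maximum scan is often enough. The bash/awk sketch below assumes a two-column `coverage count` histogram, such as the `hist` file written by `kmc_tools transform kmc_db histogram hist`. The helper `kmer_peaks` and its default minimum coverage of 5 (to skip the low-coverage error peak) are assumptions, and noisy histograms would need smoothing first. For each peak the sketch also prints its coverage relative to the first peak:

- a ratio of about 1:2 is typical of a heterozygous diploid;
- further peaks near 3× and 4× the first point to higher ploidy.

```shell
#!/usr/bin/env bash
# kmer_peaks (hypothetical helper): local maxima of a two-column k-mer histogram.
# $1: histogram file ("coverage count" per line); $2: minimum coverage to consider (default 5)
# prints: coverage, count, and coverage relative to the first reported peak
kmer_peaks() {
  awk -v min="${2:-5}" '
    BEGIN { n = 0; found = 0 }    # start n at 0 so stored indices line up with cnt[i-1]
    $1 + 0 >= min + 0 { cov[n] = $1; cnt[n] = $2 + 0; n++ }
    END {
      for (i = 1; i < n - 1; i++) {
        # a peak rises above its left neighbour and is not lower than its right neighbour
        if (cnt[i] > cnt[i-1] && cnt[i] >= cnt[i+1]) {
          if (!found) { first = cov[i]; found = 1 }
          printf "%d %d %.2f\n", cov[i], cnt[i], cov[i] / first
        }
      }
    }' "$1"
}
```

Typical use would be `kmer_peaks hist 5`. The full model fit (peak heights, heterozygosity, ploidy-specific profiles) is still best left to GenomeScope2.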

4.Comparison with a diploid

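http://www.genomesize.com/search.php Animal Genome Size Database (genome sizes; chromosome numbers can also be looked up)
https://cvalues.science.kew.org/ Plant DNA C-values Database (plant genome sizes)
https://taux.evolseq.net/CCDB_web Chromosome Counts Database, CCDB (plant chromosome numbers)
http://ploidb.tau.ac.il/ PloiDB (plant ploidy levels)
http://legacy.tropicos.org/Project/IPCN Index to Plant Chromosome Numbers, IPCN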
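Once these databases give a 1C genome size for a close diploid relative, a rough ploidy guess follows from the ratio of genome sizes. If the relative's 1C value approximates one monoploid genome, the target's ploidy is about 2 × (target 1C / diploid 1C). This is only a back-of-the-envelope sketch with a hypothetical helper `ploidy_from_size`. Genome downsizing after polyploidisation, repeat expansion and real differences between relatives all distort it, so cross-check the result against the karyotype.

```shell
#!/usr/bin/env bash
# ploidy_from_size (hypothetical helper): rough ploidy estimate from 1C genome sizes.
# $1: 1C size of the target species; $2: 1C size of a diploid relative (same units)
# assumes the diploid 1C ~ one monoploid genome, so ploidy ~ 2 * target / diploid
ploidy_from_size() {
  awk -v t="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 2 * t / d }'
}

ploidy_from_size 2.0 1.0   # target 1C twice the diploid 1C -> 4.0 (tetraploid-like)
```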

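Run genome synteny against the diploid and check whether the target genome falls into several groups.
When the karyotype is known, check whether the Hi-C heat map splits into several groups with line-like signals between chromosomes. This pattern usually indicates a highly heterozygous genome in which several haplotype sets were assembled; see, for example, the genome papers on Chuanxiong (Ligusticum chuanxiong) and Epimedium.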
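One way to put a number on "does it split into several groups" is a syntenic depth count: for each gene of the diploid reference, count how many syntenic partners it has in the target assembly. The awk sketch below assumes a two-column list of anchor pairs (`diploid_gene target_gene`), for example a jcvi/MCScan `.anchors` file with its `###` block separators. It prints how many diploid genes have depth 1, 2, 3 and so on. The helper `syntenic_depth` is hypothetical. A mode at depth 2 against a diploid fits a 2:1 (tetraploid-like) pattern, but allelic copies from an uncollapsed heterozygous assembly produce the same signal.

```shell
#!/usr/bin/env bash
# syntenic_depth (hypothetical helper): distribution of syntenic depth.
# $1: file of anchor pairs, one "diploid_gene target_gene [score]" per line
# prints: depth and the number of diploid genes with that many target partners
syntenic_depth() {
  awk '
    /^#/ || NF < 2 { next }                 # skip "###" block separators and blank lines
    !seen[$1 SUBSEP $2]++ { depth[$1]++ }   # count each distinct gene pair once
    END {
      for (g in depth) hist[depth[g]]++     # depth -> number of diploid genes
      for (d in hist) print d, hist[d]
    }' "$1" | sort -n
}
```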

subphaser validation

Once the genome has been split into groups, use subphaser to verify the grouping.

Other ploidy-estimation methods (I have not used these; noted for reference):
HMMploidy:
https://peercommunityjournal.org/articles/10.24072/pcjournal.178/

ploidyNGS:
https://pubmed.ncbi.nlm.nih.gov/28383704/

nQuire:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2128-z

© Copyright belongs to the author; contact the author for reprints or content collaboration.
The article was uploaded and published by the author on Jianshu, which only provides information storage services.