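Methods for determining genome ploidy:
1. smudgeplot
2. PloidyFrost
3. Genome survey software: GenomeScope2 (judge from the peaks in the survey k-mer plot)
4. Comparison with a closely related diploid: genome size, genome synteny, and gene synteny. For plants, for example embryophytes, also check the BUSCO D (duplicated) value, together with the karyotype and the Hi-C map.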
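For the BUSCO D value mentioned in item 4, a minimal sketch of pulling the duplicated percentage out of a BUSCO one-line summary. The sample line below is hypothetical and only mimics the summary format; the variable names are illustrative, and what counts as a "high" D value is left to comparison with the diploid relative.

```shell
# Hypothetical sample line in the one-line format BUSCO prints in its short summary
summary_line='C:98.5%[S:60.1%,D:38.4%],F:0.5%,M:1.0%,n:1614'

# Extract the number between "D:" and the following "%"
busco_d=$(printf '%s\n' "$summary_line" | sed -n 's/.*D:\([0-9.]*\)%.*/\1/p')

echo "BUSCO duplicated: ${busco_d}%"
```

A D value far above that of the diploid relative is consistent with polyploidy, but it can also come from uncollapsed haplotypes in the assembly, so it is best read together with the other evidence in this list.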
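1. smudgeplot
Download: https://github.com/KamilSJaron/smudgeplot
Install with conda; it has a very large number of dependencies.
Note: the current version is v0.4.0, which differs substantially from older versions such as 0.1.2.5. The k-mer counting software has changed, memory consumption is high, it is slow on large genomes, and runs on large genomes may be interrupted.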
2. PloidyFrost
Download: https://github.com/CMB-BNU/PloidyFrost
PloidyFrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs
NGS data analysis:
mkdir kmc_tmp
export PATH=/share/nas1/pengzw/software/PloidyFrost/KMC/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bifrost/bin/:$PATH
thread=20
fq=fq.list
# fq.list lists one input read file per line, for example:
# /share/nas1/pengzw/project/01.data/Unknown_good_1.fq
# /share/nas1/pengzw/project/01.data/Unknown_good_2.fq
out_fq=out_fq.list
kmc -ci1 -cs10000 -k25 -t${thread} @${fq} kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
kmc_tools -t${thread} filter -hm kmc_db @${fq} -ci${lower_threshold} sample_filtered.fq
Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}
PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o multi -h hist
cd PloidyFrost_output
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Filter.R -i multi -o multi-filtered -n 6 -s 11 -q 0.05
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Drawfreq.R -f multi-filtered_allele_frequency.txt -t title -p 2 -o histogram
/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/PloidyFrost model -g multi-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm
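The NGS commands above pass the reads to kmc and kmc_tools as @fq.list, a plain text file with one input path per line. A minimal sketch of creating that file, assuming the two paired-end files named in the comments above; replace the paths with your own reads.

```shell
# Write one read file path per line; these paths are the examples from the notes above
cat > fq.list <<'EOF'
/share/nas1/pengzw/project/01.data/Unknown_good_1.fq
/share/nas1/pengzw/project/01.data/Unknown_good_2.fq
EOF

# Sanity check: the list should contain one line per input file
n_files=$(wc -l < fq.list | tr -d ' ')
echo "fq.list contains ${n_files} files"
```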
TGS data analysis:
mkdir kmc_tmp
export PATH=/share/nas1/pengzw/software/PloidyFrost/KMC/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bifrost/bin/:$PATH
thread=20
fq=/share/nas1/pengzw/project/01.rawdata/ccs/.ccs.fasta.gz
kmc -ci1 -cs10000 -k25 -t${thread} -fa $fq kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
# The input reads are FASTA, so tell kmc_tools filter with -fa (its default is FASTQ)
kmc_tools -t${thread} filter -hm kmc_db $fq -fa -ci${lower_threshold} sample_filtered.fq
Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}
PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o single -h hist
cd PloidyFrost_output
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Filter.R -i single -o single-filtered -n 6 -s 11 -q 0.05
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Drawfreq.R -f single-filtered_allele_frequency.txt -t title -p 2 -o histogram
/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/PloidyFrost model -g single-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm
Interpret the output by comparing it with the results reported in the paper.

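3. Survey results
Polyploids show multiple peaks in the k-mer plot. Autopolyploids and allopolyploids give different patterns, as do haploids and other cases.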
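As a rough illustration of "multiple peaks", the sketch below lists local maxima in a two-column k-mer histogram (coverage, number of k-mers), the layout written by kmc_tools transform ... histogram. The histogram values are made up, and min_cov is an assumed cutoff for skipping the low-coverage error k-mers. Real histograms are noisy, so this is only a quick look and not a substitute for the model fitting in GenomeScope2.

```shell
# Synthetic histogram: coverage, number of distinct k-mers (values are made up)
cat > demo.hist <<'EOF'
1 900
2 300
3 100
4 150
5 400
6 200
7 120
8 180
9 500
10 250
11 90
EOF

# Print coverages whose k-mer count exceeds both neighbours, skipping coverage below min_cov
peaks=$(awk -v min_cov=3 '
  { cov[NR] = $1 + 0; n[NR] = $2 + 0 }
  END {
    sep = ""
    for (i = 2; i < NR; i++)
      if (cov[i] >= min_cov && n[i] > n[i-1] && n[i] > n[i+1]) {
        printf "%s%s", sep, cov[i]
        sep = " "
      }
  }' demo.hist)

echo "peaks at coverage: ${peaks}"
```

The number of peaks and the ratios between their coverages are what the survey plot is read for; on real data, smooth the counts or rely on the fitted model before counting peaks.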
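4. Comparison with a diploid
http://www.genomesize.com/search.php genome sizes from flow cytometry; karyotypes can also be looked up (note: this is the Animal Genome Size Database)
https://cvalues.science.kew.org/ plant karyotype site 1 (Kew Plant DNA C-values Database, which also gives plant genome sizes)
https://taux.evolseq.net/CCDB_web plant karyotype site 2 (Chromosome Counts Database, CCDB)
http://ploidb.tau.ac.il/ plant karyotype site 3 (PloiDB)
http://legacy.tropicos.org/Project/IPCN plant karyotype site 4 (Index to Plant Chromosome Numbers, IPCN)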
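Run genome synteny against the diploid and check whether the chromosomes fall into multiple groups.
When the karyotype is known, check whether the Hi-C heatmap splits into multiple groups of chromosomes linked by lines of signal. This pattern usually suggests a highly heterozygous genome assembled as multiple haplotype sets. See, for example, the genome papers for Ligusticum chuanxiong (Chuanxiong) and Epimedium.
SubPhaser validation
After the chromosomes have been split into groups, use SubPhaser to verify the grouping.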
Other ploidy analysis methods (not tried yet; noted here for reference):
HMMploidy:
https://peercommunityjournal.org/articles/10.24072/pcjournal.178/
ploidyNGS:
https://pubmed.ncbi.nlm.nih.gov/28383704/
nQuire:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2128-z