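Estimating Genome Size with GCE
I have recently been trying to estimate genome size and tested several tools. Today I will introduce how to use GCE. GCE (Genome Characteristics Estimation) is BGI's software for genome assessment. The earliest version (gce-1.0.0) was published in 2012; its reference is: Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Eight years later, it has finally been updated! The 2020 release is gce-1.0.2, and it can be downloaded from ftp://ftp.genomics.org.cn/pub/gce.
The earlier GCE package mainly contained two programs, kmer_freq_hash and gce. The former counts k-mer frequencies, and the latter uses those counts to estimate genome size accurately. The recently updated version (gce-1.0.2) mainly contains two programs, kmerfreq and gce, and some of their options have changed.
1. Downloading and installing GCE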
wget ftp://ftp.genomics.org.cn/pub/gce/gce-1.0.2
tar -xzvf gce.tar.gz   # substitute the name of the archive you actually downloaded
cd gce-1.0.2
make
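make reported "make: Nothing to be done for 'all'". After searching online, this appears to mean the programs are already compiled and can be used directly.
Tip: remember to add the programs to your PATH environment variable.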
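One way to do this is sketched below. The install directory ($HOME/software/gce-1.0.2) is a hypothetical location, and the sketch assumes the gce and kmerfreq executables sit in the top level of the unpacked directory; check both against your own copy.

```shell
# Hypothetical unpack location; change it to where gce-1.0.2 actually lives.
GCE_HOME="$HOME/software/gce-1.0.2"

# Prepend the directory to PATH for the current shell session.
export PATH="$GCE_HOME:$PATH"

# To make this permanent, add the same export line to ~/.bashrc, for example:
# echo "export PATH=\"$GCE_HOME:\$PATH\"" >> ~/.bashrc
```

After this, gce -h and kmerfreq -h should run from any directory, provided the assumed layout matches.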
(1) gce
Running gce -h prints its usage, as follows:

Usage:   gce (genomic charactor estimator) [option]
Version: 1.0.2
Author:  BGI ShenZhen
  -f   depth frequency file with two columns: depth value and kmer species number   # two-column depth frequency file: k-mer depth (number of occurrences) and number of distinct k-mers
  -c   expected depth for unique kmer, which can be obtained by checking the data with human eyes   # expected depth of unique k-mers
  -g   total kmer number, i.e. total number of kmer individuals   # total number of k-mers
  -b   have bias(1) or not(0), default=0
  -H   use hybrid mode(1) or not(0), default=0   # use heterozygous mode (1) or not (0); off by default
  -m   estimation mode: discrete mode(0) and continuous mode(1), default=0   # estimation model: discrete (0) or continuous (1); discrete by default
  -M   max depth value, information for larger depth will be ignored, default=1500   # maximum depth, default 1500; anything above it is ignored
  -D   precision of expect value, default=1
  -d   difference cut off, default=0.0001
  -i   iterate cycle number cut off, default=10000
  -h   this help
Example:
(1) Before run gce, firstly get the total kmer number and depth frequency file from the kmerfreq result file (example: AF.kmer.freq.stat)
     less AF.kmer.freq.stat | grep "#Kmer indivdual number"
     less AF.kmer.freq.stat | perl -ne 'next if(/^#/ || /^\s/); print; ' | awk '{print $1"\t"$2}' > AF.kmer.freq.stat.2colum
(2) Run gce in homozygous mode, suitable for homozygous and near-homozygous genome (-g and -f must be set at the same time)
     gce -g 173854609857 -f AF.kmer.freq.stat.2colum >gce.table 2>gce.log
(3) Run gce in heterzygous mode, siutable for heterozgyous genome (-H and -c must be set at the same time)
     gce -g 173854609857 -f AF.kmer.freq.stat.2colum -c 75 -H 1 >gce2.table 2>gce2.log

(2) kmerfreq
kmerfreq [options]
 Version 4.0
   -k   kmer size, recommand value 13 to 19, default=17
        # k-mer size; recommended range 13 to 19, default 17
   -f   input file format: 1: fq|gz(one-line), 2: fa|gz(one-line), default=1
        # input file format: 1 for fq|gz (one-line), 2 for fa|gz (one-line); default 1
   -p   output file prefix, default=reads_files.lib
        # output file prefix; default reads_files.lib
   -r   number of reads stored in buffer memory, default=10000
   -t   thread number to use in parallel, default=10
        # number of threads; default 10
   -w   whether output kmer sequence and frequency value, 1:yes, 0:no, default=0
        # whether to output k-mer sequences and their frequencies, 1: yes, 0: no; off by default
   -c   kmer frequency cutoff, equal or larger will be output, co-used with -w, default=5
        # k-mer frequency cutoff; k-mers at or above the cutoff are output; used together with -w; default 5
   -m   whether output computer memory data, 1:yes, 0:no, default=0
        # whether to output the in-memory data, 1: yes, 0: no; off by default
   -q   kmer frequency cutoff, 0 for lower, 1 for equal and larger, co-used with -m, default=5
        # k-mer frequency cutoff: 0 for k-mers below the cutoff, 1 for those at or above it; used together with -m; default 5
   -h   get help information
        # print help information
Example: kmerfreq reads_files.lib
         kmerfreq -k 17 -t 10 -p Ecoli_K17 reads_files.lib
         kmerfreq -k 17 -t 10 -p Ecoli_K17 -w 1 -c 5 reads_files.lib
         kmerfreq -k 17 -t 10 -p Ecoli_K17 -m 1 -q 5 reads_files.lib
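The help text does not say what reads_files.lib contains. The sketch below assumes it is a plain text file listing the path of each read file, one per line; verify that against the README shipped with your copy of GCE. The directory and the sample_1/sample_2 file names are placeholders created only for this demonstration.

```shell
# Assumption: reads_files.lib holds one read-file path per line.
# Create a scratch directory with two empty placeholder read files.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample_1.fq.gz" "$demo_dir/sample_2.fq.gz"

# Write the absolute path of every read file into the library file.
ls "$demo_dir"/*.fq.gz > "$demo_dir/reads_files.lib"

# Show the resulting list.
cat "$demo_dir/reads_files.lib"
```

With real data, point the glob at your own FASTQ directory and pass the resulting file as the last argument to kmerfreq.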
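GCE command lines (these reuse the AF.kmer.freq.stat name and the -g value from the gce help example; substitute the stat file that kmerfreq wrote for your prefix and the total k-mer number that grep reports. The spelling "indivdual" is kept because it is the search pattern given in the help example):
kmerfreq -k 17 -t 10 -p Ecoli_K17 reads_files.lib
less AF.kmer.freq.stat | grep "#Kmer indivdual number"
less AF.kmer.freq.stat | perl -ne 'next if(/^#/ || /^\s/); print; ' | awk '{print $1"\t"$2}' > AF.kmer.freq.stat.2colum
gce -g 173854609857 -f AF.kmer.freq.stat.2colum >gce.table 2>gce.log
or
gce -g 173854609857 -f AF.kmer.freq.stat.2colum -c 75 -H 1 >gce2.table 2>gce2.log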
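The two extraction steps (total k-mer number for -g, two-column file for -f) can also be done with grep and awk alone. The sketch below runs on a made-up miniature file, not real kmerfreq output: the header layout, the colon after "number", and the third column are assumptions for illustration only. The filter mirrors the perl one-liner by skipping lines that start with "#" or whitespace.

```shell
# Made-up miniature input, NOT real kmerfreq output: '#' header lines,
# one line starting with whitespace, then whitespace-separated data rows.
stat_file=$(mktemp)
printf '%s\n' \
  '#Kmer indivdual number: 60' \
  '#Depth Count Other' \
  ' ' \
  '1 10 0.5' \
  '2 25 0.5' > "$stat_file"

# Value for -g: the text after the colon in the matching header line
# (assumes a "label: value" layout).
total_kmers=$(grep '#Kmer indivdual number' "$stat_file" | awk -F': *' '{print $2}')

# File for -f: drop header, whitespace-led and empty lines; keep columns 1 and 2.
awk '!/^#/ && !/^[[:space:]]/ && NF { print $1 "\t" $2 }' "$stat_file" > "$stat_file.2colum"

echo "total k-mers: $total_kmers"
cat "$stat_file.2colum"
```

On a real stat file, inspect the header line first and adjust the field separator if the layout differs from the one assumed here.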
2. Results (using a k-mer size of 17 as an example):

Here the genome size is calculated as: genome size = effective_kmer_individuals / coverage_depth = 460468198.15143 bp, i.e. about 460 Mb.
Reference: Chen Lianfu's bioinformatics blog: http://www.chenlianfu.com/?p=2335
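The division can be reproduced on the command line with awk. The two input values below are illustrative round numbers, not the values from this run; replace them with the effective_kmer_individuals and coverage_depth reported in your own gce output.

```shell
# Illustrative inputs only; take the real ones from your own gce output.
effective_kmer_individuals=1000000000
coverage_depth=50

# genome size = effective_kmer_individuals / coverage_depth, in bp
genome_size=$(awk -v n="$effective_kmer_individuals" -v d="$coverage_depth" \
  'BEGIN { printf "%.0f", n / d }')

# The same value expressed in Mb (1 Mb = 1e6 bp).
genome_mb=$(awk -v g="$genome_size" 'BEGIN { printf "%.1f", g / 1e6 }')

echo "genome size: $genome_size bp (~$genome_mb Mb)"
```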