A guide to fastp's parameters

usage: fastp -i <in1> -o <out1> [-I <in2> -O <out2>] [options...]

options:

  ## I/O options (input and output files)

  -i, --in1                          read1 input file name (string)  # input file for read1

  -o, --out1                         read1 output file name (string [=])  # output file for read1

  -I, --in2                          read2 input file name (string [=])  # input file for read2

  -O, --out2                         read2 output file name (string [=])  # output file for read2

  -6, --phred64                      indicates the input is using phred64 scoring (it'll be converted to phred33, so the output will still be phred33)  # In Phred+64 the quality score is the character's ASCII value minus 64, and every character used has an ASCII value of at least 59. A file containing any quality character below ASCII 59 is therefore Phred+33.

  -z, --compression                  compression level for gzip output (1 ~ 9). 1 is fastest, 9 is smallest, default is 2. (int [=2])  # gzip level of the output files

      --reads_to_process             specify how many reads/pairs to be processed. Default 0 means process all reads. (int [=0])  # number of reads/pairs to process; the default 0 processes all of them
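The re-encoding implied by -6 can be sketched in Python as below. This is only an illustration of the arithmetic, not fastp's code: the function names are made up for this example, and clamping scores below zero (as found in old Solexa+64 files) to Q0 is an assumption.

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    Each character is shifted down by 64 - 33 = 31. Scores that would fall
    below Q0 are clamped to '!' (Q0); that clamping is an assumption.
    """
    return "".join(chr(max(33, ord(c) - 31)) for c in qual)


def must_be_phred33(qual: str) -> bool:
    """True if any character is below ASCII 59, which Phred+64 never uses."""
    return any(ord(c) < 59 for c in qual)


# 'h' is ASCII 104, i.e. Q40 in Phred+64; Q40 in Phred+33 is 'I' (ASCII 73).
print(phred64_to_phred33("hhhh"))
# '#' is ASCII 35, so this string cannot be Phred+64.
print(must_be_phred33("IIII#"))
```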


  ## adapter trimming options

  -A, --disable_adapter_trimming     adapter trimming is enabled by default. If this option is specified, adapter trimming is disabled  # adapters are removed from the raw data automatically by default; pass -A to leave them in

  -a, --adapter_sequence             the adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not overlapped. (string [=auto])  # For single-end data this adapter is trimmed from read1 directly. For paired-end data it is applied only to pairs whose R1 and R2 do not overlap.

      --adapter_sequence_r2          the adapter for read2 (PE data only). This is used if R1/R2 are found not overlapped. If not specified, it will be the same as <adapter_sequence> (string [=])  # Paired-end data only: the adapter trimmed from read2, applied to pairs whose R1 and R2 do not overlap.


  ## global trimming options (trim a fixed number of bases from the start and end of every read)

  -f, --trim_front1                  trimming how many bases in front for read1, default is 0 (int [=0])  # number of bases to trim from the start of read1, default 0

  -t, --trim_tail1                   trimming how many bases in tail for read1, default is 0 (int [=0])  # number of bases to trim from the end of read1, default 0

  -F, --trim_front2                  trimming how many bases in front for read2. If it's not specified, it will follow read1's settings (int [=0])  # number of bases to trim from the start of read2; follows the read1 setting when not given

  -T, --trim_tail2                   trimming how many bases in tail for read2. If it's not specified, it will follow read1's settings (int [=0])  # number of bases to trim from the end of read2; follows the read1 setting when not given

  ## polyG tail trimming, useful for NextSeq/NovaSeq data

  -g, --trim_poly_g                  force polyG tail trimming, by default trimming is automatically enabled for Illumina NextSeq/NovaSeq data  # polyG tails are trimmed automatically for Illumina NextSeq/NovaSeq data; -g forces the trimming

      --poly_g_min_len               the minimum length to detect polyG in the read tail. 10 by default. (int [=10])  # minimum length of a polyG tail to be trimmed, default 10

  -G, --disable_trim_poly_g          disable polyG tail trimming, by default trimming is automatically enabled for Illumina NextSeq/NovaSeq data  # turns polyG tail trimming off

  # polyX tail trimming

  -x, --trim_poly_x                  enable polyX trimming in 3' ends.  # trims polyX from the 3' end

      --poly_x_min_len               the minimum length to detect polyX in the read tail. 10 by default. (int [=10])  # minimum length of a polyX run detected at the read tail, default 10
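The idea behind polyG and polyX tail trimming can be sketched in Python as follows. This is a simplified illustration, not fastp's algorithm: it only accepts an exact homopolymer run with no mismatches, and the function name and its `base` argument are hypothetical.

```python
from typing import Optional, Tuple


def trim_poly_tail(seq: str, qual: str, base: Optional[str] = None,
                   min_len: int = 10) -> Tuple[str, str]:
    """Remove a trailing homopolymer run of at least min_len bases.

    base="G" mimics polyG trimming; base=None accepts a run of any single
    base (polyX). Exact matching only: mismatches inside the run are not
    tolerated in this sketch.
    """
    if not seq:
        return seq, qual
    tail_base = seq[-1]
    if base is not None and tail_base != base:
        return seq, qual
    # Length of the run of tail_base at the 3' end.
    run = len(seq) - len(seq.rstrip(tail_base))
    if run >= min_len:
        return seq[:-run], qual[:-run]
    return seq, qual


read = "ACGTA" + "G" * 12
print(trim_poly_tail(read, "I" * len(read), base="G"))
```

A run shorter than `min_len` (the role played by --poly_g_min_len and --poly_x_min_len) leaves the read untouched.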


  # per read cutting by quality options (sliding-window trimming)

  -5, --cut_by_quality5              enable per read cutting by quality in front (5'), default is disabled (WARNING: this will interfere deduplication for both PE/SE data)  # slides a window from the 5' end toward the tail and drops the bases in the window while its mean quality is below the threshold

  -3, --cut_by_quality3              enable per read cutting by quality in tail (3'), default is disabled (WARNING: this will interfere deduplication for SE data)  # slides a window from the 3' end toward the start and drops the bases in the window while its mean quality is below the threshold

  -W, --cut_window_size              the size of the sliding window for sliding window trimming, default is 4 (int [=4])  # window size for the sliding-window filter (similar in spirit to stepping a k-mer along the read), range 1~1000, default 4 bases

  -M, --cut_mean_quality             the bases in the sliding window with mean quality below cutting_quality will be cut, default is Q20 (int [=20])  # required mean quality of the bases in the window, range 1~36, default Q20; a window whose mean is below 20 is treated as a low-quality region and cut

  -r, --cut_right                    # slides a window from the start of the read toward the tail; at the first window whose mean quality is below the threshold, drops that window and everything to its right, then stops
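The -r (--cut_right) behaviour described above can be sketched in Python like this. It is a minimal reading of that description, assuming qualities are already decoded to integer Phred scores and that the window advances one base at a time; `cut_right` here is an illustrative function, not fastp's implementation.

```python
from typing import List, Tuple


def cut_right(seq: str, quals: List[int], window: int = 4,
              mean_q: int = 20) -> Tuple[str, List[int]]:
    """Cut the read at the first window whose mean quality is below mean_q.

    The window slides from the 5' end one base at a time. The failing
    window and everything to its right are dropped. Reads shorter than
    the window are returned unchanged.
    """
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < mean_q:
            return seq[:start], quals[:start]
    return seq, quals


seq = "ACGTACGTACGT"
quals = [30] * 8 + [10] * 4
kept_seq, kept_quals = cut_right(seq, quals)
print(kept_seq, len(kept_quals))
```

With the defaults (-W 4, -M 20), the first window averaging below 20 in this example starts at index 7, so seven bases are kept.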


  # quality filtering options (filter reads by base quality)

  -Q, --disable_quality_filtering    quality filtering is enabled by default. If this option is specified, quality filtering is disabled  # low-quality reads are removed by default; -Q turns this off

  -q, --qualified_quality_phred      the quality value that a base is qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])  # the per-base threshold: with the default 15, bases below Q15 count as low quality; 20 (the familiar Q20) is a common choice

  -u, --unqualified_percent_limit    how many percents of bases are allowed to be unqualified (0~100). Default 40 means 40% (int [=40])  # A read is not discarded merely for containing low-quality bases, only when they exceed this percentage. The default 40 means 40%: a 150 bp read is discarded once it has more than 60 low-quality bases, and a pair is discarded if either of its reads fails.

  -n, --n_base_limit                 if one read's number of N base is >n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])  # filters reads with too many N bases: if the N count exceeds n_base_limit, the read/pair is discarded; default 5


  # length filtering options (filter reads by length)

  -L, --disable_length_filtering     length filtering is enabled by default. If this option is specified, length filtering is disabled  # turns read-length filtering off

  -l, --length_required              reads shorter than length_required will be discarded, default is 15. (int [=15])  # takes a length; reads shorter than it are discarded (default 15). Useful when processing non-Illumina sequencing data.

  # low complexity filtering

  -y, --low_complexity_filter        enable low complexity filter. The complexity is defined as the percentage of base that is different from its next base (base[i] != base[i+1]).  # turns the low-complexity filter on

  -Y, --complexity_threshold         the threshold for low complexity filter (0~100). Default is 30, which means 30% complexity is required. (int [=30])  # complexity threshold (0~100), default 30
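The complexity definition above (base[i] != base[i+1]) is easy to check by hand. A minimal Python sketch follows; dividing by the number of adjacent pairs (length minus one) is an assumption about the denominator, and the function names are illustrative.

```python
def complexity_percent(seq: str) -> float:
    """Percentage of bases that differ from the next base."""
    if len(seq) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return 100.0 * changes / (len(seq) - 1)


def passes_complexity_filter(seq: str, threshold: int = 30) -> bool:
    """Keep the read only if its complexity reaches the threshold (-Y)."""
    return complexity_percent(seq) >= threshold


# A 51-base read made of four homopolymer blocks has only 3 changes in 50 pairs.
low = "A" * 4 + "T" * 21 + "G" * 22 + "C" * 4
print(complexity_percent(low), passes_complexity_filter(low))
```

Such a read has a complexity of 6%, far below the default threshold of 30, so it would be filtered once -y is enabled.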

  # filter reads with unwanted indexes (to remove possible contamination)

      --filter_by_index1             specify a file contains a list of barcodes of index1 to be filtered out, one barcode per line (string [=])

      --filter_by_index2             specify a file contains a list of barcodes of index2 to be filtered out, one barcode per line (string [=])

      --filter_by_index_threshold    the allowed difference of index barcode for index filtering, default 0 means completely identical. (int [=0])
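How the threshold interacts with the barcode list can be sketched in Python as below. This is an illustration under assumptions: "difference" is taken to mean the number of mismatched positions, a length difference is counted as extra mismatches, and the helper names are hypothetical rather than anything fastp exposes.

```python
from typing import Iterable, List


def load_barcodes(lines: Iterable[str]) -> List[str]:
    """Parse a barcode list with one barcode per line, skipping blank lines."""
    return [ln.strip() for ln in lines if ln.strip()]


def difference(a: str, b: str) -> int:
    """Mismatched positions, plus any length difference (an assumption)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def is_filtered(index: str, unwanted: List[str], threshold: int = 0) -> bool:
    """True if the read's index is within `threshold` of an unwanted barcode."""
    return any(difference(index, bc) <= threshold for bc in unwanted)


unwanted = load_barcodes(["ATCGAT\n", "\n", "GGCTAA\n"])
print(is_filtered("ATCGAT", unwanted))               # identical barcode
print(is_filtered("ATCGAA", unwanted))               # one mismatch, threshold 0
print(is_filtered("ATCGAA", unwanted, threshold=1))  # one mismatch allowed
```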

  # base correction by overlap analysis options

  -c, --correction                   enable base correction in overlapped regions (only for PE data), default is disabled  # corrects errors inside the overlapped region, so it applies only to paired-end reads


  # UMI processing (unique molecular identifiers)

  -U, --umi                          enable unique molecular identifier (UMI) preprocessing

      --umi_loc                      specify the location of UMI, can be (index1/index2/read1/read2/per_index/per_read), default is none (string [=])

      --umi_len                      if the UMI is in read1/read2, its length should be provided (int [=0])

      --umi_prefix                   if specified, an underline will be used to connect prefix and UMI (i.e. prefix=UMI, UMI=AATTCG, final=UMI_AATTCG). No prefix by default (string [=])

      --umi_skip                     if the UMI is in read1/read2, fastp can skip several bases following UMI, default is 0 (int [=0])
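For a UMI located at the start of a read, the effect of --umi_len, --umi_skip and --umi_prefix can be sketched in Python as follows. The prefix joining (UMI_AATTCG) comes from the help text above; where the tag is attached in the read name (appended with a colon to the first whitespace-delimited field) is an assumption, and `extract_umi` is a hypothetical helper.

```python
from typing import Tuple


def extract_umi(name: str, seq: str, qual: str, umi_len: int,
                umi_skip: int = 0, prefix: str = "") -> Tuple[str, str, str]:
    """Move a UMI from the start of a read into the read name.

    The first umi_len bases are the UMI, and the next umi_skip bases are
    discarded. The tag is prefix + "_" + UMI when a prefix is given.
    Appending the tag to the first field of the name is an assumption.
    """
    umi = seq[:umi_len]
    tag = f"{prefix}_{umi}" if prefix else umi
    head, sep, rest = name.partition(" ")
    new_name = f"{head}:{tag}{sep}{rest}"
    cut = umi_len + umi_skip
    return new_name, seq[cut:], qual[cut:]


name, seq, qual = extract_umi("@read1 1:N:0:GATCAG", "AATTCGTTACGTACGT",
                              "I" * 16, umi_len=6, umi_skip=2, prefix="UMI")
print(name)
print(seq)
```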

  # overrepresented sequence analysis

  -p, --overrepresentation_analysis  enable overrepresented sequence analysis.

  -P, --overrepresentation_sampling  One in (--overrepresentation_sampling) reads will be computed for overrepresentation analysis (1~10000), smaller is slower, default is 20. (int [=20])

  # reporting options

  -j, --json                         the json format report file name (string [=fastp.json])  # file name of the JSON report

  -h, --html                         the html format report file name (string [=fastp.html])  # file name of the HTML report

  -R, --report_title                 should be quoted with ' or ", default is "fastp report" (string [=fastp report])


  # threading options

  -w, --thread                       worker thread number, default is 3 (int [=3])  # number of worker threads, default 3


  # output splitting options. When a single reads file is too large, it can be split into several parts that are aligned separately, with the resulting BAM files merged afterwards, which improves throughput.

  -s, --split                        split output by limiting total split file number with this option (2~999), a sequential number prefix will be added to output name ( 0001.out.fq, 0002.out.fq...), disabled by default (int [=0])  # number of files to split into (2~999); the default 0 means no splitting

  -S, --split_by_lines               split output by limiting lines of each file with this option(>=1000), a sequential number prefix will be added to output name ( 0001.out.fq, 0002.out.fq...), disabled by default (long [=0])

  -d, --split_prefix_digits          the digits for the sequential number padding (1~10), default is 4, so the filename will be padded as 0001.xxx, 0 to disable padding (int [=4])  # number of digits in the output prefix; the default 4 gives names like 0001, 0002, and 3 gives 001, 002
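The naming scheme described for -s and -d can be reproduced with a few lines of Python. This only illustrates the padding rule stated in the help text; `split_names` and the file name `out.fq` are illustrative.

```python
from typing import List


def split_names(out_name: str, n_files: int, digits: int = 4) -> List[str]:
    """Names of the split outputs: a zero-padded sequence number, a dot, then the output name.

    digits=0 disables padding, as the help text for -d describes.
    """
    return [f"{str(i).zfill(digits)}.{out_name}" for i in range(1, n_files + 1)]


print(split_names("out.fq", 3))            # default padding of 4 digits
print(split_names("out.fq", 3, digits=3))  # as with -d 3
```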


  # help

  -?, --help                         print this message

Everyday use of fastp parameters

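1. Adapter trimming

fastp trims adapters by default; pass -A to turn this off. It detects adapter sequences automatically, so you do not have to supply any. For SE data you can still provide your adapter with -a. For PE data this is unnecessary: fastp locates adapters from the overlap between the two reads of a pair, which finds them more accurately, trims them more cleanly, and copes better with adapters that themselves contain mismatched bases. The fastp report includes a summary of the adapters removed.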
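The adapter-related flags can be assembled into a command line as in the Python sketch below. Only flags listed in the help text are used. The helper name, the file names and the example adapter string are placeholders for illustration, and actually running the command requires fastp to be installed.

```python
import shlex
from typing import List, Optional


def fastp_adapter_cmd(in1: str, out1: str,
                      in2: Optional[str] = None, out2: Optional[str] = None,
                      adapter_r1: Optional[str] = None,
                      adapter_r2: Optional[str] = None,
                      disable_trimming: bool = False) -> List[str]:
    """Build a fastp argument list covering the adapter options."""
    cmd = ["fastp", "-i", in1, "-o", out1]
    if in2 and out2:
        cmd += ["-I", in2, "-O", out2]   # paired-end input and output
    if disable_trimming:
        cmd.append("-A")                 # switch adapter trimming off
    else:
        if adapter_r1:
            cmd += ["-a", adapter_r1]
        if adapter_r2:
            cmd += ["--adapter_sequence_r2", adapter_r2]
    return cmd


# PE data: no adapter given, detection is left to the overlap analysis.
print(shlex.join(fastp_adapter_cmd("R1.fq.gz", "out.R1.fq.gz",
                                   "R2.fq.gz", "out.R2.fq.gz")))
# SE data with an explicitly supplied adapter (placeholder sequence).
print(shlex.join(fastp_adapter_cmd("R1.fq.gz", "out.R1.fq.gz",
                                   adapter_r1="AGATCGGAAGAGC")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where fastp is on the PATH.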

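2. Global trimming

fastp can trim the same fixed number of bases from the start and end of every read, which is useful for dropping sequencing cycles of poor quality. In 2×151 PE sequencing, for example, the last cycle is usually of very low quality and should be trimmed off. Use -f and -t to set the front and tail trimming for read1, and -F and -T to set them for read2.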
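In terms of the read itself, global trimming is just a slice. A minimal Python sketch, with `global_trim` as an illustrative name whose `front` and `tail` arguments play the roles of -f/-F and -t/-T:

```python
from typing import Tuple


def global_trim(seq: str, qual: str, front: int = 0, tail: int = 0) -> Tuple[str, str]:
    """Drop `front` bases from the start and `tail` bases from the end."""
    end = len(seq) - tail
    if end <= front:
        return "", ""  # nothing is left after trimming
    return seq[front:end], qual[front:end]


# A 151-cycle read with the last cycle trimmed, as with -t 1.
seq, qual = "A" * 151, "I" * 150 + "#"
trimmed_seq, trimmed_qual = global_trim(seq, qual, tail=1)
print(len(trimmed_seq), trimmed_qual[-1])
```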

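3. Sliding-window quality trimming

A read's low-quality bases are usually concentrated at its tail, and less often at its start. Like Trimmomatic, fastp can compute the mean quality of the bases in a sliding window and cut away the windows that fall below the requirement. Use -5 to enable cutting at the 5' end (the start of the read) and -3 to enable cutting at the 3' end (the tail of the read). Use -W to set the window size (default 4) and -M to set the required mean quality (default 20, i.e. Q20).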
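A rough Python sketch of the -5 and -3 behaviour follows. It is a simplification rather than fastp's implementation: qualities are assumed to be integer Phred scores, the window moves one base at a time, and one base is dropped per step while the window mean is below the threshold. fastp may advance differently once a window fails.

```python
from typing import List, Tuple


def window_mean(quals: List[int], start: int, window: int) -> float:
    """Mean quality of the window beginning at `start`."""
    return sum(quals[start:start + window]) / window


def cut_front_and_tail(quals: List[int], window: int = 4,
                       mean_q: int = 20) -> Tuple[int, int]:
    """Return (start, end) of the region kept after 5' and 3' cutting."""
    start = 0
    # 5' side: slide toward the tail while the window mean is too low.
    while start + window <= len(quals) and window_mean(quals, start, window) < mean_q:
        start += 1
    end = len(quals)
    # 3' side: slide toward the start while the window mean is too low.
    while end - window >= start and window_mean(quals, end - window, window) < mean_q:
        end -= 1
    return start, end


quals = [5, 5, 5, 30, 30, 30, 30, 30, 5, 5, 5]
start, end = cut_front_and_tail(quals)
print(start, end, quals[start:end])
```

Because the decision is made on the window mean, a single low-quality base can survive at either edge of the kept region, as it does in this example.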

4. Filtering out short reads

Short-read filtering is on by default, with a minimum length of 15. Use -L (--disable_length_filtering) to turn it off, or -l (--length_required) to set your own minimum length.

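5. Base correction (paired-end data only)

For PE data, fastp can analyze each read pair to find its overlapped region. Where the two reads disagree at a base in that region, and one of the two bases has very high quality while the other has very low quality, the low-quality base is changed to match the high-quality one. This option is off by default and is enabled with -c (--correction).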
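The correction rule can be sketched in Python as below, assuming the overlap has already been found and both reads are given in the same orientation over that region. The cut-offs used for "very high" (Q30) and "very low" (below Q15) quality, and copying the quality score along with the base, are assumptions for illustration; the text above does not state fastp's exact values.

```python
from typing import List, Tuple


def correct_overlap(seq1: str, qual1: List[int], seq2: str, qual2: List[int],
                    high_q: int = 30, low_q: int = 15) -> Tuple[str, str, int]:
    """Correct mismatches between two aligned overlap segments.

    A mismatch is fixed only when one base is at or above high_q and the
    other is below low_q. Returns both sequences and the number of fixes.
    """
    s1, s2 = list(seq1), list(seq2)
    q1, q2 = list(qual1), list(qual2)
    fixed = 0
    for i in range(min(len(s1), len(s2))):
        if s1[i] == s2[i]:
            continue
        if q1[i] >= high_q and q2[i] < low_q:
            s2[i], q2[i] = s1[i], q1[i]   # trust read1 at this position
            fixed += 1
        elif q2[i] >= high_q and q1[i] < low_q:
            s1[i], q1[i] = s2[i], q2[i]   # trust read2 at this position
            fixed += 1
    return "".join(s1), "".join(s2), fixed


print(correct_overlap("ACGT", [35, 35, 35, 35], "ACTT", [35, 35, 5, 35]))
```

A mismatch where both bases have comparable quality is left alone, since neither read is clearly more trustworthy.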

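6. Quality filtering

fastp filters out low-quality reads and reads with too many N bases. This is enabled by default and can be turned off with -Q. Use -q to set the Phred quality a base needs in order to be qualified: -q 15 means bases at Q15 or above are qualified. Then use -u to set the maximum percentage of unqualified bases allowed: -q 15 -u 40 means at most 40% of a read's bases may be below Q15, otherwise the read is discarded. Use -n to limit how many N bases a read may contain.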
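The filter described above reduces to two counts per read. A minimal Python sketch, assuming qualities are integer Phred scores; the function names are illustrative and the defaults mirror -q 15, -u 40 and -n 5.

```python
from typing import List


def passes_quality_filter(seq: str, quals: List[int], q: int = 15,
                          u: int = 40, n_limit: int = 5) -> bool:
    """Apply the N-count limit and the unqualified-base percentage limit."""
    if seq.count("N") > n_limit:
        return False
    unqualified = sum(1 for x in quals if x < q)
    # Discard only when unqualified bases exceed u percent of the read.
    return unqualified * 100 <= u * len(quals)


def pair_passes(read1_ok: bool, read2_ok: bool) -> bool:
    """A pair is kept only if both of its reads pass."""
    return read1_ok and read2_ok


# 150 bp read: 60 bases below Q15 is exactly 40% and still passes, 61 does not.
seq = "A" * 150
print(passes_quality_filter(seq, [10] * 60 + [30] * 90))
print(passes_quality_filter(seq, [10] * 61 + [30] * 89))
```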

Source: http://www.itdecent.cn/p/6f492058da5b
