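Overview
fastp detects and removes adapters, corrects bases in the overlapping region of paired-end (PE) reads, trims read heads and tails with a sliding window, trims polyG/polyX tails, and preprocesses UMIs. It combines these functions in one tool, runs fast, produces good results, and generates readable reports. In a typical preprocessing workflow, fastp can stand in for Trimmomatic, FastQC, Cutadapt, AfterQC, and SOAPnuke.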
The fastp paper
Title: fastp: an ultra-fast all-in-one FASTQ preprocessor
Journal: Bioinformatics
Citations: 2414 (Google Scholar, 2021-11-18)
Workflow

Fast

More accurate and efficient adapter removal

Fewest mismatched bases, clipped reads, and single-read mappings when aligned to the hg19 human reference genome

Efficient UMI preprocessing

fastp source code
GitHub: https://github.com/OpenGene/fastp
Installing fastp
conda create -n readqc
conda activate readqc
conda install -c bioconda fastp
fastp --version
# fastp 0.23.1
Running fastp
conda activate readqc
time fastp \
--in1 ./input/E100032181_L01_29_1.fq.gz \
--in2 ./input/E100032181_L01_29_2.fq.gz \
--out1 ./fastp/E100032181_L01_29_1.fq.gz \
--out2 ./fastp/E100032181_L01_29_2.fq.gz \
--json ./fastp/fastp.json \
--html ./fastp/fastp.html \
--trim_poly_g --poly_g_min_len 10 \
--trim_poly_x --poly_x_min_len 10 \
--cut_front --cut_tail --cut_window_size 4 \
--qualified_quality_phred 15 \
--low_complexity_filter \
--complexity_threshold 30 \
--length_required 30 \
--thread 4
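To apply the same kind of call to several samples, one option is a small dry-run helper that prints a fastp command per sample. This is only a sketch: it assumes paired files named <prefix>_1.fq.gz and <prefix>_2.fq.gz under ./input, the helper name fastp_cmd and the second sample prefix are hypothetical, and the trimming and filtering flags are left out for brevity.

```shell
#!/usr/bin/env bash
# Dry-run helper (hypothetical name): print the fastp command for one sample prefix.
# Assumption: inputs are ./input/<prefix>_1.fq.gz and ./input/<prefix>_2.fq.gz.
fastp_cmd() {
  prefix="$1"
  echo fastp \
    --in1 "./input/${prefix}_1.fq.gz" \
    --in2 "./input/${prefix}_2.fq.gz" \
    --out1 "./fastp/${prefix}_1.fq.gz" \
    --out2 "./fastp/${prefix}_2.fq.gz" \
    --json "./fastp/${prefix}.json" \
    --html "./fastp/${prefix}.html" \
    --thread 4
}

# Print one command per sample; the second prefix is a made-up placeholder.
for sample in E100032181_L01_29 E100032181_L01_30; do
  fastp_cmd "$sample"
done
```

Writing a separate JSON and HTML report per sample keeps one run from overwriting another's report. Piping the printed lines to bash would execute them.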
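Parameters
--trim_poly_g trim polyG tails
--poly_g_min_len 10 minimum polyG length to detect: 10 bp
--trim_poly_x trim polyX tails
--poly_x_min_len 10 minimum polyX length to detect: 10 bp
--cut_front slide the quality window from the 5' end
--cut_tail slide the quality window from the 3' end
--cut_window_size 4 window size: 4 bp
--cut_mean_quality 20 minimum mean base quality within a window: 20 (the fastp default; not set explicitly in the command above)
--qualified_quality_phred 15 a base counts as qualified when its Phred quality is at least 15
--low_complexity_filter enable the low-complexity read filter
--complexity_threshold 30 complexity threshold: 30%
--length_required 30 discard reads shorter than 30 bp after trimming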
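The sliding-window idea behind --cut_front can be illustrated with a simplified sketch. It assumes Phred+33 quality strings, and the function name cut_front_len is hypothetical. It is not fastp's actual implementation, which differs in details such as how the boundary window and leading N bases are handled.

```shell
#!/usr/bin/env bash
# cut_front_len QUAL [WINDOW] [MEAN_Q]  (hypothetical helper, not part of fastp)
# Prints how many leading bases a simplified 5'-end sliding-window pass would drop.
cut_front_len() {
  QUAL="$1" awk -v w="${2:-4}" -v minq="${3:-20}" 'BEGIN {
    qual = ENVIRON["QUAL"]
    # Lookup table: ASCII character -> Phred score (Phred+33 encoding assumed).
    for (i = 33; i < 127; i++) phred[sprintf("%c", i)] = i - 33
    n = length(qual)
    drop = n   # if no window qualifies, the whole read is dropped
    # Move the window one base at a time from the 5 prime end.
    for (start = 0; start + w <= n; start++) {
      sum = 0
      for (j = 1; j <= w; j++) sum += phred[substr(qual, start + j, 1)]
      # Stop at the first window whose mean quality reaches the threshold.
      if (sum / w >= minq) { drop = start; break }
    }
    print drop
  }'
}

# "#" encodes Q2 and "I" encodes Q40 in Phred+33.
cut_front_len '####IIII'
```

With a 4 bp window and a mean-quality threshold of 20, the first two windows of the example average Q2 and Q11.5 and are skipped, and the third averages Q21 and is kept, so two leading bases are dropped.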
Run log
Read1 before filtering:
total reads: 68871423
total bases: 6887142300
Q20 bases: 6788565208(98.5687%)
Q30 bases: 6516393608(94.6168%)
Read2 before filtering:
total reads: 68871423
total bases: 6887142300
Q20 bases: 6752497708(98.045%)
Q30 bases: 6459072061(93.7845%)
Read1 after filtering:
total reads: 68870151
total bases: 6579451130
Q20 bases: 6490255475(98.6443%)
Q30 bases: 6233038928(94.7349%)
Read2 after filtering:
total reads: 68870151
total bases: 6570653779
Q20 bases: 6449906989(98.1623%)
Q30 bases: 6173217216(93.9513%)
Filtering result:
reads passed filter: 137740302
reads failed due to low quality: 32
reads failed due to too many N: 936
reads failed due to too short: 1480
reads failed due to low complexity: 96
reads with adapter trimmed: 24272074
bases trimmed due to adapters: 604687721
reads with polyX in 3' end: 698520
bases trimmed in polyX tail: 6954246
Duplication rate: 69.3962%
Insert size peak (evaluated by paired-end reads): 141
JSON report: ./fastp/fastp.json
HTML report: ./fastp/fastp.html
fastp --in1 ./input/E100032181_L01_29_1.fq.gz --in2 ./input/E100032181_L01_29_2.fq.gz --out1 ./fastp/E100032181_L01_29_1.fq.gz --out2 ./fastp/E100032181_L01_29_2.fq.gz --json ./fastp/fastp.json --html ./fastp/fastp.html --trim_poly_x --poly_x_min_len 10 --cut_front --cut_tail --cut_window_size 4 --qualified_quality_phred 15 --low_complexity_filter --complexity_threshold 30 --length_required 30 --thread 4
fastp v0.23.1, time used: 567 seconds
real 9m28.522s
user 39m31.517s
sys 0m37.690s
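The counts in the log are easier to compare as percentages. The helper below is a small sketch (pct is a hypothetical name) that uses awk for the floating-point division, fed with numbers copied from the log above. The total read count 137742846 is 2 x 68871423.

```shell
#!/usr/bin/env bash
# pct NUM DEN: print NUM as a percentage of DEN with three decimals.
pct() {
  awk -v num="$1" -v den="$2" 'BEGIN { printf "%.3f\n", 100 * num / den }'
}

pct 6579451130 6887142300   # Read1 bases kept after filtering
pct 137740302 137742846     # reads that passed all filters
pct 24272074 137742846      # reads with an adapter trimmed
```

For this run, about 95.5% of Read1 bases remain after filtering, nearly all reads pass the filters, and roughly 17.6% of reads had an adapter trimmed.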
fastp results

Example HTML report: http://opengene.org/fastp/fastp.html
Example JSON report: http://opengene.org/fastp/fastp.json