Fastp: Filtering Next-Generation Sequencing Data

Overview

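Fastp detects and removes adapters, corrects bases in the overlap region of paired-end (PE) reads, trims read heads and tails with a sliding window, trims polyG/polyX tails, and preprocesses UMIs. It combines these functions in one tool, runs fast, gives good results, and generates a readable report. Fastp can take over the work of Trimmomatic, FastQC, Cutadapt, AfterQC, and SOAPnuke.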

The fastp paper

Title: fastp: an ultra-fast all-in-one FASTQ preprocessor
Journal: Bioinformatics
Citations: 2414 (Google Scholar, as of 2021-11-18)

Workflow

Fast

More accurate and efficient adapter removal

Fewest mismatched bases, clipped reads, and single-read mappings when aligned to the hg19 human reference genome

Efficient UMI preprocessing

fastp repository

Github: https://github.com/OpenGene/fastp

Installing fastp

conda create -n readqc
conda activate readqc
# fastp is distributed through the bioconda channel
conda install -c bioconda fastp
fastp --version
# fastp 0.23.1

Running fastp

conda activate readqc
time fastp \
--in1 ./input/E100032181_L01_29_1.fq.gz \
--in2 ./input/E100032181_L01_29_2.fq.gz \
--out1 ./fastp/E100032181_L01_29_1.fq.gz \
--out2 ./fastp/E100032181_L01_29_2.fq.gz \
--json ./fastp/fastp.json \
--html ./fastp/fastp.html \
--trim_poly_g --poly_g_min_len 10 \
--trim_poly_x --poly_x_min_len 10 \
--cut_front --cut_tail --cut_window_size 4 \
--qualified_quality_phred 15 \
--low_complexity_filter \
--complexity_threshold 30 \
--length_required 30 \
--thread 4
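
To apply the same settings to many samples, the command above can be wrapped in a loop. The following is a minimal sketch rather than part of the original workflow: the names `run_fastp`, `run_all`, `IN_DIR`, `OUT_DIR`, and `DRY_RUN` are hypothetical, and it assumes paired files are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz` as in this post. It writes one JSON and one HTML report per sample so that reports are not overwritten.

```shell
#!/usr/bin/env bash
# Sketch: run the fastp settings from this post on every sample in a directory.
# Hypothetical names: run_fastp, run_all, IN_DIR, OUT_DIR, DRY_RUN.
IN_DIR="${IN_DIR:-./input}"
OUT_DIR="${OUT_DIR:-./fastp}"

# Build the command for one sample; print it if DRY_RUN=1, otherwise run it.
run_fastp() {
  sample="$1"
  set -- fastp \
    --in1 "$IN_DIR/${sample}_1.fq.gz" \
    --in2 "$IN_DIR/${sample}_2.fq.gz" \
    --out1 "$OUT_DIR/${sample}_1.fq.gz" \
    --out2 "$OUT_DIR/${sample}_2.fq.gz" \
    --json "$OUT_DIR/${sample}.fastp.json" \
    --html "$OUT_DIR/${sample}.fastp.html" \
    --trim_poly_g --poly_g_min_len 10 \
    --trim_poly_x --poly_x_min_len 10 \
    --cut_front --cut_tail --cut_window_size 4 \
    --qualified_quality_phred 15 \
    --low_complexity_filter --complexity_threshold 30 \
    --length_required 30 \
    --thread 4
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    mkdir -p "$OUT_DIR" && "$@"
  fi
}

# Derive sample names from the *_1.fq.gz files and process each one in turn.
run_all() {
  for r1 in "$IN_DIR"/*_1.fq.gz; do
    [ -e "$r1" ] || continue   # glob matched nothing
    run_fastp "$(basename "$r1" _1.fq.gz)"
  done
}
```

Setting `DRY_RUN=1` before calling `run_all` prints each command without executing it, which is a cheap way to check the file names before committing to a long run.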

Parameters

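--trim_poly_g  trim polyG tails
--poly_g_min_len 10  minimum polyG length to detect is 10 bp
--trim_poly_x  trim polyX tails
--poly_x_min_len 10  minimum polyX length to detect is 10 bp
--cut_front  slide the window from the 5' end
--cut_tail  slide the window from the 3' end
--cut_window_size 4  window size of 4 bp
--cut_mean_quality 20  minimum mean base quality within the window is 20 (not set in the command above; 20 is the fastp default)
--qualified_quality_phred 15  a base counts as qualified if its Phred quality is at least 15
--low_complexity_filter  enable the low-complexity filter
--complexity_threshold 30  complexity threshold of 30%
--length_required 30  reads shorter than 30 bp after trimming are discarded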
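
The `--complexity_threshold` value is easier to interpret with a worked example. The fastp documentation defines complexity as the percentage of bases that differ from the next base (`base[i] != base[i+1]`). The sketch below computes that percentage with awk for a single sequence. The function name `seq_complexity` is hypothetical, and the code illustrates the definition rather than reproducing fastp's own implementation.

```shell
# Sketch: percentage of positions where base[i] != base[i+1].
# seq_complexity is a hypothetical helper, not a fastp command.
seq_complexity() {
  printf '%s\n' "$1" | awk '{
    n = length($0)
    if (n < 2) { print "0.00"; next }   # no adjacent pair to compare
    diff = 0
    for (i = 1; i < n; i++)
      if (substr($0, i, 1) != substr($0, i + 1, 1)) diff++
    printf "%.2f\n", 100 * diff / (n - 1)
  }'
}
```

For example, `seq_complexity AAAATTTTGGGGCCCC` prints `20.00` (3 changes over 15 adjacent pairs), so a read like this falls below a 30% threshold, while `seq_complexity ACGTACGT` prints `100.00`.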

Run log

Read1 before filtering:
total reads: 68871423
total bases: 6887142300
Q20 bases: 6788565208(98.5687%)
Q30 bases: 6516393608(94.6168%)

Read2 before filtering:
total reads: 68871423
total bases: 6887142300
Q20 bases: 6752497708(98.045%)
Q30 bases: 6459072061(93.7845%)

Read1 after filtering:
total reads: 68870151
total bases: 6579451130
Q20 bases: 6490255475(98.6443%)
Q30 bases: 6233038928(94.7349%)

Read2 after filtering:
total reads: 68870151
total bases: 6570653779
Q20 bases: 6449906989(98.1623%)
Q30 bases: 6173217216(93.9513%)

Filtering result:
reads passed filter: 137740302
reads failed due to low quality: 32
reads failed due to too many N: 936
reads failed due to too short: 1480
reads failed due to low complexity: 96
reads with adapter trimmed: 24272074
bases trimmed due to adapters: 604687721
reads with polyX in 3' end: 698520
bases trimmed in polyX tail: 6954246

Duplication rate: 69.3962%

Insert size peak (evaluated by paired-end reads): 141

JSON report: ./fastp/fastp.json
HTML report: ./fastp/fastp.html

fastp --in1 ./input/E100032181_L01_29_1.fq.gz --in2 ./input/E100032181_L01_29_2.fq.gz --out1 ./fastp/E100032181_L01_29_1.fq.gz --out2 ./fastp/E100032181_L01_29_2.fq.gz --json ./fastp/fastp.json --html ./fastp/fastp.html --trim_poly_x --poly_x_min_len 10 --cut_front --cut_tail --cut_window_size 4 --qualified_quality_phred 15 --low_complexity_filter --complexity_threshold 30 --length_required 30 --thread 4
fastp v0.23.1, time used: 567 seconds

real    9m28.522s
user    39m31.517s
sys     0m37.690s
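
The filtering summary in the log can be reduced to a single pass rate. The sketch below assumes the fastp log was saved to a file (for example by redirecting stderr) and that it contains the `reads passed filter:` and `reads failed due to ...:` lines in the form shown above. The function name `fastp_pass_rate` is hypothetical.

```shell
# Sketch: percentage of reads that passed filtering, computed from a saved log.
# fastp_pass_rate is a hypothetical helper; it reads only the summary lines.
fastp_pass_rate() {
  awk -F': *' '
    /^reads passed filter:/ { passed = $2 }
    /^reads failed due to/  { failed += $2 }
    END {
      total = passed + failed
      if (total == 0) { print "no filtering summary found"; exit 1 }
      printf "%.3f\n", 100 * passed / total
    }' "$1"
}
```

With the numbers from this run (137740302 reads passed, 32 + 936 + 1480 + 96 = 2544 failed), `fastp_pass_rate` prints `99.998`.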

fastp results

Example HTML report: http://opengene.org/fastp/fastp.html
Example JSON report: http://opengene.org/fastp/fastp.json

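Further reading:
fastp (2000+ citations) releases a major update, doubling its speed again
Bioinformatics software tools: fastp
Sequencing data quality control and preprocessing with fastp
UMI processing
UMI: unique molecular identifiers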
