2019-08-26 Data cleaning

Commands used to clean the PAS data

cd /data5/Cleanreads/seq_20181119up_1210down_Xten/result/raw && \
zcat 1_arabidopsis-WT_3-PAS.fq.gz \
  | /public/software/exec/ActivePython-2.7.8.10/bin/cutadapt -a AGATCGGAAGAGC -m 17 --label raw -O 3 -e 0.1 - \
  | fastx_trimmer -Q 33 -f 4 \
  | fastq_quality_trimmer -Q 33 -t 20 -l 16 \
  | fastq_quality_filter -Q 33 -q 20 -p 70 \
  | /public/software/exec/ActivePython-2.7.8.10/bin/cutadapt -a N -m 16 -O 1 -N - \
  | /public/software/exec/ActivePython-2.7.8.10/bin/cutadapt -a GGGGGGGGGG -O 1 -e 0.3 -m 16 -n 5 - \
  > 1_arabidopsis-WT_3-PAS.fq.clean_fq && \
fastqc 1_arabidopsis-WT_3-PAS.fq.clean_fq && \
rm -rf 1_arabidopsis-WT_3-PAS.fq.clean_fq_fastqc.zip

cd /data5/Cleanreads/seq_20181119up_1210down_Xten/result/raw && \
zcat 2_arabidopsis-WT_3-PAS.fq.gz \
  | /public/software/exec/ActivePython-2.7.8.10/bin/cutadapt -a AGATCGGAAGAGC -m 20 --label raw -O 3 -e 0.1 - \
  | fastx_trimmer -Q 33 -t 3 \
  | fastq_quality_trimmer -Q 33 -t 20 -l 16 \
  | fastq_quality_filter -Q 33 -q 20 -p 70 \
  | /public/software/exec/ActivePython-2.7.8.10/bin/cutadapt -a N -m 16 -O 1 -N - \
  | /public/software/exec/ActivePython-2.7.8.10/bin/cutadapt -a GGGGGGGGGG -O 1 -e 0.3 -m 16 -n 5 - \
  > 2_arabidopsis-WT_3-PAS.fq.clean_fq && \
fastqc 2_arabidopsis-WT_3-PAS.fq.clean_fq && \
rm -rf 2_arabidopsis-WT_3-PAS.fq.clean_fq_fastqc.zip
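Every FASTX tool in these commands is given -Q 33, which tells it that quality characters use the Phred+33 (Sanger / Illumina 1.8+) encoding: the score is the character's ASCII code minus 33, and a score Q stands for an error probability of 10^(-Q/10), so the threshold of 20 used in the quality steps means roughly a 1% chance of a wrong base call. A minimal Python sketch of that decoding; the function names are hypothetical and not part of any of these tools:

```python
from typing import List


def decode_phred33(qual: str) -> List[int]:
    """Convert a Phred+33 quality string into integer scores (ASCII code - 33)."""
    return [ord(ch) - 33 for ch in qual]


def error_probability(score: int) -> float:
    """Probability that a base call with Phred score `score` is wrong: 10^(-Q/10)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    scores = decode_phred33("I#5")
    print(scores)  # [40, 2, 20]
    print(error_probability(scores[2]))
```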


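Software used:

cutadapt, FASTX-Toolkit (fastx_trimmer, fastq_quality_trimmer, fastq_quality_filter), FastQC

Parameter explanations: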

-a ADAPTER, --adapter=ADAPTER

Sequence of an adapter that was ligated to the 3' end. The adapter itself and anything that follows is trimmed. If the adapter sequence ends with the '$' character, the adapter is anchored to the end of the read and only found if it is a suffix of the read.
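A minimal Python sketch of what -a does, assuming exact matching only (cutadapt itself also tolerates errors, controlled by -e, and chooses among candidate matches with its own alignment algorithm). The function name trim_3prime_adapter is hypothetical. A trailing $ switches to anchored mode, where the adapter must be a suffix of the read; otherwise the sketch cuts at the leftmost position where either the whole adapter starts, or, at the very end of the read, an adapter prefix of at least min_overlap bases (the -O value) starts:

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Exact-match sketch of 3' adapter trimming (no errors, leftmost match)."""
    if adapter.endswith("$"):
        # Anchored adapter: only removed if it is a suffix of the read.
        adapter = adapter[:-1]
        if adapter and read.endswith(adapter):
            return read[: len(read) - len(adapter)]
        return read
    for start in range(len(read)):
        tail = read[start:]
        if tail.startswith(adapter):
            # Full adapter found: drop it and everything after it.
            return read[:start]
        if len(tail) >= min_overlap and adapter.startswith(tail):
            # Partial adapter hanging off the 3' end of the read.
            return read[:start]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTTT", adapter))  # ACGTACGT
    print(trim_3prime_adapter("ACGTACGTAGATC", adapter))              # ACGTACGT
```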

-m LENGTH, --minimum-length=LENGTH

Discard trimmed reads that are shorter than LENGTH. Reads that are too short even before adapter removal are also discarded. In colorspace, an initial primer is not counted (default: 0).

-O LENGTH, --overlap=LENGTH

Minimum overlap length. If the overlap between the read and the adapter is shorter than LENGTH, the read is not modified. This reduces the number of bases trimmed purely due to short random adapter matches (default: 3).
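How much -O matters can be estimated with a short calculation. Assuming reads made of independent, uniformly random bases and exact matching (-e is ignored), the last k bases of a read equal the first k bases of the adapter with probability (1/4)^k, so summing over k from O up to the adapter length gives an upper bound (a union bound) on the fraction of reads trimmed purely by chance. The function name is hypothetical:

```python
def chance_trim_upper_bound(adapter_length: int, min_overlap: int) -> float:
    """Union bound on the chance that a random read's 3' end matches an adapter prefix.

    Assumes independent, uniform bases (each matches with probability 1/4)
    and exact matching, i.e. no errors allowed.
    """
    return sum(0.25 ** k for k in range(min_overlap, adapter_length + 1))


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # adapter used in the first cutadapt step
    for overlap in (1, 3, 5):
        print(overlap, round(chance_trim_upper_bound(len(adapter), overlap), 4))
```

With the default of 3 the bound is about 2% of reads, versus about one third with -O 1. The N and poly-G steps in the commands use -O 1, presumably because there even a single trailing N or G is meant to be removed.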

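-e --error-rate: maximum error rate (default: 0.1). For example, if cutadapt matches 15 bp of adapter in a read, then 15 × 0.1 = 1.5, so at most 1 error is allowed within those 15 bp. Insertions and deletions count as errors as well as mismatches, unless --no-indels is given.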
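The arithmetic in the example above can be written out directly. A minimal Python sketch, assuming the allowed number of errors is the error rate times the matched length rounded down, and counting mismatches only (no indels, unlike cutadapt's default); all function names are hypothetical:

```python
def allowed_errors(match_length: int, error_rate: float) -> int:
    """Errors tolerated in a match of `match_length` bases.

    Assumption: error rate x matched length, rounded down (15 x 0.1 = 1.5 -> 1).
    """
    return int(match_length * error_rate)


def count_mismatches(segment: str, adapter: str) -> int:
    """Mismatches between a read segment and an adapter prefix of the same length."""
    return sum(1 for a, b in zip(segment, adapter) if a != b)


def is_adapter_match(segment: str, adapter: str, error_rate: float = 0.1) -> bool:
    """Mismatch-only check (no indels) of a read segment against the adapter prefix."""
    prefix = adapter[: len(segment)]
    return count_mismatches(segment, prefix) <= allowed_errors(len(segment), error_rate)


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"
    print(allowed_errors(15, 0.1))                     # 1
    print(is_adapter_match("AGATCGGTAGAGC", adapter))  # True: 1 mismatch in 13 bp
    print(is_adapter_match("AGTTCGGTAGAGC", adapter))  # False: 2 mismatches
```

Note that short partial matches get no tolerance at all at -e 0.1: a 5-base overlap allows int(0.5) = 0 errors.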

-m --minimum-length: minimum length of a read after adapter trimming; shorter reads are discarded.

-O --overlap: by default, at least 3 bases must match before a sequence is treated as adapter; this value can be raised when appropriate.

--discard-trimmed: discard reads in which an adapter was detected (by default, cutadapt only cuts off the adapter and everything after it, and keeps the read).

--untrimmed-output: write reads in which no adapter was found to the given file instead of the regular output (must be used together with -o).

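--untrimmed-paired-output: for paired-end data, write the second reads of pairs in which no adapter was found to the given file (must also be used together with -p).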
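The default behaviour, --discard-trimmed and --untrimmed-output differ only in where a read ends up. A minimal Python sketch with a hypothetical route_reads function, assuming an exact, full-length adapter match stands in for cutadapt's real detection (which also handles partial matches and errors):

```python
from typing import List, Tuple


def route_reads(reads: List[str], adapter: str, discard_trimmed: bool = False,
                separate_untrimmed: bool = False) -> Tuple[List[str], List[str]]:
    """Return (main_output, untrimmed_output) for single-end reads."""
    main: List[str] = []
    untrimmed: List[str] = []
    for read in reads:
        pos = read.find(adapter)  # simplification: exact, full-length match only
        if pos == -1:
            # No adapter: goes to the untrimmed file if requested, else the main output.
            (untrimmed if separate_untrimmed else main).append(read)
        elif not discard_trimmed:
            # Default: cut the adapter and everything after it, keep the read.
            main.append(read[:pos])
        # With discard_trimmed=True, reads containing the adapter are dropped.
    return main, untrimmed


if __name__ == "__main__":
    reads = ["ACGTAGATCGGAAGAGC", "CCCCCCCC"]
    print(route_reads(reads, "AGATCGGAAGAGC"))
    print(route_reads(reads, "AGATCGGAAGAGC", discard_trimmed=True))
    print(route_reads(reads, "AGATCGGAAGAGC", separate_untrimmed=True))
```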

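--pair-filter=(any|both): a very useful option for paired-end data, where an adapter (or another filtering criterion) can be detected in read1, read2, or both. With any (the default), the pair (read1 and read2) is discarded as soon as either read meets the criterion, e.g. an adapter is found under --discard-trimmed; with both, the pair is discarded only when both reads meet it.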
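--pair-filter reduces to a boolean rule over the two reads of a pair. A minimal sketch with a hypothetical function name, where r1_hit and r2_hit stand for "this read meets the filtering criterion" (for example, an adapter was found while --discard-trimmed is in effect):

```python
def pair_is_discarded(r1_hit: bool, r2_hit: bool, pair_filter: str = "any") -> bool:
    """Decide whether a read pair is filtered out under --pair-filter."""
    if pair_filter == "any":
        return r1_hit or r2_hit   # one read meeting the criterion is enough
    if pair_filter == "both":
        return r1_hit and r2_hit  # both reads must meet it
    raise ValueError(f"unsupported pair_filter: {pair_filter!r}")


if __name__ == "__main__":
    for r1 in (False, True):
        for r2 in (False, True):
            print(r1, r2, pair_is_discarded(r1, r2, "any"), pair_is_discarded(r1, r2, "both"))
```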


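fastx_trimmer [-h] [-f N] [-l N] [-t N] [-m MINLEN] [-z] [-v] [-i INFILE] [-o OUTFILE]: shortens reads, keeping only the selected range of bases

[-f N] = first base to keep (1-based); default is the first base.

[-l N] = last base to keep; by default the read is kept to its end.

[-t N] = trim N bases from the end (3' end) of the read. Cannot be combined with -l.

[-m MINLEN] = with -t, discard reads shorter than MINLEN.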
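The position arithmetic is easy to get wrong, and the two commands differ here: sample 1 uses -f 4, which drops the first three bases, while sample 2 uses -t 3, which drops the last three. A minimal Python sketch of these options on a single FASTQ record; fastx_trim is a hypothetical name, and the assumption that -m is checked on the final length only when -t is used follows the help text above:

```python
from typing import Optional, Tuple


def fastx_trim(seq: str, qual: str, first: int = 1, last: Optional[int] = None,
               trim_end: int = 0, min_len: int = 0) -> Optional[Tuple[str, str]]:
    """Sketch of fastx_trimmer: keep bases first..last (1-based, inclusive).

    trim_end mirrors -t, min_len mirrors -m; returns None when the read is discarded.
    """
    if last is not None and trim_end:
        raise ValueError("-t cannot be combined with -l")
    start = first - 1                          # -f is 1-based
    if trim_end:
        end = max(0, len(seq) - trim_end)      # -t: drop N bases from the 3' end
    elif last is not None:
        end = last                             # -l: last base to keep
    else:
        end = len(seq)
    new_seq, new_qual = seq[start:end], qual[start:end]
    if trim_end and len(new_seq) < min_len:    # assumption: -m only applies with -t
        return None
    return new_seq, new_qual


if __name__ == "__main__":
    seq, qual = "TTTACGTACGTGGG", "I" * 14
    print(fastx_trim(seq, qual, first=4)[0])     # ACGTACGTGGG (like -f 4)
    print(fastx_trim(seq, qual, trim_end=3)[0])  # TTTACGTACGT (like -t 3)
```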


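fastq_quality_trimmer [-h] [-v] [-t N] [-l N] [-z] [-i INFILE] [-o OUTFILE]: trims low-quality bases from the end of reads

[-t N] = quality threshold: starting from the 3' end, bases with quality below N are trimmed.

[-l N] = minimum read length after trimming; shorter reads are discarded.

[-z] = compress the output with GZIP.

[-v] = verbose: report the number of sequences. If -o is given, the report is printed to STDOUT; otherwise (the output itself goes to STDOUT) it is printed to STDERR.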
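A minimal Python sketch of the trimming rule as described above, with the defaults set to the values used in the commands (-t 20 -l 16, Phred+33). quality_trim_3prime is a hypothetical name, and the sketch assumes trimming stops at the first base, counted from the 3' end, that meets the threshold, so low-quality bases in the middle of a read are left alone:

```python
from typing import Optional, Tuple


def quality_trim_3prime(seq: str, qual: str, threshold: int = 20,
                        min_len: int = 16) -> Optional[Tuple[str, str]]:
    """Sketch of fastq_quality_trimmer -Q 33 -t threshold -l min_len.

    Returns the trimmed (seq, qual), or None if the read becomes too short.
    """
    scores = [ord(ch) - 33 for ch in qual]  # Phred+33 decoding (-Q 33)
    end = len(seq)
    # Walk inward from the 3' end; stop at the first base meeting the threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_len:                        # -l: discard reads that are now too short
        return None
    return seq[:end], qual[:end]


if __name__ == "__main__":
    seq = "ACGT" * 5                          # 20 bases
    print(quality_trim_3prime(seq, "I" * 16 + "####"))
    print(quality_trim_3prime(seq, "I" * 15 + "#" * 5))  # None: only 15 bases left
```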


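fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]: filters out low-quality reads

[-q N] = minimum quality score a base needs in order to count as good.

[-p N] = minimum percentage of bases in a read that must have at least quality -q; reads below this are discarded.

[-z] = compress the output with GZIP.

[-v] = verbose: report the number of sequences. If -o is given, the report is printed to STDOUT; otherwise (the output itself goes to STDOUT) it is printed to STDERR.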
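With the values in the commands (-q 20 -p 70), a read survives only if at least 70% of its bases have a Phred score of 20 or more. A minimal Python sketch of that rule, assuming both comparisons are inclusive; passes_quality_filter is a hypothetical name, and integer arithmetic avoids floating-point rounding at the 70% boundary:

```python
def passes_quality_filter(qual: str, min_quality: int = 20, min_percent: int = 70) -> bool:
    """Sketch of fastq_quality_filter -Q 33 -q min_quality -p min_percent."""
    scores = [ord(ch) - 33 for ch in qual]  # Phred+33 decoding (-Q 33)
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    # Keep the read if good / total >= min_percent / 100.
    return good * 100 >= min_percent * len(scores)


if __name__ == "__main__":
    print(passes_quality_filter("IIIIIII###"))  # True: 7 of 10 bases reach Q20
    print(passes_quality_filter("IIIIII####"))  # False: only 6 of 10
```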
