10.11 Trimmomatic study notes and script practice

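Yesterday's script ran without problems, but the paths were too cumbersome, and it did not check whether the output files had actually come through QC successfully.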


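Below are some statements taken from a senior labmate's script.

Checking whether a file exists
Checking whether a directory exists: if [ ! -d $folder ] (the space before the closing ] is required); mkdir -p is the command that creates the directory, including any missing parent directories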
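
The two checks above could be wrapped up roughly as follows. This is only a sketch: the function names `ensure_dir` and `require_file` and the example paths are made up for illustration and do not come from the labmate's script.

```shell
#!/usr/bin/env bash

# Create a directory (and any missing parents) only if it is not already there.
# Note the space before the closing ']': '[ ! -d "$dir" ]' is required syntax.
ensure_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir"
    fi
}

# Return success only if the path is an existing regular file.
require_file() {
    local file="$1"
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 1
    fi
}

# Example (hypothetical paths):
#   ensure_dir trim_out/sample1
#   require_file reads_1.fq.gz || exit 1
```

Quoting the variables keeps the tests working when a path contains spaces, which the unquoted form in the note above would not handle.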

PE mode, HiSeq paired-end sequencing:

$ java -jar /path/Trimmomatic/trimmomatic-0.36.jar PE -phred33 -trimlog logfile reads_1.fq.gz reads_2.fq.gz out.read_1.fq.gz out.trim.read_1.fq.gz out.read_2.fq.gz out.trim.read_2.fq.gz ILLUMINACLIP:/path/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:5:20 LEADING:5 TRAILING:5 MINLEN:50

SE mode, HiSeq single-end sequencing:

$ java -jar /path/Trimmomatic/trimmomatic-0.36.jar SE -phred33 -trimlog se.logfile raw_data/untreated.fq out.untreated.fq.gz ILLUMINACLIP:/path/Trimmomatic/adapters/TruSeq3-SE.fa:2:30:10 SLIDINGWINDOW:5:20 LEADING:5 TRAILING:5 MINLEN:50

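In SE mode there is no need to specify a file for the reads that get filtered out; the trimming steps follow directly after the single output file. This is a point to watch out for.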
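
To deal with the cumbersome paths and the missing output check, the PE command can be assembled from a few variables. The sketch below assumes the same /path/Trimmomatic layout as the commands above; the function names `pe_command` and `outputs_ok` and the `.paired` / `.unpaired` output naming are hypothetical choices, not part of Trimmomatic.

```shell
#!/usr/bin/env bash

# Assumed installation layout; adjust to your own paths.
TRIM_DIR="/path/Trimmomatic"
TRIM_JAR="$TRIM_DIR/trimmomatic-0.36.jar"
ADAPTERS="$TRIM_DIR/adapters/TruSeq3-PE.fa"

# Print the PE command for one sample prefix in the current directory
# (e.g. "reads" -> reads_1.fq.gz and reads_2.fq.gz).
pe_command() {
    local sample="$1" out_dir="$2"
    printf '%s ' java -jar "$TRIM_JAR" PE -phred33 \
        "${sample}_1.fq.gz" "${sample}_2.fq.gz" \
        "$out_dir/${sample}_1.paired.fq.gz" "$out_dir/${sample}_1.unpaired.fq.gz" \
        "$out_dir/${sample}_2.paired.fq.gz" "$out_dir/${sample}_2.unpaired.fq.gz" \
        "ILLUMINACLIP:${ADAPTERS}:2:30:10" SLIDINGWINDOW:5:20 LEADING:5 TRAILING:5 MINLEN:50
    printf '\n'
}

# Succeed only if every listed output file exists and is non-empty.
outputs_ok() {
    local f
    for f in "$@"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
}
```

Calling `pe_command reads trim_out` prints the full command line for inspection before it is run. `outputs_ok` is a weak check: a gzip file that holds no reads is still a few bytes long, so it only catches outputs that were never written.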

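ILLUMINACLIP is the adapter-clipping step. In ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 (path omitted), the fields mean the following. TruSeq3-PE.fa is the FASTA file of adapter sequences. 2 is the maximum number of mismatches allowed when a seed is matched against the adapter sequences. 30 is the palindrome clip threshold: in PE mode the two adapter-ligated reads of a pair are aligned together with the PE adapters, and if the alignment score meets the threshold of 30 the pair is considered to contain adapter, which is clipped at the corresponding position. 10 is the simple clip threshold and works differently from the 30: it ignores pairing, and any single read with a stretch that aligns to an adapter sequence with a score meeting the threshold of 10 is treated as containing adapter, which is then removed. Both 30 and 10 are alignment scores, not match percentages.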
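
To get a feel for the two thresholds, recall that the Trimmomatic manual describes each matching base as adding log10(4), a little over 0.6, to the alignment score. Under that assumption, and ignoring mismatch penalties, the number of perfectly matching bases needed to reach a threshold can be estimated as below; `bases_needed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Rough number of perfectly matching bases needed to reach a clip threshold,
# assuming each match adds log10(4) to the score and there are no mismatches.
bases_needed() {
    awk -v t="$1" 'BEGIN {
        per_base = log(4) / log(10)
        n = t / per_base
        if (n > int(n)) n = int(n) + 1   # round up to whole bases
        print n
    }'
}

bases_needed 10   # simple clip threshold      -> prints 17
bases_needed 30   # palindrome clip threshold  -> prints 50
```

So a simple clip threshold of 10 corresponds to roughly 17 matching bases, and a palindrome threshold of 30 to roughly 50.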

Link: http://www.itdecent.cn/p/36891a89ed6e



Running Trimmomatic

Paired End Mode:

java -jar <path to trimmomatic.jar> PE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...

or

java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...

Single End Mode:

java -jar <path to trimmomatic jar> SE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input> <output> <step 1> ...

or

java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticSE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input> <output> <step 1> ...

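If no quality score is specified, phred-64 is the default according to this older text. Newer releases, including the 0.36 jar used above, appear to detect the encoding automatically when neither flag is given, so passing -phred33 or -phred64 explicitly is the safer choice.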

Specifying a trimlog file creates a log of all read trimmings, indicating the following details:

the read name

the surviving sequence length

the location of the first surviving base, i.e. the amount trimmed from the start

the location of the last surviving base in the original read

the amount trimmed from the end
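
A trimlog can be summarised with a few lines of awk. The sketch below assumes each log line is space-separated with the five fields listed above, and reads the numeric fields from the end of the line because read names may themselves contain spaces. It also assumes that a read dropped entirely is logged with a surviving length of 0. `trimlog_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Summarise a Trimmomatic trimlog: number of reads, reads with nothing left,
# and the mean surviving length.
trimlog_summary() {
    awk '{
        len = $(NF - 3)          # surviving sequence length (4th field from the end)
        total += len
        if (len == 0) dropped++
    }
    END {
        if (NR == 0) { print "reads=0"; exit }
        printf "reads=%d dropped=%d mean_surviving_length=%.1f\n", NR, dropped, total / NR
    }' "$1"
}

# Example: trimlog_summary logfile
```

On large data sets the trimlog itself becomes very big, so it is usually worth writing it only while tuning parameters.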

Multiple steps can be specified as required, by using additional arguments at the end.

Most steps take one or more settings, delimited by ':' (a colon)

Step options:

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>

fastaWithAdaptersEtc: specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below.

seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed

palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment.

simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read.

SLIDINGWINDOW:<windowSize>:<requiredQuality>

windowSize: specifies the number of bases to average across

requiredQuality: specifies the average quality required.

LEADING:<quality>

quality: Specifies the minimum quality required to keep a base.

TRAILING:<quality>

quality: Specifies the minimum quality required to keep a base.

CROP:<length>

length: The number of bases to keep, from the start of the read.

HEADCROP:<length>

length: The number of bases to remove from the start of the read.

MINLEN:<length>

length: Specifies the minimum length of reads to be kept.

Trimming Order

Trimming occurs in the order in which the steps are specified on the command line. It is recommended in most cases that adapter clipping, if required, is done as early as possible.


Installing FastQC:

$ unzip fastqc_v0.11.5.zip

$ cd FastQC

$ chmod 755 fastqc

Running it:

$ /path_to_fastqc/FastQC/fastqc untreated.fq -o fastqc_out_dir/

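The command is fairly simple. The only thing worth noting is that the -o option sets the output directory for the FastQC report, and that directory must be created beforehand. If no directory is given, FastQC writes its results to the same directory as the input file untreated.fq. It produces just two outputs: an html file and a .zip archive.

The html file can be opened directly in a browser to see all of the FastQC results. After unzipping the archive, all of the QC plots and the summary data can also be found in the resulting directory.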
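
Since the output directory has to exist first, and the unzipped archive contains a tab-separated summary.txt with one PASS / WARN / FAIL line per module, both points can be scripted. This is a sketch only: the `FASTQC` path reuses the placeholder from the command above, and `run_fastqc` and `count_fastqc_fails` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Placeholder path, as in the command above; adjust to your installation.
FASTQC="/path_to_fastqc/FastQC/fastqc"

# Create the output directory first, then run FastQC into it.
run_fastqc() {
    local fq="$1" out_dir="$2"
    mkdir -p "$out_dir"
    "$FASTQC" "$fq" -o "$out_dir"
}

# Count the modules marked FAIL in an unzipped FastQC summary.txt
# (first tab-separated column is the status).
count_fastqc_fails() {
    awk -F '\t' '$1 == "FAIL" { n++ } END { print n + 0 }' "$1"
}

# Example (hypothetical paths):
#   run_fastqc untreated.fq fastqc_out_dir
#   unzip -o fastqc_out_dir/untreated_fastqc.zip -d fastqc_out_dir
#   count_fastqc_fails fastqc_out_dir/untreated_fastqc/summary.txt
```

A FAIL in summary.txt is a prompt to look at the corresponding plot rather than an automatic verdict, since some modules fail routinely for certain library types.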

FastQC plot before QC

FastQC plot of the sequences after QC

Question: for paired-end sequencing, do both files need to be taken into account?

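The main difference between single-end and paired-end sequencing lies in how the sequencing library is constructed.

1. Single-end sequencing first fragments the DNA sample into pieces of 200-500 bp. A primer sequence is ligated to one end of each fragment and an adapter is then added to the end; the fragments are fixed on the flowcell to generate DNA clusters, and the sequencer reads each fragment from one end only.

2. Paired-end sequencing adds sequencing-primer binding sites to the adapters at both ends when the DNA library is built. After the first round of sequencing is complete, the template strand of the first round is removed, and the Paired-End Module directs the complementary strand to be regenerated and amplified in place until there is enough template for the second round, in which the complementary strand is sequenced by synthesis.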
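
On the question above about the two paired-end files: because both reads of a pair come from the same fragment, the two paired output files written by Trimmomatic in PE mode should contain the same number of reads. A quick consistency check is sketched below. It assumes plain four-line FASTQ records (no wrapped sequence lines), and `count_reads` and `pairs_in_sync` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Count reads in a FASTQ file (plain or gzip-compressed), assuming 4 lines per record.
count_reads() {
    local f="$1"
    case "$f" in
        *.gz) gzip -dc "$f" ;;
        *)    cat "$f" ;;
    esac | awk 'END { print NR / 4 }'
}

# Succeed only if the two files hold the same number of reads.
pairs_in_sync() {
    [ "$(count_reads "$1")" = "$(count_reads "$2")" ]
}

# Example, using the paired output names from the PE command earlier:
#   pairs_in_sync out.read_1.fq.gz out.read_2.fq.gz || echo "paired outputs differ" >&2
```

Equal counts do not prove that the reads are in the same order, but a mismatch is a clear sign that one of the two files is truncated or was processed separately.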

