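I had always used BWA before. According to the literature, BWA is slower but more accurate, while Bowtie is faster and uses less memory. On my own 8 GB laptop, aligning next-generation sequencing reads to the human genome with Bowtie does indeed run fine.
First, let's sort out what each of these well-known tools does: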
TopHat: a fast splice junction mapper for RNA-seq reads
Cufflinks: a tool for transcriptome assembly and isoform quantitation from RNA-seq reads
Crossbow: a cloud-enabled software tool for analyzing resequencing data
Myrna: a cloud-enabled software tool for aligning RNA-seq reads and measuring differential gene expression
Bowtie2 parameters:
The parameters are as follows:
Step 1: build an index of your reference.
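The indexing step, followed by the alignment itself, can be sketched roughly as below. This is a minimal Python sketch that only builds and prints the command lines. The file names (`hg19.fa`, `reads_1.fq`, `reads_2.fq`, `aln.sam`) and the helper function names are hypothetical. It assumes the usage `bowtie2-build <reference_in> <bt2_base>` and the `bowtie2` options `-x`, `-U`, `-1`/`-2`, `-S` and `-p` as described in the Bowtie2 manual. Pass `dry_run=False` only if Bowtie2 is installed and on your `PATH`.

```python
import shlex
import subprocess
from typing import List, Optional


def build_index_cmd(reference_fasta: str, index_prefix: str) -> List[str]:
    """bowtie2-build <reference_in> <bt2_base>: writes index files named after <bt2_base>."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def align_cmd(index_prefix: str, reads1: str, reads2: Optional[str] = None,
              sam_out: str = "out.sam", threads: int = 1) -> List[str]:
    """bowtie2 -x <index> {-U <reads> | -1 <m1> -2 <m2>} -S <sam>."""
    cmd = ["bowtie2", "-p", str(threads), "-x", index_prefix]
    if reads2 is None:
        cmd += ["-U", reads1]                 # single-end reads
    else:
        cmd += ["-1", reads1, "-2", reads2]   # paired-end reads
    cmd += ["-S", sam_out]                    # write alignments in SAM format
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names: replace with your own reference and reads.
    run(build_index_cmd("hg19.fa", "hg19"))
    run(align_cmd("hg19", "reads_1.fq", "reads_2.fq", sam_out="aln.sam", threads=4))
```

Building the index for a human-sized genome takes a while, but it only has to be done once; every later `bowtie2` run reuses the prefix passed to `-x`.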
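Bowtie2 output: parsing the SAM format
Each alignment line in a SAM file is split by tabs into 11 mandatory columns, followed by optional fields (counted together as column 12 below); header lines start with @. Tab separation makes the file easy to process directly with shell scripts, and SAMtools can also take over much of the work.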
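Because the columns are tab-separated, a single alignment line can be split into named fields with a plain `split`. Below is a minimal Python sketch that handles one non-header line at a time. The field names follow the SAM specification (`QNAME`, `FLAG`, ...), while the function name `parse_sam_line` and the example read are made up for illustration. Optional fields have the form `TAG:TYPE:VALUE`; only integer (`i`) values are converted here, everything else stays a string.

```python
from typing import Dict

# Names of the 11 mandatory SAM columns, in order.
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_COLUMNS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one SAM alignment line (not an '@' header line) into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    rec: Dict[str, object] = dict(zip(MANDATORY, cols[:11]))
    for key in INT_COLUMNS:
        rec[key] = int(cols[MANDATORY.index(key)])
    # Everything after column 11 is an optional TAG:TYPE:VALUE field.
    tags: Dict[str, object] = {}
    for field in cols[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    rec["TAGS"] = tags
    return rec


if __name__ == "__main__":
    # Made-up paired-end record: mate 1 on the reverse strand, mate aligned at position 100.
    example = "read1\t83\tchr1\t300\t42\t4M\t=\t100\t-204\tACGT\tIIII\tAS:i:0\tXS:i:-12\tYT:Z:CP"
    rec = parse_sam_line(example)
    print(rec["FLAG"], rec["POS"], rec["TLEN"], rec["TAGS"])
```

For one-off checks, the same split is what `cut -f` or `awk -F'\t'` does in a shell script.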
1  QNAME: the ID (name) of the read
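2  FLAG: a bitwise flag describing the read. 1: the read is one of a pair; 2: the read is part of a properly (concordantly) aligned pair; 4: the read has no reported alignment; 8: the read is one of a pair and its mate has no reported alignment; 16: the read aligned to the reverse strand; 32: the mate aligned to the reverse strand; 64: the read is the first of the pair; 128: the read is the second of the pair.
The flag is the sum of all conditions that apply. For example, 83 = 64 + 16 + 2 + 1 means the first read of a paired-end pair, in a properly aligned pair, aligned to the reverse strand.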
3  RNAME: the chromosome or scaffold the read aligned to
4  POS: the 1-based leftmost position of the alignment, counted on the forward strand
5  MAPQ: the mapping quality of the alignment
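6  CIGAR: a string representation of the alignment as length/operation pairs, e.g. 50M or 3S45M2I. M = alignment match (either a match or a mismatch; M does not tell them apart), I = insertion, D = deletion, S = soft clipping. Mismatch counts come from the XM, NM and MD tags instead.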
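7  RNEXT: the reference sequence (chromosome) the mate aligned to; = if it is the same as column 3, * if this information is not available
8  PNEXT: the 1-based leftmost position of the mate's alignment
9  TLEN: the inferred fragment length, i.e. how many bp the pair spans; positive if the mate lies downstream, negative if it lies upstream, 0 if not applicable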
10  SEQ: the read sequence (reverse-complemented if aligned to the reverse strand)
11  QUAL: the base qualities, ASCII-encoded (Phred+33)
12  Optional fields: additional information in TAG:TYPE:VALUE form
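AS:i:<N> Alignment score
XS:i:<N> Alignment score of the best-scoring alignment found other than the one reported; present only if more than one alignment was found for the read
This is the tag to use when filtering for uniquely aligned reads!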
YS:i:<N> Alignment score of the opposite mate
XN:i:<N> Number of ambiguous bases in the reference covering this alignment
XM:i:<N> Number of mismatches in the alignment
XO:i:<N> Number of gap opens; XG:i:<N> is the number of gap extensions
YF:Z:<S> The reason the read was filtered out
NM:i:<N> The edit distance; that is, the minimal number of one-nucleotide edits
(substitutions, insertions and deletions) needed to transform the read
string into the reference string. Only present if SAM record is for an
aligned read.
YT:Z:<S> Value of `UU` indicates the read was not part of a pair. Value of `CP`
indicates the read was part of a pair and the pair aligned concordantly.
Value of `DP` indicates the read was part of a pair and the pair aligned
discordantly. Value of `UP` indicates the read was part of a pair but the
pair failed to align either concordantly or discordantly.
MD:Z:<S>
A string representation of the mismatched reference bases in the alignment.
See the SAM format specification for details. Only present if SAM record is
for an aligned read.
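The FLAG value in column 2 is a sum of powers of two, so each condition can be recovered with a bitwise AND. Below is a minimal Python sketch that covers only the eight bits listed for column 2; the SAM specification also defines 256, 512, 1024 and 2048 (for example, secondary alignments), which this sketch ignores. The names `FLAG_BITS`, `decode_flag` and `describe_flag` are hypothetical.

```python
from typing import List

# The eight FLAG bits described for column 2; higher SAM bits are not handled here.
FLAG_BITS = {
    1: "read is one of a pair",
    2: "read is part of a properly aligned pair",
    4: "read is unaligned",
    8: "mate is unaligned",
    16: "read aligned to the reverse strand",
    32: "mate aligned to the reverse strand",
    64: "read is the first of the pair",
    128: "read is the second of the pair",
}


def decode_flag(flag: int) -> List[int]:
    """Return the bits set in `flag`, smallest first."""
    return [bit for bit in FLAG_BITS if flag & bit]


def describe_flag(flag: int) -> List[str]:
    """Human-readable meaning of each set bit."""
    return [FLAG_BITS[bit] for bit in decode_flag(flag)]


if __name__ == "__main__":
    # 83 = 64 + 16 + 2 + 1, the example from the FLAG description.
    for text in describe_flag(83):
        print(text)
```

The same test works in a shell pipeline: `samtools view -f`/`-F` keeps or drops reads by these bits, but decoding them by hand helps when reading raw SAM lines.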
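A CIGAR string (column 6) is a run of length/operation pairs such as `3S45M2I10M1D5M`. Below is a minimal Python sketch that splits it and computes how many reference bases the alignment spans. The function names are hypothetical; the operation letters are those of the SAM specification (`M I D N S H P = X`). Because `M` counts matches and mismatches alike, mismatches have to be read from `XM`, `NM` or `MD` instead.

```python
import re
from typing import List, Tuple

# A whole CIGAR string, and a single length + operation pair within it.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """'3S45M2I10M1D5M' -> [(3, 'S'), (45, 'M'), ...]; '*' (no alignment) -> []."""
    if cigar == "*":
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_span(ops: List[Tuple[int, str]]) -> int:
    """Reference bases covered; the rightmost aligned base is POS + span - 1."""
    return sum(n for n, op in ops if op in REF_CONSUMING)


if __name__ == "__main__":
    ops = parse_cigar("3S45M2I10M1D5M")
    print(ops)
    print(reference_span(ops))
```

Soft-clipped (`S`) and inserted (`I`) bases belong to the read but not to the reference, which is why they are left out of the span.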
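The uniqueness filter mentioned under `XS:i` can be sketched as follows. This is one common heuristic, assumed here rather than taken from the Bowtie2 documentation: keep an aligned read if it has no `XS` tag (no other alignment was found) or if its `AS` is strictly greater than `XS` (in Bowtie2 a higher score is better). Header lines are passed through unchanged. The function names and example reads are hypothetical, and filtering on MAPQ (column 5) is a common alternative.

```python
from typing import Dict, Iterable, Iterator, List


def optional_tags(cols: List[str]) -> Dict[str, str]:
    """Map TAG -> VALUE for the TAG:TYPE:VALUE fields after column 11."""
    tags = {}
    for field in cols[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def is_unique(line: str) -> bool:
    """Heuristic: aligned, and either no XS tag or AS strictly better than XS."""
    cols = line.rstrip("\n").split("\t")
    if int(cols[1]) & 4:          # FLAG bit 4: the read did not align
        return False
    tags = optional_tags(cols)
    if "XS" not in tags:          # no second-best alignment was reported
        return True
    return int(tags["AS"]) > int(tags["XS"])


def filter_unique(lines: Iterable[str]) -> Iterator[str]:
    """Yield '@' header lines plus alignment lines that pass is_unique()."""
    for line in lines:
        if line.startswith("@") or is_unique(line):
            yield line


if __name__ == "__main__":
    # Made-up records: r1 unique, r2 tied (AS == XS), r3 unique, r4 unaligned.
    sam_lines = [
        "@HD\tVN:1.0\tSO:unsorted\n",
        "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tYT:Z:UU\n",
        "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-5\tXS:i:-5\tYT:Z:UU\n",
        "r3\t16\tchr2\t50\t23\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:-10\tYT:Z:UU\n",
        "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n",
    ]
    for line in filter_unique(sam_lines):
        print(line.split("\t")[0])
```

With real data, pass an open file object (for example `open("aln.sam")`, a hypothetical path) instead of the in-memory list.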