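Every time I run into BAM flag values I get a little confused, and the longer it has been since I last looked, the worse it gets.
So here I sort out and explain the flag information in BAM files.
The flag is recorded in column 2 of the BAM file; the examples below use alignments produced by bwa.
You can list the flag values with samtools: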
samtools view test.bam | cut -f2 | sort -u
1024
1040
1089
1097
1105
1107
1121
1123
113
1137
1145
1153
1161
1169
117
1171
1185
1187
1201
1209
121
129
133
137
145
147
16
161
163
177
181
185
65
69
73
81
83
97
99
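The listing above is ordered as text, which is why 113 comes after 1123. As a rough sketch, the flags can also be tallied in numeric order. `count_flags` is a hypothetical helper name, not part of samtools; it reads one flag per line from standard input, so it can be tried without a BAM file:

```shell
# count_flags (hypothetical helper): read one flag per line on stdin and
# print "flag<TAB>count", sorted numerically by flag value.
count_flags() {
    sort -n | uniq -c | awk '{ print $2 "\t" $1 }'
}

# With a real file (assumes samtools is installed and test.bam exists):
#   samtools view test.bam | cut -f2 | count_flags

# Small made-up input for illustration only:
printf '99\n147\n99\n16\n' | count_flags
```

The counts show at a glance which flag combinations dominate a file.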
So here is the question: what do these numbers mean?
First, consult the SAM/BAM format specification:
http://samtools.sourceforge.net/SAMv1.pdf
It describes FLAG as follows:
FLAG: bitwise FLAG. Each bit is explained in the following table:
| Bit | Description |
| 0x1 | template having multiple segments in sequencing |
| 0x2 | each segment properly aligned according to the aligner |
| 0x4 | segment unmapped |
| 0x8 | next segment in the template unmapped |
| 0x10 | SEQ being reverse complemented |
| 0x20 | SEQ of the next segment in the template being reversed |
| 0x40 | the first segment in the template |
| 0x80 | the last segment in the template |
| 0x100 | secondary alignment |
| 0x200 | not passing quality controls |
| 0x400 | PCR or optical duplicate |
| 0x800 | supplementary alignment |
The values 0x1, 0x2, … above are hexadecimal, and beyond 0x8 they look nothing like their decimal equivalents (0x10 is 16, not 10).
The corresponding decimal values are:
| Decimal | Description |
| 1 | template having multiple segments in sequencing |
| 2 | each segment properly aligned according to the aligner |
| 4 | segment unmapped |
| 8 | next segment in the template unmapped |
| 16 | SEQ being reverse complemented |
| 32 | SEQ of the next segment in the template being reversed |
| 64 | the first segment in the template |
| 128 | the last segment in the template |
| 256 | secondary alignment |
| 512 | not passing quality controls |
| 1024 | PCR or optical duplicate |
| 2048 | supplementary alignment |
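The hexadecimal-to-decimal mapping in the two tables can be checked in the shell. This is a minimal sketch; `hex_to_dec` is a hypothetical helper name, and it relies on `printf` accepting C-style hexadecimal constants for `%d`:

```shell
# hex_to_dec (hypothetical helper): print the decimal value of a
# hexadecimal constant such as 0x400.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Print every FLAG bit from the table as "hex<TAB>decimal".
for hex in 0x1 0x2 0x4 0x8 0x10 0x20 0x40 0x80 0x100 0x200 0x400 0x800; do
    printf '%s\t%s\n' "$hex" "$(hex_to_dec "$hex")"
done
```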
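Looking back at the list, 16 means the read aligned to the reverse strand, and 1024 marks a PCR or optical duplicate.
What about the other numbers? They are simply sums of these values. For example, 1040 = 1024 + 16: the read aligned to the reverse strand and is a PCR duplicate.
You can also use an online flag-explanation page to decode these numbers. Entering 1040 on such a site returns:
"read reverse strand" and "read is PCR or optical duplicate".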
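The same decoding can be done locally by testing each bit with a bitwise AND. The sketch below is not the website's implementation; `decode_flag` is a hypothetical helper name, and the descriptions are copied from the table above. (samtools also ships a `samtools flags` subcommand that serves a similar purpose.)

```shell
# decode_flag (hypothetical helper): print "value<TAB>description" for every
# bit that is set in the decimal FLAG given as the first argument.
decode_flag() {
    flag=$1
    while IFS='|' read -r bit desc; do
        # A bit is set when the bitwise AND with its value is non-zero.
        if [ $(( $flag & $bit )) -ne 0 ]; then
            printf '%s\t%s\n' "$bit" "$desc"
        fi
    done <<'EOF'
1|template having multiple segments in sequencing
2|each segment properly aligned according to the aligner
4|segment unmapped
8|next segment in the template unmapped
16|SEQ being reverse complemented
32|SEQ of the next segment in the template being reversed
64|the first segment in the template
128|the last segment in the template
256|secondary alignment
512|not passing quality controls
1024|PCR or optical duplicate
2048|supplementary alignment
EOF
}

decode_flag 1040
```

For 1040 this lists the 16 and 1024 rows, matching the sum worked out above.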
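However, the SAM specification expresses the FLAG codes as bit masks. A bit is the basic unit of information and takes only two values, 1 and 0.
Who can make sense of that?!
Just convert the flag to binary with bc on Linux: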
#bam flag 1040
echo 'obase=2;1040' | bc
10000010000
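Read 10000010000 from right to left against the FLAG table above, where the rightmost digit corresponds to 0x1 (1), the next to 0x2 (2), and so on. The 5th digit from the right (0x10, 16) and the 11th (0x400, 1024) are 1, which gives "SEQ being reverse complemented" and "PCR or optical duplicate".

So once a BAM flag is converted to binary digits, the information behind any flag value is easy to read off.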
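If bc is not installed, the same binary conversion can be sketched with plain shell arithmetic. `to_binary` is a hypothetical helper name; it builds the string by repeated division by 2, so the last digit produced is the leftmost one:

```shell
# to_binary (hypothetical helper): print the binary representation of a
# non-negative decimal integer.
to_binary() {
    n=$1
    bits=""
    if [ "$n" -eq 0 ]; then
        bits=0
    fi
    while [ "$n" -gt 0 ]; do
        # Prepend the lowest bit, then shift right by one position.
        bits="$(( n % 2 ))$bits"
        n=$(( n / 2 ))
    done
    printf '%s\n' "$bits"
}

to_binary 1040
```

This prints the same 10000010000 string that bc produced for flag 1040.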
References
https://davetang.org/muse/2014/03/06/understanding-bam-flags/