基因組比對(duì)的結(jié)果顯示比對(duì)效率非常低,現(xiàn)在用轉(zhuǎn)錄本比對(duì),比較一下二者的差異。
且基因組比對(duì)一般適用于:新發(fā)現(xiàn)的轉(zhuǎn)錄本(novel transcripts)或可變剪切(alternative splicing)。根據(jù)我的實(shí)驗(yàn)?zāi)康模篟IP,轉(zhuǎn)錄本比對(duì)才是更好的選擇。
一、用原始數(shù)據(jù)比對(duì)
- 將Reads比對(duì)到小鼠轉(zhuǎn)錄組
Minimap2 是長(zhǎng)數(shù)據(jù)(nanopore測(cè)序數(shù)據(jù))的最優(yōu)選擇
minimap2 -ax splice ../../../../genome/mouse_transcripts/gencode.vM36.transcripts.fa ../BC07.fastq > BC07.sam
- 將sam文件格式轉(zhuǎn)換為bam二進(jìn)制文件格式,排序并進(jìn)行索引
samtools view -S -b BC07.sam > BC07.bam
samtools sort BC07.bam -o BC07_sorted.bam
samtools index BC07_sorted.bam
- 檢查比對(duì)質(zhì)量
samtools flagstat BC07_sorted.bam
1112070 + 0 in total (QC-passed reads + QC-failed reads)
802807 + 0 primary
301965 + 0 secondary
7298 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
658093 + 0 mapped (59.18% : N/A)
348830 + 0 primary mapped (43.45% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
samtools flagstat BC08_sorted.bam
642396 + 0 in total (QC-passed reads + QC-failed reads)
479463 + 0 primary
161361 + 0 secondary
1572 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
318593 + 0 mapped (49.59% : N/A)
155660 + 0 primary mapped (32.47% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
可以看到用轉(zhuǎn)錄本比對(duì)的比對(duì)效率遠(yuǎn)遠(yuǎn)高于基因組比對(duì),雖然沒(méi)有特別高(59.18%和49.59%)。
師兄說(shuō):primary mapped比例都很高,說(shuō)明1條read只比對(duì)到一個(gè)轉(zhuǎn)錄本上的比例更好,會(huì)使準(zhǔn)確率更高,是個(gè)好事情,這也是三代測(cè)序長(zhǎng)read的優(yōu)勢(shì)所在。
4. IGV可視化報(bào)錯(cuò)
想要和基因組比對(duì)一樣,在IGV上進(jìn)行可視化,但是總是因?yàn)?bam文件格式不正確而報(bào)錯(cuò):

這個(gè)報(bào)錯(cuò)我一直解決不了,不知道是不是IGV不能用于transcripts.fa的可視化,因?yàn)閎am文件不提供位置信息。師兄說(shuō)這可能是minimap2生成的bam文件格式和IGV不匹配所致,而不是IGV不能用于轉(zhuǎn)錄本的可視化。
- 用其他方法統(tǒng)計(jì)比對(duì)質(zhì)量
- qualimap
conda install -c bioconda qualimap
qualimap bamqc -bam BC08_sorted.bam -outdir qualimap_report



看不懂這些報(bào)告,先留著備份學(xué)習(xí)。
- 轉(zhuǎn)錄本計(jì)數(shù)
samtools idxstats BC07_sorted.bam > BC07_counts.txt