RNA- seq 的數(shù)據(jù)處理主要分為以下幾個過程：

1. 測序數(shù)據(jù)質(zhì)控，以及參考基因組和注釋文件下載；

2. 序列比對，將測序得到的短 reads 序列往參考基因組上 mapping；

3. 差異表達基因鑒定。

更詳細的內(nèi)容見軟件官網(wǎng)?

HISAT2: graph-based alignment of next generation sequencing reads to a population of genomes

一、建索引

下載參考基因組和相應(yīng)的注釋文件：

wget ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

gzip -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

mv Homo_sapiens.GRCh38.dna.primary_assembly.fa genome.fa

wget ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz $ gzip -d Homo_sapiens.GRCh38.84.gtf.gz$ mv Homo_sapiens.GRCh38.84.gtf genome.gtf

make exon, splicesite file:

hisat2_extract_splice_sites.py genome.gtf > genome.ss

hisat2_extract_exons.py genome.gtf > genome.exon

Building indexes

hisat2-build? ?-p? 16 --exon? genome.exon --ss? genome.ss? genome.fa? genome_tran

# -p? 線程數(shù)；

二、序列比對

hisat2? ?[options]*? ?-x? ? <hisat2-idx>? ? ?{-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>}? ? ?[-S <hit>]?? --dta

-x? ?? ? # The basename of the index for the reference genome.

-1? ?? ? ? ? ? ? ? ?#?Comma-separated list of files containing mate 1s (filename usually includes?_1), e.g.?-1 flyA_1.fq,flyB_1.fq.

-2? ? ? ? ? ? ? ? ?#?Comma-separated list of files containing mate 2s (filename usually includes?_2), e.g.?-2 flyA_2.fq,flyB_2.fq.

-S? ? ?? ? ? ? ? ? ?#?File to write SAM alignments to.

--dta? ? ? ? ? ?#??Report alignments tailored for transcript assemblers including StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short-anchors, which helps transcript assemblers improve significantly in computation and memory usage.

上述 4 個參數(shù)是基本參數(shù)，如果輸入文件是 FASTQ 文件，并按第一步建了索引文件，即可直接進行比對。

三、結(jié)果文件解讀

hisat2? ? -p? ?8? ? ? -x? ? $RNA_REF_INDEX? ? -1? ? ?BB.read1.fastq.gz? ? ?-2? ? ? BB.read2.fastq.gz? ? -S? ? ?./HBR_Rep3.sam

生成的 sam 文件格式如下：

第1列：reads名稱；

第2列：Flag標簽；Flag標簽是二進制數(shù)字之和，不同數(shù)字代表了不同的意義。Sum of all applicable flags.

1: The read is one of a pair

2: The alignment is one end of a proper paired-end alignment

4: The read has no reported alignments

8: The read is one of a pair and has no reported alignments

16: The alignment is to the reverse reference strand

32: The other mate in the paired-end alignment is aligned to the reverse reference strand

64: The read is mate 1 in a pair

128: The read is mate 2 in a pair

第3列：比對到的染色體信息；

第4列：比對到參考基因組物理位置；

第5列：比對質(zhì)量值（0-60）；

第6列：CIAGR（記錄插入、缺失等）；CIAGR中包含的是比對結(jié)果信息，表明了一條reads所有堿基的比對情況。比如CIGAR = 150M表示150bp的reads都比對到參考基因組上；

第7列：配對reads比對到的染色體，=表示相同；

第8列：配對reads比對到的染色體物理位置；

第9列：文庫插入序列大?。?/p>

第10列：Reads序列；

第11列：質(zhì)量值。

第12列：Optional fields.

四、格式轉(zhuǎn)換并排序

使用 samtools 對文件進行排序

# 轉(zhuǎn)換為 bam 文件格式

samtools view -S? test.sam -b > test.bam

# 排序并建索引

samtools sort test.bam -o test_sort.bam

samtools index test_sort.bam

比對過程還算是比較簡單，接下來就是檢測差異表達基因了。

后續(xù)處理可能需要結(jié)合其它軟件使用，我自己有用到的話會持續(xù)更新，歡迎翻閱寫的其它筆記！

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

RNA-seq 數(shù)據(jù)處理（一）HISAT2 序列比對

RNA-seq 數(shù)據(jù)處理（一）HISAT2 序列比對

RNA- seq 的數(shù)據(jù)處理主要分為以下幾個過程：

1. 測序數(shù)據(jù)質(zhì)控，以及參考基因組和注釋文件下載；

2. 序列比對，將測序得到的短 reads 序列往參考基因組上 mapping；

3. 差異表達基因鑒定。

更詳細的內(nèi)容見軟件官網(wǎng)?

一、建索引

二、序列比對

三、結(jié)果文件解讀

四、格式轉(zhuǎn)換并排序

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

RNA-seq 數(shù)據(jù)處理（一）HISAT2 序列比對

RNA- seq 的數(shù)據(jù)處理主要分為以下幾個過程：

1. 測序數(shù)據(jù)質(zhì)控，以及參考基因組和注釋文件下載；

2. 序列比對，將測序得到的短 reads 序列往參考基因組上 mapping；

3. 差異表達基因鑒定。

更詳細的內(nèi)容見軟件官網(wǎng)?

一、建索引

二、序列比對

三、結(jié)果文件解讀

四、格式轉(zhuǎn)換并排序

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

1. 測序數(shù)據(jù)質(zhì)控，以及參考基因組和注釋文件下載；

2. 序列比對，將測序得到的短 reads 序列往參考基因組上 mapping；

3. 差異表達基因鑒定。

一、建索引

二、序列比對

四、格式轉(zhuǎn)換并排序