
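Introduction

The GDC mRNA quantification analysis pipeline measures gene-level expression as HT-Seq raw read counts, Fragments per Kilobase of transcript per Million mapped reads (FPKM), and FPKM-UQ (upper quartile normalization). These values are generated by first aligning reads to the GRCh38 reference genome and then quantifying the mapped reads. To facilitate harmonization across samples, all RNA-Seq reads are treated as unstranded during analysis.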

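Data Processing Steps

RNA-Seq Alignment Workflow

The pipeline begins with the Alignment Workflow, which is performed using a two-pass method with STAR. STAR aligns each read group separately and then merges the resulting alignments into one. Following the method used by the International Cancer Genome Consortium (ICGC) (GitHub), the two-pass method includes a splice junction detection step, which is used to generate the final alignment. This workflow outputs a genomic BAM file, which contains both aligned and unaligned reads. Quality assessment is performed before alignment with FASTQC and after alignment with Picard Tools.

In addition to the genomic alignment detailed above, files processed in more recent data releases have associated transcriptomic and chimeric alignments. This applies only to aliquots with at least one set of paired-end reads. The chimeric BAM file contains reads that were mapped to different chromosomes or strands (fusion alignments). The genomic alignment files contain chimeric and unaligned reads to facilitate the retrieval of all original reads. The transcriptomic alignment reports aligned reads with transcript coordinates rather than genomic coordinates. The transcriptomic alignment is also sorted differently to facilitate downstream analyses. This sorting method does not allow BAM slicing on these alignments, so BAM index file pairing is not supported. The splice-junction file for these alignments is also available.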

RNA Alignment Pipeline

I/O	Entity	Format
Input	Submitted Unaligned Reads or Submitted Aligned Reads	FASTQ or BAM
Output	Aligned Reads	BAM

RNA-Seq Alignment Command Line Parameters

Note that version numbers may vary in files downloaded from the GDC Portal due to ongoing pipeline development and improvement.


# STAR-2.4.2a

### For users with access to the ICGC pipeline:

python star_align.py \
--genomeDir <star_index_path> \
--FastqFileIn <input_fastq_path> \
--workDir <work_dir> \
--out <output_bam> \
--genomeFastaFiles <reference> \
--runThreadN 8 \
--outFilterMultimapScoreRange 1 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 10 \
--alignIntronMax 500000 \
--alignMatesGapMax 1000000 \
--sjdbScore 2 \
--limitBAMsortRAM 0 \
--alignSJDBoverhangMin 1 \
--genomeLoad NoSharedMemory \
--outFilterMatchNminOverLread 0.33 \
--outFilterScoreMinOverLread 0.33 \
--twopass1readsN -1 \
--sjdbOverhang 100 \
--outSAMstrandField intronMotif \
--outSAMunmapped Within

### For users without access to the ICGC pipeline:

### Step 1: Building the STAR index.*

STAR \
--runMode genomeGenerate \
--genomeDir <star_index_path> \
--genomeFastaFiles <reference> \
--sjdbOverhang 100 \
--sjdbGTFfile <gencode.v22.annotation.gtf> \
--runThreadN 8

### Step 2: Alignment 1st Pass.

STAR \
--genomeDir <star_index_path> \
--readFilesIn <fastq_left_1>,<fastq_left_2>,... <fastq_right_1>,<fastq_right_2>,... \
--runThreadN <runThreadN> \
--outFilterMultimapScoreRange 1 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 10 \
--alignIntronMax 500000 \
--alignMatesGapMax 1000000 \
--sjdbScore 2 \
--alignSJDBoverhangMin 1 \
--genomeLoad NoSharedMemory \
--readFilesCommand <bzcat|cat|zcat> \
--outFilterMatchNminOverLread 0.33 \
--outFilterScoreMinOverLread 0.33 \
--sjdbOverhang 100 \
--outSAMstrandField intronMotif \
--outSAMtype None \
--outSAMmode None

### Step 3: Intermediate Index Generation.

STAR \
--runMode genomeGenerate \
--genomeDir <output_path> \
--genomeFastaFiles <reference> \
--sjdbOverhang 100 \
--runThreadN <runThreadN> \
--sjdbFileChrStartEnd <SJ.out.tab from previous step>

### Step 4: Alignment 2nd Pass.

STAR \
--genomeDir <output_path from previous step> \
--readFilesIn <fastq_left_1>,<fastq_left_2>,... <fastq_right_1>,<fastq_right_2>,... \
--runThreadN <runThreadN> \
--outFilterMultimapScoreRange 1 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 10 \
--alignIntronMax 500000 \
--alignMatesGapMax 1000000 \
--sjdbScore 2 \
--alignSJDBoverhangMin 1 \
--genomeLoad NoSharedMemory \
--limitBAMsortRAM 0 \
--readFilesCommand <bzcat|cat|zcat> \
--outFilterMatchNminOverLread 0.33 \
--outFilterScoreMinOverLread 0.33 \
--sjdbOverhang 100 \
--outSAMstrandField intronMotif \
--outSAMattributes NH HI NM MD AS XS \
--outSAMunmapped Within \
--outSAMtype BAM SortedByCoordinate \
--outSAMheaderHD @HD VN:1.4 \
--outSAMattrRGline <formatted RG line provided by wrapper>

*These indices are available for download at the GDC Website and do not need to be built again.
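The first and second alignment passes share most of their parameters, so a wrapper could assemble the commands programmatically. The following is a minimal sketch, not the GDC or ICGC wrapper itself: the function names (`first_pass_cmd`, `intermediate_index_cmd`, `second_pass_cmd`) and all paths are hypothetical, while the flags and values are copied from the steps above. It only builds argument lists and does not run STAR.

```python
import shlex
from typing import List, Sequence

# Parameters shared by the 1st and 2nd alignment pass (values copied from the steps above).
SHARED_ALIGN_ARGS: List[str] = [
    "--outFilterMultimapScoreRange", "1",
    "--outFilterMultimapNmax", "20",
    "--outFilterMismatchNmax", "10",
    "--alignIntronMax", "500000",
    "--alignMatesGapMax", "1000000",
    "--sjdbScore", "2",
    "--alignSJDBoverhangMin", "1",
    "--genomeLoad", "NoSharedMemory",
    "--outFilterMatchNminOverLread", "0.33",
    "--outFilterScoreMinOverLread", "0.33",
    "--sjdbOverhang", "100",
    "--outSAMstrandField", "intronMotif",
]


def _align_cmd(genome_dir: str, left: Sequence[str], right: Sequence[str],
               threads: int, read_cmd: str) -> List[str]:
    """Build the part of the STAR command common to both passes (hypothetical helper)."""
    reads = [",".join(left)]          # comma-joined mate 1 files
    if right:                         # mate 2 files exist only for paired-end data
        reads.append(",".join(right))
    return (["STAR", "--genomeDir", genome_dir, "--readFilesIn"] + reads
            + ["--runThreadN", str(threads), "--readFilesCommand", read_cmd]
            + SHARED_ALIGN_ARGS)


def first_pass_cmd(genome_dir, left, right, threads, read_cmd) -> List[str]:
    """Step 2: only splice junctions are needed, so no alignment output is written."""
    return _align_cmd(genome_dir, left, right, threads, read_cmd) + [
        "--outSAMtype", "None", "--outSAMmode", "None"]


def intermediate_index_cmd(output_dir: str, reference: str, sj_tab: str,
                           threads: int) -> List[str]:
    """Step 3: rebuild the index with the junctions found in the first pass."""
    return ["STAR", "--runMode", "genomeGenerate", "--genomeDir", output_dir,
            "--genomeFastaFiles", reference, "--sjdbOverhang", "100",
            "--runThreadN", str(threads), "--sjdbFileChrStartEnd", sj_tab]


def second_pass_cmd(genome_dir, left, right, threads, read_cmd, rg_line: str) -> List[str]:
    """Step 4: final alignment against the intermediate index, written as a sorted BAM."""
    return _align_cmd(genome_dir, left, right, threads, read_cmd) + [
        "--limitBAMsortRAM", "0",
        "--outSAMattributes", "NH", "HI", "NM", "MD", "AS", "XS",
        "--outSAMunmapped", "Within",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outSAMheaderHD", "@HD", "VN:1.4",
        # Assumption: the read group line is a shell-style string such as "ID:rg1 SM:s1".
        "--outSAMattrRGline", *shlex.split(rg_line)]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where STAR is installed; keeping the shared parameters in one place helps ensure that both passes use identical filtering settings.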

mRNA Expression Workflow

Following alignment, BAM files are processed through the RNA Expression Workflow to determine RNA expression levels.

The reads mapped to each gene are counted using HT-Seq-Count. Expression values are provided in a tab-delimited format. GENCODE v22 was used for gene annotation.
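As an illustration of working with the tab-delimited counts, the sketch below reads a two-column file (gene ID, read count) into a dictionary. It assumes the layout that `htseq-count` itself writes, in which summary rows such as `__no_feature` and `__ambiguous` follow the gene rows; the function name `read_htseq_counts` and the example values are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_htseq_counts(handle: TextIO) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split htseq-count output into per-gene counts and summary counters."""
    gene_counts: Dict[str, int] = {}
    summary: Dict[str, int] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, value = row[0], int(row[1])
        # htseq-count prefixes its summary rows with a double underscore.
        if name.startswith("__"):
            summary[name] = value
        else:
            gene_counts[name] = value
    return gene_counts, summary


# Illustrative input only; IDs and counts are made up for this example.
example = io.StringIO(
    "ENSG00000000003.13\t1000\n"
    "ENSG00000000005.5\t0\n"
    "__no_feature\t250\n"
    "__ambiguous\t40\n"
)
genes, summary = read_htseq_counts(example)
```

Separating the summary rows early keeps them from being mistaken for genes when the counts are later normalized.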

Files processed after Data Release 14 have an additional set of read counts produced by STAR during the alignment step.

I/O	Entity	Format
Input	Aligned Reads	BAM
Output	Gene Expression	TXT

mRNA Quantification Command Line Parameters

HTSeq-0.6.1p1


htseq-count \
-m intersection-nonempty \
-i gene_id \
-r pos \
-s no \
- gencode.v22.annotation.gtf

mRNA Expression HT-Seq Normalization

RNA-Seq expression level read counts produced by HT-Seq are normalized using two similar methods: FPKM and FPKM-UQ. Normalized values should be used only within the context of the entire gene set. Users are encouraged to normalize the raw read count values themselves if only a subset of genes is investigated.

FPKM

The Fragments per Kilobase of transcript per Million mapped reads (FPKM) calculation normalizes the read count by dividing it by the gene length and the total number of reads mapped to protein-coding genes.

Upper Quartile FPKM

The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample.

Calculations

FPKM Calculations

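RC_g: Number of reads mapped to the gene
RC_pc: Number of reads mapped to all protein-coding genes
RC_g75: The 75th percentile read count value for genes in the sample
L: Length of the gene in base pairs; calculated as the sum of all exons in the gene

Note: During normalization, the read count is multiplied by a scalar (10^9) to account for the kilobase and 'million mapped reads' units.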
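Written with the symbols defined above, the two normalizations can be expressed as follows. This is a reconstruction from the definitions, the scaling note, and the worked example in this document, not a verbatim copy of the original formulas.

```latex
\mathrm{FPKM} = \frac{RC_g \times 10^{9}}{RC_{pc} \times L}
\qquad
\mathrm{FPKM\text{-}UQ} = \frac{RC_g \times 10^{9}}{RC_{g75} \times L}
```

The factor of 10^9 is the product of 10^3 (bases per kilobase) and 10^6 (reads per million).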

Examples

Sample 1: Gene A

Gene length: 3,000 bp
1,000 reads mapped to Gene A
1,000,000 reads mapped to all protein-coding regions
Read count in Sample 1 for 75th percentile gene: 2,000

FPKM for Gene A = (1,000)(10^9)/[(3,000)(1,000,000)] = 333.33

FPKM-UQ for Gene A = (1,000)(10^9)/[(3,000)(2,000)] = 166,666.67
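The arithmetic in this example can be checked with a short sketch. The function names `fpkm` and `fpkm_uq` are hypothetical; the formulas follow the definitions of RC_g, RC_pc, RC_g75, and L given under Calculations.

```python
def fpkm(rc_g: float, length_bp: float, rc_pc: float) -> float:
    """FPKM: read count scaled by gene length and total protein-coding reads."""
    return rc_g * 1e9 / (rc_pc * length_bp)


def fpkm_uq(rc_g: float, length_bp: float, rc_g75: float) -> float:
    """FPKM-UQ: same as FPKM, but with the 75th percentile read count as denominator."""
    return rc_g * 1e9 / (rc_g75 * length_bp)


# Sample 1, Gene A: 3,000 bp, 1,000 reads, 1,000,000 protein-coding reads,
# and a 75th percentile read count of 2,000.
print(f"{fpkm(1000, 3000, 1_000_000):,.2f}")  # 333.33
print(f"{fpkm_uq(1000, 3000, 2000):,.2f}")    # 166,666.67
```

Because the FPKM-UQ denominator is a single gene's read count rather than the total, its values are much larger than FPKM values and the two scales are not directly comparable.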

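File Access and Availability

To facilitate the use of harmonized data in user-created pipelines, RNA-Seq gene expression data is accessible in the GDC Data Portal at several intermediate steps of the pipeline. Below is a description of each type of file available for download in the GDC Data Portal.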

Type	Description	Format
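RNA-Seq Alignment	RNA-Seq reads that have been aligned to the GRCh38 build. Unaligned reads are included to facilitate the availability of raw read sets	BAM
HT-Seq Read Counts	The number of reads aligned to each gene, calculated by HT-Seq	TXT
STAR Read Counts	The number of reads aligned to each gene, calculated by STAR	TSV
FPKM	A normalized expression value that takes into account each gene length and the number of reads mapped to all protein-coding genes	TXT
FPKM-UQ	A modified version of the FPKM formula in which the 75th percentile read count is used as the denominator in place of the total number of protein-coding reads	TXT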


Bioinformatics Pipeline: mRNA Analysis Pipeline