Part1: gnomAD-LOEUF數(shù)據(jù)下載

? ? 上一篇文章已經(jīng)講解了gnomAD的flagship paper，文章中新建的評(píng)估基因?qū)LoF突變?nèi)淌芰Φ哪Ｐ汀狶OEUF，想必大家肯定都很想趕緊用起來(lái)。

? ? 這個(gè)模型的數(shù)據(jù)結(jié)果即可以在文章的supplymentary中找到（就是supplementary_dataset_11_full_constraint_metrics.tsv.gz）。也可以爬上梯子，在gnomAD的官網(wǎng) → 右上角Downloads → gnomad.v2.1.1系列的Constraint，鏈接到相應(yīng)位置進(jìn)行下載，官網(wǎng)提供了：①只有經(jīng)典轉(zhuǎn)錄本的gene list表格、②包含了多個(gè)轉(zhuǎn)錄本的transcript list表格（和文章附表內(nèi)容相同）、③按人群數(shù)量做了降采樣的E.O值。

Part2: 表格數(shù)據(jù)意義（翻譯一下）

? 以full_constraint_metrics，即lof_metrics.by_transcript為例。
該文檔共80950行，其中包含了19600+個(gè)基因的經(jīng)典轉(zhuǎn)錄本和其他常見(jiàn)轉(zhuǎn)錄本。共78列，每列的header解釋如下（詳見(jiàn)supplementary information文檔第74頁(yè)）：

gene: Gene name，基因名稱

transcript: Ensembl transcript ID (Gencode v19)，轉(zhuǎn)錄本編號(hào)

canonical: Boolean indicator as to whether the transcript is the canonical transcript for the gene，是否是該基因的經(jīng)典轉(zhuǎn)錄本

obs_XXX: Number of observed XXX variants in transcript，在該轉(zhuǎn)錄本上觀察到XXX突變的數(shù)量（XXX=mis錯(cuò)義、syn同義、lof功能缺失）

exp_XXX: Number of expected XXX variants in transcript，在該轉(zhuǎn)錄本上預(yù)測(cè)到XXX突變的數(shù)量

oe_XXX: Observed over expected ratio for XXX variants in transcript (obs_XXX divided by exp_XXX)，在該轉(zhuǎn)錄本上觀察到XXX變異超出預(yù)期的比率

mu_XXX: Mutation rate summed across all possible XXX variants in transcript，該轉(zhuǎn)錄本中所有可能的XXX變異的突變率總和

possible_XXX: Number of possible XXX variants in transcript，該轉(zhuǎn)錄中可能的XXX突變的數(shù)量（其實(shí)不是很理解這個(gè)）

obs_XXX_pphen: Number of observed XXX variants in transcript predicted "probably damaging" by PolyPhen-2，被PolyPhen-2預(yù)測(cè)為“可能有害”的、觀察到的XXX突變數(shù)量

exp_XXX_pphen: Number of expected XXX variants in transcript predicted "probably damaging" by PolyPhen-2，被PolyPhen-2預(yù)測(cè)為“可能有害”的、預(yù)測(cè)察到的XXX突變數(shù)量

oe_XXX_pphen: Observed over expected ratio for PolyPhen-2 predicted "probably damaging" XXX variants in transcript (obs_mis_pphen divided by exp_mis_pphen)，被PolyPhen-2預(yù)測(cè)為“可能有害”的XXX突變，觀察到超過(guò)預(yù)期的比率

possible_XXX_pphen: Number of possible missense variants in transcript that are predicted "probably damaging" by PolyPhen-2，被PolyPhen-2預(yù)測(cè)為“可能有害”的、可能的XXX突變的數(shù)量（其實(shí)也不是很理解這個(gè)）

oe_XXX_lower: Lower bound of 90% confidence interval for o/e ratio for XXX variants，XXX突變的o/e比率90%置信區(qū)間的下界

oe_XXX_upper: Upper bound of 90% confidence interval for o/e ratio for XXX variants，XXX突變的o/e比率90%置信區(qū)間的上界

XXX_z: Z score for XXX variants in gene. Higher (more positive) Z scores indicate that the transcript is more intolerant of variation (more constrained). Extreme values of XXX_z indicate likely data quality issues，基因中XXX突變的Z-score。Z-score越高(越陽(yáng)性)表明該轉(zhuǎn)錄本越不耐受XXX變異(越受限制)。XXX_z的極端值表示可能存在數(shù)據(jù)質(zhì)量問(wèn)題。

pLI: Probability of loss-of-function intolerance; probability that transcript falls into distribution of haploinsufficient genes (~9% o/e pLoF ratio; computed from gnomAD data)，用gnomAD數(shù)據(jù)計(jì)算出來(lái)的pLI

pRec: Probability that transcript falls into distribution of recessive genes (~46% o/e pLoF ratio; computed from gnomAD data)，該轉(zhuǎn)錄本屬于隱性基因的概率

pNull: Probability that transcript falls into distribution of unconstrained genes (~100% o/e pLoF ratio; computed from gnomAD data)，該轉(zhuǎn)錄本屬于非約束基因的概率

oe_lof_upper_rank: Transcript’s rank of LOEUF value compared to all transcripts (lower values indicate more constrained)，與所有轉(zhuǎn)錄本相比，該轉(zhuǎn)錄本的LOEUF值的排名(較低的值表示更受限制)

oe_lof_upper_bin: Decile bin of LOEUF for given transcript (lower values indicate more constrained)，該轉(zhuǎn)錄本在十分位分類中的位置(較低的值表示更受限制)

（以上2個(gè)主要是是表示LOEUF的排序、decile 分類指標(biāo)）

oe_lof_upper_bin_6: Sextile bin of LOEUF for given transcript (lower values indicate more constrained)，該轉(zhuǎn)錄本在六分位分類中的位置(較低的值表示更受限制)

n_sites: Number of distinct pLoF variant sites in the transcript，該轉(zhuǎn)錄體中不同lof突變位點(diǎn)的數(shù)量

classic_caf: Sum of allele frequencies of pLoFs in the transcript，該轉(zhuǎn)錄本中的pLoFs的等位基因頻率的總和

max_af: Maximum allele frequency of any pLoF in the transcript，該轉(zhuǎn)錄本中的任一pLoF的最大等位基因頻率

no_lofs: The number of individuals with no observed pLoF variants in the transcript，在該轉(zhuǎn)錄本中觀察到pLoF變異的個(gè)體數(shù)量

obs_het_lof: The number of individuals with at least one observed heterozygous pLoF variant, but no homozygous pLoF variants, in the transcript，在該轉(zhuǎn)錄本中觀察到至少一個(gè)雜合pLoF變異，但沒(méi)有純合，的個(gè)體數(shù)量

obs_hom_lof: The number of individuals with at least one observed homozygous pLoF in the transcript，在該轉(zhuǎn)錄本中觀察到至少一個(gè)純合pLoF變異的個(gè)體數(shù)量

defined: The number of individuals where at least one high-quality genotype (including homozygous reference) is observed at a called site annotated as a pLoF variant，至少有觀察到一個(gè)高質(zhì)量的pLoF突變的個(gè)體數(shù)量

p: The estimated proportion of haplotypes with a pLoF variant. Defined as: 1 - sqrt(no_lofs / defined) 一個(gè)pLoF突變的單倍型的估計(jì)比例。

exp_hom_lof: The expected number of individuals with at least one homozygous pLoF variant based on the frequency of pLoF haplotypes. Defined as: defined * p2，根據(jù)pLoF的單倍型頻率計(jì)算，至少有一個(gè)純合pLoF突變的預(yù)期個(gè)體數(shù)量。

classic_caf_POP: Sum of allele frequencies of pLoFs in the transcript among POP individuals，POP人群中pLoFs的等位基因頻率的總和

p_POP: The computation of `p` repeated among only POP individuals，只在POP群體中重復(fù)的'p'值

transcript_type: Transcript biotype (https://www.gencodegenes.org/pages/biotypes.html)，轉(zhuǎn)錄本生物型

gene_id: Ensembl gene ID，Ensembl 的基因編號(hào)

transcript_level: Transcript level from Gencode (https://www.gencodegenes.org/pages/data_format.html)，來(lái)自Gencode的轉(zhuǎn)錄水平

cds_length: Length of coding sequence in gene，該基因的編碼序列長(zhǎng)度

num_coding_exons: Number of coding exons in gene，該基因上編碼外顯子的數(shù)量

gene_type: Gene biotype (https://www.gencodegenes.org/pages/biotypes.html)，基因生物型

gene_length: Length of gene，基因的長(zhǎng)度

exac_pLI: pLI score calculated from ExAC，在ExAC中計(jì)算得到的pLI值

exac_obs_lof: Number of observed pLoF variants in gene in ExAC，在ExAC中pLoF突變的觀察數(shù)量

exac_exp_lof: Number of expected pLoF variants in gene in ExAC，在ExAC中pLoF突變的預(yù)測(cè)數(shù)量

exac_oe_lof: Observed to expected ratio of pLoF variants in ExAC，在ExAC中pLoF突變的觀察與預(yù)期的比率

brain_expression: Expression of gene in brain from GTEx data，GTEx數(shù)據(jù)中該基因在腦部的表達(dá)

chromosome: Chromosome name，染色體

start_position: Start position of gene，該基因的起始位置

end_position: End position of gene，該基因的終止始位置）

Part3: 使用annovar注釋

注釋數(shù)據(jù)庫(kù)文件制作：

? ? 根據(jù)對(duì)header的理解，選用了默認(rèn)canonical=TRUE的"gnomad.v2.1.1.lof_metrics.by_gene.txt"作為數(shù)據(jù)庫(kù)數(shù)據(jù)來(lái)源；

? ? 方便起見(jiàn)，將最后3列移到了最前面；然后挑選了一些我自己認(rèn)為對(duì)我做疾病分析可能比較重要的列：

gene，gene_id，transcript，oe_lof_upper_bin，oe_lof_upper_rank， pLI，pRec，pNull，transcript_level

? ? 制作成了一個(gè)12列的文件，命名為“hg19_LOEUF.txt”。數(shù)據(jù)量不是很大的話，做不做該文件的annovar_index都可以，想做的話，構(gòu)建索引的程序index_annovar.pl可以給王凱老師發(fā)郵件獲得。

? ? 另外，由于ANNOVAR的region_base注釋數(shù)據(jù)庫(kù)不能有header，而且所有數(shù)據(jù)會(huì)被合并到一列里面，需要后續(xù)再自己拆開(kāi)，所以上面這個(gè)順序需要另外記錄好哦~

ANNOVAR注釋腳本的修改：?

? ? 這里感謝TSY小伙伴之前的嘗試和經(jīng)驗(yàn)分享，在annotate_variation.pl文件的3084行加入下面這段elseif：

注釋一下：

$annovar/convert2annovar.pl -format vcf4 sample.vcf > sample.avinput

$annovar/table_annovar.pl sample.avinput $humandb --buildver hg19 -out sample_anno_LOEUF? -remove -protocol refGene,ensGene,LOEUF -operation g,g,r --nastring . -csvout

同時(shí)把refGene和ensGene注釋上去了，為了比較一下注釋結(jié)果中的gene名和ID是不是都能匹配上。

? ? 這個(gè)結(jié)果可以按需要再進(jìn)行分列處理。如果對(duì)其他列的信息也感興趣想進(jìn)行注釋，可以參考上面的步驟進(jìn)行修改和使用哦。

To Wang Lab 小伙伴：

? ? ANNOVAR比較適合做突變位點(diǎn)的注釋，在這個(gè)region_base的注釋中，只能覆蓋到這些基因的基因內(nèi)區(qū)域，如coding區(qū)和intron區(qū)，如果你們的SNP是在基因間區(qū)的話應(yīng)該不能直接注釋。可以基于最原始的表格，直接用你們感興趣的基因去做匹配，主要看“oe_lof_upper_bin”列的值。

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

gnomAD-LOEUF數(shù)據(jù)下載&使用annovar注釋

gnomAD-LOEUF數(shù)據(jù)下載&使用annovar注釋

Part1: gnomAD-LOEUF數(shù)據(jù)下載

Part2: 表格數(shù)據(jù)意義（翻譯一下）

Part3: 使用annovar注釋

注釋數(shù)據(jù)庫(kù)文件制作：

ANNOVAR注釋腳本的修改：?

注釋一下：

To Wang Lab 小伙伴：

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

gnomAD-LOEUF數(shù)據(jù)下載&使用annovar注釋

Part1: gnomAD-LOEUF數(shù)據(jù)下載

Part2: 表格數(shù)據(jù)意義（翻譯一下）

Part3: 使用annovar注釋

注釋數(shù)據(jù)庫(kù)文件制作：

ANNOVAR注釋腳本的修改：?

注釋一下：

To Wang Lab 小伙伴：

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av