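SnpEff User Guide (Part 2): Annotating SNPs/INDELs with SnpEff

The previous installment introduced the SnpEff annotation database. This one focuses on the SnpEff commands, and the final installment will cover how to interpret the annotation results.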

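Files to prepare

  1. A SnpEff annotation database already built for the species: GRCh37.100 (~/snpeff/genome/GRCh37.100; see Part 1 for the detailed procedure)
  2. The SNP/INDEL file to annotate, in VCF format (any folder will do, e.g. ~/database/SNP/human_GRCh37.vcf.gz)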
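
Before starting a long annotation run, it can be worth confirming that both inputs are in place. The following is only a minimal sketch, assuming the paths listed above; check_inputs is a hypothetical helper name and not part of SnpEff.

```shell
# check_inputs is a hypothetical helper (not part of SnpEff): it reports
# whether the database directory exists and the VCF file is non-empty.
check_inputs() {
    local db_dir=$1 vcf=$2 status=0
    if [ ! -d "$db_dir" ]; then
        echo "missing database directory: $db_dir" >&2
        status=1
    fi
    if [ ! -s "$vcf" ]; then
        echo "missing or empty VCF: $vcf" >&2
        status=1
    fi
    return "$status"
}

# Paths taken from the list above; adjust them to your own layout.
if check_inputs ~/snpeff/genome/GRCh37.100 ~/database/SNP/human_GRCh37.vcf.gz; then
    echo "inputs look ready"
else
    echo "fix the paths above before running SnpEff"
fi
```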

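1 Quick annotation takes a single command

snpeffDir=~/snpeff
snpEff=${snpeffDir}/snpEff.jar
cd ~/database/SNP/
## Standard annotation
## Options go before the genome name. stderr gets its own log because nohup would otherwise send it to the same file as stdout, i.e. into the VCF.
nohup java -Xmx10G -jar "$snpEff" -csvStats human_GRCh37_snpeff.snp.csv -stats human_GRCh37_snpeff.snp.html GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.snp.vcf 2> human_GRCh37_snpeff.snp.log &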

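Notes: the annotated file human_GRCh37_snpeff.snp.vcf holds the detailed annotations, and human_GRCh37_snpeff.snp.html holds the summary charts. In my case the charts failed to display in Microsoft Edge; if that happens to you, open the file in another browser.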
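
A quick sanity check after the run is to count how many records actually received an annotation. The sketch below assumes the default output, where SnpEff writes its annotations into an ANN entry of the INFO column; count_annotated is a hypothetical helper, and the tiny VCF (with abbreviated ANN values) is made up so the example runs without SnpEff.

```shell
# count_annotated is a hypothetical helper: it prints "<annotated> <total>"
# for the data lines of an uncompressed VCF, where "annotated" means the
# INFO column (field 8) carries an ANN= entry.
count_annotated() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        { total++ }
        $8 ~ /(^|;)ANN=/ { annotated++ }   # ANN key at the start or after ";"
        END { printf "%d %d\n", annotated, total }
    ' "$1"
}

# Made-up three-record VCF; the ANN values are shortened for readability.
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo"
printf '1\t100\t.\tA\tG\t.\t.\tDP=10;ANN=G|missense_variant|MODERATE|GENE1\n' >> "$demo"
printf '1\t200\t.\tC\tT\t.\t.\tANN=T|synonymous_variant|LOW|GENE1\n' >> "$demo"
printf '1\t300\t.\tG\tA\t.\t.\tDP=7\n' >> "$demo"

count_annotated "$demo"   # prints: 2 3
```

On real output the call would be count_annotated human_GRCh37_snpeff.snp.vcf.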

2 Annotating specific regions

Options for filtering the results (used together with the ann command):
-fi , -filterInterval <file> : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
-no-downstream : Do not show DOWNSTREAM changes
-no-intergenic : Do not show INTERGENIC changes
-no-intron : Do not show INTRON changes
-no-upstream : Do not show UPSTREAM changes
-no-utr : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
-no EffectType : Do not show 'EffectType'. This option can be used several times.

# Example: show only annotations within genes (intron, UTR, upstream, downstream, and intergenic effects are hidden)
java -Xmx10G -jar $snpEff ann -no-intron -no-utr -no-downstream -no-upstream -no-intergenic -csvStats human_GRCh37_snpeff.csv -stats human_GRCh37_snpeff.html GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.snp.gene.vcf
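
The example above filters by effect type. To restrict annotation by position instead, the -fi option listed above takes an interval file. The sketch below assumes that file is a three-column BED file (chromosome, zero-based start, end); the name regions.bed, the coordinates, and the output file name are made up for illustration, and SnpEff is only invoked if the jar and the input VCF are actually present.

```shell
# Assumption: the interval file is BED (chrom, 0-based start, end, tab-separated).
# regions.bed and its coordinates are made up for illustration.
workdir=$(mktemp -d)
bed="$workdir/regions.bed"
printf '1\t1000000\t2000000\n' > "$bed"
printf '2\t500000\t600000\n' >> "$bed"

snpEff=~/snpeff/snpEff.jar
vcf=~/database/SNP/human_GRCh37.vcf.gz

if [ -f "$snpEff" ] && [ -f "$vcf" ]; then
    # Annotate only the variants that intersect the intervals in regions.bed.
    java -Xmx10G -jar "$snpEff" ann -fi "$bed" GRCh37.100 "$vcf" \
        > "$workdir/human_GRCh37_snpeff.regions.vcf"
else
    echo "snpEff.jar or input VCF not found; only wrote $bed"
fi
```

Since -fi may be given several times, separate interval files can be kept per gene panel and combined on the command line.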

General annotation options explained
Options:
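-chr <string> : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). This sets the prefix for chromosome names in the output.
-classic : Use old style annotations instead of Sequence Ontology and Hgvs. Sequence Ontology terms are the current default; the mapping between new and old terms is listed under -classic below.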
-download : Download reference genome if not available. Default: true
-i <format> : Input format [ vcf, bed ]. Default: VCF.
-fileList : Input actually contains a list of files to process.
-o <format> : Output format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
-s , -stats : Name of stats file (summary). Default is 'snpEff_summary.html'
-noStats : Do not create stats (summary) file
-csvStats <file> : Create CSV summary file.

Commonly used options: -chr, -classic, -csvStats
-classic

Sequence Ontology term                          Classic term
coding_sequence_variant                         CDS
chromosome                                      CHROMOSOME_LARGE_DELETION
coding_sequence_variant                         CODON_CHANGE
inframe_insertion                               CODON_INSERTION
disruptive_inframe_insertion                    CODON_CHANGE_PLUS CODON_INSERTION
inframe_deletion                                CODON_DELETION
disruptive_inframe_deletion                     CODON_CHANGE_PLUS CODON_DELETION
downstream_gene_variant                         DOWNSTREAM
exon_variant                                    EXON
exon_loss_variant                               EXON_DELETED
frameshift_variant                              FRAME_SHIFT
gene_variant                                    GENE
intergenic_region                               INTERGENIC
conserved_intergenic_variant                    INTERGENIC_CONSERVED
intragenic_variant                              INTRAGENIC
intron_variant                                  INTRON
conserved_intron_variant                        INTRON_CONSERVED
miRNA                                           MICRO_RNA
missense_variant                                NON_SYNONYMOUS_CODING
initiator_codon_variant                         NON_SYNONYMOUS_START
stop_retained_variant                           NON_SYNONYMOUS_STOP
rare_amino_acid_variant                         RARE_AMINO_ACID
splice_acceptor_variant                         SPLICE_SITE_ACCEPTOR
splice_donor_variant                            SPLICE_SITE_DONOR
splice_region_variant                           SPLICE_SITE_REGION
splice_region_variant                           SPLICE_SITE_BRANCH
splice_region_variant                           SPLICE_SITE_BRANCH_U12
stop_lost                                       STOP_LOST
5_prime_UTR_premature_start_codon_gain_variant  START_GAINED
start_lost                                      START_LOST
stop_gained                                     STOP_GAINED
synonymous_variant                              SYNONYMOUS_CODING
start_retained                                  SYNONYMOUS_START
stop_retained_variant                           SYNONYMOUS_STOP
transcript_variant                              TRANSCRIPT
regulatory_region_variant                       REGULATION
upstream_gene_variant                           UPSTREAM
3_prime_UTR_variant                             UTR_3_PRIME
3_prime_UTR_truncation + exon_loss              UTR_3_DELETED
5_prime_UTR_variant                             UTR_5_PRIME
5_prime_UTR_truncation + exon_loss_variant      UTR_5_DELETED
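
If you still have older results in the classic format, a small lookup can translate the names you meet most often. This is just a sketch: classic_to_so is a hypothetical helper, and it covers only a handful of rows copied from the table above.

```shell
# classic_to_so is a hypothetical helper: it maps a few classic effect
# names to their Sequence Ontology terms, copied from the table above.
classic_to_so() {
    case "$1" in
        NON_SYNONYMOUS_CODING) echo missense_variant ;;
        SYNONYMOUS_CODING)     echo synonymous_variant ;;
        FRAME_SHIFT)           echo frameshift_variant ;;
        STOP_GAINED)           echo stop_gained ;;
        STOP_LOST)             echo stop_lost ;;
        SPLICE_SITE_ACCEPTOR)  echo splice_acceptor_variant ;;
        SPLICE_SITE_DONOR)     echo splice_donor_variant ;;
        UPSTREAM)              echo upstream_gene_variant ;;
        DOWNSTREAM)            echo downstream_gene_variant ;;
        *) echo "unmapped: $1" >&2; return 1 ;;
    esac
}

classic_to_so NON_SYNONYMOUS_CODING   # prints missense_variant
classic_to_so FRAME_SHIFT             # prints frameshift_variant
```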

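Glossary of selected variant annotations:
initiator_codon_variant : variant in the initiator (start) codon
downstream_gene_variant : variant downstream of a gene
intergenic_region : variant in an intergenic region
intragenic_variant : variant within a gene
intron_variant : variant in an intron
missense_variant : missense mutation
non_coding_transcript_exon_variant : exon variant in a non-coding transcript
splice_acceptor_variant : splice acceptor site variant
splice_donor_variant : splice donor site variant
splice_region_variant : variant in a splice region
stop_gained : stop codon gained
stop_lost : stop codon lost
stop_retained_variant : stop codon retained
synonymous_variant : synonymous mutation
upstream_gene_variant : variant upstream of a gene
5_prime_UTR_premature_start_codon_gain_variant : premature start codon gained in the 5' UTR
5_prime_UTR_variant : variant in the 5' UTR
3_prime_UTR_variant : variant in the 3' UTR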
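
To see which of these annotation terms dominate a result file, the terms can be tallied straight from the VCF. The sketch below assumes the default ANN layout, where each entry is "|"-separated and the second sub-field is the annotation term; output produced with -classic or -formatEff is laid out differently. tally_effects is a hypothetical helper, and the demo records are made up.

```shell
# tally_effects is a hypothetical helper: for each data line of an
# uncompressed, SnpEff-annotated VCF it takes the first ANN entry and
# counts its annotation term (second "|"-separated sub-field).
tally_effects() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^ANN=/) {
                    sub(/^ANN=/, "", info[i])
                    split(info[i], entries, ",")   # entries are comma-separated
                    split(entries[1], f, "[|]")    # f[2] is the annotation term
                    count[f[2]]++
                }
            }
        }
        END { for (term in count) print count[term] "\t" term }
    ' "$1" | sort -k1,1nr -k2,2
}

# Made-up records with shortened ANN values.
demo=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf '1\t100\t.\tA\tG\t.\t.\tANN=G|missense_variant|MODERATE|GENE1\n' >> "$demo"
printf '1\t200\t.\tC\tT\t.\t.\tDP=5;ANN=T|missense_variant|MODERATE|GENE2,T|intron_variant|MODIFIER|GENE3\n' >> "$demo"
printf '1\t300\t.\tG\tA\t.\t.\tANN=A|synonymous_variant|LOW|GENE1\n' >> "$demo"

tally_effects "$demo"
```

Only the first ANN entry of each record is counted here, which keeps the tally at one term per variant.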

3 Parameter settings for the annotation output

Annotations options:
-cancer : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
-cancerSamples <file> : Two column TXT file defining 'original \t derived' samples.
-formatEff : Use 'EFF' field compatible with older versions (instead of 'ANN').
-geneId : Use gene ID instead of gene name (VCF output). Default: false
-hgvs : Use HGVS annotations for amino acid sub-field. Default: true
-lof : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
-noHgvs : Do not add HGVS annotations.
-noLof : Do not add LOF and NMD annotations.
-noShiftHgvs : Do not shift variants according to HGVS notation (most 3prime end).
-oicr : Add OICR tag in VCF file. Default: false
-sequenceOntology : Use Sequence Ontology terms. Default: true (the counterpart of -classic)
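
As an illustration of the -lof option above, the records that carry LOF or NMD tags can be pulled out of an annotated VCF. This is a rough sketch that assumes the tags appear as LOF=... and NMD=... entries in the INFO column; select_lof is a hypothetical helper, and the demo records and tag values are made up.

```shell
# select_lof is a hypothetical helper: it prints CHROM, POS and the LOF/NMD
# entries of every data line whose INFO column carries either tag.
select_lof() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, info, ";")
            tags = ""
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^(LOF|NMD)=/)
                    tags = tags (tags == "" ? "" : ";") info[i]
            if (tags != "") print $1 "\t" $2 "\t" tags
        }
    ' "$1"
}

# Made-up records; the ANN and LOF values are shortened placeholders.
demo=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf '1\t100\t.\tC\tT\t.\t.\tANN=T|stop_gained|HIGH|GENE1;LOF=(GENE1|ID1|1|1.00)\n' >> "$demo"
printf '1\t200\t.\tA\tG\t.\t.\tANN=G|synonymous_variant|LOW|GENE1\n' >> "$demo"

select_lof "$demo"
```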

4 Annotating canonical transcripts

The output reports the gene name, gene ID, transcript ID (transcriptId), and CDS length (cdsLength).

java -Xmx10G -jar $snpEff -v -canon GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37ann.canon.vcf

That wraps up the main SnpEff features and how to use them. If you have any questions, leave a comment below.
