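In the previous installment I introduced the SnpEff annotation databases. This installment focuses on the SnpEff commands, and the last one will cover how to interpret the annotation results.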
Preparing the input files
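- A SnpEff annotation database already built for the species: GRCh37.100 (~/snpeff/genome/GRCh37.100; see Part 1 for the detailed steps)
- The SNP/INDEL file to annotate, in VCF format (any folder works; here ~/database/SNP/human_GRCh37.vcf.gz)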
1 Quick annotation: a single command does it
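```shell
snpeffDir=~/snpeff
snpEff=${snpeffDir}/snpEff.jar
cd ~/database/SNP/
## Standard annotation. stderr goes to a log file: when stderr is a terminal, nohup
## would otherwise redirect it into stdout, i.e. into the output VCF.
nohup java -Xmx10G -jar "$snpEff" -csvStats human_GRCh37_snpeff.snp.csv -stats human_GRCh37_snpeff.snp.html GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.snp.vcf 2> human_GRCh37_snpeff.snp.log &
```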
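Notes: human_GRCh37_snpeff.snp.vcf contains the detailed annotations, and human_GRCh37_snpeff.snp.html is a report with summary charts. In Microsoft Edge the charts failed to display. If that happens, open the report in another browser.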
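After the background job finishes, a quick sanity check is to count the data records and how many of them lack an ANN key in INFO. In a normal run SnpEff typically annotates every record (at worst as intergenic_region), so records without ANN deserve a closer look.

The sketch below writes a two-record toy VCF so it runs on its own. The records and the file name toy_snpeff.snp.vcf are made up. For real data, point out_vcf at human_GRCh37_snpeff.snp.vcf.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy output (made-up records); for real data set out_vcf=human_GRCh37_snpeff.snp.vcf
out_vcf=toy_snpeff.snp.vcf
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tA\tG\t.\tPASS\tANN=G|missense_variant|MODERATE|GENE1|GENEID1|transcript|TX1|protein_coding|2/5|c.100A>G|p.Lys34Glu|150/900|100/600|34/199||\n'
  printf '1\t5000\t.\tC\tT\t.\tPASS\tDP=12\n'
} > "$out_vcf"

# Count data records, and those whose INFO column (column 8) has no ANN key.
read -r total missing < <(awk -F'\t' '
  !/^#/ {
    total++
    if ($8 !~ /(^|;)ANN=/) missing++
  }
  END { print total + 0, missing + 0 }' "$out_vcf")
echo "records: $total, without ANN: $missing"
```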
2 Annotating specific intervals
Options for filtering the results (used together with the ann command):
```text
-fi , -filterInterval <file> : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
-no-downstream : Do not show DOWNSTREAM changes
-no-intergenic : Do not show INTERGENIC changes
-no-intron : Do not show INTRON changes
-no-upstream : Do not show UPSTREAM changes
-no-utr : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
-no EffectType : Do not show 'EffectType'. This option can be used several times.
```
```shell
# Example: keep gene-level annotations, dropping intron, UTR, upstream, downstream and intergenic effects
java -Xmx10G -jar "$snpEff" ann -no-intron -no-utr -no-downstream -no-upstream -no-intergenic -csvStats human_GRCh37_snpeff.csv -stats human_GRCh37_snpeff.html GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.snp.gene.vcf
```
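The example above filters by effect type. To restrict the analysis to regions of interest instead, use -fi/-filterInterval, which takes an interval file.

Below is a minimal sketch. It assumes a BED file is accepted here; BED is the common choice, but check it against your SnpEff version. Keep in mind that BED start coordinates are 0-based.

- The file my_regions.bed, the region names and the coordinates are all made up.
- The chromosome names assume the Ensembl-style naming of GRCh37.100.
- The SnpEff call is skipped when snpEff.jar or the input VCF is not present.

```shell
#!/usr/bin/env bash
set -euo pipefail

snpEff=~/snpeff/snpEff.jar
# Hypothetical regions (BED: chrom, 0-based start, end, name). Assumes GRCh37.100
# uses Ensembl-style names, i.e. "1" and "7" rather than "chr1" and "chr7".
printf '1\t1000000\t2000000\tregionA\n' > my_regions.bed
printf '7\t5000000\t5100000\tregionB\n' >> my_regions.bed

# Annotate only variants overlapping those regions (skipped if jar or input is missing).
if [ -f "$snpEff" ] && [ -f human_GRCh37.vcf.gz ]; then
  java -Xmx10G -jar "$snpEff" ann -fi my_regions.bed \
    -stats human_GRCh37_snpeff.region.html \
    GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.region.vcf
fi
```

Per the help text above, -fi can be repeated to combine several interval files.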
Notes on the general annotation options
```text
Options:
-chr <string> : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). (Prefix added to the chromosome names in the output.)
-classic : Use old style annotations instead of Sequence Ontology and Hgvs. (Old-style annotation terms; the current default is Sequence Ontology. The old and new terms are compared in the table below.)
-download : Download reference genome if not available. Default: true
-i <format> : Input format [ vcf, bed ]. Default: VCF.
-fileList : Input actually contains a list of files to process.
-o <format> : Output format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
-s , -stats : Name of stats file (summary). Default is 'snpEff_summary.html'
-noStats : Do not create stats (summary) file
-csvStats : Create CSV summary file instead of HTML
```
Commonly used options: -chr, -classic and -csvStats.
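-chr only changes how chromosomes are named in the output. Whether you want it depends on the input and on what your downstream tools expect. Ensembl-based builds such as GRCh37.100 name chromosomes 1, 2, X, while UCSC-style files use chr1.

The sketch below reads the chromosome of the first data line of a toy VCF (hypothetical file toy_input.vcf). It adds -chr chr only when that name has no prefix, assuming downstream tools want chr-style names. It also passes -noStats, and it skips the SnpEff call when the jar is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

snpEff=~/snpeff/snpEff.jar
# Hypothetical toy input with Ensembl-style names ("1"); use human_GRCh37.vcf.gz for real data.
in_vcf=toy_input.vcf
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t1000\t.\tA\tG\t.\tPASS\t.\n' > "$in_vcf"

# gzip -cdf decompresses .gz input and passes plain text through unchanged;
# awk reads to the end, so the pipe is never closed early.
first_chrom=$(gzip -cdf "$in_vcf" | awk -F'\t' '!/^#/ && !seen { print $1; seen = 1 }')

# Prepend "chr" only when the input lacks it (assumes downstream tools want chr1-style names).
chr_prefix=""
case "$first_chrom" in
  chr*) ;;
  *) chr_prefix="chr" ;;
esac
echo "first chromosome: $first_chrom; -chr prefix: ${chr_prefix:-none}"

# Hold the optional "-chr <prefix>" arguments in the positional parameters.
if [ -n "$chr_prefix" ]; then set -- -chr "$chr_prefix"; else set --; fi

if [ -f "$snpEff" ]; then
  java -Xmx10G -jar "$snpEff" "$@" -noStats GRCh37.100 "$in_vcf" > toy_input.ann.vcf
fi
```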
-classic: Sequence Ontology terms (the default) and the classic terms they replace
| Sequence Ontology term | Classic term |
|---|---|
| coding_sequence_variant | CDS |
| chromosome | CHROMOSOME_LARGE_DELETION |
| coding_sequence_variant | CODON_CHANGE |
| inframe_insertion | CODON_INSERTION |
| disruptive_inframe_insertion | CODON_CHANGE_PLUS_CODON_INSERTION |
| inframe_deletion | CODON_DELETION |
| disruptive_inframe_deletion | CODON_CHANGE_PLUS_CODON_DELETION |
| downstream_gene_variant | DOWNSTREAM |
| exon_variant | EXON |
| exon_loss_variant | EXON_DELETED |
| frameshift_variant | FRAME_SHIFT |
| gene_variant | GENE |
| intergenic_region | INTERGENIC |
| conserved_intergenic_variant | INTERGENIC_CONSERVED |
| intragenic_variant | INTRAGENIC |
| intron_variant | INTRON |
| conserved_intron_variant | INTRON_CONSERVED |
| miRNA | MICRO_RNA |
| missense_variant | NON_SYNONYMOUS_CODING |
| initiator_codon_variant | NON_SYNONYMOUS_START |
| stop_retained_variant | NON_SYNONYMOUS_STOP |
| rare_amino_acid_variant | RARE_AMINO_ACID |
| splice_acceptor_variant | SPLICE_SITE_ACCEPTOR |
| splice_donor_variant | SPLICE_SITE_DONOR |
| splice_region_variant | SPLICE_SITE_REGION |
| splice_region_variant | SPLICE_SITE_BRANCH |
| splice_region_variant | SPLICE_SITE_BRANCH_U12 |
| stop_lost | STOP_LOST |
| 5_prime_UTR_premature_start_codon_gain_variant | START_GAINED |
| start_lost | START_LOST |
| stop_gained | STOP_GAINED |
| synonymous_variant | SYNONYMOUS_CODING |
| start_retained | SYNONYMOUS_START |
| stop_retained_variant | SYNONYMOUS_STOP |
| transcript_variant | TRANSCRIPT |
| regulatory_region_variant | REGULATION |
| upstream_gene_variant | UPSTREAM |
| 3_prime_UTR_variant | UTR_3_PRIME |
| 3_prime_UTR_truncation + exon_loss | UTR_3_DELETED |
| 5_prime_UTR_variant | UTR_5_PRIME |
| 5_prime_UTR_truncation + exon_loss_variant | UTR_5_DELETED |
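Scripts written for the old classic names can be ported with a lookup built from this table. Below is a minimal bash (4+) sketch that copies a handful of rows into an associative array; to_so is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# A few rows copied from the table above: classic term -> Sequence Ontology term.
declare -A classic_to_so=(
  [NON_SYNONYMOUS_CODING]=missense_variant
  [SYNONYMOUS_CODING]=synonymous_variant
  [FRAME_SHIFT]=frameshift_variant
  [STOP_GAINED]=stop_gained
  [SPLICE_SITE_REGION]=splice_region_variant
  [SPLICE_SITE_BRANCH]=splice_region_variant
  [UTR_3_PRIME]=3_prime_UTR_variant
)

# Hypothetical helper: print the SO term for a classic term, or "unknown" if not in the lookup.
to_so() {
  local classic=$1
  echo "${classic_to_so[$classic]:-unknown}"
}

for t in NON_SYNONYMOUS_CODING SPLICE_SITE_BRANCH NOT_A_TERM; do
  printf '%s\t%s\n' "$t" "$(to_so "$t")"
done
```

The lookup only works in this direction. Several classic terms collapse onto one SO term: the SPLICE_SITE_REGION/BRANCH/BRANCH_U12 rows all map to splice_region_variant, and CDS and CODON_CHANGE both map to coding_sequence_variant. A reverse mapping would therefore be ambiguous.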
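Some of the effect annotations:
- initiator_codon_variant: initiator (start) codon variant
- downstream_gene_variant: downstream gene variant
- intergenic_region: intergenic variant
- intragenic_variant: intragenic variant
- intron_variant: intronic variant
- missense_variant: missense mutation
- non_coding_transcript_exon_variant: variant in an exon of a non-coding transcript
- splice_acceptor_variant: splice acceptor variant
- splice_donor_variant: splice donor variant
- splice_region_variant: splice region variant
- stop_gained: stop codon gained
- stop_lost: stop codon lost
- stop_retained_variant: stop codon retained
- synonymous_variant: synonymous mutation
- upstream_gene_variant: upstream gene variant
- 5_prime_UTR_premature_start_codon_gain_variant: premature start codon gained in the 5' UTR
- 5_prime_UTR_variant: 5' UTR variant
- 3_prime_UTR_variant: 3' UTR variant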
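To see which of these effects occur in a run and how often, you can tally the Annotation sub-field. It is the second `|`-separated field of each comma-separated ANN entry, assuming SnpEff's ANN layout Allele|Annotation|Annotation_Impact|Gene_Name|...

Below is an awk sketch on a made-up toy VCF (hypothetical file toy_effects.vcf). One entry can carry several effects joined by `&`, and each is counted separately. Every transcript-level entry also counts once, so the totals can exceed the number of variants.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up toy VCF (hypothetical name); for real data use human_GRCh37_snpeff.snp.vcf.
vcf=toy_effects.vcf
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tA\tG\t.\tPASS\tANN=G|missense_variant|MODERATE|GENE1|GENEID1|transcript|TX1|protein_coding|2/5|c.100A>G|p.Lys34Glu|150/900|100/600|34/199||,G|upstream_gene_variant|MODIFIER|GENE2|GENEID2|transcript|TX2|protein_coding||c.-300A>G|||||300|\n'
  printf '1\t2000\t.\tC\tT\t.\tPASS\tDP=10;ANN=T|splice_region_variant&intron_variant|LOW|GENE1|GENEID1|transcript|TX1|protein_coding|3/4|c.200+5C>T||||||\n'
} > "$vcf"

# Tally every effect term: field 2 of each ANN entry, split on "&".
awk -F'\t' '
  !/^#/ {
    n_kv = split($8, kv, ";")
    for (k = 1; k <= n_kv; k++) {
      if (substr(kv[k], 1, 4) != "ANN=") continue
      n = split(substr(kv[k], 5), entries, ",")
      for (i = 1; i <= n; i++) {
        split(entries[i], f, "|")
        m = split(f[2], effects, "&")
        for (j = 1; j <= m; j++) count[effects[j]]++
      }
    }
  }
  END { for (e in count) print e "\t" count[e] }' "$vcf" | sort > toy_effect_counts.tsv
cat toy_effect_counts.tsv
```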
3 Annotation options
```text
Annotations options:
-cancer : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
-cancerSamples <file> : Two column TXT file defining 'original \t derived' samples.
-formatEff : Use 'EFF' field compatible with older versions (instead of 'ANN').
-geneId : Use gene ID instead of gene name (VCF output). Default: false
-hgvs : Use HGVS annotations for amino acid sub-field. Default: true
-lof : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
-noHgvs : Do not add HGVS annotations.
-noLof : Do not add LOF and NMD annotations.
-noShiftHgvs : Do not shift variants according to HGVS notation (most 3prime end).
-oicr : Add OICR tag in VCF file. Default: false
-sequenceOntology : Use Sequence Ontology terms. Default: true (the counterpart of -classic)
```
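-lof adds LOF and NMD keys to INFO. This sketch assumes the format described in the SnpEff documentation, LOF=(Gene_Name|Gene_ID|Number_of_transcripts_in_gene|Percent_of_transcripts_affected), with the last value written as a fraction such as 0.50.

The script runs SnpEff with -lof, skipping that step when the jar or the input is missing. It then flattens the LOF tags of a made-up toy record into a table. The file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

snpEff=~/snpeff/snpEff.jar
# Real run with LOF/NMD tags (skipped when the jar or the input VCF is not present).
if [ -f "$snpEff" ] && [ -f human_GRCh37.vcf.gz ]; then
  java -Xmx10G -jar "$snpEff" -lof -noStats GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.lof.vcf
fi

# Made-up toy record carrying LOF and NMD tags (hypothetical file name).
toy=toy_lof.vcf
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tC\tT\t.\tPASS\tANN=T|stop_gained|HIGH|GENE1|GENEID1|transcript|TX1|protein_coding|2/5|c.100C>T|p.Gln34*|150/900|100/600|34/199||;LOF=(GENE1|GENEID1|2|0.50);NMD=(GENE1|GENEID1|2|0.50)\n'
} > "$toy"

# One row per gene listed in LOF: CHROM, POS, gene name, transcripts in gene, share affected.
awk -F'\t' 'BEGIN { OFS = "\t" }
  !/^#/ {
    n_kv = split($8, kv, ";")
    for (k = 1; k <= n_kv; k++) {
      if (substr(kv[k], 1, 4) != "LOF=") continue
      value = substr(kv[k], 5)
      gsub(/[()]/, "", value)
      n = split(value, genes, ",")
      for (i = 1; i <= n; i++) {
        split(genes[i], f, "|")
        print $1, $2, f[1], f[3], f[4]
      }
    }
  }' "$toy" > toy_lof.tsv
cat toy_lof.tsv
```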
4 Annotating canonical transcripts
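-canon restricts the annotation to canonical transcripts. The run also reports the gene name, gene ID, transcript ID and CDS length (geneName, geneId, transcriptId, cdsLength).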
```shell
java -Xmx10G -jar "$snpEff" -v -canon GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37ann.canon.vcf
```
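With -canon, each variant is annotated against canonical transcripts only rather than every overlapping transcript, so the number of ANN entries usually drops. One way to see this is to count ANN entries in the standard output and in the -canon output.

Below is a sketch; ann_entries is a hypothetical helper, and the toy record is made up. The two real files are only counted if they exist.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: total number of ANN entries (one per annotated feature) in a VCF.
# gzip -cdf handles both .vcf.gz and plain .vcf input.
ann_entries() {
  gzip -cdf "$1" | awk -F'\t' '
    !/^#/ {
      n_kv = split($8, kv, ";")
      for (k = 1; k <= n_kv; k++)
        if (substr(kv[k], 1, 4) == "ANN=") total += split(substr(kv[k], 5), e, ",")
    }
    END { print total + 0 }'
}

# Made-up toy record: one variant annotated on two transcripts of the same gene.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t1000\t.\tA\tG\t.\tPASS\tANN=G|missense_variant|MODERATE|GENE1|GENEID1|transcript|TX1|protein_coding|2/5|c.100A>G|p.Lys34Glu|150/900|100/600|34/199||,G|intron_variant|MODIFIER|GENE1|GENEID1|transcript|TX1b|protein_coding|1/4|c.50-20A>G||||||\n' > toy_all.vcf
echo "toy ANN entries: $(ann_entries toy_all.vcf)"

# On real data, compare the standard run with the -canon run.
for f in human_GRCh37_snpeff.snp.vcf human_GRCh37ann.canon.vcf; do
  if [ -f "$f" ]; then
    echo "$f: $(ann_entries "$f") ANN entries"
  fi
done
```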
