從高通量基因組技術(shù)直觀地可視化和解讀數(shù)據(jù)仍然具有挑戰(zhàn)性。R中的基因組可視化(GenVisR)試圖通過提供高度可定制的出版物質(zhì)量圖形來減輕這一負(fù)擔(dān),支持多個物種,并主要關(guān)注cohort level(即多個樣本/患者)。GenVisR試圖保持高度的靈活性,同時利用ggplot2和bioconductor的能力來實(shí)現(xiàn)這一目標(biāo)。
部分功能跟maftools的功能很相似。
安裝:從Bioconductor安裝:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("GenVisR")
library(GenVisR)
功能:
GenVisR主要有以下13個功能,我將用兩個數(shù)據(jù)集去測試(一個從TCGA下載,一個是自己的內(nèi)部數(shù)據(jù))。
- Waterfall (mutation overview graphic)
- lolliplot (mutation hotspot graphic)
- genCov (sequence coverage graphic)
- TvTi (transition/transversion graphic)
- cnSpec(copy altered cohort graphic)
- cnView(copy altered single sample graphic)
- covBars(sequencing coverage cohort)
- cnFreq(proportional copy number alterations)
- ideoView(ideogram graphic)
- lohSpec(Loss of Heterozygosity Spectrum)
- lohView(Loss of Heterozygosity View)
- compldent(snp identity graphic)
- geneViz(Transcript Represenation)
1. Waterfall plot
首先需要一個MAF文件或MGI文件作為輸入, 文件中的突變類型或者叫“Variant_Classification”包含下列字段:

可以下載一個TCGA的某一個腫瘤類型的MAF文件看看,再用自己跑出來的單個樣本的MAF文件試試。
使用TCGAbiolinks下載:隨便選擇一個腫瘤,就用COAD。
BiocManager::install("TCGAbiolinks")
library(TCGAbiolinks)
# 下載TCGA的MAF
maf_file = GDCquery_Maf("COAD",save.csv = T, directory = "GDCdata", pipelines = "varscan")
waterfall(maf_file, fileType = "MAF")
有報(bào)錯:

報(bào)錯內(nèi)容是:該MAF中Variant_Classification中有不認(rèn)識的字段。
看看到底是哪個不認(rèn)識:

肉眼比較一下,發(fā)現(xiàn)TCGA結(jié)果中多出來了一個Splice_Region是GenVisR不認(rèn)識的,看看屬于Splice_Region的突變多不多呢?

還挺多,有1929個,刪掉試試:
maf_file = subset(maf_file, Variant_Classification != "Splice_Region")
waterfall(maf_file, fileType = "MAF")
gene太多了導(dǎo)致左側(cè)都堆在一起了。

下載的TCGA COAD腫瘤的MAF中有399個樣本,再來看看我自己一個樣本的MAF文件結(jié)果:
my_maf = read.csv("mysample.variants.funcotated.without.header.MAF.xls",header = T, sep = "\t")
waterfall(maf_file, fileType = "MAF")

雖然可以設(shè)置plotGenes來指定特定的基因,但好像不能想maftools那樣指定top基因
# Plot only the specified genes
waterfall(brcaMAF, plotGenes = c("PIK3CA", "TP53", "USH2A", "MLL3", "BRCA1"))
如果有臨床信息(clinical data)可以通過一些代碼附加上,并畫出如下圖:
# Create clinical data
subtype <- c("lumA", "lumB", "her2", "basal", "normal")
subtype <- sample(subtype, 50, replace = TRUE)
age <- c("20-30", "31-50", "51-60", "61+")
age <- sample(age, 50, replace = TRUE)
sample <- as.character(unique(brcaMAF$Tumor_Sample_Barcode))
clinical <- as.data.frame(cbind(sample, subtype, age))
# Melt the clinical data into 'long' format.
library(reshape2)
clinical <- melt(clinical, id.vars = c("sample"))
# Run waterfall
waterfall(brcaMAF, clinDat = clinical, clinVarCol = c(lumA = "blue4", lumB = "deepskyblue",
her2 = "hotpink2", basal = "firebrick2", normal = "green4", `20-30` = "#ddd1e7",
`31-50` = "#bba3d0", `51-60` = "#9975b9", `61+` = "#7647a2"), plotGenes = c("PIK3CA",
"TP53", "USH2A", "MLL3", "BRCA1"), clinLegCol = 2, clinVarOrder = c("lumA", "lumB",
"her2", "basal", "normal", "20-30", "31-50", "51-60", "61+"))

但是這里都沒有Clinical Data,就不演示了。
GenVisR 基因組數(shù)據(jù)可視化實(shí)戰(zhàn)(二)
GenVisR 基因組數(shù)據(jù)可視化實(shí)戰(zhàn)(三)
GenVisR 基因組數(shù)據(jù)可視化實(shí)戰(zhàn) (四)