
GSVA.jpg
跟著 Nat Med. 學(xué)作圖 | GSVA+limma差異通路分析+發(fā)散條形圖

image
Lambrechts D, Wauters E, Boeckx B, et al. Phenotype molding of stromal cells in the lung tumor microenvironment[J]. Nat Med, 2018, 24(8): p. 1277-1289.

image
最近很多同學(xué)在后臺(tái)說(shuō)要講一下這個(gè)圖,今天來(lái)簡(jiǎn)單寫(xiě)一下。
Gene Set Variation Analysis(GSVA)被稱(chēng)為基因集變異分析,是一種非參數(shù)的無(wú)監(jiān)督分析方法,主要用來(lái)評(píng)估芯片和轉(zhuǎn)錄組的基因集富集結(jié)果。主要是通過(guò)將基因在不同樣品間的表達(dá)量矩陣轉(zhuǎn)化成基因集在樣品間的表達(dá)量矩陣,從而來(lái)評(píng)估不同的代謝通路在不同樣品間是否富集。通常用于單細(xì)胞轉(zhuǎn)錄組中,不同細(xì)胞類(lèi)型的差異基因集的比較。

GSVA流程示意圖。doi:10.1186/1471-2105-14-7
22
示例數(shù)據(jù)及代碼領(lǐng)取
詳見(jiàn):https://mp.weixin.qq.com/s/udBYwaFY_VpPGcJO0OH1Zg
GSVA
導(dǎo)入示例數(shù)據(jù)
## 為正常和腫瘤的內(nèi)皮細(xì)胞的基因表達(dá)矩陣
library(readxl)
library(dplyr)
dat <- read_excel("data_test.xlsx")
dat <- dat %>% data.frame()
row.names(dat) <- dat$gene
dat <- dat[,-1]
head(dat)
> head(dat)
EC1_T EC2_T EC3_T EC0_N EC1_N EC3_N EC4_N
RP11-34P13.3 0 0 0 0 0 0 0
FAM138A 0 0 0 0 0 0 0
OR4F5 0 0 0 0 0 0 0
RP11-34P13.7 0 0 0 0 0 0 0
RP11-34P13.8 0 0 0 0 0 0 0
RP11-34P13.14 0 0 0 0 0 0 0
參考基因集數(shù)據(jù)庫(kù)
MSigDB數(shù)據(jù)庫(kù)具體介紹詳見(jiàn):Q&A | 如何使用clusterProfiler對(duì)MSigDB數(shù)據(jù)庫(kù)進(jìn)行富集分析
這里我們使用:hallmark gene sets
開(kāi)始分析
#BiocManager::install("GSEABase")
BiocManager::install("GSVA", version = "3.14") # R 4.1.2 注意版本w
library('GSEABase')
library(GSVA)
geneSets <- getGmt('h.all.v7.5.1.symbols.gmt') ###下載的基因集
GSVA_hall <- gsva(expr=as.matrix(dat),
gset.idx.list=geneSets,
mx.diff=T, # 數(shù)據(jù)為正態(tài)分布則T,雙峰則F
kcdf="Gaussian", #CPM, RPKM, TPM數(shù)據(jù)就用默認(rèn)值"Gaussian", read count數(shù)據(jù)則為"Poisson",
parallel.sz=4) # 并行線(xiàn)程數(shù)目
head(GSVA_hall)
> head(GSVA_hall)
EC1_T EC2_T EC3_T EC0_N
HALLMARK_TNFA_SIGNALING_VIA_NFKB -0.43490528 -0.36929145 -0.4694267 0.57790329
HALLMARK_HYPOXIA -0.19830871 -0.15040810 -0.1237437 0.31595670
HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.15223695 -0.15160770 -0.0505465 0.24273366
HALLMARK_MITOTIC_SPINDLE -0.29757233 0.20140000 0.3185932 -0.09284047
HALLMARK_WNT_BETA_CATENIN_SIGNALING -0.08827878 0.03086390 0.2383694 0.24045204
HALLMARK_TGF_BETA_SIGNALING -0.26357054 -0.01068719 0.3041132 0.28761931
EC1_N EC3_N EC4_N
HALLMARK_TNFA_SIGNALING_VIA_NFKB -0.1673606 0.2198982 0.33940521
HALLMARK_HYPOXIA -0.3292568 0.0906295 0.01407149
HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.3566404 0.1532081 0.05151732
HALLMARK_MITOTIC_SPINDLE -0.3673806 0.1091566 -0.15295077
HALLMARK_WNT_BETA_CATENIN_SIGNALING -0.2352282 -0.1027337 -0.15598311
HALLMARK_TGF_BETA_SIGNALING -0.4387025 0.2336080 -0.14090342
limma差異通路分析
## limma
#BiocManager::install('limma')
library(limma)
# 設(shè)置或?qū)敕纸M
group <- factor(c(rep("Tumor", 3), rep("Normal", 4)), levels = c('Tumor', 'Normal'))
design <- model.matrix(~0+group)
colnames(design) = levels(factor(group))
rownames(design) = colnames(GSVA_hall)
design
# Tunor VS Normal
compare <- makeContrasts(Tumor - Normal, levels=design)
fit <- lmFit(GSVA_hall, design)
fit2 <- contrasts.fit(fit, compare)
fit3 <- eBayes(fit2)
Diff <- topTable(fit3, coef=1, number=200)
head(Diff)
> head(Diff)
logFC AveExpr t P.Value
HALLMARK_INTERFERON_GAMMA_RESPONSE -0.7080628 -0.021269395 -4.393867 0.0002713325
HALLMARK_INTERFERON_ALPHA_RESPONSE -0.7419587 -0.044864170 -4.299840 0.0003385128
HALLMARK_INFLAMMATORY_RESPONSE -0.5940473 -0.003525139 -4.193096 0.0004352397
HALLMARK_IL6_JAK_STAT3_SIGNALING -0.6117943 -0.038379008 -4.157977 0.0004727716
HALLMARK_TNFA_SIGNALING_VIA_NFKB -0.6670027 -0.043396767 -4.149307 0.0004825257
HALLMARK_ALLOGRAFT_REJECTION -0.4645747 0.015248663 -3.278981 0.0036971108
adj.P.Val B
HALLMARK_INTERFERON_GAMMA_RESPONSE 0.004825257 0.43891329
HALLMARK_INTERFERON_ALPHA_RESPONSE 0.004825257 0.22763343
HALLMARK_INFLAMMATORY_RESPONSE 0.004825257 -0.01221232
HALLMARK_IL6_JAK_STAT3_SIGNALING 0.004825257 -0.09109393
HALLMARK_TNFA_SIGNALING_VIA_NFKB 0.004825257 -0.11056468
HALLMARK_ALLOGRAFT_REJECTION 0.030809257 -2.03863951
發(fā)散條形圖繪制
## barplot
dat_plot <- data.frame(id = row.names(Diff),
t = Diff$t)
# 去掉"HALLMARK_"
library(stringr)
dat_plot$id <- str_replace(dat_plot$id , "HALLMARK_","")
# 新增一列 根據(jù)t閾值分類(lèi)
dat_plot$threshold = factor(ifelse(dat_plot$t >-2, ifelse(dat_plot$t >= 2 ,'Up','NoSignifi'),'Down'),levels=c('Up','Down','NoSignifi'))
# 排序
dat_plot <- dat_plot %>% arrange(t)
# 變成因子類(lèi)型
dat_plot$id <- factor(dat_plot$id,levels = dat_plot$id)
# 繪制
library(ggplot2)
library(ggtheme)
# install.packages("ggprism")
library(ggprism)
p <- ggplot(data = dat_plot,aes(x = id,y = t,fill = threshold)) +
geom_col()+
coord_flip() +
scale_fill_manual(values = c('Up'= '#36638a','NoSignifi'='#cccccc','Down'='#7bcd7b')) +
geom_hline(yintercept = c(-2,2),color = 'white',size = 0.5,lty='dashed') +
xlab('') +
ylab('t value of GSVA score, tumour versus non-malignant') + #注意坐標(biāo)軸旋轉(zhuǎn)了
guides(fill=F)+ # 不顯示圖例
theme_prism(border = T) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
p
# 添加標(biāo)簽
# 此處參考了:https://mp.weixin.qq.com/s/eCMwWCnjTyQvNX2wNaDYXg
# 小于-2的數(shù)量
low1 <- dat_plot %>% filter(t < -2) %>% nrow()
# 小于0總數(shù)量
low0 <- dat_plot %>% filter( t < 0) %>% nrow()
# 小于2總數(shù)量
high0 <- dat_plot %>% filter(t < 2) %>% nrow()
# 總的柱子數(shù)量
high1 <- nrow(dat_plot)
# 依次從下到上添加標(biāo)簽
p <- p + geom_text(data = dat_plot[1:low1,],aes(x = id,y = 0.1,label = id),
hjust = 0,color = 'black') + # 小于-1的為黑色標(biāo)簽
geom_text(data = dat_plot[(low1 +1):low0,],aes(x = id,y = 0.1,label = id),
hjust = 0,color = 'grey') + # 灰色標(biāo)簽
geom_text(data = dat_plot[(low0 + 1):high0,],aes(x = id,y = -0.1,label = id),
hjust = 1,color = 'grey') + # 灰色標(biāo)簽
geom_text(data = dat_plot[(high0 +1):high1,],aes(x = id,y = -0.1,label = id),
hjust = 1,color = 'black') # 大于1的為黑色標(biāo)簽
ggsave("gsva_bar.pdf",p,width = 8,height = 8)

結(jié)果展示