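WeChat Official Account: 研平方
Follow the account for more research tutorials and tips. If you have questions or suggestions, please leave a message there.
Follow along and let's learn and improve together!
It had been a long time since I last analyzed data in R. Recently a friend needed to run GSVA, so I used the opportunity to refresh my R skills, and I am sharing the workflow here.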
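1. Introduction to GSVA
GSVA (Gene Set Variation Analysis) is a non-parametric, unsupervised method. Unlike GSEA, GSVA does not require the samples to be divided into groups beforehand; it computes an enrichment score for each gene set in each individual sample. In other words, GSVA transforms an expression matrix whose features are individual genes (genes × samples) into one whose features are gene sets (gene sets × samples). Because enrichment is quantified per sample, the result is easy to feed into downstream statistical analysis. Just as limma can be run on an expression matrix to find genes that are differentially expressed between groups of samples, running the same limma analysis on the GSVA output (which is still a matrix) finds gene sets whose scores differ significantly between groups. Compared with single genes, such "differentially enriched" gene sets are often more biologically meaningful and easier to interpret, and they can be used further in biology-driven work such as tumor subtyping.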
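GSVA itself estimates a per-gene cumulative distribution with a kernel (Gaussian or Poisson, chosen by kcdf), ranks genes within each sample, and scores each gene set with a Kolmogorov-Smirnov-like random-walk statistic. The sketch below is deliberately simpler and is not GSVA: it uses the combined z-score (which the GSVA package also offers as its "zscore" method) only to show the reshaping described above, where a genes × samples matrix goes in and a gene sets × samples matrix comes out. The toy values, gene names and gene set names are made up.

```r
# Toy data: 6 genes (rows) x 4 samples (columns); the values are made up
expr <- matrix(c(5, 6, 9, 9,
                 4, 5, 8, 9,
                 7, 7, 7, 7.5,
                 3, 8, 3, 8,
                 6, 5, 6, 5,
                 2, 2, 3, 2),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("G", 1:6), paste0("S", 1:4)))

# Two hypothetical gene sets
gene_sets <- list(setA = c("G1", "G2"), setB = c("G4", "G5", "G6"))

# Combined z-score (a stand-in, not the GSVA algorithm): standardize each gene
# across samples, then for every set sum its genes' z-scores per sample and
# divide by sqrt(set size)
zscore_sets <- function(expr, gene_sets) {
  z <- t(scale(t(expr)))                 # per-gene z-scores, genes x samples
  t(vapply(gene_sets, function(g) {
    g <- intersect(g, rownames(z))       # skip genes missing from the matrix
    colSums(z[g, , drop = FALSE]) / sqrt(length(g))
  }, numeric(ncol(z))))                  # result: gene sets x samples
}

scores <- zscore_sets(expr, gene_sets)
dim(expr)    # 6 4  (genes x samples)
dim(scores)  # 2 4  (gene sets x samples)
```

The genes of setA rise in samples S3 and S4, so setA scores higher there than in S1 and S2; downstream tools such as limma then work on this smaller matrix exactly as they would on gene-level data.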

[Figure: GSVA]
2. Preparing the data
2.1 Load the required packages
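setwd(" ")                          # set to your own working directory
rm(list = ls())                     # clear the workspace
options(stringsAsFactors = FALSE)   # read text columns as character, not factor
library(GSVA)                       # gsva()
library(GSEABase)
library(msigdbr)
library(clusterProfiler)
library(org.Hs.eg.db)
library(enrichplot)
library(limma)                      # differential analysis of the GSVA scores
library(readr)                      # read_delim(), used in section 2.3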
2.2 Expression Data
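# exprSet.txt is comma-separated; its first column holds the gene IDs, which
# read.table() names "X" when that header cell is empty (e.g. a write.csv() file).
# These IDs must match the IDs used in the gene sets.
exprSet <- read.table("exprSet.txt", header = TRUE, sep = ",")
rownames(exprSet) <- exprSet$X   # gene IDs become row names
exprSet <- exprSet[, -1]         # drop the ID column, keeping samples only
str(exprSet)                     # genes in rows, samples in columns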
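If exprSet.txt was written by write.csv() (an assumption; the post does not show the file), read.csv(..., row.names = 1) does the three steps above in one call. The round trip below uses a tiny made-up table (the gene names and values are illustrative only) in a temporary file to check that both routes give the same data frame.

```r
# Made-up table written the way write.csv() does: gene IDs end up in an
# unnamed first column, which read.table(header = TRUE) then names "X"
toy <- data.frame(S1 = c(5.1, 7.3), S2 = c(6.0, 6.8),
                  row.names = c("TP53", "CD4"))
path <- tempfile(fileext = ".txt")
write.csv(toy, path)

# Three-step version, as in the post
a <- read.table(path, header = TRUE, sep = ",")
rownames(a) <- a$X
a <- a[, -1]

# One-step equivalent: take the row names from the first column
b <- read.csv(path, row.names = 1)

all.equal(a, b)            # compare the two results
expr_mat <- as.matrix(b)   # gsva() works on a numeric matrix
```

One side benefit of row.names = 1: with a single sample, `a[, -1]` would silently drop to a plain vector, whereas read.csv() still returns a one-column data frame.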
2.3 Custom gene sets
2.3.1 Version 1: hard on the eyes
# pathway.txt: tab-delimited, one pathway per column with its genes listed
# underneath; read_delim() reads the empty padding cells as NA
pathway <- read_delim("pathway.txt", "\t",
                      escape_double = FALSE, trim_ws = TRUE)
pathway <- as.data.frame(pathway)
if (TRUE) {  # the wrapper only groups these lines (e.g. for code folding)
  T_cell_activation <- unique(na.omit(pathway$`T cell activation`))
  toll_like_receptor_signaling_pathway <- unique(na.omit(pathway$`toll-like receptor signaling pathway`))
  leukocyte_differentiation <- unique(na.omit(pathway$`leukocyte differentiation`))
  positive_regulation_of_cell_death <- unique(na.omit(pathway$`positive regulation of cell death`))
  neutrophil_activation <- unique(na.omit(pathway$`neutrophil activation`))
  positive_regulation_of_immune_response <- unique(na.omit(pathway$`positive regulation of immune response`))
}
pathway_list <- list(T_cell_activation, toll_like_receptor_signaling_pathway, leukocyte_differentiation,
                     positive_regulation_of_cell_death, neutrophil_activation, positive_regulation_of_immune_response)
names(pathway_list) <- c("T cell activation", "toll-like receptor signaling pathway", "leukocyte differentiation",
                         "positive regulation of cell death", "neutrophil activation", "positive regulation of immune response")
2.3.2 Version 2: a for loop
pathway_list <- vector("list", length(pathway))        # one slot per pathway column
for (i in seq_along(pathway)) {
  pathway_list[[i]] <- unique(na.omit(pathway[, i]))   # drop NA padding and duplicates
}
names(pathway_list) <- colnames(pathway)               # reuse the column headers as names
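2.3.3 Version 3: lapply()
pathway_list <- lapply(pathway, function(x) {
  unique(na.omit(x))  # drop NA padding and duplicate genes in each column
})
I have to say, the apply() family really is handy! Because lapply() over a data frame keeps the column names, the resulting list is already named and no separate names() call is needed.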
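The sketch below rebuilds versions 2 and 3 on a made-up two-pathway file (layout inferred from the code above, not shown in the post: one pathway per column, genes underneath, empty padding cells) and checks that both produce the same named list. It uses base R's read.delim() instead of readr so it runs without extra packages. One detail matters here: read_delim() treats empty cells as NA by default, whereas read.delim() keeps an empty cell in a text column as "" unless na.strings includes "", and na.omit() would not remove those empty strings.

```r
# Made-up pathway file: one pathway per column, genes underneath,
# the shorter column padded with empty cells
toy_path <- tempfile(fileext = ".txt")
writeLines(c("T cell activation\tneutrophil activation",
             "CD4\tELANE",
             "CD28\tMPO",
             "CD4\t",
             "CD3E\t"), toy_path)

# check.names = FALSE keeps spaces in the headers; na.strings = c("", "NA")
# turns empty cells into NA (by default read.delim() would keep them as "")
toy_pathway <- read.delim(toy_path, check.names = FALSE,
                          na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Version 2: for loop, names taken from the column headers
list_for <- vector("list", length(toy_pathway))
for (i in seq_along(toy_pathway)) {
  list_for[[i]] <- unique(na.omit(toy_pathway[, i]))  # drop NA padding and duplicates
}
names(list_for) <- colnames(toy_pathway)

# Version 3: lapply() keeps the column names by itself
list_lapply <- lapply(toy_pathway, function(x) unique(na.omit(x)))

identical(list_for, list_lapply)   # TRUE
lengths(list_lapply)               # 3 genes and 2 genes
```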
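3. Hands-on
# kcdf = "Gaussian" suits continuous, log-scale expression values; use "Poisson" for raw counts.
# This is the older GSVA interface; the newer one (GSVA >= 1.50) is
# gsva(gsvaParam(as.matrix(exprSet), pathway_list, kcdf = "Gaussian", absRanking = TRUE))
gsva_matrix_BD <- gsva(as.matrix(exprSet), pathway_list, method = "gsva",
                       kcdf = "Gaussian", abs.ranking = TRUE)
write.csv(gsva_matrix_BD, file = "gsva_matrix_BD.csv")  # rows: gene sets, columns: samples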
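The introduction suggests running limma on the GSVA score matrix to find gene sets that differ between groups. The base R sketch below is a simpler stand-in, not limma: it applies a Welch t-test to each row of a made-up score matrix (standing in for gsva_matrix_BD) and adjusts the p-values with Benjamini-Hochberg. The group labels "ctrl" and "case" are hypothetical. With few samples, limma's moderated statistics (lmFit(), eBayes(), topTable()) are usually preferred.

```r
# Made-up GSVA-style scores: 2 gene sets (rows) x 6 samples (columns)
scores <- rbind(
  setA = c(-0.40, -0.35, -0.50, 0.45, 0.38, 0.52),
  setB = c( 0.10, -0.05,  0.02, 0.04, -0.08, 0.06)
)
colnames(scores) <- paste0("S", 1:6)

# Hypothetical grouping of the six samples
group <- factor(c("ctrl", "ctrl", "ctrl", "case", "case", "case"),
                levels = c("ctrl", "case"))

# Welch t-test per gene set (case vs ctrl), then Benjamini-Hochberg adjustment
diff_sets <- function(scores, group) {
  res <- t(apply(scores, 1, function(x) {
    tt <- t.test(x[group == "case"], x[group == "ctrl"])  # Welch by default
    c(diff = unname(tt$estimate[1] - tt$estimate[2]), p = tt$p.value)
  }))
  res <- as.data.frame(res)
  res$padj <- p.adjust(res$p, method = "BH")
  res[order(res$p), ]   # most significant gene sets first
}

diff_sets(scores, group)
```

In this toy matrix setA separates the two groups clearly, while setB does not.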
4. Results

[Figure: GSVA enrichment analysis results]