前情提要:
SPATA:基因集驅(qū)動(dòng)的空間轉(zhuǎn)錄組分析框架
enrichplot||基因富集結(jié)果可視化解決方案
單細(xì)胞數(shù)據(jù)挖掘||Pathway and Network Analysis of Gene List
Pathway analyses是組學(xué)實(shí)驗(yàn)分析的關(guān)鍵方法。然而,整合來自不同組學(xué)技術(shù)和不同物種的數(shù)據(jù)仍然需要大量的生物信息學(xué)知識(shí)。在單細(xì)胞數(shù)據(jù)科學(xué)中,Pathway analyses也是基因表達(dá)數(shù)據(jù)鏈接其他組學(xué)的主要路徑,如單細(xì)胞轉(zhuǎn)錄組與代謝組學(xué)之間的鏈接。本文介紹的ReactomeGSA可以使用reacome數(shù)據(jù)庫(kù)分析scRNA-seq數(shù)據(jù)。將來自來自不同物種的數(shù)據(jù)自動(dòng)映射到一個(gè)共同的路徑空間(基于生命科學(xué)底層的保守型)。將來自ExpressionAtlas和SingleCellExpressionAtlas的公開數(shù)據(jù)可以直接整合到分析中。ReactomeGSA大大降低了多組學(xué)、跨物種、比較通路分析的技術(shù)障礙。
通過ReactomeGSA R包的“analyze_sc_clusters”功能支持對(duì)scRNA-seq數(shù)據(jù)的分析,也可以通過直接從單細(xì)胞表達(dá)圖譜導(dǎo)入。在這兩種情況下,我們計(jì)算基因在一個(gè)簇內(nèi)的平均表達(dá)量。這可以通過“Seurat”的AverageExpression函數(shù),或者通過scater的“aggregateaccroscells”函數(shù)來完成,具體取決于輸入對(duì)象。其實(shí)就是把數(shù)據(jù)庫(kù)與富集方法封裝到一起了。

這里我們只看ReactomeGSA 處理單細(xì)胞的過程,其他過程可以自行探索。
if (!require(ReactomeGSA))
BiocManager::install("ReactomeGSA")
#> Loading required package: ReactomeGSA
# install the ReactomeGSA.data package for the example data
if (!require(ReactomeGSA)) BiocManager::install("ReactomeGSA.data")
library(ReactomeGSA.data)
library(ReactomeGSA)
#> Loading required package: limma
#> Loading required package: edgeR
#> Loading required package: Seurat
data(jerby_b_cells)
jerby_b_cells
An object of class Seurat
23686 features across 920 samples within 1 assay
Active assay: RNA (23686 features, 0 variable features)
> head(jerby_b_cells@meta.data)
orig.ident nCount_RNA nFeature_RNA
cy53_1_CD45_neg_H12_S384_comb cy53 50508 8124
Cy81_FNA_CD45_G01_S265_comb Cy81 292554 3301
cy94_cd45pos_F03_S159_comb cy94 240094 4246
cy79_p4_CD45_pos_PD1_pos_F03_S351_comb cy79 177665 3824
cy94_cd45pos_C05_S125_comb cy94 830771 7004
CY89A_CD45_POS_6_H05_S185_comb
輸入一個(gè)Seurat對(duì)象,analyse_sc_clusters就可以完成Seuat的基本分析。
gsva_result <- analyse_sc_clusters(jerby_b_cells, verbose = TRUE)
Calculating average cluster expression...
Converting expression data to string... (This may take a moment)
Conversion complete
Submitting request to Reactome API...
Compressing request data...
Reactome Analysis submitted succesfully
Mapping identifiers...
Performing gene set analysis using ssGSEA
Analysing dataset 'Seurat' using ssGSEA
Retrieving result...
head(gsva_result@results$Seurat$fold_changes)
Identifier Cluster_1 Cluster_10 Cluster_11 Cluster_12 Cluster_13 Cluster_2 Cluster_3 Cluster_4 Cluster_5
1 RPS11 407.214286 613.350 278.096774 204.72 104.8750000 436.271930 294.0825688 426.179245 514.200000
2 ELMO2 9.707143 23.050 12.838710 8.24 9.0000000 26.289474 0.6697248 12.273585 14.288889
3 PNMA1 11.214286 0.900 8.387097 0.00 11.8333333 14.605263 1.9908257 16.339623 2.533333
4 MMP2 86.792857 0.000 454.419355 0.00 186.0833333 196.833333 0.0000000 155.632075 1032.944444
5 TMEM216 2.292857 12.400 6.516129 3.76 0.5416667 6.342105 5.3577982 6.103774 6.188889
6 ZHX3 18.392857 12.375 14.483871 8.44 3.5000000 17.289474 5.3119266 11.990566 18.444444
Cluster_6 Cluster_7 Cluster_8 Cluster_9
1 374.258824 120.7090909 705.537037 236.9148936
2 1.823529 7.1090909 54.277778 15.9148936
3 2.717647 1.8727273 6.037037 0.8085106
4 0.000000 0.3272727 1.148148 0.0212766
5 9.682353 1.2727273 9.703704 2.0851064
6 4.658824 1.0363636 22.462963 10.6808511
可以看到其實(shí)是以亞群為單位的ssGSEA分析,也啟發(fā)我們?cè)谧鰡渭?xì)胞富集分析的時(shí)候,如果計(jì)算量太大就可以考慮用亞群的平均表達(dá)量來。在此也反映,單細(xì)胞數(shù)據(jù)分析的基本單位是細(xì)胞亞群而不是單個(gè)細(xì)胞。
通路分析就一個(gè)函數(shù)
pathway_expression <- pathways(gsva_result)
> colnames(pathway_expression) <- gsub("\\.Seurat", "", colnames(pathway_expression))
> pathway_expression[1:3,]
Name Cluster_1 Cluster_10 Cluster_11 Cluster_12 Cluster_13 Cluster_2 Cluster_3
R-HSA-1059683 Interleukin-6 signaling 0.1067040 0.09940645 0.1509142 0.10414911 0.11368072 0.11979177 0.11156012
R-HSA-109606 Intrinsic Pathway for Apoptosis 0.1063122 0.10317826 0.1116539 0.10910644 0.12484602 0.10068176 0.10487716
R-HSA-109703 PKB-mediated events 0.1274642 0.05289859 0.1062885 0.09587804 0.07340752 0.08322147 0.08406187
Cluster_4 Cluster_5 Cluster_6 Cluster_7 Cluster_8 Cluster_9
R-HSA-1059683 0.11429424 0.1151514 0.09620555 0.11560454 0.13346414 0.10350569
R-HSA-109606 0.10839244 0.1021714 0.10272290 0.11024641 0.11423797 0.11243046
R-HSA-109703 0.05573774 0.0458425 0.12398024 0.07708516 0.07818166 0.01426709
找出有差異的通路
# find the maximum differently expressed pathway
max_difference <- do.call(rbind, apply(pathway_expression, 1, function(row) {
+ values <- as.numeric(row[2:length(row)])
+ return(data.frame(name = row[1], min = min(values), max = max(values)))
+ }))
max_difference$diff <- max_difference$max - max_difference$min
# sort based on the difference
max_difference <- max_difference[order(max_difference$diff, decreasing = T), ] head(max_difference)
name min max diff
R-HSA-350864 Regulation of thyroid hormone activity -0.4873008 0.3750639 0.8623647
R-HSA-8964540 Alanine metabolism -0.5061766 0.2556097 0.7617863
R-HSA-190374 FGFR1c and Klotho ligand binding and activation -0.3439743 0.4157764 0.7597507
R-HSA-140180 COX reactions -0.3460100 0.3727507 0.7187606
R-HSA-5263617 Metabolism of ingested MeSeO2H into MeSeH -0.1932855 0.4938234 0.6871089
R-HSA-9024909 BDNF activates NTRK2 (TRKB) signaling -0.3786589 0.3011768 0.6798357
感興趣通路的可視化,可以看出哪些亞群在該通路比較高的或比較低的。當(dāng)然,我們這里沒有做細(xì)胞類型注釋,所以你也看不出具體的生物學(xué)意義啦。這里繪制的一條通路是 Alanine metabolism(丙氨酸代謝),肌肉組織中的氨基酸經(jīng)轉(zhuǎn)氨基作用將氨基轉(zhuǎn)給丙酮酸生成丙氨酸,經(jīng)血液運(yùn)輸?shù)礁闻K,在肝臟中丙氨酸通過聯(lián)合脫氨基作用生成丙酮酸和游離氨,可經(jīng)糖異生作用生成葡萄糖,葡萄糖由血液運(yùn)輸?shù)郊∪饨M織中沿糖分解途徑再產(chǎn)生丙酮酸,后者再接受氨基生成丙氨酸,通過這循環(huán)使肌肉中的氨以無毒氨基酸形式運(yùn)輸?shù)礁危瑫r(shí)肝也為肌肉提供了生成丙酮酸的葡萄糖。
library(tidyverse)
p1 <- plot_gsva_pathway(gsva_result, pathway_id = rownames(max_difference)[1])
p1$data %>% mutate(absmy = ifelse(expr>=0, "Z","Fy")) -> df
?element_blank
df %>% ggplot(aes(cluster_id, expr ,fill=absmy ))+
geom_bar(stat='identity') + theme_bw()+
theme(axis.text.x = element_text(angle =45,hjust = .9,size = 10,vjust = 0.9))+
ggtitle("Alanine metabolism") + theme(legend.position="none")

# Additional parameters are directly passed to gplots heatmap.2 function
plot_gsva_heatmap(gsva_result, max_pathways = 15, margins = c(6,20))

選擇特定的通路
# limit to selected B cell related pathways
relevant_pathways <- c("R-HSA-983170", "R-HSA-388841", "R-HSA-2132295", "R-HSA-983705", "R-HSA-5690714")
plot_gsva_heatmap(gsva_result,
pathway_ids = relevant_pathways, # limit to these pathways
margins = c(6,30), # adapt the figure margins in heatmap.2
dendrogram = "col", # only plot column dendrogram
scale = "row", # scale for each pathway
key = FALSE, # don't display the color key
lwid=c(0.1,4)) # remove the white space on the left

其實(shí)做完GSVA之后,我們相當(dāng)于得到了一個(gè)關(guān)于通路的矩陣,也可以做降維聚類,如PCA,可以看到哪些亞群在pca空間上更為相近。
plot_gsva_pca(gsva_result)

https://www.mcponline.org/article/S1535-9476(20)60015-9/fulltext