ReactomeGSA:高效的單細(xì)胞通路分析工具

前情提要:

SPATA:基因集驅(qū)動(dòng)的空間轉(zhuǎn)錄組分析框架
enrichplot||基因富集結(jié)果可視化解決方案
單細(xì)胞數(shù)據(jù)挖掘||Pathway and Network Analysis of Gene List

Pathway analyses是組學(xué)實(shí)驗(yàn)分析的關(guān)鍵方法。然而,整合來自不同組學(xué)技術(shù)和不同物種的數(shù)據(jù)仍然需要大量的生物信息學(xué)知識(shí)。在單細(xì)胞數(shù)據(jù)科學(xué)中,Pathway analyses也是基因表達(dá)數(shù)據(jù)鏈接其他組學(xué)的主要路徑,如單細(xì)胞轉(zhuǎn)錄組與代謝組學(xué)之間的鏈接。本文介紹的ReactomeGSA可以使用reacome數(shù)據(jù)庫(kù)分析scRNA-seq數(shù)據(jù)。將來自來自不同物種的數(shù)據(jù)自動(dòng)映射到一個(gè)共同的路徑空間(基于生命科學(xué)底層的保守型)。將來自ExpressionAtlas和SingleCellExpressionAtlas的公開數(shù)據(jù)可以直接整合到分析中。ReactomeGSA大大降低了多組學(xué)、跨物種、比較通路分析的技術(shù)障礙。

通過ReactomeGSA R包的“analyze_sc_clusters”功能支持對(duì)scRNA-seq數(shù)據(jù)的分析,也可以通過直接從單細(xì)胞表達(dá)圖譜導(dǎo)入。在這兩種情況下,我們計(jì)算基因在一個(gè)簇內(nèi)的平均表達(dá)量。這可以通過“Seurat”的AverageExpression函數(shù),或者通過scater的“aggregateaccroscells”函數(shù)來完成,具體取決于輸入對(duì)象。其實(shí)就是把數(shù)據(jù)庫(kù)與富集方法封裝到一起了。

ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis

這里我們只看ReactomeGSA 處理單細(xì)胞的過程,其他過程可以自行探索。

if (!require(ReactomeGSA))
  BiocManager::install("ReactomeGSA")
#> Loading required package: ReactomeGSA

# install the ReactomeGSA.data package for the example data
if (!require(ReactomeGSA)) BiocManager::install("ReactomeGSA.data")

library(ReactomeGSA.data)
library(ReactomeGSA)
#> Loading required package: limma
#> Loading required package: edgeR
#> Loading required package: Seurat
data(jerby_b_cells)
jerby_b_cells
An object of class Seurat 
23686 features across 920 samples within 1 assay 
Active assay: RNA (23686 features, 0 variable features)
> head(jerby_b_cells@meta.data)
                                       orig.ident nCount_RNA nFeature_RNA
cy53_1_CD45_neg_H12_S384_comb                cy53      50508         8124
Cy81_FNA_CD45_G01_S265_comb                  Cy81     292554         3301
cy94_cd45pos_F03_S159_comb                   cy94     240094         4246
cy79_p4_CD45_pos_PD1_pos_F03_S351_comb       cy79     177665         3824
cy94_cd45pos_C05_S125_comb                   cy94     830771         7004
CY89A_CD45_POS_6_H05_S185_comb

輸入一個(gè)Seurat對(duì)象,analyse_sc_clusters就可以完成Seuat的基本分析。

gsva_result <- analyse_sc_clusters(jerby_b_cells, verbose = TRUE)

Calculating average cluster expression...
Converting expression data to string... (This may take a moment)
Conversion complete
Submitting request to Reactome API...
Compressing request data...
Reactome Analysis submitted succesfully
Mapping identifiers...                                                                                                        
Performing gene set analysis using ssGSEA                                                                                     
Analysing dataset 'Seurat' using ssGSEA                                                                                       
Retrieving result...                 

 head(gsva_result@results$Seurat$fold_changes)
  Identifier  Cluster_1 Cluster_10 Cluster_11 Cluster_12  Cluster_13  Cluster_2   Cluster_3  Cluster_4   Cluster_5
1      RPS11 407.214286    613.350 278.096774     204.72 104.8750000 436.271930 294.0825688 426.179245  514.200000
2      ELMO2   9.707143     23.050  12.838710       8.24   9.0000000  26.289474   0.6697248  12.273585   14.288889
3      PNMA1  11.214286      0.900   8.387097       0.00  11.8333333  14.605263   1.9908257  16.339623    2.533333
4       MMP2  86.792857      0.000 454.419355       0.00 186.0833333 196.833333   0.0000000 155.632075 1032.944444
5    TMEM216   2.292857     12.400   6.516129       3.76   0.5416667   6.342105   5.3577982   6.103774    6.188889
6       ZHX3  18.392857     12.375  14.483871       8.44   3.5000000  17.289474   5.3119266  11.990566   18.444444
   Cluster_6   Cluster_7  Cluster_8   Cluster_9
1 374.258824 120.7090909 705.537037 236.9148936
2   1.823529   7.1090909  54.277778  15.9148936
3   2.717647   1.8727273   6.037037   0.8085106
4   0.000000   0.3272727   1.148148   0.0212766
5   9.682353   1.2727273   9.703704   2.0851064
6   4.658824   1.0363636  22.462963  10.6808511

可以看到其實(shí)是以亞群為單位的ssGSEA分析,也啟發(fā)我們?cè)谧鰡渭?xì)胞富集分析的時(shí)候,如果計(jì)算量太大就可以考慮用亞群的平均表達(dá)量來。在此也反映,單細(xì)胞數(shù)據(jù)分析的基本單位是細(xì)胞亞群而不是單個(gè)細(xì)胞。

通路分析就一個(gè)函數(shù)

 pathway_expression <- pathways(gsva_result)
> colnames(pathway_expression) <- gsub("\\.Seurat", "", colnames(pathway_expression))
> pathway_expression[1:3,]
                                         Name Cluster_1 Cluster_10 Cluster_11 Cluster_12 Cluster_13  Cluster_2  Cluster_3
R-HSA-1059683         Interleukin-6 signaling 0.1067040 0.09940645  0.1509142 0.10414911 0.11368072 0.11979177 0.11156012
R-HSA-109606  Intrinsic Pathway for Apoptosis 0.1063122 0.10317826  0.1116539 0.10910644 0.12484602 0.10068176 0.10487716
R-HSA-109703              PKB-mediated events 0.1274642 0.05289859  0.1062885 0.09587804 0.07340752 0.08322147 0.08406187
               Cluster_4 Cluster_5  Cluster_6  Cluster_7  Cluster_8  Cluster_9
R-HSA-1059683 0.11429424 0.1151514 0.09620555 0.11560454 0.13346414 0.10350569
R-HSA-109606  0.10839244 0.1021714 0.10272290 0.11024641 0.11423797 0.11243046
R-HSA-109703  0.05573774 0.0458425 0.12398024 0.07708516 0.07818166 0.01426709

找出有差異的通路

 # find the maximum differently expressed pathway
 max_difference <- do.call(rbind, apply(pathway_expression, 1, function(row) {
+   values <- as.numeric(row[2:length(row)])
+   return(data.frame(name = row[1], min = min(values), max = max(values)))
+ }))
 max_difference$diff <- max_difference$max - max_difference$min
# sort based on the difference
max_difference <- max_difference[order(max_difference$diff, decreasing = T), ] head(max_difference)

                                                         name        min       max      diff
R-HSA-350864           Regulation of thyroid hormone activity -0.4873008 0.3750639 0.8623647
R-HSA-8964540                              Alanine metabolism -0.5061766 0.2556097 0.7617863
R-HSA-190374  FGFR1c and Klotho ligand binding and activation -0.3439743 0.4157764 0.7597507
R-HSA-140180                                    COX reactions -0.3460100 0.3727507 0.7187606
R-HSA-5263617       Metabolism of ingested MeSeO2H into MeSeH -0.1932855 0.4938234 0.6871089
R-HSA-9024909           BDNF activates NTRK2 (TRKB) signaling -0.3786589 0.3011768 0.6798357

感興趣通路的可視化,可以看出哪些亞群在該通路比較高的或比較低的。當(dāng)然,我們這里沒有做細(xì)胞類型注釋,所以你也看不出具體的生物學(xué)意義啦。這里繪制的一條通路是 Alanine metabolism(丙氨酸代謝),肌肉組織中的氨基酸經(jīng)轉(zhuǎn)氨基作用將氨基轉(zhuǎn)給丙酮酸生成丙氨酸,經(jīng)血液運(yùn)輸?shù)礁闻K,在肝臟中丙氨酸通過聯(lián)合脫氨基作用生成丙酮酸和游離氨,可經(jīng)糖異生作用生成葡萄糖,葡萄糖由血液運(yùn)輸?shù)郊∪饨M織中沿糖分解途徑再產(chǎn)生丙酮酸,后者再接受氨基生成丙氨酸,通過這循環(huán)使肌肉中的氨以無毒氨基酸形式運(yùn)輸?shù)礁危瑫r(shí)肝也為肌肉提供了生成丙酮酸的葡萄糖。

library(tidyverse)
p1 <- plot_gsva_pathway(gsva_result, pathway_id = rownames(max_difference)[1]) 
p1$data %>% mutate(absmy = ifelse(expr>=0, "Z","Fy")) -> df
?element_blank
df %>% ggplot(aes(cluster_id,   expr ,fill=absmy    ))+ 
  geom_bar(stat='identity') + theme_bw()+ 
  theme(axis.text.x = element_text(angle =45,hjust = .9,size = 10,vjust = 0.9))+ 
  ggtitle("Alanine metabolism") + theme(legend.position="none")
# Additional parameters are directly passed to gplots heatmap.2 function
plot_gsva_heatmap(gsva_result, max_pathways = 15, margins = c(6,20))

選擇特定的通路

# limit to selected B cell related pathways
relevant_pathways <- c("R-HSA-983170", "R-HSA-388841", "R-HSA-2132295", "R-HSA-983705", "R-HSA-5690714")
plot_gsva_heatmap(gsva_result, 
                  pathway_ids = relevant_pathways, # limit to these pathways
                  margins = c(6,30), # adapt the figure margins in heatmap.2
                  dendrogram = "col", # only plot column dendrogram
                  scale = "row", # scale for each pathway
                  key = FALSE, # don't display the color key
                  lwid=c(0.1,4)) # remove the white space on the left

其實(shí)做完GSVA之后,我們相當(dāng)于得到了一個(gè)關(guān)于通路的矩陣,也可以做降維聚類,如PCA,可以看到哪些亞群在pca空間上更為相近。

plot_gsva_pca(gsva_result)

https://www.mcponline.org/article/S1535-9476(20)60015-9/fulltext

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容