Analyzing untargeted metabolomics data with the MetaboDiff package

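I recently got hold of an untargeted metabolomics dataset, so I worked through the MetaboDiff package to get familiar with the logic and workflow of metabolomics analysis. The workflow below follows the official MetaboDiff documentation.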

1. Installing MetaboDiff
library("devtools")
install_github("andreasmock/MetaboDiff")
library(MetaboDiff)

2. Data processing
2.1 Importing the data

MetaboDiff needs three inputs:

  1. assay - a data matrix holding the relative abundances of the metabolites;
  2. rowData - a data frame holding the metabolite annotations;
  3. colData - a data frame holding the sample metadata.
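
To make the expected shapes concrete, here is a minimal sketch in base R that builds toy versions of the three inputs and checks that their names line up. All object names, column names and values are made up for illustration (including the placeholder KEGG id); the only assumption carried over from the example data is that metabolites are rows of the assay and samples are its columns.

```r
# Toy stand-ins for the three inputs; every value here is invented.
set.seed(1)
n_met <- 4
n_pat <- 3

# assay: metabolites in rows, samples in columns
assay_toy <- matrix(
  runif(n_met * n_pat, min = 1e4, max = 1e5),
  nrow = n_met,
  dimnames = list(paste0("met", seq_len(n_met)),
                  paste0("pat", seq_len(n_pat)))
)
assay_toy["met2", "pat1"] <- NA  # missing measurements appear as NA

# rowData: one row of annotation per metabolite
rowData_toy <- data.frame(
  BIOCHEMICAL = paste("compound", LETTERS[seq_len(n_met)]),
  KEGG_ID = c(NA, "C00000", NA, NA),  # placeholder id, not a real lookup
  row.names = rownames(assay_toy),
  stringsAsFactors = FALSE
)

# colData: one row of metadata per sample
colData_toy <- data.frame(
  id = paste0("cp", seq_len(n_pat)),
  tumor_normal = c("N", "T", "T"),
  row.names = colnames(assay_toy),
  stringsAsFactors = FALSE
)

# The three objects must agree on metabolite and sample names.
stopifnot(
  identical(rownames(assay_toy), rownames(rowData_toy)),
  identical(colnames(assay_toy), rownames(colData_toy))
)
```

Objects shaped like this would be passed to create_mae() in the order assay, rowData, colData.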

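The example data shipped with MetaboDiff comes from the paper "AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer". The metabolomics data were measured on prostate tissue from 61 prostate cancer patients and 25 normal controls.
Let's first take a look at the three objects.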

> assay[1:5,1:5]
         pat1      pat2      pat3     pat4      pat5
met1 33964.73 117318.43 118856.90  78670.7 102565.94
met2 18505.56 167585.32  59621.97  66220.4  74892.27
met3       NA  42373.93  27141.21       NA  38390.78
met4 61638.77  74595.78        NA       NA        NA
met5       NA 148363.61  43861.79 105835.2  25589.08

> head(colData)
       id tumor_normal random_gender   group
pat1  cp2            N        female Control
pat2  cp7            N        female Control
pat3 cp19            N          male Control
pat4 cp26            N          male Control
pat5 cp29            N        female Control
pat6 cp32            N          male Control

> head(rowData)
                                    BIOCHEMICAL    SUPER_PATHWAY      SUB_PATHWAY METABOLON_ID
met1  1-arachidonoylglycerophosphoethanolamine*            Lipid        Lysolipid        35186
met2      1-arachidonoylglycerophosphoinositol*            Lipid        Lysolipid        34214
met3                      1-arachidonylglycerol            Lipid Monoacylglycerol        34397
met4      1-eicosadienoylglycerophosphocholine*            Lipid        Lysolipid        33871
met5 1-heptadecanoylglycerophosphoethanolamine* No Super Pathway       No Pathway        37419
met6       1-linoleoylglycerol (1-monolinolein)            Lipid Monoacylglycerol        27447
      PLATFORM KEGG_ID   HMDB_ID
met1 LC/MS neg    <NA> HMDB11517
met2 LC/MS neg    <NA>      <NA>
met3 LC/MS neg  C13857 HMDB11572
met4 LC/MS pos    <NA>      <NA>
met5 LC/MS neg    <NA>      <NA>
met6 LC/MS neg    <NA>      <NA>

# Merge the three objects into a single one for downstream analysis.
> (met <- create_mae(assay,rowData,colData))
A MultiAssayExperiment object of 1 listed
 experiment with a user-defined name and respective class. 
 Containing an ExperimentList class object of length 1: 
 [1] raw: SummarizedExperiment with 307 rows and 86 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

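2.2 Metabolite annotation

If HMDB, KEGG or ChEBI ids are part of rowData, metabolites can be annotated by querying the Small Molecule Pathway Database (SMPDB). The column_* arguments give the column positions of these ids in rowData (here KEGG_ID is column 6 and HMDB_ID is column 7; there is no ChEBI column, hence NA).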

> met <- get_SMPDBanno(met,
+                           column_kegg_id=6,
+                           column_hmdb_id=7,
+                           column_chebi_id=NA)

2.3 Handling missing values
> na_heatmap(met,
+            group_factor="tumor_normal",
+            label_colors=c("darkseagreen","dodgerblue"))

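# Drop metabolites with too many missing values (cutoff=0.4, i.e. 40%) and impute the remaining gaps by k-nearest neighbours.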
> (met = knn_impute(met,cutoff=0.4))
A MultiAssayExperiment object of 2 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 2: 
 [1] raw: SummarizedExperiment with 307 rows and 86 columns 
 [2] imputed: SummarizedExperiment with 238 rows and 86 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
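
The filtering half of this step can be sketched in base R as follows. This only illustrates the cutoff idea on a made-up matrix; it does not perform the k-nearest-neighbour imputation itself, and whether the package treats the boundary as inclusive is an assumption (the toy data avoids the boundary case).

```r
# Toy matrix: metabolites in rows, samples in columns (values are made up).
x <- matrix(
  c( 1, NA,  3, 4, 5,
    NA, NA, NA, 4, 5,
     1,  2,  3, 4, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("met", 1:3), paste0("pat", 1:5))
)

cutoff <- 0.4                       # maximum tolerated fraction of NAs
na_fraction <- rowMeans(is.na(x))   # fraction of missing values per metabolite

# Keep only metabolites at or below the cutoff (inclusive boundary assumed).
x_kept <- x[na_fraction <= cutoff, , drop = FALSE]

na_fraction
rownames(x_kept)
```

Here met2 is missing in three of five samples and is dropped, which mirrors how the real data shrink from 307 to 238 metabolites in the imputed experiment.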

2.4 Outlier heatmap

Before normalizing the data, we need to remove outliers.

> outlier_heatmap(met,
+                 group_factor="tumor_normal",
+                 label_colors=c("darkseagreen","dodgerblue"),
+                 k=2)

With k=2, the heatmap above splits the samples into cluster 1 and cluster 2. Compared with cluster 2, cluster 1 is the outlier group, so we remove cluster 1.

> (met <- remove_cluster(met,cluster=1))
harmonizing input:
  removing 5 sampleMap rows with 'colname' not in colnames of experiments
harmonizing input:
  removing 5 sampleMap rows with 'colname' not in colnames of experiments
  removing 5 colData rownames not in sampleMap 'primary'
A MultiAssayExperiment object of 2 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 2: 
 [1] raw: SummarizedExperiment with 307 rows and 81 columns 
 [2] imputed: SummarizedExperiment with 238 rows and 81 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

2.5 Data normalization
> (met <- normalize_met(met))
vsn2: 307 x 81 matrix (1 stratum). 
Please use 'meanSdPlot' to verify the fit.
vsn2: 238 x 81 matrix (1 stratum). 
Please use 'meanSdPlot' to verify the fit.
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] raw: SummarizedExperiment with 307 rows and 81 columns 
 [2] imputed: SummarizedExperiment with 238 rows and 81 columns 
 [3] norm: SummarizedExperiment with 307 rows and 81 columns 
 [4] norm_imputed: SummarizedExperiment with 238 rows and 81 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

2.6 Quality control of the normalization
> quality_plot(met,
+              group_factor="tumor_normal",
+              label_colors=c("darkseagreen","dodgerblue"))
harmonizing input:
  removing 243 sampleMap rows not in names(experiments)
harmonizing input:
  removing 243 sampleMap rows not in names(experiments)
harmonizing input:
  removing 243 sampleMap rows not in names(experiments)
harmonizing input:
  removing 243 sampleMap rows not in names(experiments)
Warning messages:
1: Removed 5356 rows containing non-finite values (stat_boxplot). 
2: Removed 5356 rows containing non-finite values (stat_boxplot). 

3. Data analysis
3.1 Unsupervised analysis

MetaboDiff provides a linear dimensionality reduction method, PCA, and a non-linear one, tSNE.

> source("http://peterhaschke.com/Code/multiplot.R")
> multiplot(
+   pca_plot(met,
+            group_factor="tumor_normal",
+            label_colors=c("darkseagreen","dodgerblue")),
+   tsne_plot(met,
+             group_factor="tumor_normal",
+             label_colors=c("darkseagreen","dodgerblue")),
+   cols=2)
sigma summary: Min. : 0.486945518988849 |1st Qu. : 0.714292832194587 |Median : 0.752934663223126 |Mean : 0.75914557339073 |3rd Qu. : 0.808081774279559 |Max. : 0.939549187337462 |
Epoch: Iteration #100 error is: 18.6145995899728
Epoch: Iteration #200 error is: 1.54407709770312
Epoch: Iteration #300 error is: 1.22290267643501
Epoch: Iteration #400 error is: 1.11106327484334
Epoch: Iteration #500 error is: 1.03658104678225
Epoch: Iteration #600 error is: 0.976566767973725
Epoch: Iteration #700 error is: 0.951849496540308
Epoch: Iteration #800 error is: 0.93612964053674
Epoch: Iteration #900 error is: 0.914421902208305
Epoch: Iteration #1000 error is: 0.88283039690459
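
For readers who want to see what the PCA half does numerically, here is a minimal base R sketch using prcomp() on simulated data. It is not the code behind pca_plot(); the matrix, the group labels and the size of the shift are all invented, and the only convention borrowed from the example data is metabolites in rows and samples in columns.

```r
set.seed(42)

# Simulated data: 10 metabolites (rows) x 20 samples (columns).
group <- rep(c("N", "T"), each = 10)
x <- matrix(
  rnorm(10 * 20), nrow = 10,
  dimnames = list(paste0("met", 1:10), paste0("pat", 1:20))
)
# Shift three metabolites in the "T" samples so the groups differ.
x[1:3, group == "T"] <- x[1:3, group == "T"] + 2

# prcomp() expects observations (samples) in rows, hence the transpose.
pca <- prcomp(t(x), center = TRUE, scale. = TRUE)

# Share of the total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components, ready for plotting.
scores <- data.frame(pca$x[, 1:2], group = group)
head(scores)
round(var_explained[1:2], 2)
```

Plotting scores$PC1 against scores$PC2 and coloring by group gives the same kind of picture as the PCA panel above.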

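3.2 Hypothesis testing

Differential analysis of individual metabolites mainly relies on the t-test and ANOVA. The results are stored in the object's metadata, one data frame per comparison.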

> met = diff_test(met,
+                 group_factors = c("tumor_normal","random_gender"))
> str(metadata(met), max.level=2)
List of 2
 $ ttest_tumor_normal_T_vs_N         :'data.frame': 238 obs. of  3 variables:
  ..$ pval       : num [1:238] 0.0206 0.7808 0.0832 0.0432 0.5859 ...
  ..$ adj_pval   : num [1:238] 0.102 0.904 0.221 0.158 0.758 ...
  ..$ fold_change: num [1:238] 0.2872 0.0366 -0.3936 -0.5391 -0.1646 ...
 $ ttest_random_gender_male_vs_female:'data.frame': 238 obs. of  3 variables:
  ..$ pval       : num [1:238] 0.2318 0.8626 0.4048 0.0121 0.2111 ...
  ..$ adj_pval   : num [1:238] 0.83 0.959 0.862 0.386 0.83 ...
  ..$ fold_change: num [1:238] -0.1372 -0.0208 0.1742 0.607 0.3438 ...
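
The three columns in each result table (pval, adj_pval, fold_change) can be reproduced conceptually with base R. The sketch below is an assumption-laden illustration on simulated data, not the implementation of diff_test(): it uses R's default Welch t-test, Benjamini-Hochberg adjustment, and a difference of group means as the fold change, and the package may make different choices.

```r
set.seed(7)

# Simulated normalized data: 5 metabolites (rows) x 16 samples (columns).
group <- factor(rep(c("N", "T"), each = 8))
x <- matrix(rnorm(5 * 16), nrow = 5,
            dimnames = list(paste0("met", 1:5), NULL))
x[1, group == "T"] <- x[1, group == "T"] + 3  # one truly shifted metabolite

# One two-sample t-test per metabolite, T versus N.
res <- data.frame(
  pval = apply(x, 1, function(v)
    t.test(v[group == "T"], v[group == "N"])$p.value),
  fold_change = apply(x, 1, function(v)
    mean(v[group == "T"]) - mean(v[group == "N"]))
)

# Correct for testing many metabolites at once (BH is an assumption here).
res$adj_pval <- p.adjust(res$pval, method = "BH")
res
```

A volcano plot then simply draws fold_change on the x axis against -log10 of either pval or adj_pval on the y axis.
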
# Volcano plots for the tumor vs. normal comparison, with raw and adjusted p-values
> volcano_plot(met, 
+              group_factor="tumor_normal",
+              label_colors=c("darkseagreen","dodgerblue"),
+              p_adjust = FALSE)
> volcano_plot(met, 
+              group_factor="tumor_normal",
+              label_colors=c("darkseagreen","dodgerblue"),
+              p_adjust = TRUE)


# Volcano plots for the female vs. male comparison, with raw and adjusted p-values
> par(mfrow=c(1,2))
> volcano_plot(met, 
+              group_factor="random_gender",
+              label_colors=c("brown","orange"),
+              p_adjust = FALSE)
> volcano_plot(met, 
+              group_factor="random_gender",
+              label_colors=c("brown","orange"),
+              p_adjust = TRUE)

3.3 Metabolic correlation network analysis

Correlation analysis has been used successfully in comparative transcriptomics to reveal changes in biologically meaningful modules. The same idea can be applied to metabolomics data.
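
The underlying idea, turning correlations into a dissimilarity matrix and cutting a clustering tree into modules, can be sketched with base R alone. This is a deliberately simplified stand-in on simulated data: it uses plain Pearson correlation, average-linkage hclust() and a fixed number of modules, whereas the package workflow goes through WGCNA and chooses the cut height itself.

```r
set.seed(3)
n <- 30  # number of simulated samples

# Two latent signals; four metabolites follow each one with some noise.
base_a <- rnorm(n)
base_b <- rnorm(n)
x <- cbind(
  sapply(1:4, function(i) base_a + rnorm(n, sd = 0.3)),
  sapply(1:4, function(i) base_b + rnorm(n, sd = 0.3))
)
colnames(x) <- paste0("met", 1:8)

# Strongly correlated metabolites get a dissimilarity close to 0.
diss <- 1 - abs(cor(x))

# Hierarchical clustering on the dissimilarities, then cut into 2 modules.
tree <- hclust(as.dist(diss), method = "average")
modules <- cutree(tree, k = 2)
modules
```

Metabolites driven by the same latent signal end up in the same module, which is the behaviour the module detection step relies on.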

> met_example <- met_example %>%
+   diss_matrix %>%    # build the dissimilarity matrix
+   identify_modules(min_module_size=5) %>%  # identify metabolic correlation modules
+   name_modules(pathway_annotation="SUB_PATHWAY") %>%  # name the modules
+   calculate_MS(group_factors=c("tumor_normal","random_gender")) # compute the significance of each module's association with the sample traits

alpha: 1.000000
 ..cutHeight not given, setting it to 0.991  ===>  99% of the (truncated) height range in dendro.
 ..done.
# Visualize the modules: hierarchical clustering
> WGCNA::plotDendroAndColors(metadata(met_example)$tree, 
+                            metadata(met_example)$module_color_vector, 
+                            'Module colors', 
+                            dendroLabels = FALSE, 
+                            hang = 0.03,
+                            addGuide = TRUE, 
+                            guideHang = 0.05, main='')

# Visualize the modules: relationships between modules
> par(mar=c(2,2,2,2))
> ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree),
+                 type = 'fan',
+                 show.tip.label = FALSE, 
+                 main='')
> ape::tiplabels(frame = 'circle',
+                col='black', 
+                text=rep('',length(unique(metadata(met_example)$modules))), 
+                bg = WGCNA::labels2colors(0:21))

# Visualize the modules with their names
> ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree), cex=0.9)

# Visualize module significance for the tumor vs. normal comparison
> MS_plot(met_example,
+         group_factor="tumor_normal",
+         p_value_cutoff=0.05,
+         p_adjust=FALSE)
# Visualize module significance for the comparison between genders
> MS_plot(met_example,
+         group_factor="random_gender",
+         p_value_cutoff=0.05,
+         p_adjust=FALSE)

# Differential test of the individual metabolites within a module of interest across sample groups
> MOI_plot(met_example,
+          group_factor="tumor_normal",
+          MOI = 2,
+          label_colors=c("darkseagreen","dodgerblue"),
+          p_adjust = FALSE) + xlim(c(-1,8))
