10X單細(xì)胞空間分析回顧之SPOTlight

一年了,我們都需要總結(jié),這一篇我們回顧一下SPOTlight,非常好的方法,建議大家也總結(jié)一下,看看2021年,自己都得到了什么。

SPOTlight 的目標(biāo)是提供一種工具,能夠?qū)Π?xì)胞混合物的每個(gè)捕獲位置中存在的細(xì)胞類型和細(xì)胞類型比例進(jìn)行解卷積,最初是為 10X 的 Visium - 空間轉(zhuǎn)錄組學(xué)技術(shù)開(kāi)發(fā)的, it can be used for all technologies returning mixtures of cells。 SPOTlight 基于通過(guò) NMFreg 模型為每種細(xì)胞類型查找topics profiles singatures,并找到最適合我們想要解卷積的spot的組合。

圖片.png

Libraries

library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)

Load data

Load single-cell reference dataset.

path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))

Load Spatial data

if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
  # If dataset not downloaded proceed to download it
  SeuratData::InstallData("stxBrain")
}

# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")

Pre-processing,設(shè)置種子的功能大家還知道么??

set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
  Seurat::RunPCA(., verbose = FALSE) %>%
  Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)

Visualize the clustering

Seurat::DimPlot(cortex_sc,
                group.by = "subclass",
                label = TRUE) + Seurat::NoLegend()
圖片.png

Descriptive

cortex_sc@meta.data %>%
  dplyr::count(subclass) %>%
  gt::gt(.[-1, ]) %>%
  gt::tab_header(
    title = "Cell types present in the reference dataset",
  ) %>%
  gt::cols_label(
    subclass = gt::html("Cell Type")
  )
圖片.png

Compute marker genes

為了確定最重要的標(biāo)記基因,我們可以使用函數(shù) Seurat::FindAllMarkers,它將返回每個(gè)cluster的標(biāo)記。

Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc, 
                                              assay = "SCT",
                                              slot = "data",
                                              verbose = TRUE, 
                                              only.pos = TRUE)

saveRDS(object = cluster_markers_all,
        file = here::here("inst/markers_sc.RDS"))

SPOTlight Decomposition

set.seed(123)

spotlight_ls <- spotlight_deconvolution(
  se_sc = cortex_sc,
  counts_spatial = anterior@assays$Spatial@counts,
  clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
  cluster_markers = cluster_markers_all, # Dataframe with the marker genes
  cl_n = 100, # number of cells per cell type to use
  hvg = 3000, # Number of HVG to use
  ntop = NULL, # How many of the marker genes to use (by default all)
  transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
  method = "nsNMF", # Factorization method
  min_cont = 0 # Remove those cells contributing to a spot below a certain threshold 
  )

saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))

Read RDS object

spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))

nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]

Assess deconvolution

Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.
The first thing we can do is look at how specific the topic profiles are for each cell type.

h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod[[2]])

topic_profile_plts[[2]] + ggplot2::theme(
  axis.text.x = ggplot2::element_text(angle = 90), 
  axis.text = ggplot2::element_text(size = 12))
圖片.png

接下來(lái)我們可以看看每個(gè)細(xì)胞類型中每個(gè)細(xì)胞的各個(gè)topic profiles的行為。
在這里,我們期望來(lái)自同一細(xì)胞類型的所有細(xì)胞顯示出相似的topic profiles分布,否則該cluster中可能會(huì)有更多的子結(jié)構(gòu),我們可能只捕獲其中一個(gè)。

topic_profile_plts[[1]] + theme(axis.text.x = element_text(angle = 90), 
                                axis.text = element_text(size = 12))
圖片.png

Lastly we can take a look at which genes are the most important for each topic and therefore get an insight into which genes are driving them.

basis_spotlight <- data.frame(NMF::basis(nmf_mod[[1]]))

colnames(basis_spotlight) <- unique(stringr::str_wrap(nmf_mod[[2]], width = 30))

basis_spotlight %>%
  dplyr::arrange(desc(Astro)) %>%
  round(., 5) %>% 
  DT::datatable(., filter = "top")
圖片.png

Visualization

Join decomposition with metadata

# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)

decon_df <- decon_mtrx %>%
  data.frame() %>%
  tibble::rownames_to_column("barcodes")

anterior@meta.data <- anterior@meta.data %>%
  tibble::rownames_to_column("barcodes") %>%
  dplyr::left_join(decon_df, by = "barcodes") %>%
  tibble::column_to_rownames("barcodes")

Specific cell-types

we can use the standard Seurat::SpatialFeaturePlot to view predicted celltype proportions one at a time.

Seurat::SpatialFeaturePlot(
  object = anterior,
  features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
  alpha = c(0.1, 1))
圖片.png

Spatial scatterpies

cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              pie_scale = 0.4)
圖片.png

Plot spot composition of spots containing cell-types of interest

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              cell_types_interest = "L6b",
                              pie_scale = 0.8)
圖片.png

Spatial interaction graph

現(xiàn)在我們知道在每個(gè)點(diǎn)內(nèi)發(fā)現(xiàn)了哪些細(xì)胞類型,我們可以制作一個(gè)表示空間相互作用的圖,其中細(xì)胞類型之間的邊緣越強(qiáng),我們?cè)谕稽c(diǎn)內(nèi)發(fā)現(xiàn)它們的頻率越高。 為此,我們只需要運(yùn)行 get_spatial_interaction_graph 函數(shù),該函數(shù)將打印繪圖并返回繪圖所需的元素。

graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])

If you want to tune how the graph looks you can do the following or you can check out more options here:

deg <- degree(graph_ntw, mode="all")

# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance

# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]

# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))

# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)

# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
                           color = getPalette(length(grad_edge)),
                           stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
  dplyr::left_join(graph_col_df, by = "value") %>%
  dplyr::pull(color)

# Open a pdf file
plot(graph_ntw,
     # Size of the edge
     edge.width = edge_importance,
     edge.color = color_edge,
     # Size of the buble
     vertex.size = deg,
     vertex.color = "#cde394",
     vertex.frame.color = "white",
     vertex.label.color = "black",
     vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
     layout = layout.circle)
圖片.png

Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.

# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]

# Compute correlation
decon_cor <- cor(decon_mtrx_sub)

# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)

# Visualize
ggcorrplot::ggcorrplot(
  corr = decon_cor,
  p.mat = p.mat[[1]],
  hc.order = TRUE,
  type = "full",
  insig = "blank",
  lab = TRUE,
  outline.col = "lightgrey",
  method = "square",
  # colors = c("#4477AA", "white", "#BB4444"))
  colors = c("#6D9EC1", "white", "#E46726"),
  title = "Predicted cell-cell proportion correlation",
  legend.title = "Correlation\n(Pearson)") +
  ggplot2::theme(
    plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
    legend.text = ggplot2::element_text(size = 12),
    legend.title = ggplot2::element_text(size = 15),
    axis.text.x = ggplot2::element_text(angle = 90),
    axis.text = ggplot2::element_text(size = 18, vjust = 0.5))
圖片.png

Step-by-Step insight

Here we are going to show step by step what is going on and all the different steps involved in the process

圖片.png

Downsample data

如果數(shù)據(jù)集非常大,我們希望在細(xì)胞數(shù)量和基因數(shù)量方面對(duì)其進(jìn)行下采樣,以訓(xùn)練模型。 為了進(jìn)行下采樣,我們希望保留每個(gè)簇的代表性細(xì)胞數(shù)量和最重要的基因。 我們表明這種下采樣不會(huì)影響模型的性能并大大加快了模型訓(xùn)練的速度。

# Downsample scRNAseq to select gene set and number of cells to train the model
se_sc_down <- downsample_se_obj(se_obj = cortex_sc,
                                clust_vr = "subclass",
                                cluster_markers = cluster_markers_all,
                                cl_n = 100,
                                hvg = 3000)

Train NMF model

Once we have the data ready to pass to the model we can train it as shown below.

start_time <- Sys.time()
nmf_mod_ls <- train_nmf(cluster_markers = cluster_markers_all, 
                        se_sc = se_sc_down, 
                        mtrx_spatial = anterior@assays$Spatial@counts,
                        clust_vr = "subclass",
                        ntop = NULL,
                        hvg = 3000,
                        transf = "uv",
                        method = "nsNMF")

nmf_mod <- nmf_mod_ls[[1]]
Extract matrices form the model:
# get basis matrix W
w <- basis(nmf_mod)
dim(w)

# get coefficient matrix H
h <- coef(nmf_mod)
dim(h)
Look at cell-type specific topic profile
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod_ls[[2]]
  )

topic_profile_plts[[2]] + theme(axis.text.x = element_text(angle = 90))
Spot Deconvolution
# Extract count matrix
spot_counts <- anterior@assays$Spatial@counts

# Subset to genes used to train the model
spot_counts <- spot_counts[rownames(spot_counts) %in% rownames(w), ]
Run spots through the basis to get the pertinent coefficients. To do this for every spot we are going to set up a system of linear equations where we need to find the coefficient, we will use non-negative least squares to determine the best coefficient fit.
ct_topic_profiles <- topic_profile_per_cluster_nmf(h = h,
                              train_cell_clust = nmf_mod_ls[[2]])

decon_mtrx <- mixture_deconvolution_nmf(nmf_mod = nmf_mod,
                          mixture_transcriptome = spot_counts,
                          transf = "uv",
                          reference_profiles = ct_topic_profiles, 
                          min_cont = 0.09)

生活很好,有你更好

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。
禁止轉(zhuǎn)載,如需轉(zhuǎn)載請(qǐng)通過(guò)簡(jiǎn)信或評(píng)論聯(lián)系作者。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容