10X單細(xì)胞(10X空間轉(zhuǎn)錄組)之單細(xì)胞精度分析細(xì)胞通訊

開工了,到了我們上班的時(shí)間了,剛開始,我們來分享一些新出現(xiàn)的方法,看看方法上有哪些更新,我們雖然做不了潮流的引領(lǐng)者,但是要跟隨潮流,保證自己不被時(shí)代拋棄,今天分享的文獻(xiàn)在Comparative analysis of cell-cell communication at single-cell resolution,里面對(duì)當(dāng)前細(xì)胞通訊的方法做了一定的總結(jié),我們借鑒一下,并看看作者的方法有哪些可取之處。

目前的單細(xì)胞通訊的分析特點(diǎn)

  • at the level of the cell type or cluster, discarding single-cell level information.(這個(gè)大家應(yīng)該都非常清楚了,通訊強(qiáng)度就是基因表達(dá)的平均值相乘)

然而真實(shí)的細(xì)胞通訊

  • 細(xì)胞通訊 does not operate at the level of the group; rather, such interactions take place between individual cells(真正的細(xì)胞通訊是細(xì)胞之間,而不是細(xì)胞類型群體之間,這也是空間轉(zhuǎn)錄組為什么分析空間臨近通訊的原因)。

如果我們要真正的研究細(xì)胞之間的通訊,那么

  • analyze interactions at the level of the single cell
  • leverage the full information content contained within scRNA-seq data by looking at up- and down-stream cellular activity
  • enable comparative analysis between conditions;
  • robust to multiple experimental designs

不知道大家對(duì)單細(xì)胞做細(xì)胞通訊的方式有什么更好的建議呢???,我們來看看作者開發(fā)的方法,Scriabin,an adaptable and computationally-efficient method for CCC analysis.Scriabin 通過結(jié)合收集的配體-受體相互作用數(shù)據(jù)庫、下游細(xì)胞內(nèi)信號(hào)傳導(dǎo)模型、基于anchors的數(shù)據(jù)集集成和基因網(wǎng)絡(luò)分析,以單細(xì)胞分辨率剖析復(fù)雜的交流途徑,以在單細(xì)胞分辨率下恢復(fù)具有生物學(xué)意義的 CCC edges。

Scriabin的分析框架

  • the cell-cell interaction matrix workflow, optimal for smaller datasets, analyzes communication methods used for each cell-cell pair in the dataset(細(xì)胞對(duì)分析通訊),CCC 的基本單位是表達(dá)配體的發(fā)送細(xì)胞 Ni,這些配體被接收細(xì)胞 Nj 表達(dá)的同源受體接收。 Scriabin 通過計(jì)算數(shù)據(jù)集中每對(duì)細(xì)胞對(duì)每個(gè)配體-受體對(duì)表達(dá)的幾何平均值,在細(xì)胞-細(xì)胞相互作用矩陣 M 中編碼此信息.由于配體-受體相互作用是定向的,Scriabin將每個(gè)細(xì)胞分別視為“發(fā)送者”(配體表達(dá))和“接收者”(受體表達(dá)),從而保持 CCC 網(wǎng)絡(luò)的定向性質(zhì)。 M 可以類似于基因表達(dá)矩陣進(jìn)行處理,并用于降維、聚類和差異分析
  • the summarized interaction graph workflow, designed for large comparative analyses, identifies cell-cell pairs with different total communicative potential between samples(識(shí)別有意義的配受體對(duì))
  • the interaction program discovery workflow, suitable for any dataset size, finds modules of co-expressed ligand-receptor pairs(第三點(diǎn)尤為重要)。
圖片.png

圖片.png

具體的方法

Generation of cell-cell interaction matrix

將一對(duì)細(xì)胞之間的細(xì)胞-細(xì)胞相互作用向量定義為每個(gè)同源配體-受體對(duì)的表達(dá)值的幾何平均值。 形式上,sender cell Ni 和receiver cell Nj 之間的交互向量 V 由下式給出:

圖片.png

where ????,???? represent a cognate ligand-receptor pair.選擇將配體和受體表達(dá)值相乘,以便配體或受體表達(dá)的零值將導(dǎo)致相互作用向量的相應(yīng)索引為零。 此外,選擇取配體-受體表達(dá)值乘積的平方根,這樣高表達(dá)的配體-受體對(duì)不會(huì)不成比例地驅(qū)動(dòng)下游分析。This definition is equivalent to the geometric mean。細(xì)胞-細(xì)胞相互作用矩陣 M 是通過連接細(xì)胞-細(xì)胞相互作用向量來構(gòu)建的。 線性回歸用于校正 M 因測序深度引起的變化,其中 Ni、Nj 的總測序深度定義為 Ni 和 Ni 中唯一分子標(biāo)識(shí)符 (UMI) 的總和。 M 用作低維嵌入的輸入以進(jìn)行可視化,并用作最近鄰圖的輸入以進(jìn)行基于圖的聚類。

矩陣的下游分析(就是Seurat的基本操作)
圖片.png

通訊programs的分析,Interaction program discovery workflow

Iterative approximation of a ligand-receptor pair topological overlap matrix (TOM)
圖片.png
Identification and significance testing of interaction programs(層次聚類,WGCNA)
圖片.png

看一看示例代碼

安裝
devtools::install_github("BlishLab/scriabin", ref = "main")
single-dataset(單一數(shù)據(jù)樣本)
library(Seurat)
library(SeuratData)
library(scriabin)
library(tidyverse)
#####示例數(shù)據(jù)
install.packages("https://seurat.nygenome.org/src/contrib/panc8.SeuratData_3.0.2.tar.gz", repos = NULL, type = "source") 
library(panc8.SeuratData)
panc8 <- LoadData("panc8")
####Analyze the indrop dataset for the interaction programs tutorial
panc_fl <- subset(panc8, cells = colnames(panc8)[panc8$tech=="fluidigmc1"])
panc_fl <- SCTransform(panc_fl, verbose = F) %>%
  RunPCA(verbose = F) %>%
  RunUMAP(dims = 1:30, verbose = F)
DimPlot(panc_fl, group.by = "celltype", label = T, repel = T) + NoLegend()
圖片.png
First we generate a cell-cell interaction matrix object on the alpha and beta cells in the dataset. Then we scale, find variable features, run PCA and UMAP, find neighbor graphs and clusters, treating this matrix just like we would a gene expression matrix
panc_fl_ccim <- GenerateCCIM(subset(panc_fl, cells = colnames(panc_fl)[panc_fl$celltype %in% c("alpha","beta")]))
panc_fl_ccim <- ScaleData(panc_fl_ccim) %>% 
  FindVariableFeatures() %>%
  RunPCA() %>% 
  RunUMAP(dims = 1:10) %>%
  FindNeighbors(dims = 1:10) %>%
  FindClusters(resolution = 0.2)
p1 <- DimPlot(panc_fl_ccim, group.by = "sender_celltype")
p2 <- DimPlot(panc_fl_ccim, group.by = "receiver_celltype")
p1+p2
DimPlot(panc_fl_ccim, label = T, repel = T) + NoLegend()

cluster4_5.edges <- FindMarkers(panc_fl_ccim, ident.1 = "4", ident.2 = "5")
cluster4_5.edges %>% top_n(20,wt = abs(avg_log2FC))

CCIMFeaturePlot(panc_fl_ccim, seu = panc_fl, 
            features = c("EFNA1","NPNT"), type_plot = "sender")
圖片.png

multi-dataset

library(Seurat)
library(SeuratData)
library(scriabin)

scriabin::load_nichenet_database()

library(SeuratData)
InstallData("ifnb")
ifnb <- UpdateSeuratObject(LoadData("ifnb"))
ifnb <- PercentageFeatureSet(ifnb, pattern = "MT-", col.name = "percent.mt") %>%
  SCTransform(ifnb, vars.to.regress = "percent.mt", verbose = F) %>%
  RunPCA(ifnb, verbose = F) %>% 
  FindNeighbors(ifnb, dims = 1:30, verbose = F) %>%
  FindClusters(ifnb, verbose = F) %>%
  RunUMAP(ifnb, dims = 1:30, verbose = F)
DimPlot(ifnb, label = T, repel = T) + NoLegend()
DimPlot(ifnb, label = T, repel = T, group.by = "seurat_annotations") + NoLegend()
DimPlot(ifnb, group.by = "stim")
圖片.png

This dataset is composed of two sub-datasets: PBMCs stimulated with IFNB (STIM) or control (CTRL).

If we assume there is no batch effect between these two datasets, we can observe from these dimensionality reduction projections the intense transcriptional perturbation caused by this stimulation. This is one situation in which it is easy to see why clustering/subclustering for high resolution CCC analysis can be a problematic task: the degree of perturbation is so high that the monocytes from these two datasets will never cluster together. We need a way to align these subspaces together. Given a monocyte in CTRL, what is its molecular counterpart in STIM?

We take recent progress in dataset integration methodology to develop a high-resolution alignment process we call "binning", where we assign each cell a bin identity so that we maximize the similarity of the cells within a bin, and maximize the representation of all the samples we wish to compare in each bin. After bins are identified, they are tested for the significance of their connectivity.

Dataset binning

There are two workflow options for binning in terms of bin identification:

  1. Bin by coarse cell identities (recommended). If the user specifies coarse cell identities (eg. in a PBMC dataset, T/NK cells, B cells, myeloid cells), this can prevent anchors from being identified between cells that the user believes to be in non-overlapping cell categories. This can result in cleaner bins, but there must be enough cells in each coarse identity from each dataset to proceed properly. A downside is that there may be some small groups of cells that aren't related to major cell populations (eg. a small population of platelets in a PBMC dataset).
  2. Bin all cells. Without specifying coarse cell identities for binning, potentially spurious associations may form between cells (especially in lower quality samples).

Significance testing proceeds by a permutation test: comparing connectivity of the identified bin against randomly generated bins. There are two workflow options for binning in terms of generating random bins:

  1. Pick cells from the same cell type in each random bin (recommended). If the user supplies granular cell type identities, a random bin will be constructed with cells from the same cell type identity. The more granular the cell type calls supplied, the more rigorous the significance testing.
  2. Without supplying granular cell type identities, the dataset will be clustered and the cluster identities used for significance testing. Generating random bins across the entire dataset would result in very few non-significant bins identified.

Here we will define a coarse cell identity to use for bin identification, and then use the Seurat-provided annotations for significance testing.

Additionally, Scriabin allows different reductions to be used for neighbor graph construction. For example, the results from a reference-based sPCA can be used for binning if the cell type relationships in the reference are considered more informative.

ifnb$coarse_ident <- mapvalues(ifnb$seurat_annotations, from = unique(ifnb$seurat_annotations),
                               to = c("myeloid","myeloid","T/misc","T/misc",
                                      "T/misc","T/misc","T/misc","B",
                                      "B","myeloid","myeloid","T/misc","T/misc"))
ifnb <- BinDatasets(ifnb, split.by = "stim", dims = 1:30, 
                    coarse_cell_types = "coarse_ident", sigtest_cell_types = "seurat_annotations")
圖片.png
ifnb_split <- SplitObject(ifnb, split.by = "stim")
sum_ig <- AssembleInteractionGraphs(ifnb, by = "prior", split.by = "stim")
ifnb_split <- pblapply(ifnb_split, function(x) {BuildPriorInteraction(x, correct.depth = T)})
ogig <- lapply(ifnb_split, function(x) {
  as.matrix(x@graphs$prior_interaction)
})
圖片.png

interaction-programs

library(Seurat)
library(SeuratData)
library(scriabin)
library(tidyverse)
library(ComplexHeatmap)
library(cowplot)
install.packages("https://seurat.nygenome.org/src/contrib/panc8.SeuratData_3.0.2.tar.gz", repos = NULL, type = "source") 
library(panc8.SeuratData)
panc8 <- LoadData("panc8")

panc_id <- subset(panc8, cells = colnames(panc8)[panc8$tech=="indrop"])
panc_id <- SCTransform(panc_id, verbose = F) %>%
  RunPCA(verbose = F) %>%
  RunUMAP(dims = 1:30, verbose = F)
DimPlot(panc_id, group.by = "celltype", label = T, repel = T) + NoLegend()
圖片.png
panc_ip <- InteractionPrograms(panc_id, iterate.threshold = 300)
#test for interaction program significance
panc_ip_sig <- InteractionProgramSignificance(panc_ip, n.replicate = 500)
#score cells by expression of interaction program
panc_id <- ScoreInteractionPrograms(panc_id, panc_ip_sig)

panc_id_ip_lig <- as.matrix(panc_id@meta.data %>% 
  select("celltype",
         starts_with("ligands")) %>%
  group_by(celltype) %>%
  summarise_if(is.numeric, mean) %>% column_to_rownames("celltype"))
Heatmap(panc_id_ip_lig, show_column_names = F)
panc_id_ip_rec <- as.matrix(panc_id@meta.data %>% 
  select("celltype",
         starts_with("receptors")) %>%
  group_by(celltype) %>%
  summarise_if(is.numeric, mean) %>% column_to_rownames("celltype"))
Heatmap(panc_id_ip_rec, show_column_names = F)
圖片.png
act_stellate_ip <- panc_id_ip_lig["activated_stellate",]
poi <- gsub("ligands_","",names(which(act_stellate_ip==max(act_stellate_ip))))
#Seurat's FeaturePlot has a nice option to blend expression of two features together on the same plot
p <- FeaturePlot(panc_id, 
          features = c(paste0("ligands_",poi),
                            paste0("receptors_",poi)), 
          blend = T, combine = F, 
          cols = c("grey90","purple","yellowgreen"), order = T)
p[[3]] + NoLegend()
DimPlot(panc_id, group.by = "celltype", label = T, repel = T) + NoLegend()
圖片.png
moi <- reshape2::melt(mod_df %>% dplyr::filter(name==poi) %>%
  select("lr_pair",contains("connectivity"))) %>% arrange(-value)
moi$lr_pair <- factor(moi$lr_pair, levels = unique(moi$lr_pair))
ggplot(moi, aes(x = lr_pair, y = value, color = variable)) + 
  geom_point() + theme_cowplot() + ggpubr::rotate_x_text() + labs(x = NULL, y = "Intramodular\nconnectivity")
圖片.png
最后問一句,大家覺得這個(gè)方法改進(jìn)之處合理么?

生活很好,有你更好

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。
禁止轉(zhuǎn)載,如需轉(zhuǎn)載請(qǐng)通過簡信或評(píng)論聯(lián)系作者。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容