第五計(jì) 趁火打劫
本指趁人家失火的時(shí)候去搶東西?,F(xiàn)比喻乘人之危,撈一把。是以“剛”喻己,以“柔”喻敵,言乘敵之危,就勢(shì)而取勝。
scRNA-seq整合簡(jiǎn)介
對(duì)兩個(gè)或多個(gè)單細(xì)胞數(shù)據(jù)集的聯(lián)合分析提出了獨(dú)特的挑戰(zhàn)。特別是,在標(biāo)準(zhǔn)工作流程下,識(shí)別多個(gè)數(shù)據(jù)集中存在的細(xì)胞群體可能會(huì)成問(wèn)題。Seurat v4包括一組用于匹配(或“對(duì)齊”)跨數(shù)據(jù)集的共享細(xì)胞群體的方法。這些方法首先確定處于匹配生物學(xué)狀態(tài)(“錨”)的細(xì)胞的跨數(shù)據(jù)集對(duì),既可以用于校正數(shù)據(jù)集之間的技術(shù)差異(即批效應(yīng)校正),也可以用于對(duì)基因組進(jìn)行比較性scRNA-seq分析跨實(shí)驗(yàn)條件。
下面,我們展示了Stuart *,Butler *等人,2019中所述的scRNA-seq整合方法,以對(duì)處于靜止或干擾素刺激狀態(tài)的人免疫細(xì)胞(PBMC)進(jìn)行比較分析。
整合目標(biāo)
以下教程旨在概述使用Seurat集成過(guò)程可能進(jìn)行的復(fù)雜細(xì)胞類型的比較分析。在這里,我們解決了一些關(guān)鍵目標(biāo):
- 創(chuàng)建“集成”數(shù)據(jù)分析以進(jìn)行下游分析
- 識(shí)別兩個(gè)數(shù)據(jù)集中都存在的單元格類型
- 獲得在對(duì)照和刺激細(xì)胞中均保守的細(xì)胞類型標(biāo)記
- 比較數(shù)據(jù)集以找到對(duì)刺激的細(xì)胞類型特異性反應(yīng)
設(shè)置Seurat對(duì)象
為了方便起見(jiàn),我們通過(guò)SeuratData軟件包分發(fā)此數(shù)據(jù)集。
library(Seurat)
library(SeuratData)
library(patchwork)
# install dataset
InstallData("ifnb")
# load dataset
LoadData("ifnb")
# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
x <- NormalizeData(x)
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
# select features that are repeatedly variable across datasets for integration
features <- SelectIntegrationFeatures(object.list = ifnb.list)
執(zhí)行整合
然后,我們使用FindIntegrationAnchors()函數(shù)來(lái)識(shí)別錨點(diǎn),該函數(shù)將Seurat對(duì)象的列表作為輸入,并使用這些錨點(diǎn)將兩個(gè)數(shù)據(jù)集與集成在一起IntegrateData()`。
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
進(jìn)行綜合分析
現(xiàn)在,我們可以在所有單元上運(yùn)行單個(gè)集成分析!
# specify that we will perform downstream analysis on the corrected data note that the original
# unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2

為了并排可視化這兩個(gè)條件,我們可以使用split.by參數(shù)來(lái)顯示每個(gè)以聚類著色的條件。
DimPlot(immune.combined, reduction = "umap", split.by = "stim")

識(shí)別保守的細(xì)胞類型標(biāo)記
為了鑒定在各種條件下保守的規(guī)范細(xì)胞類型標(biāo)記基因,我們提供了該FindConservedMarkers()`功能。此功能對(duì)每個(gè)數(shù)據(jù)集/組執(zhí)行差異基因表達(dá)測(cè)試,并使用MetaDE R軟件包中的薈萃分析方法組合p值。例如,無(wú)論簇6中的刺激條件如何,我們都可以計(jì)算出保守標(biāo)記的基因(NK細(xì)胞)。
# For performing differential expression after integration, we switch back to the original data
DefaultAssay(immune.combined) <- "RNA"
nk.markers <- FindConservedMarkers(immune.combined, ident.1 = 6, grouping.var = "stim", verbose = FALSE)
head(nk.markers)
## CTRL_p_val CTRL_avg_log2FC CTRL_pct.1 CTRL_pct.2 CTRL_p_val_adj
## GNLY 0 6.006422 0.944 0.045 0
## FGFBP2 0 3.223246 0.503 0.020 0
## CLIC3 0 3.466418 0.599 0.024 0
## PRF1 0 2.654683 0.424 0.017 0
## CTSW 0 2.991829 0.533 0.029 0
## KLRD1 0 2.781453 0.497 0.019 0
## STIM_p_val STIM_avg_log2FC STIM_pct.1 STIM_pct.2 STIM_p_val_adj
## GNLY 0.000000e+00 5.853573 0.956 0.060 0.000000e+00
## FGFBP2 7.275492e-161 2.200379 0.260 0.016 1.022425e-156
## CLIC3 0.000000e+00 3.549919 0.627 0.031 0.000000e+00
## PRF1 0.000000e+00 4.102686 0.862 0.057 0.000000e+00
## CTSW 0.000000e+00 3.139620 0.596 0.035 0.000000e+00
## KLRD1 0.000000e+00 2.880055 0.558 0.027 0.000000e+00
## max_pval minimump_p_val
## GNLY 0.000000e+00 0
## FGFBP2 7.275492e-161 0
## CLIC3 0.000000e+00 0
## PRF1 0.000000e+00 0
## CTSW 0.000000e+00 0
## KLRD1 0.000000e+00 0
我們可以為每個(gè)簇探索這些標(biāo)記基因,并使用它們將我們的簇注釋為特定的細(xì)胞類型。
FeaturePlot(immune.combined, features = c("CD3D", "SELL", "CREM", "CD8A", "GNLY", "CD79A", "FCGR3A",
"CCL2", "PPBP"), min.cutoff = "q9")

immune.combined <- RenameIdents(immune.combined, `0` = "CD14 Mono", `1` = "CD4 Naive T", `2` = "CD4 Memory T",
`3` = "CD16 Mono", `4` = "B", `5` = "CD8 T", `6` = "NK", `7` = "T activated", `8` = "DC", `9` = "B Activated",
`10` = "Mk", `11` = "pDC", `12` = "Eryth", `13` = "Mono/Mk Doublets", `14` = "HSPC")
DimPlot(immune.combined, label = TRUE)

DotPlot()帶有split.by`參數(shù)的函數(shù)可用于查看各種條件下的保守細(xì)胞類型標(biāo)記,顯示表達(dá)水平和表達(dá)任何給定基因的簇中細(xì)胞的百分比。在這里,我們?yōu)?4個(gè)簇中的每個(gè)簇繪制2-3個(gè)強(qiáng)標(biāo)記基因。
Idents(immune.combined) <- factor(Idents(immune.combined), levels = c("HSPC", "Mono/Mk Doublets",
"pDC", "Eryth", "Mk", "DC", "CD14 Mono", "CD16 Mono", "B Activated", "B", "CD8 T", "NK", "T activated",
"CD4 Naive T", "CD4 Memory T"))
markers.to.plot <- c("CD3D", "CREM", "HSPH1", "SELL", "GIMAP5", "CACYBP", "GNLY", "NKG7", "CCL5",
"CD8A", "MS4A1", "CD79A", "MIR155HG", "NME1", "FCGR3A", "VMO1", "CCL2", "S100A9", "HLA-DQA1",
"GPR183", "PPBP", "GNG11", "HBA2", "HBB", "TSPAN13", "IL3RA", "IGJ", "PRSS57")
DotPlot(immune.combined, features = markers.to.plot, cols = c("blue", "red"), dot.scale = 8, split.by = "stim") +
RotatedAxis()

確定跨條件的差異表達(dá)基因
現(xiàn)在,我們已經(jīng)排列了刺激細(xì)胞和對(duì)照細(xì)胞,我們可以開(kāi)始進(jìn)行比較分析,并觀察刺激引起的差異。廣泛觀察這些變化的一種方法是繪制受刺激細(xì)胞和對(duì)照細(xì)胞的平均表達(dá),并在散點(diǎn)圖上尋找視覺(jué)異常值的基因。在這里,我們采用受刺激的和對(duì)照的原始T細(xì)胞和CD14單核細(xì)胞群體的平均表達(dá),并生成散點(diǎn)圖,突出顯示對(duì)干擾素刺激表現(xiàn)出戲劇性反應(yīng)的基因。
library(ggplot2)
library(cowplot)
theme_set(theme_cowplot())
t.cells <- subset(immune.combined, idents = "CD4 Naive T")
Idents(t.cells) <- "stim"
avg.t.cells <- as.data.frame(log1p(AverageExpression(t.cells, verbose = FALSE)$RNA))
avg.t.cells$gene <- rownames(avg.t.cells)
cd14.mono <- subset(immune.combined, idents = "CD14 Mono")
Idents(cd14.mono) <- "stim"
avg.cd14.mono <- as.data.frame(log1p(AverageExpression(cd14.mono, verbose = FALSE)$RNA))
avg.cd14.mono$gene <- rownames(avg.cd14.mono)
genes.to.label = c("ISG15", "LY6E", "IFI6", "ISG20", "MX1", "IFIT2", "IFIT1", "CXCL10", "CCL8")
p1 <- ggplot(avg.t.cells, aes(CTRL, STIM)) + geom_point() + ggtitle("CD4 Naive T Cells")
p1 <- LabelPoints(plot = p1, points = genes.to.label, repel = TRUE)
p2 <- ggplot(avg.cd14.mono, aes(CTRL, STIM)) + geom_point() + ggtitle("CD14 Monocytes")
p2 <- LabelPoints(plot = p2, points = genes.to.label, repel = TRUE)
p1 + p2

如您所見(jiàn),許多相同的基因在這兩種細(xì)胞類型中均被上調(diào),可能代表保守的干擾素應(yīng)答途徑。
因?yàn)槲覀冇行判拇_定出跨條件的常見(jiàn)細(xì)胞類型,所以我們可以詢問(wèn)相同條件下不同條件下哪些基因會(huì)發(fā)生變化。首先,我們?cè)趍eta.data插槽中創(chuàng)建一列,以保存細(xì)胞類型和刺激信息,并將當(dāng)前標(biāo)識(shí)切換到該列。然后,我們用于FindMarkers()`查找受激B細(xì)胞和對(duì)照B細(xì)胞之間不同的基因。請(qǐng)注意,此處顯示的許多頂級(jí)基因與我們之前繪制的核心干擾素應(yīng)答基因相同。此外,我們看到的像CXCL10的基因?qū)魏思?xì)胞和B細(xì)胞干擾素的反應(yīng)也具有特異性,在該列表中也顯示出很高的意義。
immune.combined$celltype.stim <- paste(Idents(immune.combined), immune.combined$stim, sep = "_")
immune.combined$celltype <- Idents(immune.combined)
Idents(immune.combined) <- "celltype.stim"
b.interferon.response <- FindMarkers(immune.combined, ident.1 = "B_STIM", ident.2 = "B_CTRL", verbose = FALSE)
head(b.interferon.response, n = 15)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## ISG15 5.398167e-155 4.5889194 0.998 0.240 7.586044e-151
## IFIT3 2.209577e-150 4.5032297 0.964 0.052 3.105118e-146
## IFI6 7.060888e-150 4.2375542 0.969 0.080 9.922666e-146
## ISG20 7.147214e-146 2.9387415 1.000 0.672 1.004398e-141
## IFIT1 7.650201e-138 4.1295888 0.914 0.032 1.075083e-133
## MX1 1.124186e-120 3.2883709 0.905 0.115 1.579819e-116
## LY6E 2.504364e-118 3.1297866 0.900 0.152 3.519383e-114
## TNFSF10 9.454398e-110 3.7783774 0.791 0.025 1.328627e-105
## IFIT2 1.672384e-105 3.6569980 0.783 0.035 2.350201e-101
## B2M 5.564362e-96 0.6100242 1.000 1.000 7.819599e-92
## PLSCR1 1.128239e-93 2.8205802 0.796 0.117 1.585514e-89
## IRF7 6.602529e-92 2.5832239 0.838 0.190 9.278534e-88
## CXCL10 4.402118e-82 5.2406913 0.639 0.010 6.186297e-78
## UBE2L6 2.995453e-81 2.1487435 0.852 0.300 4.209510e-77
## PSMB9 1.755809e-76 1.6379482 0.940 0.573 2.467438e-72
可視化基因表達(dá)中這些變化的另一種有用方法是split.by選擇FeaturePlot()或VlnPlot()功能。這將顯示給定基因列表的FeaturePlots,并按分組變量(此處為刺激條件)進(jìn)行劃分。諸如CD3D和GNLY之類的基因是典型的細(xì)胞類型標(biāo)記(對(duì)于T細(xì)胞和NK / CD8 T細(xì)胞),實(shí)際上不受干擾素刺激的影響,并且在對(duì)照組和受刺激組中顯示出相似的基因表達(dá)模式。另一方面,IFI6和ISG15是核心干擾素反應(yīng)基因,因此在所有細(xì)胞類型中均被上調(diào)。最后,CD14和CXCL10是顯示細(xì)胞類型特異性干擾素應(yīng)答的基因。CD14單核細(xì)胞受刺激后,CD14表達(dá)下降,這可能導(dǎo)致在有監(jiān)督的分析框架中進(jìn)行錯(cuò)誤分類,從而強(qiáng)調(diào)了整合分析的價(jià)值。
FeaturePlot(immune.combined, features = c("CD3D", "GNLY", "IFI6"), split.by = "stim", max.cutoff = 3,
cols = c("grey", "red"))
[圖片上傳失敗...(image-275df6-1615650992246)]
plots <- VlnPlot(immune.combined, features = c("LYZ", "ISG15", "CXCL10"), split.by = "stim", group.by = "celltype",
pt.size = 0, combine = FALSE)
wrap_plots(plots = plots, ncol = 1)
