簡(jiǎn)介
??目前單細(xì)胞數(shù)據(jù)分析有不少的挑戰(zhàn),比如稀疏矩陣,超高維度數(shù)據(jù)降維,批次效應(yīng)校正,聚類算法的選擇,多組學(xué)數(shù)據(jù)整合等挑戰(zhàn)。有需求就需要造輪子,當(dāng)前也有了部分工具可以處理部分挑戰(zhàn),但準(zhǔn)確性還需要具體評(píng)估,本篇文章介紹實(shí)操整合多個(gè)批次數(shù)據(jù)工具SeuratV3中嵌合的整合算法處理官網(wǎng)實(shí)示例PBMC刺激及對(duì)照實(shí)驗(yàn)組數(shù)據(jù),SeuratV3簡(jiǎn)介及實(shí)操一文中有對(duì)該工具的具體介紹及常規(guī)實(shí)操。
整合目標(biāo)
概述使用Seurat集成過程可能實(shí)現(xiàn)的復(fù)雜細(xì)胞類型的比較分析。
- 識(shí)別兩個(gè)數(shù)據(jù)集中存在的單元格類型
- 獲得在對(duì)照和受刺激細(xì)胞中都保守的細(xì)胞類型標(biāo)記
- 比較數(shù)據(jù)集以找到刺激的細(xì)胞類型特異性反應(yīng)
生成Seurat對(duì)象
??基因表達(dá)矩陣可以在官網(wǎng)提供路徑找到。首先讀入兩個(gè)計(jì)數(shù)矩陣并設(shè)置Seurat對(duì)象。
library(Seurat)
library(cowplot)
ctrl.data <- read.table(file = "../data/immune_control_expression_matrix.txt.gz", sep = "\t")
stim.data <- read.table(file = "../data/immune_stimulated_expression_matrix.txt.gz", sep = "\t")
# Set up control object
ctrl <- CreateSeuratObject(counts = ctrl.data, project = "IMMUNE_CTRL", min.cells = 5)
ctrl$stim <- "CTRL"
ctrl <- subset(ctrl, subset = nFeature_RNA > 500)
ctrl <- NormalizeData(ctrl, verbose = FALSE)
ctrl <- FindVariableFeatures(ctrl, selection.method = "vst", nfeatures = 2000)
# Set up stimulated object
stim <- CreateSeuratObject(counts = stim.data, project = "IMMUNE_STIM", min.cells = 5)
stim$stim <- "STIM"
stim <- subset(stim, subset = nFeature_RNA > 500)
stim <- NormalizeData(stim, verbose = FALSE)
stim <- FindVariableFeatures(stim, selection.method = "vst", nfeatures = 2000)
運(yùn)行集成工具
??然后使用FindIntegrationAnchors函數(shù)識(shí)別錨點(diǎn),該函數(shù)將Seurat對(duì)象列表作為輸入,并使用這些錨點(diǎn)將兩個(gè)數(shù)據(jù)集集成在一起IntegrateData。
immune.anchors <- FindIntegrationAnchors(object.list = list(ctrl, stim), dims = 1:20)
immune.combined <- IntegrateData(anchorset = immune.anchors, dims = 1:20)
綜合分析
??現(xiàn)在可以對(duì)所有細(xì)胞進(jìn)行單一的綜合分析!
DefaultAssay(immune.combined) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
# t-SNE and Clustering
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:20)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:20)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE)
plot_grid(p1, p2)

??為了并排顯示這兩個(gè)條件,可以使用split.by參數(shù)來顯示由簇著色的每個(gè)條件。
DimPlot(immune.combined, reduction = "umap", split.by = "stim")
識(shí)別保守的細(xì)胞類型標(biāo)記
??為了鑒定在條件下保守的經(jīng)典細(xì)胞類型標(biāo)記基因,提供了該FindConservedMarkers功能。該功能對(duì)每個(gè)數(shù)據(jù)集/組執(zhí)行差異基因表達(dá)測(cè)試,并使用MetaDE R包中的元分析方法組合p值。例如,可以計(jì)算保守標(biāo)記的基因,而不考慮簇6(NK細(xì)胞)中的刺激條件。
DefaultAssay(immune.combined) <- "RNA"
nk.markers <- FindConservedMarkers(immune.combined, ident.1 = 7, grouping.var = "stim", verbose = FALSE)
head(nk.markers)
??可以探索每個(gè)簇的這些標(biāo)記基因,并使用它們來注釋我們的簇作為特定的細(xì)胞類型。
FeaturePlot(immune.combined, features = c("CD3D", "SELL", "CREM", "CD8A", "GNLY", "CD79A", "FCGR3A",
"CCL2", "PPBP"), min.cutoff = "q9")

immune.combined <- RenameIdents(immune.combined, `0` = "CD14 Mono", `1` = "CD4 Naive T", `2` = "CD4 Memory T",
`3` = "CD16 Mono", `4` = "B", `5` = "CD8 T", `6` = "T activated", `7` = "NK", `8` = "DC", `9` = "B Activated",
`10` = "Mk", `11` = "pDC", `12` = "Eryth", `13` = "Mono/Mk Doublets")
DimPlot(immune.combined, label = TRUE)

??DotPlot具有該split.by參數(shù)的功能可用于跨條件查看保守細(xì)胞類型標(biāo)記,顯示表達(dá)水平和表達(dá)任何給定基因的簇中細(xì)胞的百分比。在這里為13個(gè)簇中的每一個(gè)繪制2-3個(gè)強(qiáng)標(biāo)記基因。
Idents(immune.combined) <- factor(Idents(immune.combined), levels = c("Mono/Mk Doublets", "pDC",
"Eryth", "Mk", "DC", "CD14 Mono", "CD16 Mono", "B Activated", "B", "CD8 T", "NK", "T activated",
"CD4 Naive T", "CD4 Memory T"))
markers.to.plot <- c("CD3D", "CREM", "HSPH1", "SELL", "GIMAP5", "CACYBP", "GNLY", "NKG7", "CCL5",
"CD8A", "MS4A1", "CD79A", "MIR155HG", "NME1", "FCGR3A", "VMO1", "CCL2", "S100A9", "HLA-DQA1",
"GPR183", "PPBP", "GNG11", "HBA2", "HBB", "TSPAN13", "IL3RA", "IGJ")
DotPlot(immune.combined, features = rev(markers.to.plot), cols = c("blue", "red"), dot.scale = 8,
split.by = "stim") + RotatedAxis()

確定跨條件的差異表達(dá)基因
??現(xiàn)在已經(jīng)對(duì)受刺激細(xì)胞和對(duì)照細(xì)胞進(jìn)行了比對(duì),可以開始進(jìn)行比較分析并觀察刺激引起的差異。廣泛觀察這些變化的一種方法是繪制受刺激細(xì)胞和對(duì)照細(xì)胞的平均表達(dá),并在散點(diǎn)圖上尋找視覺異常值的基因。在這里,采用受刺激和對(duì)照幼稚T細(xì)胞和CD14單核細(xì)胞群的平均表達(dá)并產(chǎn)生散點(diǎn)圖,突出表現(xiàn)出對(duì)干擾素刺激的顯著反應(yīng)的基因。
t.cells <- subset(immune.combined, idents = "CD4 Naive T")
Idents(t.cells) <- "stim"
avg.t.cells <- log1p(AverageExpression(t.cells, verbose = FALSE)$RNA)
avg.t.cells$gene <- rownames(avg.t.cells)
cd14.mono <- subset(immune.combined, idents = "CD14 Mono")
Idents(cd14.mono) <- "stim"
avg.cd14.mono <- log1p(AverageExpression(cd14.mono, verbose = FALSE)$RNA)
avg.cd14.mono$gene <- rownames(avg.cd14.mono)
genes.to.label = c("ISG15", "LY6E", "IFI6", "ISG20", "MX1", "IFIT2", "IFIT1", "CXCL10", "CCL8")
p1 <- ggplot(avg.t.cells, aes(CTRL, STIM)) + geom_point() + ggtitle("CD4 Naive T Cells")
p1 <- LabelPoints(plot = p1, points = genes.to.label, repel = TRUE)
p2 <- ggplot(avg.cd14.mono, aes(CTRL, STIM)) + geom_point() + ggtitle("CD14 Monocytes")
p2 <- LabelPoints(plot = p2, points = genes.to.label, repel = TRUE)
plot_grid(p1, p2)

??許多相同的基因在這兩種細(xì)胞類型中都被上調(diào),并且可能代表一種保守的干擾素應(yīng)答途徑。
因?yàn)槲覀冇行判脑谡麄€(gè)條件下鑒定出共同的細(xì)胞類型,可以詢問在相同類型的細(xì)胞的不同條件下哪些基因會(huì)發(fā)生變化。首先在meta.data插槽中創(chuàng)建一個(gè)列,以保存單元格類型和刺激信息,并將當(dāng)前標(biāo)識(shí)切換到該列。然后FindMarkers用來找到刺激和對(duì)照B細(xì)胞之間不同的基因。請(qǐng)注意,此處顯示的許多頂級(jí)基因與我們之前作為核心干擾素應(yīng)答基因繪制的基因相同。此外,看到的像CXCL10這樣的基因?qū)魏思?xì)胞和B細(xì)胞干擾素反應(yīng)具有特異性,在該列表中也顯示出非常重要的基因。
immune.combined$celltype.stim <- paste(Idents(immune.combined), immune.combined$stim, sep = "_")
immune.combined$celltype <- Idents(immune.combined)
Idents(immune.combined) <- "celltype.stim"
b.interferon.response <- FindMarkers(immune.combined, ident.1 = "B_STIM", ident.2 = "B_CTRL", verbose = FALSE)
head(b.interferon.response, n = 15)
??另一種可視化基因表達(dá)變化的有用方法是split.by選擇FeaturePlot或VlnPlot功能。這將顯示給定基因列表的FeaturePlots,由分組變量(這里的刺激條件)分開。諸如CD3D和GNLY的基因是規(guī)范細(xì)胞類型標(biāo)記(對(duì)于T細(xì)胞和NK / CD8T細(xì)胞),其實(shí)際上不受干擾素刺激的影響并且在對(duì)照和刺激組中顯示相似的基因表達(dá)模式。另一方面,IFI6和ISG15是核心干擾素應(yīng)答基因,并且在所有細(xì)胞類型中相應(yīng)地上調(diào)。最后,CD14和CXCL10是顯示細(xì)胞類型特異性干擾素應(yīng)答的基因。CD14單核細(xì)胞刺激后CD14表達(dá)降低,這可能導(dǎo)致監(jiān)督分析框架中的錯(cuò)誤分類,強(qiáng)調(diào)綜合分析的價(jià)值。
FeaturePlot(immune.combined, features = c("CD3D", "GNLY", "IFI6"), split.by = "stim", max.cutoff = 3,
cols = c("grey", "red"))

plots <- VlnPlot(immune.combined, features = c("LYZ", "ISG15", "CXCL10"), split.by = "stim", group.by = "celltype",
pt.size = 0, combine = FALSE)
CombinePlots(plots = plots, ncol = 1)

參考材料:
https://satijalab.org/seurat/v3.0/immune_alignment.html