我們?cè)谏弦黄幸呀?jīng)對(duì)損傷愈合組的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)進(jìn)行了細(xì)胞類型注釋,接下來(lái),我們使用整合分析,將文中數(shù)據(jù)與其他文章數(shù)據(jù)整合來(lái)驗(yàn)證注釋的準(zhǔn)確性,并借助已注釋好的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)去注釋其他的單細(xì)胞數(shù)據(jù)。
本篇除之前一直使用的損傷愈合組數(shù)據(jù)外,其他數(shù)據(jù)來(lái)自GEO數(shù)據(jù)庫(kù)的GSE265972(VU)、GSE165816(DFU)和EBI的E-MTAB-8142(human_adult_skin)。
1. 損傷愈合組與其他人皮膚數(shù)據(jù)的整合
在正文中,作者為了驗(yàn)證細(xì)胞注釋的結(jié)果,將損傷愈合組的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)與已發(fā)表的一個(gè)成年人皮膚的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)(E-MTAB-8142)進(jìn)行了整合。該數(shù)據(jù)可以從EBI數(shù)據(jù)庫(kù)中獲取,下載的兩個(gè)文件分別為表達(dá)矩陣文本文件(arrayexpress_counts.txt)和細(xì)胞元信息文件(arrayexpress_metadata.txt),將這兩個(gè)文件讀入后創(chuàng)建為SeuratObject,并使用元信息替換meta.data。我們依然使用harmony對(duì)兩個(gè)數(shù)據(jù)集進(jìn)行整合,在整合時(shí)以樣本維度而不是分組維度。
library(Seurat)
library(harmony)
library(dplyr)
library(ggplot2)
###讀入注釋好的acute_wound數(shù)據(jù)
obj <- readRDS('/path/to/sc_data/cell_type.rds')
obj$group <- 'acute_wound'
###讀入E-MTAB-8142數(shù)據(jù)并轉(zhuǎn)換為Seurat對(duì)象
count <- read.table('E-MTAB-8142/arrayexpress_counts.txt', header=T, row.names='Gene', sep='\t')
pub <- CreateSeuratObject(count, assay='RNA', min.cells=3, min.features=200)
meta <- read.table('E-MTAB-8142/arrayexpress_metadata.txt', header=T, row.names='Cell', sep='\t')
pub@meta.data <- meta
pub$group <- 'human_adult_skin'
colnames(pub@meta.data)[5] <- 'main_type'
colnames(pub@meta.data)[6] <- 'cell_type'
###合并兩個(gè)對(duì)象
DefaultAssay(obj) <- 'RNA'
data_merge <- merge(x=obj, y=pub)
data_merge <- SCTransform(data_merge, verbose = FALSE, return.only.var.genes = FALSE, assay='RNA')
data_merge <- RunPCA(data_merge, verbose = FALSE)
data_merge <- RunUMAP(data_merge, dims=1:30)
DimPlot(data_merge, reduction = "umap", group.by='group', pt.size=0.3, label = F, repel = TRUE)
###整合
data_merge <- RunHarmony(data_merge, group.by.vars = "orig.ident", dims=1:30, reduction.use='pca', assay.use = "SCT")
data_merge <- RunUMAP(data_merge, reduction = "harmony", dims = 1:30)
DimPlot(data_merge, reduction = "umap", group.by='group', pt.size=0.3, label = F, repel = TRUE)


通過(guò)UMAP圖來(lái)可視化兩個(gè)數(shù)據(jù)集分別的注釋結(jié)果。
DimPlot(data_merge, reduction = "umap", group.by='main_type',
split.by='group', pt.size=0.1, label = T, label.size=3, repel = TRUE) + NoLegend()


對(duì)比原文圖,我們可以看出harmony無(wú)法完全消除兩個(gè)數(shù)據(jù)集之間的差異,說(shuō)明兩個(gè)數(shù)據(jù)集在生物學(xué)背景上是不同的;而不同數(shù)據(jù)集的主要細(xì)胞類型在UMAP圖的分布上又是基本一致的,說(shuō)明整合具有較好的效果。
接下來(lái),文中又使用FindTransferAnchors來(lái)尋找兩個(gè)數(shù)據(jù)集之間的錨點(diǎn),并計(jì)算預(yù)測(cè)得分來(lái)表征細(xì)胞亞類之間的相似性。文中以每個(gè)query的cell_type中預(yù)測(cè)的各reference中細(xì)胞類型得分的中位數(shù)來(lái)指征二者的相似性。
###Find anchor
obj.anchors <- FindTransferAnchors(reference = pub, query = obj, dims = 1:30)
predictions <- TransferData(anchorset = obj.anchors, refdata = pub$cell_type, dims = 1:30)
meta <- cbind(acute_wound=obj@meta.data$cell_type, predictions)
meta <- meta[, c(-2,-43)]
colnames(meta) <- c("acute_wound", "Fibroblast_2", "Fibroblast_3", "Stromal_Schwann",
"Fibroblast_1", "Schwann", "Pericyte",
"Vascular_endothelium_2", "Vascular_endothelium_1", "Vascular_endothelium_3",
"Keratinocyte_postmitotic", "Keratinocyte_CD83", "Keratinocyte_premitotic", "Keratinocyte_mitotic",
"Melanocyte", "Plasma_cell", "Macrophage_2", "DC2","Macrophage_1",
"IL23_DC","Monocyte","moDC1","Migratory_cDC","DC1","pDC","moDC2",
"LC4","Th","NK2","LC2","KLF10.LC","LC3","LC1",
"Mast_cell","Treg","ILC","NK1","Tc",
"Lymphatic_endothelium_2","Lymphatic_endothelium_1","NK3")
###計(jì)算median prediction score并生成矩陣
mat <- matrix(, nrow=length(levels(factor(meta$acute_wound))),
ncol=length(colnames(meta))-1,
dimnames=list(levels(factor(meta$acute_wound)),colnames(meta)[-1]))
for (i in levels(factor(meta$acute_wound))) {
for (j in 2:length(colnames(meta))) {
mat[i, colnames(meta)[j]] <- median(meta[meta$acute_wound==i, colnames(meta)[j]])
}
}
這樣我們就獲得了一個(gè)行為acute_wound的細(xì)胞類型,列為human_adult_skin的細(xì)胞類型,填充值為每個(gè)acute_wound的細(xì)胞類型中每個(gè)human_adult_skin細(xì)胞類型預(yù)測(cè)得分的中位數(shù)(0~1)。下面我們使用ComplexHeatmap包做出文中的熱圖。
library(ComplexHeatmap)
###矩陣排序
mat_ord <- mat[c("Bas-I", "Bas-prolif", "Bas-mig",
"Spi-I", "Spi-II", "Spi-mig",
"Gra", "HF",
"MEL",
"FB-I", "FB-III", "FB-prolif",
"PC-vSMC", "LE", "VE",
"NK-cell", "Th", "Plasma_Bcell", "Mast-cell",
"Mono-Mac", "cDC1", "cDC2", "DC3", "LC"),
c("Keratinocyte_CD83", "Keratinocyte_mitotic", "Keratinocyte_postmitotic", "Keratinocyte_premitotic",
"Melanocyte",
"Schwann", "Stromal_Schwann",
"Fibroblast_1", "Fibroblast_2", "Fibroblast_3",
"Pericyte", "pDC",
"Vascular_endothelium_1", "Vascular_endothelium_2", "Vascular_endothelium_3",
"Lymphatic_endothelium_1", "Lymphatic_endothelium_2",
"ILC", "NK1", "NK2", "NK3",
"IL23_DC", "Tc", "Th", "Treg",
"Mast_cell", "Plasma_cell",
"Macrophage_1", "Macrophage_2",
"DC1", "DC2", "LC1", "LC2", "LC3", "LC4",
"Monocyte", "Migratory_cDC", "KLF10.LC",
"moDC1", "moDC2")]
###為細(xì)胞類型設(shè)置顏色
aw.cols <- c("#d94701", "#fd8d3c", "#fdbe85",
"#33A02C", "#72BF5A", "#B2DF8A",
"#f768a1", "#d4b9da",
"#737373",
"#0570b0", "#92c5de", "#d1e5f0",
"#1a9850", "#fb9a99", "#8d4720",
"#35978f", "#41b6c4", "#80cdc1","#df65b0",
"#dd3497", "#807dba","#6a3d9a","#9e9ac8", "#b15928")
names(aw.cols) <- levels(factor(meta[meta$group=='acute_wound', 'cell_type'],
levels=c("Bas-I", "Bas-prolif", "Bas-mig",
"Spi-I", "Spi-II", "Spi-mig",
"Gra", "HF",
"MEL",
"FB-I", "FB-III", "FB-prolif",
"PC-vSMC", "LE", "VE",
"NK-cell", "Th", "Plasma_Bcell", "Mast-cell",
"Mono-Mac", "cDC1", "cDC2", "DC3", "LC"),
ordered=T))
pub.cols <- c("#B5B6DB", "#166E8A", "#7B7DC6", "#7DBFD4",
"#D5DEA0",
"#A6559B", "#3E64A2",
"#A4D7E2", "#6BA0D5", "#3A64AD",
"#E4DCC0", "#F19570",
"#EAD6E8", "#EF93AA", "#E23725",
"#0D783D", "#7FBD70",
"#CEDFEF", "#4DA1D1", "#1B4179", "#0F223E",
"#86AB3E", "#A0CA7A", "#90398F", "#A67FBA",
"#C39371", "#F2DDEB",
"#D43A6B", "#B97781",
"#DF7A90", "#E49CC4",
"#194791", "#BADFE9", "#9CDCED", "#1F5BBD",
"#852E8A", "#EFC1DA", "#7D85BA",
"#4AB6AF", "#68D4CD")
names(pub.cols) <- levels(factor(meta[meta$group=='human_adult_skin', 'cell_type'],
levels=c("Keratinocyte_CD83", "Keratinocyte_mitotic", "Keratinocyte_postmitotic", "Keratinocyte_premitotic",
"Melanocyte",
"Schwann", "Stromal_Schwann",
"Fibroblast_1", "Fibroblast_2", "Fibroblast_3",
"Pericyte", "pDC",
"Vascular_endothelium_1", "Vascular_endothelium_2", "Vascular_endothelium_3",
"Lymphatic_endothelium_1", "Lymphatic_endothelium_2",
"ILC", "NK1", "NK2", "NK3",
"IL23_DC", "Tc", "Th", "Treg",
"Mast_cell", "Plasma_cell",
"Macrophage_1", "Macrophage_2",
"DC1", "DC2", "LC1", "LC2", "LC3", "LC4",
"Monocyte", "Migratory_cDC", "KLF10.LC",
"moDC1", "moDC2"),
ordered=T))
###行、列注釋
row_an <- rowAnnotation(foo=rownames(mat_ord),
col=list(foo=aw.cols), show_legend=F,
show_annotation_name = FALSE)
col_an <- HeatmapAnnotation(foo=colnames(mat_ord),
col=list(foo=pub.cols), show_legend=F,
show_annotation_name = FALSE)
###做熱圖
Heatmap(mat_ord,
cluster_rows=F, cluster_columns=F,
col=colorRamp2(c(0, 0.5, 1),c('#1C1C1C', '#CD3278', '#FFF68F')),
rect_gp=gpar(lwd=2, col='grey'),
right_annotation=row_an, bottom_annotation=col_an,
heatmap_legend_param = list(title = "Median predicted score"))


原文應(yīng)該是對(duì)human_adult_skin的細(xì)胞類型進(jìn)行了部分的合并,因此我們做出的圖的橫軸和原文有一定差別,但是這不妨礙我們從熱圖中看出兩個(gè)數(shù)據(jù)集的cell_type注釋具有較高的一致性。
2. 損傷愈合組與VU、DFU的整合
我們首先使用前面介紹的預(yù)處理和組內(nèi)整合方法,將VU和DFU數(shù)據(jù)分別進(jìn)行預(yù)處理和整合。然后使用harmony整合三個(gè)數(shù)據(jù)集,由于我們前面介紹了多次harmony的整合代碼,這里不做贅述,只是注意在整合前可以給三個(gè)分組加上一個(gè)label,以便后續(xù)作圖或進(jìn)行其他分析時(shí)可以根據(jù)label進(jìn)行拆分。
obj1$group <- 'acute_wound'
obj2$group <- 'VU'
obj3$group <- 'DFU'
直接來(lái)看下整合的結(jié)果。

可以看到harmony較好的將三個(gè)數(shù)據(jù)集整合到一起。
文中首先使用FindTransferAnchors和MapQuery函數(shù)將acute_wound的cell_type label轉(zhuǎn)移至VU和DFU中。之后,又結(jié)合整合結(jié)果的無(wú)監(jiān)督聚類對(duì)細(xì)胞類型進(jìn)行精細(xì)化標(biāo)注,將模糊的細(xì)胞類型分配結(jié)果與各細(xì)胞群體中的主要類型進(jìn)行匹配,來(lái)實(shí)現(xiàn)急慢性傷口間細(xì)胞異質(zhì)性的精準(zhǔn)比較。
但我們?nèi)狈ο鄳?yīng)的背景知識(shí),所以我們僅將label轉(zhuǎn)移和無(wú)監(jiān)督聚類的結(jié)果進(jìn)行展示,后面還會(huì)用到三個(gè)數(shù)據(jù)集的注釋結(jié)果進(jìn)行分析,這里暫且按下不表。
obj1 <- readRDS('/path/to/sc_data/cell_type.rds')
obj1$group <- 'acute_wound'
obj1 <- RunUMAP(obj1, reduction = "harmony", dims = 1:30, return.model = TRUE)
data_merge <- readRDS('three_inte.rds')
data.anchors <- FindTransferAnchors(reference = obj1, query = data_merge, dims = 1:30, reference.reduction = "pca")
data_merge <- MapQuery(anchorset = data.anchors,
reference = obj1, query = data_merge,
refdata = list(celltype = "cell_type"),
reference.reduction = "pca", reduction.model = "umap")
data_merge2 <- MapQuery(anchorset = data.anchors,
reference = obj1, query = data_merge,
refdata = list(celltype = "main_type"),
reference.reduction = "pca", reduction.model = "umap")
cols <- c("#1c6597", "#cc6805", "#c6abd4", "#569395", "#dc65d8", "#f99190", "#057605","#575757")
names(cols) <- c('Fibroblast', 'Keratinocyte', 'Myeloid', 'Lymphoid', 'Mast', 'Endothelial', 'Pericyte and SMC', 'Melanocyte')
DimPlot(data_merge2, group.by='predicted.celltype', split.by='group', pt.size=0.3, cols=cols, raster.dpi=c(256, 256))
cols <- c(colorRampPalette(c("#1f78b4", "#a6cee3"))(3),
colorRampPalette(c("#33a02c", "#b2df8a"))(3), "#df65b0",
colorRampPalette(c("#fdbf6f", "#ff7f00"))(3),
"#696969", "#d8bfd8", "#008b00", "#fb9a99", "#8b5a2b", "#da70d6",
colorRampPalette(c("#66cdaa", "#5f9ea0"))(2),
colorRampPalette(c("#cab2d6", "#6a3d9a"))(6))
names(cols) <- c("FB-I","FB-III","FB-prolif",
"Spi-I","Spi-II","Spi-mig","Gra",
"Bas-I","Bas-mig","Bas-prolif",
"MEL","HF","PC-vSMC","LE","VE",
"Mast-cell","Th","NK-cell",
"Mono-Mac","cDC1","cDC2","DC3","Plasma_Bcell","LC")
DimPlot(data_merge, group.by='predicted.celltype', split.by='group', pt.size=0.3, cols=cols, raster.dpi=c(256, 256))



我們暫時(shí)以細(xì)胞亞類label轉(zhuǎn)移的結(jié)果作為VU和DFU的細(xì)胞類型注釋結(jié)果,在后面進(jìn)行分析時(shí)再結(jié)合無(wú)監(jiān)督聚類結(jié)果調(diào)整。
以上,我們就完成了驗(yàn)證注釋準(zhǔn)確性的數(shù)據(jù)整合,和使用已注釋好的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)去注釋其他的單細(xì)胞數(shù)據(jù)的數(shù)據(jù)整合。
往期內(nèi)容:
單細(xì)胞/時(shí)空文章代碼復(fù)現(xiàn)——數(shù)據(jù)預(yù)處理及整合
單細(xì)胞/時(shí)空文章代碼復(fù)現(xiàn)——細(xì)胞類型注釋及UMAP圖優(yōu)化