單細(xì)胞/時(shí)空文章代碼復(fù)現(xiàn)——數(shù)據(jù)集整合

我們?cè)谏弦黄幸呀?jīng)對(duì)損傷愈合組的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)進(jìn)行了細(xì)胞類型注釋,接下來(lái),我們使用整合分析,將文中數(shù)據(jù)與其他文章數(shù)據(jù)整合來(lái)驗(yàn)證注釋的準(zhǔn)確性,并借助已注釋好的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)去注釋其他的單細(xì)胞數(shù)據(jù)。
本篇除之前一直使用的損傷愈合組數(shù)據(jù)外,其他數(shù)據(jù)來(lái)自GEO數(shù)據(jù)庫(kù)的GSE265972(VU)、GSE165816(DFU)和EBI的E-MTAB-8142(human_adult_skin)。

1. 損傷愈合組與其他人皮膚數(shù)據(jù)的整合

在正文中,作者為了驗(yàn)證細(xì)胞注釋的結(jié)果,將損傷愈合組的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)與已發(fā)表的一個(gè)成年人皮膚的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)(E-MTAB-8142)進(jìn)行了整合。該數(shù)據(jù)可以從EBI數(shù)據(jù)庫(kù)中獲取,下載的兩個(gè)文件分別為表達(dá)矩陣文本文件(arrayexpress_counts.txt)和細(xì)胞元信息文件(arrayexpress_metadata.txt),將這兩個(gè)文件讀入后創(chuàng)建為SeuratObject,并使用元信息替換meta.data。我們依然使用harmony對(duì)兩個(gè)數(shù)據(jù)集進(jìn)行整合,在整合時(shí)以樣本維度而不是分組維度。

library(Seurat)
library(harmony)
library(dplyr)
library(ggplot2)

###讀入注釋好的acute_wound數(shù)據(jù)
obj <- readRDS('/path/to/sc_data/cell_type.rds')
obj$group <- 'acute_wound'

###讀入E-MTAB-8142數(shù)據(jù)并轉(zhuǎn)換為Seurat對(duì)象
count <- read.table('E-MTAB-8142/arrayexpress_counts.txt', header=T, row.names='Gene', sep='\t')
pub <- CreateSeuratObject(count, assay='RNA', min.cells=3, min.features=200)
meta <- read.table('E-MTAB-8142/arrayexpress_metadata.txt', header=T, row.names='Cell', sep='\t')
pub@meta.data <- meta

pub$group <- 'human_adult_skin'
colnames(pub@meta.data)[5] <- 'main_type'
colnames(pub@meta.data)[6] <- 'cell_type'

###合并兩個(gè)對(duì)象
DefaultAssay(obj) <- 'RNA'
data_merge <- merge(x=obj, y=pub)

data_merge <- SCTransform(data_merge, verbose = FALSE, return.only.var.genes = FALSE, assay='RNA')
data_merge <- RunPCA(data_merge, verbose = FALSE)
data_merge <- RunUMAP(data_merge, dims=1:30)
DimPlot(data_merge, reduction = "umap", group.by='group', pt.size=0.3, label = F, repel = TRUE)

###整合
data_merge <- RunHarmony(data_merge, group.by.vars = "orig.ident", dims=1:30, reduction.use='pca', assay.use = "SCT")
data_merge <- RunUMAP(data_merge, reduction = "harmony", dims = 1:30)
DimPlot(data_merge, reduction = "umap", group.by='group', pt.size=0.3, label = F, repel = TRUE)
整合前兩組樣本的UMAP圖
整合后兩組樣本的UMAP圖

通過(guò)UMAP圖來(lái)可視化兩個(gè)數(shù)據(jù)集分別的注釋結(jié)果。

DimPlot(data_merge, reduction = "umap", group.by='main_type', 
        split.by='group', pt.size=0.1, label = T, label.size=3, repel = TRUE) + NoLegend()
兩個(gè)數(shù)據(jù)集的細(xì)胞大類
原文圖

對(duì)比原文圖,我們可以看出harmony無(wú)法完全消除兩個(gè)數(shù)據(jù)集之間的差異,說(shuō)明兩個(gè)數(shù)據(jù)集在生物學(xué)背景上是不同的;而不同數(shù)據(jù)集的主要細(xì)胞類型在UMAP圖的分布上又是基本一致的,說(shuō)明整合具有較好的效果。
接下來(lái),文中又使用FindTransferAnchors來(lái)尋找兩個(gè)數(shù)據(jù)集之間的錨點(diǎn),并計(jì)算預(yù)測(cè)得分來(lái)表征細(xì)胞亞類之間的相似性。文中以每個(gè)query的cell_type中預(yù)測(cè)的各reference中細(xì)胞類型得分的中位數(shù)來(lái)指征二者的相似性。

###Find anchor
obj.anchors <- FindTransferAnchors(reference = pub, query = obj, dims = 1:30)
predictions <- TransferData(anchorset = obj.anchors, refdata = pub$cell_type, dims = 1:30)

meta <- cbind(acute_wound=obj@meta.data$cell_type, predictions)
meta <- meta[, c(-2,-43)]
colnames(meta) <- c("acute_wound", "Fibroblast_2", "Fibroblast_3", "Stromal_Schwann", 
                     "Fibroblast_1", "Schwann", "Pericyte", 
                     "Vascular_endothelium_2", "Vascular_endothelium_1", "Vascular_endothelium_3", 
                     "Keratinocyte_postmitotic", "Keratinocyte_CD83", "Keratinocyte_premitotic", "Keratinocyte_mitotic", 
                     "Melanocyte", "Plasma_cell", "Macrophage_2", "DC2","Macrophage_1",
                     "IL23_DC","Monocyte","moDC1","Migratory_cDC","DC1","pDC","moDC2",
                     "LC4","Th","NK2","LC2","KLF10.LC","LC3","LC1",
                     "Mast_cell","Treg","ILC","NK1","Tc",
                     "Lymphatic_endothelium_2","Lymphatic_endothelium_1","NK3")
###計(jì)算median prediction score并生成矩陣
mat <- matrix(, nrow=length(levels(factor(meta$acute_wound))), 
              ncol=length(colnames(meta))-1,
              dimnames=list(levels(factor(meta$acute_wound)),colnames(meta)[-1]))
for (i in levels(factor(meta$acute_wound))) {
    for (j in 2:length(colnames(meta))) {
        mat[i, colnames(meta)[j]] <- median(meta[meta$acute_wound==i, colnames(meta)[j]])
    }
}

這樣我們就獲得了一個(gè)行為acute_wound的細(xì)胞類型,列為human_adult_skin的細(xì)胞類型,填充值為每個(gè)acute_wound的細(xì)胞類型中每個(gè)human_adult_skin細(xì)胞類型預(yù)測(cè)得分的中位數(shù)(0~1)。下面我們使用ComplexHeatmap包做出文中的熱圖。

library(ComplexHeatmap)
###矩陣排序
mat_ord <- mat[c("Bas-I", "Bas-prolif", "Bas-mig", 
                 "Spi-I", "Spi-II", "Spi-mig", 
                 "Gra", "HF", 
                 "MEL", 
                 "FB-I", "FB-III", "FB-prolif", 
                 "PC-vSMC", "LE", "VE", 
                 "NK-cell", "Th", "Plasma_Bcell", "Mast-cell", 
                 "Mono-Mac", "cDC1", "cDC2", "DC3", "LC"), 
               c("Keratinocyte_CD83", "Keratinocyte_mitotic", "Keratinocyte_postmitotic", "Keratinocyte_premitotic",
                 "Melanocyte",
                 "Schwann", "Stromal_Schwann", 
                 "Fibroblast_1", "Fibroblast_2", "Fibroblast_3", 
                 "Pericyte", "pDC",
                 "Vascular_endothelium_1", "Vascular_endothelium_2", "Vascular_endothelium_3", 
                 "Lymphatic_endothelium_1", "Lymphatic_endothelium_2",
                 "ILC", "NK1", "NK2", "NK3",
                 "IL23_DC", "Tc", "Th", "Treg",
                 "Mast_cell", "Plasma_cell",
                 "Macrophage_1", "Macrophage_2", 
                 "DC1", "DC2", "LC1", "LC2", "LC3", "LC4",
                 "Monocyte", "Migratory_cDC", "KLF10.LC", 
                 "moDC1", "moDC2")]
 ###為細(xì)胞類型設(shè)置顏色                
 aw.cols <- c("#d94701", "#fd8d3c", "#fdbe85",
             "#33A02C", "#72BF5A", "#B2DF8A",
             "#f768a1", "#d4b9da",
             "#737373",
             "#0570b0", "#92c5de", "#d1e5f0",
             "#1a9850", "#fb9a99", "#8d4720",
             "#35978f", "#41b6c4", "#80cdc1","#df65b0",
             "#dd3497", "#807dba","#6a3d9a","#9e9ac8", "#b15928")

names(aw.cols) <- levels(factor(meta[meta$group=='acute_wound', 'cell_type'], 
                      levels=c("Bas-I", "Bas-prolif", "Bas-mig", 
                               "Spi-I", "Spi-II", "Spi-mig", 
                               "Gra", "HF", 
                               "MEL", 
                               "FB-I", "FB-III", "FB-prolif", 
                               "PC-vSMC", "LE", "VE", 
                               "NK-cell", "Th", "Plasma_Bcell", "Mast-cell", 
                               "Mono-Mac", "cDC1", "cDC2", "DC3", "LC"),
                      ordered=T))
                      
 pub.cols <- c("#B5B6DB", "#166E8A", "#7B7DC6", "#7DBFD4",
              "#D5DEA0",
              "#A6559B", "#3E64A2", 
              "#A4D7E2", "#6BA0D5", "#3A64AD",
              "#E4DCC0", "#F19570", 
              "#EAD6E8", "#EF93AA", "#E23725", 
              "#0D783D", "#7FBD70",
              "#CEDFEF", "#4DA1D1", "#1B4179", "#0F223E",
              "#86AB3E", "#A0CA7A", "#90398F", "#A67FBA", 
              "#C39371", "#F2DDEB", 
              "#D43A6B", "#B97781", 
              "#DF7A90", "#E49CC4", 
              "#194791", "#BADFE9", "#9CDCED", "#1F5BBD",
              "#852E8A", "#EFC1DA", "#7D85BA", 
              "#4AB6AF", "#68D4CD")
             
names(pub.cols) <- levels(factor(meta[meta$group=='human_adult_skin', 'cell_type'], 
                          levels=c("Keratinocyte_CD83", "Keratinocyte_mitotic", "Keratinocyte_postmitotic", "Keratinocyte_premitotic",
                                   "Melanocyte",
                                   "Schwann", "Stromal_Schwann", 
                                   "Fibroblast_1", "Fibroblast_2", "Fibroblast_3", 
                                   "Pericyte", "pDC",
                                   "Vascular_endothelium_1", "Vascular_endothelium_2", "Vascular_endothelium_3", 
                                   "Lymphatic_endothelium_1", "Lymphatic_endothelium_2",
                                   "ILC", "NK1", "NK2", "NK3",
                                   "IL23_DC", "Tc", "Th", "Treg",
                                   "Mast_cell", "Plasma_cell",
                                   "Macrophage_1", "Macrophage_2",
                                   "DC1", "DC2", "LC1", "LC2", "LC3", "LC4",
                                   "Monocyte", "Migratory_cDC", "KLF10.LC", 
                                   "moDC1", "moDC2"),
                          ordered=T))
###行、列注釋
row_an <- rowAnnotation(foo=rownames(mat_ord), 
                        col=list(foo=aw.cols), show_legend=F, 
                        show_annotation_name = FALSE)
col_an <- HeatmapAnnotation(foo=colnames(mat_ord), 
                            col=list(foo=pub.cols), show_legend=F, 
                            show_annotation_name = FALSE)
###做熱圖
Heatmap(mat_ord, 
        cluster_rows=F, cluster_columns=F, 
        col=colorRamp2(c(0, 0.5, 1),c('#1C1C1C', '#CD3278', '#FFF68F')), 
        rect_gp=gpar(lwd=2, col='grey'), 
        right_annotation=row_an, bottom_annotation=col_an, 
        heatmap_legend_param = list(title = "Median predicted score"))
復(fù)現(xiàn)圖
原文圖

原文應(yīng)該是對(duì)human_adult_skin的細(xì)胞類型進(jìn)行了部分的合并,因此我們做出的圖的橫軸和原文有一定差別,但是這不妨礙我們從熱圖中看出兩個(gè)數(shù)據(jù)集的cell_type注釋具有較高的一致性。

2. 損傷愈合組與VU、DFU的整合

我們首先使用前面介紹的預(yù)處理和組內(nèi)整合方法,將VU和DFU數(shù)據(jù)分別進(jìn)行預(yù)處理和整合。然后使用harmony整合三個(gè)數(shù)據(jù)集,由于我們前面介紹了多次harmony的整合代碼,這里不做贅述,只是注意在整合前可以給三個(gè)分組加上一個(gè)label,以便后續(xù)作圖或進(jìn)行其他分析時(shí)可以根據(jù)label進(jìn)行拆分。

obj1$group <- 'acute_wound'
obj2$group <- 'VU'
obj3$group <- 'DFU'

直接來(lái)看下整合的結(jié)果。

三個(gè)數(shù)據(jù)集整合后的UMAP圖

可以看到harmony較好的將三個(gè)數(shù)據(jù)集整合到一起。

文中首先使用FindTransferAnchorsMapQuery函數(shù)將acute_wound的cell_type label轉(zhuǎn)移至VU和DFU中。之后,又結(jié)合整合結(jié)果的無(wú)監(jiān)督聚類對(duì)細(xì)胞類型進(jìn)行精細(xì)化標(biāo)注,將模糊的細(xì)胞類型分配結(jié)果與各細(xì)胞群體中的主要類型進(jìn)行匹配,來(lái)實(shí)現(xiàn)急慢性傷口間細(xì)胞異質(zhì)性的精準(zhǔn)比較。

但我們?nèi)狈ο鄳?yīng)的背景知識(shí),所以我們僅將label轉(zhuǎn)移和無(wú)監(jiān)督聚類的結(jié)果進(jìn)行展示,后面還會(huì)用到三個(gè)數(shù)據(jù)集的注釋結(jié)果進(jìn)行分析,這里暫且按下不表。

obj1 <- readRDS('/path/to/sc_data/cell_type.rds')
obj1$group <- 'acute_wound'
obj1 <- RunUMAP(obj1, reduction = "harmony", dims = 1:30, return.model = TRUE)

data_merge <- readRDS('three_inte.rds')

data.anchors <- FindTransferAnchors(reference = obj1, query = data_merge, dims = 1:30, reference.reduction = "pca")

data_merge <- MapQuery(anchorset = data.anchors, 
                       reference = obj1, query = data_merge, 
                       refdata = list(celltype = "cell_type"), 
                       reference.reduction = "pca", reduction.model = "umap")

data_merge2 <- MapQuery(anchorset = data.anchors, 
                        reference = obj1, query = data_merge,
                        refdata = list(celltype = "main_type"), 
                        reference.reduction = "pca", reduction.model = "umap")

cols <- c("#1c6597", "#cc6805", "#c6abd4", "#569395", "#dc65d8", "#f99190", "#057605","#575757")
names(cols) <- c('Fibroblast', 'Keratinocyte', 'Myeloid', 'Lymphoid', 'Mast', 'Endothelial', 'Pericyte and SMC', 'Melanocyte')

DimPlot(data_merge2, group.by='predicted.celltype', split.by='group', pt.size=0.3, cols=cols, raster.dpi=c(256, 256))

cols <- c(colorRampPalette(c("#1f78b4", "#a6cee3"))(3),
          colorRampPalette(c("#33a02c", "#b2df8a"))(3), "#df65b0",  
          colorRampPalette(c("#fdbf6f", "#ff7f00"))(3), 
          "#696969", "#d8bfd8", "#008b00", "#fb9a99", "#8b5a2b", "#da70d6",
          colorRampPalette(c("#66cdaa", "#5f9ea0"))(2), 
          colorRampPalette(c("#cab2d6", "#6a3d9a"))(6))
names(cols) <- c("FB-I","FB-III","FB-prolif",
                 "Spi-I","Spi-II","Spi-mig","Gra",
                 "Bas-I","Bas-mig","Bas-prolif",
                 "MEL","HF","PC-vSMC","LE","VE",
                 "Mast-cell","Th","NK-cell",
                 "Mono-Mac","cDC1","cDC2","DC3","Plasma_Bcell","LC")
                 
DimPlot(data_merge, group.by='predicted.celltype', split.by='group', pt.size=0.3, cols=cols, raster.dpi=c(256, 256))
主要細(xì)胞類型在三個(gè)數(shù)據(jù)集中的分布
原文圖
細(xì)胞亞類的UMAP圖

我們暫時(shí)以細(xì)胞亞類label轉(zhuǎn)移的結(jié)果作為VU和DFU的細(xì)胞類型注釋結(jié)果,在后面進(jìn)行分析時(shí)再結(jié)合無(wú)監(jiān)督聚類結(jié)果調(diào)整。

以上,我們就完成了驗(yàn)證注釋準(zhǔn)確性的數(shù)據(jù)整合,和使用已注釋好的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)去注釋其他的單細(xì)胞數(shù)據(jù)的數(shù)據(jù)整合。

往期內(nèi)容:
單細(xì)胞/時(shí)空文章代碼復(fù)現(xiàn)——數(shù)據(jù)預(yù)處理及整合
單細(xì)胞/時(shí)空文章代碼復(fù)現(xiàn)——細(xì)胞類型注釋及UMAP圖優(yōu)化

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容