Seurat 4.0 Tutorial Series 2: Joint Analysis of Multimodal Data

Loading the data

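The ability to measure multiple data types simultaneously from the same cell, known as multimodal analysis, represents a new and exciting frontier for single-cell genomics. For example, CITE-seq measures the transcriptome and cell-surface proteins from the same cell at the same time. Other exciting technologies, such as those from 10x Genomics, allow paired measurements of scRNA-seq and scATAC-seq. Seurat 4 can seamlessly store, analyze, and explore diverse multimodal single-cell datasets.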

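Here we analyze a dataset of 8,617 cord blood mononuclear cells (CBMCs), in which transcriptomic measurements are paired with abundance estimates for 11 surface proteins whose levels are quantified with DNA-barcoded antibodies.
First, we load two count matrices: one for the RNA measurements and one for the antibody-derived tags (ADT). You can download the ADT file and the RNA file used below.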

library(Seurat)
library(ggplot2)
library(patchwork)
# Load in the RNA UMI matrix

# Note that this dataset also contains ~5% of mouse cells, which we can use as negative controls
# for the protein measurements. For this reason, the gene expression matrix has HUMAN_ or MOUSE_
# prepended to each gene name.
cbmc.rna <- as.sparse(read.csv(file = "../data/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", sep = ",", 
    header = TRUE, row.names = 1))

# To make life a bit easier going forward, we're going to discard all but the top 100 most
# highly expressed mouse genes, and remove the 'HUMAN_' prefix from the human gene names
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)

# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "../data/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz", sep = ",", 
    header = TRUE, row.names = 1))

# Note that since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))
## [1] TRUE
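
To make the species-collapsing step more concrete, here is a rough base-R sketch of the idea on a toy matrix: keep all human genes, keep only the most highly expressed mouse genes, and strip the HUMAN_ prefix. The helper collapse_species_sketch, its n_mouse argument, and the toy data are hypothetical and only illustrate the logic described in the comments above; this is not the Seurat implementation.

```r
# Toy count matrix: rows are genes with a species prefix, columns are cells.
toy <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    9, 8, 7,
    0, 1, 0,
    4, 4, 4),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("HUMAN_CD19", "HUMAN_CD3E", "MOUSE_Actb", "MOUSE_Xist", "MOUSE_Gapdh"),
    c("cell1", "cell2", "cell3")
  )
)

# Hypothetical helper: not part of Seurat, written only for illustration.
collapse_species_sketch <- function(mat, n_mouse = 2) {
  is_human <- startsWith(rownames(mat), "HUMAN_")
  human <- mat[is_human, , drop = FALSE]
  mouse <- mat[!is_human, , drop = FALSE]
  # Rank mouse genes by total counts and keep the top n_mouse of them
  ranked <- names(sort(rowSums(mouse), decreasing = TRUE))
  top_mouse <- ranked[seq_len(min(n_mouse, length(ranked)))]
  # Drop the 'HUMAN_' prefix; the mouse genes keep their prefix
  rownames(human) <- sub("^HUMAN_", "", rownames(human))
  rbind(human, mouse[top_mouse, , drop = FALSE])
}

collapsed <- collapse_species_sketch(toy, n_mouse = 2)
rownames(collapsed)
```

In the tutorial itself this work is done by CollapseSpeciesExpressionMatrix, which by the comment above keeps the top 100 mouse genes rather than 2.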

Setting up a Seurat object and adding the RNA and protein data

Now we create a Seurat object and add the ADT data as a second assay.

# creates a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)

# We can see that by default, the cbmc object contains an assay storing RNA measurement
Assays(cbmc)

## [1] "RNA"

# create a new assay to store ADT information
adt_assay <- CreateAssayObject(counts = cbmc.adt)

# add this assay to the previously created Seurat object
cbmc[["ADT"]] <- adt_assay

# Validate that the object now contains multiple assays
Assays(cbmc)

## [1] "RNA" "ADT"

# Extract a list of features measured in the ADT assay
rownames(cbmc[["ADT"]])

##  [1] "CD3"    "CD4"    "CD8"    "CD45RA" "CD56"   "CD16"   "CD10"   "CD11c" 
##  [9] "CD14"   "CD19"   "CD34"   "CCR5"   "CCR7"

# Note that we can easily switch back and forth between the two assays to specify the default
# for visualization and analysis

# List the current default assay
DefaultAssay(cbmc)

## [1] "RNA"
# Switch the default to ADT
DefaultAssay(cbmc) <- "ADT"
DefaultAssay(cbmc)
## [1] "ADT"

Clustering cells based on scRNA-seq data

The steps below perform a quick clustering of the CBMCs based on the scRNA-seq data. For more detail on individual steps or more advanced options, see the PBMC guided clustering tutorial.

# Note that all operations below are performed on the RNA assay. Set and verify that the default
# assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)
## [1] "RNA"
# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)

Visualizing multimodal data side by side

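Now that we have obtained clusters from the scRNA-seq profiles, we can visualize the expression of either protein or RNA molecules in the dataset. Importantly, Seurat provides several ways to switch between modalities and to specify which modality you want to analyze or visualize. This matters because the same feature can be present in more than one modality. For example, this dataset contains independent measurements of the B cell marker CD19 at both the protein and the RNA level.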

# Normalize ADT data
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"

# Note that the following command is an alternative but returns the same result
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")
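
As a rough illustration of what a centered log-ratio (CLR) transform with margin = 2 does, the base-R sketch below applies a CLR-style function within each cell (column) of a toy ADT matrix. The function clr_sketch follows our understanding of the formula Seurat uses for this method, but that is an assumption; the exact implementation may differ between versions, and the toy counts are made up.

```r
# Toy ADT counts: rows are proteins, columns are cells.
adt <- matrix(
  c(10, 200,
     0,  50,
    90,   5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD3", "CD19", "CD14"), c("cell1", "cell2"))
)

# CLR-style transform of one vector (assumed form, for illustration only):
# divide by a geometric-mean-like term built from log1p of the positive
# entries, then take log1p of the ratio.
clr_sketch <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))
}

# margin = 2 corresponds to applying the transform within each cell (column),
# so every protein is scaled relative to the other proteins in the same cell.
adt_clr <- apply(adt, MARGIN = 2, FUN = clr_sketch)
round(adt_clr, 3)
```

Normalizing within cells rather than within features is a design choice: it expresses each protein relative to the total antibody signal in that cell, which can compensate for cell-to-cell differences in ADT capture.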

# Now, we will visualize CD19 levels for RNA and protein. By setting the default assay, we can
# visualize one or the other
DefaultAssay(cbmc) <- "ADT"
p1 <- FeaturePlot(cbmc, "CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
DefaultAssay(cbmc) <- "RNA"
p2 <- FeaturePlot(cbmc, "CD19") + ggtitle("CD19 RNA")

# place plots side-by-side
p1 | p2
# Alternatively, we can use assay keys to specify a particular modality. Identify the keys
# for the RNA and protein assays
Key(cbmc[["RNA"]])

## [1] "rna_"

Key(cbmc[["ADT"]])

## [1] "adt_"

# Now, we can include the key in the feature name, which overrides the default assay
p1 <- FeaturePlot(cbmc, "adt_CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
p2 <- FeaturePlot(cbmc, "rna_CD19") + ggtitle("CD19 RNA")
p1 | p2
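
The way a key-prefixed name such as adt_CD19 overrides the default assay can be pictured with the small base-R sketch below. The function resolve_feature and the assay_keys vector are hypothetical and only mimic the behavior described above; they say nothing about how Seurat resolves features internally.

```r
# Assay keys as reported by Key() above
assay_keys <- c(RNA = "rna_", ADT = "adt_")

# Hypothetical helper: if the feature name starts with a known key, use that
# assay and strip the key; otherwise fall back to the default assay.
resolve_feature <- function(feature, keys, default_assay) {
  for (assay in names(keys)) {
    key <- keys[[assay]]
    if (startsWith(feature, key)) {
      return(list(assay = assay, feature = substring(feature, nchar(key) + 1)))
    }
  }
  list(assay = default_assay, feature = feature)
}

keyed <- resolve_feature("adt_CD19", assay_keys, default_assay = "RNA")
plain <- resolve_feature("CD19", assay_keys, default_assay = "RNA")
str(keyed)
str(plain)
```

With the default assay set to RNA, the keyed name resolves to the ADT assay while the bare name stays with RNA, which is why the two FeaturePlot calls above can be made without switching DefaultAssay.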

Identifying cell surface markers for scRNA-seq clusters

We can use the paired CITE-seq measurements to help annotate the clusters derived from scRNA-seq and to identify both protein and RNA markers.

# as we know that CD19 is a B cell marker, we can identify cluster 6 as expressing CD19 on the
# surface
VlnPlot(cbmc, "adt_CD19")
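# we can also identify alternative protein and RNA markers for a cluster through differential
# expression. The calls below use ident.1 = 5, while the comment above refers to cluster 6;
# cluster numbering can shift between runs, so set ident.1 to the cluster that shows CD19
# protein in your own violin plot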
adt_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "ADT")
rna_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "RNA")

head(adt_markers)

##                p_val avg_log2FC pct.1 pct.2     p_val_adj
## CD10   1.161293e-206  0.4512418     1     1 1.509680e-205
## CCR7   2.052649e-189  0.2835441     1     1 2.668443e-188
## CD34   9.647958e-188  0.4379917     1     1 1.254234e-186
## CCR5   4.601039e-150  0.2871257     1     1 5.981350e-149
## CD45RA  6.699498e-86 -2.2198583     1     1  8.709348e-85
## CD14    3.093576e-62 -0.7499958     1     1  4.021649e-61

head(rna_markers)

##               p_val avg_log2FC pct.1 pct.2 p_val_adj
## AC109351.1        0  0.3203893 0.265 0.005         0
## CTD-2090I13.1     0  2.0024376 0.972 0.062         0
## DCAF5             0  0.6637418 0.619 0.055         0
## DYNLL2            0  2.0387603 0.984 0.094         0
## FAM186B           0  0.3000479 0.244 0.002         0
## HIST2H2AB         0  1.3104432 0.812 0.013         0
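
The p_val_adj column of the ADT table above is consistent with a Bonferroni correction over the 13 features of the ADT assay: each adjusted value is the raw p-value multiplied by 13. The check below is a sketch that uses values copied from the table. That the number of comparisons is the total feature count of the assay is an inference from these numbers, not something the table states.

```r
# Raw and adjusted p-values copied from the first three rows of head(adt_markers)
p_val <- c(CD10 = 1.161293e-206, CCR7 = 2.052649e-189, CD34 = 9.647958e-188)
reported_adj <- c(CD10 = 1.509680e-205, CCR7 = 2.668443e-188, CD34 = 1.254234e-186)

# Assumption: the number of comparisons equals the 13 features in the ADT assay
n_features <- 13
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = n_features)

# Compare on a relative scale, because the values are far too small for an
# absolute tolerance to be meaningful
rel_err <- abs(p_val_adj / reported_adj - 1)
all(rel_err < 1e-5)
```

The same reasoning explains why the RNA table reports 0 for both columns: a raw p-value that has underflowed to 0 stays 0 after multiplication.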

Additional visualizations of multimodal data

# Draw ADT scatter plots (like biaxial plots for FACS). Note that you can even 'gate' cells if
# desired by using HoverLocator and FeatureLocator
FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3")
# view relationship between protein and RNA
FeatureScatter(cbmc, feature1 = "adt_CD3", feature2 = "rna_CD3E")
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8")
# Let's look at the raw (non-normalized) ADT counts. You can see the values are quite high,
# particularly in comparison to RNA values. This is due to the significantly higher protein copy
# number in cells, which significantly reduces 'drop-out' in ADT data
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8", slot = "counts")

Loading multimodal data from 10x Genomics

Seurat can also analyze data from multimodal 10x experiments processed with CellRanger v3. As an example, we recreate the plots above using a dataset of 7,900 peripheral blood mononuclear cells (PBMCs), freely available from 10x Genomics.

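# Read the 10x output; for a multimodal run, Read10X returns a list with one matrix per feature type
pbmc10k.data <- Read10X(data.dir = "../data/pbmc10k/filtered_feature_bc_matrix/")
# Strip the TotalSeqB suffix from the antibody names so that features read, e.g., 'CD3'
rownames(x = pbmc10k.data[["Antibody Capture"]]) <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", 
    x = rownames(x = pbmc10k.data[["Antibody Capture"]]))

# Build the object from the gene expression matrix, dropping rarely detected genes and
# low-complexity cells
pbmc10k <- CreateSeuratObject(counts = pbmc10k.data[["Gene Expression"]], min.cells = 3, min.features = 200)
pbmc10k <- NormalizeData(pbmc10k)
# Add the ADT counts, restricted to the cells that passed the filters above
pbmc10k[["ADT"]] <- CreateAssayObject(pbmc10k.data[["Antibody Capture"]][, colnames(x = pbmc10k)])
pbmc10k <- NormalizeData(pbmc10k, assay = "ADT", normalization.method = "CLR")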

plot1 <- FeatureScatter(pbmc10k, feature1 = "adt_CD19", feature2 = "adt_CD3", pt.size = 1)
plot2 <- FeatureScatter(pbmc10k, feature1 = "adt_CD4", feature2 = "adt_CD8a", pt.size = 1)
plot3 <- FeatureScatter(pbmc10k, feature1 = "adt_CD3", feature2 = "CD3E", pt.size = 1)
(plot1 + plot2 + plot3) & NoLegend()
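
One detail of the code above is easy to miss: because min.features removes some cells from the RNA data, the ADT matrix is subset to colnames(pbmc10k) before it is added, so that both assays cover the same barcodes. The base-R sketch below shows the same pattern on toy matrices. The data and the threshold of 2 detected genes are made up for illustration.

```r
# Toy RNA and ADT count matrices sharing the same four cell barcodes
rna <- matrix(c(1, 0, 3, 0,
                0, 0, 2, 0,
                4, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"), paste0("bc", 1:4)))
adt <- matrix(c(10, 20, 30, 40,
                 5,  6,  7,  8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("CD3", "CD19"), paste0("bc", 1:4)))

# Keep cells with at least 2 detected genes (a stand-in for min.features)
keep_cells <- colnames(rna)[colSums(rna > 0) >= 2]
rna_filtered <- rna[, keep_cells, drop = FALSE]

# Subset the ADT matrix by barcode name, not by position, so that the two
# matrices stay aligned even if their column order differs
adt_filtered <- adt[, colnames(rna_filtered), drop = FALSE]
identical(colnames(rna_filtered), colnames(adt_filtered))
```

Indexing by name also fails loudly when a barcode is missing from the ADT matrix, which is usually preferable to a silent misalignment.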