Compiled: 2021-03-18
Source: vignettes/multimodal_vignette.Rmd
教程鏈接:https://satijalab.org/seurat/articles/multimodal_vignette.html
對(duì)同一個(gè)細(xì)胞同時(shí)測(cè)量多種數(shù)據(jù)類型,稱為multimodal analysis。例如,
- CITE-seq可以同時(shí)測(cè)量同一細(xì)胞的轉(zhuǎn)錄組和細(xì)胞表面蛋白。
- 10x multiome試劑盒允許成對(duì)測(cè)量細(xì)胞轉(zhuǎn)錄組和染色質(zhì)可及性(即scRNA-seq+scATAC-seq)
我們?cè)O(shè)計(jì)了Seurat4以實(shí)現(xiàn)對(duì)多種多模態(tài)單細(xì)胞數(shù)據(jù)集的無(wú)縫存儲(chǔ)、分析和探索。
1.數(shù)據(jù)介紹和加載
此篇教程,我們使用8,617 cord blood mononuclear cells (CBMCs)臍帶血單個(gè)核細(xì)胞,配對(duì)的單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)和11個(gè)表面蛋白。
首先,我們加載進(jìn)來(lái)兩個(gè)count矩陣,一個(gè)是RNA,一個(gè)是antibody-derived tags (ADT)。數(shù)據(jù)可以從這里下載:
- RNA:ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz
- ADT:ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz
# Note that this dataset also contains ~5% of mouse cells, which we can use as negative controls
# for the protein measurements. For this reason, the gene expression matrix has HUMAN_ or MOUSE_
# appended to the beginning of each gene.
cbmc.rna <- as.sparse(read.csv(file = "data/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", sep = ",", header = TRUE, row.names = 1))
# To make life a bit easier going forward, we're going to discard all but the top 100 most
# highly expressed mouse genes, and remove the 'HUMAN_' from the CITE-seq prefix
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)
# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "data/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz", sep = ",", header = TRUE, row.names = 1))
# Note that since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))
2.構(gòu)造Seurat對(duì)象
現(xiàn)在,我們構(gòu)造一個(gè)Seurat對(duì)象,然后將ADT數(shù)據(jù)添加為第二個(gè)assay
# creates a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)
# We can see that by default, the cbmc object contains an assay storing RNA measurement
Assays(cbmc)
[1] "RNA"
# create a new assay to store ADT information
adt_assay <- CreateAssayObject(counts = cbmc.adt)
# add this assay to the previously created Seurat object
cbmc[["ADT"]] <- adt_assay
# Validate that the object now contains multiple assays
Assays(cbmc)
[1] "RNA" "ADT"
提取ADT中測(cè)量的基因
# Extract a list of features measured in the ADT assay
rownames(cbmc[["ADT"]])
[1] "CD3" "CD4" "CD8" "CD45RA" "CD56" "CD16" "CD10" "CD11c" "CD14" "CD19" "CD34" "CCR5" "CCR7"
對(duì)于assay,我們可以在RNA和ADT之間來(lái)回切換以便于后續(xù)的分析和可視化
# List the current default assay
DefaultAssay(cbmc)
# Switch the default to ADT
DefaultAssay(cbmc) <- "ADT"
DefaultAssay(cbmc)
3.基于單細(xì)胞數(shù)據(jù)對(duì)細(xì)胞進(jìn)行聚類
這個(gè)地方使用的是一個(gè)簡(jiǎn)版必要步驟的代碼,省略了很多參數(shù)和可視化的解讀,如果要看詳細(xì)的,還是需要去看那篇入門教程:https://satijalab.org/seurat/articles/pbmc3k_tutorial.html,中文版:http://www.itdecent.cn/p/2b5c5c849ec0
# Note that all operations below are performed on the RNA assay Set and verify that the default assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)
# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)

4.并排可視化多種模式
現(xiàn)在,經(jīng)過(guò)前面的處理,我們有了單細(xì)胞數(shù)據(jù)的聚類信息,我們就可以對(duì)RNA或蛋白的表達(dá)進(jìn)行可視化。值得注意的是:Seurat提供了好幾種方法在不同模態(tài)數(shù)據(jù)之間進(jìn)行切換,這一點(diǎn)尤其重要,因?yàn)樵谀承┣闆r下,相同的特征可以以多種形式出現(xiàn)——例如,該數(shù)據(jù)集包含B細(xì)胞標(biāo)記物CD19(蛋白質(zhì)和RNA水平)的獨(dú)立測(cè)量。
# Normalize ADT data,
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"
# Note that the following command is an alternative but returns the same result
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")
# Now, we will visualize CD14 levels for RNA and protein By setting the default assay, we can
# visualize one or the other
DefaultAssay(cbmc) <- "ADT"
p1 <- FeaturePlot(cbmc, "CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
DefaultAssay(cbmc) <- "RNA"
p2 <- FeaturePlot(cbmc, "CD19") + ggtitle("CD19 RNA")
# place plots side-by-side
p1 | p2

或者,我們可以使用特定的assay關(guān)鍵詞來(lái)指定特定的方式識(shí)別RNA和蛋白質(zhì)assay關(guān)鍵詞
# Alternately, we can use specific assay keys to specify a specific modality Identify the key for the RNA and protein assays
Key(cbmc[["RNA"]])
[1] "rna_"
Key(cbmc[["ADT"]])
[1] "adt_"
現(xiàn)在,我們可以在基因名字中加上key進(jìn)行可視化
# Now, we can include the key in the feature name, which overrides the default assay
p1 <- FeaturePlot(cbmc, "adt_CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
p2 <- FeaturePlot(cbmc, "rna_CD19") + ggtitle("CD19 RNA")
p1 | p2

5.識(shí)別scRNA-seq簇的細(xì)胞表面標(biāo)記物
我們可以利用我們配對(duì)的CITE-seq測(cè)量來(lái)幫助注釋來(lái)自scRNA-seq的cluster,并識(shí)別蛋白質(zhì)和RNA marker。
# as we know that CD19 is a B cell marker, we can identify cluster 5 as expressing CD19 on the surface
VlnPlot(cbmc, "adt_CD19")

我們也可以通過(guò)差異表達(dá)來(lái)篩選候選的cluster的蛋白和RNA marker
adt_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "ADT")
rna_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "RNA")
head(adt_markers)
p_val avg_logFC pct.1 pct.2 p_val_adj
CD19 7.164366e-218 2.0582747 1 1 9.313675e-217
CD45RA 7.330397e-110 0.8163007 1 1 9.529515e-109
CD4 1.736704e-108 -1.1652553 1 1 2.257715e-107
CD14 9.016660e-106 -0.7940111 1 1 1.172166e-104
CD3 9.578480e-89 -1.1131282 1 1 1.245202e-87
CD8 1.218701e-18 -0.8828616 1 1 1.584311e-17
head(rna_markers)
p_val avg_logFC pct.1 pct.2 p_val_adj
IGHM 0 3.260649 0.971 0.044 0
TCL1A 0 2.917187 0.902 0.028 0
CD79A 0 2.888065 0.957 0.045 0
CD79B 0 2.615201 0.945 0.089 0
IGLC2 0 2.591397 0.286 0.005 0
MS4A1 0 2.493754 0.850 0.016 0
6.多模態(tài)數(shù)據(jù)的額外可視化
繪制ADT散點(diǎn)圖(如FACS的雙軸圖)。請(qǐng)注意,如果需要,您甚至可以使用HoverLocator和FeatureLocator來(lái)“gate”細(xì)胞。
FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3")

# view relationship between protein and RNA
FeatureScatter(cbmc, feature1 = "adt_CD3", feature2 = "rna_CD3E")

與RNA數(shù)據(jù)相比,蛋白的原始count要高很多。
# Let's look at the raw (non-normalized) ADT counts. You can see the values are quite high, particularly in comparison to RNA values. This is due to the significantly higher protein copy number in cells, which significantly reduces 'drop-out' in ADT data
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8", slot = "counts")

7.加載來(lái)自10X多模態(tài)實(shí)驗(yàn)的數(shù)據(jù)
Seurat還能夠分析CellRanger v3處理的多模態(tài)10X實(shí)驗(yàn)數(shù)據(jù);例如,我們使用10X Genomics免費(fèi)提供的7,900個(gè)外周血單個(gè)核細(xì)胞(PBMC)數(shù)據(jù)集重建上面的圖。
數(shù)據(jù)下載鏈接:https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_protein_v3/pbmc_10k_protein_v3_filtered_feature_bc_matrix.tar.gz
pbmc10k.data <- Read10X(data.dir = "../data/pbmc10k/filtered_feature_bc_matrix/")
rownames(x = pbmc10k.data[["Antibody Capture"]]) <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", x = rownames(x = pbmc10k.data[["Antibody Capture"]]))
pbmc10k <- CreateSeuratObject(counts = pbmc10k.data[["Gene Expression"]], min.cells = 3, min.features = 200)
pbmc10k <- NormalizeData(pbmc10k)
pbmc10k[["ADT"]] <- CreateAssayObject(pbmc10k.data[["Antibody Capture"]][, colnames(x = pbmc10k)])
pbmc10k <- NormalizeData(pbmc10k, assay = "ADT", normalization.method = "CLR")
plot1 <- FeatureScatter(pbmc10k, feature1 = "adt_CD19", feature2 = "adt_CD3", pt.size = 1)
plot2 <- FeatureScatter(pbmc10k, feature1 = "adt_CD4", feature2 = "adt_CD8a", pt.size = 1)
plot3 <- FeatureScatter(pbmc10k, feature1 = "adt_CD3", feature2 = "CD3E", pt.size = 1)
(plot1 + plot2 + plot3) & NoLegend()

8.Seurat多模態(tài)數(shù)據(jù)的附加功能
Seurat v4還包括用于分析、可視化和集成多模態(tài)數(shù)據(jù)集的附加功能。欲了解更多信息,請(qǐng)瀏覽以下資源:
- Defining cellular identity from multimodal data using WNN analysis in Seurat v4 vignette link
- Mapping scRNA-seq data onto CITE-seq references [vignette]
- Introduction to the analysis of spatial transcriptomics analysis [vignette]
- Analysis of 10x multiome (paired scRNA-seq + ATAC) using WNN analysis [vignette]
- Signac: Analysis, interpretation, and exploration of single-cell chromatin datasets [package]
- Mixscape: an analytical toolkit for pooled single-cell genetic screens [vignette]