Seurat - Dimensional Reduction Vignette
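As we know, a defining characteristic of single-cell transcriptome data is that it is sparse and high-dimensional. For this reason, Seurat provides a number of dimensional reduction methods.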

The main three are PCA, t-SNE, and UMAP, although many more dimensional reduction methods exist.

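So how do we proceed if we want to apply a different dimensional reduction method to our data? Today we walk through multi-dimensional scaling (MDS) on a Seurat object. If we require that the distances between samples in the original space be preserved in the low-dimensional space, we obtain "multidimensional scaling" (Multiple Dimensional Scaling, MDS for short). We use it here to explore the general approach to dimensional reduction and to get to know Seurat's data structures better.
What, you still haven't figured out PCA, t-SNE, and UMAP? And what does MDS mean? Have a look at the notes Yunlai took during his previous love affair (with numerical ecology):
Numerical Ecology Notes || Unconstrained Ordination | NMDS
Dimensional reduction structure in Seurat 3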

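In Seurat v3.0, storing and interacting with dimensional reduction information has been generalized and formalized into the DimReduc object. Each dimensional reduction procedure is stored as a DimReduc object in the object@reductions slot, as an element of a named list. These reductions can be accessed with the [[ operator, using the name of the desired reduction. For example, after running a principal component analysis with RunPCA, object[['pca']] will contain the results of the PCA. By adding new elements to the list, users can add additional, custom dimensional reductions. Each stored dimensional reduction contains the following slots: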
- cell.embeddings: Stores the coordinates for each cell in low-dimensional space.
- feature.loadings: Stores the weight for each feature along each dimension of the embedding.
- feature.loadings.projected: Seurat typically calculates the dimensional reduction on a subset of genes (for example, high-variance genes) and then projects that structure onto the entire dataset (all genes). The results of that projection (calculated with ProjectDim) are stored in this slot. Note that the cell embeddings remain unchanged after projection, but there are now feature loadings for all features.
- stdev: The standard deviations of each dimension. Most often used with PCA (storing the square roots of the eigenvalues of the covariance matrix); useful when looking at the drop-off in the amount of variance explained by each successive dimension.
- key: Sets the column names for the cell.embeddings and feature.loadings matrices. For example, for PCA the column names are PC_1, PC_2, etc., so the key is "PC_".
- jackstraw: Stores the results of the jackstraw procedure run using this dimensional reduction technique. Currently supported only for PCA.
- misc: Bonus slot to store any other information you might want.
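The description of the stdev slot above can be checked with a small base R sketch. It uses prcomp on a synthetic matrix, not Seurat's RunPCA, so the data and variable names (x, pca, eig, var_explained) are illustrative assumptions; it only shows that per-component standard deviations are the square roots of the covariance eigenvalues, and how they translate into variance explained.

```r
# Synthetic toy data: 50 "cells" (rows) by 6 "features" (columns).
set.seed(42)
x <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)

# PCA with base R; pca$sdev holds the standard deviation of each component.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Eigenvalues of the sample covariance matrix, in decreasing order.
eig <- eigen(cov(x), symmetric = TRUE)$values

# The standard deviations equal the square roots of the eigenvalues
# (up to floating-point error).
print(all.equal(pca$sdev, sqrt(eig)))

# Fraction of total variance explained by each successive dimension.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

Plotting var_explained against the component index gives the same kind of drop-off curve that the stdev slot is typically used for.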
To access these slots, Seurat provides the Embeddings, Loadings, and Stdev functions:
library(Seurat)
pbmc_small[["pca"]]
A dimensional reduction object with key PC_
Number of dimensions: 19
Projected dimensional reduction calculated: TRUE
Jackstraw run: TRUE
Computed using assay: RNA
Let's inspect the object with the corresponding accessor functions:
> head(Embeddings(pbmc_small, reduction = "pca")[, 1:5]) # PCA coordinates of each cell
PC_1 PC_2 PC_3 PC_4 PC_5
ATGCCAGAACGACT -0.77403708 -0.8996461 -0.2493078 0.5585948 0.4650838
CATGGCCTGTGCAT -0.02602702 -0.3466795 0.6651668 0.4182900 0.5853204
GAACCTGATGAACC -0.45650250 0.1795811 1.3175907 2.0137210 -0.4818851
TGACTGGATTCTCA -0.81163243 -1.3795340 -1.0019320 0.1390503 -1.5982232
AGTCAGACTGCACA -0.77403708 -0.8996461 -0.2493078 0.5585948 0.4650838
TCTGATACACGTGT -0.77403708 -0.8996461 -0.2493078 0.5585948 0.4650838
> head(Loadings(pbmc_small, reduction = "pca")[, 1:5]) # loading of each gene on each principal component
PC_1 PC_2 PC_3 PC_4 PC_5
PPBP 0.33832535 0.04095778 0.02926261 0.03111034 -0.090420744
IGLL5 -0.03504289 0.05815335 -0.29906272 0.54744454 0.214603428
VDAC3 0.11990482 -0.10994433 -0.02386025 0.06015126 -0.809207588
CD1C -0.04690284 0.19835522 -0.35090617 -0.51112169 -0.130306281
AKR1C3 -0.03894635 -0.42880452 0.08845847 -0.27274386 0.087791646
PF4 0.34392057 0.02474860 -0.02519515 -0.01231411 -0.006725932
> head(Stdev(pbmc_small, reduction = "pca")) # standard deviation of each dimension
[1] 2.7868782 1.6145733 1.3162945 1.1241143 1.0347596 0.9876531
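Seurat provides RunPCA (pca) and RunTSNE (tsne), which represent dimensional reduction techniques commonly applied to scRNA-seq data. When you use these functions, all slots are filled automatically.
Seurat also lets users add the results of a custom dimensional reduction technique computed separately (for example, multidimensional scaling (MDS) or zero-inflated factor analysis). All you need is a matrix containing the coordinates of each cell in low-dimensional space, as shown below.
Storing a custom dimensional reduction calculation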
Classical (Metric) Multidimensional Scaling
Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower, 1966).
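To make the distance-preservation idea concrete, here is a minimal base R sketch on synthetic data (the matrix x and its cell/gene names are made up for illustration). It assumes Euclidean distances, as in the rest of this post: with as many MDS dimensions as the data has, cmdscale reproduces the original distances, while k = 2 only approximates them.

```r
# Synthetic toy data: 30 "cells" (rows) by 5 "genes" (columns).
set.seed(1)
x <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5,
            dimnames = list(paste0("cell", 1:30), paste0("gene", 1:5)))

# Pairwise Euclidean distances between cells.
d <- dist(x)

# With k equal to the dimensionality of the data, classical MDS
# recovers the pairwise distances (up to floating-point error).
mds_full <- cmdscale(d, k = 5)
print(all.equal(as.vector(dist(mds_full)), as.vector(d)))

# With k = 2 the distances are only approximated; the correlation between
# original and embedded distances is one rough measure of the fit.
mds_2d <- cmdscale(d, k = 2)
print(cor(as.vector(dist(mds_2d)), as.vector(d)))

# cmdscale keeps the cell names as row names of the embedding.
print(head(rownames(mds_2d)))
```

Because the row names carry over from the distance object, the resulting matrix can be matched back to cells by name.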
Although it is not part of the Seurat package, multidimensional scaling (MDS) is easy to run in R. If you are interested in running MDS and storing the output in your Seurat object:
# Before running MDS, we first calculate a distance matrix between all pairs of cells. Here we
# use a simple euclidean distance metric on all genes, using scale.data as input
d <- dist(t(GetAssayData(pbmc_small, slot = "scale.data")))
# Run the MDS procedure, k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)
head(mds)
[,1] [,2]
ATGCCAGAACGACT 0.77403708 -0.8996461
CATGGCCTGTGCAT 0.02602702 -0.3466795
GAACCTGATGAACC 0.45650250 0.1795811
TGACTGGATTCTCA 0.81163243 -1.3795340
AGTCAGACTGCACA 0.77403708 -0.8996461
TCTGATACACGTGT 0.77403708 -0.8996461
# cmdscale returns the cell embeddings, we first label the columns to ensure downstream
# consistency
colnames(mds) <- paste0("MDS_", 1:2)
# We will now store this as a custom dimensional reduction called 'mds'
pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, key = "MDS_", assay = DefaultAssay(pbmc_small))
pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features)
3 dimensional reductions calculated: pca, tsne, mds
Our object now contains the mds reduction. Next we visualize it just as we would pca, tsne, or umap:
# We can now use this as you would any other dimensional reduction in all downstream functions
DimPlot(pbmc_small, reduction = "mds", pt.size = 0.5)

pbmc_small <- ProjectDim(pbmc_small, reduction = "mds")
MDS_ 1
Positive: HLA-DPB1, HLA-DQA1, S100A9, S100A8, GNLY, RP11-290F20.3, CD1C, AKR1C3, IGLL5, VDAC3
PARVB, RUFY1, PGRMC1, MYL9, TREML1, CA2, TUBB1, PPBP, PF4, SDPR
Negative: SDPR, PF4, PPBP, TUBB1, CA2, TREML1, MYL9, PGRMC1, RUFY1, PARVB
VDAC3, IGLL5, AKR1C3, CD1C, RP11-290F20.3, GNLY, S100A8, S100A9, HLA-DQA1, HLA-DPB1
MDS_ 2
Positive: HLA-DPB1, HLA-DQA1, S100A8, S100A9, CD1C, RP11-290F20.3, PARVB, IGLL5, MYL9, SDPR
PPBP, CA2, RUFY1, TREML1, PF4, TUBB1, PGRMC1, VDAC3, AKR1C3, GNLY
Negative: GNLY, AKR1C3, VDAC3, PGRMC1, TUBB1, PF4, TREML1, RUFY1, CA2, PPBP
SDPR, MYL9, IGLL5, PARVB, RP11-290F20.3, CD1C, S100A9, S100A8, HLA-DQA1, HLA-DPB1
Warning message:
In print.DimReduc(x = redeuc, dims = dims.print, nfeatures = nfeatures.print, :
Only 2 dimensions have been computed.
# Display the results as a heatmap
DimHeatmap(pbmc_small, reduction = "mds", dims = 1, cells = 500, projected = TRUE, balanced = TRUE)

VlnPlot(pbmc_small, features = "MDS_1")

See how the MDS_1 dimension correlates with the PC_1 dimension:
# See how the first MDS dimension is correlated with the first PC dimension
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "PC_1")

FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "tSNE_1")
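A tight linear relationship between MDS_1 and PC_1 is expected rather than surprising: classical MDS on Euclidean distances gives the same coordinates as PCA on the centered data, up to the sign of each axis. That is consistent with the head(mds) output earlier, whose values match the PCA embeddings with the sign of the first column flipped. The sketch below illustrates this with base R on synthetic data; it uses prcomp rather than Seurat's RunPCA, and all variable names are illustrative.

```r
# Synthetic toy data: 40 "cells" (rows) by 8 "features" (columns).
set.seed(7)
x <- matrix(rnorm(40 * 8), nrow = 40, ncol = 8)

# First two principal component scores (cells in PC space).
pcs <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:2]

# Classical MDS on Euclidean distances between the same cells.
mds <- cmdscale(dist(x), k = 2)

# Each MDS axis is perfectly correlated or anti-correlated with the
# matching PC; the sign of an axis is arbitrary in both methods.
print(abs(cor(mds[, 1], pcs[, 1])))
print(abs(cor(mds[, 2], pcs[, 2])))

# Ignoring sign, the coordinates themselves coincide.
print(all.equal(abs(unname(mds)), abs(unname(pcs))))
```

This equivalence holds for Euclidean distances only; with another distance metric passed to cmdscale, the MDS and PCA embeddings would generally differ.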
