Handling Seurat objects

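Seurat is one of the most widely used packages for single-cell analysis. Handling Seurat objects is one of the harder parts of an analysis, so I have collected some common operations on Seurat objects based on my own understanding. Corrections and suggestions are welcome~
First, create a Seurat object from 10X data or other data. The code below is copied directly from the official tutorial (https://satijalab.org/seurat/essential_commands.html), but other code works too.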

library(Seurat)
# Read the 10X count matrix (genes x cells) and build a Seurat object from it
pbmc.counts <- Read10X(data.dir = "~/Downloads/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
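The example below is a rough base-R illustration on a made-up toy matrix, not real 10X data. The count matrix returned by Read10X has genes as rows and cell barcodes as columns. When CreateSeuratObject builds the object, it records two values per cell: the total counts (nCount_RNA) and the number of detected genes (nFeature_RNA). The gene names, barcodes and counts below are hypothetical.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns); names and values are hypothetical
counts <- matrix(
  c(0, 3, 1, 0,
    2, 0, 0, 5,
    1, 1, 1, 1),
  nrow = 4,
  dimnames = list(c("MS4A1", "CD3E", "LYZ", "NKG7"),
                  c("AAAC-1", "AAAG-1", "AACT-1"))
)

# Per-cell summaries analogous to nCount_RNA and nFeature_RNA
nCount   <- colSums(counts)       # total counts in each cell
nFeature <- colSums(counts > 0)   # number of genes with at least one count
meta <- data.frame(nCount_RNA = nCount, nFeature_RNA = nFeature)
print(meta)
```

Because the matrix is genes x cells, column sums give per-cell values. This is why these columns appear in meta.data with one row per cell.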

Next, run ?seurat in RStudio to open the help page:

Each Seurat object has a number of slots which store information. Key slots to access are listed below.

Slots:
raw.data: The raw project data
data: The normalized expression matrix (log-scale)
scale.data: Scaled (default is z-scoring each gene) expression matrix; used for dimensional reduction and heatmap visualization
var.genes: Vector of genes exhibiting high variance across single cells
is.expr: Expression threshold to determine if a gene is expressed (0 by default)
ident: The 'identity class' for each cell
meta.data: Contains meta-information about each cell, starting with number of genes detected (nGene) and the original identity class (orig.ident); more information is added using AddMetaData
project.name: Name of the project (for record keeping)
dr: List of stored dimensional reductions; named by technique
assay: List of additional assays for multimodal analysis; named by technique
hvg.info: The output of the mean/variability analysis for all genes
imputed: Matrix of imputed gene scores
cell.names: Names of all single cells (column names of the expression matrix)
cluster.tree: List where the first element is a phylo object containing the phylogenetic tree relating different identity classes
snn: Sparse matrix object representation of the SNN graph
calc.params: Named list to store all calculation-related parameter choices
kmeans: Stores output of gene-based clustering from DoKMeans
spatial: Stores internal data and calculations for spatial mapping of single cells
misc: Miscellaneous spot to store any data alongside the object (for example, gene lists)
version: Version of package used in object creation

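In practice an object has far fewer of these variables. The list above comes from the help page of the older Seurat v2 `seurat` class. Objects created with Seurat v3 or later keep their data in slots such as assays, meta.data, reductions and graphs. Use @ to access a slot and $ to access a column of meta.data.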
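The difference between @ and $ can be illustrated with a toy S4 class, sketched below. This class is hypothetical and only mimics the idea. In a Seurat object, @ reads a slot, while $ is a method that Seurat defines to read a metadata column. The toy class defines a similar `$` method, but this is not Seurat's implementation.

```r
library(methods)

# Hypothetical toy class, NOT Seurat's real class: it only mimics having slots
setClass("ToyObject", slots = c(counts = "matrix", meta.data = "data.frame"))

# Mimic Seurat's behaviour of `$` returning a meta.data column
setMethod("$", "ToyObject", function(x, name) x@meta.data[[name]])

toy <- new("ToyObject",
           counts = matrix(1:6, nrow = 2,
                           dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))),
           meta.data = data.frame(orig.ident = c("s1", "s1", "s2"),
                                  row.names = c("c1", "c2", "c3"),
                                  stringsAsFactors = FALSE))

slotNames(toy)     # the slots, which are accessed with @
toy@counts         # a slot, accessed via @
toy$orig.ident     # a metadata column, accessed via the `$` method
```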

[Figure: screenshot of the variables accessible with $ (these may differ slightly between datasets)]

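Of the variables above, the ones I use later in the analysis are orig.ident, group and seurat_clusters. They store the sample name, the group and the cluster assignment, respectively.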

1. Retrieving basic information

First, print the Seurat object directly:

> pbmc # test data; PCA and UMAP have already been run
An object of class Seurat 
25540 features across 46636 samples within 2 assays 
Active assay: integrated (2000 features, 2000 variable features)
 1 other assay present: RNA
 2 dimensional reductions calculated: pca, umap

Some basic information that can be queried and extracted:

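colnames(x = pbmc)  # cell barcodes (one per cell)
Cells(pbmc)   # same as above: the cell barcodes
rownames(x = pbmc)   # gene (feature) names
ncol(x = pbmc)   # number of columns, i.e. cells
nrow(x = pbmc)   # number of rows, i.e. features
dim(pbmc)  # number of rows and columns
# Get the identity class (e.g. cell type) of each cell
Idents(object = pbmc)
levels(pbmc)
table(Idents(pbmc))  # number of cells in each identity class
# Other operations on identity classes
# Stash cell identity classes (the two lines below are equivalent)
pbmc[["old.ident"]] <- Idents(object = pbmc)
pbmc <- StashIdent(object = pbmc, save.name = "old.ident")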

# Set identity classes
Idents(object = pbmc) <- "CD4 T cells"                # all cells
Idents(object = pbmc, cells = 1:10) <- "CD4 T cells"  # only the first 10 cells

# Set identity classes to an existing column in meta data
Idents(object = pbmc, cells = 1:10) <- "orig.ident"
Idents(object = pbmc) <- "orig.ident"

# Rename identity classes
pbmc <- RenameIdents(object = pbmc, `CD4 T cells` = "T Helper cells")


levels(pbmc) returns all identity classes (e.g. cell types) directly.
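Idents(pbmc) returns a factor with one entry per cell, named by barcode. As a result, levels(), table() and relabelling behave much as they do on a plain R factor. The sketch below is a base-R illustration with made-up barcodes and labels. Renaming a level here mirrors the effect of RenameIdents, but it is not how that function is implemented.

```r
# Toy identities: a named factor, one entry per cell (barcodes are hypothetical)
idents <- factor(c("CD4 T cells", "B cells", "CD4 T cells", "NK cells"),
                 levels = c("CD4 T cells", "B cells", "NK cells"))
names(idents) <- c("AAAC-1", "AAAG-1", "AACT-1", "AAGA-1")

levels(idents)   # all identity classes, like levels(pbmc)
table(idents)    # cells per class, like table(Idents(pbmc))

# Rename one class; every cell carrying that level is relabelled at once
levels(idents)[levels(idents) == "CD4 T cells"] <- "T Helper cells"
idents
```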

2. Filtering with subset

# Keep one or more identity classes (e.g. cell types)
subset(x = pbmc, idents = "B cells")
# invert = TRUE keeps every cell EXCEPT those in the listed classes
subset(x = pbmc, idents = c("CD4 T cells", "CD8 T cells"), invert = TRUE)
# Cells can also be filtered on expression values
# Subset on the expression level of a gene/feature
subset(x = pbmc, subset = MS4A1 > 3)
# Subset on a combination of criteria (PC_1 is the first PCA component; the prefix comes from Key(pbmc))
subset(x = pbmc, subset = MS4A1 > 3 & PC_1 > 5)
subset(x = pbmc, subset = MS4A1 > 3, idents = "B cells")
# Subset on a value in the object meta data
subset(x = pbmc, subset = orig.ident == "Replicate1")
# Downsample the number of cells per identity class
subset(x = pbmc, downsample = 100)
# Filter genes (pbmc_small is the example dataset shipped with Seurat)
subset(x = pbmc_small, features = VariableFeatures(object = pbmc_small))

# Bracket indexing also works: object[features, cells]
# (assumes a seurat_clusters column, which FindClusters creates)
pbmc_small_sub <- pbmc_small[, pbmc_small@meta.data$seurat_clusters %in% c(0, 2)]
# Requires Idents(pbmc_small) to currently hold cell-type labels ("T cell" and "B cell" are example labels)
pbmc_small_sub <- pbmc_small[, Idents(pbmc_small) %in% c("T cell", "B cell")]
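The selections above reduce to picking columns (cells) of a genes x cells matrix with a logical vector. The base-R sketch below reproduces them on a made-up matrix. The last part keeps at most n cells per class. This only approximates the idea behind `downsample`; it is not Seurat's code, and `downsample_cells` is a hypothetical helper.

```r
set.seed(42)  # make the random downsampling reproducible

# Toy log-normalised expression matrix: genes x cells (values are made up)
expr <- matrix(c(4.1, 0.0, 0.5,
                 0.2, 3.3, 1.0,
                 3.8, 0.1, 0.0,
                 0.0, 2.9, 2.2),
               nrow = 3,
               dimnames = list(c("MS4A1", "CD3E", "LYZ"),
                               c("c1", "c2", "c3", "c4")))
idents <- factor(c("B cells", "T cells", "B cells", "T cells"))
names(idents) <- colnames(expr)

# Keep cells of selected classes (like idents = ... or obj[, Idents(obj) %in% ...])
b_only <- expr[, idents %in% "B cells", drop = FALSE]

# Keep the complement (like invert = TRUE)
not_b <- expr[, !(idents %in% "B cells"), drop = FALSE]

# Keep cells by an expression threshold (like subset = MS4A1 > 3)
ms4a1_high <- expr[, expr["MS4A1", ] > 3, drop = FALSE]

# Hypothetical helper: keep at most n randomly chosen cells per class
downsample_cells <- function(idents, n) {
  per_class <- split(names(idents), idents)
  unlist(lapply(per_class, function(cells) {
    if (length(cells) > n) sample(cells, n) else cells
  }), use.names = FALSE)
}
kept <- downsample_cells(idents, 1)
```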

3. Retrieving data

# Read the data stored in @meta.data
# View metadata data frame, stored in object@meta.data
pbmc[[]]

# Extract specific metadata columns
# Retrieve specific values from the metadata
pbmc$nCount_RNA
# percent.mito exists only if it was computed earlier (e.g. with PercentageFeatureSet)
pbmc[[c("percent.mito", "nFeature_RNA")]]

# Add group information
# Add metadata, see ?AddMetaData
random_group_labels <- sample(x = c("g1", "g2"), size = ncol(x = pbmc), replace = TRUE)
pbmc$groups <- random_group_labels
# Use GetAssayData to get the 'counts', 'data' and 'scale.data' matrices
# Retrieve or set data in an expression matrix ('counts', 'data', and 'scale.data')
# (Seurat v5 replaces the slot argument with layer)
GetAssayData(object = pbmc, slot = "counts")
# new.data is a placeholder for your own matrix, with cells as columns
pbmc <- SetAssayData(object = pbmc, slot = "scale.data", new.data = new.data)
# Get cell embeddings and feature loadings
Embeddings(object = pbmc, reduction = "pca")
Loadings(object = pbmc, reduction = "pca")
Loadings(object = pbmc, reduction = "pca", projected = TRUE)
# FetchData can pull anything from expression matrices, cell embeddings, or metadata
FetchData(object = pbmc, vars = c("PC_1", "percent.mito", "MS4A1"))

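Variable names can change between versions. You can look up the prefix that FetchData expects (such as PC_ in PC_1) with Key(pbmc), which lists the key of each assay and dimensional reduction in the object.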
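FetchData returns a data frame with one row per cell and one column per requested variable. It draws these from the expression matrices, the embeddings (whose columns are named with the reduction key, e.g. PC_1) and the metadata. The base-R sketch below assembles such a frame by hand from toy pieces, including a random group column like the one added earlier. All values are hypothetical, and this is not how FetchData is implemented.

```r
set.seed(1)  # reproducible random group labels

cells <- c("c1", "c2", "c3")

# Toy pieces of an object (all values are hypothetical)
expr <- matrix(c(3.2, 0.0, 1.5,
                 0.4, 2.1, 0.0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("MS4A1", "CD3E"), cells))       # genes x cells
pca <- matrix(c(5.1, -1.2, 0.3,
                0.7,  2.4, -0.9),
              ncol = 2,
              dimnames = list(cells, c("PC_1", "PC_2")))         # cells x components, key "PC_"
meta <- data.frame(percent.mito = c(2.5, 4.0, 1.1), row.names = cells)

# Add a random group label per cell, as in the random_group_labels example
meta$groups <- sample(c("g1", "g2"), size = nrow(meta), replace = TRUE)

# One row per cell, one column per requested variable
fetched <- data.frame(
  PC_1 = pca[cells, "PC_1"],
  percent.mito = meta[cells, "percent.mito"],
  MS4A1 = expr["MS4A1", cells],
  row.names = cells
)
fetched
```

Note that genes must be read from a row of the expression matrix, whereas embeddings and metadata are read from columns. A single function that hides this difference is convenient.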

4. Calculations

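# Average expression (scRNA_data is your Seurat object)
# Choose the grouping to average over: cell type ("CellType"), cluster ("seurat_clusters")
# or sample ("orig.ident"). These column names are examples; use the ones in your own metadata.
Idents(scRNA_data) <- "seurat_clusters"
AverageExp <- AverageExpression(scRNA_data)
expr <- AverageExp$RNA   # one genes x groups matrix per assay; take the RNA one
# Add a "Cluster" prefix to each group name (vectorised form of a for loop over the columns)
colnames(expr) <- paste0("Cluster", colnames(expr))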
[Figure: screenshot of the expr matrix]
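Seurat's documentation for AverageExpression (v3/v4) states that averaging is done in non-log space. In other words, the log-normalised values are exponentiated before the mean is taken. The base-R sketch below reproduces that idea on a made-up matrix. It undoes log1p with expm1, averages each gene within each cluster, and then adds the "Cluster" prefix with paste0. The data and cluster labels are hypothetical, and this is a sketch rather than Seurat's implementation.

```r
# Toy log-normalised data: 2 genes x 4 cells (raw values are hypothetical)
raw <- matrix(c(0, 2,
                4, 0,
                2, 6,
                0, 4),
              nrow = 2,
              dimnames = list(c("MS4A1", "CD3E"), paste0("cell", 1:4)))
log_data <- log1p(raw)
clusters <- factor(c("0", "0", "1", "1"))

# Average each gene within each cluster after undoing log1p with expm1
expr <- sapply(levels(clusters), function(cl) {
  rowMeans(expm1(log_data[, clusters == cl, drop = FALSE]))
})

# Add the "Cluster" prefix to every column name in one step
colnames(expr) <- paste0("Cluster", colnames(expr))
expr
```

Averaging the log values directly would give different numbers. Keep this in mind when comparing AverageExpression output with your own group means.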

5. Replacing/modifying data

Sometimes the data in a Seurat object needs to be replaced or modified.

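library(Seurat)
# Rename cell IDs: change the trailing "-1" to "_1" ("$" anchors the match to the end of the barcode)
new_obj <- RenameCells(obj, new.names = sub("-1$", "_1", colnames(obj)))
# With several samples, select the cells that belong to one sample
barcode_names <- obj$orig.ident   # sample name per cell, named by cell barcode
sampleA_barcode_name <- names(barcode_names)[barcode_names == "sampleA"]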
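The two string operations above can be checked in base R on a few made-up barcodes. An unanchored pattern such as `gsub("-1", "_1", ...)` also rewrites longer suffixes like `-10`, which can occur when many samples are combined. Anchoring the pattern with `$` replaces only a trailing `-1`. The second part shows the selection of barcodes by sample on a named vector, which stands in for `obj$orig.ident`. The barcodes and sample names are hypothetical.

```r
barcodes <- c("AAAC-1", "GGTT-10", "TTGC-1")

gsub("-1", "_1", barcodes)   # unanchored: also rewrites the "-10" suffix
sub("-1$", "_1", barcodes)   # anchored: only a trailing "-1" is replaced

# A named vector like obj$orig.ident: values are sample names, names are barcodes
orig_ident <- c("AAAC-1" = "sampleA", "GGTT-10" = "sampleB", "TTGC-1" = "sampleA")
sampleA_barcodes <- names(orig_ident)[orig_ident == "sampleA"]
sampleA_barcodes
```

The same `names(x)[x == "sampleA"]` pattern works when `x` is a named factor, which is what Seurat usually stores for orig.ident.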

Some references:
1. https://satijalab.org/seurat/essential_commands.html
2. https://satijalab.org/seurat/v3.0/interaction_vignette.html
3. http://www.itdecent.cn/p/d43f16bdfed9
