【單細(xì)胞測(cè)序數(shù)據(jù)分析-1】認(rèn)識(shí)Seurat對(duì)象數(shù)據(jù)結(jié)構(gòu)/數(shù)據(jù)格式及操作

一文了解單細(xì)胞對(duì)象數(shù)據(jù)結(jié)構(gòu)/數(shù)據(jù)格式,單細(xì)胞數(shù)據(jù)操作不迷茫。本文內(nèi)容包括 單細(xì)胞seurat對(duì)象數(shù)據(jù)結(jié)構(gòu), 內(nèi)容構(gòu)成,對(duì)象的調(diào)用、操作,常見(jiàn)函數(shù)的應(yīng)用等。

1. 單細(xì)胞seurat對(duì)象數(shù)據(jù)結(jié)構(gòu)

單細(xì)胞數(shù)據(jù)結(jié)構(gòu).jpg
  • 此圖引用自周運(yùn)來(lái)老師,稍作注釋。原文地址http://www.itdecent.cn/p/13142bf51e81。里面還有SingleCellExperiment,anndata,h5 數(shù)據(jù)結(jié)構(gòu)的介紹,寫(xiě)的很詳細(xì),推薦閱讀。

2. 數(shù)據(jù)結(jié)構(gòu)及內(nèi)容

Assays

默認(rèn)情況下,我們是對(duì)Seurat中的RNA的Assay進(jìn)行操作??梢酝ㄟ^(guò)@active.assay查看當(dāng)前默認(rèn)的assay,通過(guò)DefaultAssay()更改當(dāng)前的默認(rèn)assay。
結(jié)構(gòu)
counts 存儲(chǔ)原始數(shù)據(jù),是稀疏矩陣
data存儲(chǔ)logNormalize() 規(guī)范化的data??偙磉_(dá)式對(duì)每個(gè)單元格的要素表達(dá)式度量進(jìn)行標(biāo)準(zhǔn)化,將其乘以比例因子(默認(rèn)為10,000),并對(duì)結(jié)果進(jìn)行對(duì)數(shù)轉(zhuǎn)換
scale.data#存儲(chǔ) ScaleData()縮放后的data,此步驟需要時(shí)間久。

meta.data

元數(shù)據(jù),對(duì)每個(gè)細(xì)胞的描述。一般的meta.data包括orig.ident, nCount_RNA, nFeature_RNA, 以及計(jì)算后的percent.mt,RNA_snn_res.0.5等

image.png

可以通過(guò)pbmc$percent.mtpbmc[['percent.mt']]來(lái)調(diào)用、操作

reduction

降維后的每個(gè)細(xì)胞的坐標(biāo)信息,包括pca,tsne,umap等

3. 對(duì)象操作

① 通過(guò)結(jié)構(gòu)圖上的@,$符號(hào)依次取

② 兩個(gè)中括號(hào)操作,pbmc[[ ]]。

教程中,pbmc[['percent.MT']]向meta.data添加 percent.MT 這一列。
pbmc[[]],中括號(hào)取的是上面結(jié)構(gòu)圖中的二級(jí)數(shù)據(jù)名稱

  • 以上兩種方法的區(qū)別是?

如果取的一級(jí)結(jié)構(gòu)Assays的下屬內(nèi)容:

無(wú)差別

 > class(pbmc[['RNA']])
 [1] "Assay"
 attr(,"package")
 [1] "Seurat"
 > class(pbmc@assays$RNA)
 [1] "Assay"
 attr(,"package")
 [1] "Seurat"

如果是一級(jí)結(jié)構(gòu)meta.data里的下屬內(nèi)容:

返回的數(shù)據(jù)類型不同

> class(pbmc[['nCount_RNA']])
[1] "data.frame"
> class(pbmc@meta.data$nCount_RNA)
[1] "numeric"

pbmc[['nCount_RNA']] 取出來(lái),是所有細(xì)胞的nCount_RNA,包含細(xì)胞信息,數(shù)據(jù)框

image.png

pbmc@meta.data$nCount_RNA取出來(lái)的是單獨(dú)nCount_RNA一列,是向量
image.png

如果取的一級(jí)結(jié)構(gòu)里reductions的下屬內(nèi)容:

無(wú)差別

> class(pbmc[['pca']])
 [1] "DimReduc"
 attr(,"package")
 [1] "Seurat"
> class(pbmc@reductions$pca)
 [1] "DimReduc"
 attr(,"package")
 [1] "Seurat"

4. seurat常用函數(shù)的應(yīng)用

之前寫(xiě)了對(duì)seurat對(duì)象的主要 操作列表http://www.itdecent.cn/p/b9078c71f057,這里不贅述。下面主要看數(shù)據(jù)操作。

#數(shù)據(jù)準(zhǔn)備
library(Seurat)
pbmc <- readRDS(file = "pbmc/pbmc3k_final.rds")
set.seed(42)
pbmc$replicate <- sample(c("rep1", "rep2"), size = ncol(pbmc), replace = TRUE) #隨機(jī)給樣本設(shè)置兩個(gè)樣本,分別為rep1,rep2,添加到metadata中。replace =T,有放回抽樣
DimPlot(pbmc, reduction = "umap")

為什么我畫(huà)出來(lái)的和教程上的不一樣?


IMG_1388.JPG
  • 切換細(xì)胞ident,簡(jiǎn)單統(tǒng)計(jì)各樣本細(xì)胞cluster 頻率信息
#切換細(xì)胞ident,從細(xì)胞名稱轉(zhuǎn)為樣本rep1,rep2。繪圖看批次性
pbmc$CellType <- Idents(pbmc)
Idents(pbmc) <- "replicate"
DimPlot(pbmc, reduction = "umap")
Idents(pbmc) <- "CellType" #轉(zhuǎn)變回來(lái),下一步使用
#查看不同樣本、細(xì)胞名稱的 細(xì)胞數(shù)量
table(Idents(pbmc))
table(pbmc$replicate)
prop.table(table(Idents(pbmc)))#計(jì)算每種細(xì)胞頻率
table(Idents(pbmc), pbmc$replicate)#計(jì)算每個(gè)樣本、每種細(xì)胞的數(shù)量
prop.table(table(Idents(pbmc), pbmc$replicate), margin = 2)#計(jì)算每個(gè)重復(fù)、每種細(xì)胞的頻率,margin=2,按照列的值 頻率和為1來(lái)計(jì)算。因?yàn)榱惺菢颖?,每個(gè)樣本的細(xì)胞頻率和為1。
  • 選擇特定細(xì)胞,取子集
#取子集的幾種方法
subset(pbmc, subset = MS4A1 > 1)
subset(pbmc, subset = replicate == "rep2")
subset(pbmc, idents = c("NK", "B"))
subset(pbmc, idents = c("NK", "B"), invert = TRUE) # 提取除了NK,B細(xì)胞之外的細(xì)胞
WhichCells(pbmc, idents = "NK")# 返回是NK細(xì)胞的細(xì)胞名
  • 提取表達(dá)矩陣GetAssayData()
GetAssayData(object, slot, assay) # slot = counts, data, scale.data
GetAssayData(object = pbmc_small[["RNA"]], slot = "data")[1:5,1:5]#出來(lái)的是稀疏矩陣,所以用as.matrix()直接轉(zhuǎn)換

##①?gòu)腁ssay中提取
d <- as.matrix(GetAssayData(object = pbmc_small[["RNA"]], slot = "data")[1:5,1:5])
#前五個(gè)基因和前五個(gè)細(xì)胞

##②從Seurat對(duì)象中提取
f <- as.matrix(GetAssayData(object = pbmc_small, slot = "counts")[1:5,1:5])

#提取NK細(xì)胞的表達(dá)矩陣,用于其他分析
nk.raw.data <- as.matrix(GetAssayData(pbmc, slot = "counts")[, WhichCells(pbmc, ident = "NK")]) #直接提取NK細(xì)胞的原始表達(dá)矩陣
  • 計(jì)算cluster內(nèi)平均基因表達(dá)量AverageExpression()
cluster.averages <- AverageExpression(pbmc) #返回一個(gè)list元素的list。這個(gè)list元素是data.frame的list,cluster.averages$RNA$`Memory CD4 T`
head(cluster.averages[["RNA"]][, 1:5])

#將計(jì)算結(jié)果返回seurat用于下步驟計(jì)算【將空格替換成_,不然ggplot2會(huì)fail】
orig.levels <- levels(pbmc)#clusters
Idents(pbmc) <- gsub(pattern = " ", replacement = "_", x = Idents(pbmc))
orig.levels <- gsub(pattern = " ", replacement = "_", x = orig.levels)
levels(pbmc) <- orig.levels
##levels 和Ident的cluster名稱里的空格都要改
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE)#計(jì)算結(jié)果存在 cluster.averages[['RNA']]@data 中, 是matrix
cluster.averages
CellScatter(cluster.averages, cell1 = "NK", cell2 = "CD8_T")
#添加樣本作為細(xì)胞身份
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE, add.ident = "replicate")
CellScatter(cluster.averages, cell1 = "CD8_T_rep1", cell2 = "CD8_T_rep2")
DoHeatmap(cluster.averages, features = unlist(TopFeatures(pbmc[["pca"]], balanced = TRUE)), size = 3, draw.lines = FALSE) 
  • 對(duì)于其中TopFeatures() 不了解,查了下
#TopFeatures中balanced的參數(shù),看了幫助文檔,沒(méi)明白。試一下。 
#TopFeatures適用對(duì)象就是降維后的對(duì)象,單細(xì)胞測(cè)序中pbmc[['pca']]可用,試了下pbmc[["umap"]],報(bào)錯(cuò)。溢出邊界。
> unlist(TopFeatures(pbmc[["pca"]], balanced = TRUE))
positive1  positive2  positive3  positive4  positive5  positive6  positive7 
"CST3"   "TYROBP"     "LST1"     "AIF1"      "FTL"     "FTH1"      "LYZ" 
positive8  positive9 positive10  negative1  negative2  negative3  negative4 
"FCN1"   "S100A9"     "TYMP"   "MALAT1"      "LTB"     "IL32"     "IL7R" 
negative5  negative6  negative7  negative8  negative9 negative10 
"CD2"      "B2M"    "ACAP1"     "CD27"   "STK17A"     "CTSW" 
> unlist(TopFeatures(pbmc[["pca"]], balanced = FALSE))
[1] "MALAT1"   "SPI1"     "IFITM3"   "SERPINA1" "LGALS2"   "CTSS"    
[7] "S100A8"   "LGALS1"   "CFD"      "FCER1G"   "TYMP"     "S100A9"  
[13] "FCN1"     "LYZ"      "FTH1"     "FTL"      "AIF1"     "LST1"    
[19] "TYROBP"   "CST3"  

參考教程
https://satijalab.org/seurat/archive/v3.1/interaction_vignette.html
http://www.itdecent.cn/p/1a609ee4caef

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容