Seurat包------標(biāo)準(zhǔn)流程

Seurat官網(wǎng)上詳細(xì)的指導(dǎo)完全可以滿足Seurat包初級使用。不過該網(wǎng)站是英文的,為了方便大家迅速上手,我來走一遍標(biāo)準(zhǔn)流程。我用的是Windows 10, R4.0。
我走的流程原網(wǎng)站地址:https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html

首先我們需要在自己的RStudio中安裝Seurat包

install.packages('Seurat')
library('Seurat')
packageVersion("Seurat")
?Seurat

原參考頁面中還使用了一些相關(guān)的R包,所以我們也需要一并安裝上,如果你已經(jīng)安裝了這些包就跳過這一步

install.packages(c('dplyr','patchwork'))

安裝好R包之后,我們要Load進(jìn)來現(xiàn)在的工作環(huán)境

library(dplyr)
library(Seurat)
library(patchwork)

示例數(shù)據(jù)可以在官網(wǎng)下載:https://support.10xgenomics.com/
讀入的數(shù)據(jù)可以是一個矩陣,行代表基因,列代表細(xì)胞。

1.數(shù)據(jù)導(dǎo)入

list.files('pbmc3k_filtered_gene_bc_matrices/filtered_gene_bc_matrices/hg19')
pbmc.counts <- Read10X(data.dir = "pbmc3k_filtered_gene_bc_matrices/filtered_gene_bc_matrices/hg19")

創(chuàng)建Seurat對象

pbmc <- CreateSeuratObject(counts = pbmc.counts)
pbmc
str(pbmc)

數(shù)據(jù)集中測到的少于200個基因的細(xì)胞(min.features = 200)和少于3個細(xì)胞覆蓋的基因(min.cells = 3)被過濾掉

pbmc <- CreateSeuratObject(counts = pbmc.counts, project = "pbmc3k", min.cells = 3, min.features = 200)

2.數(shù)據(jù)質(zhì)控

質(zhì)控的參數(shù)主要有兩個:
1.每個細(xì)胞測到的unique feature數(shù)目(unique feature代表一個細(xì)胞檢測到的基因的數(shù)目,可以根據(jù)數(shù)據(jù)的質(zhì)量進(jìn)行調(diào)整)
2.每個細(xì)胞檢測到的線粒體基因的比例,理論上線粒體基因組與核基因組相比,只占很小一部分。所以線粒體基因表達(dá)比例過高的細(xì)胞會被過濾。

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

nFeature_RNA代表每個細(xì)胞測到的基因數(shù)目,nCount代表每個細(xì)胞測到所有基因的表達(dá)量之和,percent.mt代表測到的線粒體基因的比例。
# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2

去除線粒體基因表達(dá)比例過高的細(xì)胞,和一些極值細(xì)胞。

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

3.標(biāo)準(zhǔn)化

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
#鑒定細(xì)胞間表達(dá)量高變的基因(feature selection)
#這一步的目的是鑒定出細(xì)胞與細(xì)胞之間表達(dá)量相差很大的基因,用于后續(xù)鑒定細(xì)胞類型,
#我們使用默認(rèn)參數(shù),即“vst”方法選取2000個高變基因。
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1+plot2

4.細(xì)胞分類

1)分類前首先要對數(shù)據(jù)集進(jìn)行降維
#Scaling the data
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
#Perform linear dimensional reduction
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
#Examine and visualize PCA results a few different ways
print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")
DimPlot(pbmc, reduction = "pca")
DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)
2)定義數(shù)據(jù)集的“維度”

這里我們需要選擇出主成分的數(shù)目,用于后續(xù)細(xì)胞分類。這里定義的“維度”并不代表細(xì)胞類型的數(shù)目,而是對細(xì)胞分類時需要用到的一個參數(shù)。

#NOTE: This process can take a long time for big datasets, comment out for expediency. 
#More approximate techniques such as those implemented in ElbowPlot() can be used to reduce computation time
pbmc <- JackStraw(pbmc, num.replicate = 100)
pbmc <- ScoreJackStraw(pbmc, dims = 1:20)
JackStrawPlot(pbmc, dims = 1:15)
ElbowPlot(pbmc)
JackStraw
Elbow

JackStraw和Elbow都可以決定數(shù)據(jù)的“維度”。但是Elbow比較直觀,我們選擇Elbow結(jié)果進(jìn)行解讀??梢钥吹?,主成分(PC)7到10之間,數(shù)據(jù)的標(biāo)準(zhǔn)差基本不在下降。所以我們需要在7到10之間進(jìn)行選擇,為了尊重官網(wǎng)的建議,我們選取10,即前10個主成分用于細(xì)胞的分類。

3) 細(xì)胞分類

選擇不同的resolution值可以獲得不同的cluster數(shù)目,值越大cluster數(shù)目越多,默認(rèn)值是0.5.

pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5) 
#這里我們設(shè)置了dims = 1:10 即選取前10個主成分來分類細(xì)胞。分類的結(jié)果如下,可以看到,細(xì)胞被分為9個類別。
#Look at cluster IDs of the first 5 cells
head(Idents(pbmc), 5)
4) 可視化分類結(jié)果

TSNE和UMAP兩種方法經(jīng)常被用于可視化細(xì)胞類別。

#UMAP
pbmc <- RunUMAP(pbmc, dims = 1:10, label = T)
head(pbmc@reductions$umap@cell.embeddings) # 提取UMAP坐標(biāo)值。
#note that you can set `label = TRUE` or use the LabelClusters function to help label individual clusters
p1 <- DimPlot(pbmc, reduction = "umap")
#T-SNE
pbmc <- RunTSNE(pbmc, dims = 1:10)
head(pbmc@reductions$tsne@cell.embeddings)
p2 <- DimPlot(pbmc, reduction = "tsne")
p1 + p2
saveRDS(pbmc, file = "pbmc_tutorial.rds")  #保存數(shù)據(jù),用于后續(xù)個性化分析

5.提取各個細(xì)胞類型的marker gene

利用 FindMarkers 命令,可以找到找到各個細(xì)胞類型中與其他類別的差異表達(dá)基因,作為該細(xì)胞類型的生物學(xué)標(biāo)記基因。其中ident.1參數(shù)設(shè)置待分析的細(xì)胞類別,min.pct表示該基因表達(dá)數(shù)目占該類細(xì)胞總數(shù)的比例

#find all markers of cluster 1
cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)
#利用 DoHeatmap 命令可以可視化marker基因的表達(dá)
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25)
?FindMarkers
top3 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_logFC)
DoHeatmap(pbmc, features = top3$gene) + NoLegend()
DoHeatmap

6.探索感興趣的基因

Seurat提供了許多方法使我們能夠方便的探索感興趣的基因在各個細(xì)胞類型中的表達(dá)情況

VlnPlot(pbmc, features = c("MS4A1", "CD79A"))
#我們能夠看到,MS4A1和CD79A兩個基因在細(xì)胞群體3中特異性表達(dá)。
#you can plot raw counts as well
VlnPlot(pbmc, features = c("NKG7", "PF4"), slot = "counts", log = TRUE)

FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A"))
#這種展示方法把基因表達(dá)量映射到UMAP結(jié)果中,同樣可以直觀的看到基因表達(dá)的特異性。

7.利用先驗(yàn)知識定義細(xì)胞類型

通過對比我們鑒定的marker gene與已發(fā)表的細(xì)胞類型特意的基因表達(dá)marker,可以定義我們劃分出來的細(xì)胞類群。最后,給我們定義好的細(xì)胞類群加上名稱

new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono", "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

今天就分享到這里啦!

如果你關(guān)注了我,希望你與我一起學(xué)習(xí),一起成長!?

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

友情鏈接更多精彩內(nèi)容