這個(gè)數(shù)據(jù)集GSE129516,就是拿到了如下所示的數(shù)據(jù)文件:
GEO下載的</figcaption>
我首先讀取了一個(gè)文件,看了看,就是表達(dá)矩陣,所以直接CreateSeuratObject即可,都省去了3個(gè)文件的組合命令。
表達(dá)矩陣?yán)?lt;/figcaption>
首先批量讀取每個(gè)10x樣品的表達(dá)矩陣
保證當(dāng)前工作目錄下面有后綴是matrices.csv.gz的文件,就是前面下載的6個(gè)文件:
rm(list=ls())
options(stringsAsFactors = F)
library(Seurat)
fs=list.files(pattern = 'matrices.csv.gz')
fs
sceList <- lapply(fs, function(x){
a=read.csv( x )
a[1:4,1:4]
raw.data=a[,-1]
rownames(raw.data)=a[,1]
library(stringr)
p=str_split(x,'_',simplify = T)[,2]
sce <- CreateSeuratObject(counts = raw.data,project = p )
})` </pre>
每個(gè)matrices.csv.gz文件都讀取后,提供CreateSeuratObject構(gòu)建成為對(duì)象。如果是讀取10x數(shù)據(jù)需要三個(gè)文件:barcodes.tsv, genes.tsv, matrix.mtx,那個(gè)更簡(jiǎn)單哦!
然后使用seurat最出名的多個(gè)10x合并算法
多個(gè)單細(xì)胞對(duì)象的整合,這里直接使用標(biāo)準(zhǔn)代碼即可:
pro='integrated'
for (i in 1:length(sceList)) {
sceList[[i]] <- NormalizeData(sceList[[i]], verbose = FALSE)
sceList[[i]] <- FindVariableFeatures(sceList[[i]], selection.method = "vst",
nfeatures = 2000, verbose = FALSE)
}
sceList
sce.anchors <- FindIntegrationAnchors(object.list = sceList, dims = 1:30)
sce.integrated <- IntegrateData(anchorset = sce.anchors, dims = 1:30)
因?yàn)槭?個(gè)10X樣品,所以這個(gè)步驟會(huì)略微有點(diǎn)耗費(fèi)時(shí)間哦!
接著走標(biāo)準(zhǔn)的降維聚類分群
因?yàn)槭菢?gòu)建好的對(duì)象,所以后續(xù)分析都是常規(guī)代碼:
library(ggplot2)
library(cowplot)
# switch to integrated assay. The variable features of this assay are automatically
# set during IntegrateData
DefaultAssay(sce.integrated) <- "integrated"
# Run the standard workflow for visualization and clustering
sce.integrated <- ScaleData(sce.integrated, verbose = FALSE)
sce.integrated <- RunPCA(sce.integrated, npcs = 30, verbose = FALSE)
sce.integrated <- RunUMAP(sce.integrated, reduction = "pca", dims = 1:30)
p1 <- DimPlot(sce.integrated, reduction = "umap", group.by = "orig.ident")
p2 <- DimPlot(sce.integrated, reduction = "umap", group.by = "orig.ident", label = TRUE,
repel = TRUE) + NoLegend()
plot_grid(p1, p2)
p2
ggsave(filename = paste0(pro,'_umap.pdf') )
sce=sce.integrated
DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE)
ElbowPlot(sce)
sce <- FindNeighbors(sce, dims = 1:15)
sce <- FindClusters(sce, resolution = 0.2)
table(sce@meta.data$integrated_snn_res.0.2)
sce <- FindClusters(sce, resolution = 0.8)
table(sce@meta.data$integrated_snn_res.0.8)
DimPlot(sce, reduction = "umap")
ggsave(filename = paste0(pro,'_umap_seurat_res.0.8.pdf') )
DimPlot(sce, reduction = "umap",split.by = 'orig.ident')
ggsave(filename = paste0(pro,'_umap_seurat_res.0.8_split.pdf') )
save(sce,file = 'integrated_after_seurat.Rdata')` </pre>
最后對(duì)聚類的不同細(xì)胞亞群進(jìn)行注釋
前面呢是標(biāo)準(zhǔn)的聚類分群,每個(gè)細(xì)胞亞群僅僅是一個(gè)編號(hào),實(shí)際上還需要給予它們一定的生物學(xué)意義,我們這里采用SingleR的標(biāo)準(zhǔn)代碼:
rm(list=ls())
options(stringsAsFactors = F)
library(Seurat)
load(file = 'integrated_after_seurat.Rdata')
DefaultAssay(sce) <- "RNA"
# for B cells : cluster, 1,21
VlnPlot(object = sce, features ='Cd19',log =T )
VlnPlot(object = sce, features ='Cd79a',log =T )
library(SingleR)
sce_for_SingleR <- GetAssayData(sce, slot="data")
clusters=sce@meta.data$seurat_clusters
mouseImmu <- ImmGenData()
pred.mouseImmu <- SingleR(test = sce_for_SingleR, ref = mouseImmu, labels = mouseImmu$label.main,
method = "cluster", clusters = clusters,
assay.type.test = "logcounts", assay.type.ref = "logcounts")
mouseRNA <- MouseRNAseqData()
pred.mouseRNA <- SingleR(test = sce_for_SingleR, ref = mouseRNA, labels = mouseRNA$label.fine ,
method = "cluster", clusters = clusters,
assay.type.test = "logcounts", assay.type.ref = "logcounts")
cellType=data.frame(ClusterID=levels(sce@meta.data$seurat_clusters),
mouseImmu=pred.mouseImmu$labels,
mouseRNA=pred.mouseRNA$labels )
head(cellType)
sce@meta.data$singleR=cellType[match(clusters,cellType$ClusterID),'mouseRNA']
pro='first_anno'
DimPlot(sce,reduction = "umap",label=T, group.by = 'singleR')
ggplot2::ggsave(filename = paste0(pro,'_tSNE_singleR.pdf'))
DimPlot(sce,reduction = "umap",label=T,split.by ='orig.ident',group.by = 'singleR')
ggplot2::ggsave(filename = paste0(pro,'_tSNE_singleR_by_orig.ident.pdf'))
save(sce,file = 'last_sce.Rdata')` </pre>
出圖如下:
降維聚類分群注釋</figcaption>
可以看到效果還是杠桿的,而且我全程都是標(biāo)準(zhǔn)代碼,就是follow群主的教程即可,我的R也是半吊子水平,只有你敢動(dòng)手,這個(gè)圖你也可以自己親手做出來哦。