讀數(shù)據(jù)

合并數(shù)據(jù),創(chuàng)建seurat對(duì)象

合并樣本的數(shù)據(jù)可以使用cbind() 基因數(shù)相同時(shí)
或者用merge() 基因數(shù)不同時(shí)
創(chuàng)建時(shí),用min.cells,min.features進(jìn)行簡(jiǎn)單的過(guò)濾

QC

人類中MT開頭,小鼠中mt開頭

比如第二種情況如果選5%進(jìn)行過(guò)濾,會(huì)過(guò)濾掉很多細(xì)胞,可以考慮用25%過(guò)濾
第五種情況大概率樣本不能用


不同的樣本可以用不同的cutoff,可以參考下面這篇文章中各種組織的線粒體cutoff
Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control - PubMed (nih.gov)

過(guò)濾+標(biāo)準(zhǔn)化

dim(),顯示有幾行幾列

#計(jì)算表達(dá)量變化顯著的基因FindVariableFeatures
experiment.aggregate <- FindVariableFeatures(experiment.aggregate,
selection.method = "vst",
nfeatures = 1000)
正常情況表達(dá)量越高的基因,差異越大,如果不是,代表數(shù)據(jù)量太小了,或者QC有問(wèn)題。
均一化及PCA降維


TSNE聚類(當(dāng)細(xì)胞量幾千時(shí),用UMAP)

找marker

先初篩

p_value:該基因在自己的亞群其他細(xì)胞亞群中的差異
logFC:平均表達(dá)倍數(shù)
pct1:該基因在自己的亞群中百分之多少的細(xì)胞中表達(dá)
pct2:在其他亞群中百分之多少的細(xì)胞中表達(dá)
p_val_adj:一般看這個(gè),就是FDR
細(xì)篩

3.在自己的亞群中至少50%的細(xì)胞表達(dá),也可以PCT2<0.5
4.5.按情況選擇要不要做
注釋(代碼在cell_marker_annotation.R)

0亞群很多是T細(xì)胞的marker,可以初步注釋為T細(xì)胞
1亞群也像T或者NK細(xì)胞
0和1很相似,可以計(jì)算這個(gè)亞群的差異基因,代碼在sub_analysis_scRNA
計(jì)算特定兩組細(xì)胞之間的差異基因,決定要不要把兩個(gè)亞群合并

這里看的是0和3的差異

發(fā)現(xiàn)0和3差異大的都是線粒體基因,說(shuō)明本身差異不大,那就可以合并
再看看0,3和1的差異


如果這些差異是有意義的,那么1可以不合并
免疫細(xì)胞注釋可以用singleR

注釋小結(jié)

可以把每個(gè)亞群找到的marker用在線工具做功能富集,初步看一下功能




代碼
#### Code Description ####
#--- 1. Written by WoLin @ 2019.03.15,last update 19.07.14 ---#
#--- 2. Analysis for single sample ---#
#--- 3. Support 10X data & expression matrix ---#
#--- 4. Need to change:sample data, sam.name, dims ---#
#### 1. 加載分析使用的工具包 ####
library(Seurat)
library(ggplot2)
library(cowplot)
library(Matrix)
library(dplyr)
library(ggsci)
#### 2. 讀入原始表達(dá)數(shù)據(jù) ####
# 以下兩種方式二選一
# 10X 數(shù)據(jù)
bm1 <- Read10X("./BoneMarrow/BM1/")#read10×函數(shù)只需要寫到文件夾,不需要寫文件名
bm2 <- Read10X("./BoneMarrow/BM2/")
# panglaoBD下載的Rdata可以直接用load讀出一個(gè)相同的矩陣
# 這里的列名就是barcode,代表一個(gè)細(xì)胞,行名是基因
colnames(bm1)[1:10] # 看bm1中前10個(gè)細(xì)胞的名字
colnames(bm2)[1:10]
# 合并前在細(xì)胞上(列名)打上樣本標(biāo)簽
colnames(bm1) <- paste(colnames(bm1),"BM1",sep = "_") # 把列名改成細(xì)胞名_樣本來(lái)源
colnames(bm2) <- paste(colnames(bm2),"BM2",sep = "_")
#將所有讀入的數(shù)據(jù)合并成一個(gè)大的矩陣,確保行數(shù)(基因數(shù))相等,增加列
#合并時(shí)需注意行名一致
#既有10X的數(shù)據(jù)又有表達(dá)矩陣的數(shù)據(jù),全部轉(zhuǎn)換為表達(dá)矩陣再進(jìn)行合并
#關(guān)于矩陣合并請(qǐng)見單獨(dú)的矩陣合并腳本“merge_matrix.R”(行數(shù)不一樣樣時(shí)用這個(gè)代碼)
experiment.data <- cbind(bm1,bm2) # 行數(shù)相同時(shí)可以用cbind(樣本1,樣本2)
#創(chuàng)建一個(gè)叫multi的文件夾用于存放分析結(jié)果
sam.name <- "multi"
if(!dir.exists(sam.name)){
dir.create(sam.name)
}
#### 3. 創(chuàng)建Seurat分析對(duì)象 ####
experiment.aggregate <- CreateSeuratObject(
experiment.data,
project = "multi",
min.cells = 10,#基因至少在10個(gè)細(xì)胞里有表達(dá),過(guò)濾基因
min.features = 200,#細(xì)胞最少表達(dá)200個(gè)基因,過(guò)濾細(xì)胞
names.field = 2,
names.delim = "_")
#將數(shù)據(jù)寫到文件中一邊后續(xù)分析使用
save(experiment.aggregate,file=paste0("./",sam.name,"/",sam.name,"_raw_SeuratObject.RData"))
#### 4. 數(shù)據(jù)概覽 & QC ####
#查看SeuratObject中的對(duì)象
slotNames(experiment.aggregate)
#assay
experiment.aggregate@assays
#細(xì)胞及細(xì)胞中基因與RNA數(shù)量
dim(experiment.aggregate@meta.data)#顯示有幾行幾列,行是細(xì)胞
View(experiment.aggregate@meta.data)
#第一列是樣本名,在創(chuàng)建seurat對(duì)象時(shí)定義下劃線后面的部分是樣本名
#第二列ncountRNA是UMI數(shù),第三列nfeature是基因數(shù)
table(experiment.aggregate@meta.data$orig.ident)#統(tǒng)計(jì)matadata的第一列元素的個(gè)數(shù)
#轉(zhuǎn)換成表達(dá)矩陣
experiment.aggregate.matrix <- as.matrix(experiment.aggregate@assays$RNA@counts)
##QC:統(tǒng)計(jì)線粒體基因在每個(gè)細(xì)胞中的占比
experiment.aggregate[["percent.mt"]] <- PercentageFeatureSet(experiment.aggregate,
pattern = "^MT-")#MT開頭的是線粒體基因,在小鼠中是mt開頭
#小提琴圖可視化
pdf(paste0("./",sam.name,"/QC-VlnPlot.pdf"),width = 8,height = 4.5)
VlnPlot(experiment.aggregate, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
ncol = 3)
dev.off()
##QC:統(tǒng)計(jì)基因數(shù),RNA,線粒體基因分布
gene.freq <- do.call("cbind", tapply(experiment.aggregate@meta.data$nFeature_RNA,
experiment.aggregate@meta.data$orig.ident,quantile,probs=seq(0,1,0.05)))
rna.freq <- do.call("cbind", tapply(experiment.aggregate@meta.data$nCount_RNA,experiment.aggregate@meta.data$orig.ident,
quantile,probs=seq(0,1,0.05)))
mt.freq <- do.call("cbind", tapply(experiment.aggregate@meta.data$percent.mt,experiment.aggregate@meta.data$orig.ident,quantile,probs=seq(0,1,0.05)))
freq.combine <- as.data.frame(cbind(gene.freq,rna.freq,mt.freq))
colnames(freq.combine) <- c(paste(colnames(gene.freq),"Gene",sep = "_"),
paste(colnames(rna.freq),"RNA",sep = "_"),
paste(colnames(mt.freq),"MT",sep = "_"))
write.table(freq.combine,file = paste0(sam.name,"/QC-gene_frequency.txt"),quote = F,sep = "\t")
rm(gene.freq,rna.freq,mt.freq)
View(freq.combine)
#先看小提琴圖,如果細(xì)胞質(zhì)量還可以,保留80%的細(xì)胞,如果比較差,保留60-70%細(xì)胞
##QC:基因數(shù)與線粒體基因以及RNA數(shù)量的分布相關(guān)性
plot1 <- FeatureScatter(experiment.aggregate, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(experiment.aggregate, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
pdf(paste0("./",sam.name,"/QC-FeatureScatter.pdf"),width = 8,height = 4.5)
CombinePlots(plots = list(plot1, plot2),legend = "none")
dev.off()
rm(plot1,plot2)
#紅色點(diǎn)(BM1)已經(jīng)接近飽和曲線的拐點(diǎn)(測(cè)到umi越多,測(cè)到的基因越多,到一定程度,即使增加umi,也不能增加很多測(cè)到的基因)
#藍(lán)色點(diǎn)(BM2)還沒到飽和曲線的拐點(diǎn),這個(gè)樣本的細(xì)胞測(cè)到的基因量少
#### 5. 篩選細(xì)胞 ####
cat("Before filter :",nrow(experiment.aggregate@meta.data),"cells\n")
experiment.aggregate <- subset(experiment.aggregate,
subset =
nFeature_RNA > 500 & # 基因數(shù)>500
nCount_RNA > 1000 & # UMI>1000
nCount_RNA < 20000 & # UMI<20000(過(guò)濾雙細(xì)胞)
percent.mt < 5) # 線粒體基因百分比<5
cat("After filter :",nrow(experiment.aggregate@meta.data),"cells\n")
table(experiment.aggregate@meta.data$orig.ident)#看過(guò)濾完兩個(gè)樣本還有多少細(xì)胞
#### 6. 表達(dá)量標(biāo)準(zhǔn)化 ####
experiment.aggregate <- NormalizeData(experiment.aggregate,
normalization.method = "LogNormalize",
scale.factor = 10000)
#計(jì)算表達(dá)量變化顯著的基因FindVariableFeatures
experiment.aggregate <- FindVariableFeatures(experiment.aggregate,
selection.method = "vst",
nfeatures = 1000)
#一般500-2500個(gè)feature(基因),細(xì)胞類型越復(fù)雜,需要的feature(基因)越多
#展示標(biāo)準(zhǔn)化之后的整體表達(dá)水平
top10 <- head(x = VariableFeatures(experiment.aggregate), 10)
plot1 <- VariableFeaturePlot(experiment.aggregate)
plot2 <- LabelPoints(plot = plot1, points = top10)
pdf(file = paste0(sam.name,"/Norm-feature_variable_plot.pdf"),width = 8,height = 5)
CombinePlots(plots = list(plot1, plot2),legend = "none")
dev.off()
#### 7. 均一化與PCA ####
#均一化(需要一點(diǎn)時(shí)間)
experiment.aggregate <- ScaleData(
object = experiment.aggregate,
do.scale = FALSE,
do.center = FALSE,
vars.to.regress = c("orig.ident","percent.mt"))#去批次的因素,這里選擇不同樣本來(lái)源和線粒體基因百分比
#任何批次效應(yīng)校正都會(huì)損失一些信息,所以一開始不要進(jìn)行太強(qiáng)的批次效應(yīng)校正
#PCA降維計(jì)算(兩個(gè)作用:1看批次效應(yīng)校正得怎么樣 2聚類)
experiment.aggregate <- RunPCA(object = experiment.aggregate,
features = VariableFeatures(experiment.aggregate),
verbose = F,npcs = 50)
#PCA結(jié)果展示-1
pdf(paste0("./",sam.name,"/PCA-VizDimLoadings.pdf"),width = 7,height = 5)
VizDimLoadings(experiment.aggregate, dims = 1:2, reduction = "pca")
dev.off()
#PCA結(jié)果展示-2
pdf(paste0("./",sam.name,"/PCA-DimPlot.pdf"),width = 5,height = 4)
DimPlot(experiment.aggregate, reduction = "pca")
dev.off()
#PCA結(jié)果展示-3
pdf(paste0("./",sam.name,"/PCA-DimHeatmap.pdf"),width = 5,height = 4)
DimHeatmap(experiment.aggregate, dims = 1:6, cells = 500, balanced = TRUE)
dev.off()
#### 8. 確定細(xì)胞類群分析PC ####
#耗時(shí)較久,一般不用
experiment.aggregate <- JackStraw(experiment.aggregate, num.replicate = 100,dims = 40)
experiment.aggregate <- ScoreJackStraw(experiment.aggregate, dims = 1:40)
pdf(paste0("./",sam.name,"/PCA-JackStrawPlot_40.pdf"),width = 6,height = 5)
JackStrawPlot(object = experiment.aggregate, dims = 1:40)
dev.off()
#碎石圖
pdf(paste0("./",sam.name,"/PCA-ElbowPlot.pdf"),width = 6,height = 5)
ElbowPlot(experiment.aggregate,ndims = 40)
dev.off()
#一般拐點(diǎn)不超過(guò)20
#確定用于細(xì)胞分群的PC
dim.use <- 1:20
#### 9. 細(xì)胞分群TSNE算法 ####
#TSNE算法(細(xì)胞量比較少的時(shí)候(幾千),用UMAP)
experiment.aggregate <- FindNeighbors(experiment.aggregate, dims = dim.use)#計(jì)算細(xì)胞相似性
experiment.aggregate <- FindClusters(experiment.aggregate, resolution = 0.5)#resolution越高,細(xì)胞分出來(lái)的類越多
experiment.aggregate <- RunTSNE(experiment.aggregate, dims = dim.use,
do.fast = TRUE)
pdf(paste0("./",sam.name,"/CellCluster-TSNEPlot_res0.5_",max(dim.use),"PC.pdf"),width = 5,height = 4)
DimPlot(object = experiment.aggregate, pt.size=0.5,label = T)
dev.off()
#按照數(shù)據(jù)來(lái)源分組展示細(xì)胞異同--畫在一張圖中
pdf(paste0("./",sam.name,"/CellCluster-TSNEPlot_SamGroup_",max(dim.use),"PC.pdf"),width = 5,height = 4)
DimPlot(object = experiment.aggregate,
group.by="orig.ident",
pt.size=0.5,reduction = "tsne")
dev.off()
#按照數(shù)據(jù)來(lái)源分組展示細(xì)胞異同--畫在多張圖中
pdf(paste0("./",sam.name,"/CellCluster-TSNEPlot_SamGroup_slipt_",max(dim.use),"PC.pdf"),width = 8,height = 4)
DimPlot(object = experiment.aggregate,
split.by ="orig.ident",
pt.size=0.5,reduction = "tsne")
dev.off()
table(experiment.aggregate@meta.data$orig.ident)
View(experiment.aggregate@meta.data)#剛才的分群結(jié)果已經(jīng)加到每個(gè)細(xì)胞的最后一列
table(experiment.aggregate@meta.data$orig.ident,experiment.aggregate@meta.data$seurat_clusters)#每個(gè)樣本中每群細(xì)胞數(shù)量
#個(gè)性化畫圖代碼見Sub_analysis_scRNA
#### 10. 計(jì)算marker基因 ####
#這一步計(jì)算的時(shí)候可以把min.pct以及l(fā)ogfc.threshold調(diào)的比較低,然后再基于結(jié)果手動(dòng)篩選
all.markers <- FindAllMarkers(experiment.aggregate, only.pos = TRUE,
min.pct = 0.3, logfc.threshold = 0.25)
write.table(all.markers,
file=paste0("./",sam.name,"/",sam.name,"_total_marker_genes_tsne_",max(dim.use),"PC.txt"),
sep="\t",quote = F,row.names = F)
# 遍歷每一個(gè)cluster然后展示其中前4個(gè)基因
marker.sig <- all.markers %>%
mutate(Ratio = round(pct.1/pct.2,3)) %>%
filter(p_val_adj <= 0.05) # 本條件為過(guò)濾統(tǒng)計(jì)學(xué)不顯著的基因
for(cluster_id in unique(marker.sig$cluster)){
# cluster.markers <- FindMarkers(experiment.aggregate, ident.1 = cluster, min.pct = 0.3)
# cluster.markers <- as.data.frame(cluster.markers) %>%
# mutate(Gene = rownames(cluster.markers))
cl4.genes <- marker.sig %>%
filter(cluster == cluster_id) %>%
arrange(desc(avg_log2FC))
cl4.genes <- cl4.genes[1:min(nrow(cl4.genes),4),"gene"]
#VlnPlot
pvn <- VlnPlot(experiment.aggregate, features = cl4.genes,ncol = 2)
pdf(paste0("./",sam.name,"/MarkerGene-VlnPlot_cluster",cluster_id,"_tsne_",max(dim.use),"PC.pdf"),width = 7,height = 6)
print(pvn)
dev.off()
#Feather plot
pvn <- FeaturePlot(experiment.aggregate,features=cl4.genes,ncol = 2)
pdf(paste0("./",sam.name,"/MarkerGene-FeaturePlot_cluster",cluster_id,"_tsne_",max(dim.use),"PC.pdf"),width = 7,height = 6)
print(pvn)
dev.off()
#RidgePlot
pvn<-RidgePlot(experiment.aggregate, features = cl4.genes, ncol = 2)
pdf(paste0("./",sam.name,"/MarkerGene-RidgePlot_cluster",cluster_id,"_tsne_",max(dim.use),"PC.pdf"),width = 7,height = 6)
print(pvn)
dev.off()
}
rm(cl4.genes,cluster_id,pvn)
#熱圖展示Top marker基因
#篩選top5的marker基因,可以通過(guò)參數(shù)改為其他數(shù)值
top5 <- marker.sig %>% group_by(cluster) %>%
top_n(n = 5, wt = avg_log2FC)
#top-marker基因熱圖
pdf(paste0("./",sam.name,"/MarkerGene-Heatmap_all_cluster_tsne_",max(dim.use),"PC.pdf"))
DoHeatmap(experiment.aggregate, features = top5$gene,size = 2) +
theme(legend.position = "none",
axis.text.y = element_text(size = 6))
dev.off()
#top-marker基因dotplot
pdf(paste0("./",sam.name,"/MarkerGene-DotPlot_all_cluster_tsne_",max(dim.use),"PC.pdf"),width = 50,height = 5)
DotPlot(experiment.aggregate, features = unique(top5$gene))+
RotatedAxis()
dev.off()