1. R實(shí)現(xiàn)進(jìn)度條和并行運(yùn)算
- 在正式開(kāi)始之前插這么一則題外話,因?yàn)檫@次跑10X的流程是在服務(wù)器里,我們64線程128G內(nèi)存的奢華配置不能白白浪費(fèi)了,所以要學(xué)一下R語(yǔ)言的并行運(yùn)算,又因?yàn)橛泻芏嗖襟E要跑好久,不知道跑得怎么樣了,所以要學(xué)一下怎么顯示進(jìn)度條。
1.1. 進(jìn)度條
1.2. 并行運(yùn)算
library(parallel)
cl.cores <- detectCores() #檢查當(dāng)前電腦可用核數(shù)。
cl <- makeCluster(cl.cores) #使用剛才檢測(cè)的核并行運(yùn)算
- 這時(shí)就可以用下面兩種方法中的一種實(shí)現(xiàn)并行運(yùn)算
-
clusterEvalQ(cl,expr)函數(shù)利用創(chuàng)建的cl執(zhí)行expr。這里利用剛才創(chuàng)建的cl核并行運(yùn)算expr。expr是執(zhí)行命令的語(yǔ)句,不過(guò)如果命令太長(zhǎng)的話,一般寫(xiě)到文件里比較好。比如把想執(zhí)行的命令放在Rcode.r里:clusterEvalQ(cl,source(file="Rcode.r"))
-
par開(kāi)頭的apply家族。這族函數(shù)和apply的用法基本一樣,不過(guò)要多加一個(gè)參數(shù)cl。一般如果cl創(chuàng)建如上面cl <- makeCluster(cl.cores)的話,這個(gè)參數(shù)可以直接用作parApply(cl=cl,…)。當(dāng)然Apply也可以是Sapply,Lapply等等。注意par后面的第一個(gè)字母是要大寫(xiě)的。
2. 準(zhǔn)備工作
2.1. 下載數(shù)據(jù)
rm(list = ls())
options(warn=-1)
suppressMessages(library(Seurat))
# 讀取表達(dá)矩陣
start_time <- Sys.time()
raw_dataPBMC <- read.csv('./GSE117988_raw.expMatrix_PBMC.csv.gz', header = TRUE, row.names = 1)
end_time <- Sys.time()
end_time - start_time
# Time difference of 53.70495 secs
dim(raw_dataPBMC)
# [1] 17712 12874
# 17712個(gè)基因,12874個(gè)細(xì)胞
2.2. 歸一化
dataPBMC <- log2(1 + sweep(raw_dataPBMC, 2,
median(colSums(raw_dataPBMC))/colSums(raw_dataPBMC), '*'))
- 非常優(yōu)美的一行代碼
-
sweep函數(shù)一般需要4個(gè)參數(shù)
- 操作對(duì)象:矩陣或數(shù)據(jù)框;
- 1和2中的一個(gè),和
apply一樣;
- 要施加在操作對(duì)象上的向量,如果要對(duì)行操作,那么這個(gè)向量長(zhǎng)度就要和行數(shù)一樣;
- 計(jì)算符,比如:+ - * / < > 等。
2.3. 提取時(shí)間點(diǎn)信息
# 提取時(shí)間點(diǎn)這一分組信息
head(colnames(dataPBMC))
timePoints <- sapply(head(colnames(dataPBMC)),
function(x) unlist(strsplit(x, "[.]"))[2])
timePoints <- sapply(colnames(dataPBMC), function(x) unlist(strsplit(x, "[.]"))[2])
table(timePoints)
timePoints <-ifelse(timePoints == '1', 'PBMC_Pre',
ifelse(timePoints == '2', 'PBMC_EarlyD27',
ifelse(timePoints == '3', 'PBMC_RespD376', 'PBMC_ARD614')))
table(timePoints)
2.4. 質(zhì)控
# 關(guān)注兩點(diǎn)
# 第一點(diǎn):基因在多少細(xì)胞表達(dá)
fivenum(apply(dataPBMC,1,function(x) sum(x>0) ))
# RP4-669L17.10 LAMB3 NAT10 AC093673.5 RPL21
# 1 6 50 207 12102
boxplot(apply(dataPBMC,1,function(x) sum(x>0) ))
# 第二點(diǎn):細(xì)胞中有多少表達(dá)的基因
fivenum(apply(dataPBMC,2,function(x) sum(x>0) ))
# CAAGAAATCGATCCCT.2 GGCCGATTCCAGAGGA.3 TCAACGAAGAGCTGGT.3
# 10 321 395
# TGCCAAAGTCTGCGGT.4 TCTGGAATCTATCGCC.3
# 481 2865
hist(apply(dataPBMC,2,function(x) sum(x>0) ))
- 在一萬(wàn)多個(gè)基因中,75%的基因只在200多個(gè)細(xì)胞中有表達(dá);而在一萬(wàn)多個(gè)細(xì)胞中,75%的細(xì)胞也只表達(dá)不到500個(gè)基因。說(shuō)明這個(gè)數(shù)據(jù)質(zhì)量是一般的,一般來(lái)說(shuō),10X可以測(cè)到800個(gè)基因。
3. 分群、可視化
3.1. 創(chuàng)建Seurat對(duì)象
# Seurat V3
PBMC <- CreateSeuratObject(dataPBMC,
min.cells = 1,
min.features = 0,
project = '10x_PBMC')
PBMC
# An object of class Seurat
# 17712 features across 12874 samples within 1 assay
# Active assay: RNA (17712 features)
3.2. 添加metadata
PBMC <- AddMetaData(object = PBMC,
metadata = apply(raw_dataPBMC, 2, sum),
col.name = 'nUMI_raw')
PBMC <- AddMetaData(object = PBMC, metadata = timePoints, col.name = 'TimePoints')
head(PBMC@meta.data)
# orig.ident nCount_RNA nFeature_RNA # nUMI_raw TimePoints
# AAACCTGAGCGAAGGG.1 10x_PBMC 561.5564 517 # 1896 PBMC_Pre
# AAACCTGAGGTCATCT.1 10x_PBMC 553.9634 328 # 751 PBMC_Pre
# AAACCTGAGTCCTCCT.1 10x_PBMC 522.4279 380 # 1228 PBMC_Pre
# AAACCTGCACCAGCAC.1 10x_PBMC 590.6022 573 # 1919 PBMC_Pre
# AAACCTGGTAACGTTC.1 10x_PBMC 479.7199 263 # 698 PBMC_Pre
# AAACCTGGTAAGGATT.1 10x_PBMC 634.6747 433 # 763 PBMC_Pre
3.3. 聚類標(biāo)準(zhǔn)流程
##step1 標(biāo)準(zhǔn)化
PBMC <- ScaleData(object = PBMC, vars.to.regress = c('nUMI_raw'), model.use = 'linear', use.umi = FALSE)
##step2 找差異基因
PBMC <- FindVariableFeatures(object = PBMC, mean.function = ExpMean, dispersion.function = LogVMR, mean.cutoff = c(0.0125,3), dispersion.cutoff = c(0.5,Inf))
##step3 根據(jù)這些基因進(jìn)行PCA降維
PBMC <- RunPCA(object = PBMC, pc.genes = PBMC@var.genes)
##step4 根據(jù)PCA結(jié)果確定分群
PBMC <- FindNeighbors(PBMC, reduction = "pca", dims = 1:10,
k.param = 35)
PBMC <- FindClusters(object = PBMC,
resolution = 1, verbose=F)
##step5 t-SNE降維
PBMC <- RunTSNE(object = PBMC, dims.use = 1:10)
##step6 可視化
DimPlot(PBMC)
3.4. 保存對(duì)象
save(PBMC,raw_dataPBMC,file = 'patient1.PBMC.output.Rdata')