前言

Immugent最近在讀文獻(xiàn)時(shí)一直遇到各種利用deconvolution基于單細(xì)胞參考數(shù)據(jù)集進(jìn)行bulk細(xì)胞定量的文章，于是專門系統(tǒng)的學(xué)習(xí)了一下此類常用的幾個(gè)軟件，今天就做一個(gè)簡單的分享。

其實(shí)在之前IOBR:一個(gè)R包帶你走進(jìn)數(shù)據(jù)挖掘的殿堂一文中就已經(jīng)系統(tǒng)的介紹各種基于參考基因集進(jìn)行bulk細(xì)胞定量的算法,但是這些算法都是bulk對bulk,要么對基因的選擇要求很高，要么識別的細(xì)胞亞群有限。而近些年大量單細(xì)胞數(shù)據(jù)集的出現(xiàn)，讓我們想到如果利用定義好的單個(gè)細(xì)胞水平的數(shù)據(jù)構(gòu)建reference，結(jié)合deconvolution對bulk數(shù)據(jù)進(jìn)行細(xì)胞定量，那豈不是美滋滋。

于是deconvolution的門面算法之一CIBERSORT為實(shí)現(xiàn)這個(gè)理念，就在2019年推出了新版CIBERSORTx,相應(yīng)文章發(fā)表在了Nature Biotec hnology (IF:54.9),這個(gè)雜志可謂生信甚至實(shí)驗(yàn)技術(shù)的天花板了，可見其價(jià)值之高。

圖片

就當(dāng)小編興致沖沖打算用CIBERSORTx玩一圈TCGA數(shù)據(jù)時(shí)，出現(xiàn)了很多bug，因?yàn)槟壳斑@個(gè)軟件只有網(wǎng)頁版的工具，想要做分析必須將自己的數(shù)據(jù)傳上去，然鵝系統(tǒng)并不能識別新上傳的文件，就很難受了。雖然網(wǎng)頁友好的給出了鏡像文件，但是又搞了個(gè)什么需要用郵箱申請密鑰，于是小編等了好幾天都沒等到，干脆不等了。

于是小編開始尋求新的此類軟件，就發(fā)現(xiàn)了同年發(fā)表的另一個(gè)軟件MuSiC，雖然雜志沒有上一個(gè)好，但也算不錯(cuò)了，而且兩年內(nèi)被引用了幾百次了，可見這個(gè)算法還是值得信賴的。

圖片

最難能可貴的是這個(gè)軟件是一個(gè)非常容易安裝的R包，運(yùn)算也很快，非常的輕便。最給力的是它還可以對小鼠和大鼠的bulk數(shù)據(jù)進(jìn)行deconvolution,因此深得Immugent的喜愛，就走了一下分析流程。

主要流程

作者給出了以下幾套數(shù)據(jù)集(都是已經(jīng)發(fā)表的GE數(shù)據(jù))，以便于熟悉此軟件的用法。

圖片

setwd("D:\\Music")
library(MuSiC)

# bulk data
GSE50244.bulk.eset = readRDS('GSE50244bulkeset.rds')
GSE50244.bulk.eset

# EMTAB single cell dataset
EMTAB.eset = readRDS('EMTABesethealthy.rds')
EMTAB.eset

# Download Xin et al. single cell dataset from Github
XinT2D.eset = readRDS('XinT2Deset.rds')
XinT2D.eset

# Estimate cell type proportions
Est.prop.GSE50244 = music_prop(bulk.eset = GSE50244.bulk.eset, sc.eset = EMTAB.eset, clusters = 'cellType',
                               samples = 'sampleID', select.ct = c('alpha', 'beta', 'delta', 'gamma',
                                                                   'acinar', 'ductal'), verbose = F)
                                                            
                                                            
names(Est.prop.GSE50244)
#[1] "Est.prop.weighted" "Est.prop.allgene"  "Weight.gene"       "r.squared.full"    "Var.prop"  

# Jitter plot of estimated cell type proportions
jitter.fig = Jitter_Est(list(data.matrix(Est.prop.GSE50244$Est.prop.weighted),
                             data.matrix(Est.prop.GSE50244$Est.prop.allgene)),
                        method.name = c('MuSiC', 'NNLS'), title = 'Jitter plot of Est Proportions')

# A more sophisticated jitter plot is provided as below. We seperated the T2D subjects and normal 
#subjects by their HbA1c levels.
library(reshape2)
m.prop.GSE50244 = rbind(melt(Est.prop.GSE50244$Est.prop.weighted), 
                        melt(Est.prop.GSE50244$Est.prop.allgene))

colnames(m.prop.GSE50244) = c('Sub', 'CellType', 'Prop')
m.prop.GSE50244$CellType = factor(m.prop.GSE50244$CellType, levels = c('alpha', 'beta', 'delta', 'gamma', 'acinar', 'ductal'))
m.prop.GSE50244$Method = factor(rep(c('MuSiC', 'NNLS'), each = 89*6), levels = c('MuSiC', 'NNLS'))
m.prop.GSE50244$HbA1c = rep(GSE50244.bulk.eset$hba1c, 2*6)
m.prop.GSE50244 = m.prop.GSE50244[!is.na(m.prop.GSE50244$HbA1c), ]
m.prop.GSE50244$Disease = factor(c('Normal', 'T2D')[(m.prop.GSE50244$HbA1c > 6.5)+1], levels = c('Normal', 'T2D'))
m.prop.GSE50244$D = (m.prop.GSE50244$Disease == 'T2D')/5
m.prop.GSE50244 = rbind(subset(m.prop.GSE50244, Disease == 'Normal'), subset(m.prop.GSE50244, Disease != 'Normal'))

jitter.new = ggplot(m.prop.GSE50244, aes(Method, Prop)) + 
  geom_point(aes(fill = Method, color = Disease, stroke = D, shape = Disease), 
             size = 2, alpha = 0.7, position = position_jitter(width=0.25, height=0)) +
  facet_wrap(~ CellType, scales = 'free') + scale_colour_manual( values = c('white', "gray20")) +
  scale_shape_manual(values = c(21, 24))+ theme_minimal()
library(cowplot)
plot_grid(jitter.fig, jitter.new, labels = 'auto')

圖片

簡單一個(gè)函數(shù)即可實(shí)現(xiàn)基于單細(xì)胞數(shù)據(jù)對bulk進(jìn)行細(xì)胞定可謂非常輕便，除了使用自身算法進(jìn)行定量，MuSiC還嵌入了非負(fù)矩陣的算法來進(jìn)行對比，從上圖可以看出兩者間相似性非常高。

關(guān)聯(lián)分析

眾所周知，β細(xì)胞比例與T2D疾病狀態(tài)有關(guān)。在T2D過程中，β細(xì)胞數(shù)量減少。其中最重要的T2D檢測是糖化血紅蛋白(HbA1c)檢測。當(dāng)糖化血紅蛋白水平大于6.5%時(shí)，診斷為T2D。我們可以看β細(xì)胞比例與糖化血紅蛋白水平的關(guān)系。


# Create dataframe for beta cell proportions and HbA1c levels
m.prop.ana = data.frame(pData(GSE50244.bulk.eset)[rep(1:89, 2), c('age', 'bmi', 'hba1c', 'gender')],
                        ct.prop = c(Est.prop.GSE50244$Est.prop.weighted[, 2], 
                                    Est.prop.GSE50244$Est.prop.allgene[, 2]), 
                        Method = factor(rep(c('MuSiC', 'NNLS'), each = 89), 
                                        levels = c('MuSiC', 'NNLS')))
colnames(m.prop.ana)[1:4] = c('Age', 'BMI', 'HbA1c', 'Gender')
m.prop.ana = subset(m.prop.ana, !is.na(HbA1c))
m.prop.ana$Disease = factor( c('Normal', 'T2D')[(m.prop.ana$HbA1c > 6.5) + 1], c('Normal', 'T2D') )
m.prop.ana$D = (m.prop.ana$Disease == 'T2D')/5

ggplot(m.prop.ana, aes(HbA1c, ct.prop)) + 
  geom_smooth(method = 'lm',  se = FALSE, col = 'black', lwd = 0.25) +
  geom_point(aes(fill = Method, color = Disease, stroke = D, shape = Disease), size = 2, alpha = 0.7) +  facet_wrap(~ Method) + 
  ggtitle('HbA1c vs Beta Cell Type Proportion') + theme_minimal() + 
  scale_colour_manual( values = c('white', "gray20")) + 
  scale_shape_manual(values = c(21, 24))


lm.beta.MuSiC = lm(ct.prop ~ HbA1c + Age + BMI + Gender, data = subset(m.prop.ana, Method == 'MuSiC'))
lm.beta.NNLS = lm(ct.prop ~ HbA1c + Age + BMI + Gender, data = subset(m.prop.ana, Method == 'NNLS'))
summary(lm.beta.MuSiC)
summary(lm.beta.NNLS)

圖片

從上圖我們可以看出β細(xì)胞比例與糖化血紅蛋白水平呈現(xiàn)明顯的負(fù)調(diào)節(jié)關(guān)系，而且MuSiC算出的系數(shù)比非負(fù)矩陣更顯著。

對小鼠bulk數(shù)據(jù)進(jìn)行分析

# Mouse bulk dataset
Mouse.bulk.eset = readRDS('Mousebulkeset.rds')
Mouse.bulk.eset

# Download EMTAB single cell dataset from Github
Mousesub.eset = readRDS('Mousesubeset.rds')
Mousesub.eset

levels(Mousesub.eset$cellType)
# [1] "Endo"     "Podo"     "PT"       "LOH"      "DCT"      "CD-PC"    "CD-IC"    "CD-Trans" "Novel1"  
#[10] "Fib"      "Macro"    "Neutro"   "B lymph"  "T lymph"  "NK"       "Novel2"    

# Produce the first step information
Mousesub.basis = music_basis(Mousesub.eset, clusters = 'cellType', samples = 'sampleID', 
                             select.ct = c('Endo', 'Podo', 'PT', 'LOH', 'DCT', 'CD-PC', 'CD-IC', 'Fib',
                                           'Macro', 'Neutro','B lymph', 'T lymph', 'NK'))

# Plot the dendrogram of design matrix and cross-subject mean of realtive abundance
par(mfrow = c(1, 2))
d <- dist(t(log(Mousesub.basis$Disgn.mtx + 1e-6)), method = "euclidean")
# Hierarchical clustering using Complete Linkage
hc1 <- hclust(d, method = "complete" )
# Plot the obtained dendrogram
plot(hc1, cex = 0.6, hang = -1, main = 'Cluster log(Design Matrix)')
d <- dist(t(log(Mousesub.basis$M.theta + 1e-8)), method = "euclidean")
# Hierarchical clustering using Complete Linkage
# hc2 <- hclust(d, method = "complete" )
hc2 <- hclust(d, method = "complete")
# Plot the obtained dendrogram
plot(hc2, cex = 0.6, hang = -1, main = 'Cluster log(Mean of RA)')

圖片

這個(gè)圖是對不同細(xì)胞進(jìn)行聚類，可以看出免疫細(xì)胞和非免疫細(xì)胞聚類不同。

clusters.type = list(C1 = 'Neutro', C2 = 'Podo', C3 = c('Endo', 'CD-PC', 'LOH', 'CD-IC', 'DCT', 'PT'), C4 = c('Macro', 'Fib', 'B lymph', 'NK', 'T lymph'))

cl.type = as.character(Mousesub.eset$cellType)

for(cl in 1:length(clusters.type)){
  cl.type[cl.type %in% clusters.type[[cl]]] = names(clusters.type)[cl]
}
pData(Mousesub.eset)$clusterType = factor(cl.type, levels = c(names(clusters.type), 'CD-Trans', 'Novel1', 'Novel2'))

# 13 selected cell types
s.mouse = unlist(clusters.type)
s.mouse


load('IEmarkers.RData')
# This RData file provides two vectors of gene names Epith.marker and Immune.marker

# We now construct the list of group marker
IEmarkers = list(NULL, NULL, Epith.marker, Immune.marker)
names(IEmarkers) = c('C1', 'C2', 'C3', 'C4')
# The name of group markers should be the same as the cluster names

Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, group.markers = IEmarkers, 
                                    clusters = 'cellType', groups = 'clusterType', samples = 'sampleID', clusters.type = clusters.type)

用已知細(xì)胞構(gòu)成對MuSiC定量結(jié)果進(jìn)行評估

XinT2D.construct.full = bulk_construct(XinT2D.eset, clusters = 'cellType', samples = 'SubjectName')

names(XinT2D.construct.full)
# [1] "Bulk.counts" "num.real" 

XinT2D.construct.full$Bulk.counts

head(XinT2D.construct.full$num.real)
XinT2D.construct.full$prop.real = relative.ab(XinT2D.construct.full$num.real, by.col = FALSE)
head(XinT2D.construct.full$prop.real)

# Estimate cell type proportions of artificial bulk data
Est.prop.Xin = music_prop(bulk.eset = XinT2D.construct.full$Bulk.counts, sc.eset = EMTAB.eset,
                          clusters = 'cellType', samples = 'sampleID', 
                          select.ct = c('alpha', 'beta', 'delta', 'gamma'))

# Estimation evaluation
Eval_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real), 
           prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted), 
                           data.matrix(Est.prop.Xin$Est.prop.allgene)), 
           method.name = c('MuSiC', 'NNLS'))

prop.comp.fig = Prop_comp_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real),
                                prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted),
                                                data.matrix(Est.prop.Xin$Est.prop.allgene)),
                                method.name = c('MuSiC', 'NNLS'), 
                                title = 'Heatmap of Real and Est. Prop' )

abs.diff.fig = Abs_diff_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real),
                              prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted),
                                              data.matrix(Est.prop.Xin$Est.prop.allgene)),
                              method.name = c('MuSiC', 'NNLS'), 
                              title = 'Abs.Diff between Real and Est. Prop' )

plot_grid(prop.comp.fig, abs.diff.fig, labels = "auto", rel_widths = c(4,3))

圖片

從這個(gè)圖中我們可以看出基于MuSiC算出的各種細(xì)胞比例和實(shí)際細(xì)胞比例相似度很高，比非負(fù)矩陣的結(jié)果要好很多。在原文中作者還和 CIBERSORT (Newman et al. 2015) and bseq-sc ( Baron et al. 2016)都進(jìn)行了對比，感興趣的小伙伴可以去看看原文。

小結(jié)

雖然小編夸了MuSiC一路，但是還是要實(shí)事求是的看待這個(gè)軟件，作者上述的結(jié)果肯定都是選擇利于自身呈現(xiàn)軟件功能進(jìn)行挑選過的。在實(shí)際應(yīng)用過程中小編發(fā)現(xiàn)MuSiC有一些問題還需要理性對待，如它對很多比例很低的細(xì)胞群識別能力很差；而且出現(xiàn)要基于整個(gè)組織全部細(xì)胞進(jìn)行計(jì)算才比較準(zhǔn)確，無法自定義檢測某一些特殊的細(xì)胞亞群。但是好在這個(gè)軟件很輕便，我們可以結(jié)合實(shí)驗(yàn)或者其它算法來綜合使用。

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

基于單細(xì)胞數(shù)據(jù)進(jìn)行Bulk定量之MuSiC

基于單細(xì)胞數(shù)據(jù)進(jìn)行Bulk定量之MuSiC

前言

主要流程

關(guān)聯(lián)分析

對小鼠bulk數(shù)據(jù)進(jìn)行分析

用已知細(xì)胞構(gòu)成對MuSiC定量結(jié)果進(jìn)行評估

小結(jié)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

基于單細(xì)胞數(shù)據(jù)進(jìn)行Bulk定量之MuSiC

前言

主要流程

關(guān)聯(lián)分析

對小鼠bulk數(shù)據(jù)進(jìn)行分析

用已知細(xì)胞構(gòu)成對MuSiC定量結(jié)果進(jìn)行評估

小結(jié)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av