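Foreword
Anyone who runs enrichment analysis regularly has probably wished for this: if we knew the regulatory relationships among the genes in a pathway of interest, we could pinpoint its hub genes; or, if we could place a gene of interest relative to its upstream and downstream regulators in that pathway, we could design follow-up validation experiments. Such wishes often stay wishes, but daring to imagine something is the first step toward making it happen, and this particular idea became reality this year.
Most bioinformaticians in China have heard of Y叔, whose series of analysis packages underpins a large share of the bioinformatics work done in the country. The package Immugent introduces today, CBNplot, was recently developed by Y叔 together with Yasushi Okuno of Kyoto University. The accompanying article was published in Bioinformatics under the title "CBNplot: Bayesian network plots for enrichment analysis".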

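生信宝库 will cover the main features of CBNplot in three posts, each built around hands-on code. This post covers the first part.
Code walkthrough
First, we download an example dataset from GEO, identify the differentially expressed genes, and then run enrichment analysis.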
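library(DESeq2)
library(ggplot2)         # theme_bw() for the PCA plot
library(org.Hs.eg.db)    # human annotation used by bitr(), enrichGO(), setReadable()
library(clusterProfiler) # setReadable() is called below without a namespace prefix
library(CBNplot)         # bngeneplot()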
## Load the count matrix and build the sample metadata
counts = read.table("GSE133624_reads-count-all-sample.txt", header=TRUE, row.names=1)
## The first character of each sample name is used as the condition label (e.g. "T")
meta = sapply(colnames(counts), function (x) substring(x,1,1))
meta = data.frame(meta)
colnames(meta) = c("Condition")
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = meta,
                              design = ~ Condition)
## Prefiltering: drop genes with fewer than 10 reads in more than 90% of samples
filt <- rowSums(counts(dds) < 10) > dim(meta)[1]*0.9
dds <- dds[!filt,]
## Run the DESeq2 pipeline; p-values are Bonferroni-adjusted
dds = DESeq(dds)
res = results(dds, pAdjustMethod = "bonferroni")
## Apply the variance stabilizing transformation (VST)
v = vst(dds, blind=FALSE)
vsted = assay(v)
## Plot PCA of VST values
DESeq2::plotPCA(v, intgroup="Condition") +
    theme_bw()

## Define the input genes, and use clusterProfiler::bitr to convert the ID.
sig = subset(res, padj<0.05)
cand.entrez = clusterProfiler::bitr(rownames(sig), fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)$ENTREZID
## Perform enrichment analysis (ORA)
pway = ReactomePA::enrichPathway(gene = cand.entrez)
pwayGO = clusterProfiler::enrichGO(cand.entrez, ont = "BP", OrgDb = org.Hs.eg.db)
## Convert to SYMBOL
pway = setReadable(pway, OrgDb=org.Hs.eg.db)
pwayGO = setReadable(pwayGO, OrgDb=org.Hs.eg.db)
## Store the similarity
pway = enrichplot::pairwise_termsim(pway)
## Samples to include in network inference (those with Condition "T")
incSample = rownames(subset(meta, Condition=="T"))
## Build a ranked gene list (log2 fold change, named by ENTREZ ID) for GSEA
allEntrez = clusterProfiler::bitr(rownames(res), fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
res$ENSEMBL <- rownames(res)
lfc <- merge(data.frame(res), allEntrez, by="ENSEMBL")
lfc <- lfc[order(lfc$log2FoldChange, decreasing=TRUE),]
geneList <- lfc$log2FoldChange
names(geneList) <- lfc$ENTREZID
pwayGSE <- ReactomePA::gsePathway(geneList)
## Mean and standard deviation of the gene count across significant pathways
sigpway <- subset(pway@result, p.adjust<0.05)
paste(mean(sigpway$Count), sd(sigpway$Count))
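Two steps in the code above are easy to misread: the prefilter threshold and the construction of the ranked gene list. The following is a minimal base-R sketch on made-up numbers, not part of the original analysis. The toy matrix, the "N" sample prefix, and the helper `make_ranked_list()` are hypothetical, and dropping NA values and duplicated IDs before GSEA is an assumption about good practice rather than something the post does.

```r
# Toy count matrix (made-up numbers): 4 genes x 10 samples.
# Sample names start with a one-letter condition code, as in the post.
counts_toy <- matrix(
  c(rep(0, 10),                 # geneA: no reads in any sample
    rep(50, 10),                # geneB: well expressed everywhere
    rep(3, 9), 200,             # geneC: below 10 reads in 9 of 10 samples
    rep(12, 5), rep(2, 5)),     # geneD: below 10 reads in 5 of 10 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", c("A", "B", "C", "D")),
                  c(paste0("N", 1:5), paste0("T", 1:5)))
)

# Condition label = first character of each sample name.
condition <- substring(colnames(counts_toy), 1, 1)

# Same rule as the prefilter above: flag a gene when MORE than 90% of
# samples have fewer than 10 reads, then keep the unflagged genes.
low  <- rowSums(counts_toy < 10) > ncol(counts_toy) * 0.9
kept <- rownames(counts_toy)[!low]

# Hypothetical helper: build a GSEA-style ranked vector. It removes NA IDs
# and NA fold changes, sorts in decreasing order, and keeps the first
# (largest log2 fold change) entry for each duplicated ID.
make_ranked_list <- function(ids, lfc) {
  keep <- !is.na(ids) & !is.na(lfc)
  ids  <- ids[keep]
  lfc  <- lfc[keep]
  ord  <- order(lfc, decreasing = TRUE)
  ids  <- ids[ord]
  lfc  <- lfc[ord]
  first <- !duplicated(ids)
  stats::setNames(lfc[first], ids[first])
}

ranked <- make_ranked_list(ids = c("10", "20", "20", NA, "30"),
                           lfc = c(1.5, -2, 0.3, 4, NA))
print(kept)
print(ranked)
```

Note that geneC sits exactly on the 90% boundary and is kept, because the comparison is strict. Keeping the largest fold change per duplicated ID is only one possible choice; keeping the largest absolute value would be equally defensible.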
With the enrichment results in hand, we can use CBNplot to visualize the pathways we are interested in.
barplot(pway, showCategory = 15)
# Plot the gene network of one enriched pathway with bngeneplot()
bngeneplot(results = pway, exp = vsted, pathNum = 17)
# Change the label size for better readability.
bngeneplot(results = pway, exp = vsted, pathNum = 17, labelSize=7, shadowText=TRUE)
# Show the confidence of the edge direction
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 13, R = 50, showDir = TRUE,
           convertSymbol = TRUE,
           expRow = "ENSEMBL",
           strThresh = 0.7)
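To make the `R`, `showDir`, and `strThresh` arguments more concrete, here is a minimal sketch of bootstrap-based edge confidence written directly against the bnlearn package. It assumes that CBNplot relies on bnlearn for structure learning and that `R` and `strThresh` play the roles of the bootstrap count and the strength threshold; the post itself does not state this, and the simulated expression values are made up.

```r
library(bnlearn)

set.seed(42)

# Simulated expression for three hypothetical genes (rows = samples):
# geneA drives geneB, and geneB drives geneC.
n <- 200
geneA <- rnorm(n)
geneB <- 0.9 * geneA + rnorm(n, sd = 0.3)
geneC <- 0.9 * geneB + rnorm(n, sd = 0.3)
expr  <- data.frame(geneA, geneB, geneC)

# Learn a network on each of 50 bootstrap resamples with hill climbing.
# For every ordered pair of nodes the result reports:
#   strength  = fraction of resamples in which the two nodes are connected
#   direction = fraction of those in which the arc points from -> to
str <- boot.strength(expr, R = 50, algorithm = "hc")

# Keep only the edges supported in at least 70% of the resamples.
strong <- subset(as.data.frame(str), strength >= 0.7)
print(strong)

# Consensus network built from the edges that pass the same threshold.
avg <- averaged.network(str, threshold = 0.7)
print(arcs(avg))
```

A direction value close to 0.5 means the data cannot distinguish the two orientations of an edge, which is worth checking before reading an arrow in the plot as a regulatory direction.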



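By setting compareRef=TRUE and specifying pathDb, the inferred relationships between genes can be compared with a reference network. By default, the intersection of the two directed networks is shown, together with the number of overlapping edges.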
library(parallel)
# Start 4 worker processes to speed up the bootstrap
cl = makeCluster(4)
# Compare with the Reactome reference network (default: intersection)
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 13, R = 30, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome",
           expRow = "ENSEMBL", cl = cl)
# Show the difference between the two networks instead
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 15, R = 10, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome", compareRefType = "difference",
           expRow = "ENSEMBL")
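The comparison of an inferred network with a reference network comes down to set operations on directed edges. The base-R sketch below illustrates the idea on made-up edges between hypothetical genes; it does not reproduce CBNplot's internals, and which side the "difference" is taken from in CBNplot is not stated in this post, so taking it from the inferred network is an assumption.

```r
# Hypothetical helper: turn an edge table into "from->to" keys so that
# direction is part of the comparison.
edge_keys <- function(edges) paste(edges$from, edges$to, sep = "->")

# Made-up inferred network and made-up reference network.
inferred <- data.frame(from = c("A", "B", "C", "D"),
                       to   = c("B", "C", "D", "C"))
reference <- data.frame(from = c("A", "A", "C"),
                        to   = c("B", "C", "D"))

# Intersection: directed edges present in both networks.
shared <- intersect(edge_keys(inferred), edge_keys(reference))

# Difference (assumed here to mean inferred edges absent from the reference).
only_inferred <- setdiff(edge_keys(inferred), edge_keys(reference))

print(shared)         # "A->B" "C->D"
print(length(shared)) # number of overlapping edges: 2
print(only_inferred)  # "B->C" "D->C"
```

Because the edges are directed, "D->C" counts as a difference even though "C->D" is in the reference.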


You can also add a barplot showing the strength and direction (as probabilities) of the edges by specifying strengthPlot=TRUE and nStrength.
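# NOTE: `dep` is not defined anywhere in this post. It must be created before
# this call (presumably a table of DepMap gene-dependency scores used by
# sizeDep to scale the nodes); otherwise drop the sizeDep and dep arguments.
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 15, R = 10, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome", compareRefType = "intersection",
           expRow = "ENSEMBL", sizeDep = TRUE, dep = dep, strengthPlot = TRUE, nStrength = 10)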
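# Several pathways can be passed to pathNum as a vector.
# Release the 4-worker cluster created earlier before starting a larger one.
stopCluster(cl)
cl = makeCluster(8)
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = c(15, 16), R = 10,
           convertSymbol = TRUE,
           expRow = "ENSEMBL", cl = cl)
# Shut the workers down once the plots are done
stopCluster(cl)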
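The strength barplot requested earlier with strengthPlot = TRUE and nStrength = 10 can be pictured as a ranking of edges by bootstrap support. The base-R sketch below uses a made-up strength table and a hypothetical helper `top_edges()`; it assumes nStrength is the number of top-ranked edges to display, which the post does not spell out.

```r
# Made-up edge table in the usual from / to / strength / direction layout.
strength_tbl <- data.frame(
  from      = c("geneA", "geneB", "geneC", "geneA", "geneD"),
  to        = c("geneB", "geneC", "geneD", "geneC", "geneA"),
  strength  = c(0.90, 0.40, 1.00, 0.75, 0.20),
  direction = c(0.80, 0.50, 0.60, 0.95, 0.50)
)

# Hypothetical helper: the n edges with the highest strength.
top_edges <- function(tbl, n) {
  ranked <- tbl[order(tbl$strength, decreasing = TRUE), ]
  head(ranked, n)
}

top    <- top_edges(strength_tbl, n = 3)
labels <- paste(top$from, top$to, sep = "->")

# Side-by-side bars: strength and direction for each of the top edges.
barplot(
  rbind(strength = top$strength, direction = top$direction),
  beside = TRUE, names.arg = labels, ylim = c(0, 1),
  legend.text = TRUE, ylab = "bootstrap proportion"
)
```

Reading strength and direction side by side helps separate edges that are well supported but ambiguous in orientation from edges that are both strong and consistently oriented.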


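Outlook
In this post, we downloaded an example dataset from GEO, ran differential expression and enrichment analysis, and then showed how to use CBNplot to display the regulatory relationships among the genes in a pathway of interest. Keep in mind that these relationships are only predictions that CBNplot infers from gene expression levels across samples; they do not establish that the regulation actually exists. In practice, an interaction between two genes still needs to be confirmed with experimental data such as ChIP-seq or ATAC-seq. Even so, an imperfect prediction is better than none: combined with biological knowledge and a literature search, it lets us form a few hypotheses that can then be tested experimentally.
That's all for this post. In the next one, Immugent will show how to use CBNplot for pathway-level visualization.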