topGO:大概是致力于GO分析方法論的R包

Gene set enrichment analysis with topGO

Alexa A, Rahnenfuhrer J (2019). topGO: Enrichment Analysis for Gene Ontology. R package version 2.36.0.

DOI: 10.18129/B9.bioc.topGO

1. 關(guān)于 topGO & 安裝三連

topGO 的設(shè)計(jì)旨在實(shí)現(xiàn) GO term 的半自動(dòng)富集分析。一個(gè)主要優(yōu)點(diǎn)是它提供的統(tǒng)一化的基因集檢測(cè)方案 (the unified gene set testing framework) 。接下來(lái)是對(duì)這個(gè)主要優(yōu)點(diǎn)的展開(kāi),Besides, also 句型——輕松地應(yīng)用函數(shù)完成GO富集分析、容易地執(zhí)行新的統(tǒng)計(jì)學(xué)檢測(cè)或新算法、對(duì)比不同的GO富集研究方法。

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("topGO")
library(topGO)
ls("package:topGO")
# [1] "algorithm"           "algorithm<-"        
# [3] "allGenes"            "allMembers"         
# [5] "allMembers<-"        "allParents"         
# [7] "allScore"            "annFUN"             
# [9] "annFUN.db"           "annFUN.file"        
# [11] "annFUN.gene2GO"      "annFUN.GO2genes"    
# [13] "annFUN.org"          "attrInTerm"         
# [15] "buildLevels"         "combineResults"     
# [17] "contTable"           "countGenesInTerm"   
# [19] "cutOff"              "cutOff<-"           
# [21] "depth"               "depth<-"            
# [23] "description"         "description<-"      
# [25] "elim"                "elim<-"             
# [27] "expressionMatrix"    "feasible"           
# [29] "feasible<-"          "geneData"           
# [31] "geneData<-"          "genes"              
# [33] "geneScore"           "geneSelectionFun"   
# [35] "geneSelectionFun<-"  "genesInTerm"        
# [37] "GenTable"            "getGraphRoot"       
# [39] "getNoOfLevels"       "getPvalues"         
# [41] "getSigGroups"        "getSigRatio"        
# [43] "GOBPTerm"            "GOCCTerm"           
# [45] "GOFisherTest"        "GOglobalTest"       
# [47] "GOKSTest"            "GOKSTiesTest"       
# [49] "GOMFTerm"            "GOplot"             
# [51] "GOSumTest"           "GOtTest"            
# [53] "graph"               "graph<-"            
# [55] "groupGOTerms"        "inducedGraph"       
# [57] "initialize"          "inverseList"        
# [59] "joinFun"             "members"            
# [61] "members<-"           "membersExpr"        
# [63] "membersScore"        "Name"               
# [65] "Name<-"              "nodesInInducedGraph"
# [67] "numAllMembers"       "numGenes"           
# [69] "numMembers"          "numSigAll"          
# [71] "numSigGenes"         "numSigMembers"      
# [73] "ontology"            "ontology<-"         
# [75] "penalise"            "permSumStats"       
# [77] "permSumStats.all"    "phenotype"          
# [79] "print"               "printGenes"         
# [81] "printGraph"          "pType"              
# [83] "pType<-"             "rankMembers"        
# [85] "readMappings"        "reverseArch"        
# [87] "runTest"             "score"              
# [89] "score<-"             "scoreOrder"         
# [91] "scoresInTerm"        "show"               
# [93] "showGroupDensity"    "showSigOfNodes"     
# [95] "sigAllMembers"       "sigGenes"           
# [97] "sigMembers"          "sigMembers<-"       
# [99] "sigRatio<-"          "termStat"           
# [101] "testName"            "testName<-"         
# [103] "testStatistic"       "testStatPar"        
# [105] "updateGenes"         "updateGroup"        
# [107] "updateTerm<-"        "usedGO"             
# [109] "Weights"             "Weights<-"          
# [111] "whichAlgorithms"     "whichTests"   

2. 數(shù)據(jù)準(zhǔn)備

調(diào)用 ALL 包的內(nèi)置數(shù)據(jù)。

library(ALL)
data("ALL")
data(geneList) ## geneList, a list of genes and the respective p-values for differential expression
affyLib <- paste(annotation(ALL), "db", sep = ".") 
library(package = affyLib, character.only = TRUE)
# 載入需要的程輯包:org.Hs.eg.db
## 載入多個(gè)包時(shí),加上 character.only = TRUE 參數(shù),可以確保正確載入

函數(shù) topDiffGenes() 可以在 p-values=0.01 的顯著水平上篩選 geneList 中的差異表達(dá)基因。

sum(topDiffGenes(geneList)) 
# [1] 50

萬(wàn)事俱備,開(kāi)始 構(gòu)建 topGOdata ,這個(gè)對(duì)象包含全部的基因指標(biāo) (gene identifiers) 和其數(shù)值、GO注釋、GO 層次結(jié)構(gòu) (GO hierarchical structure) 以及其他用于富集分析的信息。

sampleGOdata <- new("topGOdata",
                    description = "Simple session",
                    ontology = "BP",
                    allGenes = geneList,
                    geneSel = topDiffGenes, ## 設(shè)定篩選函數(shù)
                    nodeSize = 10, ## 刪去少于10個(gè)注釋基因的GO term
                    annot = annFUN.db, ## maps genes identifiers to GO terms from the affyLib object
                    affyLib = affyLib)
sampleGOdata
# 
# ------------------------- topGOdata object -------------------------
# 
#  Description:
#    -  Simple session 
# 
#  Ontology:
#    -  BP 
# 
#  323 available genes (all genes from the array):
#    - symbol:  1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at  ...
#    - score :  1 1 0.62238 0.541224 1  ...
#    - 50  significant genes. 
# 
#  309 feasible genes (genes that can be used in the analysis):
#    - symbol:  1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at  ...
#    - score :  1 1 0.62238 0.541224 1  ...
#    - 45  significant genes. 
# 
#  GO graph (nodes with at least  10  genes):
#    - a graph with directed edges
#    - number of nodes = 1072 
#    - number of edges = 2319 
# 
# ------------------------- topGOdata object -------------------------

3. 富集檢測(cè)

  • 用到的兩種統(tǒng)計(jì)學(xué)方法:Fisher's exact test, Kolmogorov-Smirnov like test.
  • 用到的函數(shù):runTest(),可將特定算法和T檢驗(yàn)方法應(yīng)用于輸入的對(duì)象,并返回一個(gè) topGOresult 對(duì)象。

3.1 經(jīng)典 over-representation GO富集分析

resultFisher <- runTest(sampleGOdata, algorithm = "classic", statistic = "fisher") 
# 
#            -- Classic Algorithm -- 
# 
#        the algorithm is scoring 964 nontrivial nodes
#        parameters: 
#            test statistic: fisher
resultFisher
# 
# Description: Simple session 
# Ontology: BP 
# 'classic' algorithm with the 'fisher' test
# 1072 GO terms scored: 45 terms with p < 0.01
# Annotation data:
#     Annotated genes: 309 
#     Significant genes: 45 
#     Min. no. of genes annotated to a GO: 10 
#     Nontrivial nodes: 964 
class(resultFisher)
# [1] "topGOresult"
# attr(,"package")
# [1] "topGO"

3.2 利用 Kolmogorov-Smirnov like test 的富集分析

兩種方法:classic, elim

'elim' : this method processes the GO terms by traversing the GO hierarchy from bottom to top.

'classic': each GO term is tested independently, not taking the GO hierarchy into account.

resultKS.classic <- runTest(sampleGOdata, algorithm = "classic", statistic = "ks") 
# 
#            -- Classic Algorithm -- 
# 
#        the algorithm is scoring 1072 nontrivial nodes
#        parameters: 
#            test statistic: ks
#            score order: increasing
resultKS.classic
# 
# Description: Simple session 
# Ontology: BP 
# 'classic' algorithm with the 'ks' test
# 1072 GO terms scored: 83 terms with p < 0.01
# Annotation data:
#     Annotated genes: 309 
#     Significant genes: 45 
#     Min. no. of genes annotated to a GO: 10 
#     Nontrivial nodes: 1072 
resultKS.elim <- runTest(sampleGOdata, algorithm = "elim", statistic = "ks")
# 
#            -- Elim Algorithm -- 
# 
#        the algorithm is scoring 1072 nontrivial nodes
#        parameters: 
#            test statistic: ks
#            cutOff: 0.01
#            score order: increasing
resultKS.elim
# 
# Description: Simple session 
# Ontology: BP 
# 'elim' algorithm with the 'ks : 0.01' test
# 1072 GO terms scored: 24 terms with p < 0.01
# Annotation data:
#     Annotated genes: 309 
#     Significant genes: 45 
#     Min. no. of genes annotated to a GO: 10 
#     Nontrivial nodes: 1072 

ps. 查看所有允許的算法和T檢驗(yàn)方法

whichAlgorithms() 
# [1] "classic"     "elim"        "weight"      "weight01"   
# [5] "lea"         "parentchild"
whichTests() 
# [1] "fisher"     "ks"         "t"          "globaltest"
# [5] "sum"        "ks.ties"   

4. 結(jié)果分析

4.1 得到富集顯著 GO term 的 data frame

函數(shù) GenTable() 可用于分析富集最為顯著的 GO term 和相對(duì)應(yīng)的p值。

allRes <- GenTable(sampleGOdata,classicFisher = resultFisher,
                   classicKS = resultKS.classic,
                   elimKS = resultKS.elim,
                   orderBy = "elimKS",ranksOf = "classicFisher",
                   topNodes = 10)

4.2 不同算法所得p值的對(duì)比及可視化

函數(shù) score() 可以得到 topGOresult 對(duì)象中 GO term 的p值。

pValue.classic <- score(resultKS.classic) 
pValue.elim <- score(resultKS.elim)[names(pValue.classic)] 
gstat <- termStat(sampleGOdata, names(pValue.classic))  ## to summarise the statistics
colMap <- function(x) {
  .col <- rep(rev(heat.colors(length(unique(x)))), time = table(x))
  return(.col[match(1:length(x), order(x))])
}
gCol <- colMap(gstat$Significant)
plot(pValue.classic, pValue.elim, 
     xlab = "p-value classic", 
     ylab = "p-value elim", 
     pch = 19, cex = gSize, col = gCol)

plot(log10(pValue.classic), log10(pValue.elim), 
     xlab = "lg(p-value) classic", 
     ylab = "lg(p-value) elim", 
     pch = 19, cex = gSize, col = gCol)

可以看出兩種方法得到的p值確實(shí)是有差異的,所以顯然存在一些被A方法認(rèn)定顯著——而在B方法下不顯著的 GO term. 可以通過(guò)下面的方法找到它們:

sel.go <- names(pValue.classic)[pValue.elim < pValue.classic] 
cbind(termStat(sampleGOdata, sel.go), 
      elim = pValue.elim[sel.go],
      classic = pValue.classic[sel.go])
#            Annotated Significant Expected       elim    classic
# GO:0006807       224          36    32.62 0.44604086 0.53079040
# GO:0019538       164          23    23.88 0.37042531 0.47406305
# GO:0043170       206          35    30.00 0.05327402 0.06576399
# GO:1901564       189          26    27.52 0.73230230 0.90604207

4.3 GO Graph

方形為顯著性前五的 GO term.

showSigOfNodes(sampleGOdata, score(resultKS.elim), 
               firstSigNodes = 5, useInfo = 'all')

5. 其他的可視化技能

5.1 p值的直方圖

hist(pValue.classic, 50, xlab = "p-values_KSclassic")

5.2 分析單個(gè)GO term

函數(shù) showGroupDensity() 可以解釋顯著富集到某個(gè)GO term的基因是否比其他基因有更高的分值。

goID <- allRes[1, "GO.ID"]
showGroupDensity(sampleGOdata, goID, ranks = TRUE)

References

悄咪咪吐槽一下作者

最后,向大家隆重推薦生信技能樹(shù)的一系列干貨!

  1. 生信技能樹(shù)全球公益巡講:https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
  2. B站公益74小時(shí)生信工程師教學(xué)視頻合輯:https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
  3. 招學(xué)徒:https://mp.weixin.qq.com/s/KgbilzXnFjbKKunuw7NVfw
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容