Quickly looking up pathways and genes in KEGG

Goals

1. Quickly look up the description that belongs to an ID, and find out which ID a given pathway has.
2. Pull out all the genes in one or several pathways for separate downstream analysis.

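If what you need is KEGG enrichment analysis, clusterProfiler handles it: https://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

To view KEGG pathway diagrams, use the R package pathview; the help documentation for its functions is all you need.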

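1. Mapping pathway IDs to descriptions

1.1 Website search

If you are not looking things up in bulk, searching the website directly is simplest: https://www.genome.jp/kegg/kegg2.html

1.2 Using msigdbr

To get the full set of mappings, you can build on msigdbr, which was covered in an earlier post: http://www.itdecent.cn/p/0098baf2df46

MSigDB already includes KEGG, and its coverage is fairly complete: IDs, descriptions, and genes are all there.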

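library(msigdbr)
library(dplyr)  # provides the %>% pipe used below
# Pull the KEGG gene sets from MSigDB C2; keep pathway ID, gene symbol, and description.
# Argument and subcategory names may differ in newer msigdbr releases; check ?msigdbr.
KEGG_df = msigdbr(species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG") %>% 
  dplyr::select(gs_exact_source, gene_symbol, gs_description)
head(KEGG_df)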
## # A tibble: 6 x 3
##   gs_exact_source gene_symbol gs_description  
##   <chr>           <chr>       <chr>           
## 1 hsa02010        ABCA1       ABC transporters
## 2 hsa02010        ABCA10      ABC transporters
## 3 hsa02010        ABCA12      ABC transporters
## 4 hsa02010        ABCA13      ABC transporters
## 5 hsa02010        ABCA2       ABC transporters
## 6 hsa02010        ABCA3       ABC transporters
# Split the gene symbols by pathway ID to get a named list: one element per pathway
kegg1 = split(KEGG_df$gene_symbol, KEGG_df$gs_exact_source)
lapply(kegg1[1:6], head)
## $hsa00010
## [1] "ACSS1" "ACSS2" "ADH1A" "ADH1B" "ADH1C" "ADH4" 
## 
## $hsa00020
## [1] "ACLY" "ACO1" "ACO2" "CS"   "DLAT" "DLD" 
## 
## $hsa00030
## [1] "ALDOA" "ALDOB" "ALDOC" "DERA"  "FBP1"  "FBP2" 
## 
## $hsa00040
## [1] "AKR1B1" "CRYL1"  "DCXR"   "DHDH"   "GUSB"   "RPE"   
## 
## $hsa00051
## [1] "AKR1B1"  "AKR1B10" "ALDOA"   "ALDOB"   "ALDOC"   "FBP1"   
## 
## $hsa00052
## [1] "AKR1B1"  "B4GALT1" "B4GALT2" "G6PC"    "G6PC2"   "GAA"
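
For the first goal the gene column is not needed: dropping it and de-duplicating leaves one row per pathway. Below is a minimal sketch in base R, assuming a table with the same three columns as KEGG_df. The rows in `KEGG_toy` are illustrative stand-ins (the description given for hsa00010 is an assumed value, not copied from msigdbr output), and `id2desc` and `desc_by_id` are hypothetical names.

```r
# Toy stand-in for KEGG_df (same three columns); swap in the real table.
KEGG_toy <- data.frame(
  gs_exact_source = c("hsa02010", "hsa02010", "hsa00010"),
  gene_symbol     = c("ABCA1", "ABCA10", "ACSS1"),
  gs_description  = c("ABC transporters", "ABC transporters",
                      "Glycolysis / Gluconeogenesis"),  # assumed description
  stringsAsFactors = FALSE
)

# One row per pathway: keep the ID and description columns, drop duplicates.
id2desc <- unique(KEGG_toy[, c("gs_exact_source", "gs_description")])

# Named vector for looking up a description by pathway ID.
desc_by_id <- setNames(id2desc$gs_description, id2desc$gs_exact_source)
desc_by_id[["hsa02010"]]

# Reverse lookup: pathway IDs whose description matches a keyword.
hits <- id2desc$gs_exact_source[
  grepl("transport", id2desc$gs_description, ignore.case = TRUE)
]
hits
```

With the real KEGG_df, the same `unique()` call should give the complete ID-to-description table in one step.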

2. Mapping pathway IDs to genes

This mapping is available in the org.Hs.eg.db package:

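library(clusterProfiler)  # for bitr(), used further down
library(org.Hs.eg.db)
kegg <- org.Hs.egPATH2EG        # map: KEGG pathway ID -> Entrez gene IDs
mapped <- mappedkeys(kegg)      # pathway IDs that have at least one gene mapped
kegg2 <- as.list(kegg[mapped])  # convert to a plain named list
lapply(kegg2[1:6], head)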
## $`04610`
## [1] "2"   "462" "623" "624" "629" "710"
## 
## $`00232`
## [1] "9"    "10"   "1544" "1548" "1549" "1553"
## 
## $`00983`
## [1] "9"    "10"   "978"  "1066" "1548" "1549"
## 
## $`01100`
## [1] "9"  "10" "15" "18" "28" "30"
## 
## $`00380`
## [1] "15"  "26"  "38"  "39"  "217" "219"
## 
## $`00970`
## [1] "16"   "833"  "1615" "2058" "2193" "2617"

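Looks like a pile of secret codes? In this list, each name is a pathway ID with the hsa prefix left off, and each element holds the Entrez IDs of the genes in that pathway.

As an example, extract the genes in hsa03030 and convert them to gene symbols.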

genes = kegg2[["03030"]]  # list names omit the "hsa" prefix
length(genes)
## [1] 36
# To turn the Entrez IDs into symbols, bitr() is enough
genes = bitr(genes,
             fromType = "ENTREZID",
             toType = "SYMBOL",
             OrgDb = "org.Hs.eg.db")$SYMBOL
head(genes)
## [1] "DNA2" "FEN1" "LIG1" "MCM2" "MCM3" "MCM4"
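
The second goal also covers several pathways at once. Here is a sketch of one way to do that in base R, assuming a list shaped like kegg2 (names are pathway IDs without hsa, values are Entrez IDs). `kegg2_toy` reuses the first few IDs from the printed output above, and `genes_in_pathways` is a hypothetical helper, not a function from any package.

```r
# Toy stand-in for kegg2, using IDs from the printed output above.
kegg2_toy <- list(
  "04610" = c("2", "462", "623"),
  "00232" = c("9", "10", "1544"),
  "00983" = c("9", "10", "978")
)

# Hypothetical helper: accepts pathway IDs with or without the "hsa" prefix
# and returns the de-duplicated union of Entrez IDs across those pathways.
genes_in_pathways <- function(pathway_list, ids) {
  ids <- sub("^hsa", "", ids)  # the list names omit "hsa"
  not_found <- setdiff(ids, names(pathway_list))
  if (length(not_found) > 0) {
    warning("pathways not found: ", paste(not_found, collapse = ", "))
  }
  found <- intersect(ids, names(pathway_list))
  unique(unlist(pathway_list[found], use.names = FALSE))
}

merged <- genes_in_pathways(kegg2_toy, c("hsa00232", "00983"))
merged
```

The resulting vector of Entrez IDs can then be passed to bitr() in the same way as the single-pathway case.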