構(gòu)建自己物種的orgDb

1.首先安裝eggnog-mapper軟件

注釋所需要的物種數(shù)據(jù)庫網(wǎng)址如下,同時也可以用里面的腳本download_eggnog_data.py下載你所需要的數(shù)據(jù)庫:

http://eggnogdb.embl.de/download/
python download_eggnog_data.py euk 下載euk數(shù)據(jù)庫

eggnog-mapper有兩種比對方式(直接調(diào)用emapper.py腳本即可):

  1. 基于hmmer的比對:建議序列少于1000條
$python /data1/spider/ytbiosoft/soft/eggnog-mapper-1.0.3/emapper.py -m hmmer -i test.fasta -d euk -o test_euk(輸出文件前綴)

  1. 基于diamond的比對:序列大于1000條(不需要指定數(shù)據(jù)庫)
$python /data1/spider/ytbiosoft/soft/eggnog-mapper-1.0.3/emapper.py -m diamond -i 你的物種所有蛋白序列 -o sesame(輸出文件前綴)

2. 對生成的文件修改

結(jié)果會生成一個sesame.emapper.annotations的文件。查看文件會發(fā)現(xiàn)有許多以#開頭的行,要刪掉這些沒用的行。注意別刪掉表頭。


sesame.emapper.annotations

所以需要刪掉#開頭的行以及表頭的#,但不要刪表頭

$sed -i 's/#//' sesame.emapper.annotations  -i就在源文件修改 s替換 /空字符

此時的sesame.emapper.annotations就可以拿來構(gòu)建orgDb了。

3. 根據(jù)eggnog-mapper注釋結(jié)果構(gòu)建orgDb

  1. 安裝R包
library(tidyverse)
library(stringr)
library(KEGGREST)
library(AnnotationForge)

除了KEGGREST以外的三個都可以用install.packages()安裝

>if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")


>BiocManager::install("KEGGREST")

安裝好之后就可以構(gòu)建自己的orgDb了

  1. 構(gòu)建orgDb
library(tidyverse)
library(stringr)
library(KEGGREST)
library(AnnotationForge)


#' Title
#'
#' @param f_emapper_anno eggnog-mapper annotation result
#' @param author Who is the creator of this package? like "xxx <xxx@xxx.xx>"
#' @param tax_id The Taxonomy ID that represents your organism. (NCBI has a nice online browser for finding the one you need)
#' @param genus Single string indicating the genus
#' @param species Single string indicating the species
#'
#' @return OrgDb name
#' @export
#'
#' @examples
makeOrgPackageFromEmapper <- function(f_emapper_anno,
                                      author,
                                      tax_id = "0",
                                      genus = "default",
                                      species = "default") {
  
  # read emapper result
  emapper <- read_delim(f_emapper_anno,
                        "\t", escape_double = FALSE, trim_ws = TRUE)
  
  # extract gene name from emapper
  gene_info <- emapper %>%
    dplyr::select(GID = query_name, GENENAME = `eggNOG annot`) %>%
    na.omit()
  
  # extract go annotation from emapper
  gos <- emapper %>%
    dplyr::select(query_name, GO_terms) %>%
    na.omit()
  
  gene2go = data.frame(GID = character(),
                       GO = character(),
                       EVIDENCE = character())
  
  for (row in 1:nrow(gos)) {
    the_gid <- gos[row, "query_name"][[1]]
    the_gos <- str_split(gos[row,"GO_terms"], ",", simplify = FALSE)[[1]]
    
    df_temp <- data_frame(GID = rep(the_gid, length(the_gos)),
                          GO = the_gos,
                          EVIDENCE = rep("IEA", length(the_gos)))
    gene2go <- rbind(gene2go, df_temp)
  }
  
  # extract kegg pathway annotation from emapper
  gene2ko <- emapper %>%
    dplyr::select(GID = query_name, Ko = KEGG_KOs) %>%
    na.omit()
  
  load(file = "kegg_info.RData")
  gene2pathway <- gene2ko %>% left_join(ko2pathway, by = "Ko") %>%
    dplyr::select(GID, Pathway) %>%
    na.omit()


  # make OrgDb
  makeOrgPackage(gene_info=gene_info,
                 go=gene2go,
                 ko=gene2ko,
                 pathway=gene2pathway,
                 # gene2pathway=gene2pathway,
                 version="0.0.2",
                 maintainer=author,
                 author=author,
                 outputDir = ".",
                 tax_id=tax_id,
                 genus=genus,
                 species=species,
                 goTable="go")
  
  my_orgdb <- str_c("org.", str_to_upper(str_sub(genus, 1, 1)) , species, ".eg.db", sep = "")
  return(my_orgdb)
}


my_orgdb <- makeOrgPackageFromEmapper("input/sesame.emapper.annotations",
                                      "zhangxudong <zhangxudong@genek.tv>",
                                      tax_id = "4182",
                                      genus = "Sesamum",
                                      species = "indicum")

跑完代碼就會生成一個org.Sindicum.eg.db的文件夾。此時就可以在Rstiduo里面安裝這個包了。

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容