GENESPACE: an R package for visualizing genome synteny

Paper

GENESPACE: syntenic pan-genome annotations for eukaryotes

https://www.biorxiv.org/content/10.1101/2022.03.09.483468v1

At the time of writing, this is a bioRxiv preprint and has not been formally published.

GitHub page

https://github.com/jtlovell/GENESPACE

Detailed overview

https://htmlpreview.github.io/?https://github.com/jtlovell/GENESPACE/blob/master/doc/genespaceOverview.html

Windows is not supported yet; GENESPACE only runs on macOS or Linux. I tried it on Linux.

First, install OrthoFinder

conda install -c bioconda orthofinder 

Install MCScanX

https://github.com/wyp1125/MCScanX

git clone https://github.com/wyp1125/MCScanX.git
cd MCScanX
make
[Screenshot: output of make]

The build reported three errors, but three executables were still produced. I tried them and they run; I do not know whether the errors will cause problems later.
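
One way to double-check the build despite the reported errors is to confirm that the expected programs exist and are executable. The sketch below assumes the three programs are named MCScanX, MCScanX_h, and duplicate_gene_classifier, as I recall them from the MCScanX README; check_mcscanx is a made-up helper name, not part of MCScanX.

```shell
# Hypothetical helper: report which of the expected MCScanX programs are
# present and executable in the given directory.
# The three program names are an assumption based on the MCScanX README.
check_mcscanx() {
    dir="$1"
    missing=0
    for prog in MCScanX MCScanX_h duplicate_gene_classifier; do
        if [ -x "$dir/$prog" ]; then
            echo "ok: $prog"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    # Return non-zero if any program was missing.
    return "$missing"
}
```

For example, run `check_mcscanx ./MCScanX` from the directory where the repository was cloned. This only confirms that the files are there and executable, not that they behave correctly.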


Install the R package dependencies

conda install r-data.table r-dbscan r-R.utils r-devtools
conda install bioconductor-Biostrings bioconductor-rtracklayer

Install GENESPACE

# Start R (the author uses radian)
devtools::install_github("jtlovell/GENESPACE", upgrade = FALSE)

Run the example data

library(GENESPACE)
runwd<-file.path("./testGenespace/")
make_exampleDataDir(writeDir = runwd) ## this step downloads the example data

gids <- c("human", "chimp", "rhesus")
gpar <- init_genespace(
  genomeIDs = gids, speciesIDs = gids, versionIDs = gids, ploidy = rep(1, 3),
  wd = runwd, gffString = "gff", pepString = "pep",
  path2orthofinder = "orthofinder",
  path2mcscanx = "/home/myan/scratch/apps/mingyan/Biotools/MCScanX",
  path2diamond = "diamond", diamondMode = "fast", orthofinderMethod = "fast",
  rawGenomeDir = file.path(runwd, "rawGenomes"))

parse_annotations(
  gsParam = gpar, gffEntryType = "gene", gffIdColumn = "locus",
  gffStripText = "locus=", headerEntryIndex = 1, headerSep = " ",
  headerStripText = "locus=")
# I did not fully understand this call. Judging from its arguments, it seems to
# parse the raw GFF and peptide files so that the gene IDs in the two match.

gpar<-run_orthofinder(gsParam = gpar)  

## Running this line produced a warning
Warning message:
In system2(gsParam$paths$orthofinderCall, com, stdout = TRUE, stderr = TRUE) :
  running command ''orthofinder' -b ./testGenespace//orthofinder -t 4 -a 1 -X -og 2>&1' had status 120 and error message 'Interrupted system call'
## I am not sure whether this affects later steps. The extra trailing slash in runwd<-file.path("./testGenespace/") might be the cause. Running it again produced no problem.
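
On POSIX systems, repeated slashes inside a path are treated as a single slash, so the doubled slash in ./testGenespace//orthofinder is unlikely to be the cause by itself. A small shell sketch to confirm that two spellings of a path resolve to the same directory (same_dir is a hypothetical helper, not part of GENESPACE or OrthoFinder):

```shell
# Hypothetical helper: succeed (exit 0) if both arguments resolve to the
# same physical directory, fail with 1 if they differ, and with 2 if
# either one cannot be entered.
same_dir() {
    a=$(cd -- "$1" >/dev/null 2>&1 && pwd -P) || return 2
    b=$(cd -- "$2" >/dev/null 2>&1 && pwd -P) || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_dir ./testGenespace/orthofinder ./testGenespace//orthofinder && echo "same directory"`. If the two spellings match, the warning more likely came from a transient interruption, which would fit the rerun succeeding.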

gpar<-synteny(gsParam = gpar)

## Plot the results

pdf(file="abc.pdf",width = 10,height = 8)
plot_riparianHits(gpar)
dev.off()
[Figure: riparian plot of the human, chimp, and rhesus genomes]

More plotting parameters

pdf(file = "abc.pdf", width = 9.6, height = 4)
plot_riparianHits(
  gpar, refGenome = "chimp",
  invertTheseChrs = data.frame(genome = "rhesus", chr = 2),
  genomeIDs = c("chimp", "human", "rhesus"),
  labelTheseGenomes = c("chimp", "rhesus"),
  gapProp = .001, refChrCols = c("#BC4F43", "#F67243"),
  blackBg = FALSE, returnSourceData = TRUE, verbose = FALSE)
dev.off()
[Figure: riparian plot with chimp as the reference genome]

You can also restrict the plot to regions of interest

regs <- data.frame(
  genome = c("human", "human", "chimp", "rhesus"),
  chr = c(3, 3, 4, 5),
  start = c(0, 50e6, 0, 60e6),
  end = c(10e6, 70e6, 50e6, 90e6),
  cols = c("pink", "gold", "cyan", "dodgerblue"))
pdf(file = "abc2.pdf", width = 9.6, height = 4)
plot_riparianHits(gpar, onlyTheseRegions = regs, blackBg = FALSE)
dev.off()

[Figure: riparian plot restricted to the selected regions]

Build the pan-genome

pg <- pangenome(gpar)

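This writes the file results/human_pangenomeDB.txt.gz.

Opening the file, part of the contents look like this:

[Screenshot: excerpt of results/human_pangenomeDB.txt.gz]

I have not yet worked out how to interpret this output.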
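
A quick way to start making sense of the file is to list its column names from the shell. The sketch below assumes the file is tab-delimited with a header row, which is worth verifying on your own output; list_columns is a hypothetical helper name.

```shell
# Hypothetical helper: print the numbered column names of a gzipped,
# tab-delimited text file. Assumes the first line is a header row.
# The whole stream is read, so this can take a moment on large files.
list_columns() {
    gzip -dc "$1" | awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i }'
}
```

For example, `list_columns results/human_pangenomeDB.txt.gz`, run from the GENESPACE working directory.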

The help documentation says:

This is the source data that can be manipulated programatically to extract your regions of interest. Future GENESPACE releases will have auxilary functions that let the user access the pan-genome by rules (e.g. contains these genes, in these regions etc.). For now, we’ll leave this work to scripting by the user.

The next step is to work out how to prepare my own data.

You are welcome to follow my WeChat official account, 小明的数据分析笔记本, which mainly shares: 1. simple examples of data analysis and visualization in R and Python; 2. reading notes on transcriptomics, genomics, and population genetics papers related to horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.
