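For several days running I have been trying to build the EVIDENCE column with R on the cluster, and the results are not what I hoped for. Basically I can only get one run through per day; as soon as I start a second one, it fails with an error. At this point the only species still not done is potato. I am parking the problem here for now, because I need to go make my ppt: it is my turn to present next Tuesday.
The R script I run looks like this: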
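# Human annotation package, used here as the source of GO evidence codes
library(org.Hs.eg.db)
# Tab-delimited input without a header line: gene ID and GO ID
swiss_id <- read.delim('/vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_go.stu1', header = FALSE)
colnames(swiss_id) <- c('gene_id', 'GO')
# Look up the EVIDENCE code for every GO ID in the input
ev_id <- select(org.Hs.eg.db, keys = as.vector(swiss_id$GO), columns = c('EVIDENCE'), keytype = "GO")
library(dplyr)
# Join on the shared GO column, keeping only the first two columns of the lookup
swiss_goev <- left_join(swiss_id, ev_id[, 1:2])
write.csv(swiss_goev, '/vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_goev_stu.csv', row.names = FALSE, quote = FALSE)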
Result of the run:

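Today our bioinformatics guru dropped by to check on everyone's work, so I grabbed the chance to ask. The guru wanted to see how big my raw input file was. I took a careful look, and the potato swiss_go.stu1 really is a good deal larger than the files for the other species. I opened it: close to 300,000 lines, while the others top out at about 150,000. So that looks like the cause of the error: the file is too big, and even the cluster cannot cope with it.
And then it hit me: why not just split it into two pieces and run them separately? No sooner said than done: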
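The splitting step itself is not shown in this post. Here is a minimal sketch of one way to do it in R, assuming the input is small enough to read in as plain lines. The function name `split_in_two` and the temporary demo files are made up for illustration; this is not necessarily how the two halves were actually produced.

```r
# Split a text file into two halves by line count.
# If the line count is odd, the first half gets the extra line.
split_in_two <- function(path, out1, out2) {
  lines <- readLines(path)
  half <- ceiling(length(lines) / 2)
  writeLines(lines[seq_len(half)], out1)
  writeLines(lines[-seq_len(half)], out2)
  invisible(c(first = half, second = length(lines) - half))
}

# Demo on a made-up five-line file (tab-separated gene ID and GO ID)
src <- tempfile()
writeLines(paste0("gene", 1:5, "\tGO:000000", 1:5), src)
part1 <- tempfile()
part2 <- tempfile()
sizes <- split_in_two(src, part1, part2)
print(sizes)
```

Applied to a 299,999-line file, the same rule gives 150,000 and 149,999 lines, which matches the two halves used here.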
File 1: first half (150,000 lines)
library(org.Hs.eg.db)
swiss_id <- read.delim('/vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_go.stu1',header = F)
colnames(swiss_id) <- c('gene_id','GO')
ev_id <- select(org.Hs.eg.db,keys = as.vector(swiss_id$GO),columns = c('EVIDENCE'),keytype = "GO")
library(dplyr)
swiss_goev <- left_join(swiss_id,ev_id[,1:2])
write.csv(swiss_goev,'/vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_goev_stu1.csv',row.names = F,quote = F)
File 2: second half (149,999 lines)
library(org.Hs.eg.db)
swiss_id <- read.delim('/vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_go.stu2',header = F)
colnames(swiss_id) <- c('gene_id','GO')
ev_id <- select(org.Hs.eg.db,keys = as.vector(swiss_id$GO),columns = c('EVIDENCE'),keytype = "GO")
library(dplyr)
swiss_goev <- left_join(swiss_id,ev_id[,1:2])
write.csv(swiss_goev,'/vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_goev_stu2.csv',row.names = F,quote = F)
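A side note on the join used in these scripts. `left_join` emits one output row for every matching pair, so if the lookup table holds the same GO/EVIDENCE pair many times, the output grows multiplicatively. I have not checked whether that is what happens with the real `ev_id`; the toy tables below are made up purely to show the effect, and `distinct()` is one possible remedy, not something the scripts above do.

```r
library(dplyr)

# Toy stand-ins for the real tables (made-up values, for illustration only)
swiss_id <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  GO = c("GO:0000001", "GO:0000001", "GO:0000002"),
  stringsAsFactors = FALSE
)
# A lookup in which the same GO/EVIDENCE pairs occur more than once
ev_id <- data.frame(
  GO = c(rep("GO:0000001", 4), "GO:0000002"),
  EVIDENCE = c("IEA", "IEA", "IDA", "IDA", "IEA"),
  stringsAsFactors = FALSE
)

# Newer dplyr versions warn about many-to-many matches; silenced for the demo.
# Joining against the raw lookup: every duplicate becomes an extra output row.
naive <- suppressWarnings(left_join(swiss_id, ev_id, by = "GO"))

# Joining against the de-duplicated lookup: one row per gene/GO/evidence combination.
dedup <- suppressWarnings(left_join(swiss_id, distinct(ev_id), by = "GO"))

nrow(naive)  # 9
nrow(dedup)  # 5
```

Both results carry the same distinct gene/GO/evidence combinations; the naive one just repeats them. If the real output turns out to be mostly repeated rows, de-duplicating the lookup before the join might shrink it a lot.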
Just as I expected, both runs went through and I got my two files, about 39 GB each on average. The next step is to merge them:
Merging the files (submitted as a job)
merge_stu.sh
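# write.csv put a header line in both files; keep only the first one when concatenating
{ cat /vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_goev_stu1.csv; tail -n +2 /vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_goev_stu2.csv; } > /vol3/agis/zhoushaoqun_group/wangyantao/GO/swiss_goev_stu.csv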
qsub -l mem=10G,nodes=1:ppn=4 /vol3/agis/zhoushaoqun_group/wangyantao/GO/merge_stu.sh
The merged file came out at a whopping 77 GB:

I still have no idea whether this thing can even be read back in later. If not, I will probably have to keep splitting...
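If reading the whole file at once does fail, one alternative to more splitting is to stream it in chunks. Below is a minimal sketch that tallies EVIDENCE codes chunk by chunk. It assumes the CSV layout written by `write.csv` above (header `gene_id,GO,EVIDENCE`, with EVIDENCE as the last field); the function name `count_evidence`, the `chunk_size` parameter and the demo file are made up for illustration.

```r
# Tally EVIDENCE codes in a large CSV without loading it all at once.
count_evidence <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  counts <- integer(0)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    # A plain `cat` of two CSVs repeats the header mid-file; skip such lines
    lines <- lines[lines != header]
    # EVIDENCE is the last field, so strip everything up to the last comma
    ev <- sub(".*,", "", lines)
    for (code in unique(ev)) {
      prev <- if (code %in% names(counts)) counts[[code]] else 0L
      counts[code] <- prev + sum(ev == code)
    }
  }
  counts
}

# Demo on a tiny made-up file with a repeated header in the middle
demo_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "gene_id,GO,EVIDENCE",
  "g1,GO:0000001,IEA",
  "g2,GO:0000001,IDA",
  "gene_id,GO,EVIDENCE",
  "g3,GO:0000002,IEA"
), demo_csv)
evidence_counts <- count_evidence(demo_csv, chunk_size = 2L)
print(evidence_counts)
```

Only `chunk_size` lines are held in memory at a time, so the 77 GB file never has to fit in RAM. Taking the text after the last comma also avoids trouble if a gene ID happens to contain a comma, since the file was written with `quote = F`.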