Downloading the data from NCBI
Download URL: ftp://ftp.ncbi.nih.gov/pub/HomoloGene/
Download the latest homologene.data file.
To read the data, I opened the file in Excel, converted it to CSV format, and then read the CSV.
rt <- read.csv("~/CFJiang/Annotation/homologene.csv", header = FALSE)
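The Excel step can probably be skipped. As far as I know, homologene.data is a headerless, tab-delimited text file, so R can read it directly. The sketch below assumes that layout. The column names (HID, TaxID, and so on) are illustrative labels of my own, and the two inline rows are copied from the output shown in this post so that the example runs without the real file.

```r
# Two sample rows in the assumed homologene.data layout (tab-separated, no header).
sample_lines <- c(
  "3\t9606\t34\tACADM\t4557231\tNP_000007.1",
  "3\t10090\t11364\tAcadm\t6680618\tNP_031408.1"
)

# Illustrative column labels; the file itself carries no header row.
hg_cols <- c("HID", "TaxID", "GeneID", "Symbol", "ProteinGI", "ProteinAcc")

# For the real file, pass its path as the first argument instead of `text = ...`.
rt_direct <- read.delim(text = sample_lines, header = FALSE,
                        col.names = hg_cols, quote = "",
                        stringsAsFactors = FALSE)
print(rt_direct)
```

Reading the file directly also avoids Excel silently rewriting some gene symbols as dates, a well-known hazard with gene lists.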
Inspect the data
rt[1:20,1:6]
V1 V2 V3 V4 V5 V6
1 3 9606 34 ACADM 4557231 NP_000007.1
2 3 9598 469356 ACADM 160961497 NP_001104286.1
3 3 9544 705168 ACADM 109008502 XP_001101274.1
4 3 9615 490207 ACADM 545503811 XP_005622188.1
5 3 9913 505968 ACADM 115497690 NP_001068703.1
6 3 10090 11364 Acadm 6680618 NP_031408.1
7 3 10116 24158 Acadm 292494885 NP_058682.2
8 3 7955 406283 acadm 390190229 NP_998175.2
9 3 7227 38864 CG12262 24660351 NP_648149.1
10 3 7165 1276346 AgaP_AGAP005662 58387602 XP_315683.2
11 3 6239 173979 acdh-8 17534899 NP_495142.1
12 3 6239 181758 acdh-7 17570075 NP_510789.1
13 3 8364 100494748 acadm 512837304 XP_002936129.2
14 5 9606 37 ACADVL 4557235 NP_000009.1
15 5 9598 455237 ACADVL 332847152 XP_003315394.1
16 5 9615 489463 ACADVL 345800108 XP_546581.3
17 5 9913 282130 ACADVL 27806205 NP_776919.1
18 5 10090 11370 Acadvl 23956084 NP_059062.1
19 5 10116 25363 Acadvl 6978435 NP_037023.1
20 5 7955 573723 acadvl 47086807 NP_997776.1
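As the output shows, rows that share a value in the first column (V1) form one cluster: the genes from different species in that cluster are, in theory, homologous. V2 is the NCBI taxonomy ID of the species, e.g. 9606 for human and 10090 for mouse. V3 is the NCBI gene ID, V4 is the NCBI gene symbol, V5 is the GI number, and V6 is the RefSeq accession. For details on GI numbers and the meaning of RefSeq accession prefixes, see the NCBI documentation.
Extract the mouse and human records separately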
human<-rt[rt[,2]==9606,]
mouse<-rt[rt[,2]==10090,]
write.csv(human,"human_id.csv")
write.csv(mouse,"mouse_id.csv")
With these two tables, the ID information for mouse and for human can be looked up separately.
Downloading from the MGI database at the Jackson Laboratory (JAX)
I used the HMD_HumanPhenotype.rpt file.
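To go from two separate tables to actual human/mouse pairs, the two subsets can be joined on the cluster ID in V1. This is a minimal sketch, not a step from the original workflow. The `_demo` names and the renamed columns are hypothetical, and the sample values are copied from the output shown earlier.

```r
# Sample rows in the same layout as the first four columns of `rt`.
rt_demo <- data.frame(
  V1 = c(3, 3, 5, 5),
  V2 = c(9606, 10090, 9606, 10090),
  V3 = c(34, 11364, 37, 11370),
  V4 = c("ACADM", "Acadm", "ACADVL", "Acadvl"),
  stringsAsFactors = FALSE
)

# Split by taxonomy ID, keeping cluster ID, gene ID and symbol.
human_demo <- rt_demo[rt_demo$V2 == 9606, c("V1", "V3", "V4")]
mouse_demo <- rt_demo[rt_demo$V2 == 10090, c("V1", "V3", "V4")]
names(human_demo) <- c("HID", "Human_GeneID", "Human_Symbol")
names(mouse_demo) <- c("HID", "Mouse_GeneID", "Mouse_Symbol")

# Inner join on the cluster ID. A cluster with several genes per species
# yields one row per human/mouse combination.
pairs_demo <- merge(human_demo, mouse_demo, by = "HID")
print(pairs_demo)
```

Clusters that lack a human or a mouse member drop out of the inner join. Pass `all = TRUE` to `merge()` to keep them with NA in the missing columns.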
suppressMessages(library(tidyverse))
# Read the tab-separated MGI report (no header row)
human_mouse_id <- read.table("HMD_HumanPhenotype.rpt", sep = "\t")
# Keep human symbol (col 1), mouse symbol (col 5) and the yes/no flag (col 4)
human_mouse_id <- human_mouse_id[, c(1, 5, 4)]
colnames(human_mouse_id) <- c("Human_Symbol", "Mouse_Symbol", "Match_or_not")
# Keep only rows flagged "yes"
human_mouse_id <- human_mouse_id %>% filter(Match_or_not == "yes")
# Drop repeated mouse symbols so they can serve as unique row names
human_mouse_id <- human_mouse_id[!duplicated(human_mouse_id$Mouse_Symbol), ]
row.names(human_mouse_id) <- as.character(human_mouse_id$Mouse_Symbol)
write.csv(human_mouse_id, "~/Desktop/GoogleDrive/Annotation/human_mouse_homologene/HMD_HumanPhenotype.csv")
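Once the table is built, a typical use is converting a vector of mouse symbols to human symbols. The sketch below uses a small hypothetical stand-in (`map_demo`) with the same three-column shape as `human_mouse_id`. The gene pairs are illustrative, and "NotAGene" is a made-up symbol that shows what happens when there is no match.

```r
# Hypothetical stand-in with the same shape as human_mouse_id.
map_demo <- data.frame(
  Human_Symbol = c("ACADM", "ACADVL"),
  Mouse_Symbol = c("Acadm", "Acadvl"),
  Match_or_not = c("yes", "yes"),
  stringsAsFactors = FALSE
)
rownames(map_demo) <- map_demo$Mouse_Symbol

# Mouse symbols to convert; the second one is deliberately absent from the table.
query <- c("Acadvl", "NotAGene", "Acadm")

# match() returns NA for symbols not in the table, so unmatched genes
# come back as NA instead of raising an error.
converted <- map_demo$Human_Symbol[match(query, map_demo$Mouse_Symbol)]
names(converted) <- query
print(converted)
```

Using `match()` rather than indexing by row name keeps the result the same length and order as the query, which makes it easy to attach as a new column to an expression table.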