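R deserves some praise here, it really is great! Everything below is done in R.
First install biomaRt: BiocManager::install('biomaRt')
Load it: library(biomaRt)
The idea is to use the Ensembl gene ID (ENSG) as the intermediate key: SNP -> Ensembl gene ID -> gene name
Two datasets are needed: hsapiens_snp and hsapiens_gene_ensembl
The fields you want are named differently in the two datasets, though; the listAttributes(dbsnp) function lists the available names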
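This is what I ran (dbsnp and mart are the two mart objects created with useMart() in the example further down):
# list every attribute of the SNP dataset and save it as a tab-separated file
SCdbsnp <- listAttributes(dbsnp)
write.table(SCdbsnp, file = "listAttributessnp.csv", sep = "\t",
            col.names = TRUE, row.names = TRUE, append = FALSE, quote = FALSE)
# same for the gene dataset
SC <- listAttributes(mart)
write.table(SC, file = "listAttributes.csv", sep = "\t",
            col.names = TRUE, row.names = TRUE, append = FALSE, quote = FALSE)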
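The attribute tables are long, so it can help to filter them by keyword instead of reading the whole file. Below is a minimal sketch in base R. It assumes the table has `name` and `description` columns, as the data frame returned by listAttributes() does; `find_attributes` is a hypothetical helper, and `toy_attrs` is a made-up stand-in so the example runs without a network connection.

```r
# Hypothetical helper: keep the rows of an attribute table whose name or
# description matches a pattern (case-insensitive).
find_attributes <- function(attr_table, pattern) {
  hit <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hit, , drop = FALSE]
}

# Toy stand-in for listAttributes(dbsnp); the real table is much longer
# and these descriptions are illustrative only.
toy_attrs <- data.frame(
  name = c("refsnp_id", "ensembl_gene_stable_id", "chr_name"),
  description = c("Variant name", "Gene stable ID", "Chromosome/scaffold name"),
  stringsAsFactors = FALSE
)

gene_attrs <- find_attributes(toy_attrs, "gene")
print(gene_attrs$name)
```

With a live connection the same helper could be called as find_attributes(listAttributes(dbsnp), "gene"). Recent biomaRt versions also provide searchAttributes(mart = dbsnp, pattern = "gene"), which should do much the same thing.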
Once you have decided which fields you want, you are ready to go!!!
Here is my own example:
# connect to the SNP mart and call it dbsnp
dbsnp <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# load my CSV file (rs IDs are in the first column), take that column and call it snps
snps <- read.csv("EPUrs.csv", header = TRUE, sep = ",")[, 1]
# query with getBM: returns the rs ID, gene ID and so on; call the result getsnps
getsnps <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "consequence_type_tv",
                                "study_type", "study_external_ref", "study_description",
                                "associated_gene", "phenotype_name",
                                "phenotype_description", "doi"),
                 filters = "snp_filter", values = snps, mart = dbsnp)
# write the result to a file named EPUsnpstd.csv (note: tab-separated)
write.table(getsnps, file = "EPUsnpstd.csv", sep = "\t",
            col.names = TRUE, row.names = TRUE, append = FALSE, quote = FALSE)
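The above returns a lot of SNP-related information. Next, use the gene IDs to fetch the gene information; the steps are basically the same.
!!! Note: the result file is tab-separated even though it is named .csv, so split it into columns and save it again before continuing, because the next step reads it as comma-separated !!!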
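The manual re-saving step can probably be skipped by reading the file back with the same separator it was written with and selecting the gene ID column by name rather than by position. The sketch below is one way to do that in base R. It is not what I originally ran, and `toy_snps` is made-up data standing in for the getsnps result so the example runs offline. It also drops blank and duplicate gene IDs, on the assumption that SNPs outside any gene come back with an empty gene ID.

```r
# Toy stand-in for the getsnps result; the values are made up.
toy_snps <- data.frame(
  refsnp_id = c("rs1", "rs2", "rs3", "rs4"),
  ensembl_gene_stable_id = c("ENSG01", "", "ENSG01", "ENSG02"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file, here without row names so the columns stay aligned.
tmp <- tempfile(fileext = ".tsv")
write.table(toy_snps, file = tmp, sep = "\t",
            col.names = TRUE, row.names = FALSE, quote = FALSE)

# read.delim() expects tab-separated input, so no manual column splitting is needed.
snp_table <- read.delim(tmp, header = TRUE, stringsAsFactors = FALSE)

# Select the gene ID column by name, then drop missing, blank and duplicate IDs.
gene_ids <- snp_table$ensembl_gene_stable_id
gene_ids <- unique(gene_ids[!is.na(gene_ids) & gene_ids != ""])
print(gene_ids)
```

In the same R session, the gene IDs could also be taken straight from getsnps$ensembl_gene_stable_id without going through a file at all.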
# connect to the gene mart
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# the gene ID is now in the third column, so read the third column
genes <- read.csv("EPUsnpstd.csv", header = TRUE, sep = ",")[, 3]
# Note that some names differ between the two datasets, so check
# listAttributes(dbsnp) and listAttributes(mart) carefully. In particular, the
# Ensembl gene ID is called ensembl_gene_stable_id in one and ensembl_gene_id in the other.
getgenes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "description",
                                 "gene_biotype", "study_external_id", "source",
                                 "external_synonym", "phenotype_description"),
                  filters = "ensembl_gene_id", values = genes, mart = mart)
write.table(getgenes, file = "EPUgenetd.csv", sep = "\t",
            col.names = TRUE, row.names = TRUE, append = FALSE, quote = FALSE)
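To finish the SNP -> Ensembl gene ID -> gene name chain in a single table, the two results can be joined on the gene ID. The sketch below uses base R merge() and takes care of the differing column names (ensembl_gene_stable_id versus ensembl_gene_id). `toy_snps` and `toy_genes` are made-up stand-ins for getsnps and getgenes, and the gene names are placeholders.

```r
# Toy stand-ins for the two getBM results; all values are made up.
toy_snps <- data.frame(
  refsnp_id = c("rs1", "rs2", "rs3"),
  ensembl_gene_stable_id = c("ENSG01", "ENSG02", "ENSG01"),
  stringsAsFactors = FALSE
)
toy_genes <- data.frame(
  ensembl_gene_id = c("ENSG01", "ENSG02"),
  external_gene_name = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Join on the gene ID; by.x and by.y handle the different column names.
snp_to_gene <- merge(toy_snps, toy_genes,
                     by.x = "ensembl_gene_stable_id",
                     by.y = "ensembl_gene_id")

# Reorder rows and columns so each line reads SNP, gene ID, gene name.
snp_to_gene <- snp_to_gene[order(snp_to_gene$refsnp_id),
                           c("refsnp_id", "ensembl_gene_stable_id",
                             "external_gene_name")]
print(snp_to_gene)
```

By default merge() keeps only the rows that match in both tables. Adding all.x = TRUE would also keep SNPs whose gene ID has no match in the gene table.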
Below is part of my results

