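I just noticed that the well-known R package limma includes a function called alias2Symbol. According to its description, its job is to "Convert Gene Aliases to Official Gene Symbols".
I have not used it yet, but I have long wished for exactly this feature. It comes as a family of three functions: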
alias2Symbol(alias, species = "Hs", expand.symbols = FALSE)
alias2SymbolTable(alias, species = "Hs")
alias2SymbolUsingNCBI(alias, gene.info.file,
required.columns = c("GeneID","Symbol","description"))
Details
Aliases are mapped via NCBI Entrez Gene identity numbers using Bioconductor organism packages.
alias2Symbol maps a set of aliases to a set of symbols, without necessarily preserving order. The output vector may be longer or shorter than the original vector, because some aliases might not be found and some aliases may map to more than one symbol.
alias2SymbolTable returns a vector of the same length as the vector of aliases. If an alias maps to more than one symbol, then the one with the lowest Entrez ID number is returned. If an alias can't be mapped, then NA is returned.
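To make the difference between the two return conventions concrete, here is a minimal base-R sketch. It does not call limma at all: the alias table, the gene names and the helper functions lookup_set and lookup_table are all made up for illustration, and they only imitate the behaviour described above.

```r
# Toy alias table (hypothetical data, not real annotation).
alias_table <- data.frame(
  alias     = c("A1", "A1", "A1", "C3"),
  symbol    = c("GENEX", "GENEY", "GENEZ", "GENEW"),
  entrez_id = c(200L, 50L, 120L, 75L),
  stringsAsFactors = FALSE
)

# Set-style lookup: all symbols reached by any of the aliases.
# The result is not aligned with the input and can be longer or shorter.
lookup_set <- function(aliases, tbl) {
  unique(tbl$symbol[tbl$alias %in% aliases])
}

# Table-style lookup: exactly one result per input alias.
# Ties are broken by the lowest Entrez ID; unmapped aliases give NA.
lookup_table <- function(aliases, tbl) {
  vapply(aliases, function(a) {
    hits <- tbl[tbl$alias == a, , drop = FALSE]
    if (nrow(hits) == 0L) return(NA_character_)
    hits$symbol[which.min(hits$entrez_id)]
  }, character(1), USE.NAMES = FALSE)
}

query <- c("A1", "UNKNOWN")
set_result   <- lookup_set(query, alias_table)    # three symbols for two inputs
table_result <- lookup_table(query, alias_table)  # two entries, second is NA
```

The table-style form is the one to use when the results need to be put back next to the original names, for example as a new column in a data frame.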
The examples given in the documentation are:
alias2Symbol(c("PUMA","NOXA","BIM"), species="Hs")
alias2Symbol("RS1", expand=TRUE)
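This is genuinely useful, especially when a bioinformatics engineer works with wet-lab scientists, who tend to hand over gene names that seem perfectly normal to them, such as PD1 and PDL1. We then have to convert them, as shown here: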
> alias2Symbol("PD1", expand=TRUE)
[1] "PDCD1" "SNCA" "SPATA2"
> alias2Symbol("PDL1", expand=TRUE)
[1] "CD274"
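When a collaborator sends a whole list of names, it may be handier to keep the input and output side by side. The sketch below uses alias2SymbolTable for that, since it returns one entry per alias. It assumes that limma and org.Hs.eg.db are installed and skips the lookup otherwise; "NOT_A_GENE" is a placeholder that I expect to be unmappable, not a real alias.

```r
# Names as a wet-lab colleague might write them, plus a placeholder
# that is expected not to map to anything.
aliases <- c("PD1", "PDL1", "NOT_A_GENE")

# Only run the lookup if both packages are available.
have_pkgs <- requireNamespace("limma", quietly = TRUE) &&
  requireNamespace("org.Hs.eg.db", quietly = TRUE)

if (have_pkgs) {
  # One row per input alias; symbol is NA where no mapping was found.
  mapping <- data.frame(
    alias  = aliases,
    symbol = limma::alias2SymbolTable(aliases, species = "Hs"),
    stringsAsFactors = FALSE
  )
  print(mapping)

  # Aliases that still need to be checked by hand.
  print(mapping$alias[is.na(mapping$symbol)])
} else {
  message("limma and org.Hs.eg.db are needed to run this example")
}
```

Note that for an ambiguous alias such as PD1 this keeps only one symbol, so it is worth running alias2Symbol on it as well to see every candidate.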
Of course, it is not limited to human. The function mainly pulls its information from the org series of annotation packages, including:
Package Species
org.Ag.eg.db Anopheles
org.Bt.eg.db Bovine
org.Ce.eg.db Worm
org.Cf.eg.db Canine
org.Dm.eg.db Fly
org.Dr.eg.db Zebrafish
org.EcK12.eg.db E coli strain K12
org.EcSakai.eg.db E coli strain Sakai
org.Gg.eg.db Chicken
org.Hs.eg.db Human
org.Mm.eg.db Mouse
org.Mmu.eg.db Rhesus
org.Pt.eg.db Chimp
org.Rn.eg.db Rat
org.Ss.eg.db Pig
org.Xl.eg.db Xenopus
In the end, all of this comes from the files under ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO, for example:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz
Isn't that convenient?!
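To show roughly how an alias lookup can be built from a gene_info file, here is a small base-R sketch. As far as I understand the format, the file is tab-delimited, its header line starts with "#", and the Synonyms column holds aliases separated by "|" with "-" meaning none. The rows below are invented stand-ins with only four of the real columns, and this is not how limma itself implements the lookup.

```r
# Inline stand-in for a gene_info file (hypothetical rows, reduced columns).
gene_info_text <- paste(
  "#tax_id\tGeneID\tSymbol\tSynonyms",
  "9606\t101\tGENEA\tGA1|GA-ALT",
  "9606\t102\tGENEB\t-",
  "9606\t103\tGENEC\tGA1",
  sep = "\n"
)

gene_info <- read.delim(
  text = gene_info_text,
  comment.char = "",      # the header starts with '#', so keep it
  check.names = FALSE,    # leave column names such as "#tax_id" untouched
  quote = "",
  stringsAsFactors = FALSE
)

# Split the Synonyms column into one alias per row.
synonyms <- strsplit(gene_info$Synonyms, "|", fixed = TRUE)
alias_map <- data.frame(
  alias   = unlist(synonyms),
  symbol  = rep(gene_info$Symbol, lengths(synonyms)),
  gene_id = rep(gene_info$GeneID, lengths(synonyms)),
  stringsAsFactors = FALSE
)

# "-" marks genes without any synonym; drop those rows.
alias_map <- alias_map[alias_map$alias != "-", ]

# An alias can point to several official symbols.
ga1_symbols <- alias_map$symbol[alias_map$alias == "GA1"]
```

With a real download, the text argument would be replaced by the path to the gene_info file, which read.delim should be able to read directly even when it is gzip-compressed.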