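Preface
GDCRNATools is an R/Bioconductor package for downloading, organizing, and integratively analyzing lncRNA, mRNA, and miRNA data from the Genomic Data Commons (GDC). Its main functions include differential expression analysis, survival analysis, functional enrichment analysis, competing endogenous RNA (ceRNA) analysis, lncRNA analysis, and pseudogene analysis. It can also visualize results, for example with volcano plots, bar plots, scatter plots, enrichment bubble plots, and survival curves. For detailed usage, see the package documentation.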

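Installation and usage
Requirement: R (>= 3.5.0)
1. GDCRNATools installation, method 1
The simplest way to install (requires an internet connection):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# version = "3.8" selects the Bioconductor release to install from
BiocManager::install("GDCRNATools", version = "3.8")
After the installation succeeds, test it by loading the package: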
> library(GDCRNATools)
##############################################################################
Pathview is an open source software package distributed under GNU General
Public License version 3 (GPLv3). Details of GPLv3 is available at
http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
formally cite the original Pathview paper (not just mention it) in publications
or products. For details, do citation("pathview") within R.
The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
license agreement (details at http://www.kegg.jp/kegg/legal.html).
##############################################################################
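The same check can also be scripted, which is handy on a server where you do not want to attach the package just to see whether it is there. The snippet below is only a minimal sketch: the variable names (min_r, r_ok, has_pkg) are my own, and the minimum version is the one stated in the requirement above.

```r
# Minimal installation check (illustrative; variable names are hypothetical).
min_r <- "3.5.0"                          # minimum R version stated above
r_ok  <- getRversion() >= min_r           # TRUE if the running R is new enough

# requireNamespace() returns TRUE/FALSE instead of stopping with an error
has_pkg <- requireNamespace("GDCRNATools", quietly = TRUE)

if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than ", min_r)
}
if (has_pkg) {
  message("GDCRNATools ",
          as.character(utils::packageVersion("GDCRNATools")),
          " is installed")
} else {
  message("GDCRNATools is not installed yet")
}
```

If has_pkg comes back FALSE, go through one of the installation methods in this section again.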
2. GDCRNATools installation, method 2
When no internet connection is available, install offline from a local folder of source packages:
install.packages("GDCRNATools", contriburl = paste("file:", "/work/software/R/contrib", sep = ''), type = "source")
If no error is reported, the installation should be fine.
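One detail that is easy to miss: install.packages() with contriburl looks for a PACKAGES index file in that folder, so the source tarballs alone are not enough. The sketch below shows one way to build the index with tools::write_PACKAGES(). It assumes the tarballs of GDCRNATools and of its dependencies have already been copied into the folder used above; repo_dir, n_indexed, and contrib_url are names I made up for illustration.

```r
# Sketch: prepare a local source repository for offline installation.
repo_dir <- "/work/software/R/contrib"    # folder used in the command above

if (dir.exists(repo_dir)) {
  # Build the PACKAGES index from the *.tar.gz files found in repo_dir;
  # the (invisible) return value is the number of packages described.
  n_indexed <- tools::write_PACKAGES(repo_dir, type = "source")
  message(n_indexed, " package(s) indexed in ", repo_dir)
} else {
  message("Directory not found: ", repo_dir)
}

# URL form expected by the contriburl argument of install.packages()
contrib_url <- paste0("file:", repo_dir)
```

After that, install.packages("GDCRNATools", contriburl = contrib_url, type = "source") is the same call as above.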
3. What if an error occurs?
Occasionally you may hit an error such as "libudunits2.so not found!". This means the udunits library is not installed correctly, so install it first:
$ wget -c ftp://ftp.unidata.ucar.edu/pub/udunits/udunits-2.2.26.tar.gz
$ tar zxf udunits-2.2.26.tar.gz
$ cd udunits-2.2.26
$ ./configure
$ make
$ make install
$ make install-info install-html install-pdf
$ make clean
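Once the udunits library is installed, run the GDCRNATools installation again.
Usage example
After installing GDCRNATools, I ran a simple test following the official tutorial. The code and results are as follows:
1) Data download and organization: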
library(GDCRNATools)
library(DT)
project <- 'TCGA-CHOL'
rnadir <- paste(project, 'RNAseq', sep='/')
#1) Load the example RNA counts data
data(rnaCounts)
rnaExpr <- gdcVoomNormalization(counts = rnaCounts, filter = FALSE) ### Normalization of RNAseq data
#2) Parse metadata
metaMatrix.RNA <- gdcParseMetadata(project.id = 'TCGA-CHOL',
                                   data.type  = 'RNAseq',
                                   write.meta = TRUE)
# Filter duplicated samples and unwanted sample types
metaMatrix.RNA <- gdcFilterDuplicate(metaMatrix.RNA)
metaMatrix.RNA <- gdcFilterSampleType(metaMatrix.RNA)
# Preview the first five rows of the metadata
datatable(as.data.frame(metaMatrix.RNA[1:5,]), extensions = 'Scroller',
          options = list(scrollX = TRUE, deferRender = TRUE, scroller = TRUE))
#3) Merge RNAseq data
# (reads the raw files under rnadir and overwrites the example rnaCounts loaded in step 1)
rnaCounts <- gdcRNAMerge(metadata  = metaMatrix.RNA,
                         path      = rnadir, # the folder in which the data are stored
                         organized = TRUE,   # TRUE if the files already sit in a single folder; use FALSE if each file is in its own folder
                         data.type = 'RNAseq')

2) RNAseq differential expression analysis:
#4) Differential gene expression analysis
data(DEGAll)
DEGAll <- gdcDEAnalysis(counts     = rnaCounts,
                        group      = metaMatrix.RNA$sample_type,
                        comparison = 'PrimaryTumor-SolidTissueNormal',
                        method     = 'limma')
### All DEGs
deALL <- gdcDEReport(deg = DEGAll, gene.type = 'all')
### DE long-noncoding
deLNC <- gdcDEReport(deg = DEGAll, gene.type = 'long_non_coding')
### DE protein coding genes
dePC <- gdcDEReport(deg = DEGAll, gene.type = 'protein_coding')
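To make it clearer what a DE report amounts to, here is a small base-R sketch that filters a DE table by fold change and adjusted p-value. Everything in it is a toy: the gene IDs and numbers are made up, the two cutoffs are hypothetical, and I am assuming the table has logFC and FDR columns. It is not the internal implementation of gdcDEReport.

```r
# Toy DE table; gene IDs and values are invented for illustration only.
toy_deg <- data.frame(
  logFC = c(2.5, -1.8, 0.3, -3.1),
  FDR   = c(0.001, 0.004, 0.20, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

fc_cutoff  <- 2      # hypothetical fold-change cutoff (linear scale)
fdr_cutoff <- 0.01   # hypothetical adjusted p-value cutoff

# Keep genes that pass both cutoffs; logFC is on the log2 scale
keep    <- abs(toy_deg$logFC) > log2(fc_cutoff) & toy_deg$FDR < fdr_cutoff
toy_sig <- toy_deg[keep, ]

# Label the direction of change for the genes that were kept
toy_sig$direction <- ifelse(toy_sig$logFC > 0, "UP", "DOWN")
print(toy_sig)
```

In this toy table, geneC fails the fold-change cutoff and geneD fails the FDR cutoff, so only geneA and geneB remain.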
3) Result visualization:
#5) DEG visualization
### Volcano plot
gdcVolcanoPlot(DEGAll)
### Bar plot
gdcBarPlot(deg = DEGAll, angle = 45, data.type = 'RNAseq')
### Heatmap of the DE genes
degName <- rownames(deALL)
gdcHeatmap(deg.id = degName, metadata = metaMatrix.RNA, rna.expr = rnaExpr)
### Enrichment bar plot
data(enrichOutput)
gdcEnrichPlot(enrichOutput, type = 'bar', category = 'GO', num.terms = 10)
### Bubble plot
gdcEnrichPlot(enrichOutput, type = 'bubble', category = 'GO', num.terms = 10)

4) Pathway visualization:
### View pathway maps on a local webpage
library(pathview)
deg <- deALL$logFC
names(deg) <- rownames(deALL)
pathways <- as.character(enrichOutput$Terms[enrichOutput$Category=='KEGG'])
shinyPathview(deg, pathways = pathways, directory = 'pathview')
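The two inputs prepared above are a named numeric vector of log fold changes and a character vector of KEGG terms. The sketch below rebuilds both from toy stand-ins so you can see their shape without running the full pipeline. All IDs, terms, and category labels in it are placeholders that I made up; only the column names logFC, Terms, and Category come from the code above.

```r
# Toy stand-ins for deALL and enrichOutput; every ID and value is a placeholder.
toy_de <- data.frame(logFC = c(1.7, -2.2),
                     row.names = c("geneA", "geneB"))
toy_enrich <- data.frame(
  Terms    = c("pathway_1", "term_1", "pathway_2"),
  Category = c("KEGG", "GO_BP", "KEGG"),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are log fold changes, names are gene IDs
toy_fc <- setNames(toy_de$logFC, rownames(toy_de))

# Keep only the terms whose category is KEGG
toy_pathways <- as.character(toy_enrich$Terms[toy_enrich$Category == "KEGG"])

str(toy_fc)
print(toy_pathways)
```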

Conclusion
After this simple test, GDCRNATools does look very capable, but mastering it fully will take more careful study; I will add more later. If you have questions, leave a comment with your email address so we can discuss.
References
Bioconductor: GDCRNATools