
Reposted from: TCGA Data Download: A Detailed Guide to the TCGAbiolinks Package Parameters

Original by hls, 組學大講堂, 2019-10-22

Install TCGAbiolinks

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
# Optional: use the Tsinghua (TUNA) Bioconductor mirror
options(BioC_mirror = "https://mirrors.tuna.tsinghua.edu.cn/bioconductor")
BiocManager::install("TCGAbiolinks")

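TCGAbiolinks: download workflow

1. GDCquery()     # query the data (getResults() lists the files a query matched)

2. GDCdownload()  # download the data

3. GDCprepare()   # prepare the data

## Documentation: http://www.bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/query.html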

GDCquery parameters

1. project

getGDCprojects()$project_id returns the current project IDs, including those of the different cancer types in TCGA.

2. data.category

TCGAbiolinks:::getProjectSummary(project) shows which data categories a project contains. It accepts any project ID, such as "TCGA-ACC".

Example:

TCGAbiolinks:::getProjectSummary('TCGA-ESCA')

$file_count
[1] 5657

$data_categories
  file_count case_count               data_category
1        919        184     Transcriptome Profiling
2       1486        184 Simple Nucleotide Variation
3        962        185                 Biospecimen
4        207        185                    Clinical
5        202        185             DNA Methylation
6       1115        185       Copy Number Variation
7        766        185            Sequencing Reads

$case_count
[1] 185

$file_size
[1] 8.198261e+12

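3. data.type

This parameter depends on the previous one: each data.category has its own set of valid data.type values.

4. workflow.type

This parameter depends on the previous two: each combination of data.category and data.type has its own set of valid workflow.type values, as listed in the table at https://www.omicsclass.com/article/1059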

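5. legacy

TCGA data can be downloaded through two entry points, the GDC Legacy Archive and the GDC Data Portal, and this parameter chooses between them. According to the official explanation of Legacy versus Harmonized data, Legacy data use hg19 and hg18 as the reference genome (old data) and are no longer updated, while Harmonized data use hg38 as the reference genome (new data). Harmonized is now the usual choice. The parameter takes TRUE (GDC Legacy Archive) or FALSE (GDC Data Portal, Harmonized).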

6. access

Filter by access type. Possible values: controlled, open. This selects whether the data are openly accessible. Controlled data require authorization and are rarely needed, so the parameter is usually set to access = "open".

7. platform

Filters by the platform that produced the data, for example a particular array or methylation platform. It is usually left unset unless data from one specific platform are needed.

8. file.type

Used only when downloading from the GDC Legacy Archive (legacy = TRUE). See the official documentation: http://www.bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/query.html

When downloading from the GDC Data Portal, this parameter does not need to be set.

9. barcode

A list of barcodes to filter the files to download. Use it to specify exactly which samples to download, for example:

barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01")
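To make the barcode filter easier to reason about, here is a minimal sketch that splits a full TCGA aliquot barcode into its fields using base R only. The helper name parse_tcga_barcode and the field names are illustrative and not part of TCGAbiolinks. The sketch assumes the standard seven-part aliquot barcode layout (project, tissue source site, participant, sample plus vial, portion plus analyte, plate, center).

```r
# Hypothetical helper (not a TCGAbiolinks function): split a full
# TCGA aliquot barcode into named fields using base R only.
parse_tcga_barcode <- function(barcode) {
  parts <- strsplit(barcode, "-", fixed = TRUE)[[1]]
  # A full aliquot barcode has seven dash-separated parts.
  stopifnot(length(parts) == 7)
  list(
    project     = parts[1],
    tss         = parts[2],                  # tissue source site
    participant = parts[3],
    sample      = substr(parts[4], 1, 2),    # sample type code
    vial        = substr(parts[4], 3, 3),
    portion     = substr(parts[5], 1, 2),
    analyte     = substr(parts[5], 3, 3),    # e.g. R = RNA, D = DNA
    plate       = parts[6],
    center      = parts[7]
  )
}

fields <- parse_tcga_barcode("TCGA-14-0736-02A-01R-2005-01")
fields$participant  # "0736"
fields$sample       # "02"
```

The first three fields identify the patient, and the remaining fields identify the sample and the aliquot.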

10. data.format

Selects files by format. Possible values: ("VCF", "TXT", "BAM", "SVS", "BCR XML", "BCR SSF XML", "TSV", "BCR Auxiliary XML", "BCR OMF XML", "BCR Biotab", "MAF", "BCR PPS XML", "XLSX"). It normally does not need to be set, and the default is fine.

11. experimental.strategy

Filters the data by the experimental method that produced them:

Harmonized: WXS, RNA-Seq, miRNA-Seq, Genotyping Array.

Legacy: WXS, RNA-Seq, miRNA-Seq, Genotyping Array, DNA-Seq, Methylation array, Protein expression array, CGH array, VALIDATION, Gene expression array, WGS, MSI-Mono-Dinucleotide Assay, miRNA expression array, Mixed strategies, AMPLICON, Exon array, Total RNA-Seq, Capillary sequencing, Bisulfite-Seq

12. sample.type

Filters by sample type, for example primary tumor tissue, recurrent tumor, and so on.

That covers all the parameters. Here are some usage examples:
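The sample type is also encoded in the barcode itself, in characters 14 and 15. As a small sketch, the lookup below labels barcodes by that code using base R only. The helper name label_sample_type is illustrative, and the table is deliberately partial: it covers only three common TCGA sample type codes, so any other code yields NA.

```r
# Partial lookup of TCGA sample type codes (illustration only).
sample_type_codes <- c(
  "01" = "Primary Solid Tumor",
  "02" = "Recurrent Solid Tumor",
  "11" = "Solid Tissue Normal"
)

# Hypothetical helper: label each barcode by the two-digit sample
# type code at characters 14-15; codes not in the table become NA.
label_sample_type <- function(barcodes) {
  codes <- substr(barcodes, 14, 15)
  unname(sample_type_codes[codes])
}

barcodes <- c("TCGA-14-0736-02A-01R-2005-01",
              "TCGA-OR-A5LR-01A-11D-A29H-01")
labels <- label_sample_type(barcodes)
labels  # "Recurrent Solid Tumor" "Primary Solid Tumor"
```

This kind of check is handy for confirming that a barcode list passed to GDCquery() contains the sample types you expect.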

query <- GDCquery(project = "TCGA-ACC",
                  data.category = "Copy Number Variation",
                  data.type = "Copy Number Segment")

## Not run:

query <- GDCquery(project = "TARGET-AML",
                  data.category = "Transcriptome Profiling",
                  data.type = "miRNA Expression Quantification",
                  workflow.type = "BCGSC miRNA Profiling",
                  barcode = c("TARGET-20-PARUDL-03A-01R", "TARGET-20-PASRRB-03A-01R"))

query <- GDCquery(project = "TARGET-AML",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts",
                  barcode = c("TARGET-20-PADZCG-04A-01R", "TARGET-20-PARJCR-09A-01R"))

query <- GDCquery(project = "TCGA-ACC",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment",
                  sample.type = c("Primary solid Tumor"))

query.met <- GDCquery(project = c("TCGA-GBM", "TCGA-LGG"),
                      legacy = TRUE,
                      data.category = "DNA methylation",
                      platform = "Illumina Human Methylation 450")

query <- GDCquery(project = "TCGA-ACC",
                  data.category = "Copy number variation",
                  legacy = TRUE,
                  file.type = "hg19.seg",
                  barcode = c("TCGA-OR-A5LR-01A-11D-A29H-01"))

Download data: GDCdownload()

Once the GDCquery() call above has finished, the data can be downloaded with GDCdownload(). If there is a lot of data and the download is interrupted, run GDCdownload() again to resume, and repeat until everything has been downloaded. For example:

query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"),
                  legacy = TRUE)

# files.per.chunk only takes effect with the API download method
GDCdownload(query, method = "api", files.per.chunk = 10, directory = "D:/data")

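The main parameters are: query, the result returned by GDCquery(); files.per.chunk = 10, which limits how many files are downloaded per chunk when method = "api" (use a smaller value on a slow connection); and directory = "D:/data", the path where the data are stored.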
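The advice to rerun GDCdownload() after an interruption can be automated with a small retry wrapper. The sketch below is a generic base R helper, not something TCGAbiolinks provides. The name retry and its arguments max_attempts and wait_seconds are assumptions made for illustration. A simulated flaky function stands in for the download so the example runs without network access.

```r
# Hypothetical helper: call `action` until it succeeds or the
# attempts run out, pausing `wait_seconds` between failures.
retry <- function(action, max_attempts = 5, wait_seconds = 0) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = action()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(invisible(result$value))
    message("Attempt ", attempt, " failed: ", conditionMessage(result$error))
    if (attempt < max_attempts) Sys.sleep(wait_seconds)
  }
  stop("All ", max_attempts, " attempts failed")
}

# Stand-in for a download that is interrupted twice before succeeding.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated interruption")
  "done"
}
outcome <- retry(flaky)

# Intended use (not run here; needs TCGAbiolinks, a `query`, and network):
# retry(function() GDCdownload(query, directory = "D:/data"), wait_seconds = 30)
```

Because a repeated GDCdownload() call resumes an interrupted download, as described above, wrapping it this way should be safe to rerun.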

Prepare data: GDCprepare()

GDCprepare() automatically assembles the downloaded files into gene expression data:

data <- GDCprepare(query = query,
                   save = TRUE,
                   directory = "D:/data",  # must match the directory used in GDCdownload() so GDCprepare() can find the downloaded files
                   save.filename = "GBM.RData")  # save the result so it can be loaded directly next time

Once you have the data object, you can move on to downstream data mining.
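In a later session, the saved file can be reloaded instead of repeating the download and preparation. The sketch below assumes the file is a standard R workspace file, as the .RData extension suggests. It does not assume the name of the stored object, because load() returns the names of whatever it restores. A temporary file holding a placeholder data frame stands in for "GBM.RData" so the example runs on its own.

```r
# Stand-in for the file written by GDCprepare(save = TRUE, ...):
# a temporary .RData file holding a placeholder object.
rdata_file <- tempfile(fileext = ".RData")
data <- data.frame(gene = c("A", "B"), count = c(10, 20))
save(data, file = rdata_file)
rm(data)

# Load into a separate environment to avoid overwriting objects in
# the workspace; load() returns the names of the restored objects.
restored <- new.env()
object_names <- load(rdata_file, envir = restored)
print(object_names)

# Fetch the first restored object by name.
reloaded <- get(object_names[1], envir = restored)
```

For the real file, replace rdata_file with the path to "GBM.RData".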
