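I ran into all kinds of pitfalls while learning to download GEO microarray data. Here are my notes:
Following the instructor's walkthrough, I first tried downloading with GEOquery: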
library('GEOquery')
library(dplyr)
library(tidyverse)
gset <- getGEO(GEO='GSE87211', destdir=".", getGPL = FALSE)
### destdir is the directory where the files are saved; getGPL = FALSE skips downloading the platform annotation (GPL) file
It failed. The download crawled along and then errored out with: Timeout of 60 seconds was reached
Found 3 file(s)
GSE12417-GPL570_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE12nnn/GSE12417/matrix/GSE12417-GPL570_series_matrix.txt.gz'
Content type 'application/x-gzip' length 23572020 bytes (22.5 MB)
========================
> options(timeout=60)
> gset <- getGEO(GEO='GSE87211', destdir=".",getGPL = F)
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
downloaded 688 KB
Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
download from 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz' failed
In addition: Warning messages:
1: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
downloaded length 704512 != reported length 35235899
2: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz': Timeout of 60 seconds was reached
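The byte counts in the warning above show why a 60-second limit cannot work on this connection. The following is a back-of-the-envelope estimate, assuming the transfer rate stays constant at the average seen during the failed attempt; the variable names are only illustrative.

```r
# Figures taken from the warning messages above
downloaded_bytes <- 704512      # bytes received before the timeout
reported_bytes   <- 35235899    # full size of the series matrix file
timeout_seconds  <- 60          # timeout in effect during the failed run

# Average transfer rate during the failed attempt (bytes per second)
rate <- downloaded_bytes / timeout_seconds

# Time the whole file would need at that rate (assumes a constant rate)
needed_seconds <- reported_bytes / rate

cat(sprintf("rate: %.1f KB/s\n", rate / 1024))
cat(sprintf("estimated time: %.0f s (about %.0f min)\n",
            needed_seconds, needed_seconds / 60))
```

At roughly 11 KB/s the full file would take about 50 minutes, so the timeout has to be raised well beyond 60 seconds for the download to have any chance of finishing.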
Fixing "Timeout of 60 seconds was reached" (the timeout on my RStudio Server was set to only 60 s, which is also R's default)
# Check the current timeout
> getOption('timeout')
[1] 60
# Set the timeout (in seconds)
> options(timeout=100000)
## Confirm the change
> getOption('timeout')
[1] 1e+05
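Setting the option this way changes it for the rest of the session. If that is not wanted, one possible pattern is to raise the timeout only around the download and restore it afterwards. The sketch below uses a hypothetical helper, with_timeout, which is not part of GEOquery or base R; it relies only on the fact that options() returns the previous values.

```r
# Hypothetical helper (not part of GEOquery): evaluate an expression with a
# temporarily raised download timeout, then restore the previous value.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)   # options() returns the previous settings
  on.exit(options(old), add = TRUE)   # restore even if expr throws an error
  force(expr)                         # expr is evaluated here, under the new timeout
}

# Quick check that the setting is active only inside the call
with_timeout(100000, getOption("timeout"))
getOption("timeout")

# Possible use with the call from this post (needs network access):
# gset <- with_timeout(100000,
#   GEOquery::getGEO(GEO = 'GSE87211', destdir = ".", getGPL = FALSE))
```

This works because R evaluates function arguments lazily, so the download only starts once the new timeout is already in place.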
I ran GEOquery's getGEO again. This time the code ran without errors, but for some reason the download was still painfully slow.

One suggested fix is:
options( 'download.file.method.GEOquery' = 'libcurl' )
## libcurl is a free URL transfer library
This helped only slightly; the download was still very slow.
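Before relying on that option, it may be worth confirming that the local R build supports libcurl at all. A minimal sketch, using base R's capabilities() and the same option name as above:

```r
# capabilities("libcurl") reports whether this R build was compiled with libcurl
has_libcurl <- isTRUE(capabilities("libcurl"))

if (has_libcurl) {
  # Same option as above: ask GEOquery to download via libcurl
  options('download.file.method.GEOquery' = 'libcurl')
} else {
  message("This R build has no libcurl support; leaving the download method unchanged.")
}

# Show which download method GEOquery will be asked to use
getOption('download.file.method.GEOquery')
```

This only switches the transfer method. It does not make the connection to the NCBI server itself any faster, which matches the small improvement observed here.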
After searching on Baidu, I tried the geoChina function, which is provided by the AnnoProbe package. Install AnnoProbe first.
> install.packages('AnnoProbe')
> library(AnnoProbe)
# Install/update the GEOmirror package
> devtools::install_git("https://gitee.com/jmzeng/GEOmirror")
# Download the GEO data from the China mirror
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
# The only mirror available here is the Tencent ("penguin") source
The download succeeded.
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
==
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
trying URL 'http://49.235.27.111/GEOmirror/GSE87nnn/GSE87211_eSet.Rdata'
Content type 'application/octet-stream' length 31922908 bytes (30.4 MB)
==================================================
downloaded 30.4 MB
file downloaded in .
you can also use getGEO from GEOquery, by
getGEO('GSE87211', destdir=".", AnnotGPL = F, getGPL = F)
>

After comparing the two, I found no difference from the data downloaded with getGEO.
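The post does not say how the comparison was done. One way to make such a check reproducible is to compare the two expression matrices directly. The sketch below is an assumption, not the method actually used here: same_expression is a hypothetical helper, the toy matrices stand in for the real data, and it assumes both matrices carry row names (probes) and column names (samples).

```r
# Hypothetical helper: TRUE when two expression matrices contain the same
# probes, the same samples and the same values, regardless of row/column order.
same_expression <- function(a, b, tolerance = 1e-8) {
  if (!setequal(rownames(a), rownames(b))) return(FALSE)
  if (!setequal(colnames(a), colnames(b))) return(FALSE)
  b <- b[rownames(a), colnames(a), drop = FALSE]  # align b to the order of a
  isTRUE(all.equal(a, b, tolerance = tolerance))
}

# Toy matrices standing in for the real expression data
m1 <- matrix(c(1.5, 2.0, 3.25, 4.0), nrow = 2,
             dimnames = list(c("probe1", "probe2"), c("GSM1", "GSM2")))
m2 <- m1[c("probe2", "probe1"), ]   # same values, rows in a different order
m3 <- m1
m3["probe1", "GSM1"] <- 9           # one value changed

same_expression(m1, m2)
same_expression(m1, m3)

# With the real objects this might look like the following, assuming each
# download returns a list whose first element is an ExpressionSet
# (gset_geo and gset_china are placeholder names):
# same_expression(Biobase::exprs(gset_geo[[1]]), Biobase::exprs(gset_china[[1]]))
```

Aligning by name before comparing avoids a false mismatch when the two sources store probes or samples in a different order.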