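Search for GSEXXX
Download the data with prefetch
- Installing and using prefetch
prefetch -h # if the help text is displayed, the installation succeeded
# To download data such as an SRR file, pass the accession ID and specify an output directory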
prefetch SRRxxxxxxx -O PATH
- Installing aspera
wget http://download.asperasoft.com/download/sw/connect/3.7.4/aspera-connect-3.7.4.147727-linux-64.tar.gz
tar zxvf aspera-connect-3.7.4.147727-linux-64.tar.gz
# Install
bash aspera-connect-3.7.4.147727-linux-64.sh
# Then cd to the home directory and check whether a .aspera folder exists; if it does, the installation succeeded
cd && ls -a
# Add aspera to the PATH environment variable and reload the shell configuration
echo 'export PATH=~/.aspera/connect/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# Finally, check that ascp works
ascp --help
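Before moving on, it can help to confirm in one step that every tool used in these notes is on the PATH. The following is a minimal sketch; check_tools is a hypothetical helper name, not part of the SRA Toolkit or Aspera, and it only tests whether a command can be found, not whether it works.

```shell
# Report whether each named command-line tool can be found on PATH.
# Returns 0 if all tools are found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the tools used in these notes:
# check_tools prefetch ascp fastq-dump
```

If a tool is reported missing after installation, the usual cause is that ~/.bashrc has not been re-sourced in the current shell.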
3. Data download
wkd=/home/project/single-cell/MCC
cd $wkd/raw
# for patient 2586-4
cat >SRR_Acc_List-2586-4.txt
SRR7722937
SRR7722938
SRR7722939
SRR7722940
SRR7722941
SRR7722942
cat SRR_Acc_List-2586-4.txt | while read -r i
do prefetch "$i" -O "$(pwd)" && echo "** ${i}.sra done **"
done
The accession list can be downloaded from GEO-SRA.
If the authors uploaded the data to EBI instead,
see http://www.itdecent.cn/p/9040b7573380 for details.
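A downloaded accession list sometimes carries blank lines or Windows line endings, which would be passed to prefetch as invalid IDs. The sketch below filters the list first; clean_accessions is a hypothetical helper, and it assumes each valid line is exactly a run accession (SRR, ERR, or DRR followed by digits) with no surrounding spaces.

```shell
# Print only well-formed run accessions from a list file,
# dropping carriage returns, blank lines, and anything else.
clean_accessions() {
    tr -d '\r' < "$1" | grep -E '^[SED]RR[0-9]+$'
}

# Usage, feeding the cleaned list to the download loop:
# clean_accessions SRR_Acc_List-2586-4.txt | while read -r i
# do prefetch "$i" -O "$(pwd)" && echo "** ${i}.sra done **"
# done
```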
Understanding the files in the raw sequencing data:
I1: library barcode (sample index); the smallest file
used to multiplex samples on one sequencing lane (8 bp)
R1: cell barcode + UMI
cell barcode used to identify the cell the read came from (16 bp) +
UMI used to identify reads that arise from PCR duplication
R2: sequencing reads; the largest file
used to identify the gene a read came from (91-98 bp)
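To make the R1 layout concrete, the sketch below splits one R1 sequence into its cell barcode (the first 16 bases, as stated above) and the remaining bases, which serve as the molecular identifier (UMI) for spotting PCR duplicates. The read sequence is made up for illustration, and the UMI length depends on the library chemistry, so the code simply takes whatever follows the barcode.

```shell
# Hypothetical R1 read: 16 bp cell barcode followed by the UMI.
r1_seq="AAACCTGAGAAACCATTTTGGGCCCA"

# First 16 bases: cell barcode (length taken from the notes above).
cell_barcode=$(printf '%s' "$r1_seq" | cut -c1-16)
# Remaining bases: UMI (length varies with the library chemistry).
umi=$(printf '%s' "$r1_seq" | cut -c17-)

echo "cell barcode: $cell_barcode"
echo "UMI:          $umi"
```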
Convert sra files to fastq
# $i is one accession from the list above; run this inside the same while-read loop
time fastq-dump --gzip --split-3 -A "$i" "${i}.sra" && echo "** ${i}.sra to fastq done **"
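After conversion, a quick sanity check is to count the reads in each output file; paired files from the same run should report the same number. This is a minimal sketch: count_reads is a hypothetical helper, and it assumes four lines per FASTQ record, which is how fastq-dump writes them.

```shell
# Count the reads in a gzip-compressed FASTQ file,
# assuming each record spans exactly four lines.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Example (the file name is an assumption about what fastq-dump wrote):
# count_reads SRR7722937_1.fastq.gz
```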
Note on the cat command
After typing the accession list into cat >file, press Ctrl + D to end input and save the file.
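A non-interactive alternative, which also works inside a script, is a here-document. The sketch below writes the same six accessions without needing Ctrl + D; the EOF marker name is arbitrary.

```shell
# Write the accession list without interactive input: every line
# between the two EOF markers is written to the file.
cat > SRR_Acc_List-2586-4.txt <<'EOF'
SRR7722937
SRR7722938
SRR7722939
SRR7722940
SRR7722941
SRR7722942
EOF
```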