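It has been a long time since my last update; my boss pulled me away to work on other things.
Downloading the main dataset
The user manual uses terms such as field, bulk, and individuals, which are hard to match to the tabs on the website. Here is how the website tabs correspond. Go to the site and click "Catalogues" (https://biobank.ndph.ox.ac.uk/showcase/catalogs.cgi?tk=Sg3qJFY27r3WRu6KR4GdRTqN6W3KEW5T130717) to see the following categories. Fields are the individual data items, Categories group the fields by phenotype, Returns are results fed back by research projects, Resources are materials produced during data collection, and Schema describes the internal structure of the biobank data.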

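For example, here is the entry for the blood biochemistry data: https://biobank.ndph.ox.ac.uk/showcase/label.cgi?tk=Sg3qJFY27r3WRu6KR4GdRTqN6W3KEW5T130717&id=17518 . Next, I recommend several utilities associated with UKBB, which are used to read in UKBB data, download it, and convert between formats.
1. According to the manual, you first need to obtain the following files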
UKB data download
wget -nd biobank.ndph.ox.ac.uk/showcase/util/ukbmd5
chmod 755 ukbmd5
wget -nd biobank.ndph.ox.ac.uk/showcase/util/ukbconv
chmod 755 ukbconv
wget -nd biobank.ndph.ox.ac.uk/showcase/util/ukbunpack
chmod 755 ukbunpack
wget -nd biobank.ndph.ox.ac.uk/showcase/util/ukbfetch
chmod 755 ukbfetch
wget -nd biobank.ndph.ox.ac.uk/showcase/util/ukblink
chmod 755 ukblink
wget -nd biobank.ndph.ox.ac.uk/showcase/util/ukbgene
chmod 755 ukbgene
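The six download-and-chmod pairs above follow one pattern, so they can be collapsed into a loop. Below is a minimal Python sketch of that idea. The function names `util_url` and `download_utilities` are my own, and the `https://` scheme is an assumption (the wget commands above give the host without a scheme).

```python
import os
import urllib.request
from pathlib import Path

# Assumption: the utilities are served over https at the same path used by wget above.
BASE_URL = "https://biobank.ndph.ox.ac.uk/showcase/util/"
UTILITIES = ["ukbmd5", "ukbconv", "ukbunpack", "ukbfetch", "ukblink", "ukbgene"]


def util_url(name):
    """Return the download URL for one helper utility."""
    return BASE_URL + name


def download_utilities(dest_dir):
    """Download every utility into dest_dir and make it executable (chmod 755)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in UTILITIES:
        target = dest / name
        urllib.request.urlretrieve(util_url(name), str(target))
        os.chmod(target, 0o755)
        paths.append(target)
    return paths
```

Calling `download_utilities(".")` would then fetch all six programs into the current directory.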
2. Obtaining the genotype data
#!/bin/bash
#SBATCH --account=nn9769k --job-name=imp
#SBATCH --partition=bigmem
#SBATCH --time=7-0:0:0
#SBATCH --ntasks=2 --cpus-per-task=4
#SBATCH --mem-per-cpu=32G
# The first argument selects which genotype data type ukbgene fetches.
# [[ ... ]] and {1..26} are bash features, hence the bash shebang.
if [[ $1 != "cal" && $1 != "con" && $1 != "int" && $1 != "baf" &&
      $1 != "l2r" && $1 != "imp" && $1 != "hap" ]]
then
  echo "First param must be one of cal,con,int,baf,l2r,imp,hap"
  exit 1
fi
#
# Loop through chromosomes
#
for CHR in {1..26}
do
  ukbgene "$1" -c"$CHR" -a.ukbkey
done
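Before submitting a seven-day job, it can help to print the commands the script would run. The sketch below rebuilds the same validation and chromosome loop in Python as a dry run. `build_ukbgene_commands` and `run_all` are hypothetical helper names of mine; the `ukbgene` arguments are copied from the script above and nothing else about the tool is assumed.

```python
import subprocess

# Data types accepted by the script above.
VALID_TYPES = ("cal", "con", "int", "baf", "l2r", "imp", "hap")


def build_ukbgene_commands(data_type, key_file=".ukbkey", chromosomes=range(1, 27)):
    """Return one ukbgene argument list per chromosome, without running anything."""
    if data_type not in VALID_TYPES:
        raise ValueError("First param must be one of " + ",".join(VALID_TYPES))
    return [
        ["ukbgene", data_type, f"-c{chrom}", f"-a{key_file}"]
        for chrom in chromosomes
    ]


def run_all(data_type):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in build_ukbgene_commands(data_type):
        subprocess.run(cmd, check=True)
```

For example, `for cmd in build_ukbgene_commands("imp"): print(" ".join(cmd))` lists the 26 calls without downloading anything.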
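3. Obtaining the R file
+++++++++++++++++++++Converting to an R file (tab)++++++++++++++++++++++
../ukbunpack ukbXXXXX.enc ../kxxxxx.key
home/UKBiobank/ukbconv ukbxxxxx.enc_ukb txt
Note that the txt option writes a plain tab-delimited text file. ukbconv also has an r option, which writes a .tab file together with an R script that loads it.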
4. Obtaining the phenotype file
Here I recommend ukbhelper
python3 ./ukb_helper.py pheno --input "../ukbxxxx.csv" --fields 31 21003 34 52 54 53 21000 189 --out home/UKBiobank/phenotype_data/primary_demographics/primary_demographics
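To show what a field-based extraction like this boils down to, here is a minimal sketch that keeps the `eid` column plus every column belonging to the requested field IDs. This is not how ukb_helper is implemented. It assumes the CSV header uses the usual `field-instance.array` column names (for example `31-0.0`), and `select_field_columns` and `extract_fields` are names I made up for illustration.

```python
import csv


def select_field_columns(header, field_ids):
    """Return the indexes of 'eid' and of every column whose field ID is wanted.

    Assumes column names look like '21003-0.0' (field-instance.array).
    """
    wanted = {str(f) for f in field_ids}
    return [
        i for i, name in enumerate(header)
        if name == "eid" or name.split("-", 1)[0] in wanted
    ]


def extract_fields(in_path, out_path, field_ids):
    """Copy only the selected columns from in_path to out_path, row by row."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        keep = select_field_columns(header, field_ids)
        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])
```

Reading row by row keeps memory use flat, which matters because the main phenotype CSV can be very large. The field IDs in the command above would be passed as `[31, 21003, 34, 52, 54, 53, 21000, 189]`.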
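5. In fact, once UKBB has granted you access, knowing the genotype and phenotype retrieval methods above is basically enough to obtain all of the raw UKBB data.
Once I finish the work currently on my plate, I may take on projects such as quality control of the raw UKBB data, and I will keep updating.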