九这里只有精品,欧美日韩永久精品婷婷,九九热视频在线观看2

參考github GWAS教學(xué)：https://github.com/MareesAT/GWA_tutorial/

總流程分為4部分

1. QC

2. Population Stratification,

3. Association Analyses

4. Polygenic Risk Score (PRS) analyses.

首先是下載數(shù)據(jù)集以及相關(guān)文件


mkdir GWAS_test

cd GWAS_test

git clone https://github.com/MareesAT/GWA_tutorial.git

unzip 1_QC_GWAS.zip

cd 1_QC_GWAS

1. QC

qc的內(nèi)容可以參考文獻(xiàn)

Marees AT, de Kluiver H, Stringer S, et al. A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res. 2018;27(2):e1608. doi:10.1002/mpr.1608

QC一般分為7步

(1) snp 缺失

(2) 性別篩選

(3) MAF 頻率篩選

(4) 哈代溫伯格平衡測試

(5) 雜合率

(6) 相關(guān)性

(7) 人口分層

1.1 查看是否有SNP，individual缺失。


plink --bfile HapMap_3_r3_1 --missing 

Rscript --no-save hist_miss.R

截屏2020-05-13 下午12.34.04.png

截屏2020-05-13 下午12.34.17.png

根據(jù)SNPs，individuals缺失值設(shè)置閾值進(jìn)行篩選。


# Delete SNPs with missingness >0.2.

plink --bfile HapMap_3_r3_1 --geno 0.2 --make-bed --out HapMap_3_r3_2

# Delete individuals with missingness >0.2.

plink --bfile HapMap_3_r3_2 --mind 0.2 --make-bed --out HapMap_3_r3_3

# Delete SNPs with missingness >0.02.

plink --bfile HapMap_3_r3_3 --geno 0.02 --make-bed --out HapMap_3_r3_4

# Delete individuals with missingness >0.02.

plink --bfile HapMap_3_r3_4 --mind 0.02 --make-bed --out HapMap_3_r3_5

1.2 性別差異檢驗(yàn)

根據(jù)男性和女性在X染色體上SNPs分布頻率進(jìn)行性別信息檢驗(yàn)，女性F值<0.2；男性F值>0.8將不符合的樣本標(biāo)記為“PROBLEM”.


plink --bfile HapMap_3_r3_5 --check-sex

Rscript --no-save gender_check.R

截屏2020-05-13 下午12.45.26.png

截屏2020-05-13 下午12.45.33.png

從結(jié)果看，女性樣本中有一個(gè)F值>0.2.

處理性別異常個(gè)體有兩個(gè)方法：刪除或根據(jù)F值校正。這里選擇刪除的方式。

刪除


grep "PROBLEM" plink.sexcheck| awk '{print$1,$2}'> sex_discrepancy.txt

# This command generates a list of individuals with the status ìPROBLEM?.

plink --bfile HapMap_3_r3_5 --remove sex_discrepancy.txt --make-bed --out HapMap_3_r3_6

# This command removes the list of individuals with the status ìPROBLEM?.

校正


plink --bfile HapMap_3_r3_5 --impute-sex --make-bed --out HapMap_3_r3_6

# This imputes the sex based on the genotype information into your data set.

1.3 生成僅包含常染色體且去除low MAF的bfile


# Select autosomal SNPs only (i.e., from chromosomes 1 to 22).

awk '{ if ($1 >= 1 && $1 <= 22) print $2 }' HapMap_3_r3_6.bim > snp_1_22.txt

plink --bfile HapMap_3_r3_6 --extract snp_1_22.txt --make-bed --out HapMap_3_r3_7

# Generate a plot of the MAF distribution.

plink --bfile HapMap_3_r3_7 --freq --out MAF_check

Rscript --no-save MAF_check.R

截屏2020-05-13 下午12.58.13.png


# Remove SNPs with a low MAF frequency.

plink --bfile HapMap_3_r3_7 --maf 0.05 --make-bed --out HapMap_3_r3_8

# 1073226 SNPs are left

# A conventional MAF threshold for a regular GWAS is between 0.01 or 0.05, depending on sample size.

1.4 去除不符合哈代溫伯格平衡（HWE）的SNPs


# Check the distribution of HWE p-values of all SNPs.

plink --bfile HapMap_3_r3_8 --hardy

# Selecting SNPs with HWE p-value below 0.00001, required for one of the two plot generated by the next Rscript, allows to zoom in on strongly deviating SNPs.

awk '{ if ($9 <0.00001) print $0 }' plink.hwe>plinkzoomhwe.hwe

Rscript --no-save hwe.R

截屏2020-05-13 下午1.06.09.png

截屏2020-05-13 下午1.06.17.png


# By default the --hwe option in plink only filters for controls.

# Therefore, we use two steps, first we use a stringent HWE threshold for controls, followed by a less stringent threshold for the case data.

plink --bfile HapMap_3_r3_8 --hwe 1e-6 --make-bed --out HapMap_hwe_filter_step1

# The HWE threshold for the cases filters out only SNPs which deviate extremely from HWE.

# This second HWE step only focusses on cases because in the controls all SNPs with a HWE p-value < hwe 1e-6 were already removed

plink --bfile HapMap_hwe_filter_step1 --hwe 1e-10 --hwe-all --make-bed --out HapMap_3_r3_9

# Theoretical background for this step is given in our accompanying article: https://www.ncbi.nlm.nih.gov/pubmed/29484742 .

1.5 計(jì)算雜合率

進(jìn)行雜合率檢驗(yàn)，刪除不在mean±3SD范圍內(nèi)的樣本


# Checks for heterozygosity are performed on a set of SNPs which are not highly correlated.

# Therefore, to generate a list of non-(highly)correlated SNPs, we exclude high inversion regions (inversion.txt [High LD regions]) and prune the SNPs using the command --indep-pairwiseí.

# The parameters ?50 5 0.2í stand respectively for: the window size, the number of SNPs to shift the window at each step, and the multiple correlation coefficient for a SNP being regressed on all other SNPs simultaneously.

plink --bfile HapMap_3_r3_9 --exclude inversion.txt --range --indep-pairwise 50 5 0.2 --out indepSNP

# Note, don't delete the file indepSNP.prune.in, we will use this file in later steps of the tutorial.

plink --bfile HapMap_3_r3_9 --extract indepSNP.prune.in --het --out R_check

# This file contains your pruned data set.

# Plot of the heterozygosity rate distribution

Rscript --no-save check_heterozygosity_rate.R

截屏2020-05-13 下午1.17.28.png


# The following code generates a list of individuals who deviate more than 3 standard deviations from the heterozygosity rate mean.

# For data manipulation we recommend using UNIX. However, when performing statistical calculations R might be more convenient, hence the use of the Rscript for this step:

Rscript --no-save heterozygosity_outliers_list.R

# Output of the command above: fail-het-qc.txt .

# When using our example data/the HapMap data this list contains 2 individuals (i.e., two individuals have a heterozygosity rate deviating more than 3 SD's from the mean).

# Adapt this file to make it compatible for PLINK, by removing all quotation marks from the file and selecting only the first two columns.

sed 's/"http:// g' fail-het-qc.txt | awk '{print$1, $2}'> het_fail_ind.txt

# Remove heterozygosity rate outliers.

plink --bfile HapMap_3_r3_9 --remove het_fail_ind.txt --make-bed --out HapMap_3_r3_10

1.6相關(guān)性檢驗(yàn)

Assuming a random population sample we are going to exclude all individuals above the pihat threshold of 0.2 in this tutorial.


# Check for relationships between individuals with a pihat > 0.2.

plink --bfile HapMap_3_r3_10 --extract indepSNP.prune.in --genome --min 0.2 --out pihat_min0.2

# The HapMap dataset is known to contain parent-offspring relations.

# The following commands will visualize specifically these parent-offspring relations, using the z values.

awk '{ if ($8 >0.9) print $0 }' pihat_min0.2.genome>zoom_pihat.genome

# Generate a plot to assess the type of relationship.

Rscript --no-save Relatedness.R

截屏2020-05-13 下午1.30.11.png


# The generated plots show a considerable amount of related individuals (explentation plot; PO = parent-offspring, UN = unrelated individuals) in the Hapmap data, this is expected since the dataset was constructed as such.

# Normally, family based data should be analyzed using specific family based methods. In this tutorial, for demonstrative purposes, we treat the relatedness as cryptic relatedness in a random population sample.

# In this tutorial, we aim to remove all 'relatedness' from our dataset.

# To demonstrate that the majority of the relatedness was due to parent-offspring we only include founders (individuals without parents in the dataset).

plink --bfile HapMap_3_r3_10 --filter-founders --make-bed --out HapMap_3_r3_11

# Now we will look again for individuals with a pihat >0.2.

plink --bfile HapMap_3_r3_11 --extract indepSNP.prune.in --genome --min 0.2 --out pihat_min0.2_in_founders

# The file 'pihat_min0.2_in_founders.genome' shows that, after exclusion of all non-founders, only 1 individual pair with a pihat greater than 0.2 remains in the HapMap data.

# This is likely to be a full sib or DZ twin pair based on the Z values. Noteworthy, they were not given the same family identity (FID) in the HapMap data.

# For each pair of 'related' individuals with a pihat > 0.2, we recommend to remove the individual with the lowest call rate.

plink --bfile HapMap_3_r3_11 --missing

# Use an UNIX text editor (e.g., vi(m) ) to check which individual has the highest call rate in the 'related pair'.

# Generate a list of FID and IID of the individual(s) with a Pihat above 0.2, to check who had the lower call rate of the pair.

# In our dataset the individual 13291  NA07045 had the lower call rate.

vi 0.2_low_call_rate_pihat.txt

I

13291  NA07045

# Press esc on keyboard!

:x

# Press enter on keyboard

# In case of multiple 'related' pairs, the list generated above can be extended using the same method as for our lone 'related' pair.

# Delete the individuals with the lowest call rate in 'related' pairs with a pihat > 0.2

plink --bfile HapMap_3_r3_11 --remove 0.2_low_call_rate_pihat.txt --make-bed --out HapMap_3_r3_12

################################################################################################################################

# CONGRATULATIONS!! You've just succesfully completed the first tutorial! You are now able to conduct a proper genetic QC.

# For the next tutorial, using the script: 2_Main_script_MDS.txt, you need the following files:

# - The bfile HapMap_3_r3_12 (i.e., HapMap_3_r3_12.fam,HapMap_3_r3_12.bed, and HapMap_3_r3_12.bim

# - indepSNP.prune.in

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

GWAS 操作流程1

GWAS 操作流程1

總流程分為4部分

1. QC

2. Population Stratification,

3. Association Analyses

4. Polygenic Risk Score (PRS) analyses.

1. QC

1.1 查看是否有SNP，individual缺失。

1.2 性別差異檢驗(yàn)

1.3 生成僅包含常染色體且去除low MAF的bfile

1.4 去除不符合哈代溫伯格平衡（HWE）的SNPs

1.5 計(jì)算雜合率

1.6相關(guān)性檢驗(yàn)

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

GWAS 操作流程1

總流程分為4部分

1. QC

2. Population Stratification,

3. Association Analyses

4. Polygenic Risk Score (PRS) analyses.

1. QC

1.1 查看是否有SNP，individual缺失。

1.2 性別差異檢驗(yàn)

1.3 生成僅包含常染色體且去除low MAF的bfile

1.4 去除不符合哈代溫伯格平衡（HWE）的SNPs

1.5 計(jì)算雜合率

1.6相關(guān)性檢驗(yàn)

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

1.1 查看是否有SNP，individual缺失。