blupf90根據G矩陣和H矩陣構建PCA分析以及與Plink以及GCTA的結果對比

模擬一套數據, 5個世代, 最后三代有基因型數據, 每個世代400個個體, SNP為50K

1. blupf90構建G矩陣的PCA

blupf90如果想要進行GBLUP分析, 不寫系譜信息即可, 示例par文件:

DATAFILE
dat_f90.txt
TRAITS
10         # This is column 10 (phenotype) from QMSim data file
FIELDS_PASSED TO OUTPUT
1          # This will copy the ID number to the renf90.dat data file
WEIGHT(S)  # WARNING: ALWAYS PUT AN EMPTY LINE AFTER THIS!!!!!

RESIDUAL_VARIANCE
0.9                # add starting values for residual variance
EFFECT
4 cross alpha    # Fit generation as a fixed effect, 'cross alpha' is a class in SAS
EFFECT
1 cross alpha    # Fit animal effect
RANDOM
animal           # Fit animal effect (A matrix) for the effect directly above it (column 1, animal)
#FILE
#ped_f90.txt  #  pedigree file (animal, sire, dam), 0's are missing always!!!
#FILE_POS
#1 2 3 0 0        # indicates that column 1 = Animal, column 2 = Sire, column 3 = Dam
SNP_FILE
yM.txt
(CO)VARIANCES    
0.1               # add starting values for additive animal effect
OPTION alpha_size 25            # Equal to the max number of characters within a column
OPTION max_string_readline 800  # maximum number of characters in one line of data file
OPTION max_field_readline 100   # maximum number of columns in the dataset
#OPTION saveHinvOrig
#OPTION saveHinv
#OPTION sol se
#OPTION use_yams
#OPTION missing -999
OPTION plotpca

運行preGSf90后, 會生成pc1vspc2文件, 里面包括PC1和PC2兩列, 增加世代為pop, 然后使用R畫圖:

pca = read.table(("pc1vspc2"))
head(pca)
names(pca) = c("PC1","PC2")
pca$pop = rep(c("A","B","C"),each=400)
library(ggplot2)
p <- ggplot(pca, aes(x=PC1, y=PC2, colour=pop)) 
p <- p + geom_point(size=2)
p <- p + stat_ellipse(level = 0.95, size = 1)
p <- p + geom_hline(yintercept = 0) 
p <- p + geom_vline(xintercept = 0) 
p <- p + theme_bw()
p

結果:

image

2. blupf90構建H矩陣的PCA

需要定義系譜和基因型, 示例par文件:

DATAFILE
dat_f90.txt
TRAITS
10         # This is column 10 (phenotype) from QMSim data file
FIELDS_PASSED TO OUTPUT
1          # This will copy the ID number to the renf90.dat data file
WEIGHT(S)  # WARNING: ALWAYS PUT AN EMPTY LINE AFTER THIS!!!!!

RESIDUAL_VARIANCE
0.9                # add starting values for residual variance
EFFECT
4 cross alpha    # Fit generation as a fixed effect, 'cross alpha' is a class in SAS
EFFECT
1 cross alpha    # Fit animal effect
RANDOM
animal           # Fit animal effect (A matrix) for the effect directly above it (column 1, animal)
FILE
ped_f90.txt  #  pedigree file (animal, sire, dam), 0's are missing always!!!
FILE_POS
1 2 3 0 0        # indicates that column 1 = Animal, column 2 = Sire, column 3 = Dam
SNP_FILE
yM.txt
(CO)VARIANCES    
0.1               # add starting values for additive animal effect
OPTION alpha_size 25            # Equal to the max number of characters within a column
OPTION max_string_readline 800  # maximum number of characters in one line of data file
OPTION max_field_readline 100   # maximum number of columns in the dataset
#OPTION saveHinvOrig
#OPTION saveHinv
#OPTION sol se
#OPTION use_yams
#OPTION missing -999
OPTION plotpca

運行preGSf90后, 會生成pc1vspc2文件, 里面包括PC1和PC2兩列, 增加世代為pop, 然后使用R畫圖:

pca = read.table(("pc1vspc2"))
head(pca)
names(pca) = c("PC1","PC2")
pca$pop = rep(c("A","B","C"),each=400)
library(ggplot2)
p <- ggplot(pca, aes(x=PC1, y=PC2, colour=pop)) 
p <- p + geom_point(size=2)
p <- p + stat_ellipse(level = 0.95, size = 1)
p <- p + geom_hline(yintercept = 0) 
p <- p + geom_vline(xintercept = 0) 
p <- p + theme_bw()
p
image

3. plink根據G矩陣做PCA

代碼:

plink --file b --pca 3

結果生成:

plink.eigenval  plink.eigenvec  plink.log  plink.nosex

R語言作圖:

library(ggplot2)
head(dd)
p <- ggplot(dd, aes(x=PC1, y=PC2, colour=pop)) 
p <- p + geom_point(size=2)
p <- p + stat_ellipse(level = 0.95, size = 1)
# p <- p + scale_color_manual(values = cols)
p <- p + geom_hline(yintercept = 0) 
p <- p + geom_vline(xintercept = 0) 
p <- p + theme_bw()
p
image

4. gcta64根據G矩陣做PCA

將ped文件轉化為bed文件

plink --file b --make-bed --out c

生成grm文件

gcta64 --bfile c --autosome --make-grm --out grm

生成pca文件

gcta64 --grm grm --pca 3

根據PCA信息作圖

image

結論

blupf90的G矩陣, H矩陣, plink的PCA結果一致.
GCTA構建的PCA結果不太一致, 懷疑是參數默認的有問題, 回頭查看一下.

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯(lián)系作者
【社區(qū)內容提示】社區(qū)部分內容疑似由AI輔助生成,瀏覽時請結合常識與多方信息審慎甄別。
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發(fā)布,文章內容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務。

友情鏈接更多精彩內容