PCA原理解讀和繪制方法

??該教程的PCA繪圖基于ggplot2,可以根據(jù)ggplot2語(yǔ)法對(duì)圖片進(jìn)行額外的修改和保存。

1. 基礎(chǔ)

PCA:全稱(chēng)Principal Component Methods,也就是主成分分析。

主成分分析是一種通過(guò)協(xié)方差分析來(lái)對(duì)數(shù)據(jù)進(jìn)行降維處理的統(tǒng)計(jì)方法。首先利用線(xiàn)性變換,將數(shù)據(jù)變換到一個(gè)新的坐標(biāo)系統(tǒng)中;然后再利用降維的思想,使得任何數(shù)據(jù)投影的第一大方差在第一個(gè)坐標(biāo)(稱(chēng)為第一主成分)上,第二大方差在第二個(gè)坐標(biāo)(第二主成分)上。這種降維的思想首先減少數(shù)據(jù)集的維數(shù),同時(shí)還保持?jǐn)?shù)據(jù)集的對(duì)方差貢獻(xiàn)最大的特征,最終使數(shù)據(jù)直觀呈現(xiàn)在二維坐標(biāo)系。主要適用于以下情況:已獲得一定數(shù)目的變量的觀測(cè)值,并希望能夠構(gòu)造出少數(shù)幾個(gè)綜合變量,在最大程度上反映原始數(shù)據(jù)的消息。

主要目的:
發(fā)現(xiàn)數(shù)據(jù)的內(nèi)在模式
對(duì)高維數(shù)據(jù)進(jìn)行降維,去除數(shù)據(jù)的噪音和冗余
識(shí)別相關(guān)變量

2. 計(jì)算

2.1 R包

很多R包中的functions可以用來(lái)計(jì)算PCA:
· prcomp()princomp() [built-in R stats package],
· PCA() [FactoMineR package],
· dudi.pca() [ade4 package],
· epPCA() [ExPosition package]

在這個(gè)教程里主要是用兩個(gè)R包:FactoMineR (for the analysis) 和factoextra (for ggplot2-based visualization)。

安裝和加載

install.packages(c("FactoMineR", "factoextra"))
library("FactoMineR")
library("factoextra")
2.2 數(shù)據(jù)格式

載入演示數(shù)據(jù)集

data(decathlon2)
View(decathlon2)
在這個(gè)表中,1:23行是Active individuals,也就是主成分分析需要是用的Individuals;24:27行是Supplementary individuals;1:10列是Active variables,也就是主成分分析需要是用的variables;11:12行是Supplementary variables。

把主成分分析所需要用的Active individuals和Active variables提取出來(lái)

decathlon2.active <- decathlon2[1:23, 1:10]
head(decathlon2.active[, 1:6], 4)
2.3 數(shù)據(jù)標(biāo)準(zhǔn)化

在主成分分析中,變量通常需要scaled(比如standardized),尤其當(dāng)變量是使用不同的尺度的單位來(lái)衡量的時(shí)候。
標(biāo)準(zhǔn)化的目標(biāo)是使變量之間具有可比性。通常scale后的數(shù)據(jù)要求i)標(biāo)準(zhǔn)差為1 ii)均值為0。
在基因表達(dá)數(shù)據(jù)的分析中,數(shù)據(jù)標(biāo)準(zhǔn)化是在PCA和聚類(lèi)前的一個(gè)常用方法。當(dāng)變量的均值和/或標(biāo)準(zhǔn)差差異非常大的時(shí)候,我們通常需要去做scale。
當(dāng)對(duì)變量進(jìn)行scale的時(shí)候,數(shù)據(jù)可以做這樣的變換:

R的基礎(chǔ)功能scale()可以被用于對(duì)數(shù)據(jù)進(jìn)行標(biāo)準(zhǔn)化。這個(gè)函數(shù)輸入的是一個(gè)numeric matrix,并且會(huì)對(duì)列進(jìn)行標(biāo)準(zhǔn)化。

注意:在使用函數(shù)PCA() [in FactoMineR]的時(shí)候,數(shù)據(jù)會(huì)自動(dòng)進(jìn)行標(biāo)準(zhǔn)化,無(wú)需額外標(biāo)準(zhǔn)化處理。

2.4 R code

函數(shù)PCA()[in FactoMineR]的用法如下:

PCA(X, scale.unit = TRUE, ncp = 5, ind.sup = NULL, 
    quanti.sup = NULL, quali.sup = NULL,  graph = TRUE)
  • X: a data frame. Rows are individuals and columns are numeric variables
  • scale.unit: a logical value. If TRUE, the data are scaled to unit variance before the analysis. This standardization to the same scale avoids some variables to become dominant just because of their large measurement units. It makes variable comparable.
  • ncp: number of dimensions kept in the final results.
  • graph: a logical value. If TRUE a graph is displayed.
  • ind.sup : 指定Supplementary individuals所在的列
  • quanti.sup, quali.sup : 指定定性和定量觀測(cè)值所在的列

實(shí)操:

library("FactoMineR")
res.pca <- PCA(decathlon2.active, graph = FALSE)

PCA()返回的是一個(gè)list,包含"$eig", "$var","$ind"和"$call"四類(lèi)數(shù)據(jù)。四類(lèi)數(shù)據(jù)的取法將在下一個(gè)部分介紹。

print(res.pca)
# **Results for the Principal Component Analysis (PCA)**
# The analysis was performed on 23 individuals, described by 10 variables
# *The results are available in the following objects:

#    name               description                          
# 1  "$eig"             "eigenvalues"                        
# 2  "$var"             "results for the variables"          
# 3  "$var$coord"       "coord. for the variables"           
# 4  "$var$cor"         "correlations variables - dimensions"
# 5  "$var$cos2"        "cos2 for the variables"             
# 6  "$var$contrib"     "contributions of the variables"     
# 7  "$ind"             "results for the individuals"        
# 8  "$ind$coord"       "coord. for the individuals"         
# 9  "$ind$cos2"        "cos2 for the individuals"           
# 10 "$ind$contrib"     "contributions of the individuals"   
# 11 "$call"            "summary statistics"                 
# 12 "$call$centre"     "mean of the variables"              
# 13 "$call$ecart.type" "standard error of the variables"    
# 14 "$call$row.w"      "weights for the individuals"        
# 15 "$call$col.w"      "weights for the variables" 

3. Visualization and Interpretation

3.1 Eigenvalues(特征值) / Variances ("$eig")

定義:參考主成分分析PCA以及特征值和特征向量的意義

PC1這個(gè)軸是原點(diǎn)到每個(gè)變量在軸上投影的距離平方和最大時(shí)的軸,每個(gè)變量在軸上的投影點(diǎn)到原點(diǎn)距離平方和就是這個(gè)PC的特征值。Eigenvalue for PC1開(kāi)方=Singular value for PC1(圖片出自StatQuest: Principal Component Analysis (PCA), Step-by-Step)

每個(gè)PC的Variation是特征值/n-1。該P(yáng)C的Variation/所有PC的Variation的和=variance.percent

The eigenvalues measure the amount of variation retained by each principal component. 靠前的PCs的Eigenvalues比較大,后面的PCs的Eigenvalues比較小。That is, the first PCs corresponds to the directions with the maximum amount of variation in the data set.

#調(diào)取Eigenvalues信息
library("factoextra")
eig.val <- get_eigenvalue(res.pca)
eig.val
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1       4.124            41.24                        41.2
## Dim.2       1.839            18.39                        59.6
## Dim.3       1.239            12.39                        72.0
## Dim.4       0.819             8.19                        80.2
## Dim.5       0.702             7.02                        87.2
## Dim.6       0.423             4.23                        91.5
## Dim.7       0.303             3.03                        94.5
## Dim.8       0.274             2.74                        97.2
## Dim.9       0.155             1.55                        98.8
## Dim.10      0.122             1.22                       100.0

Eigenvalues的總和是10(total variance)。The proportion of variation explained by each eigenvalue is given in the second column.

Eigenvalues可用于決定我們后續(xù)分析是用多少個(gè)PC數(shù)(Kaiser 1961)。

但不幸的是,沒(méi)有公認(rèn)的客觀的方法去決定多少PC是足夠的。不同研究領(lǐng)域和不同數(shù)據(jù)集的情況是不同的。 In practice, we tend to look at the first few principal components in order to find interesting patterns in the data.

在上面的數(shù)據(jù)中,前三個(gè)PC可以解釋72%的variation。這個(gè)數(shù)值是可以接受的。
另一個(gè)替代的方法來(lái)決定要使用多少PC是查看碎石圖Scree Plotfviz_eig() or fviz_screeplot() [factoextra package])。

fviz_eig(res.pca, addlabels = TRUE, ylim = c(0, 50))
87%的信息(variances)被前五個(gè)PC保留
3.2 Graph of variables ("$var")
  • 3.2.1 結(jié)果查看
    我們 可以使用get_pca_var() [factoextra package]取出variables的結(jié)果。
var <- get_pca_var(res.pca)
var
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"

get_pca_var()的結(jié)果包含如下變量(可以被用于 plot)
· var$coord: 繪制scatter plot的變量坐標(biāo)
· var$cos2: 是var$coord值的平方。represents the quality of representation for variables on the factor map.
· var$contrib: 是cos2/所有cos2的和再乘100(因?yàn)閱挝皇?)。contains the contributions (in percentage) of the variables to the principal components.

# Coordinates
head(var$coord)
#                   Dim.1       Dim.2      Dim.3       Dim.4      Dim.5
# X100m        -0.8506257 -0.17939806  0.3015564  0.03357320 -0.1944440
# Long.jump     0.7941806  0.28085695 -0.1905465 -0.11538956  0.2331567
# Shot.put      0.7339127  0.08540412  0.5175978  0.12846837 -0.2488129
# High.jump     0.6100840 -0.46521415  0.3300852  0.14455012  0.4027002
# X400m        -0.7016034  0.29017826  0.2835329  0.43082552  0.1039085
# X110m.hurdle -0.7641252 -0.02474081  0.4488873 -0.01689589  0.2242200

# Cos2: quality on the factore map
head(var$cos2)
#                  Dim.1        Dim.2      Dim.3        Dim.4      Dim.5
# X100m        0.7235641 0.0321836641 0.09093628 0.0011271597 0.03780845
# Long.jump    0.6307229 0.0788806285 0.03630798 0.0133147506 0.05436203
# Shot.put     0.5386279 0.0072938636 0.26790749 0.0165041211 0.06190783
# High.jump    0.3722025 0.2164242070 0.10895622 0.0208947375 0.16216747
# X400m        0.4922473 0.0842034209 0.08039091 0.1856106269 0.01079698
# X110m.hurdle 0.5838873 0.0006121077 0.20149984 0.0002854712 0.05027463

# Contributions to the principal components
head(var$contrib)
#                  Dim.1      Dim.2     Dim.3       Dim.4     Dim.5
# X100m        17.544293  1.7505098  7.338659  0.13755240  5.389252
# Long.jump    15.293168  4.2904162  2.930094  1.62485936  7.748815
# Shot.put     13.060137  0.3967224 21.620432  2.01407269  8.824401
# High.jump     9.024811 11.7715838  8.792888  2.54987951 23.115504
# X400m        11.935544  4.5799296  6.487636 22.65090599  1.539012
# X110m.hurdle 14.157544  0.0332933 16.261261  0.03483735  7.166193
  • 3.2.2 Correlation circle
    變量和主成分(PC)之間的correlation被用于每個(gè)PC上變量的坐標(biāo)。
    The correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations (Abdi and Williams 2010).
# Coordinates of variables
head(var$coord, 4)
##            Dim.1   Dim.2  Dim.3   Dim.4  Dim.5
## X100m     -0.851 -0.1794  0.302  0.0336 -0.194
## Long.jump  0.794  0.2809 -0.191 -0.1154  0.233
## Shot.put   0.734  0.0854  0.518  0.1285 -0.249
## High.jump  0.610 -0.4652  0.330  0.1446  0.403
fviz_pca_var(res.pca, col.var = "black")

這個(gè)圖就是variable correlation plots,它展示出所有變量間的關(guān)系。它的解讀如下:
· 正相關(guān)的變量會(huì)聚在一起
· 負(fù)相關(guān)的變量會(huì)處于相反的方向,位于相反的象限。
· 變量和原點(diǎn)之間的距離衡量factor map上變量的質(zhì)量。離原點(diǎn)越遠(yuǎn)的變量在factor map上被展示的更好。

  • 3.2.3 Quality of representation
head(var$cos2, 4)
##           Dim.1   Dim.2  Dim.3   Dim.4  Dim.5
## X100m     0.724 0.03218 0.0909 0.00113 0.0378
## Long.jump 0.631 0.07888 0.0363 0.01331 0.0544
## Shot.put  0.539 0.00729 0.2679 0.01650 0.0619
## High.jump 0.372 0.21642 0.1090 0.02089 0.1622

可以使用corrplot包可視化

library("corrplot")
corrplot(var$cos2, is.corr=FALSE)


也可以使用fviz_cos2()[in factoextra]來(lái)畫(huà)條圖

# Total cos2 of variables on Dim.1 and Dim.2
fviz_cos2(res.pca, choice = "var", axes = 1:2)

對(duì)于給定的變量,所有PCs的cos2之和=1。如果一個(gè)變量可以完美的被兩個(gè)主成分所代表(如Dim.1 & Dim.2),這兩個(gè)PCs的cos2之和=1。這樣的話(huà)這個(gè)變量就會(huì)被畫(huà)在variable factor map的圈上。對(duì)于一些變量來(lái)說(shuō),想要比較好的代表數(shù)據(jù),需要超過(guò)兩個(gè)PCs,這樣的話(huà)變量就會(huì)被畫(huà)在圈內(nèi)。

# Color by cos2 values: quality on the factor map
fviz_pca_var(res.pca, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
             repel = TRUE # Avoid text overlapping
             )
  • 3.2.4 Contributions of variables to PCs
library("corrplot")
corrplot(var$contrib, is.corr=FALSE) 
# Contributions of variables to PC1
fviz_contrib(res.pca, choice = "var", axes = 1, top = 10)
# Contributions of variables to PC2
fviz_contrib(res.pca, choice = "var", axes = 2, top = 10)
紅線(xiàn)是expected average contribution。如果變量的貢獻(xiàn)是一致的,那么貢獻(xiàn)值應(yīng)該是1/length(variables) = 1/10 = 10%。對(duì)于給定的PC,貢獻(xiàn)度超過(guò)紅線(xiàn)的variable 被認(rèn)為對(duì)這個(gè)PC貢獻(xiàn)較大。
fviz_contrib(res.pca, choice = "var", axes = 1:2, top = 10)

變量對(duì)某個(gè)PC的貢獻(xiàn)度可以通過(guò)顏色深淺展示出來(lái)

fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07")
             )

根據(jù)貢獻(xiàn)度改變透明度

fviz_pca_var(res.pca, alpha.var = "contrib")

自定義顏色

# Create a random continuous variable of length 10
set.seed(123)
my.cont.var <- rnorm(10)
# Color variables by the continuous variable
fviz_pca_var(res.pca, col.var = my.cont.var,
             gradient.cols = c("blue", "yellow", "red"),
             legend.title = "Cont.Var")

根據(jù)分組繪制顏色

# Create a grouping variable using kmeans
# Create 3 groups of variables (centers = 3)
set.seed(123)
res.km <- kmeans(var$coord, centers = 3, nstart = 25)
grp <- as.factor(res.km$cluster)
# Color variables by groups
fviz_pca_var(res.pca, col.var = grp, 
             palette = c("#0073C2FF", "#EFC000FF", "#868686FF"),
             legend.title = "Cluster")
3.3 Dimension description
res.desc <- dimdesc(res.pca, axes = c(1,2), proba = 0.05)
# Description of dimension 1
res.desc$Dim.1
## $quanti
##              correlation  p.value
## Long.jump          0.794 6.06e-06
## Discus             0.743 4.84e-05
## Shot.put           0.734 6.72e-05
## High.jump          0.610 1.99e-03
## Javeline           0.428 4.15e-02
## X400m             -0.702 1.91e-04
## X110m.hurdle      -0.764 2.20e-05
## X100m             -0.851 2.73e-07

# Description of dimension 2
res.desc$Dim.2
## $quanti
##            correlation  p.value
## Pole.vault       0.807 3.21e-06
## X1500m           0.784 9.38e-06
## High.jump       -0.465 2.53e-02

## $quanti means results for quantitative variables. Note that, variables are sorted by the p-value of the correlation.
3.4 Graph of individuals ("$ind")
  • 3.4.1 Results
ind <- get_pca_ind(res.pca)
ind
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"

To get access to the different components, use this:

# Coordinates of individuals
head(ind$coord)
# Quality of individuals
head(ind$cos2)
# Contributions of individuals
head(ind$contrib)
  • 3.4.2 Plots: quality and contribution
# fviz_pca_ind(res.pca)
fviz_pca_ind(res.pca, col.ind = "cos2", 
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Avoid text overlapping (slow if many points)
             )
相似的individuals會(huì)被聚在一起

改變點(diǎn)的大小

fviz_pca_ind(res.pca, pointsize = "cos2", 
             pointshape = 21, fill = "#E7B800",
             repel = TRUE # Avoid text overlapping (slow if many points)
             )
#同時(shí)改變點(diǎn)的大小和顏色by cos2
fviz_pca_ind(res.pca, col.ind = "cos2", pointsize = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Avoid text overlapping (slow if many points)
             )

畫(huà)條圖

fviz_cos2(res.pca, choice = "ind")
# Total contribution on PC1 and PC2
fviz_contrib(res.pca, choice = "ind", axes = 1:2)
  • 3.4.3 Color by a custom continuous variable
# Create a random continuous variable of length 23,
# Same length as the number of active individuals in the PCA
set.seed(123)
my.cont.var <- rnorm(23)
# Color individuals by the continuous variable
fviz_pca_ind(res.pca, col.ind = my.cont.var,
             gradient.cols = c("blue", "yellow", "red"),
             legend.title = "Cont.Var")
  • 3.4.4 Color by groups
head(iris, 3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa

# The variable Species (index = 5) is removed
# before PCA analysis
iris.pca <- PCA(iris[,-5], graph = FALSE)

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (nbut not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups"
             )
# Add confidence ellipses
fviz_pca_ind(iris.pca, geom.ind = "point", col.ind = iris$Species, 
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, ellipse.type = "confidence",
             legend.title = "Groups"
             )
3.5 Graph customization

更多圖形個(gè)性化繪制

  • 3.5.1 Dimensions
    默認(rèn)情況下, variables/individuals代表的是PC1和PC2,如果想看其他PC比如PC2和PC3也是可以的。
# Variables on dimensions 2 and 3
fviz_pca_var(res.pca, axes = c(2, 3))
# Individuals on dimensions 2 and 3
fviz_pca_ind(res.pca, axes = c(2, 3))
  • 3.5.2 Plot elements: point, text, arrow

geom.var參數(shù):
· geom.var = "point", to show only points;
· geom.var = "text" to show only text labels;
· geom.var = c("point", "text") to show both points and text labels
· geom.var = c("arrow", "text") to show arrows and labels (default).

geom.ind參數(shù):
· geom.ind = "point", to show only points;
· geom.ind = "text" to show only text labels;
· geom.ind = c("point", "text") to show both point and text labels (default)

# Show variable points and text labels
fviz_pca_var(res.pca, geom.var = c("point", "text"))
# Show individuals text labels only
fviz_pca_ind(res.pca, geom.ind =  "text")
  • 3.5.3 Size and shape of plot elements
    · labelsize: font size for the text labels, e.g.: labelsize = 4.
    · pointsize: the size of points, e.g.: pointsize = 1.5.
    · arrowsize: the size of arrows. Controls the thickness of arrows, e.g.: arrowsize = 0.5.
    · pointshape: the shape of points, pointshape = 21. Type ggpubr::show_point_shapes() to see available point shapes.
# Change the size of arrows an labels
fviz_pca_var(res.pca, arrowsize = 1, labelsize = 5, 
             repel = TRUE)
# Change points size, shape and fill color
# Change labelsize
fviz_pca_ind(res.pca, 
             pointsize = 3, pointshape = 21, fill = "lightblue",
             labelsize = 5, repel = TRUE)
  • 3.5.4 Ellipses
    添加Ellipses:addEllipses = TRUE
    ellipse.type可以設(shè)置為"convex","confidence","t","norm","euclid"
    ellipse.level:設(shè)置橢圓的集中度。如設(shè)置ellipse.level = 0.95 or ellipse.level = 0.66。

  • 3.5.5 Group mean points

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             group.ind = iris$Species, # color by groups
             legend.title = "Groups",
             mean.point = FALSE)
  • 3.5.6 Axis lines
fviz_pca_var(res.pca, axes.linetype = "blank")
  • 3.5.7 Graphical parameters
    使用ggpar()[ggpubr package]函數(shù)進(jìn)行更精細(xì)的調(diào)整
ind.p <- fviz_pca_ind(iris.pca, geom = "point", col.ind = iris$Species)
ggpubr::ggpar(ind.p,
              title = "Principal Component Analysis",
              subtitle = "Iris data set",
              caption = "Source: factoextra",
              xlab = "PC1", ylab = "PC2",
              legend.title = "Species", legend.position = "top",
              ggtheme = theme_gray(), palette = "jco"
              )
3.6 Biplot雙標(biāo)圖

繪制一張簡(jiǎn)單的雙標(biāo)圖

fviz_pca_biplot(res.pca, repel = TRUE,
                col.var = "#2E9FDF", # Variables color
                col.ind = "#696969"  # Individuals color
                )
fviz_pca_biplot(iris.pca, 
                col.ind = iris$Species, palette = "jco", 
                addEllipses = TRUE, label = "var",
                col.var = "black", repel = TRUE,
                legend.title = "Species") 
fviz_pca_biplot(iris.pca, 
                # Fill individuals by groups
                geom.ind = "point",
                pointshape = 21,
                pointsize = 2.5,
                fill.ind = iris$Species,
                col.ind = "black",
                # Color variable by groups
                col.var = factor(c("sepal", "sepal", "petal", "petal")),
                
                legend.title = list(fill = "Species", color = "Clusters"),
                repel = TRUE        # Avoid label overplotting
             )+
  ggpubr::fill_palette("jco")+      # Indiviual fill color
  ggpubr::color_palette("npg")      # Variable colors
fviz_pca_biplot(iris.pca, 
                # Individuals
                geom.ind = "point",
                fill.ind = iris$Species, col.ind = "black",
                pointshape = 21, pointsize = 2,
                palette = "jco",
                addEllipses = TRUE,
                # Variables
                alpha.var ="contrib", col.var = "contrib",
                gradient.cols = "RdYlBu",
                
                legend.title = list(fill = "Species", color = "Contrib",
                                    alpha = "Contrib")
                )

4. Filtering results

# Visualize variable with cos2 >= 0.6
fviz_pca_var(res.pca, select.var = list(cos2 = 0.6))
# Top 5 active variables with the highest cos2
fviz_pca_var(res.pca, select.var= list(cos2 = 5))
# Select by names
name <- list(name = c("Long.jump", "High.jump", "X100m"))
fviz_pca_var(res.pca, select.var = name)
# top 5 contributing individuals and variable
fviz_pca_biplot(res.pca, select.ind = list(contrib = 5), 
               select.var = list(contrib = 5),
               ggtheme = theme_minimal())

5. Exporting results

5.1 Export plots to PDF/PNG files
# Print the plot to a pdf file
pdf("myplot.pdf")
print(myplot)
dev.off()

# Scree plot
scree.plot <- fviz_eig(res.pca)
# Plot of individuals
ind.plot <- fviz_pca_ind(res.pca)
# Plot of variables
var.plot <- fviz_pca_var(res.pca)
pdf("PCA.pdf") # Create a new pdf device
print(scree.plot)
print(ind.plot)
print(var.plot)
dev.off() # Close the pdf device

# Print scree plot to a png file
png("pca-scree-plot.png")
print(scree.plot)
dev.off()
# Print individuals plot to a png file
png("pca-variables.png")
print(var.plot)
dev.off()
# Print variables plot to a png file
png("pca-individuals.png")
print(ind.plot)
dev.off()

使用ggexport()

library(ggpubr)
ggexport(plotlist = list(scree.plot, ind.plot, var.plot), 
         filename = "PCA.pdf")

ggexport(plotlist = list(scree.plot, ind.plot, var.plot), 
         nrow = 2, ncol = 2,
         filename = "PCA.pdf")

ggexport(plotlist = list(scree.plot, ind.plot, var.plot),
         filename = "PCA.png")
5.2 Export results to txt/csv files
write.infile(res.pca, "pca.txt", sep = "\t")
# Export into a CSV file
write.infile(res.pca, "pca.csv", sep = ";")

6. 總結(jié)

  • 進(jìn)行PCA的方法
    Using prcomp() [stats]
res.pca <- prcomp(iris[, -5], scale. = TRUE)

Read more: http://www.sthda.com/english/wiki/pca-using-prcomp-and-princomp

  • Using princomp() [stats]
res.pca <- princomp(iris[, -5], cor = TRUE)

Read more: http://www.sthda.com/english/wiki/pca-using-prcomp-and-princomp

  • Using dudi.pca() [ade4]
library("ade4")
res.pca <- dudi.pca(iris[, -5], scannf = FALSE, nf = 5)

Read more: http://www.sthda.com/english/wiki/pca-using-ade4-and-factoextra

  • Using epPCA() [ExPosition]
library("ExPosition")
res.pca <- epPCA(iris[, -5], graph = FALSE)
  • 可視化
fviz_eig(res.pca)     # Scree plot
fviz_pca_ind(res.pca) # Graph of individuals
fviz_pca_var(res.pca) # Graph of variables

7. Further reading

For the mathematical background behind CA, refer to the following video courses, articles and books:

  • Principal component analysis (article) (Abdi and Williams 2010). https://goo.gl/1Vtwq1.
  • Principal Component Analysis Course Using FactoMineR (Video courses). https://goo.gl/VZJsnM
  • Exploratory Multivariate Analysis by Example Using R (book) (Husson, Le, and Pagès 2017).
  • Principal Component Analysis (book) (Jollife 2002).

See also:


參考:

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。
禁止轉(zhuǎn)載,如需轉(zhuǎn)載請(qǐng)通過(guò)簡(jiǎn)信或評(píng)論聯(lián)系作者。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容