[R Plots Speak a Thousand Words] 3D Plotting for Principal Component Analysis

Principal component analysis (PCA) is a mathematical method for dimensionality reduction.

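The PCA dimensionality-reduction procedure:
1) Standardize the data
2) Compute the covariance matrix
3) Sort the eigenvectors
4) Build the projection matrix
5) Transform the data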

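Compute the covariance matrix of the sample data across its dimensions, then solve for the eigenvalues of that covariance matrix and their corresponding eigenvectors. Arrange the eigenvectors in order of decreasing eigenvalue to form a new matrix, called the eigenvector matrix or projection matrix, and use this projection matrix to transform the sample data. Keeping only the first K dimensions of the transformed data achieves the dimensionality reduction.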
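The five steps above can be sketched by hand in base R. This is only an illustrative sketch on a small simulated matrix: the names `x`, `x.std`, `cov.mat`, `eig`, `proj`, `k` and `x.reduced`, the seed, and the matrix size are all assumptions made for the example, not part of the original article. The result is cross-checked against `prcomp`, allowing for the fact that the sign of each component is arbitrary.

```r
# Manual PCA on a small simulated matrix (all names and sizes are illustrative)
set.seed(1)                                      # arbitrary seed, for reproducibility only
x <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
x[, 2] <- x[, 1] + 0.3 * x[, 2]                  # add correlation so PCA has structure to find

x.std   <- scale(x, center = TRUE, scale = TRUE) # 1) standardize the data
cov.mat <- cov(x.std)                            # 2) covariance matrix
eig     <- eigen(cov.mat)                        # 3) eigen() returns eigenvalues in decreasing order
proj    <- eig$vectors                           # 4) projection matrix (columns are eigenvectors)
k       <- 2
x.reduced <- x.std %*% proj[, 1:k]               # 5) transform the data, keep the first k dimensions

# Cross-check against prcomp; compare absolute values because signs may differ
ref <- prcomp(x, center = TRUE, scale. = TRUE)
max.diff <- max(abs(abs(x.reduced) - abs(ref$x[, 1:k])))
print(max.diff)                                  # expected to be numerically close to zero
```

In practice `prcomp` or `princomp` does all of this in one call; the manual version is only meant to make the individual steps visible.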

Case 1

Create the dataset

  1. Use R to simulate a microarray data matrix with 10000 rows (10000 genes) and 100 columns (100 samples), filled with random values drawn from a normal distribution with mean 0.
    chip.data<-matrix(rnorm(10000*100,mean=0),nrow=10000,ncol=100)
    Output:
1.jpg

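  2. Among the 10000 genes, assume that 100 genes differ between the two groups: the first 50 are up-regulated and the other 50 are down-regulated.

1) Generate a random permutation of the numbers 1 to 1000 to use as row indices.
2) Create a 50*10 matrix of normally distributed values with mean 2. Take entries 1:50 of the random indices from the previous step as row numbers and assign the values to the first 10 columns of chip.data; this is the up-regulated set.
3) Use the same method, with the next 50 indices and mean -2, to obtain the 50 down-regulated genes.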

diff.index<-sample(1:1000,1000)

chip.data[diff.index[1:50],1:10]<-rnorm(50*10,mean=2)
chip.data[diff.index[51:100],1:10]<-rnorm(50*10,mean=-2)
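As a quick sanity check, the simulation can be repeated with a fixed seed and the means of the modified blocks inspected. This is a sketch under assumptions: the seed and the names `sim`, `idx`, `up.rows` and `down.rows` are introduced here for illustration, and the down-regulated genes are assumed to use the second block of 50 indices so that the two sets do not overlap.

```r
# Reproducible version of the simulation (names and seed are illustrative)
set.seed(42)                                  # assumption: any fixed seed would do
sim <- matrix(rnorm(10000 * 100, mean = 0), nrow = 10000, ncol = 100)

idx       <- sample(1:1000, 1000)             # random permutation of rows 1..1000
up.rows   <- idx[1:50]                        # 50 up-regulated genes
down.rows <- idx[51:100]                      # 50 different, down-regulated genes

sim[up.rows,   1:10] <- rnorm(50 * 10, mean = 2)
sim[down.rows, 1:10] <- rnorm(50 * 10, mean = -2)

up.mean   <- mean(sim[up.rows,   1:10])       # should be close to 2
down.mean <- mean(sim[down.rows, 1:10])       # should be close to -2
rest.mean <- mean(sim[up.rows,  11:100])      # untouched columns, should be close to 0
print(c(up = up.mean, down = down.mean, rest = rest.mean))
```

Because `idx` is a permutation, `up.rows` and `down.rows` cannot share a row, so the second assignment does not overwrite the first.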
  3. PCA plotting

Usage of the princomp function

Description
princomp performs a principal components analysis on the given numeric data matrix and returns the results as an object of class princomp.
## Default S3 method:
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
         subset = rep_len(TRUE, nrow(as.matrix(x))), ...)

PCA computation
chip.data.pca<-princomp(chip.data)
Display the contents of chip.data

> chip.data
                  [,1]          [,2]          [,3]          [,4]          [,5]          [,6]
    [1,] -8.764830e-01 -2.585436e+00  1.7486665932  0.6825088090  0.8905718598  2.2543743674
    [2,]  2.756559e+00  9.191507e-01  1.7224333465  2.5164729313  0.3655551313  0.3940460436
    [3,]  9.754316e-01 -9.121371e-01 -0.0534088859  0.4711108467 -0.6567994543 -0.9404594391
    [4,] -1.443449e+00  6.328793e-01  0.7067575122 -2.0083705142 -0.0641474431  0.5404051953
    [5,] -1.678596e+00 -4.086325e-01 -0.6946972480  0.9941794052  1.9677986393  0.4281278343
    [6,]  2.318705e+00  2.574536e+00  2.4483722951  3.7352614791  0.6849518201  2.5269332706
    [7,]  1.368299e+00 -6.396757e-01 -0.3016863422 -0.9881343210  0.7250075490 -1.1474935276
    [8,]  4.547110e-01 -1.388434e+00  0.5724884590  1.3446862438  0.2708813623  0.0768302649
    [9,] -3.320154e-01  1.015236e+00  0.0524039788  0.8327729956  1.5803932962 -1.1469311968
   [10,]  1.442150e+00 -1.005228e+00  0.9377764607  1.5061633084 -0.7742683227 -1.9687078752

Display the summary statistics

> summary(chip.data.pca)
Importance of components:
                         Comp.1    Comp.2    Comp.3    Comp.4    Comp.5     Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
Standard deviation     3.240085 3.2099856 3.1956557 3.1691590 3.1505363 3.13960683 3.11757677 3.10222437 3.07273039 3.05572866
Proportion of Variance 0.105799 0.1038424 0.1029174 0.1012178 0.1000317 0.09933886 0.09794967 0.09698734 0.09515192 0.09410186
Cumulative Proportion  0.105799 0.2096414 0.3125588 0.4137765 0.5138082 0.61314710 0.71109677 0.80808411 0.90323603 0.99733790

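Standard deviation # standard deviation of each component
Proportion of Variance # share of the total variance explained by each component
Cumulative Proportion # cumulative share of the variance explained

The first 10 principal components already explain 0.99733790 (about 99.7%) of the variance in the data.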
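The three rows of the summary table can be recomputed from the `sdev` element of a `princomp` object, which may help in reading them. The following is a minimal sketch on a small simulated matrix; `toy`, `toy.pca`, `var.explained`, `prop` and `cum.prop` are names assumed for this example only.

```r
# Recompute the summary rows from the component standard deviations
set.seed(7)                                    # arbitrary seed, for reproducibility only
toy     <- matrix(rnorm(500 * 4), nrow = 500, ncol = 4)
toy.pca <- princomp(toy)

var.explained <- toy.pca$sdev^2                # variance carried by each component
prop          <- var.explained / sum(var.explained)  # "Proportion of Variance"
cum.prop      <- cumsum(prop)                  # "Cumulative Proportion"

print(round(rbind(sdev = toy.pca$sdev, prop, cum.prop), 4))
```

The cumulative proportion reaches 1 only when every component is included, which is why a value slightly below 1 after 10 components indicates that further components exist.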

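  4. Plotting
    1) Set the colours of the 100 plotted points, 50 per group. To change the colours of the two groups, replace "2" and "7" with other numbers in the range 1:10.
    2) plot3d(3D data giving the x, y and z coordinates, group colours, plot type, radius)
    Below, type="s" draws each point as a sphere.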
colour<-c(rep(2,50),rep(7,50))
library(rgl)
plot3d(chip.data.pca$loadings[,1:3],col=colour,type="s",radius = 0.025)

The result is a 3D plot that can be rotated and zoomed in or out with the mouse until the clearest viewing angle is found.

2.jpg
plot3d(chip.data.pca$loadings[,1:3],col=colour,type="l",radius = 0.025)

With type="l" the points are joined by lines:

3.jpg


Case 2
Load the packages and the dataset

rm(list=ls())
library(pca3d)
library(rgl)

data(metabo)
head(metabo)

About the dataset


4.jpg
Metabolic profiles in tuberculosis.

Description

Relative abundances of metabolites from serum samples of three groups of individuals
Details

A data frame with 136 observations on 425 metabolic variables.


Serum samples from three groups of individuals were compared: tuberculin skin test negative (NEG), positive (POS) and clinical tuberculosis (TB).

PCA computation

Usage of the prcomp function

Principal Components Analysis

Description

Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.

## Default S3 method:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
       tol = NULL, rank. = NULL, ...)

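1) Drop the first column of the dataset (the group labels), use the remaining columns as the data, and standardize them
2) Use the first column of the dataset as the grouping factor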

metabo.pca <- prcomp(metabo[,-1], scale.=TRUE)
groups  <- factor(metabo[,1])
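The same pattern (drop the label column, scale, keep the labels as a factor) can be tried without installing pca3d by using R's built-in `iris` data as a stand-in. This is an illustrative sketch only: `iris` is not the dataset used in this article, its label column is the fifth rather than the first, and the names `iris.pca`, `iris.groups`, `manual` and `same` are assumptions for the example. It also checks that `scale.=TRUE` gives the same scores as running `prcomp` on data standardized beforehand with `scale()`.

```r
# Stand-in example on the built-in iris data (label column is the 5th here)
data(iris)
iris.pca    <- prcomp(iris[, -5], scale. = TRUE)   # PCA on the numeric columns, standardized
iris.groups <- factor(iris[, 5])                   # grouping factor from the label column

# Equivalent route: standardize first, then run prcomp without scaling
manual <- prcomp(scale(iris[, -5]), scale. = FALSE)
same   <- isTRUE(all.equal(abs(unname(iris.pca$x)), abs(unname(manual$x))))

print(same)                 # TRUE if both routes give the same scores (up to sign)
print(table(iris.groups))   # number of observations per group
```

Standardizing matters when the variables are on different scales, because otherwise the variables with the largest variance dominate the first components.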

Summary of the results

> summary(metabo.pca)
Importance of components:
                           PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8    PC9   PC10    PC11    PC12    PC13    PC14
Standard deviation     5.86992 5.38923 4.74978 4.11434 3.88969 3.81589 3.30208 3.09675 2.9872 2.9157 2.80259 2.71364 2.60341 2.56392
Proportion of Variance 0.08146 0.06866 0.05333 0.04002 0.03577 0.03442 0.02578 0.02267 0.0211 0.0201 0.01857 0.01741 0.01602 0.01554
Cumulative Proportion  0.08146 0.15012 0.20345 0.24347 0.27924 0.31366 0.33944 0.36211 0.3832 0.4033 0.42187 0.43928 0.45530 0.47084

Plotting

Usage of pca3d

pca2d {pca3d}   R Documentation
Show a three- or two-dimensional plot of a prcomp object

Description

Show a three- two-dimensional plot of a prcomp object or a matrix, using different symbols and colors for groups of data

Usage
pca3d(pca, components = 1:3, col = NULL, title = NULL, new = FALSE,
  axes.color = "grey", bg = "white", radius = 1, group = NULL,
  shape = NULL, palette = NULL, fancy = FALSE, biplot = FALSE,
  biplot.vars = 5, legend = NULL, show.scale = FALSE,
  show.labels = FALSE, labels.col = "black", show.axes = TRUE,
  show.axe.titles = TRUE, axe.titles = NULL, show.plane = TRUE,
  show.shadows = FALSE, show.centroids = FALSE, show.group.labels = FALSE,
  show.shapes = TRUE, show.ellipses = FALSE, ellipse.ci = 0.95)

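pca3d(PCA object, grouping factor, whether to show confidence ellipses, confidence level of the ellipses (default 0.95; 0.75 is used here), whether to show the separating plane)
pca3d(metabo.pca, group=groups, show.ellipses=TRUE, ellipse.ci=0.75, show.plane=FALSE)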

The result is a 3D plot that can be rotated and zoomed in or out with the mouse until the clearest viewing angle is found.


5.jpg

Hide the separating plane

pca3d(metabo.pca, group=groups, show.ellipses=TRUE, ellipse.ci=0.75, show.plane=FALSE)

Result:

6.jpg