A First Look at SOM Clustering Analysis of Gene Expression

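At last week's summer bioinformatics training course (生信黑馬), one of the instructors wanted to run a SOM analysis and got stuck at the codes plot: it would only produce a segment plot, never a line plot. A search turned up no solution. Today I read the source code, set one parameter, and got the trend plot. Along the way I worked through the whole SOM analysis workflow and wrote it down here for future use.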

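SOM analysis: basic theory

SOM (Self-Organizing Feature Map) is a neural-network-based method for clustering and visualizing a data matrix. Like other centroid-based clustering algorithms such as K-means, SOM finds a set of centroids (also called codebook vectors) and then maps every object in the data set to its most similar centroid. In neural-network terms, each neuron corresponds to one centroid.

As in the online form of K-means, the objects are processed one at a time: the closest centroid is determined and then updated. Unlike K-means, the SOM centroids are arranged in a topographic order, so when one centroid is updated its neighbours on the grid are updated too. This continues until a preset threshold is reached or the centroids no longer change noticeably. The resulting centroids (codes) implicitly define the clusters: the objects closest to a given centroid form one cluster.

SOM emphasizes the neighbourhood relationship between cluster centroids. Adjacent clusters are more closely related, which makes the result easier to interpret. It is commonly used to visualize network data or gene expression data.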

Even though SOM is similar to K-means, there is a fundamental difference. Centroids used in SOM have a predetermined topographic ordering relationship. During the training process, SOM uses each data point to update the closest centroid and centroids that are nearby in the topographic ordering. In this way, SOM produces an ordered set of centroids for any given data set. In other words, the centroids that are close to each other in the SOM grid are more closely related to each other than to the centroids that are farther away. Because of this constraint, the centroids of a two-dimensional SOM can be viewed as lying on a two-dimensional surface that tries to fit the n-dimensional data as well as possible. The SOM centroids can also be thought of as the result of a nonlinear regression with respect to the data points. At a high level, clustering using the SOM technique consists of the steps in the algorithm below:

1: Initialize the centroids.
2: repeat
  3:    Select the next object.
  4:    Determine the closest centroid to the object.
  5:    Update this centroid and the centroids that are close, i.e., in a specified neighborhood.
6: until The centroids don't change much or a threshold is exceeded.
7: Assign each object to its closest centroid and return the centroids and clusters.
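
The steps above can be sketched in a few lines of base R. This is an illustrative toy, not the algorithm as implemented in the kohonen package: the random data `X`, the 4 x 4 rectangular grid, the linearly decaying learning rate `alpha` and neighbourhood `radius`, and the fixed iteration count `n_iter` (standing in for the stopping threshold) are all assumptions made for this example.

```r
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)              # toy data: 200 objects, 3 features
grid <- as.matrix(expand.grid(x = 1:4, y = 1:4))   # grid coordinates of 16 units
codes <- X[sample(nrow(X), nrow(grid)), ]          # step 1: initialise centroids from the data

n_iter <- 2000                                     # assumed stopping threshold
for (i in seq_len(n_iter)) {
  alpha  <- 0.05 * (1 - i / n_iter) + 0.001        # learning rate shrinks over time
  radius <- 2 * (1 - i / n_iter) + 0.5             # neighbourhood shrinks over time
  obj <- X[sample(nrow(X), 1), ]                   # step 3: select an object
  bmu <- which.min(rowSums(sweep(codes, 2, obj)^2))        # step 4: closest centroid
  grid_d <- sqrt(rowSums(sweep(grid, 2, grid[bmu, ])^2))   # distance measured on the grid
  near <- grid_d <= radius                         # step 5: the winner and its grid neighbours
  target <- matrix(obj, nrow = sum(near), ncol = ncol(codes), byrow = TRUE)
  codes[near, ] <- codes[near, , drop = FALSE] +
    alpha * (target - codes[near, , drop = FALSE]) # move them towards the object
}

# step 7: assign each object to its closest centroid
cluster <- apply(X, 1, function(obj) which.min(rowSums(sweep(codes, 2, obj)^2)))
```

The point to notice is that the neighbourhood is defined by distance on the grid (`grid_d`), not by distance in data space. That is what gives the final centroids their topographic ordering.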

SOM analysis in practice

Below, the kohonen package in R is used to run a SOM analysis on gene expression data.

Load or install the package

### LOAD LIBRARIES - install with:
# install.packages("kohonen")
library(kohonen)

Read in the data and standardize it

data <- read.table("ehbio_trans.Count_matrix.xls", row.names=1, header=TRUE, sep="\t")

# Standardize the data: z-score each gene (row) across samples.
# scale() works column-wise, hence the transpose before and after.
data_train_matrix <- as.matrix(t(scale(t(data))))
names(data_train_matrix) <- names(data)

head(data_train_matrix)

                untrt_N61311 untrt_N052611 untrt_N080611 untrt_N061011 trt_N61311
ENSG00000223972    1.6201852    -0.5400617    -0.5400617    -0.5400617 -0.5400617
ENSG00000227232   -1.0711639     1.0274429     0.6776751     0.8525590 -1.2460478
ENSG00000278267   -1.6476479     1.3480756     0.1497862     0.7489309 -0.4493585
ENSG00000237613    2.4748737    -0.3535534    -0.3535534    -0.3535534 -0.3535534
ENSG00000238009   -0.3535534    -0.3535534    -0.3535534    -0.3535534  2.4748737
ENSG00000268903   -0.7020086     0.9025825    -0.7020086    -0.7020086 -0.7020086
                trt_N052611 trt_N080611 trt_N061011
ENSG00000223972   1.6201852  -0.5400617  -0.5400617
ENSG00000227232  -1.2460478   0.5027912   0.5027912
ENSG00000278267   0.7489309   0.1497862  -1.0485032
ENSG00000237613  -0.3535534  -0.3535534  -0.3535534
ENSG00000238009  -0.3535534  -0.3535534  -0.3535534
ENSG00000268903   0.9025825  -0.7020086   1.7048781
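
To see what the double transpose does, here is a small check on a made-up 2 x 4 matrix (the gene and sample names are hypothetical). After `t(scale(t(m)))`, every row should have mean 0 and standard deviation 1, and the row and column names should be preserved.

```r
# Hypothetical toy matrix: 2 genes (rows) x 4 samples (columns)
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 60),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))

# scale() standardizes columns, so transpose, scale, and transpose back
z <- t(scale(t(m)))

row_means <- rowMeans(z)      # each gene is centred on 0
row_sds   <- apply(z, 1, sd)  # each gene has unit standard deviation
```

Because each gene is scaled independently, the SOM groups genes by the shape of their expression profile across samples rather than by absolute expression level.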

Train the SOM model

# Define the size and shape of the grid
som_grid <- somgrid(xdim = 10, ydim=10, topo="hexagonal")

# Train the SOM model!
som_model <- supersom(data_train_matrix, grid=som_grid, keep.data = TRUE)

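Visualize the SOM results

# Plot of the training progress - how the node distances have stabilised over time.
# The distance should decrease with the iterations and level off at the end;
# if it has not flattened out, the number of iterations is probably too small.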
plot(som_model, type = "changes")

[Figure: SOM training progress]
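
Training picks objects in random order, so two runs generally give different maps. A minimal sketch, assuming random toy data and an arbitrary seed, of fixing the seed and lengthening the training with the `rlen` argument of `som()` when the changes curve has not levelled off:

```r
library(kohonen)

set.seed(2018)                                   # arbitrary seed, for reproducibility only
toy <- matrix(rnorm(500 * 8), ncol = 8)          # hypothetical data: 500 genes x 8 samples
toy_grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal")

# rlen is the number of times the complete data set is presented to the network
toy_model <- som(toy, grid = toy_grid, rlen = 200, keep.data = TRUE)

# one unit index per input row
n_assigned <- length(toy_model$unit.classif)
```

Calling `plot(toy_model, type = "changes")` afterwards shows whether 200 passes were enough for this toy data. The right value for real data has to be judged from that curve.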

Count the genes mapped to each SOM unit

## custom palette as per kohonen package (not compulsory)
coolBlueHotRed <- function(n, alpha = 0.7) {
  rainbow(n, end=4/6, alpha=alpha)[n:1]
}

# shows the number of objects mapped to the individual units. 
# Empty units are depicted in gray.
plot(som_model, type = "counts", main="Node Counts", palette.name=coolBlueHotRed)

[Figure: Node Counts]

Assess the compactness and quality of each SOM unit

# map quality
# shows the mean distance of objects mapped to a unit to 
# the codebook vector of that unit. 
# The smaller the distances, the better the objects are 
# represented by the codebook vectors.
plot(som_model, type = "quality", main="Node Quality/Distance", palette.name=coolBlueHotRed)

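[Figure: Node Quality/Distance]

Neighbour distances: spotting potential boundary units

# Units with a larger distance to their neighbours are more likely to lie on a
# cluster boundary; check the legend for the colour direction (grey.colors runs from dark to light).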
# neighbour distances
# shows the sum of the distances to all immediate neighbours. 
# This kind of visualization is also known as a U-matrix plot. 
# Units near a class boundary can be expected to have higher average distances to their neighbours. 
# Only available for the "som" and "supersom" maps, for the moment.
plot(som_model, type="dist.neighbours", main = "SOM neighbour distances", palette.name=grey.colors)

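[Figure: SOM neighbour distances]

View the trend of each SOM unit's codebook vector

# Code spread. With only a few variables (8 samples here) the codes were drawn
# as segments by default; codeRendering="lines" forces the line (trend) plot.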
plot(som_model, type = "codes", codeRendering="lines")

[Figure: codes plot rendered as lines]

Get the genes belonging to each SOM unit

table(som_model$unit.classif)

# only part of the output is shown
  1   2   3   4   5   6 
197 172 434 187 582 249
 95  96  97  98  99 100 
168 919 226 419 193 241

# units are numbered from left to right, bottom to top
som_model_code_class = data.frame(name=rownames(data_train_matrix), code_class=som_model$unit.classif)
head(som_model_code_class)

             name code_class
1 ENSG00000223972         81
2 ENSG00000227232         37
3 ENSG00000278267         93
4 ENSG00000237613         51
5 ENSG00000238009         11
6 ENSG00000268903          4
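
Once every gene has a unit index, pulling out the genes of one unit is a single `split()` call. The sketch below uses made-up gene names and unit indices (`toy_genes`, `toy_classif`) in place of `rownames(data_train_matrix)` and `som_model$unit.classif`. Declaring all unit numbers as factor levels keeps empty units in the result.

```r
toy_genes   <- paste0("gene", 1:6)       # hypothetical gene names
toy_classif <- c(3, 1, 3, 4, 1, 3)       # hypothetical unit index of each gene
n_units     <- 4                         # hypothetical number of units on the grid

# one list element per unit, including units that received no gene
unit_factor   <- factor(toy_classif, levels = seq_len(n_units))
genes_by_unit <- split(toy_genes, unit_factor)

genes_in_unit3 <- genes_by_unit[["3"]]   # genes mapped to unit 3
unit_counts    <- table(unit_factor)     # counts per unit, empty units shown as 0
```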

Cluster the SOM results further

# Choose a suitable number of clusters
# show the WCSS metric for kmeans for different clustering sizes.
# Can be used as a "rough" indicator of the ideal number of clusters
mydata <- as.matrix(as.data.frame(som_model$codes))
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:15) wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
par(mar=c(5.1,4.1,4.1,2.1))
plot(1:15, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares", main="Within cluster sum of squares (WCSS)")
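
The first element of `wss` is computed by formula rather than by `kmeans()`: with a single cluster, the within-cluster sum of squares equals (n - 1) times the sum of the column variances. A quick check of that identity on random toy data (the matrix `toy` is an assumption of this example):

```r
set.seed(42)
toy <- matrix(rnorm(50 * 4), ncol = 4)   # hypothetical data: 50 rows, 4 columns

# closed form used for wss[1] in the code above
wss_one <- (nrow(toy) - 1) * sum(apply(toy, 2, var))

# the same quantity from kmeans with a single centre
km_one <- kmeans(toy, centers = 1)
same   <- isTRUE(all.equal(wss_one, km_one$tot.withinss))
```

For more than one centre, `kmeans()` starts from random centres, so the curve can vary a little between runs. Passing a larger `nstart` is one way to make it steadier.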

[Figure: Within cluster sum of squares (WCSS)]

# Form clusters on grid
## use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(mydata)), 6)
# Colour palette definition
cluster_palette <- function(x, alpha = 0.6) {
  n <- length(unique(x)) * 2
  # generate twice as many colours as clusters and keep every second one, last first
  rainbow(n, start=2/6, end=6/6, alpha=alpha)[seq(n, 2, -2)]
}

cluster_palette_init = cluster_palette(som_cluster)
bgcol = cluster_palette_init[som_cluster]

#show the same plot with the codes instead of just colours
plot(som_model, type="codes", bgcol = bgcol, main = "Clusters", codeRendering="lines")
add.cluster.boundaries(som_model, som_cluster)

[Figure: Clusters]

Some of the clusters do not show a clear pattern; how to improve this is left for later.

Get the new cluster of each gene

som_model_code_class_cluster = som_model_code_class
som_model_code_class_cluster$cluster = som_cluster[som_model_code_class$code_class]
head(som_model_code_class_cluster)

             name code_class cluster
1 ENSG00000223972         81       2
2 ENSG00000227232         37       8
3 ENSG00000278267         93       8
4 ENSG00000237613         51       7
5 ENSG00000238009         11       4
6 ENSG00000268903          4       3
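
The lookup `som_cluster[som_model_code_class$code_class]` works because `som_cluster` holds one cluster label per unit, in unit order, so indexing it with each gene's unit number returns each gene's cluster. A toy version with made-up values:

```r
unit_cluster <- c(1, 1, 2, 3, 3)       # hypothetical cluster label of units 1..5
gene_unit    <- c(5, 2, 2, 4, 1, 3)    # hypothetical unit index of 6 genes

# index the per-unit labels by each gene's unit to get per-gene labels
gene_cluster <- unit_cluster[gene_unit]
```

Here `gene_cluster` is `3 1 1 3 1 2`: the first gene sits in unit 5, which belongs to cluster 3, and so on.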

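Map a property onto the SOM plot

# One sample is used here as an example. Many other kinds of information can be
# mapped, such as gene pathways: just append the new attribute as a column of the matrix.
color_by_var = colnames(data_train_matrix)[1]
color_by = data_train_matrix[,color_by_var]
# mean value of the property over the genes in each unit
unit_colors <- aggregate(color_by, by=list(som_model$unit.classif), FUN=mean, simplify=TRUE)
plot(som_model, type = "property", property=unit_colors[,2], main=color_by_var, palette.name=coolBlueHotRed)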

[Figure: property plot for the selected sample]
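
One caveat with the `aggregate()` approach: it returns a row only for units that contain at least one gene, so on a map with empty units the result is shorter than the number of units and the values shift to the wrong units. A hedged alternative, shown on made-up values, is `tapply()` over a factor that lists every unit, which yields `NA` for empty units and keeps the positions aligned:

```r
toy_values <- c(0.5, 1.5, 2.0, 4.0)    # hypothetical property value of 4 genes
toy_unit   <- c(1, 1, 3, 3)            # hypothetical unit index of each gene
n_units    <- 4                        # hypothetical number of units on the grid

# one mean per unit, in unit order; units without genes come out as NA
unit_mean <- tapply(toy_values, factor(toy_unit, levels = seq_len(n_units)), mean)
unit_mean_vec <- as.numeric(unit_mean)
```

How `NA` values are drawn is up to the plotting function, so check the result on your own map before relying on it.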

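For more clustering methods, see:

WGCNA analysis: a simple, comprehensive, up-to-date tutorial
Gene co-expression clustering analysis and visualization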

Ref:

https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Self-Organizing_Maps_(SOM)
http://www.shanelynn.ie/
http://www.slideshare.net/shanelynn/2014-0117-dublin-r-selforganising-maps-for-customer-segmentation-shane-lynn
https://rpubs.com/erblast/SOM
http://www.pspc.unige.it/~drivsco/Papers/VanHulle_Springer.pdf
https://pastebin.com/fqKzgHd9
https://stackoverflow.com/questions/19858729/r-package-kohonen-how-to-plot-hexagons-instead-of-circles-as-in-matlab-som-too
