大麥WGCNA分析

參考文獻(xiàn)
High-resolution spatiotemporal transcriptome mapping of tomato fruit development and ripening


image.png

關(guān)于文章中WGCNA方法參數(shù)

image.png

image.png

文章中參數(shù)主要包括一下內(nèi)容
① 轉(zhuǎn)錄組數(shù)據(jù)基因輸入?yún)?shù)
選取變異系數(shù)CV值大于1的基因作為輸入數(shù)據(jù)進(jìn)行分析
② 計(jì)算軟閾值參數(shù)
設(shè)定的軟閾值soft power為16,TOM結(jié)構(gòu)類型為signed,也是推薦的結(jié)構(gòu)類型。
③ 設(shè)定最小模塊基因數(shù)目參數(shù)
最小模塊基因數(shù)目為30,模塊設(shè)定的多少,決定分出來(lái)模塊的多少
④ 相似模塊合并參數(shù)
前面設(shè)定最小模塊數(shù)目后,基因會(huì)分成不同的模塊,計(jì)算模塊之間的相關(guān)性后,相關(guān)性高的模塊進(jìn)行合并,文章中的參數(shù)設(shè)定為0.25,也就是說(shuō)相關(guān)性高于0.75的就合并在一起。
⑤ 模塊內(nèi)hub基因篩選參數(shù)
模塊內(nèi)基因的篩選主要從兩個(gè)參數(shù)進(jìn)行篩選,第一個(gè)參數(shù)是模塊內(nèi)的基因跟模塊的相關(guān)性高于0.9,然后利用FDR 相關(guān)性計(jì)算基因跟性狀的相關(guān)性。

大麥WGCNA分析流程

一、轉(zhuǎn)錄組數(shù)據(jù)輸入和質(zhì)控

##導(dǎo)入表達(dá)量TPM數(shù)據(jù)
rm(list=ls())
library(dplyr)
library(Mfuzz)
library(WGCNA)
library(impute)
library(preprocessCore)
library(MASS)
library(class)
library(cluster)
library(impute)
library(Hmisc)
library(matrixStats)
library(flashClust)
library(dplyr)
setwd("D:\\mywork\\mywork\\10.大麥轉(zhuǎn)錄組\\2022年大麥分析\\")
data <- read.csv("1.TPM_data_aver_use.csv",header=T,row.names = 1,check.names = F)
##表達(dá)量數(shù)據(jù)是三個(gè)重復(fù)進(jìn)行矯正過(guò)離群值后,離群值用剩余兩個(gè)重復(fù)求平均值后的最終TPM結(jié)果
##基因型數(shù)據(jù)預(yù)處理
##由于soft設(shè)定R2一直小于閾值0.8,查看問(wèn)題所在是
# 無(wú)向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi),沒(méi)有一個(gè)power值可以使無(wú)標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8或平均連接度降到100以下,可能是由于部分樣品與其他樣品差別太大造成的。這可能由批次效應(yīng)、樣品異質(zhì)性或?qū)嶒?yàn)條件對(duì)表達(dá)影響太大等造成
datExprFemale2 <- log2(data+1)
cor_T <- cor(datExprFemale2)
##繪制熱圖
library(pheatmap)
pheatmap(cor_T,show_rownames = T,show_colnames = T,cluster_rows=T,cluster_cols = T)
##結(jié)果展示相關(guān)性好
###數(shù)據(jù)質(zhì)控
m.mad <- apply(datExprFemale2,1,mad)
dataExprVar <- datExprFemale2[which(m.mad > 
                                max(quantile(m.mad, probs=seq(0, 1, 0.25))[2],0.01)),]
dim(dataExprVar)

###去除變異系數(shù)cv<1的基因

cal_cv=function(x){  # 自定義函數(shù) 標(biāo)準(zhǔn)差/平均值
  y=na.omit(x)
  return(sd(y)/mean(y))
}

m.cv <- apply(dataExprVar, 1, cal_cv)
dataExprVar2 <- dataExprVar[which(m.cv > 1),]


datExprFemale = as.data.frame(t(dataExprVar2 ))


datExprFemale_D_1 = as.data.frame(datExprFemale)

# ##檢驗(yàn)數(shù)據(jù)質(zhì)量
# # 樣本聚類檢查離群值(就是樹(shù)上特別不接近的
gsg = goodSamplesGenes(datExprFemale_D_1, verbose = 3)
head(gsg)
gsg$allOK

## 是true的話說(shuō)明所有g(shù)enes都通過(guò)了篩選
if (!gsg$allOK){
  # Optionally, print the gene and sample names that were removed:
  if (sum(!gsg$goodGenes)>0) 
    printFlush(paste("Removing genes:", 
                     paste(names(datExprFemale_D_1)[!gsg$goodGenes], collapse = ",")));
  if (sum(!gsg$goodSamples)>0) 
    printFlush(paste("Removing samples:", 
                     paste(rownames(datExprFemale_D_1)[!gsg$goodSamples], collapse = ",")));
  # Remove the offending genes and samples from the data:
  datExprFemale_D_1 =datExprFemale_D_1[gsg$goodSamples, gsg$goodGenes]
}
## 畫(huà)圖來(lái)看是否有特別離群的
dim(datExprFemale_D_1)
##共有5210個(gè)基因進(jìn)行WGCNA分析
sampleTree = hclust(dist(datExprFemale_D_1), method = "average")
##設(shè)置畫(huà)板大小

sizeGrWindow(12, 9)
par(cex=0.6)
par(mar=c(0,4,2,0))

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="")

##沒(méi)有發(fā)現(xiàn)離群值,不需要剔除樣本



二、設(shè)定軟閾值

image.png

設(shè)定軟閾值,

# 清理,釋放內(nèi)存
collectGarbage()
gene <- datExprFemale_D_1
powers = c(c(1:10), seq(from = 12, to=20, by=2))
type = "unsigned" ##自己試驗(yàn)設(shè)計(jì)的按照unsigned
sft <- pickSoftThreshold(gene, powerVector = powers, verbose=5 )
# 推薦值。如果是NA,就需要畫(huà)圖來(lái)自己挑選
setwd("./7.WGCNA_result")
pdf("soft.pdf",width = 15,  height = 8)
par(mfrow = c(1, 2))
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], type = 'n', 
     xlab = 'Soft Threshold (power)', ylab = 'Scale Free Topology Model Fit,signed R^2', 
     main = paste('Scale independence'))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels = powers, col = 'red');
abline(h = 0.8, col = 'red')

#平均連通性與 power 值散點(diǎn)圖
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab = 'Soft Threshold (power)', ylab = 'Mean Connectivity', type = 'n',
     main = paste('Mean connectivity'))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels = powers, col = 'red')
dev.off()
##構(gòu)建網(wǎng)絡(luò)
#上一步估計(jì)的最佳 power 值
write.csv(sft,"1.sft.csv")
sft$powerEstimate
###推薦的軟閾值設(shè)定為9



image.png

三、構(gòu)建TOM矩陣

 獲得鄰接矩陣:
softPower <- sft$powerEstimate
adjacency = adjacency(gene, power = softPower)
# 將臨近矩陣轉(zhuǎn)為 Tom 矩陣
TOM = TOMsimilarity(adjacency)
# 計(jì)算基因之間的相異度
dissTOM = 1-TOM
hierTOM = hclust(as.dist(dissTOM),method="average")

ADJ1_cor <- abs(WGCNA::cor( gene,use = "p" ))^softPower
# 基因少(<5000)的時(shí)候使用下面的代碼:
k <- as.vector(apply(ADJ1_cor,2,sum,na.rm=T))
# 基因多的時(shí)候使用下面的代碼:
k <- softConnectivity(datE=gene,power=softPower) 
sizeGrWindow(10, 5)
par(mfrow=c(1,2))
hist(k)
scaleFreePlot(k,main="Check Scale free topology\n")
# 可以看出k與p(k)成負(fù)相關(guān)(相關(guān)性系數(shù)0.84),說(shuō)明選擇的β值能夠建立基因無(wú)尺度網(wǎng)絡(luò)。

# 使用相異度來(lái)聚類為gene tree(聚類樹(shù)):
geneTree = hclust(as.dist(dissTOM), method = "average")
# Plot the resulting clustering tree (dendrogram)
windows()
sizeGrWindow(12,9)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
     labels = FALSE, hang = 0.04)
# 使用動(dòng)態(tài)剪切樹(shù)挖掘模塊:
minModuleSize = 50;
# 動(dòng)態(tài)切割樹(shù):
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = minModuleSize);
table(dynamicMods)

dynamicColors <- labels2colors(dynamicMods)
table(dynamicColors)
pdf("Gene dendrogram and module colors.pdf")

#模塊顏色指代
dynamicColors <- labels2colors(dynamicMods)
table(dynamicColors)
plotDendroAndColors(geneTree, dynamicColors, 'Dynamic Tree Cut',
                    dendroLabels = FALSE, addGuide = TRUE, hang = 0.03, guideHang = 0.05,
                    main = 'Gene dendrogram and module colors')
dev.off()



image.png

四、隨機(jī)挑選400個(gè)基因查看TOM結(jié)構(gòu)

# 拓?fù)錈釄D:
nSelect = 400 
# For reproducibility, we set the random seed 
set.seed(10); 
nGenes
select = sample(5210, size = nSelect); 
selectTOM = dissTOM[select, select]; 
# There's no simple way of restricting a clustering tree to a subset of genes, so we must re-cluster. 
selectTree = hclust(as.dist(selectTOM), method = "average") 
selectColors = dynamicColors[select]; 
# Open a graphical window 
sizeGrWindow(9,9) 
# Taking the dissimilarity to a power, say 10, makes the plot more informative by effectively changing 
# the color palette; setting the diagonal to NA also improves the clarity of the plot 
plotDiss = selectTOM^softPower; 
diag(plotDiss) = NA; 
TOMplot(plotDiss, 
        selectTree, 
        selectColors, 
        main = "Network heatmap plot, selected genes")

5、計(jì)算根據(jù)模塊特征向量基因計(jì)算模塊相異度

MEList = moduleEigengenes(gene, colors = dynamicColors)
MEs = MEList$eigengenes
# 計(jì)算根據(jù)模塊特征向量基因計(jì)算模塊相異度:
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average")
# Plot the result
plotEigengeneNetworks(MEs, 
                      "Eigengene adjacency heatmap", 
                      marHeatmap = c(3,4,2,2), 
                      plotDendrograms = FALSE, 
                      xLabelsAngle = 90) 

plot(METree, 
     main = "Clustering of module eigengenes",
     xlab = "", 
     sub = "")
# 在聚類圖中畫(huà)出剪切線
abline(h=MEDissThres, col = "red")
MEDissThres = 0.2
# 在聚類圖中畫(huà)出剪切線
abline(h=MEDissThres, col = "red")
# 合并模塊:
merge_modules = mergeCloseModules(gene, dynamicColors, cutHeight = MEDissThres, verbose = 3)
# 合并后的顏色:
mergedColors = merge_modules$colors;
# 新模塊的特征向量基因:
mergedMEs = merge_modules$newMEs;
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
                    c("Dynamic Tree Cut", "Merged dynamic"),
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)

probes = colnames(gene)
dimnames(TOM) <- list(probes, probes)

六、計(jì)算表型和模塊的相關(guān)性

trait_data <- read.table("1.trait_input.txt",header=T,sep="\t",row.names = 1)
trait <- trait_data 

#使用上一步新組合的共表達(dá)模塊的結(jié)果
module <- mergedMEs

#共表達(dá)模塊和表型的相關(guān)性分析
moduleTraitCor <- cor(module, trait, use = 'p')
moduleTraitCor[1:6,1:3]  #相關(guān)矩陣

#相關(guān)系數(shù)的 p 值矩陣
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nrow(module))

#輸出相關(guān)系數(shù)矩陣或 p 值矩陣
write.table(moduleTraitCor, 'moduleTraitCor.txt', sep = '\t', col.names = NA, quote = FALSE)
write.table(moduleTraitPvalue, 'moduleTraitPvalue.txt', sep = '\t', col.names = NA, quote = FALSE)

#相關(guān)圖繪制
textMatrix <- paste(signif(moduleTraitCor, 2), '\n(', signif(moduleTraitPvalue, 1), ')', sep = '')
dim(textMatrix) <- dim(moduleTraitCor)
pdf("Module-trait relationships.pdf")
par(mar = c(5, 10, 1,1))
labeledHeatmap(Matrix = moduleTraitCor, main = paste('Module-trait relationships'), 
               xLabels = names(trait), yLabels = names(module), ySymbols = names(module),
               colorLabels = FALSE, colors = blueWhiteRed(50), cex.text = 1, zlim = c(-1,1), 
               textMatrix = textMatrix, setStdMargins = FALSE)
dev.off()

gene_module <- data.frame(gene_name = colnames(gene), module = mergedColors, stringsAsFactors = FALSE)
head(gene_module)
write.csv(gene_module,"gene_module.csv")


七、感興趣模塊基因分析

#提取模塊內(nèi)的基因名稱
gene_module_select <- subset(gene_module, module == 'yellow')$gene_name

#“black”模塊內(nèi)基因在各樣本中的表達(dá)值矩陣(基因表達(dá)值矩陣的一個(gè)子集)
gene_select <- t(gene[,gene_module_select])


#“black”模塊內(nèi)基因的共表達(dá)相似度(在 TOM 矩陣中提取子集)
tom_select <- TOM[gene_module_select,gene_module_select]

#輸出
write.table(gene_select, 'blue_gene_select.txt', sep = '\t', col.names = NA, quote = FALSE)
write.table(tom_select, 'blue_tom_select.txt', sep = '\t', col.names = NA, quote = FALSE)

##重要基因的獲得,以與 TNM 顯著相關(guān)的 black 模塊為例
#選擇 black 模塊內(nèi)的基因
gene_yellow <- gene[ ,gene_module_select]

#基因的模塊成員度(module membership)計(jì)算
#即各基因表達(dá)值與相應(yīng)模塊特征基因的相關(guān)性,其衡量了基因在全局網(wǎng)絡(luò)中的位置
#基因的模塊成員度(module membership)計(jì)算
#即各基因表達(dá)值與相應(yīng)模塊特征基因的相關(guān)性,其衡量了基因在全局網(wǎng)絡(luò)中的位置
geneModuleMembership <- signedKME(gene_black, module['MEyellow'], outputColumnName = 'MM')
MMPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nrow(module)))

#各基因表達(dá)值與臨床表型的相關(guān)性分析
geneTraitSignificance <- as.data.frame(cor(gene_black, trait['spike_weight'], use = 'p'))
GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nrow(trait)))

##選擇顯著(p<0.01)、高 black 模塊成員度(MM>=0.8),與 TNM 表型高度相關(guān)(r>=0.8)的基因
geneModuleMembership[geneModuleMembership<0.8 | MMPvalue>0.01] <- 0
geneTraitSignificance[geneTraitSignificance<0.8 | MMPvalue>0.01] <- 0

select <- cbind(geneModuleMembership, geneTraitSignificance)
select <- subset(select, geneModuleMembership>=0.8 & geneTraitSignificance>=0.8)
head(select)
write.csv(select,"2.yellow_select基因模塊相關(guān)性和表型相關(guān)性都高于0.8.csv")

###對(duì)模塊內(nèi)選擇的基因進(jìn)行GO分析
library(clusterProfiler)
library(enrichplot)
library(ggplot2)
library(stringr)
#######重新嘗試進(jìn)行GO和KEGG提取
#讀取數(shù)據(jù)
term2gene <- read.csv("D:\\mywork\\mywork\\13.申大麥轉(zhuǎn)錄組分析_VRS突變體整穗\\2020.12.24_重新計(jì)算Bowman和突變體的轉(zhuǎn)錄組DESeq2和GO結(jié)果等\\Bowman_V1a\\GO結(jié)果\\term2gene.csv",header = T,row.names = 1)
head(term2gene)
go2term <- read.csv("D:\\mywork\\mywork\\13.申大麥轉(zhuǎn)錄組分析_VRS突變體整穗\\2020.12.24_重新計(jì)算Bowman和突變體的轉(zhuǎn)錄組DESeq2和GO結(jié)果等\\Bowman_V1a\\GO結(jié)果\\go2term.csv",header = T,row.names = 1)
go2ont <- read.csv("D:\\mywork\\mywork\\13.申大麥轉(zhuǎn)錄組分析_VRS突變體整穗\\2020.12.24_重新計(jì)算Bowman和突變體的轉(zhuǎn)錄組DESeq2和GO結(jié)果等\\Bowman_V1a\\GO結(jié)果\\go2ont.csv",header = T,row.names = 1)

df <- enricher(gene = row.names(select), TERM2GENE = term2gene, TERM2NAME = go2term, pvalueCutoff =1, qvalueCutoff = 1)
dim(df)
head(df)
p1 <- dotplot(df, showCategory=30)+scale_y_discrete(labels=function(x) str_wrap(x, width=50))
go<-as.data.frame(df)
names(go)
go<-as.data.frame(df)
  names(go)
  # go <- read.csv("D:\\mywork\\mywork\\申大麥轉(zhuǎn)錄組分析\\UPset圖\\BOWMAN比較\\GO分析\\18.V1a_V5_V4K_V3int.GO_result.csv.csv",header=T,row.names=1)
  #  go<-as.data.frame(go)
  # names(go)
  colnames(go) <- c("go_id","Description" ,"GeneRatio"  , "BgRatio"    , "pvalue"  ,    "p.adjust"  ,  "qvalue"   ,   "geneID","Count")
  
  go1 <- merge(go,go2ont,by="go_id")
  
  
  go1 <-  go1[order(-go1$Count),] 
  go_MF<-go1[go1$Ontology=="MF",][1:10,]
  go_CC<-go1[go1$Ontology=="CC",][1:10,]
  go_BP<-go1[go1$Ontology=="BP",][1:10,]
  go_enrich_df<-data.frame(ID=c(go_BP$go_id, go_CC$go_id, go_MF$go_id),
                           Description=c(go_BP$Description, go_CC$Description, go_MF$Description),
                           GeneNumber=c(go_BP$Count, go_CC$Count, go_MF$Count),
                           type=factor(c(rep("biological process", 10), rep("cellular component", 10),rep("molecular function",10)),levels=c("molecular function", "cellular component", "biological process")))
  
  # go_enrich_df <- rbind(go_BP,go_CC,go_MF)
  
  # go_enrich_df <-go_enrich_df [,c(1,2,9,10)]
  # go_enrich_df$type <-c(rep("biological process", 10), rep("cellular component", 10),rep("molecular function",10)) 
  # ## numbers as data on x axis
  go_enrich_df$number <- factor(rev(1:nrow(go_enrich_df)))
  ## shorten the names of GO terms
  shorten_names <- function(x, n_word=4, n_char=40){
    if (length(strsplit(x, " ")[[1]]) > n_word || (nchar(x) > 40))
    {
      if (nchar(x) > 40) x <- substr(x, 1, 40)
      x <- paste(paste(strsplit(x, " ")[[1]][1:min(length(strsplit(x," ")[[1]]), n_word)],
                       collapse=" "),"...", sep="")
      return(x)
    } 
    else
    {
      return(x)
    }
  }
  
  labels=(sapply(
    levels(go_enrich_df$Description)[as.numeric(go_enrich_df$Description)],
    shorten_names))
  names(labels) = rev(1:nrow(go_enrich_df))
  
  ## colors for bar // green, blue, orange
  CPCOLS <- c("#8DA1CB", "#FD8D62", "#66C3A5")
  library(ggplot2)
  
  p2 <- ggplot(data=go_enrich_df, aes(x=number, y=GeneNumber, fill=type)) +
    geom_bar(stat="identity", width=0.8) + coord_flip() + 
    scale_fill_manual(values = CPCOLS) + theme_test() + 
    scale_x_discrete(labels=go_enrich_df$Description) +
    xlab("GO term") + 
    theme(axis.text=element_text(face = "bold", color="gray50")) +
    labs(title = "The Most Enriched GO Terms")
  p2

filename1 <- paste("select_yellow.csv",sep=".")
 
  
  write.csv(go1,filename1)
  filename2 <- paste("select_yellow", "png", sep = ".")
  filename3 <- paste("select_yellow", "png", sep = ".")
  ggsave(filename2, p2, width = 10, height = 8)
  ggsave(filename3, p1, width = 10, height = 8)

八、感興趣的模塊繪制網(wǎng)絡(luò)圖

##所有模塊結(jié)果輸出
#輸出各模塊子網(wǎng)絡(luò)的邊列表和節(jié)點(diǎn)列表,后續(xù)可導(dǎo)入 cytoscape 繪制網(wǎng)絡(luò)圖
#其中,邊的權(quán)重為 TOM 矩陣中的值,記錄了基因間共表達(dá)相似性
dir.create('cytoscape', recursive = TRUE)
 
for (mod in 1:nrow(table(mergedColors))) {
    modules <- names(table(mergedColors))[mod]
    probes <- colnames(gene)
    inModule <- (mergedColors == modules)
    modProbes <- probes[inModule]
    modGenes <- modProbes
    modtom_sim <- tom_sim[inModule, inModule]
    dimnames(modtom_sim) <- list(modProbes, modProbes)
    outEdge <- paste('cytoscape/', modules, '.edge_list.txt',sep = '')
    outNode <- paste('cytoscape/', modules, '.node_list.txt', sep = '')
    exportNetworkToCytoscape(modtom_sim,
        edgeFile = outEdge,
        nodeFile = outNode,
        weighted = TRUE,
        threshold = 0.3,  #該參數(shù)可控制輸出的邊數(shù)量,過(guò)濾低權(quán)重的邊
        nodeNames = modProbes,
        altNodeNames = modGenes,
        nodeAttr = mergedColors[inModule])
}



參考教程:

加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析(WGCNA) - pome24的個(gè)人空間 - OSCHINA - 中文開(kāi)源技術(shù)交流社區(qū)

WGCNA(二)WGCNA的步驟和原理 - 簡(jiǎn)書(shū) (jianshu.com)
soft軟閾值設(shè)定的原理

(領(lǐng)接矩陣) 鄰接矩陣diviner_s的博客-CSDN博客鄰接矩陣
(23條消息) 鄰接矩陣與關(guān)聯(lián)矩陣的轉(zhuǎn)換及實(shí)現(xiàn)hukai7190的博客-CSDN博客鄰接矩陣轉(zhuǎn)關(guān)聯(lián)矩陣

強(qiáng)烈推薦做WGCNA把下邊這個(gè)博文看一遍

WGCNA原理簡(jiǎn)介 - 簡(jiǎn)書(shū) (jianshu.com)

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容