參考文獻(xiàn)
High-resolution spatiotemporal transcriptome mapping of tomato fruit development and ripening

關(guān)于文章中WGCNA方法參數(shù)


文章中參數(shù)主要包括一下內(nèi)容
① 轉(zhuǎn)錄組數(shù)據(jù)基因輸入?yún)?shù)
選取變異系數(shù)CV值大于1的基因作為輸入數(shù)據(jù)進(jìn)行分析
② 計(jì)算軟閾值參數(shù)
設(shè)定的軟閾值soft power為16,TOM結(jié)構(gòu)類型為signed,也是推薦的結(jié)構(gòu)類型。
③ 設(shè)定最小模塊基因數(shù)目參數(shù)
最小模塊基因數(shù)目為30,模塊設(shè)定的多少,決定分出來(lái)模塊的多少
④ 相似模塊合并參數(shù)
前面設(shè)定最小模塊數(shù)目后,基因會(huì)分成不同的模塊,計(jì)算模塊之間的相關(guān)性后,相關(guān)性高的模塊進(jìn)行合并,文章中的參數(shù)設(shè)定為0.25,也就是說(shuō)相關(guān)性高于0.75的就合并在一起。
⑤ 模塊內(nèi)hub基因篩選參數(shù)
模塊內(nèi)基因的篩選主要從兩個(gè)參數(shù)進(jìn)行篩選,第一個(gè)參數(shù)是模塊內(nèi)的基因跟模塊的相關(guān)性高于0.9,然后利用FDR 相關(guān)性計(jì)算基因跟性狀的相關(guān)性。
大麥WGCNA分析流程
一、轉(zhuǎn)錄組數(shù)據(jù)輸入和質(zhì)控
##導(dǎo)入表達(dá)量TPM數(shù)據(jù)
rm(list=ls())
library(dplyr)
library(Mfuzz)
library(WGCNA)
library(impute)
library(preprocessCore)
library(MASS)
library(class)
library(cluster)
library(impute)
library(Hmisc)
library(matrixStats)
library(flashClust)
library(dplyr)
setwd("D:\\mywork\\mywork\\10.大麥轉(zhuǎn)錄組\\2022年大麥分析\\")
data <- read.csv("1.TPM_data_aver_use.csv",header=T,row.names = 1,check.names = F)
##表達(dá)量數(shù)據(jù)是三個(gè)重復(fù)進(jìn)行矯正過(guò)離群值后,離群值用剩余兩個(gè)重復(fù)求平均值后的最終TPM結(jié)果
##基因型數(shù)據(jù)預(yù)處理
##由于soft設(shè)定R2一直小于閾值0.8,查看問(wèn)題所在是
# 無(wú)向網(wǎng)絡(luò)在power小于15或有向網(wǎng)絡(luò)power小于30內(nèi),沒(méi)有一個(gè)power值可以使無(wú)標(biāo)度網(wǎng)絡(luò)圖譜結(jié)構(gòu)R^2達(dá)到0.8或平均連接度降到100以下,可能是由于部分樣品與其他樣品差別太大造成的。這可能由批次效應(yīng)、樣品異質(zhì)性或?qū)嶒?yàn)條件對(duì)表達(dá)影響太大等造成
datExprFemale2 <- log2(data+1)
cor_T <- cor(datExprFemale2)
##繪制熱圖
library(pheatmap)
pheatmap(cor_T,show_rownames = T,show_colnames = T,cluster_rows=T,cluster_cols = T)
##結(jié)果展示相關(guān)性好
###數(shù)據(jù)質(zhì)控
m.mad <- apply(datExprFemale2,1,mad)
dataExprVar <- datExprFemale2[which(m.mad >
max(quantile(m.mad, probs=seq(0, 1, 0.25))[2],0.01)),]
dim(dataExprVar)
###去除變異系數(shù)cv<1的基因
cal_cv=function(x){ # 自定義函數(shù) 標(biāo)準(zhǔn)差/平均值
y=na.omit(x)
return(sd(y)/mean(y))
}
m.cv <- apply(dataExprVar, 1, cal_cv)
dataExprVar2 <- dataExprVar[which(m.cv > 1),]
datExprFemale = as.data.frame(t(dataExprVar2 ))
datExprFemale_D_1 = as.data.frame(datExprFemale)
# ##檢驗(yàn)數(shù)據(jù)質(zhì)量
# # 樣本聚類檢查離群值(就是樹(shù)上特別不接近的
gsg = goodSamplesGenes(datExprFemale_D_1, verbose = 3)
head(gsg)
gsg$allOK
## 是true的話說(shuō)明所有g(shù)enes都通過(guò)了篩選
if (!gsg$allOK){
# Optionally, print the gene and sample names that were removed:
if (sum(!gsg$goodGenes)>0)
printFlush(paste("Removing genes:",
paste(names(datExprFemale_D_1)[!gsg$goodGenes], collapse = ",")));
if (sum(!gsg$goodSamples)>0)
printFlush(paste("Removing samples:",
paste(rownames(datExprFemale_D_1)[!gsg$goodSamples], collapse = ",")));
# Remove the offending genes and samples from the data:
datExprFemale_D_1 =datExprFemale_D_1[gsg$goodSamples, gsg$goodGenes]
}
## 畫(huà)圖來(lái)看是否有特別離群的
dim(datExprFemale_D_1)
##共有5210個(gè)基因進(jìn)行WGCNA分析
sampleTree = hclust(dist(datExprFemale_D_1), method = "average")
##設(shè)置畫(huà)板大小
sizeGrWindow(12, 9)
par(cex=0.6)
par(mar=c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="")
##沒(méi)有發(fā)現(xiàn)離群值,不需要剔除樣本
二、設(shè)定軟閾值

設(shè)定軟閾值,
# 清理,釋放內(nèi)存
collectGarbage()
gene <- datExprFemale_D_1
powers = c(c(1:10), seq(from = 12, to=20, by=2))
type = "unsigned" ##自己試驗(yàn)設(shè)計(jì)的按照unsigned
sft <- pickSoftThreshold(gene, powerVector = powers, verbose=5 )
# 推薦值。如果是NA,就需要畫(huà)圖來(lái)自己挑選
setwd("./7.WGCNA_result")
pdf("soft.pdf",width = 15, height = 8)
par(mfrow = c(1, 2))
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], type = 'n',
xlab = 'Soft Threshold (power)', ylab = 'Scale Free Topology Model Fit,signed R^2',
main = paste('Scale independence'))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels = powers, col = 'red');
abline(h = 0.8, col = 'red')
#平均連通性與 power 值散點(diǎn)圖
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab = 'Soft Threshold (power)', ylab = 'Mean Connectivity', type = 'n',
main = paste('Mean connectivity'))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels = powers, col = 'red')
dev.off()
##構(gòu)建網(wǎng)絡(luò)
#上一步估計(jì)的最佳 power 值
write.csv(sft,"1.sft.csv")
sft$powerEstimate
###推薦的軟閾值設(shè)定為9

三、構(gòu)建TOM矩陣
獲得鄰接矩陣:
softPower <- sft$powerEstimate
adjacency = adjacency(gene, power = softPower)
# 將臨近矩陣轉(zhuǎn)為 Tom 矩陣
TOM = TOMsimilarity(adjacency)
# 計(jì)算基因之間的相異度
dissTOM = 1-TOM
hierTOM = hclust(as.dist(dissTOM),method="average")
ADJ1_cor <- abs(WGCNA::cor( gene,use = "p" ))^softPower
# 基因少(<5000)的時(shí)候使用下面的代碼:
k <- as.vector(apply(ADJ1_cor,2,sum,na.rm=T))
# 基因多的時(shí)候使用下面的代碼:
k <- softConnectivity(datE=gene,power=softPower)
sizeGrWindow(10, 5)
par(mfrow=c(1,2))
hist(k)
scaleFreePlot(k,main="Check Scale free topology\n")
# 可以看出k與p(k)成負(fù)相關(guān)(相關(guān)性系數(shù)0.84),說(shuō)明選擇的β值能夠建立基因無(wú)尺度網(wǎng)絡(luò)。
# 使用相異度來(lái)聚類為gene tree(聚類樹(shù)):
geneTree = hclust(as.dist(dissTOM), method = "average")
# Plot the resulting clustering tree (dendrogram)
windows()
sizeGrWindow(12,9)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04)
# 使用動(dòng)態(tài)剪切樹(shù)挖掘模塊:
minModuleSize = 50;
# 動(dòng)態(tài)切割樹(shù):
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);
table(dynamicMods)
dynamicColors <- labels2colors(dynamicMods)
table(dynamicColors)
pdf("Gene dendrogram and module colors.pdf")
#模塊顏色指代
dynamicColors <- labels2colors(dynamicMods)
table(dynamicColors)
plotDendroAndColors(geneTree, dynamicColors, 'Dynamic Tree Cut',
dendroLabels = FALSE, addGuide = TRUE, hang = 0.03, guideHang = 0.05,
main = 'Gene dendrogram and module colors')
dev.off()

四、隨機(jī)挑選400個(gè)基因查看TOM結(jié)構(gòu)
# 拓?fù)錈釄D:
nSelect = 400
# For reproducibility, we set the random seed
set.seed(10);
nGenes
select = sample(5210, size = nSelect);
selectTOM = dissTOM[select, select];
# There's no simple way of restricting a clustering tree to a subset of genes, so we must re-cluster.
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = dynamicColors[select];
# Open a graphical window
sizeGrWindow(9,9)
# Taking the dissimilarity to a power, say 10, makes the plot more informative by effectively changing
# the color palette; setting the diagonal to NA also improves the clarity of the plot
plotDiss = selectTOM^softPower;
diag(plotDiss) = NA;
TOMplot(plotDiss,
selectTree,
selectColors,
main = "Network heatmap plot, selected genes")
5、計(jì)算根據(jù)模塊特征向量基因計(jì)算模塊相異度
MEList = moduleEigengenes(gene, colors = dynamicColors)
MEs = MEList$eigengenes
# 計(jì)算根據(jù)模塊特征向量基因計(jì)算模塊相異度:
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average")
# Plot the result
plotEigengeneNetworks(MEs,
"Eigengene adjacency heatmap",
marHeatmap = c(3,4,2,2),
plotDendrograms = FALSE,
xLabelsAngle = 90)
plot(METree,
main = "Clustering of module eigengenes",
xlab = "",
sub = "")
# 在聚類圖中畫(huà)出剪切線
abline(h=MEDissThres, col = "red")
MEDissThres = 0.2
# 在聚類圖中畫(huà)出剪切線
abline(h=MEDissThres, col = "red")
# 合并模塊:
merge_modules = mergeCloseModules(gene, dynamicColors, cutHeight = MEDissThres, verbose = 3)
# 合并后的顏色:
mergedColors = merge_modules$colors;
# 新模塊的特征向量基因:
mergedMEs = merge_modules$newMEs;
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
probes = colnames(gene)
dimnames(TOM) <- list(probes, probes)
六、計(jì)算表型和模塊的相關(guān)性
trait_data <- read.table("1.trait_input.txt",header=T,sep="\t",row.names = 1)
trait <- trait_data
#使用上一步新組合的共表達(dá)模塊的結(jié)果
module <- mergedMEs
#共表達(dá)模塊和表型的相關(guān)性分析
moduleTraitCor <- cor(module, trait, use = 'p')
moduleTraitCor[1:6,1:3] #相關(guān)矩陣
#相關(guān)系數(shù)的 p 值矩陣
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nrow(module))
#輸出相關(guān)系數(shù)矩陣或 p 值矩陣
write.table(moduleTraitCor, 'moduleTraitCor.txt', sep = '\t', col.names = NA, quote = FALSE)
write.table(moduleTraitPvalue, 'moduleTraitPvalue.txt', sep = '\t', col.names = NA, quote = FALSE)
#相關(guān)圖繪制
textMatrix <- paste(signif(moduleTraitCor, 2), '\n(', signif(moduleTraitPvalue, 1), ')', sep = '')
dim(textMatrix) <- dim(moduleTraitCor)
pdf("Module-trait relationships.pdf")
par(mar = c(5, 10, 1,1))
labeledHeatmap(Matrix = moduleTraitCor, main = paste('Module-trait relationships'),
xLabels = names(trait), yLabels = names(module), ySymbols = names(module),
colorLabels = FALSE, colors = blueWhiteRed(50), cex.text = 1, zlim = c(-1,1),
textMatrix = textMatrix, setStdMargins = FALSE)
dev.off()
gene_module <- data.frame(gene_name = colnames(gene), module = mergedColors, stringsAsFactors = FALSE)
head(gene_module)
write.csv(gene_module,"gene_module.csv")
七、感興趣模塊基因分析
#提取模塊內(nèi)的基因名稱
gene_module_select <- subset(gene_module, module == 'yellow')$gene_name
#“black”模塊內(nèi)基因在各樣本中的表達(dá)值矩陣(基因表達(dá)值矩陣的一個(gè)子集)
gene_select <- t(gene[,gene_module_select])
#“black”模塊內(nèi)基因的共表達(dá)相似度(在 TOM 矩陣中提取子集)
tom_select <- TOM[gene_module_select,gene_module_select]
#輸出
write.table(gene_select, 'blue_gene_select.txt', sep = '\t', col.names = NA, quote = FALSE)
write.table(tom_select, 'blue_tom_select.txt', sep = '\t', col.names = NA, quote = FALSE)
##重要基因的獲得,以與 TNM 顯著相關(guān)的 black 模塊為例
#選擇 black 模塊內(nèi)的基因
gene_yellow <- gene[ ,gene_module_select]
#基因的模塊成員度(module membership)計(jì)算
#即各基因表達(dá)值與相應(yīng)模塊特征基因的相關(guān)性,其衡量了基因在全局網(wǎng)絡(luò)中的位置
#基因的模塊成員度(module membership)計(jì)算
#即各基因表達(dá)值與相應(yīng)模塊特征基因的相關(guān)性,其衡量了基因在全局網(wǎng)絡(luò)中的位置
geneModuleMembership <- signedKME(gene_black, module['MEyellow'], outputColumnName = 'MM')
MMPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nrow(module)))
#各基因表達(dá)值與臨床表型的相關(guān)性分析
geneTraitSignificance <- as.data.frame(cor(gene_black, trait['spike_weight'], use = 'p'))
GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nrow(trait)))
##選擇顯著(p<0.01)、高 black 模塊成員度(MM>=0.8),與 TNM 表型高度相關(guān)(r>=0.8)的基因
geneModuleMembership[geneModuleMembership<0.8 | MMPvalue>0.01] <- 0
geneTraitSignificance[geneTraitSignificance<0.8 | MMPvalue>0.01] <- 0
select <- cbind(geneModuleMembership, geneTraitSignificance)
select <- subset(select, geneModuleMembership>=0.8 & geneTraitSignificance>=0.8)
head(select)
write.csv(select,"2.yellow_select基因模塊相關(guān)性和表型相關(guān)性都高于0.8.csv")
###對(duì)模塊內(nèi)選擇的基因進(jìn)行GO分析
library(clusterProfiler)
library(enrichplot)
library(ggplot2)
library(stringr)
#######重新嘗試進(jìn)行GO和KEGG提取
#讀取數(shù)據(jù)
term2gene <- read.csv("D:\\mywork\\mywork\\13.申大麥轉(zhuǎn)錄組分析_VRS突變體整穗\\2020.12.24_重新計(jì)算Bowman和突變體的轉(zhuǎn)錄組DESeq2和GO結(jié)果等\\Bowman_V1a\\GO結(jié)果\\term2gene.csv",header = T,row.names = 1)
head(term2gene)
go2term <- read.csv("D:\\mywork\\mywork\\13.申大麥轉(zhuǎn)錄組分析_VRS突變體整穗\\2020.12.24_重新計(jì)算Bowman和突變體的轉(zhuǎn)錄組DESeq2和GO結(jié)果等\\Bowman_V1a\\GO結(jié)果\\go2term.csv",header = T,row.names = 1)
go2ont <- read.csv("D:\\mywork\\mywork\\13.申大麥轉(zhuǎn)錄組分析_VRS突變體整穗\\2020.12.24_重新計(jì)算Bowman和突變體的轉(zhuǎn)錄組DESeq2和GO結(jié)果等\\Bowman_V1a\\GO結(jié)果\\go2ont.csv",header = T,row.names = 1)
df <- enricher(gene = row.names(select), TERM2GENE = term2gene, TERM2NAME = go2term, pvalueCutoff =1, qvalueCutoff = 1)
dim(df)
head(df)
p1 <- dotplot(df, showCategory=30)+scale_y_discrete(labels=function(x) str_wrap(x, width=50))
go<-as.data.frame(df)
names(go)
go<-as.data.frame(df)
names(go)
# go <- read.csv("D:\\mywork\\mywork\\申大麥轉(zhuǎn)錄組分析\\UPset圖\\BOWMAN比較\\GO分析\\18.V1a_V5_V4K_V3int.GO_result.csv.csv",header=T,row.names=1)
# go<-as.data.frame(go)
# names(go)
colnames(go) <- c("go_id","Description" ,"GeneRatio" , "BgRatio" , "pvalue" , "p.adjust" , "qvalue" , "geneID","Count")
go1 <- merge(go,go2ont,by="go_id")
go1 <- go1[order(-go1$Count),]
go_MF<-go1[go1$Ontology=="MF",][1:10,]
go_CC<-go1[go1$Ontology=="CC",][1:10,]
go_BP<-go1[go1$Ontology=="BP",][1:10,]
go_enrich_df<-data.frame(ID=c(go_BP$go_id, go_CC$go_id, go_MF$go_id),
Description=c(go_BP$Description, go_CC$Description, go_MF$Description),
GeneNumber=c(go_BP$Count, go_CC$Count, go_MF$Count),
type=factor(c(rep("biological process", 10), rep("cellular component", 10),rep("molecular function",10)),levels=c("molecular function", "cellular component", "biological process")))
# go_enrich_df <- rbind(go_BP,go_CC,go_MF)
# go_enrich_df <-go_enrich_df [,c(1,2,9,10)]
# go_enrich_df$type <-c(rep("biological process", 10), rep("cellular component", 10),rep("molecular function",10))
# ## numbers as data on x axis
go_enrich_df$number <- factor(rev(1:nrow(go_enrich_df)))
## shorten the names of GO terms
shorten_names <- function(x, n_word=4, n_char=40){
if (length(strsplit(x, " ")[[1]]) > n_word || (nchar(x) > 40))
{
if (nchar(x) > 40) x <- substr(x, 1, 40)
x <- paste(paste(strsplit(x, " ")[[1]][1:min(length(strsplit(x," ")[[1]]), n_word)],
collapse=" "),"...", sep="")
return(x)
}
else
{
return(x)
}
}
labels=(sapply(
levels(go_enrich_df$Description)[as.numeric(go_enrich_df$Description)],
shorten_names))
names(labels) = rev(1:nrow(go_enrich_df))
## colors for bar // green, blue, orange
CPCOLS <- c("#8DA1CB", "#FD8D62", "#66C3A5")
library(ggplot2)
p2 <- ggplot(data=go_enrich_df, aes(x=number, y=GeneNumber, fill=type)) +
geom_bar(stat="identity", width=0.8) + coord_flip() +
scale_fill_manual(values = CPCOLS) + theme_test() +
scale_x_discrete(labels=go_enrich_df$Description) +
xlab("GO term") +
theme(axis.text=element_text(face = "bold", color="gray50")) +
labs(title = "The Most Enriched GO Terms")
p2
filename1 <- paste("select_yellow.csv",sep=".")
write.csv(go1,filename1)
filename2 <- paste("select_yellow", "png", sep = ".")
filename3 <- paste("select_yellow", "png", sep = ".")
ggsave(filename2, p2, width = 10, height = 8)
ggsave(filename3, p1, width = 10, height = 8)
八、感興趣的模塊繪制網(wǎng)絡(luò)圖
##所有模塊結(jié)果輸出
#輸出各模塊子網(wǎng)絡(luò)的邊列表和節(jié)點(diǎn)列表,后續(xù)可導(dǎo)入 cytoscape 繪制網(wǎng)絡(luò)圖
#其中,邊的權(quán)重為 TOM 矩陣中的值,記錄了基因間共表達(dá)相似性
dir.create('cytoscape', recursive = TRUE)
for (mod in 1:nrow(table(mergedColors))) {
modules <- names(table(mergedColors))[mod]
probes <- colnames(gene)
inModule <- (mergedColors == modules)
modProbes <- probes[inModule]
modGenes <- modProbes
modtom_sim <- tom_sim[inModule, inModule]
dimnames(modtom_sim) <- list(modProbes, modProbes)
outEdge <- paste('cytoscape/', modules, '.edge_list.txt',sep = '')
outNode <- paste('cytoscape/', modules, '.node_list.txt', sep = '')
exportNetworkToCytoscape(modtom_sim,
edgeFile = outEdge,
nodeFile = outNode,
weighted = TRUE,
threshold = 0.3, #該參數(shù)可控制輸出的邊數(shù)量,過(guò)濾低權(quán)重的邊
nodeNames = modProbes,
altNodeNames = modGenes,
nodeAttr = mergedColors[inModule])
}
參考教程:
加權(quán)基因共表達(dá)網(wǎng)絡(luò)分析(WGCNA) - pome24的個(gè)人空間 - OSCHINA - 中文開(kāi)源技術(shù)交流社區(qū)
WGCNA(二)WGCNA的步驟和原理 - 簡(jiǎn)書(shū) (jianshu.com)
soft軟閾值設(shè)定的原理
(領(lǐng)接矩陣) 鄰接矩陣diviner_s的博客-CSDN博客鄰接矩陣
(23條消息) 鄰接矩陣與關(guān)聯(lián)矩陣的轉(zhuǎn)換及實(shí)現(xiàn)hukai7190的博客-CSDN博客鄰接矩陣轉(zhuǎn)關(guān)聯(lián)矩陣