這是ggplot2可視化專題的第三個(gè)實(shí)例操作

【ggplot2的基本思路見前文總論】：基于ggplot2的RNA-seq轉(zhuǎn)錄組可視化：總述和分文目錄

【ggplot2繪圖具體策略第一篇】：測序結(jié)果概覽：基因表達(dá)量rank瀑布圖，高密度表達(dá)相關(guān)性散點(diǎn)圖，注釋特定基因及errorbar的表達(dá)相關(guān)性散點(diǎn)圖繪制

【ggplot2繪圖具體策略第二篇】：單/多個(gè)基因在組間同時(shí)展示的多種選擇：分組小提琴圖、分組/分面柱狀圖、單基因蜂群點(diǎn)圖拼圖繪制

有的時(shí)候我們需要對(duì)配對(duì)樣品進(jìn)行某些操作或事件前后的宏觀比較，以及各樣本單獨(dú)的變化趨勢（如進(jìn)行治療前后樣本的某指標(biāo)變化，或給藥處理前后平行樣本間某基因的變化趨勢，或同一病人不同組織樣本間某基因的表達(dá)情況展示），這時(shí)候就可以用到連線散點(diǎn)圖了。找到了一篇文章利用它進(jìn)行了數(shù)據(jù)闡釋：

配對(duì)散點(diǎn)連線圖

該文章使用了多個(gè)GEO數(shù)據(jù)展示化療前后患者的免疫相關(guān)預(yù)后評(píng)分，不僅展示了組間在化療前后的總體改變趨勢，配對(duì)散點(diǎn)間的連線也展示了個(gè)體在化療前后的評(píng)分反應(yīng)。

致謝文章：Wang, S., Q. Zhang, C. Yu, Y. Cao, Y. Zuo and L. Yang (2020). "Immune cell infiltration-based signature for prognosis and immunogenomic analysis in breast cancer." Brief Bioinform.

下面就嘗試來用前文介紹的TCGA白人LUSC肺鱗癌mRNA-seq公共數(shù)據(jù)復(fù)制一下圖片大致的形式。因?yàn)槭褂肨CGA數(shù)據(jù)，下載的樣本點(diǎn)比較多，為了減少散點(diǎn)重合的情況，接下來介紹使用散點(diǎn)的jitter形式，即散點(diǎn)在橫向也展開一定距離，圖片的可視化效果更好。

數(shù)據(jù)獲取

基于第一篇文章從TCGA數(shù)據(jù)庫下載并整合清洗高通量腫瘤表達(dá)譜-臨床性狀數(shù)據(jù)，我們下載并清洗TCGA數(shù)據(jù)庫中white人種的LUSC肺鱗癌mRNA-seq轉(zhuǎn)錄組counts數(shù)據(jù)和FPKM數(shù)據(jù)。

隨后根據(jù)第二篇文章TCGA數(shù)據(jù)整合后進(jìn)行DESeq2差異表達(dá)分析和基于R的多種可視化對(duì)counts數(shù)據(jù)進(jìn)行了基于DESeq2的差異分析。

現(xiàn)在假設(shè)我們已經(jīng)獲得：
（樣本1到344為cancer，345到386為normal）
（1）resSigAll: the subset object generated by DESeq2 containing info about all differentially expressed genes.
（2）condition_table: the data frame defining the sample_type and recording the TCGA_IDs and submitter_id of each TCGA sample. It will be used for grouping.
（3）expr_vst: vst transformed normalized counts matrix of genes and samples generated by DESeq2. It is the raw material of downstream visualization.

需要R包：

ggplot2 (作圖)、DESeq2 (從差異分析對(duì)象中提取一些數(shù)據(jù))、customLayout和gridExtra（ggplot2的拼圖），和dplyr (進(jìn)行文字處理)。

install.packages('ggplot2')
install.packages('DESeq2')
install.packages('customLayout')
install.packages('dplyr')
install.packages('gridExtra')
library('ggplot2')
library('DESeq2')
library('customLayout')
library('dplyr')
library('gridExtra')

從所有樣本中僅選取配對(duì)樣本，并建立用于繪圖的數(shù)據(jù)表：

TCGA的正常樣本數(shù)據(jù)也是從部分患者的癌旁組織獲取，所以有癌旁組織就會(huì)有癌組織，而某些癌組織沒有相應(yīng)的癌旁組織。而TCGA樣本編號(hào)的規(guī)則此前也講過，配對(duì)的癌旁和癌組織樣本,TCGA ID前12位字符（也就是submitter ID）是完全相等的。

首先使用condition_table中的submitter_id列篩選匹配樣本到新的數(shù)據(jù)框中。然后用sample函數(shù)隨機(jī)選取四個(gè)差異表達(dá)基因在配對(duì)樣本中的vst transformed normalized counts值用于表達(dá)水平散點(diǎn)繪制（儲(chǔ)存在resSigAll@rownames中）。

對(duì)每個(gè)樣本加上sample_type列，和grouping_info列（也就是submitter_id，每個(gè)normal-cancer對(duì)間的submitter_id相等，用于連線）。將sample_type列因子化。（告訴ggplot2 x軸分組排列的順序）

paired_condition_table <- dplyr::filter(condition_table, submitter_id %in% condition_table$submitter_id[condition_table$sample_type=='normal'])
set.seed(2999)
paired_for_trend_vis <- t(expr_vst[sample(resSigAll@rownames, 4) ,paired_condition_table$TCGA_IDs])
paired_for_trend_vis <- data.frame(paired_for_trend_vis, 'grouping_info'=paired_condition_table$submitter_id,
                                   'sample_type'=paired_condition_table$sample_type, stringsAsFactors = F)
paired_for_trend_vis$sample_type <- factor(paired_for_trend_vis$sample_type, levels = c('normal','cancer'), ordered = T)

一個(gè)傻瓜但不完美的辦法：

直接使用ggplot2包中的geom_jitter函數(shù)，將散點(diǎn)橫向排開。這里我只可視化一個(gè)基因。
ggplot函數(shù)輸入新建立的數(shù)據(jù)表。geom_line函數(shù)繪制配對(duì)散點(diǎn)間的連線，補(bǔ)充group參數(shù)輸入組間配對(duì)，color參數(shù)確定連線顏色，alpha參數(shù)確定連線透明度。geom_jitter函數(shù)設(shè)定散點(diǎn)抖動(dòng)模式，補(bǔ)充color參數(shù)確定sample_type組間點(diǎn)的顏色，width確定橫向抖動(dòng)的寬度。theme確定背景分割線白色。

ggplot(paired_for_trend_vis,aes(x=sample_type,y=DLX5))+theme_bw()+
  geom_line(aes(group=grouping_info), color="black",alpha=0.4)+
  geom_jitter(aes(color=sample_type), width = 0.1)+
  scale_color_manual(values = c('blue','orange'))+
  theme(panel.grid = element_line(colour = 'white'))

得到圖片如下，可見盡管散點(diǎn)有了抖動(dòng)，但組間連線并未隨散點(diǎn)發(fā)生改變，仍舊是組間中線處互連。

an imperfect paired scatterplot

解決辦法：

首先我們依舊獲得配對(duì)數(shù)據(jù)并隨機(jī)選取4個(gè)基因表達(dá)值，整合成用于繪圖的數(shù)據(jù)框。

paired_condition_table <- dplyr::filter(condition_table, submitter_id %in% condition_table$submitter_id[condition_table$sample_type=='normal'])
set.seed(2999)
paired_for_trend_vis <- t(expr_vst[sample(resSigAll@rownames, 4) ,paired_condition_table$TCGA_IDs])
paired_for_trend_vis <- data.frame(paired_for_trend_vis, 'grouping_info'=paired_condition_table$submitter_id,
                                   'sample_type'=paired_condition_table$sample_type, stringsAsFactors = F)

隨后的思路是這樣的：相比于前面直接將sample_type作為因子型變量離散分組，這里我們將sample_type先轉(zhuǎn)化為連續(xù)數(shù)字變量（normal=1,cancer=2），然后用jitter函數(shù)微調(diào)各sample_type的連續(xù)數(shù)字值，使其在連續(xù)x軸上表現(xiàn)為抖動(dòng)。而同時(shí)保留原分類變量sample_type，用于點(diǎn)的分組上色。

paired_for_trend_vis$sample_type <- factor(paired_for_trend_vis$sample_type, levels = c('normal','cancer'), ordered = T)
paired_for_trend_vis <- data.frame(paired_for_trend_vis, 'group_jitter'=as.numeric(paired_for_trend_vis$sample_type))
paired_for_trend_vis$group_jitter <- jitter(paired_for_trend_vis$group_jitter, 0.4)

整理結(jié)束后得到的paired_for_trend_vis數(shù)據(jù)框格式是這樣的：

paired_for_trend_vis

這里依舊使用拼圖包c(diǎn)ustomLayout進(jìn)行g(shù)gplot2繪圖對(duì)象的拼圖，因此我們建立一個(gè)函數(shù)，輸入一個(gè)基因名可產(chǎn)生對(duì)應(yīng)繪圖對(duì)象。

注：x軸不再是分類sample_type，而是連續(xù)值group_jitter。然后使用 scale_x_continuous函數(shù)將將x坐標(biāo)軸上的1、2的刻度改為'normal'和'cancer'字樣。

draw_the_plot <- function(intended_gene){
  ggplot(paired_for_trend_vis,aes(x=group_jitter,y=get(intended_gene)))+
    geom_line(aes(group=grouping_info), colour='black',alpha=0.4)+
    geom_point(aes(colour=sample_type), size=1.5)+scale_color_manual(values = c('blue','orange'))+
    scale_x_continuous(breaks = c(1,2), labels=c('1'='normal','2'='cancer'), limits = c(0.7,2.3))+
    theme_bw()+theme(panel.grid = element_line(colour = 'white'), legend.position = 'none')+xlab('groups')+
    ylab(paste0('vst transformed counts of ', intended_gene))
}

然后用lay_new函數(shù)新建畫布，畫布中的圖片按matrix的數(shù)字排列：高1格，寬4格。使用lapply函數(shù)獲得含多個(gè)ggplot2繪圖對(duì)象的list作為lay_grid函數(shù)的輸入。

lay <- lay_new(mat=matrix(1:4, ncol = 4), widths=c(2,2,2,2),heights = 1)
multiple_jitters <- lapply(colnames(paired_for_trend_vis)[1:4], draw_the_plot)
lay_grid(grobs = multiple_jitters, lay = lay)

最終獲得的圖片輸出如下：

the collage containing 4 paired scatterplot showing trends of gene expressions

這樣就完美地解決了之前連線和抖動(dòng)點(diǎn)之間沒有橫向匹配上的尷尬~

喜歡的朋友就學(xué)起來，點(diǎn)個(gè)贊吧！

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

配對(duì)樣本基因表達(dá)趨勢：優(yōu)化前后的散點(diǎn)連線圖+拼圖繪制

配對(duì)樣本基因表達(dá)趨勢：優(yōu)化前后的散點(diǎn)連線圖+拼圖繪制

這是ggplot2可視化專題的第三個(gè)實(shí)例操作

數(shù)據(jù)獲取

需要R包：

從所有樣本中僅選取配對(duì)樣本，并建立用于繪圖的數(shù)據(jù)表：

一個(gè)傻瓜但不完美的辦法：

解決辦法：

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

配對(duì)樣本基因表達(dá)趨勢：優(yōu)化前后的散點(diǎn)連線圖+拼圖繪制

這是ggplot2可視化專題的第三個(gè)實(shí)例操作

數(shù)據(jù)獲取

需要R包：

從所有樣本中僅選取配對(duì)樣本，并建立用于繪圖的數(shù)據(jù)表：

一個(gè)傻瓜但不完美的辦法：

解決辦法：

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

從所有樣本中僅選取配對(duì)樣本，并建立用于繪圖的數(shù)據(jù)表：