上一期我們使用了Guitar包對(duì)Peak結(jié)果進(jìn)行可視化,今天展示另一種可視化方法:使用metaPlotR包。
這個(gè)包將一些bash命令以及位置處理信息封裝在了perl腳本中,然后使用R進(jìn)行了可視化。下載地址:https://github.com/olarerin/metaPlotR。
下載下來(lái)后解壓然后利用其中的腳本就可以了。
這一期如果linux基礎(chǔ)不太好的不要輕易嘗試這個(gè)方法,可以先穩(wěn)固一下基礎(chǔ)再來(lái)
一、數(shù)據(jù)準(zhǔn)備Peak結(jié)果
Peak結(jié)果bed格式,為bed6,并且需要進(jìn)行排序。
排序命令:
# 創(chuàng)建文件夾
mkdir metaPlotR
# 得到bed6并且排序
# -k1,1 表示只對(duì)第一列進(jìn)行排序
# -k2,2n 表示只對(duì)第二列按照數(shù)字進(jìn)行排序
# 先檢查命令
ls */Mod.bed |awk -F'/' '{print"cat "$0 " | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/" $1 ".sorted.bed"}'
cat KO1/Mod.bed | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/KO1.sorted.bed
cat KO2/Mod.bed | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/KO2.sorted.bed
cat KO3/Mod.bed | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/KO3.sorted.bed
cat WT1/Mod.bed | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/WT1.sorted.bed
cat WT2/Mod.bed | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/WT2.sorted.bed
cat WT3/Mod.bed | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/WT3.sorted.bed
# 執(zhí)行
ls */Mod.bed |awk -F'/' '{print"cat "$0 " | cut -f 1-6 | sort -k1,1 -k2,2n >metaPlotR/" $1 ".sorted.bed"}' |sh
二、參考基因組注釋文件
1)下載GRCm39 fa文件
前面我們使用的是ENSEMBL數(shù)據(jù)庫(kù)的GRCm39的參考基因組,去這里下載對(duì)應(yīng)UCSC的fa文件:http://hgdownload.soe.ucsc.edu/downloads.html
下載地址:http://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.chromFa.tar.gz
下載完后解壓到文件夾:chroms
2)制作GRCm39_annot.bed文件
首先去UCSC下載:the extended gene prediction tables
選擇參數(shù)如下:

?。?!多下載一次 兩次下載大小完全一致即可保證下載完整了。
關(guān)于這個(gè)地方,我們之前用的ENSEMBL的gtf文件,這個(gè)頁(yè)面用的GENCODE的,不必?fù)?dān)心這兩者的差異,看官網(wǎng)對(duì)于他們的注釋異同:
What is the difference between GENCODE and Ensembl annotation?
The GENCODE annotation is made by merging the Havana manual gene annotation and the Ensembl automated gene annotation. The GENCODE annotation is the default gene annotation displayed in the Ensembl browser. The GENCODE releases coincide with the Ensembl releases, although we can skip an Ensembl release is there is no update to the annotation with respect to the previous release. In practical terms, the GENCODE annotation is identical to the Ensembl annotation.
簡(jiǎn)而言之,這兩者就是一樣的。
創(chuàng)建轉(zhuǎn)錄組中每個(gè)核苷酸的主注釋文件
# 創(chuàng)建轉(zhuǎn)錄組中每個(gè)核苷酸的主注釋文件
# chroms/為剛剛解壓后的文件
perl make_annot_bed.pl --genomeDir chroms/ --genePred GRCm39_gencode.genePred > GRCm39_annot.bed
# 對(duì)基因bed文件進(jìn)行排序
sort -k1,1 -k2,2n GRCm39_annot.bed > GRCm39_annot.sorted.bed
sed -i 's/chr//g' GRCm39_annot.sorted.bed
# 生成轉(zhuǎn)錄區(qū)域(即5'UTR、CDS和3'UTR)的起始和結(jié)束位點(diǎn)的轉(zhuǎn)錄組坐標(biāo)的文件
# 它將排序后的主注釋文件作為輸入(GRCm39_annotation.sorted.bed),并輸出一個(gè)區(qū)域注釋文件。
# 區(qū)域注釋文件是確定查詢位點(diǎn)與轉(zhuǎn)錄組特征(即轉(zhuǎn)錄起始位點(diǎn)、起始密碼子、終止密碼子和轉(zhuǎn)錄結(jié)束)的距離所必需的。
# creates the transcript regions (i.e. 5’UTR, CDS and 3’UTR)
perl size_of_cds_utrs.pl --annot GRCm39_annot.sorted.bed > region_sizes.txt
對(duì)每個(gè)排序后的Peak的bed文件進(jìn)行注釋
# 注釋
ls metaPlotR/*bed |perl -ne 'chomp;/metaPlotR\/(.*)/;print"perl annotate_bed_file.pl --bed $_ --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_$1\n";' >anno1.sh
nohup sh anno1.sh >anno1.log &
# qsub anno1.sh
# anno1.sh內(nèi)容為下:
perl annotate_bed_file.pl --bed metaPlotR/KO1.sorted.bed --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_KO1.sorted.bed
perl annotate_bed_file.pl --bed metaPlotR/KO2.sorted.bed --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_KO2.sorted.bed
perl annotate_bed_file.pl --bed metaPlotR/KO3.sorted.bed --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_KO3.sorted.bed
perl annotate_bed_file.pl --bed metaPlotR/WT1.sorted.bed --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_WT1.sorted.bed
perl annotate_bed_file.pl --bed metaPlotR/WT2.sorted.bed --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_WT2.sorted.bed
perl annotate_bed_file.pl --bed metaPlotR/WT3.sorted.bed --bed2 GRCm39_annot.sorted.bed >metaPlotR/annot_WT3.sorted.bed
#運(yùn)行
# 識(shí)別用戶提供的位點(diǎn)所在的轉(zhuǎn)錄本區(qū)域,并將轉(zhuǎn)錄組坐標(biāo)轉(zhuǎn)換為元基因坐標(biāo)。
# 即,出現(xiàn)在5 ' utr中的位點(diǎn)的值從0到1,其中0和1分別代表5 ' utr的5 '和3 '末端。
# 類似地,CDS中的位點(diǎn)值從1到2,3 ' utr值從2到3。
# 腳本接受帶注釋的查詢文件annot_miclip.cims作為輸入。Bed和區(qū)域注釋文件utr_cds_ends.txt。
# 輸出的距離度量文件包含繪制元圖所需的所有值
ls metaPlotR/anno*bed |perl -ne 'chomp;/annot_(.*).sorted/;print"perl rel_and_abs_dist_calc.pl --bed $_ --regions region_sizes.txt > metaPlotR/$1.m6a.dist.measures.txt\n";' >anno2.sh
nohup sh anno2.sh >anno2.log &
# qsub anno2.sh
# anno2.sh的內(nèi)容為下:
perl rel_and_abs_dist_calc.pl --bed metaPlotR/annot_KO1.sorted.bed --regions region_sizes.txt > metaPlotR/KO1.m6a.dist.measures.txt
perl rel_and_abs_dist_calc.pl --bed metaPlotR/annot_KO2.sorted.bed --regions region_sizes.txt > metaPlotR/KO2.m6a.dist.measures.txt
perl rel_and_abs_dist_calc.pl --bed metaPlotR/annot_KO3.sorted.bed --regions region_sizes.txt > metaPlotR/KO3.m6a.dist.measures.txt
perl rel_and_abs_dist_calc.pl --bed metaPlotR/annot_WT1.sorted.bed --regions region_sizes.txt > metaPlotR/WT1.m6a.dist.measures.txt
perl rel_and_abs_dist_calc.pl --bed metaPlotR/annot_WT2.sorted.bed --regions region_sizes.txt > metaPlotR/WT2.m6a.dist.measures.txt
perl rel_and_abs_dist_calc.pl --bed metaPlotR/annot_WT3.sorted.bed --regions region_sizes.txt > metaPlotR/WT3.m6a.dist.measures.txt
至此,每個(gè)樣本的m6A Peak位置就已經(jīng)轉(zhuǎn)化為轉(zhuǎn)錄本上的坐標(biāo)以及對(duì)應(yīng)的Peak Number了:*.m6a.dist.measures.txt
tree metaPlotR
metaPlotR/
├── annot_KO1.sorted.bed
├── annot_KO2.sorted.bed
├── annot_KO3.sorted.bed
├── annot_WT1.sorted.bed
├── annot_WT2.sorted.bed
├── annot_WT3.sorted.bed
├── KO1.m6a.dist.measures.txt
├── KO1.sorted.bed
├── KO2.m6a.dist.measures.txt
├── KO2.sorted.bed
├── KO3.m6a.dist.measures.txt
├── KO3.sorted.bed
├── WT1.m6a.dist.measures.txt
├── WT1.sorted.bed
├── WT2.m6a.dist.measures.txt
├── WT2.sorted.bed
├── WT3.m6a.dist.measures.txt
└── WT3.sorted.bed
三、接下來(lái)是使用R語(yǔ)言進(jìn)行可視化操作部分。
R傳參腳本如下:metaPlot.R
rm(list=ls())
options(stringsAsFactors = F)
library(getopt)
spec <- matrix(c(
'help', 'h', 0, "logical",
'dist', 'd', 1, "character",
'name', 'n', 1, "character",
'od', 'o', 1, "character"),
byrow = TRUE, ncol = 4)
opt <- getopt(spec)
## usages
print_usage <- function(spec=NULL){
cat("
Usage example:
Rscript metaPlot.R --dist m6a.dist.measures.txt --name m6a --od ./
Options:
--help -h NULL get this help
--dist character m6a.dist.measures.txt [forced]
--name character sample name
--od character outdir [forced]
\n")
q(status=1)
}
## default paramter setting
if (!is.null(opt$help) |is.null(opt$dist) |is.null(opt$od)) { print_usage(spec) }
if (!file.exists(opt$od)) dir.create(opt$od)
# 加載包
library(ggplot2)
library(scales)
# 讀取數(shù)據(jù)
m6a.dist <- read.delim (opt$dist, header = T)
dim(m6a.dist)
head(m6a.dist)
# Determine longest length transcript for each gene
trx_len <- m6a.dist$utr5_size + m6a.dist$cds_size + m6a.dist$utr3_size
temp <- data.frame(m6a.dist$gene_name, m6a.dist$refseqID, trx_len)
colnames(temp) <- c("gene_name", "gid", "trx_len")
temp.df <- temp[order(temp$gene_name, temp$gid, -temp$trx_len),]
temp.df <- temp[!duplicated(temp$gene_name),]
# limit m6a data to one transcript per gene (longest)
m6a.dist <- m6a.dist[m6a.dist$refseqID %in% temp.df$gid,]
# View size of our dataset (rows, columns)
dim(m6a.dist)
# re-scale the widths of the 5'UTR and 3'UTR relative to the CDS
utr5.SF <- median(m6a.dist$utr5_size, na.rm = T)/median(m6a.dist$cds_size, na.rm = T)
utr3.SF <- median(m6a.dist$utr3_size, na.rm = T)/median(m6a.dist$cds_size, na.rm = T)
# assign the regions to new dataframes
utr5.m6a.dist <- m6a.dist[m6a.dist$rel_location < 1, ]
cds.m6a.dist <- m6a.dist [m6a.dist$rel_location < 2 & m6a.dist$rel_location >= 1, ]
utr3.m6a.dist <- m6a.dist[m6a.dist$rel_location >= 2, ]
# rescale 5'UTR and 3'UTR
utr5.m6a.dist$rel_location <- rescale(utr5.m6a.dist$rel_location, to = c(1-utr5.SF, 1), from = c(0,1))
utr3.m6a.dist$rel_location <- rescale(utr3.m6a.dist$rel_location, to = c(2, 2+utr3.SF), from = c(2,3))
m6a.metagene.coord <- c(utr5.m6a.dist$rel_location, cds.m6a.dist$rel_location, utr3.m6a.dist$rel_location)
df <- data.frame(m6a.metagene.coord = m6a.metagene.coord)
##---------------- Visualizing the metagene
# A smooth density plot
setwd(opt$od)
p <- ggplot(df,aes(x =m6a.metagene.coord)) + geom_density(aes(colour="red")) + geom_vline(xintercept = 1:2, col = "grey") +
theme_bw() + xlim(0, 3) + theme(legend.position="none")
p
png(filename = paste0(opt$name,".png"),width = 800,height = 600,res=200)
print(p)
dev.off()
write.table(m6a.metagene.coord, file = paste0(opt$name,"_m6a.metagene.coord.txt"),row.names = F,sep = "\t",quote = F)
運(yùn)行:
ls metaPlotR/*m6a.dist.measures.txt |perl -ne 'chomp;/metaPlotR\/(.*).m6a.dist.measures/;print"Rscript metaPlot.R --dist $_ --name $1 --od metaPlotR/\n";' >m6a_plot.sh
#sh腳本如下:
Rscript metaPlot.R --dist metaPlotR/KO1.m6a.dist.measures.txt --name KO1 --od metaPlotR/
Rscript metaPlot.R --dist metaPlotR/KO2.m6a.dist.measures.txt --name KO2 --od metaPlotR/
Rscript metaPlot.R --dist metaPlotR/KO3.m6a.dist.measures.txt --name KO3 --od metaPlotR/
Rscript metaPlot.R --dist metaPlotR/WT1.m6a.dist.measures.txt --name WT1 --od metaPlotR/
Rscript metaPlot.R --dist metaPlotR/WT2.m6a.dist.measures.txt --name WT2 --od metaPlotR/
Rscript metaPlot.R --dist metaPlotR/WT3.m6a.dist.measures.txt --name WT3 --od metaPlotR/
# 運(yùn)行
nohup sh m6a_plot.sh >m6a_plot.sh.log &
#qsub m6a_plot.sh
可視化圖中:0 to 1:表示5'UTR;1 to 2:表示CDS;2 to 3:表示3'UTR
貼上兩個(gè)樣本的結(jié)果如下:


也可以將多個(gè)樣本繪制在一起:這里選取兩個(gè)樣本示例,用到上一步繪圖的輸出結(jié)果*_m6a.metagene.coord.txt
library(ggplot2)
## ---------------------------Comparing multiple metagene plots
# Read in metagene coordinates
KO1.metagene.coord <- t(read.delim ("KO1_m6a.metagene.coord.txt", header = T))
WT1.metagene.coord <- t(read.delim ("WT1_m6a.metagene.coord.txt", header = T))
metagene.cord <- c(KO1.metagene.coord, WT1.metagene.coord)
mod <- c(rep("KO1", length(KO1.metagene.coord)),
rep("WT1", length(WT1.metagene.coord)))
df <- data.frame(metagene.cord, mod)
ggplot(df) + geom_density(aes(x = metagene.cord, colour = mod)) + xlim(0, 3) +
theme_bw() + geom_vline(xintercept = 1:2, col = "grey")

可以看到KO樣本與WT樣本上m6A修飾水平的差異。
這個(gè)繪圖結(jié)果是基于ggplot2的,大家可以自行對(duì)此圖進(jìn)行加工修飾,然后就可以達(dá)到發(fā)表文章級(jí)別的美圖啦!
我們下期再見~