孟佳課題組|m6A數(shù)據(jù)Peak識(shí)別軟件exomePeak2使用指北

exomePeak使用感受:

由于是R語(yǔ)言,因此真實(shí)項(xiàng)目數(shù)據(jù)跑起來(lái)耗時(shí)比較久;其次,peak calling與diff peak是兩個(gè)獨(dú)立的過(guò)程,結(jié)果會(huì)有一些不一致的地方。
隨著分析項(xiàng)目的增多,當(dāng)然可以曾加一些認(rèn)識(shí),上次的稿子中就有一些認(rèn)知是錯(cuò)誤的,僅供大家做參考。具體見(jiàn):
exomePeak使用過(guò)程疑問(wèn):http://www.itdecent.cn/p/42d4651295c1

后來(lái),作者又出了一個(gè)exomePeak2,并不是在原有的exomePeak基礎(chǔ)上進(jìn)行的更新。

主頁(yè):https://rdrr.io/github/ZhenWei10/exomePeak2/man/exomePeak2.html

exomePeak2與exomePeak最大的區(qū)別在于exomePeak2可以輸出中間結(jié)果,應(yīng)該還有其他不一樣的地方,來(lái)看看吧。

功能:exomePeak2使用bam文件進(jìn)行peak calling以及peak統(tǒng)計(jì),它整合了meRIP-seq數(shù)據(jù)分析的常規(guī)分析內(nèi)容:

  • 1.使用scanMeripBAM檢查BAM的index
  • 2.使用exomePeakCalling識(shí)別外顯子區(qū)域的被修飾的peaks
  • 3.normalizeGC計(jì)算GC偏倚
  • 4.glmM或glmDM構(gòu)建線性模型來(lái)計(jì)算差異位點(diǎn)
  • 5.exportResults輸出peak結(jié)果

看一下函數(shù)

exomePeak2(
  bam_ip = NULL,
  bam_input = NULL,
  bam_treated_ip = NULL,
  bam_treated_input = NULL,
  txdb = NULL,
  bsgenome = NULL,
  genome = NA,
  gff_dir = NULL,
  mod_annot = NULL,
  paired_end = FALSE,
  library_type = c("unstranded", "1st_strand", "2nd_strand"),
  fragment_length = 100,
  binding_length = 25,
  step_length = binding_length,
  peak_width = fragment_length/2,
  pc_count_cutoff = 5,
  bg_count_cutoff = 50,
  p_cutoff = 1e-05,
  p_adj_cutoff = NULL,
  log2FC_cutoff = 1,
  parallel = FALSE,
  background_method = c("all", "Gaussian_mixture", "m6Aseq_prior", "manual"),
  manual_background = NULL,
  correct_GC_bg = TRUE,
  qtnorm = FALSE,
  glm_type = c("DESeq2", "Poisson", "NB"),
  LFC_shrinkage = c("apeglm", "ashr", "Gaussian", "none"),
  export_results = TRUE,
  export_format = c("CSV", "BED", "RDS"),
  table_style = c("bed", "granges"),
  save_plot_GC = TRUE,
  save_plot_analysis = FALSE,
  save_plot_name = "",
  save_dir = "exomePeak2_output",
  peak_calling_mode = c("exon", "full_tx", "whole_genome")
)

與exomePeak相比

增了加雙端數(shù)據(jù)paired_end ,文庫(kù)類型library_type ,count閾值pc_count_cutoff ,bg_count_cutoff 等參數(shù)。還有一些輸出結(jié)果的更改。
沒(méi)有了那個(gè)要求在group中call peak一致性的參數(shù)。這個(gè)參數(shù)應(yīng)該是在分步計(jì)算里面。

在運(yùn)行之前,示例代碼使用USCS中的注釋文件,最好先安裝好那個(gè)注釋包

BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
library(BSgenome.Hsapiens.UCSC.hg19)

指定進(jìn)行peak calling的樣本,

rm(list = ls())
options(stringsAsFactors = F)
#BiocManager::install("exomePeak2")
library(exomePeak2)

# 設(shè)置隨機(jī)種子
set.seed(1)
GENE_ANNO_GTF = system.file("extdata", "example.gtf", package="exomePeak2")

f1 = system.file("extdata", "IP1.bam", package="exomePeak2")
f2 = system.file("extdata", "IP2.bam", package="exomePeak2")
f3 = system.file("extdata", "IP3.bam", package="exomePeak2")
f4 = system.file("extdata", "IP4.bam", package="exomePeak2")
IP_BAM = c(f1,f2,f3,f4)

f1 = system.file("extdata", "Input1.bam", package="exomePeak2")
f2 = system.file("extdata", "Input2.bam", package="exomePeak2")
f3 = system.file("extdata", "Input3.bam", package="exomePeak2")
INPUT_BAM = c(f1,f2,f3)

分步驟計(jì)算過(guò)程,大致分為7個(gè)步驟:

## Peak Calling and Visualization in Multiple Steps
#The exomePeak2 package can achieve peak calling and peak statistics calulation with multiple functions.

## 1. 檢查bam文件
MeRIP_Seq_Alignment <- scanMeripBAM(
                          bam_ip = IP_BAM,
                          bam_input = INPUT_BAM,
                          paired_end = FALSE)

# 同時(shí)含有處理組和非處理組
MeRIP_Seq_Alignment <- scanMeripBAM(
                        bam_ip = IP_BAM,
                        bam_input = INPUT_BAM,
                        bam_treated_input = TREATED_INPUT_BAM,
                        bam_treated_ip = TREATED_IP_BAM,
                        paired_end = FALSE)

# str函數(shù)可以方便快速的查看s4對(duì)象的結(jié)構(gòu)和內(nèi)容
str(MeRIP_Seq_Alignment,max.level = 3)


#2. 使用bam文件進(jìn)行 peak calling
SummarizedExomePeaks <- exomePeakCalling(merip_bams = MeRIP_Seq_Alignment,
                                         gff_dir = GENE_ANNO_GTF,
                                         genome = "hg19")


#可選,用來(lái)評(píng)估數(shù)據(jù)
SummarizedExomePeaks <- exomePeakCalling(merip_bams = MeRIP_Seq_Alignment,
                                         gff_dir = GENE_ANNO_GTF,
                                         genome = "hg19",
                                         mod_annot = MOD_ANNO_GRANGE) 

# 查看peak結(jié)果
str(SummarizedExomePeaks, max.level = 4)


#3. 計(jì)算size factors用來(lái)對(duì)GC偏倚進(jìn)行矯正
SummarizedExomePeaks <- normalizeGC(SummarizedExomePeaks)


#4. 使用glmM構(gòu)造peak統(tǒng)計(jì)量
SummarizedExomePeaks <- glmM(SummarizedExomePeaks) 

# 可選,如果有差異分析,就分析此步驟
SummarizedExomePeaks <- glmDM(SummarizedExomePeaks)

#5. Generate the scatter plot between GC content and log2 Fold Change (LFC).
p <- plotLfcGC(SummarizedExomePeaks) 

# 點(diǎn)的大小有點(diǎn)小,取出來(lái)數(shù)據(jù)重新加工
library(ggplot2)
data <- p$data
head(data)
p1 <- ggplot(data,aes(x=GC_idx,y=Log2FC,color=Label)) + geom_point(size=2) + theme_classic()
p1


#6. Generate the bar plot for the sequencing depth size factors.
plotSizeFactors(SummarizedExomePeaks)


#7. Export the modification peaks and the peak statistics with user decided format.
exportResults(SummarizedExomePeaks, format = "BED") 

GC含量散點(diǎn)圖與SizeFactor圖

image-20210310144152649.png
image-20210310144955116.png

使用示例結(jié)果中的一個(gè)差異peak在IGV中查看peak分布

相對(duì)于exomePeak,作者還輸出了每個(gè)peak在每個(gè)樣本中的read count數(shù),就用第一個(gè)peak示例吧,差異FC比較大,p值也顯著。peak大概100個(gè)bp這么長(zhǎng)。

image-20210310151256844.png
image-20210310151134620.png

至于具體效果如何,大家自行判斷吧。

請(qǐng)等待后續(xù)更新。

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容