使用R處理一代測(cè)序的結(jié)果數(shù)據(jù)

一、一代測(cè)序的結(jié)果以*.ab1和*.seq儲(chǔ)存,其中*.seq文件可以使用記事本打開(kāi),儲(chǔ)存的是測(cè)序的序列結(jié)果(ATCG序列,而且是峰值最高的信號(hào)所代表的堿基),而*.ab1文件需要用特殊的可視化軟件打開(kāi),儲(chǔ)存的是序列信息及測(cè)序時(shí)每個(gè)堿基的信號(hào)強(qiáng)弱,可以通過(guò)該文件識(shí)別雜合位點(diǎn)(信號(hào)比最高信號(hào)稍低,比背景信號(hào)高)
二、本文主要介紹通過(guò)R包--sangerseqR來(lái)處理該類數(shù)據(jù)
1)安裝R包(該包在bioconductor中,通過(guò)biocmanager安裝即可)
2)讀入文件

library(sangerseqR)
#讀入*.ab1文件
seq<-readsangerseq("C://Users/shinelon/Desktop/g2001137145-C-3_3R.ab1")
#seq是一個(gè)sangerseq對(duì)象
> class(seq)
[1] "sangerseq"
attr(,"package")
[1] "sangerseqR"
#看一下seq的結(jié)構(gòu)
> str(seq)
Formal class 'sangerseq' [package "sangerseqR"] with 7 slots
  ..@ primarySeqID  : chr "From ab1 file"
  ..@ primarySeq    :Formal class 'DNAString' [package "Biostrings"] with 5 slots
  .. .. ..@ shared         :Formal class 'SharedRaw' [package "XVector"] with 2 slots
  .. .. .. .. ..@ xp                    :<externalptr> 
  .. .. .. .. ..@ .link_to_cached_object:<environment: 0x000000001515c418> 
  .. .. ..@ offset         : int 0
  .. .. ..@ length         : int 830
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ secondarySeqID: chr "From ab1 file"
  ..@ secondarySeq  :Formal class 'DNAString' [package "Biostrings"] with 5 slots
  .. .. ..@ shared         :Formal class 'SharedRaw' [package "XVector"] with 2 slots
  .. .. .. .. ..@ xp                    :<externalptr> 
  .. .. .. .. ..@ .link_to_cached_object:<environment: 0x000000001515c418> 
  .. .. ..@ offset         : int 0
  .. .. ..@ length         : int 830
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ traceMatrix   : int [1:9919, 1:4] 0 0 0 0 0 0 1 2 3 4 ...
  ..@ peakPosMatrix : num [1:830, 1:4] 3 20 36 47 61 75 86 97 109 116 ...
  ..@ peakAmpMatrix : int [1:830, 1:4] 413 1602 920 1425 750 1320 885 991 777 884 ...

#對(duì)該對(duì)象的詳細(xì)描述可以查閱幫助文檔了解
 ?sangerseq

Slots

primarySeqID
Source of the primary basecalls. Functions that modify these calls, such as makeBaseCalls and setAllelePhase will also change this value.

secondarySeqID
Source of the secondary basecalls. See above.

primarySeq
The primary Basecalls formatted as a DNAString object.

secondarySeq
The secondary Basecalls formatted as a DNAString object.

traceMatrix
A numerical matrix containing 4 columns corresponding to the normalized signal values for the chromatogram traces. Column order = A,C,G,T.
對(duì)于色譜信號(hào)的標(biāo)準(zhǔn)化的平均值

peakPosMatrix
A numerical matrix containing the position of the maximum peak values for each base within each Basecall window. If no peak was detected for a given base in a given window, then "NA". Column order = A,C,G,T.
色譜信號(hào)最高峰所在的位置

peakAmpMatrix
A numerical matrix containing the maximum peak amplitudes for each base within each Basecall window. If no peak was detected for a given base in a given window, then 0. Column order = A,C,G,T.
色譜信號(hào)在每個(gè)堿基處的高度

三、使用

seq<-makeBaseCalls(seq,ratio=0.2)#將高度比(某峰比上該處最高峰)的值大于0.2,認(rèn)為是真正的信號(hào)
peakAmp<-peakAmpMatrix(seq)
peakAmp<-as.data.frame(peakAmp)
colnames(peakAmp)<-c("A","C","G","T")
#計(jì)算ratio,用每個(gè)堿基的第二大的峰值除以最高峰
peakAmp$ratio<-apply(peakAmp,1,function(x){a=sort(x,decreasing = T);a[2]/a[1]})
peakAmp$sig<-ifelse(peakAmp$ratio>0.2,T,F)
peakAmp$primaryseq<-t(str_split(primarySeq(seq,string = T),pattern = "",simplify = T))
peakAmp$secondary<-t(str_split(secondarySeq(seq,string = T),pattern = "",simplify = T))
#檢驗(yàn)一下primaryseq和secondaryseq是否根據(jù)AMP計(jì)算的
#peakAmp$gussprimary<-apply(peakAmp[,1:4],1,function(x){c("A","C","G","T")[which(x==max(x))]})
#identical(as.character(peakAmp$primaryseq),as.character(peakAmp$gussprimary))
#[1] TRUE
#secondary沒(méi)法檢驗(yàn),因?yàn)樾枰紤]堿基的合并及ratio的閾值
guss寫(xiě)錯(cuò)了,應(yīng)該改成guess,手誤!
#可視化測(cè)序結(jié)果
chromatogram(seq, width = 200, height = 2, showcalls = "both")

peakAmp

chromatogram

四、我把這個(gè)過(guò)程寫(xiě)成了shiny程序,可自行取用
app.R

library(sangerseqR)
library(shiny)
library(stringr)
library(openxlsx)

ui <- fluidPage(
  headerPanel("Sangerseq"),
  mainPanel(fileInput(inputId = "ab1.input",label = "輸入ab1文件",accept = ".ab1"),
            textInput(inputId = "ratio.input",label = "輸入ratio",value = 0.2),
            actionButton(inputId = "start",label = "運(yùn)行"),
            plotOutput(outputId = "chromatogram.output",width = "2000px"),
            tableOutput(outputId = "table.output"),
            textInput(inputId = "filename",label = "保存的文件名",value = paste("sangerseq",Sys.time(),".xlsx",sep = "")),
            downloadButton(outputId = "res.download",label = "下載結(jié)果"))
)

server <- function(input, output, session) {
  
  seq=reactive({
    input$start
    isolate({seq=readsangerseq(input$ab1.input$datapath);seq=makeBaseCalls(seq,ratio = input$ratio.input);return(seq)})
  })
  output$chromatogram.output=renderPlot({
    chromatogram(seq(), width = 200, height = 2, showcalls = "both")})
  peakAmp<-reactive({
    peakAmp<-peakAmpMatrix(seq())
    peakAmp<-as.data.frame(peakAmp)
    colnames(peakAmp)<-c("A","C","G","T")
    peakAmp$ratio<-apply(peakAmp,1,function(x){a=sort(x,decreasing = T);a[2]/a[1]})
    peakAmp$sig<-ifelse(peakAmp$ratio>input$ratio.input,T,F)
    peakAmp$primaryseq<-t(str_split(primarySeq(seq(),string = T),pattern = "",simplify = T))
    peakAmp$secondaryseq<-t(str_split(secondarySeq(seq(),string = T),pattern = "",simplify = T))
    return(peakAmp)
  })
  output$table.output<-renderTable({head(peakAmp())},rownames = T)
  output$res.download<-downloadHandler(
    filename = input$filename,
    content = function(file){
      write.xlsx(peakAmp(),file = file)
    },
    contentType = "xlsx"
  )
  
  
}

shinyApp(ui, server)
shiny.sangerseq
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容