一、一代測(cè)序的結(jié)果以*.ab1和*.seq儲(chǔ)存,其中*.seq文件可以使用記事本打開(kāi),儲(chǔ)存的是測(cè)序的序列結(jié)果(ATCG序列,而且是峰值最高的信號(hào)所代表的堿基),而*.ab1文件需要用特殊的可視化軟件打開(kāi),儲(chǔ)存的是序列信息及測(cè)序時(shí)每個(gè)堿基的信號(hào)強(qiáng)弱,可以通過(guò)該文件識(shí)別雜合位點(diǎn)(信號(hào)比最高信號(hào)稍低,比背景信號(hào)高)
二、本文主要介紹通過(guò)R包--sangerseqR來(lái)處理該類數(shù)據(jù)
1)安裝R包(該包在bioconductor中,通過(guò)biocmanager安裝即可)
2)讀入文件
library(sangerseqR)
#讀入*.ab1文件
seq<-readsangerseq("C://Users/shinelon/Desktop/g2001137145-C-3_3R.ab1")
#seq是一個(gè)sangerseq對(duì)象
> class(seq)
[1] "sangerseq"
attr(,"package")
[1] "sangerseqR"
#看一下seq的結(jié)構(gòu)
> str(seq)
Formal class 'sangerseq' [package "sangerseqR"] with 7 slots
..@ primarySeqID : chr "From ab1 file"
..@ primarySeq :Formal class 'DNAString' [package "Biostrings"] with 5 slots
.. .. ..@ shared :Formal class 'SharedRaw' [package "XVector"] with 2 slots
.. .. .. .. ..@ xp :<externalptr>
.. .. .. .. ..@ .link_to_cached_object:<environment: 0x000000001515c418>
.. .. ..@ offset : int 0
.. .. ..@ length : int 830
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ secondarySeqID: chr "From ab1 file"
..@ secondarySeq :Formal class 'DNAString' [package "Biostrings"] with 5 slots
.. .. ..@ shared :Formal class 'SharedRaw' [package "XVector"] with 2 slots
.. .. .. .. ..@ xp :<externalptr>
.. .. .. .. ..@ .link_to_cached_object:<environment: 0x000000001515c418>
.. .. ..@ offset : int 0
.. .. ..@ length : int 830
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ traceMatrix : int [1:9919, 1:4] 0 0 0 0 0 0 1 2 3 4 ...
..@ peakPosMatrix : num [1:830, 1:4] 3 20 36 47 61 75 86 97 109 116 ...
..@ peakAmpMatrix : int [1:830, 1:4] 413 1602 920 1425 750 1320 885 991 777 884 ...
#對(duì)該對(duì)象的詳細(xì)描述可以查閱幫助文檔了解
?sangerseq
Slots
primarySeqID
Source of the primary basecalls. Functions that modify these calls, such as makeBaseCalls and setAllelePhase will also change this value.secondarySeqID
Source of the secondary basecalls. See above.primarySeq
The primary Basecalls formatted as a DNAString object.secondarySeq
The secondary Basecalls formatted as a DNAString object.traceMatrix
A numerical matrix containing 4 columns corresponding to the normalized signal values for the chromatogram traces. Column order = A,C,G,T.
對(duì)于色譜信號(hào)的標(biāo)準(zhǔn)化的平均值peakPosMatrix
A numerical matrix containing the position of the maximum peak values for each base within each Basecall window. If no peak was detected for a given base in a given window, then "NA". Column order = A,C,G,T.
色譜信號(hào)最高峰所在的位置peakAmpMatrix
A numerical matrix containing the maximum peak amplitudes for each base within each Basecall window. If no peak was detected for a given base in a given window, then 0. Column order = A,C,G,T.
色譜信號(hào)在每個(gè)堿基處的高度
三、使用
seq<-makeBaseCalls(seq,ratio=0.2)#將高度比(某峰比上該處最高峰)的值大于0.2,認(rèn)為是真正的信號(hào)
peakAmp<-peakAmpMatrix(seq)
peakAmp<-as.data.frame(peakAmp)
colnames(peakAmp)<-c("A","C","G","T")
#計(jì)算ratio,用每個(gè)堿基的第二大的峰值除以最高峰
peakAmp$ratio<-apply(peakAmp,1,function(x){a=sort(x,decreasing = T);a[2]/a[1]})
peakAmp$sig<-ifelse(peakAmp$ratio>0.2,T,F)
peakAmp$primaryseq<-t(str_split(primarySeq(seq,string = T),pattern = "",simplify = T))
peakAmp$secondary<-t(str_split(secondarySeq(seq,string = T),pattern = "",simplify = T))
#檢驗(yàn)一下primaryseq和secondaryseq是否根據(jù)AMP計(jì)算的
#peakAmp$gussprimary<-apply(peakAmp[,1:4],1,function(x){c("A","C","G","T")[which(x==max(x))]})
#identical(as.character(peakAmp$primaryseq),as.character(peakAmp$gussprimary))
#[1] TRUE
#secondary沒(méi)法檢驗(yàn),因?yàn)樾枰紤]堿基的合并及ratio的閾值
guss寫(xiě)錯(cuò)了,應(yīng)該改成guess,手誤!
#可視化測(cè)序結(jié)果
chromatogram(seq, width = 200, height = 2, showcalls = "both")


四、我把這個(gè)過(guò)程寫(xiě)成了shiny程序,可自行取用
app.R
library(sangerseqR)
library(shiny)
library(stringr)
library(openxlsx)
ui <- fluidPage(
headerPanel("Sangerseq"),
mainPanel(fileInput(inputId = "ab1.input",label = "輸入ab1文件",accept = ".ab1"),
textInput(inputId = "ratio.input",label = "輸入ratio",value = 0.2),
actionButton(inputId = "start",label = "運(yùn)行"),
plotOutput(outputId = "chromatogram.output",width = "2000px"),
tableOutput(outputId = "table.output"),
textInput(inputId = "filename",label = "保存的文件名",value = paste("sangerseq",Sys.time(),".xlsx",sep = "")),
downloadButton(outputId = "res.download",label = "下載結(jié)果"))
)
server <- function(input, output, session) {
seq=reactive({
input$start
isolate({seq=readsangerseq(input$ab1.input$datapath);seq=makeBaseCalls(seq,ratio = input$ratio.input);return(seq)})
})
output$chromatogram.output=renderPlot({
chromatogram(seq(), width = 200, height = 2, showcalls = "both")})
peakAmp<-reactive({
peakAmp<-peakAmpMatrix(seq())
peakAmp<-as.data.frame(peakAmp)
colnames(peakAmp)<-c("A","C","G","T")
peakAmp$ratio<-apply(peakAmp,1,function(x){a=sort(x,decreasing = T);a[2]/a[1]})
peakAmp$sig<-ifelse(peakAmp$ratio>input$ratio.input,T,F)
peakAmp$primaryseq<-t(str_split(primarySeq(seq(),string = T),pattern = "",simplify = T))
peakAmp$secondaryseq<-t(str_split(secondarySeq(seq(),string = T),pattern = "",simplify = T))
return(peakAmp)
})
output$table.output<-renderTable({head(peakAmp())},rownames = T)
output$res.download<-downloadHandler(
filename = input$filename,
content = function(file){
write.xlsx(peakAmp(),file = file)
},
contentType = "xlsx"
)
}
shinyApp(ui, server)
