HMM 隱馬爾可夫模型初學(xué)(一)

筆記首先總結(jié)概括HMM相關(guān)理論知識,再通過生信序列的R語言實操對HMM深入了解

1、MM,Markov model 馬爾科夫模型

(1)天氣舉例

  • 假設(shè)只有晴、陰、雨三種天氣情況。某地周一為晴天,已知若晴天后,第二天出現(xiàn)晴陰雨的概率分別為0.5,0.3,0.2(sum=1)。
  • 例如周二為雨天的概率即為0.2。而已知若雨天后,第二天出現(xiàn)晴陰雨的概率分別為0.1,0.6,0.3(sum=1),可預(yù)測周三為陰天的概率為0.2*0.3=0.06。
  • 繼續(xù)預(yù)測,已知若陰天后,第二天出現(xiàn)晴陰雨的概率分別為0.3,0.4,0.3(sum=1),可預(yù)測周四出現(xiàn)晴天的概率為0.20.30.4=0.024;
  • 綜上在已知假設(shè)及周一天氣情況下,周一~周四天氣為晴陰雨晴的概率為0.024,還可以繼續(xù)預(yù)測接下來一周的天氣情況概率。
(2)Markov chain馬爾可夫鏈
  • 如上假設(shè),即為一個簡單的馬爾科夫鏈。首先來說,它是一個離散性質(zhì)的時序模型,即在順序離散過程中,一個時間點(周幾)代表一種特定的狀態(tài)(天氣)。
  • 其次馬爾科夫鏈最重要的假設(shè)是:當(dāng)前的狀態(tài)只和上一時刻有關(guān),與上一時刻之前的任何狀態(tài)都沒有關(guān)系。反映到例子中就是今天的天氣情況只取決于昨天的天氣情況,與前天,上周的天氣都無關(guān)。
    反映到如下公式中,t+1時間點的狀態(tài)可能分布只與t時間點的狀態(tài)有關(guān)。


    MM
(3)Transition Matrix狀態(tài)轉(zhuǎn)移概率矩陣?。。▌澲攸c)
  • stata狀態(tài):馬爾可夫鏈在狀態(tài)空間內(nèi)的取值稱為狀態(tài),所有的取值情況稱為狀態(tài)空間 stata space。簡單理解就是例子中的三種天氣(晴陰雨);
  • 狀態(tài)轉(zhuǎn)移概率矩陣即是指由一種特定狀態(tài)分別轉(zhuǎn)移到所有狀態(tài)的概率transition probability(和必然為1),假設(shè)存在n種狀態(tài),則Transition Matrix就是n*n的矩陣。γij的值就表示狀態(tài) i 轉(zhuǎn)移到狀態(tài) j 的概率。


    TM

    TM
  • Transition Matrix的圖形展示


    TM,from web
(4)initial probabilities of the states
  • 上述例子中是給定周一天氣為晴天,更多情況下馬爾科夫鏈的開端也是狀態(tài)的概率分布π;
  • Often, the initial probabilities of the states πi are assumed to be the stationary distribution implied by the transition probability matrix。
    即趨近于無窮時刻下,三種狀態(tài)的分布的概率。


    initial probabilities of the states

2、R代碼實操:馬爾科夫模型預(yù)測DNA序列

(1)已知條件

A:特殊的序列

For some DNA sequences, the probability of finding a particular nucleotide at a particular position in the sequence does depend on what nucleotides are found at adjacent positions in the sequence.
即符合馬爾可夫鏈的假設(shè)

B:轉(zhuǎn)移矩陣
TM
C:初始概率

可任意指定,例如T 0.3 C 0.3 G 0.2 A 0.2

(2)R實操

目的:主要根據(jù)轉(zhuǎn)移矩陣預(yù)測可能的這樣一段特殊序列

  • 制作TM
nucleotides         <- c("A", "C", "G", "T")
afterAprobs <- c(0.2, 0.3, 0.3, 0.2)         # Set the values of the probabilities, where the previous nucleotide was "A"
afterCprobs <- c(0.1, 0.41, 0.39, 0.1)       # Set the values of the probabilities, where the previous nucleotide was "C"
afterGprobs <- c(0.25, 0.25, 0.25, 0.25)     # Set the values of the probabilities, where the previous nucleotide was "G"
afterTprobs <- c(0.5, 0.17, 0.17, 0.17)      # Set the values of the probabilities, where the previous nucleotide was "T"
mytransitionmatrix <- matrix(c(afterAprobs, afterCprobs, afterGprobs, afterTprobs), 4, 4, byrow = TRUE) # Create a 4 x 4 matrix
rownames(mytransitionmatrix) <- nucleotides
colnames(mytransitionmatrix) <- nucleotides
mytransitionmatrix 
TM
  • 編寫函數(shù)
    主要基于第一個起始堿基的概率與TM概率分布,利用sample()抽樣函數(shù)進行實現(xiàn)序列預(yù)測
generatemarkovseq <- function(transitionmatrix, initialprobs, seqlength)
{
  nucleotides     <- c("A", "C", "G", "T") # Define the alphabet of nucleotides
  mysequence      <- character()           # Create a vector for storing the new sequence
  # Choose the nucleotide for the first position in the sequence:
  firstnucleotide <- sample(nucleotides, 1, rep=TRUE, prob=initialprobs)
  mysequence[1]   <- firstnucleotide       # Store the nucleotide for the first position of the sequence
  for (i in 2:seqlength)
  {
    prevnucleotide <- mysequence[i-1]     # Get the previous nucleotide in the new sequence
    # Get the probabilities of the current nucleotide, given previous nucleotide "prevnucleotide":
    probabilities  <- transitionmatrix[prevnucleotide,]
    # Choose the nucleotide at the current position of the sequence:
    nucleotide     <- sample(nucleotides, 1, rep=TRUE, prob=probabilities)
    mysequence[i]  <- nucleotide          # Store the nucleotide for the current position of the sequence
  }
  return(mysequence)
}
  • 運行函數(shù)
myinitialprobs <- c(0.3, 0.3, 0.2, 0.2) #對應(yīng)函數(shù)體設(shè)置的"A", "C", "G", "T"順序
generatemarkovseq(mytransitionmatrix, myinitialprobs, 30)
table(generatemarkovseq(mytransitionmatrix, myinitialprobs, 30))

result

參考文章
1、Hidden Markov Models — Bioinformatics 0.1 documentation
2、01 隱馬爾可夫模型 - 馬爾可夫鏈、HMM參數(shù)和性質(zhì) - 簡書
3、Multilevel HMM tutorial
4、馬爾可夫鏈_百度百科

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容