1、HMM,Hidden Markov model 隱馬爾科夫模型
(1)天氣舉例
- 假設(shè)不能直接觀察天氣陰晴雨情況,只能看到地面的潮濕情況(假如分為非常潮濕,一般潮濕,不潮濕三種對(duì)應(yīng)A,B,C三種評(píng)級(jí))?,F(xiàn)在我一連觀察了一周的地面潮濕情況(AABBCBA),是否能夠判斷這一周的天氣?
- 如上所述,有兩類狀態(tài):一類是地面潮濕狀態(tài) observation stata(A、B、C);一類是天氣情況 latent stata(陰晴雨);
- 之前已經(jīng)知道了天氣的狀態(tài)轉(zhuǎn)移概率矩陣,現(xiàn)在主要就需要知道天氣情況與地面潮濕情況的對(duì)應(yīng)關(guān)系,從而預(yù)測(cè)天氣情況。
(2)Hidden Markov model
HMM理解
- HMM隨機(jī)生成的狀態(tài)隨機(jī)序列被稱為狀態(tài)序列;每個(gè)狀態(tài)生成一個(gè)觀測(cè),由此產(chǎn)生的觀測(cè)隨機(jī)序列,被稱為觀測(cè)序列。
- HMM是一個(gè)雙重隨機(jī)過(guò)程---具有一定狀態(tài)的隱馬爾可夫鏈(天氣情況)和隨機(jī)的觀測(cè)序列(地面潮濕情況)。
-
The HMM is a discrete time model: for each point in time t, we have one hidden state that generates one observed event for that time point t.
HMM
HMM應(yīng)用條件
-
The hidden states follow a Markov process, i.e., the states over time are not independent of one another, but the current state depends on the previous state only (and not on earlier states)
即MM馬爾科夫的基本假設(shè),當(dāng)前狀態(tài)只與上一狀態(tài)有關(guān)
assumption1 -
The distribution generating the observation depends on the state of an underlying, hidden state,。
某時(shí)刻的觀測(cè)結(jié)果只與當(dāng)前時(shí)刻對(duì)應(yīng)的隱狀態(tài)有關(guān)
assumption2
發(fā)射矩陣 emission distribution matrix
- 即每種隱狀態(tài)對(duì)應(yīng)的,可能出現(xiàn)的觀測(cè)結(jié)果的概率情況。
-
例如晴天天氣里,地面出現(xiàn)A、B、C三種情況的概率分別為0.1,0.3,0.6;陰天分別為0.2,0.5,0.3;雨天分別為0.5,0.3,0.2
EM
HMM三要素
-
綜上,以及MM知識(shí)點(diǎn),描述HMM,需要三大參數(shù)
three sets of parameters - 此外還有隱狀態(tài)集合,預(yù)測(cè)值集合
關(guān)于預(yù)測(cè)值,一般以分類類型結(jié)果為主;
關(guān)于隱狀態(tài),我認(rèn)為在實(shí)際探索的過(guò)程中往往并不清楚狀態(tài)的含義,而是根據(jù)預(yù)測(cè)值及專業(yè)知識(shí)分析狀態(tài)意義。例如我把天氣視為觀測(cè)結(jié)果,那么隱狀態(tài)可以是梅雨期?臺(tái)風(fēng)期?甚至可以說(shuō)是太陽(yáng)公公心情的好壞狀態(tài)。
HMM模型可以解決的三類問(wèn)題?。?!劃重點(diǎn)
(1)概率計(jì)算問(wèn)題
- 給定模型λ=(A,B,π)和觀測(cè)序列Q={q1,q2,...,qT},計(jì)算模型λ下觀測(cè)到序列Q出現(xiàn)的概率P(Q|λ);
- 例如給定天氣與地面潮濕情況的HMM三要素,及一組地面潮濕情況數(shù)據(jù)。計(jì)算這組數(shù)據(jù)的出現(xiàn)概率是多大
- 方法:前向-后向算法
(2)學(xué)習(xí)問(wèn)題
- 已知觀測(cè)序列Q={q1,q2,...,qT},估計(jì)模型λ=(A,B,π)的參數(shù),使得在該模型下觀測(cè)序列P(Q|λ)最大;
- 例如已知一組地面潮濕情況數(shù)據(jù),估計(jì)天氣與地面潮濕情況的HMM三要素,從而使出現(xiàn)這組數(shù)據(jù)的可能性最大;
- 方法:Maximum likelihood, Expectation Maximization or Baum-Welch algorithm, and Bayesian estimation
(3)預(yù)測(cè)問(wèn)題
- 給定模型λ=(A,B,π)和觀測(cè)序列Q={q1,q2,...,qT},求給定觀測(cè)序列條件概率P(I|Q,λ)最大的狀態(tài)序列I。
- 例如已知一周的地面潮濕情況數(shù)據(jù),并且已知天氣與地面潮濕情況的HMM三要素,從而估計(jì)這一周最有可能的天氣情況
- 方法:Viterbi algorithm
個(gè)人認(rèn)為在實(shí)驗(yàn)探索中,觀察預(yù)測(cè)值比較容易獲得,由此學(xué)習(xí)建模,估計(jì)參數(shù)。然后根據(jù)模型結(jié)果,進(jìn)行其它觀察結(jié)果的隱狀態(tài)序列的預(yù)測(cè)。
2、R代碼實(shí)操
已知某特殊DNA序列存在兩個(gè)區(qū)(hidden stata),分別是AT-rich與CG-rich。
-
前者富含AT堿基,后者CG堿基,具體堿基概率分布(emission matrix)如下圖。
HMM 此外也知道了初始狀態(tài)分布概率以及狀態(tài)概率轉(zhuǎn)移矩陣,據(jù)此隨機(jī)生成一段符合要求的堿基序列。
第一步:準(zhǔn)備TM、EM
states <- c("AT-rich", "GC-rich") # Define the names of the states
ATrichprobs <- c(0.7, 0.3) # Set the probabilities of switching states, where the previous state was "AT-rich"
GCrichprobs <- c(0.1, 0.9) # Set the probabilities of switching states, where the previous state was "GC-rich"
thetransitionmatrix <- matrix(c(ATrichprobs, GCrichprobs), 2, 2, byrow = TRUE) # Create a 2 x 2 matrix
rownames(thetransitionmatrix) <- states
colnames(thetransitionmatrix) <- states
thetransitionmatrix
nucleotides <- c("A", "C", "G", "T") # Define the alphabet of nucleotides
ATrichstateprobs <- c(0.39, 0.1, 0.1, 0.41) # Set the values of the probabilities, for the AT-rich state
GCrichstateprobs <- c(0.1, 0.41, 0.39, 0.1) # Set the values of the probabilities, for the GC-rich state
theemissionmatrix <- matrix(c(ATrichstateprobs, GCrichstateprobs), 2, 4, byrow = TRUE) # Create a 2 x 4 matrix
rownames(theemissionmatrix) <- states
colnames(theemissionmatrix) <- nucleotides
theemissionmatrix
- 第二步:編寫函數(shù)
# Function to generate a DNA sequence, given a HMM and the length of the sequence to be generated.
generatehmmseq <- function(transitionmatrix, emissionmatrix, initialprobs, seqlength)
{
nucleotides <- c("A", "C", "G", "T") # Define the alphabet of nucleotides
states <- c("AT-rich", "GC-rich") # Define the names of the states
mysequence <- character() # Create a vector for storing the new sequence
mystates <- character() # Create a vector for storing the state that each position in the new sequence
# was generated by
# Choose the state for the first position in the sequence:
firststate <- sample(states, 1, rep=TRUE, prob=initialprobs)
# Get the probabilities of the current nucleotide, given that we are in the state "firststate":
probabilities <- emissionmatrix[firststate,]
# Choose the nucleotide for the first position in the sequence:
firstnucleotide <- sample(nucleotides, 1, rep=TRUE, prob=probabilities)
mysequence[1] <- firstnucleotide # Store the nucleotide for the first position of the sequence
mystates[1] <- firststate # Store the state that the first position in the sequence was generated by
for (i in 2:seqlength)
{
prevstate <- mystates[i-1] # Get the state that the previous nucleotide in the sequence was generated by
# Get the probabilities of the current state, given that the previous nucleotide was generated by state "prevstate"
stateprobs <- transitionmatrix[prevstate,]
# Choose the state for the ith position in the sequence:
state <- sample(states, 1, rep=TRUE, prob=stateprobs)
# Get the probabilities of the current nucleotide, given that we are in the state "state":
probabilities <- emissionmatrix[state,]
# Choose the nucleotide for the ith position in the sequence:
nucleotide <- sample(nucleotides, 1, rep=TRUE, prob=probabilities)
mysequence[i] <- nucleotide # Store the nucleotide for the current position of the sequence
mystates[i] <- state # Store the state that the current position in the sequence was generated by
}
for (i in 1:length(mysequence))
{
nucleotide <- mysequence[i]
state <- mystates[i]
print(paste("Position", i, ", State", state, ", Nucleotide = ", nucleotide))
}
}
- 第三步:給定初始狀態(tài)概率進(jìn)行預(yù)測(cè)
theinitialprobs <- c(0.5, 0.5)
generatehmmseq(thetransitionmatrix, theemissionmatrix, theinitialprobs, 30)

R實(shí)操代碼本身意義可能不大,但對(duì)于我們具體了解HMM模型很有幫助。在具體應(yīng)用HMM模型時(shí),更多的是采用相應(yīng)的R包進(jìn)行分析。這類R包有不少,會(huì)挑選幾個(gè)進(jìn)行示例學(xué)習(xí)。
參考文章
1、Hidden Markov Models — Bioinformatics 0.1 documentation
2、01 隱馬爾可夫模型 - 馬爾可夫鏈、HMM參數(shù)和性質(zhì) - 簡(jiǎn)書
3、Multilevel HMM tutorial
4、馬爾可夫鏈_百度百科





