Overview
A Bidirectional LSTM is an extension of the traditional LSTM that can improve model performance on sequence classification problems.
In problems where all timesteps of the input sequence are available, Bidirectional LSTMs train two LSTMs on the input sequence instead of one. The first is trained on the input sequence as-is and the second on a reversed copy of the input sequence. This can provide additional context to the network and result in faster and even fuller learning of the problem.
In this tutorial, you will discover how to develop Bidirectional LSTMs for sequence classification with the Keras deep learning library.
After completing this tutorial, you will know:
- How to develop a small, contrived, and configurable sequence classification problem.
- How to develop an LSTM and a Bidirectional LSTM for sequence classification.
- How to compare the performance of the merge modes used in Bidirectional LSTMs.
This tutorial is divided into 6 parts. They are:
- Bidirectional LSTMs
- Sequence Classification Problem
- LSTM For Sequence Classification
- Bidirectional LSTM For Sequence Classification
- Compare LSTM to Bidirectional LSTM
- Comparing Bidirectional LSTM Merge Modes
PART1 - Bidirectional LSTMs
The idea of Bidirectional Recurrent Neural Networks (RNNs) is straightforward. It involves duplicating the first recurrent layer in the network so that there are now two layers side-by-side, then providing the input sequence as-is as input to the first layer and providing a reversed copy of the input sequence to the second.
"""
To overcome the limitations of a regular RNN […] we propose a bidirectional recurrent neural network (BRNN) that can be trained using all available input information in the past and future of a specific time frame.
The idea is to split the state neurons of a regular RNN in a part that is responsible for the positive time direction (forward states) and a part for the negative time direction (backward states)
"""
— Mike Schuster and Kuldip K. Paliwal, Bidirectional Recurrent Neural Networks, 1997
This approach has been used to great effect with Long Short-Term Memory (LSTM) recurrent neural networks. Providing the sequence bi-directionally was initially justified in the domain of speech recognition, because there is evidence that the context of the whole utterance is used to interpret what is being said rather than a linear interpretation.
"""
Relying on knowledge of the future seems at first sight to violate causality. How can we base our understanding of what we've heard on something that hasn't been said yet? However, human listeners do exactly that. Sounds, words, and even whole sentences that at first mean nothing are found to make sense in the light of future context. What we must remember is the distinction between tasks that are truly online, requiring an output after every input, and those where outputs are only needed at the end of some input segment.
"""
— Alex Graves and Jurgen Schmidhuber, Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures, 2005
The use of Bidirectional LSTMs may not make sense for all sequence prediction problems, but it can offer some benefit in terms of better results in those domains where it is appropriate.
To be clear, the timesteps in the input sequence are still processed one at a time; it is just that the network steps through the input sequence in both directions at the same time.
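As a toy illustration of why two directions help (plain Python with running sums as a stand-in for recurrent state, not Keras code):
# forward state at timestep t summarizes inputs [0..t]; backward state summarizes [t..n-1]
seq = [1, 2, 3, 4]
forward = [sum(seq[:t + 1]) for t in range(len(seq))]
backward = [sum(seq[t:]) for t in range(len(seq))]
# together, the (forward, backward) pair at every timestep covers the whole sequence
for t, pair in enumerate(zip(forward, backward)):
    print(t, pair)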
PART2 - Bidirectional LSTMs in Keras
Keras supports Bidirectional LSTMs via the Bidirectional layer wrapper. The wrapper also allows you to specify the merge mode, that is, how the forward and backward outputs should be combined before being passed on to the next layer. The options are:
- 'sum': The outputs are added together.
- 'mul': The outputs are multiplied together.
- 'concat': The outputs are concatenated together (the default), providing double the number of outputs to the next layer.
- 'ave': The average of the outputs is taken.
The default mode is to concatenate, and this is the method often used in studies of bidirectional LSTMs.
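As a minimal sketch of how the merge mode is specified (the layer sizes here are arbitrary, and the shapes in the comment assume current Keras behavior):
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Bidirectional

model = Sequential()
# combine the forward and backward outputs by element-wise sum instead of concatenation
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 1), merge_mode='sum'))
print(model.output_shape)  # (None, 5, 10); with merge_mode='concat' it would be (None, 5, 20)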
Sequence Classification Problem
We will define a simple sequence classification problem to explore bidirectional LSTMs.
The problem is defined as a sequence of random values between 0 and 1. This sequence is taken as input for the problem, with one number provided per timestep. A binary label (0 or 1) is associated with each input. The output values are all 0 until the cumulative sum of the input values in the sequence exceeds a threshold, at which point the output flips from 0 to 1. A threshold of one quarter (1/4) of the sequence length is used.
For example, below is a sequence of 10 input timesteps (X):
0.63144003 0.29414551 0.91587952 0.95189228 0.32195638 0.60742236 0.83895793 0.18023048 0.84762691 0.29165514
The corresponding classification output (y) would be:
0 0 0 1 1 1 1 1 1 1
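To verify where the flip happens for this particular sequence, you can check its cumulative sum against the threshold of 10/4 = 2.5 (a quick sketch; the values are copied from the example above):
from numpy import array
from numpy import cumsum
x = array([0.63144003, 0.29414551, 0.91587952, 0.95189228, 0.32195638,
           0.60742236, 0.83895793, 0.18023048, 0.84762691, 0.29165514])
# the cumulative sum first exceeds 2.5 at the 4th value, so y flips to 1 at index 3
print(cumsum(x))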
We can implement this in Python:
from random import random
from numpy import array
from numpy import cumsum
# create a sequence of random numbers in [0,1]
X = array([random() for _ in range(10)])
# calculate cut-off value to change class values
limit = 10/4.0
# the cumsum() NumPy function calculates the cumulative sum of the input sequence,
# returning a sequence of cumulative values: pos1, pos1+pos2, pos1+pos2+pos3, ...
# determine the class outcome for each item in cumulative sequence
y = array([0 if x < limit else 1 for x in cumsum(X)])
We can bundle all of this into a function called get_sequence() that generates a new random sequence classification problem each time it is called:
# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps/4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    return X, y

X, y = get_sequence(10)
print(X)
print(y)
Output:
[ 0.22228819 0.26882207 0.069623 0.91477783 0.02095862 0.71322527
0.90159654 0.65000306 0.88845226 0.4037031 ]
[0 0 0 0 0 0 1 1 1 1]
PART3 - LSTM For Sequence Classification
We can start off by developing a traditional LSTM for the sequence classification problem. First, we must update the get_sequence() function to reshape the input and output sequences to be 3-dimensional to meet the expectations of the LSTM. The expected structure has the dimensions [samples, timesteps, features]. The classification problem has 1 sample (e.g. one sequence), a configurable number of timesteps, and one feature per timestep.
Therefore, we can reshape the sequences as follows.
# reshape input and output data to be suitable for LSTMs
X = X.reshape(1, n_timesteps, 1)
y = y.reshape(1, n_timesteps, 1)
# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps/4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    # reshape input and output data to be suitable for LSTMs
    X = X.reshape(1, n_timesteps, 1)
    y = y.reshape(1, n_timesteps, 1)
    return X, y
We will define the sequences as having 10 timesteps, and then define an LSTM for the problem. The input layer will have 10 timesteps with 1 feature apiece, input_shape=(10, 1). The first hidden layer will have 20 memory units, and the output layer will be a fully connected layer that outputs one value per timestep. A sigmoid activation function is used on the output to predict the binary value.
A TimeDistributed wrapper layer is used around the output layer so that one value per timestep can be predicted, given the full sequence provided as input. This requires that the LSTM hidden layer returns a sequence of values (one per timestep) rather than a single value for the whole input sequence.
Finally, because this is a binary classification problem, the binary log loss (binary_crossentropy in Keras) is used. The efficient ADAM optimization algorithm is used to find the weights, and the accuracy metric is calculated and reported each epoch.
# define LSTM
model = Sequential()
model.add(LSTM(20, input_shape=(10, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
The LSTM will be trained for 1,000 epochs. A new random input sequence will be generated each epoch for the network to be fit on. This ensures that the model does not memorize a single sequence, and instead can generalize a solution to all possible random input sequences for this problem.
# train LSTM
for epoch in range(1000):
    # generate new random sequence
    X,y = get_sequence(n_timesteps)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=1, verbose=2)
Once trained, the network will be evaluated on yet another random sequence. The predictions will then be compared to the expected output sequence to provide a concrete example of the skill of the system.
# evaluate LSTM
X,y = get_sequence(n_timesteps)
yhat = model.predict_classes(X, verbose=0)
for i in range(n_timesteps):
    print('Expected:', y[0, i], 'Predicted', yhat[0, i])
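Note that predict_classes() is only available on Sequential models in older versions of Keras and has since been removed. On a newer release, an equivalent sketch is to threshold the sigmoid probabilities yourself:
# equivalent to predict_classes() for a sigmoid output: threshold the probabilities at 0.5
yhat = (model.predict(X, verbose=0) > 0.5).astype('int32')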
The complete example is listed below.
from random import random
from numpy import array
from numpy import cumsum
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import TimeDistributed
# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps/4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    # reshape input and output data to be suitable for LSTMs
    X = X.reshape(1, n_timesteps, 1)
    y = y.reshape(1, n_timesteps, 1)
    return X, y
# define problem properties
n_timesteps = 10
# define LSTM
model = Sequential()
model.add(LSTM(20, input_shape=(n_timesteps, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
# train LSTM
for epoch in range(1000):
    # generate new random sequence
    X,y = get_sequence(n_timesteps)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=1, verbose=2)
# evaluate LSTM
X,y = get_sequence(n_timesteps)
yhat = model.predict_classes(X, verbose=0)
for i in range(n_timesteps):
    print('Expected:', y[0, i], 'Predicted', yhat[0, i])
Running the example prints the log loss and classification accuracy on a new random sequence each epoch. This provides a clear idea of how well the model has generalized a solution to the sequence classification problem.
We can see that the model does well, with the final accuracy hovering around 90% to 100%. Not perfect, but good for our purposes. Comparing the predictions for a new random sequence to the expected values shows a mostly correct result with a single error.
...
Epoch 1/1
0s - loss: 0.2039 - acc: 0.9000
Epoch 1/1
0s - loss: 0.2985 - acc: 0.9000
Epoch 1/1
0s - loss: 0.1219 - acc: 1.0000
Epoch 1/1
0s - loss: 0.2031 - acc: 0.9000
Epoch 1/1
0s - loss: 0.1698 - acc: 0.9000
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [1]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
PART4 - Bidirectional LSTM For Sequence Classification
Now that we know how to develop an LSTM for the sequence classification problem, we can extend the example to demonstrate a Bidirectional LSTM.
We can do this by wrapping the LSTM hidden layer with a Bidirectional layer, as follows:
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_timesteps, 1)))
This will create two copies of the hidden layer, one fit on the input sequences as-is and one on a reversed copy of the input sequence. By default, the output values from these LSTMs will be concatenated. This means that, instead of the TimeDistributed layer receiving 10 timesteps of 20 outputs, it will now receive 10 timesteps of 40 (20 units + 20 units) outputs.
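You can verify this widening yourself; a quick sketch with the same layer sizes:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(10, 1)))
print(model.output_shape)  # (None, 10, 40): 20 forward units + 20 backward units
The complete example is listed below.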
from random import random
from numpy import array
from numpy import cumsum
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import Bidirectional
# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps/4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    # reshape input and output data to be suitable for LSTMs
    X = X.reshape(1, n_timesteps, 1)
    y = y.reshape(1, n_timesteps, 1)
    return X, y
# define problem properties
n_timesteps = 10
# define LSTM
model = Sequential()
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_timesteps, 1)))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
# train LSTM
for epoch in range(1000):
    # generate new random sequence
    X,y = get_sequence(n_timesteps)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=1, verbose=2)
# evaluate LSTM
X,y = get_sequence(n_timesteps)
yhat = model.predict_classes(X, verbose=0)
for i in range(n_timesteps):
    print('Expected:', y[0, i], 'Predicted', yhat[0, i])
Output:
...
Epoch 1/1
0s - loss: 0.0967 - acc: 0.9000
Epoch 1/1
0s - loss: 0.0865 - acc: 1.0000
Epoch 1/1
0s - loss: 0.0905 - acc: 0.9000
Epoch 1/1
0s - loss: 0.2460 - acc: 0.9000
Epoch 1/1
0s - loss: 0.1458 - acc: 0.9000
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [0] Predicted [0]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
Expected: [1] Predicted [1]
PART5 - Compare LSTM to Bidirectional LSTM
In this example, we will compare the performance of traditional LSTMs to a Bidirectional LSTM while the models are being trained.
We will adjust the experiment so that the models are only trained for 250 epochs. This is so that we can get a clear idea of how learning unfolds for each model and how the learning behavior differs with the Bidirectional LSTM.
We will compare three different models; specifically:
- An LSTM (as-is)
- An LSTM with reversed input sequences (you can do this by setting the "go_backwards" argument of the LSTM layer to "True")
- A Bidirectional LSTM
This comparison will help to show that a Bidirectional LSTM can in fact add something more than simply reversing the input sequence. We will define a function to create and return an LSTM with either forward or backward input sequences, as follows:
def get_lstm_model(n_timesteps, backwards):
    model = Sequential()
    model.add(LSTM(20, input_shape=(n_timesteps, 1), return_sequences=True, go_backwards=backwards))
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
We can develop a similar function for the Bidirectional LSTM, where the merge mode can be specified as an argument. The default of concatenation can be specified by setting the merge mode to the value 'concat'.
def get_bi_lstm_model(n_timesteps, mode):
    model = Sequential()
    model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_timesteps, 1), merge_mode=mode))
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
Finally, we define a function to fit a model, retrieve and store the loss at each training epoch, and return the list of collected loss values once the model is fit. This is so that we can graph the log loss from each model configuration and compare them.
def train_model(model, n_timesteps):
    loss = list()
    for _ in range(250):
        # generate new random sequence
        X,y = get_sequence(n_timesteps)
        # fit model for one epoch on this sequence
        hist = model.fit(X, y, epochs=1, batch_size=1, verbose=0)
        loss.append(hist.history['loss'][0])
    return loss
Putting this all together, the complete example is listed below. First a traditional LSTM is created and fit and the log loss values are plotted. This is repeated with an LSTM with reversed input sequences, and finally with a Bidirectional LSTM using a concatenated merge.
from random import random
from numpy import array
from numpy import cumsum
from matplotlib import pyplot
from pandas import DataFrame
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import Bidirectional
# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps/4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    # reshape input and output data to be suitable for LSTMs
    X = X.reshape(1, n_timesteps, 1)
    y = y.reshape(1, n_timesteps, 1)
    return X, y

def get_lstm_model(n_timesteps, backwards):
    model = Sequential()
    model.add(LSTM(20, input_shape=(n_timesteps, 1), return_sequences=True, go_backwards=backwards))
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

def get_bi_lstm_model(n_timesteps, mode):
    model = Sequential()
    model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_timesteps, 1), merge_mode=mode))
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

def train_model(model, n_timesteps):
    loss = list()
    for _ in range(250):
        # generate new random sequence
        X,y = get_sequence(n_timesteps)
        # fit model for one epoch on this sequence
        hist = model.fit(X, y, epochs=1, batch_size=1, verbose=0)
        loss.append(hist.history['loss'][0])
    return loss
n_timesteps = 10
results = DataFrame()
# lstm forwards
model = get_lstm_model(n_timesteps, False)
results['lstm_forw'] = train_model(model, n_timesteps)
# lstm backwards
model = get_lstm_model(n_timesteps, True)
results['lstm_back'] = train_model(model, n_timesteps)
# bidirectional concat
model = get_bi_lstm_model(n_timesteps, 'concat')
results['bilstm_con'] = train_model(model, n_timesteps)
# line plot of results
results.plot()
pyplot.show()
Running the example creates a line plot. Your specific plot may vary in its details, but it will show the same trends.
We can see that the forward LSTM (blue) and backward LSTM (orange) show similar log loss over the 250 training epochs, whereas the Bidirectional LSTM's log loss (green) is different, dropping sooner to a lower value and generally staying lower than the other two configurations.
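If you want a number rather than a visual impression, one option (a sketch reusing the results DataFrame from the listing above) is to average the loss over the final epochs of each configuration:
# mean log loss over the last 50 of the 250 epochs, per configuration
print(results.tail(50).mean())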

PART6 - Comparing Bidirectional LSTM Merge Modes
There are 4 different merge modes that can be used to combine the outcomes of the Bidirectional LSTM layers: concatenation (the default), multiplication, average, and sum. We can compare the behavior of the different merge modes by updating the example from the previous section, as follows:
n_timesteps = 10
results = DataFrame()
# sum merge
model = get_bi_lstm_model(n_timesteps, 'sum')
results['bilstm_sum'] = train_model(model, n_timesteps)
# mul merge
model = get_bi_lstm_model(n_timesteps, 'mul')
results['bilstm_mul'] = train_model(model, n_timesteps)
# avg merge
model = get_bi_lstm_model(n_timesteps, 'ave')
results['bilstm_ave'] = train_model(model, n_timesteps)
# concat merge
model = get_bi_lstm_model(n_timesteps, 'concat')
results['bilstm_con'] = train_model(model, n_timesteps)
# line plot of results
results.plot()
pyplot.show()
Running the example creates a line plot comparing the log loss of each merge mode. Your specific plot may vary, but it will show the same behavioral trends. The different merge modes result in different model performance, and this will vary depending on your specific sequence prediction problem. In this case, we can see that perhaps the sum (blue) and concatenation (red) merge modes result in better performance, or at least lower log loss.
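Because each epoch is fit on a single random sequence, the raw loss curves are noisy. A smoothed view (a sketch applying a pandas rolling mean to the same results DataFrame) can make the comparison easier to read:
# smooth each loss curve with a 10-epoch rolling mean before plotting
results.rolling(10).mean().plot()
pyplot.show()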
