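RNN & LSTM
The current output depends on earlier information (previous outputs and states).
RNN: events that happened in the past can be used to predict the present.
LSTM (an advanced RNN): too much has happened in the past, and remembering all of it makes processing overly complex, so the network chooses to forget unimportant events and remember the important ones (this addresses the long-term dependency problem).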
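To make "the current output depends on earlier information" concrete, here is a minimal sketch of a plain RNN step in NumPy. The names `rnn_step`, `run_rnn`, `W_xh`, `W_hh` and `b_h`, and the sizes used, are illustrative assumptions and do not come from the notes.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One plain RNN step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def run_rnn(xs, W_xh, W_hh, b_h):
    """Feed a sequence through the RNN and return every hidden state."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h carries the past forward
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(0)
x_dim, h_dim, T = 3, 4, 5  # hypothetical sizes
W_xh = rng.uniform(-0.5, 0.5, (h_dim, x_dim))
W_hh = rng.uniform(-0.5, 0.5, (h_dim, h_dim))
b_h = np.zeros(h_dim)

xs = rng.random((T, x_dim))
hs = run_rnn(xs, W_xh, W_hh, b_h)

# Changing only the first input still changes the last hidden state,
# because every step receives the previous hidden state.
xs_changed = xs.copy()
xs_changed[0] += 1.0
hs_changed = run_rnn(xs_changed, W_xh, W_hh, b_h)
print(hs.shape, np.abs(hs[-1] - hs_changed[-1]).max())
```

In this sketch the only memory is `h`, which is overwritten at every step; the LSTM described next adds a separate cell state and gates to control what is kept.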
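LSTM core idea
A sigmoid layer outputs values between 0 and 1, which describe how much of each component should be remembered. 0 means "forget everything" and 1 means "remember everything".
A plain RNN has only a tanh layer to compute the current output (from the past and the current input). LSTM applies sigmoid layers to the tanh layer, adds state information, and organizes these into gate structures that determine the current output.
Through carefully designed structures called "gates", LSTM is able to remove information from the cell state or add information to it.
A gate is a way of letting information through selectively. It consists of a sigmoid neural network layer and a pointwise multiplication.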
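A gate as described above (a sigmoid output followed by a pointwise multiplication) can be sketched in a few lines. The vectors `candidate` and `gate_logits` are made-up values chosen only to show the three regimes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

candidate = np.array([0.8, -0.5, 0.3])       # information that wants to pass
gate_logits = np.array([-20.0, 0.0, 20.0])   # hypothetical pre-activations

gate = sigmoid(gate_logits)   # approximately [0, 0.5, 1]
passed = gate * candidate     # pointwise: block, halve, pass through

print(gate)
print(passed)
```

The first component is almost fully blocked, the second is halved, and the third passes nearly unchanged.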
Gates
LSTM has three gates to protect and control the cell state. f, i and o denote the forget gate, the input gate and the output gate respectively.
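Forward propagation
Current state
LSTM introduces the state information C_t to decide what to forget, so two of the three gates are used to produce that state:
- Forget gate f: f_t = sigmoid(H_{t-1}, X_t), multiplied pointwise with the previous state C_{t-1}. Because this is a multiplication, a gate value of 0 zeroes that component, and it stays 0 under any later multiplication.
- Input gate i: i_t = sigmoid(H_{t-1}, X_t), multiplied pointwise with the candidate tanh(H_{t-1}, X_t).
- C_t = (forget-gate product) + (input-gate product), i.e. C_t = f_t * C_{t-1} + i_t * tanh(H_{t-1}, X_t). At the extreme gate values, * behaves like a logical AND (0 if either factor is 0, 1 only if both are 1), while + gives 0 for two zeros, 1 if exactly one term is 1, and 2 if both are 1.
Figure: how the state information is produced
Current output
- Output gate o: o_t = sigmoid(H_{t-1}, X_t), multiplied pointwise with tanh(C_t): H_t = o_t * tanh(C_t).
Here sigmoid(x, y) and tanh(x, y) are shorthand: the two vectors x and y are processed as W(x, y) + b, i.e. sigmoid(W(x, y) + b) and tanh(W(x, y) + b). Concatenating the two argument vectors into Xc(t) = [X(t), H(t-1)] gives sigmoid(W · Xc + b) and tanh(W · Xc + b).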
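The equations for the current cell are then as follows (s denotes the state C; each gate has its own W and b):
g(t) = tanh(W_g · Xc(t) + b_g)
i(t) = sigmoid(W_i · Xc(t) + b_i)
f(t) = sigmoid(W_f · Xc(t) + b_f)
o(t) = sigmoid(W_o · Xc(t) + b_o)
s(t) = g(t) * i(t) + s(t-1) * f(t)
h(t) = s(t) * o(t)
Note that the code listing and the derivatives later in these notes use h(t) = s(t) * o(t), without the tanh that appears in the H_t = o_t * tanh(C_t) form above.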
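The forward step can be sketched as one function. This is a minimal illustration, not the listing from the references: the helper name `lstm_step`, the dictionaries `W` and `b`, and the sizes are assumptions. It uses the textbook form h = o * tanh(c), whereas the full listing later in these notes computes h = s * o. The last lines also check that multiplying by the concatenated vector Xc equals summing separate input and recurrent products.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[k] has shape (n_cell, x_dim + n_cell) for k in 'figo'."""
    xc = np.concatenate([x_t, h_prev])      # Xc(t) = [X(t), H(t-1)]
    f = sigmoid(W['f'] @ xc + b['f'])       # forget gate
    i = sigmoid(W['i'] @ xc + b['i'])       # input gate
    g = np.tanh(W['g'] @ xc + b['g'])       # candidate values
    o = sigmoid(W['o'] @ xc + b['o'])       # output gate
    c = f * c_prev + i * g                  # new cell state
    h = o * np.tanh(c)                      # new output (textbook form)
    return h, c

rng = np.random.default_rng(0)
x_dim, n_cell = 3, 4  # hypothetical sizes
W = {k: rng.uniform(-0.1, 0.1, (n_cell, x_dim + n_cell)) for k in 'figo'}
b = {k: np.zeros(n_cell) for k in 'figo'}

x_t = rng.random(x_dim)
h_prev = rng.random(n_cell)
c_prev = rng.random(n_cell)
h, c = lstm_step(x_t, h_prev, c_prev, W, b)

# W · [x, h] is the same as W_x · x + W_h · h with W split column-wise.
xc = np.concatenate([x_t, h_prev])
W_x, W_h = W['f'][:, :x_dim], W['f'][:, x_dim:]
same = np.allclose(W['f'] @ xc, W_x @ x_t + W_h @ h_prev)
print(h.shape, c.shape, same)
```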
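Differences between sigmoid and tanh
sigmoid is often said to work better when the differences between features are complex or small.
tanh is often said to work well when features differ markedly, and it keeps amplifying those differences across iterations; its output is zero-centered.
sigmoid is most sensitive to inputs near 0 (roughly within [-1, 1]). As the input approaches or leaves that range the function loses sensitivity and saturates: its gradient, which is at most 0.25 at 0, approaches 0, and this can hurt the accuracy of the network's predictions. tanh is also nonlinear, monotonic and bounded, which suits gradient computation in backpropagation networks and makes it fault tolerant; its output approaches -1 and 1 (not 0 and 1), matching the saturation behavior attributed to biological neurons. Its gradient near 0 is larger than sigmoid's (up to 1), but it also saturates for inputs of large magnitude.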
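A small numerical comparison makes the output ranges and saturation behavior concrete. The sample points in `z` are arbitrary; the identity tanh(z) = 2 · sigmoid(2z) - 1 is a standard fact and is checked in the last line.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

sig = sigmoid(z)           # values in (0, 1), equal to 0.5 at z = 0
tnh = np.tanh(z)           # values in (-1, 1), equal to 0 at z = 0

dsig = sig * (1.0 - sig)   # sigmoid derivative, maximum 0.25 at z = 0
dtanh = 1.0 - tnh ** 2     # tanh derivative, maximum 1 at z = 0

for row in zip(z, sig, tnh, dsig, dtanh):
    print("z=%5.1f  sigmoid=%.4f  tanh=%+.4f  dsigmoid=%.4f  dtanh=%.4f" % row)

# tanh is a shifted and rescaled sigmoid.
identity_holds = np.allclose(tnh, 2.0 * sigmoid(2.0 * z) - 1.0)
print(identity_holds)
```

Both derivatives shrink toward 0 as |z| grows, which is the saturation described above.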




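Backpropagation
Given the partial derivatives of the loss function with respect to some quantities at time t+1, compute the partial derivatives of the loss function with respect to all parameters at time t.
The error propagates in two directions: from the output layer down to the input layer, and backward along time t.
First compute the error at the output h, then express the error of the state s in terms of it, and finally use the h and s errors to compute the errors of the other parameters.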
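- diff_h
Loss function: h(t) and y(t) are the output sequence and the sample labels, and the loss is the squared error summed over time, L = sum over t of (h(t) - y(t))^2.
Per-step output error: diff_h(t) = 2 · (h(t) - y(t)). The code compares only the first element of h(t) with the label.
Output error computation:
    def bottom_diff(self, pred, label):
        # derivative of (pred[0] - label) ** 2 w.r.t. pred
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

    # excerpt from the backward loop of y_list_is in the full listing
    while idx >= 0:
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        # add the error flowing back from step t+1
        diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
        diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1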
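One way to convince yourself that `bottom_diff` is the derivative of the squared-error loss is a finite-difference check. The sketch below restates the loss and its derivative as free functions; `numeric_grad` and the sample values of `h` and `y` are illustrative assumptions.

```python
import numpy as np

def loss(pred, label):
    """Squared error on the first element of the output, as in the notes."""
    return (pred[0] - label) ** 2

def bottom_diff(pred, label):
    """Analytic derivative of loss w.r.t. pred."""
    diff = np.zeros_like(pred)
    diff[0] = 2 * (pred[0] - label)
    return diff

def numeric_grad(fn, v, eps=1e-6):
    """Central finite-difference gradient of a scalar function fn at v."""
    grad = np.zeros_like(v)
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        grad[k] = (fn(v + step) - fn(v - step)) / (2 * eps)
    return grad

h = np.array([0.3, -0.7, 0.1])   # hypothetical output vector
y = -0.5                         # hypothetical label

analytic = bottom_diff(h, y)
numeric = numeric_grad(lambda v: loss(v, y), h)
print(analytic, numeric)
```

Only the first component is non-zero, because the loss ignores the remaining elements of h.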
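- diff_s
The state error combines the error arriving through h(t) = s(t) * o(t) with the state error passed back from step t+1 (top_diff_s):
    ds = self.state.o * top_diff_h + top_diff_s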
- Other diffs: o, i, g, f
Backpropagated derivatives with respect to the gate outputs, from h = s * o and s = g * i + s_prev * f:
    do = self.state.s * top_diff_h
    di = self.state.g * ds
    dg = self.state.i * ds
    df = self.s_prev * ds
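These four formulas can be checked numerically for a single step. The sketch assumes the forward equations used by the listing in these notes (s = g * i + s_prev * f and h = s * o) and uses a linear surrogate loss, `surrogate_loss`, whose gradients with respect to h and s are exactly `top_diff_h` and `top_diff_s`. All names other than the formulas themselves are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # hypothetical number of memory cells
g, i, f, o, s_prev = (rng.random(n) for _ in range(5))
top_diff_h, top_diff_s = rng.random(n), rng.random(n)

def surrogate_loss(g, i, f, o):
    """Scalar whose gradient w.r.t. h is top_diff_h and w.r.t. s is top_diff_s."""
    s = g * i + s_prev * f
    h = s * o
    return np.dot(top_diff_h, h) + np.dot(top_diff_s, s)

# Analytic derivatives, as in the notes.
s = g * i + s_prev * f
ds = o * top_diff_h + top_diff_s
analytic = {'o': s * top_diff_h, 'i': g * ds, 'g': i * ds, 'f': s_prev * ds}

def numeric(name, eps=1e-6):
    """Central finite difference w.r.t. one of the vectors g, i, f, o."""
    args = {'g': g, 'i': i, 'f': f, 'o': o}
    grad = np.zeros(n)
    for k in range(n):
        plus = {key: val.copy() for key, val in args.items()}
        minus = {key: val.copy() for key, val in args.items()}
        plus[name][k] += eps
        minus[name][k] -= eps
        grad[k] = (surrogate_loss(**plus) - surrogate_loss(**minus)) / (2 * eps)
    return grad

checks = {name: np.allclose(analytic[name], numeric(name), atol=1e-6)
          for name in analytic}
print(checks)
```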
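- wi_diff
Weight diff:
    wi_diff += np.outer((1. - i) * i * di, xc)
The factor (1. - i) * i is the sigmoid derivative. It converts di, the derivative of the loss J with respect to the gate output, into the derivative with respect to the gate's pre-activation; the outer product with xc then gives the weight gradient.
On 1. - i versus 1 - i: if i holds integers, 1. - i still yields a float result.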
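Both points above can be illustrated briefly: the dtype difference between `1 - i` and `1. - i` on an integer array, and a finite-difference check of the outer-product weight gradient. The surrogate loss `dot(di, sigmoid(wi @ xc))` and all sizes are assumptions made for this sketch; `di` stands in for the derivative arriving from above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1 - i keeps an integer dtype for integer input; 1. - i promotes to float.
i_int = np.array([0, 1])
int_result = 1 - i_int
float_result = 1. - i_int
print(int_result.dtype, float_result.dtype)

rng = np.random.default_rng(0)
n_cell, concat_len = 2, 3  # hypothetical sizes
wi = rng.uniform(-0.1, 0.1, (n_cell, concat_len))
xc = rng.random(concat_len)
di = rng.random(n_cell)    # stand-in for dJ/di

def surrogate_loss(w):
    """Scalar whose gradient w.r.t. the gate output i is exactly di."""
    return np.dot(di, sigmoid(w @ xc))

i = sigmoid(wi @ xc)
wi_diff = np.outer((1. - i) * i * di, xc)   # formula from the notes

# Central finite differences over every weight entry.
eps = 1e-6
numeric = np.zeros_like(wi)
for r in range(n_cell):
    for c in range(concat_len):
        step = np.zeros_like(wi)
        step[r, c] = eps
        numeric[r, c] = (surrogate_loss(wi + step) - surrogate_loss(wi - step)) / (2 * eps)

matches = np.allclose(wi_diff, numeric, atol=1e-7)
print(matches)
```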
The code below is mainly taken from reference 7.
LSTM code implementation
import numpy as np

def sigmoid(x):
    return 1. / (1 + np.exp(-x))

# creates uniform random array w/ values in [a,b) and shape args
def rand_arr(a, b, *args):
    # note: reseeding on every call means arrays of the same shape start out identical
    np.random.seed(0)
    return np.random.rand(*args) * (b - a) + a

# Parameters
class LstmParam:
    def __init__(self, mem_cell_ct, x_dim):
        self.mem_cell_ct = mem_cell_ct
        self.x_dim = x_dim
        concat_len = x_dim + mem_cell_ct
        # weight matrices
        self.wg = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wi = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wf = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wo = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        # bias terms
        self.bg = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bi = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bf = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bo = rand_arr(-0.1, 0.1, mem_cell_ct)
        # diffs (derivative of loss function w.r.t. all parameters)
        self.wg_diff = np.zeros((mem_cell_ct, concat_len))
        self.wi_diff = np.zeros((mem_cell_ct, concat_len))
        self.wf_diff = np.zeros((mem_cell_ct, concat_len))
        self.wo_diff = np.zeros((mem_cell_ct, concat_len))
        self.bg_diff = np.zeros(mem_cell_ct)
        self.bi_diff = np.zeros(mem_cell_ct)
        self.bf_diff = np.zeros(mem_cell_ct)
        self.bo_diff = np.zeros(mem_cell_ct)

    def apply_diff(self, lr = 1):
        # gradient descent step with learning rate lr
        self.wg -= lr * self.wg_diff
        self.wi -= lr * self.wi_diff
        self.wf -= lr * self.wf_diff
        self.wo -= lr * self.wo_diff
        self.bg -= lr * self.bg_diff
        self.bi -= lr * self.bi_diff
        self.bf -= lr * self.bf_diff
        self.bo -= lr * self.bo_diff
        # reset diffs to zero
        self.wg_diff = np.zeros_like(self.wg)
        self.wi_diff = np.zeros_like(self.wi)
        self.wf_diff = np.zeros_like(self.wf)
        self.wo_diff = np.zeros_like(self.wo)
        self.bg_diff = np.zeros_like(self.bg)
        self.bi_diff = np.zeros_like(self.bi)
        self.bf_diff = np.zeros_like(self.bf)
        self.bo_diff = np.zeros_like(self.bo)

# State
class LstmState:
    def __init__(self, mem_cell_ct, x_dim):
        self.g = np.zeros(mem_cell_ct)
        self.i = np.zeros(mem_cell_ct)
        self.f = np.zeros(mem_cell_ct)
        self.o = np.zeros(mem_cell_ct)
        self.s = np.zeros(mem_cell_ct)
        self.h = np.zeros(mem_cell_ct)
        self.bottom_diff_h = np.zeros_like(self.h)
        self.bottom_diff_s = np.zeros_like(self.s)
        self.bottom_diff_x = np.zeros(x_dim)

# A single LSTM node (one time step)
class LstmNode:
    def __init__(self, lstm_param, lstm_state):
        # store reference to parameters and to activations
        self.state = lstm_state
        self.param = lstm_param
        # x(t): non-recurrent input to node
        self.x = None
        # xc(t) = [x(t), h(t-1)]: non-recurrent input concatenated with recurrent input
        self.xc = None

    # forward pass
    def bottom_data_is(self, x, s_prev = None, h_prev = None):
        # first node: no previous state or output
        if s_prev is None:
            s_prev = np.zeros_like(self.state.s)
        if h_prev is None:
            h_prev = np.zeros_like(self.state.h)
        # save data for use in backprop
        self.s_prev = s_prev
        self.h_prev = h_prev

        # concatenate x(t) and h(t-1) into xc
        xc = np.hstack((x, h_prev))
        # gate, state and output equations
        self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
        self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
        self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
        self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
        self.state.s = self.state.g * self.state.i + s_prev * self.state.f
        # note: no tanh is applied to s here, unlike the o * tanh(C_t) form
        self.state.h = self.state.s * self.state.o

        self.x = x
        self.xc = xc

    # backward pass
    def top_diff_is(self, top_diff_h, top_diff_s):
        # notice that top_diff_s is carried along the constant error carousel
        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds

        # diffs w.r.t. vector inside sigma / tanh function
        # sigmoid derivative
        di_input = (1. - self.state.i) * self.state.i * di
        df_input = (1. - self.state.f) * self.state.f * df
        do_input = (1. - self.state.o) * self.state.o * do
        # tanh derivative
        dg_input = (1. - self.state.g ** 2) * dg

        # diffs w.r.t. weights and biases
        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input

        # compute bottom diff
        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)

        # save bottom diffs
        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]

# LSTM network
class LstmNetwork():
    def __init__(self, lstm_param):
        self.lstm_param = lstm_param
        self.lstm_node_list = []
        # input sequence
        self.x_list = []

    # walk the nodes backward, starting from the last output
    def y_list_is(self, y_list, loss_layer):
        """
        Updates diffs by setting target sequence
        with corresponding loss layer.
        Will *NOT* update parameters. To update parameters,
        call self.lstm_param.apply_diff()
        """
        assert len(y_list) == len(self.x_list)
        idx = len(self.x_list) - 1
        # first node only gets diffs from label ...
        loss = loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        # here s is not affecting loss due to h(t+1), hence we set equal to zero
        diff_s = np.zeros(self.lstm_param.mem_cell_ct)
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1

        ### ... following nodes also get diffs from next nodes, hence we add diffs to diff_h
        ### we also propagate error along constant error carousel using diff_s
        while idx >= 0:
            loss += loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
            diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
            self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
            idx -= 1

        return loss

    def x_list_clear(self):
        self.x_list = []

    def x_list_add(self, x):
        self.x_list.append(x)
        if len(self.x_list) > len(self.lstm_node_list):
            # need to add new lstm node, create new state mem
            lstm_state = LstmState(self.lstm_param.mem_cell_ct, self.lstm_param.x_dim)
            self.lstm_node_list.append(LstmNode(self.lstm_param, lstm_state))

        # get index of most recent x input
        idx = len(self.x_list) - 1
        if idx == 0:
            # no recurrent inputs yet
            self.lstm_node_list[idx].bottom_data_is(x)
        else:
            s_prev = self.lstm_node_list[idx - 1].state.s
            h_prev = self.lstm_node_list[idx - 1].state.h
            self.lstm_node_list[idx].bottom_data_is(x, s_prev, h_prev)

# Loss function
class ToyLossLayer:
    """
    Computes square loss with first element of hidden layer array.
    """
    # loss definition
    @classmethod
    def loss(cls, pred, label):
        return (pred[0] - label) ** 2

    # derivative of the loss w.r.t. the node output
    @classmethod
    def bottom_diff(cls, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

Test: fitting a short target sequence from random inputs
# Test: the network learns to output the target sequence y_list for fixed random inputs
import numpy as np
# from lstm import LstmParam, LstmNetwork, ToyLossLayer

def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)

    # parameters for lstm cell count and input data dimension
    mem_cell_ct = 100
    x_dim = 50
    # instantiate the LSTM parameters and the network
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    # target outputs
    y_list = [-0.5, 0.2, 0.1, -0.5]
    # one x_dim-dimensional random vector with values in [0, 1) per target
    input_val_arr = [np.random.random(x_dim) for _ in y_list]

    for cur_iter in range(100):
        print("cur iter: ", cur_iter)
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])
            print("y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0]))

        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        print("loss: ", loss)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()

if __name__ == "__main__":
    example_0()

References:
1. [Translation] Understanding LSTM (Long Short-Term Memory, LSTM) Networks - wangduo - 博客园 (cnblogs.com); [English original] Understanding LSTM Networks -- colah's blog
2. An In-Depth Understanding of RNN and LSTM - 知乎 (zhihu.com)
3. RNN - LSTM - GRU - 知乎 (zhihu.com)
4. Introduction to CNNs: What Is an Activation Function - 知乎 (zhihu.com)
5. An Overview of Activation Functions in Deep Learning | 机器之心 (jiqizhixin.com)
6. [Machine Learning] Neural Networks: Mainstream Activation Functions, Pros and Cons - CSDN
7. Derivation and Implementation of LSTM - liujshi - 博客园 (cnblogs.com)
8. Understanding the LSTM Structure and a Python Implementation - FlyingLittlePig - CSDN
9. An LSTM Introduction and Backpropagation Derivation Everyone Can Follow (Very Detailed) - 知乎 (zhihu.com)
10. The LSTM Model and Its Forward and Backward Propagation Algorithms - 刘建平Pinard - 博客园 (cnblogs.com)
11. Derivation of the Backpropagation Algorithm (Very Detailed) - 知乎 (zhihu.com)
12. python - Why is there a performance difference between numpy.zeros and numpy.zeros_like? - IT工具网 (coder.work)
13. LSTM: Basic Principles and Derivation of Forward and Backward Propagation - SZ-crystal - CSDN
14. The Mathematics Behind LSTM - 日积月累,天道酬勤 - CSDN
15. Implementing a Recurrent Neural Network from Scratch - 日积月累,天道酬勤 - CSDN
16. Understanding and Computing the LSTM Cell Structure - songhk0209 - CSDN
17. Detailed Derivation of the LSTM Neural Network and a C++ Implementation - 新博客:https://aping-dev.com/ - CSDN
18. The Relationship Between Word Vector Dimension and the Number of Hidden Neurons - atarik@163.com - CSDN
19. Minimum Entropy Principle Series: How Should the Word Vector Dimension Be Chosen? - 知乎 (zhihu.com)
20. LSTM Explained in Detail - 知乎 (zhihu.com)







