LSTM Study Notes

RNN&LSTM

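The current output depends on earlier information (previous outputs and states).
RNN: events that happened in the past can be used to predict the present.
LSTM (an advanced RNN): too much has happened in the past, and remembering all of it would make processing too complex, so the network forgets unimportant events and remembers the important ones (this addresses the long-term dependency problem).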

LSTM Core Idea

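A sigmoid layer outputs values between 0 and 1, which describe how much of each component should be kept: 0 means "forget everything" and 1 means "keep everything".
A plain RNN has only a tanh layer, which processes the past output together with the current input to produce the current output. The LSTM applies sigmoid layers to the tanh layer, adds state information, and organizes these pieces into gate structures that determine the current output.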
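For comparison, the plain RNN step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the names rnn_step, W and b and the sizes are assumptions, not taken from the article's code.

```python
import numpy as np

def rnn_step(x, h_prev, W, b):
    """One plain RNN step: h_t = tanh(W @ [x_t, h_{t-1}] + b)."""
    xc = np.hstack((x, h_prev))       # concatenate current input and previous output
    return np.tanh(W @ xc + b)

rng = np.random.default_rng(0)
x_dim, h_dim = 3, 4                    # illustrative sizes (assumption)
W = rng.uniform(-0.1, 0.1, (h_dim, x_dim + h_dim))
b = np.zeros(h_dim)

h = np.zeros(h_dim)                    # no previous output before the first step
for x in rng.random((5, x_dim)):       # a sequence of 5 input vectors
    h = rnn_step(x, h, W, b)
print(h.shape)
```

There is no separate cell state and no gate here; everything the network remembers has to pass through the single tanh layer at every step.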

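Through carefully designed structures called "gates", the LSTM can remove information from the cell state or add information to it.
A gate is a way of letting information through selectively. Each gate consists of a sigmoid neural network layer and a pointwise multiplication.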
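A minimal sketch of a gate as "sigmoid followed by pointwise multiplication". The pre-activation values in gate_logits are made up for illustration; in an LSTM they would come from a learned linear layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

cell_state = np.array([2.0, -1.0, 0.5])
gate_logits = np.array([-20.0, 0.0, 20.0])   # hypothetical pre-activations

gate = sigmoid(gate_logits)    # close to 0, exactly 0.5, close to 1
gated = gate * cell_state      # pointwise multiplication
print(gate)
print(gated)
```

The first component is almost entirely blocked, the second is halved, and the third passes through almost unchanged.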


The LSTM has three gates that protect and control the cell state; f, i, and o denote the forget gate, input gate, and output gate, respectively.

Forward Propagation

Current State

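The LSTM introduces the state information C(t) to decide what to forget, so two of the three gates are used to produce the state information:

    1. (Forget gate f) sigmoid(H(t-1), X(t)) (pointwise) C(t-1); multiplication: any component multiplied by 0 becomes 0, and it stays 0 under every later multiplication.
    2. (Input gate i) sigmoid(H(t-1), X(t)) (pointwise) tanh(H(t-1), X(t));
  • C(t) = output of 1 + output of 2; on 0/1 values, multiplication acts like AND (0 if either operand is 0, 1 if both are 1), whereas addition gives 0 if both are 0, 1 if they differ, and 2 if both are 1.
    Figure: generation of the state information
Current Output
  • (Output gate o) sigmoid(H(t-1), X(t)) (pointwise) tanh(C(t))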

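Here, wherever sigmoid(x, y) or tanh(x, y) takes two vectors x and y, the pair is processed as W(x, y) + b, giving sigmoid(W(x, y) + b) and tanh(W(x, y) + b). Concatenating the two argument vectors into Xc(t) = [X(t), H(t-1)] then gives sigmoid(W(Xc) + b) and tanh(W(Xc) + b).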
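The concatenation is just a compact way of writing two separate linear maps. The sketch below, with made-up sizes and random values, checks that applying one weight matrix to Xc gives the same result as applying its two column blocks to X(t) and H(t-1) separately.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, h_dim = 3, 2                      # illustrative sizes (assumption)
x = rng.random(x_dim)
h_prev = rng.random(h_dim)
W = rng.uniform(-0.1, 0.1, (h_dim, x_dim + h_dim))
b = rng.uniform(-0.1, 0.1, h_dim)

# One matrix applied to the concatenated vector Xc = [X(t), H(t-1)]
xc = np.hstack((x, h_prev))
combined = W @ xc + b

# The same computation with W split into an input block and a recurrent block
W_x, W_h = W[:, :x_dim], W[:, x_dim:]
split = W_x @ x + W_h @ h_prev + b

print(np.allclose(combined, split))
```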

The equations for the current cell are then as follows (s denotes the state C):

Figure: parameter equations
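The equation figure itself is not reproduced in this text. As a stand-in, the equations below are read off the forward pass in the code implementation later in this article (the method bottom_data_is), so they are a reconstruction from that code rather than a copy of the figure. The symbol \odot for pointwise multiplication is a notational choice made here.

```latex
\begin{aligned}
x_c(t) &= [\,x(t),\ h(t-1)\,] \\
g(t) &= \tanh\bigl(W_g\, x_c(t) + b_g\bigr) \\
i(t) &= \sigma\bigl(W_i\, x_c(t) + b_i\bigr) \\
f(t) &= \sigma\bigl(W_f\, x_c(t) + b_f\bigr) \\
o(t) &= \sigma\bigl(W_o\, x_c(t) + b_o\bigr) \\
s(t) &= g(t) \odot i(t) + s(t-1) \odot f(t) \\
h(t) &= s(t) \odot o(t)
\end{aligned}
```

Note that the code computes h(t) = s(t) ⊙ o(t) without applying tanh to the state, whereas the output-gate description above uses tanh(C(t)); the backpropagation formulas in this article follow the code's version.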

Figure: parameter diagram of the current LSTM node

Differences Between sigmoid and tanh

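sigmoid works better when the differences between features are complex or small.
tanh works well when the differences between features are pronounced, and it keeps amplifying those differences over the recurrent iterations; its output is zero-centered.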

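sigmoid responds sensitively to its input roughly within [-1, 1]; as the input approaches or leaves that range, the function loses sensitivity and saturates, which hurts the network's prediction accuracy. tanh is a nonlinear, monotonically increasing function of its input, which suits gradient computation in BP networks; it is fault tolerant and bounded, approaching -1 and 1 asymptotically, which matches the saturation behavior of biological neurons. Because tanh(x) = 2·sigmoid(2x) - 1, tanh has a steeper slope at 0 (1 versus 0.25) and, for the same input, saturates sooner than sigmoid rather than later.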

sigmoid
Figure: the sigmoid function and its derivative

tanh
Figure: the tanh function and its derivative
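In place of the plots, the two functions and their derivatives can be evaluated directly. The sketch below uses the standard closed forms sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) and tanh'(x) = 1 - tanh(x)^2, and also checks the identity tanh(x) = 2·sigmoid(2x) - 1; the sample points in xs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2      # peaks at 1.0 when x = 0

xs = np.array([0.0, 1.0, 2.0, 5.0])   # arbitrary sample points
print("sigmoid'(x):", sigmoid_grad(xs))
print("tanh'(x):   ", tanh_grad(xs))

# tanh is a rescaled, shifted sigmoid
identity_holds = np.allclose(np.tanh(xs), 2.0 * sigmoid(2.0 * xs) - 1.0)
print(identity_holds)
```

Both derivatives shrink toward 0 as |x| grows, which is the saturation discussed above. These are also the two expressions the backward pass multiplies by: (1 - i) * i for the sigmoid gates and 1 - g ** 2 for the tanh layer.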

Backpropagation

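Given the partial derivatives of the loss with respect to some quantities at time t+1, compute the partial derivatives of the loss with respect to all parameters at time t.
During backpropagation the error travels in two directions: from the output layer down to the input layer, and backward along time t.
First compute the error of the output h, then express the error of the state s in terms of h, and finally use the errors of h and s to compute the errors of the remaining parameters.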

  1. diff_h

Figure: loss function

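h(t) and y(t) are the output sequence and the sample labels, respectively. The loss sums (h(t)[0] - y(t))^2 over all time steps; only the first element of h(t) is compared with the label.
diff_h(t) = 2·(h(t)[0] - y(t)) in the first component and 0 elsewhere; for every node except the last, the diff_h passed back from node t+1 is added to it.
Figure: output error calculation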

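# Excerpt from ToyLossLayer: derivative of the squared loss w.r.t. the node output h
def bottom_diff(self, pred, label):
  diff = np.zeros_like(pred)
  diff[0] = 2 * (pred[0] - label)   # only h[0] enters the loss
  return diff

# Excerpt from LstmNetwork.y_list_is: walk backward in time from the second-to-last node
while idx >= 0:
  # gradient from this step's own loss term ...
  diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
  # ... plus the gradient flowing back from node idx + 1
  diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
  diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
  self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
  idx -= 1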
  2. diff_s

    Figure: diff_s
        ds = self.state.o * top_diff_h + top_diff_s
  3. Others: diff_o, diff_i, diff_g, diff_f

    Figure: backpropagation derivatives
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds
  4. wi_diff

    Figure: weight diff
        wi_diff += np.outer( (1. - i) * i *di, xc)

Figure: derivative of the loss J with respect to h (sigmoid case)
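One way to gain confidence in the formula wi_diff += np.outer((1. - i) * i * di, xc) is a finite-difference check. The sketch below is a simplified single step that follows the article's equations (h = s * o). It makes several assumptions for brevity: biases are omitted, the sizes and random values are made up, top_diff_s is zero as for the last node, and the helper names forward and loss_fn are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, xc, s_prev):
    """One LSTM step with the article's equations (biases omitted)."""
    wg, wi, wf, wo = params
    g = np.tanh(wg @ xc)
    i = sigmoid(wi @ xc)
    f = sigmoid(wf @ xc)
    o = sigmoid(wo @ xc)
    s = g * i + s_prev * f
    h = s * o
    return g, i, f, o, s, h

def loss_fn(params, xc, s_prev, y):
    h = forward(params, xc, s_prev)[-1]
    return (h[0] - y) ** 2            # same toy loss: only h[0] is used

rng = np.random.default_rng(0)
n, m = 3, 5                            # 3 cells, concatenated input of length 5
params = [rng.uniform(-0.5, 0.5, (n, m)) for _ in range(4)]
xc, s_prev, y = rng.random(m), rng.random(n), 0.3

# Analytic gradient of the loss w.r.t. wi, following the formulas above
g, i, f, o, s, h = forward(params, xc, s_prev)
top_diff_h = np.zeros(n)
top_diff_h[0] = 2 * (h[0] - y)
ds = o * top_diff_h                    # top_diff_s is zero here
di = g * ds
wi_diff = np.outer((1. - i) * i * di, xc)

# Central finite differences on every entry of wi
eps = 1e-6
numeric = np.zeros_like(params[1])
for r in range(n):
    for c in range(m):
        params[1][r, c] += eps
        up = loss_fn(params, xc, s_prev, y)
        params[1][r, c] -= 2 * eps
        down = loss_fn(params, xc, s_prev, y)
        params[1][r, c] += eps         # restore the original value
        numeric[r, c] = (up - down) / (2 * eps)

max_err = np.max(np.abs(numeric - wi_diff))
print(max_err)
```

A small max_err indicates that the analytic expression agrees with the numerical gradient. The same pattern can be repeated for wf, wo, and wg by swapping in df, do, or dg and the matching derivative factor.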

Writing 1. - i instead of 1 - i guarantees a float result even when i is an integer.
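A short illustration of the difference, using an integer array chosen only for the demonstration. In the article's code i is always the output of sigmoid and therefore already a float, so the leading "1." acts as a safeguard rather than a necessity.

```python
import numpy as np

i_int = np.array([0, 1])            # integer array, for illustration only

int_result = 1 - i_int              # stays an integer array
float_result = 1. - i_int           # promoted to a float array

print(int_result.dtype, float_result.dtype)
print(type(1 - 1), type(1. - 1))    # the same holds for plain Python numbers
```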


The code below is taken mainly from reference 7.

LSTM Code Implementation

import random

import numpy as np
import math

def sigmoid(x): 
    return 1. / (1 + np.exp(-x))

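# Creates a uniform random array with values in [a, b) and shape args.
# Note: reseeding on every call means arrays of the same shape come out
# identical, so wg, wi, wf and wo all start with the same values.
def rand_arr(a, b, *args): 
    np.random.seed(0)
    return np.random.rand(*args) * (b - a) + a
# Parameters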
class LstmParam:
    def __init__(self, mem_cell_ct, x_dim):
        self.mem_cell_ct = mem_cell_ct
        self.x_dim = x_dim
        concat_len = x_dim + mem_cell_ct
        # Weight matrices
        self.wg = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wi = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len) 
        self.wf = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wo = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        # Bias terms
        self.bg = rand_arr(-0.1, 0.1, mem_cell_ct) 
        self.bi = rand_arr(-0.1, 0.1, mem_cell_ct) 
        self.bf = rand_arr(-0.1, 0.1, mem_cell_ct) 
        self.bo = rand_arr(-0.1, 0.1, mem_cell_ct) 
        # Gradients: diffs (derivative of loss function w.r.t. all parameters)
        self.wg_diff = np.zeros((mem_cell_ct, concat_len)) 
        self.wi_diff = np.zeros((mem_cell_ct, concat_len)) 
        self.wf_diff = np.zeros((mem_cell_ct, concat_len)) 
        self.wo_diff = np.zeros((mem_cell_ct, concat_len)) 
        self.bg_diff = np.zeros(mem_cell_ct) 
        self.bi_diff = np.zeros(mem_cell_ct) 
        self.bf_diff = np.zeros(mem_cell_ct) 
        self.bo_diff = np.zeros(mem_cell_ct) 

    def apply_diff(self, lr = 1):
        self.wg -= lr * self.wg_diff
        self.wi -= lr * self.wi_diff
        self.wf -= lr * self.wf_diff
        self.wo -= lr * self.wo_diff
        self.bg -= lr * self.bg_diff
        self.bi -= lr * self.bi_diff
        self.bf -= lr * self.bf_diff
        self.bo -= lr * self.bo_diff
        # reset diffs to zero
        self.wg_diff = np.zeros_like(self.wg)
        self.wi_diff = np.zeros_like(self.wi) 
        self.wf_diff = np.zeros_like(self.wf) 
        self.wo_diff = np.zeros_like(self.wo) 
        self.bg_diff = np.zeros_like(self.bg)
        self.bi_diff = np.zeros_like(self.bi) 
        self.bf_diff = np.zeros_like(self.bf) 
        self.bo_diff = np.zeros_like(self.bo) 
# State
class LstmState:
    def __init__(self, mem_cell_ct, x_dim):
        self.g = np.zeros(mem_cell_ct)
        self.i = np.zeros(mem_cell_ct)
        self.f = np.zeros(mem_cell_ct)
        self.o = np.zeros(mem_cell_ct)
        self.s = np.zeros(mem_cell_ct)
        self.h = np.zeros(mem_cell_ct)
        self.bottom_diff_h = np.zeros_like(self.h)
        self.bottom_diff_s = np.zeros_like(self.s)
        self.bottom_diff_x = np.zeros(x_dim)
# A single LSTM node
class LstmNode:
    def __init__(self, lstm_param, lstm_state):
        # Store references to the parameters and to the activations (state)
        self.state = lstm_state
        self.param = lstm_param
        # x(t): non-recurrent input to the node
        self.x = None
        # x(t) and h(t-1): non-recurrent input concatenated with recurrent input
        self.xc = None
    # Forward pass
    def bottom_data_is(self, x, s_prev = None, h_prev = None):
        # First node in the sequence: no previous state or output yet
        if s_prev is None:
            s_prev = np.zeros_like(self.state.s)
        if h_prev is None:
            h_prev = np.zeros_like(self.state.h)
        # Save data for use in backpropagation
        self.s_prev = s_prev
        self.h_prev = h_prev

        # Concatenate x(t) and h(t-1) into xc
        xc = np.hstack((x,  h_prev))
        # Gate and state equations
        self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
        self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
        self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
        self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
        self.state.s = self.state.g * self.state.i + s_prev * self.state.f
        self.state.h = self.state.s * self.state.o
        
        self.x = x
        self.xc = xc
    # Backward pass (derivative computation)
    def top_diff_is(self, top_diff_h, top_diff_s):
        # notice that top_diff_s is carried along the constant error carousel
        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds

        # diffs w.r.t. vector inside sigma / tanh function
        # sigmoid derivative
        di_input = (1. - self.state.i) * self.state.i * di 
        df_input = (1. - self.state.f) * self.state.f * df 
        do_input = (1. - self.state.o) * self.state.o * do 
        # tanh derivative
        dg_input = (1. - self.state.g ** 2) * dg

        # Weight and bias derivatives: diffs w.r.t. inputs
        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input       
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input       

        # compute bottom diff
        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)

        # save bottom diffs
        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]
# LSTM network
class LstmNetwork():
    def __init__(self, lstm_param):
        self.lstm_param = lstm_param
        self.lstm_node_list = []
        # input sequence
        self.x_list = []
    # Walk the nodes backward over the target sequence
    def y_list_is(self, y_list, loss_layer):
        """
        Updates diffs by setting target sequence 
        with corresponding loss layer. 
        Will *NOT* update parameters.  To update parameters,
        call self.lstm_param.apply_diff()
        """
        assert len(y_list) == len(self.x_list)
        idx = len(self.x_list) - 1
        # first node only gets diffs from label ...
        loss = loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        # here s is not affecting loss due to h(t+1), hence we set equal to zero
        diff_s = np.zeros(self.lstm_param.mem_cell_ct)
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1

        ### ... following nodes also get diffs from next nodes, hence we add diffs to diff_h
        ### we also propagate error along constant error carousel using diff_s
        while idx >= 0:
            loss += loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
            diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
            self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
            idx -= 1 

        return loss

    def x_list_clear(self):
        self.x_list = []

    def x_list_add(self, x):
        self.x_list.append(x)
        if len(self.x_list) > len(self.lstm_node_list):
            # need to add new lstm node, create new state mem
            lstm_state = LstmState(self.lstm_param.mem_cell_ct, self.lstm_param.x_dim)
            self.lstm_node_list.append(LstmNode(self.lstm_param, lstm_state))

        # get index of most recent x input
        idx = len(self.x_list) - 1
        if idx == 0:
            # no recurrent inputs yet
            self.lstm_node_list[idx].bottom_data_is(x)
        else:
            s_prev = self.lstm_node_list[idx - 1].state.s
            h_prev = self.lstm_node_list[idx - 1].state.h
            self.lstm_node_list[idx].bottom_data_is(x, s_prev, h_prev)
# Loss function
class ToyLossLayer:
    """
    Computes square loss with first element of hidden layer array.
    """
    # Loss definition
    @classmethod
    def loss(cls, pred, label):
        return (pred[0] - label) ** 2
    # Derivative of the loss w.r.t. the node output
    @classmethod
    def bottom_diff(cls, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

Test: fitting a short target sequence from random inputs

# example_0 fits four fixed target values from random input vectors
# (it does not take a run of consecutive primes as input)
# Test
import numpy as np

# from lstm import LstmParam, LstmNetwork, ToyLossLayer

def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)
    # Number of hidden (memory cell) units and input vector dimension
    # parameters for input data dimension and lstm cell count 
    mem_cell_ct = 100
    x_dim = 50

    # Instantiate the LSTM parameters and the network
    lstm_param = LstmParam(mem_cell_ct, x_dim) 
    lstm_net = LstmNetwork(lstm_param)
    
    # Target outputs
    y_list = [-0.5,0.2,0.1, -0.5]
    # One random x_dim-dimensional array with values in [0, 1) per entry of y_list
    input_val_arr = [np.random.random(x_dim) for _ in y_list]

    for cur_iter in range(100):
        print("cur iter: ", cur_iter)
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])
            print("y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0]))
        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        print("loss: ", loss)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()

if __name__ == "__main__":
    example_0()

References

  1. [Translation] Understanding LSTM (Long Short-Term Memory) Networks - wangduo - cnblogs.com
    [English original] Understanding LSTM Networks - colah's blog
  2. An In-Depth Understanding of RNN and LSTM - Zhihu (zhihu.com)
  3. RNN - LSTM - GRU - Zhihu (zhihu.com)
  4. Introduction to CNNs: What Is an Activation Function - Zhihu (zhihu.com)
  5. An Overview of Activation Functions in Deep Learning | Jiqizhixin (jiqizhixin.com)
  6. [Machine Learning] Neural Networks: Mainstream Activation Functions, Pros and Cons - CSDN
  7. Derivation and Implementation of LSTM - liujshi - cnblogs.com
  8. Understanding the LSTM Structure and a Python Implementation - FlyingLittlePig's blog - CSDN
  9. An LSTM Introduction Anyone Can Follow, with a Backpropagation Derivation (Very Detailed) - Zhihu (zhihu.com)
  10. The LSTM Model and Its Forward and Backward Propagation Algorithms - Liu Jianping (Pinard) - cnblogs.com
  11. Derivation of the Backpropagation Algorithm (Very Detailed) - Zhihu (zhihu.com)
  12. python - Why is there a performance difference between numpy.zeros and numpy.zeros_like? - coder.work
  13. LSTM: Basic Principles and Derivation of Forward and Backward Propagation - SZ-crystal - CSDN
  14. The Mathematics Behind LSTM - 日積月累,天道酬勤 - CSDN
  15. Implementing a Recurrent Neural Network from Scratch - 日積月累,天道酬勤 - CSDN
  16. Understanding and Computing the LSTM Cell Structure - songhk0209's blog - CSDN
  17. Detailed Derivation of the LSTM Neural Network and a C++ Implementation - new blog: https://aping-dev.com/ - CSDN
  18. The Relationship Between Word-Vector Dimension and the Number of Hidden-Layer Neurons - atarik@163.com - CSDN
  19. Minimum Entropy Principle Series: How Should the Word-Vector Dimension Be Chosen? - Zhihu (zhihu.com)
  20. LSTM Explained in Detail - Zhihu (zhihu.com)