Paper - A Neural Probabilistic Language Model (NNLM)

1. Abbreviation

The paper "A Neural Probabilistic Language Model", abbreviated as NNLM, by Yoshua Bengio et al., is a classic neural language model.

2. Abstract

The goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: the traditional but very successful n-gram-based approach obtains generalization by concatenating very short overlapping sequences seen in the training set.

We propose to fight the curse of dimensionality by learning a distributed representation for words. The model simultaneously learns:

  1. a distributed representation for each word
  2. the probability function for word sequences, expressed in terms of these representations

Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to the words forming an already seen sentence.

Training such a large model (with millions of parameters) within a reasonable time is itself a significant challenge.

This paper reports on experiments that use neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models and allows taking advantage of longer contexts.

3. Core

NNLM

More precisely, the neural network computes the following function with a softmax output layer, which guarantees positive probabilities summing to 1:

\hat{P}(w_t|w_{t-1},...w_{t-n+1})=\frac{e^{y_{w_t}}}{\sum_ie^{y_i}}\tag{3.1}

where y_i is the unnormalized log-probability for each output word i, computed as follows with parameters b, W, U, d and H:

y=b+Wx+U\tanh(d+Hx)\tag{3.2}

where the hyperbolic tangent tanh is applied element by element, W is optionally zero (no direct connections), and x is the word features layer activation vector, which is the concatenation of the input word features from matrix C:

x=(C(w_{t-1}), C(w_{t-2}),..., C(w_{t-n+1}))\tag{3.3}

Let h be the number of hidden units and m the number of features associated with each word. When no direct connections from word features to outputs are desired, the matrix W is set to 0. The free parameters of the model are the output biases b (with |V| elements), the hidden layer biases d (with h elements), the hidden-to-output weights U (a |V|×h matrix), the word-features-to-output weights W (a |V|×(n-1)m matrix), the hidden layer weights H (an h×(n-1)m matrix), and the word features C (a |V|×m matrix):

\theta=(b,d,W,U,H,C)\tag{3.4}

The number of free parameters is |V|(1+nm+h)+h(1+(n-1)m), with |V|(nm+h) as the dominating factor. Note that, in theory, if there were a weight decay on the weights W and H but not on C, then W and H could converge towards zero while C would blow up. In practice, such behavior was not observed when training with stochastic gradient ascent.
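
To make equations (3.1)-(3.4) concrete, here is a minimal sketch of the forward pass with explicit parameter matrices; the sizes and variable names are illustrative assumptions, not taken from the paper's code. It also checks the free-parameter count |V|(1+nm+h)+h(1+(n-1)m):

import numpy as np

# Illustrative sizes (assumptions): vocabulary |V|, n-gram order n, m features per word, h hidden units
V, n, m, h = 1000, 4, 30, 50

rng = np.random.default_rng(0)
C = rng.normal(size=(V, m))            # word feature matrix, |V| x m
H = rng.normal(size=(h, (n - 1) * m))  # hidden layer weights, h x (n-1)m
d = np.zeros(h)                        # hidden layer biases
U = rng.normal(size=(V, h))            # hidden-to-output weights, |V| x h
W = rng.normal(size=(V, (n - 1) * m))  # optional direct word-features-to-output weights, |V| x (n-1)m
b = np.zeros(V)                        # output biases

context = [3, 17, 42]                                 # indices of w_{t-1}, ..., w_{t-n+1}
x = np.concatenate([C[w] for w in context])           # eq. (3.3)
y = b + W @ x + U @ np.tanh(d + H @ x)                # eq. (3.2)
P = np.exp(y - y.max()) / np.exp(y - y.max()).sum()   # eq. (3.1), numerically stable softmax

n_params = b.size + d.size + U.size + W.size + H.size + C.size
assert n_params == V * (1 + n * m + h) + h * (1 + (n - 1) * m)
print(P.sum())  # -> 1.0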

Stochastic gradient ascent on the neural network consists in performing the following iterative update after presenting the t-th word of the training corpus:

\theta\leftarrow\theta+\varepsilon\frac{\partial \log\hat{P}(w_t|w_{t-1},...,w_{t-n+1})}{\partial\theta}\tag{3.5}

where ε is the "learning rate".
Note that a large fraction of the parameters need not be updated or visited after each example: namely, the word features C(j) of all the words j that do not occur in the input window.

Code

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()

# Build the vocabulary and the word/index lookup dictionaries
vocab = set(sentence)
print(vocab)
word2index = {w:i for i, w in enumerate(vocab)}
print(word2index)
index2word = {i:w for i, w in enumerate(vocab)}
print(index2word)

# Build the n-gram training data; each tuple is ([word_i-2, word_i-1], target word)
trigrams = [([sentence[i], sentence[i+1]], sentence[i+2]) for i in range(len(sentence)-2)]
print(trigrams[0])


# Model hyperparameters
CONTEXT_SIZE = 2
EMBEDDING_DIM = 10

# Define the model
class NGramLanguageModler(nn.Module):

    def __init__(self, vocab_size, context_size, embedding_dim, hidden_dim):
        super(NGramLanguageModler, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)             # C: word feature matrix
        self.linear1 = nn.Linear(context_size * embedding_dim, hidden_dim)   # H, d: word features -> hidden
        self.linear2 = nn.Linear(context_size * embedding_dim, vocab_size)   # W: direct word features -> output
        self.linear3 = nn.Linear(hidden_dim, vocab_size)                      # U, b: hidden -> output

    def forward(self, inputs):
        embeds = self.embedding(inputs).view(1, -1)         # x: concatenated word features, eq. (3.3)
        out = torch.tanh(self.linear1(embeds))              # tanh(d + Hx)
        out = self.linear3(out) + self.linear2(embeds)      # y = b + Wx + U tanh(d + Hx), eq. (3.2); raw logits
        return out

losses = []
loss_function = nn.CrossEntropyLoss()
model = NGramLanguageModler(len(vocab), CONTEXT_SIZE, EMBEDDING_DIM, 128)
optimizer = optim.SGD(model.parameters(), lr = 0.001)

for epoch in range(10):
    total_loss = 0

    for context, target in trigrams:
        # Step 1. Prepare the inputs to be passed to the model
        context_idx = torch.tensor([[word2index[w]] for w in context], dtype=torch.long)

        # Step 2. Before passing in a new instance, you need to zero out the gradients from the old instance
        model.zero_grad()

        # Step 3. Run forward pass
        out = model(context_idx)

        # Step 4. Compute your loss function.
        loss = loss_function(out, torch.tensor([word2index[target]], dtype=torch.long))

        # Step 5. Do the backward pass and update the parameters
        loss.backward()
        optimizer.step()

        # Get the Python number from a 1-element Tensor by calling tensor.item()
        total_loss += loss.item()
    
    losses.append(total_loss)

print(losses) # The loss decreased every iteration over the training data!

# Output
{'and', 'the', 'answer', 'gazed', 'besiege', 'To', "'This", 'mine', 'old', 'Thy', 'own', 'blood', 'now,', 'thy', 'say,', "youth's", 'worth', 'thriftless', 'of', 'Will', 'a', 'use,', 'thine', 'where', 'count,', 'Shall', 'Where', 'sum', 'much', "deserv'd", 'succession', 'new', 'held:', 'to', 'And', 'praise.', 'When', 'livery', 'all-eating', "beauty's", 'within', 'be', 'treasure', 'weed', 'How', 'deep', 'all', 'trenches', 'more', 'eyes,', "feel'st", 'beauty', 'sunken', 'forty', 'winters', 'This', 'shall', 'my', 'thou', 'proud', 'Proving', 'when', 'warm', 'dig', 'shame,', 'lusty', 'in', 'small', 'field,', 'an', 'it', 'couldst', 'make', 'thine!', "excuse,'", 'being', 'Then', 'art', 'brow,', 'see', 'cold.', 'fair', 'were', 'his', 'so', 'lies,', 'made', 'days;', 'child', 'If', 'on', 'praise', 'by', 'asked,', 'old,', "totter'd", 'Were'}
{'and': 0, 'the': 1, 'answer': 2, 'gazed': 3, 'besiege': 4, 'To': 5, "'This": 6, 'mine': 7, 'old': 8, 'Thy': 9, 'own': 10, 'blood': 11, 'now,': 12, 'thy': 13, 'say,': 14, "youth's": 15, 'worth': 16, 'thriftless': 17, 'of': 18, 'Will': 19, 'a': 20, 'use,': 21, 'thine': 22, 'where': 23, 'count,': 24, 'Shall': 25, 'Where': 26, 'sum': 27, 'much': 28, "deserv'd": 29, 'succession': 30, 'new': 31, 'held:': 32, 'to': 33, 'And': 34, 'praise.': 35, 'When': 36, 'livery': 37, 'all-eating': 38, "beauty's": 39, 'within': 40, 'be': 41, 'treasure': 42, 'weed': 43, 'How': 44, 'deep': 45, 'all': 46, 'trenches': 47, 'more': 48, 'eyes,': 49, "feel'st": 50, 'beauty': 51, 'sunken': 52, 'forty': 53, 'winters': 54, 'This': 55, 'shall': 56, 'my': 57, 'thou': 58, 'proud': 59, 'Proving': 60, 'when': 61, 'warm': 62, 'dig': 63, 'shame,': 64, 'lusty': 65, 'in': 66, 'small': 67, 'field,': 68, 'an': 69, 'it': 70, 'couldst': 71, 'make': 72, 'thine!': 73, "excuse,'": 74, 'being': 75, 'Then': 76, 'art': 77, 'brow,': 78, 'see': 79, 'cold.': 80, 'fair': 81, 'were': 82, 'his': 83, 'so': 84, 'lies,': 85, 'made': 86, 'days;': 87, 'child': 88, 'If': 89, 'on': 90, 'praise': 91, 'by': 92, 'asked,': 93, 'old,': 94, "totter'd": 95, 'Were': 96}
{0: 'and', 1: 'the', 2: 'answer', 3: 'gazed', 4: 'besiege', 5: 'To', 6: "'This", 7: 'mine', 8: 'old', 9: 'Thy', 10: 'own', 11: 'blood', 12: 'now,', 13: 'thy', 14: 'say,', 15: "youth's", 16: 'worth', 17: 'thriftless', 18: 'of', 19: 'Will', 20: 'a', 21: 'use,', 22: 'thine', 23: 'where', 24: 'count,', 25: 'Shall', 26: 'Where', 27: 'sum', 28: 'much', 29: "deserv'd", 30: 'succession', 31: 'new', 32: 'held:', 33: 'to', 34: 'And', 35: 'praise.', 36: 'When', 37: 'livery', 38: 'all-eating', 39: "beauty's", 40: 'within', 41: 'be', 42: 'treasure', 43: 'weed', 44: 'How', 45: 'deep', 46: 'all', 47: 'trenches', 48: 'more', 49: 'eyes,', 50: "feel'st", 51: 'beauty', 52: 'sunken', 53: 'forty', 54: 'winters', 55: 'This', 56: 'shall', 57: 'my', 58: 'thou', 59: 'proud', 60: 'Proving', 61: 'when', 62: 'warm', 63: 'dig', 64: 'shame,', 65: 'lusty', 66: 'in', 67: 'small', 68: 'field,', 69: 'an', 70: 'it', 71: 'couldst', 72: 'make', 73: 'thine!', 74: "excuse,'", 75: 'being', 76: 'Then', 77: 'art', 78: 'brow,', 79: 'see', 80: 'cold.', 81: 'fair', 82: 'were', 83: 'his', 84: 'so', 85: 'lies,', 86: 'made', 87: 'days;', 88: 'child', 89: 'If', 90: 'on', 91: 'praise', 92: 'by', 93: 'asked,', 94: 'old,', 95: "totter'd", 96: 'Were'}
(['When', 'forty'], 'winters')
[542.6012270450592, 536.4575519561768, 530.3622291088104, 524.314457654953, 518.3134853839874, 512.3586511611938, 506.44934606552124, 500.58502769470215, 494.7652368545532, 488.98955368995667]
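
As a quick follow-up (not part of the original script), the trained model and the dictionaries above can be used to predict the next word for a context taken from the training text. This hypothetical usage example assumes the script above has already been run, so model, word2index and index2word are in scope:

# Hypothetical usage example: predict the word following a context seen in training
with torch.no_grad():
    context = ['When', 'forty']
    context_idx = torch.tensor([[word2index[w]] for w in context], dtype=torch.long)
    logits = model(context_idx)
    print(index2word[logits.argmax(dim=1).item()])  # with enough training this should print 'winters'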

References

  1. Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, 1137-1155.