Paper link: http://pdfs.semanticscholar.org/b28f/7e2996b6ee2784dd2dbb8212cfa0c79ba9e7.pdf
This paper proposes a deep memory network for aspect level sentiment classification. Unlike feature-based SVM and sequential neural models such as LSTM, the approach explicitly captures the importance of each context word and uses that information to predict the sentiment polarity of an aspect. It consists of multiple computational layers over an external memory.
Introduction:
Given a sentence and an aspect occurring in the sentence, this task aims at inferring the sentiment polarity (e.g., positive, negative, neutral) of the aspect.
The approach is data-driven and relies on neither a syntactic parser nor a sentiment lexicon. It consists of multiple computational layers with shared parameters.
Each layer is a content- and location-based attention model, which first learns the importance/weight of each context word and then utilizes this information to calculate the continuous text representation.
The text representation in the last layer is regarded as the feature for sentiment classification.
Deep memory network for aspect level sentiment classification:
1. Task definition and notation:
Given a sentence s = {w1, w2, ..., wi, ..., wn} consisting of n words and an aspect word wi occurring in s, aspect level sentiment classification aims at determining the sentiment polarity of sentence s towards the aspect wi. (In practice an aspect may span multiple words, e.g. "battery life"; to simplify the problem, the paper treats an aspect as a single word.)
2. An overview of the approach
The word vectors are split into two parts: the aspect representation and the context representation. If the aspect is a single word such as "food" or "service", the aspect representation is the embedding of that word; for a multi-word aspect such as "battery life", the aspect representation is the average of its word embeddings. This paper only considers the single-word case.
The context word vectors {e1, e2, ..., ei-1, ei+1, ..., en} are stacked and regarded as the external memory m, where n is the sentence length.
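A minimal sketch of the aspect/context split and the memory construction, assuming a pretrained embedding matrix E; all names and sizes here are illustrative, not the paper's code:

```python
import torch

d, vocab_size = 300, 10000          # embedding size and vocabulary size (illustrative)
E = torch.randn(vocab_size, d)      # pretrained word embeddings (stand-in values)

sentence = [12, 57, 804, 33, 9]     # word ids of s = {w1, ..., wn}
aspect_pos = 2                      # position i of the aspect word in the sentence

v_aspect = E[sentence[aspect_pos]]  # aspect representation: the aspect word's embedding
context_ids = sentence[:aspect_pos] + sentence[aspect_pos + 1:]
memory = E[torch.tensor(context_ids)]   # external memory m: stacked context vectors, (n-1, d)
```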

The approach consists of multiple layers (hops), each of which contains an attention layer and a linear layer.
In the first computational layer (hop), the aspect vector is taken as input, and the attention layer selects the important information from memory m. The output of the attention layer is summed with a linear transformation of the aspect vector, and the result serves as the input to the next layer. The output of the last hop is regarded as the representation of the sentence with regard to the aspect. The parameters of the attention and linear layers are shared across hops.
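A sketch of this multi-hop loop; content_attention here is only a stand-in (the paper's actual scoring function is sketched under Section 3 below), and all tensors are random placeholders:

```python
import torch
import torch.nn as nn

d, n_hops = 300, 3

# Stand-in attention: a weighted sum over memory pieces; the real
# scoring function of the paper is given in Section 3.
def content_attention(memory, x):
    alpha = torch.softmax(memory @ x, dim=0)            # relevance of each memory piece
    return (alpha.unsqueeze(1) * memory).sum(dim=0)     # weighted sum, shape (d,)

W_linear = nn.Linear(d, d)          # linear layer, shared across all hops

memory = torch.randn(7, d)          # external memory (placeholder values)
x = torch.randn(d)                  # aspect vector, input to the first hop
for _ in range(n_hops):
    x = content_attention(memory, x) + W_linear(x)      # attention output + linear transform
# x is the final representation of the sentence with regard to the aspect
```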

3. Content attention
The basic idea of the attention mechanism is that it assigns a weight/importance to each lower position when computing an upper level representation.
Input: an external memory m ∈ R^{d×k} and an aspect vector v_aspect ∈ R^{d×1}. The output is a weighted sum of the memory pieces:
vec = Σ_{i=1}^{k} α_i · m_i
where k is the memory size, α_i ∈ [0, 1] is the weight of m_i, and Σ_i α_i = 1. For each memory piece m_i, a feed-forward neural network is used to compute its semantic relatedness with the aspect. The scoring function is calculated as follows:
g_i = tanh(W_att · [m_i; v_aspect] + b_att), where W_att ∈ R^{1×2d} and b_att ∈ R^{1×1}
Having obtained {g1, g2, ..., gk}, the importance scores are computed with a softmax:
α_i = exp(g_i) / Σ_{j=1}^{k} exp(g_j)
The attention model has two advantages. One is that it can adaptively assign an importance score to each memory piece according to its semantic relatedness with the aspect. The other is that the attention model is differentiable, so the whole approach can be trained end-to-end.
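A sketch of the content attention layer under the equations above, assuming W_att and b_att are realized as a single nn.Linear over the concatenation [m_i; v_aspect]; the class name and shapes are illustrative:

```python
import torch
import torch.nn as nn

class ContentAttention(nn.Module):
    """g_i = tanh(W_att [m_i; v_aspect] + b_att); alpha = softmax(g); vec = sum_i alpha_i m_i."""
    def __init__(self, d):
        super().__init__()
        self.att = nn.Linear(2 * d, 1)                  # realizes W_att and b_att

    def forward(self, memory, v_aspect):                # memory: (k, d), v_aspect: (d,)
        k = memory.size(0)
        expanded = v_aspect.unsqueeze(0).expand(k, -1)  # repeat the aspect vector k times
        g = torch.tanh(self.att(torch.cat([memory, expanded], dim=1)))  # scores, (k, 1)
        alpha = torch.softmax(g, dim=0)                 # importance scores, sum to 1
        return (alpha * memory).sum(dim=0)              # weighted sum of memory pieces

att = ContentAttention(d=300)
out = att(torch.randn(7, 300), torch.randn(300))        # k = 7 context words -> (300,)
```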
4. Location Attention
Model 1: from End-to-End Memory Networks.
m_i = e_i ⊙ v_i
where ⊙ denotes element-wise multiplication and v_i ∈ R^{d×1} is the location vector of w_i; every element of v_i is calculated as:
v_i^k = (1 − l_i/n) − (k/d)(1 − 2 · l_i/n)
where n is the sentence length, k is the hop number, and l_i is the location of w_i.
Model 2: a simplified version of Model 1 that uses the same location vector across different hops:
v_i = 1 − l_i/n
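A sketch of the Model 1 position encoding (taking k as the hop number, as stated above) and the Model 2 simplification; the function names and indexing conventions are assumptions:

```python
import torch

def location_vector_model1(l_i, n, hop_k, d):
    # Every element of v_i: (1 - l_i/n) - (hop_k/d) * (1 - 2*l_i/n)
    value = (1 - l_i / n) - (hop_k / d) * (1 - 2 * l_i / n)
    return torch.full((d,), value)

def location_vector_model2(l_i, n, d):
    # Model 2: the same vector at every hop, v_i = 1 - l_i/n
    return torch.full((d,), 1 - l_i / n)

e_i = torch.randn(300)                                  # embedding of word w_i (stand-in)
m_i = e_i * location_vector_model1(l_i=3, n=10, hop_k=1, d=300)   # m_i = e_i (.) v_i
```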
Model 3:
m_i = e_i + v_i
The location vector is regarded as a model parameter. All the position vectors are stacked in a position embedding matrix, which is jointly learned with gradient descent.
Model 4:
Location vectors are also regarded as parameters. Different from Model 3, the location representation is treated as a neural gate that controls how much of the word semantics is written into memory: v_i is passed through a sigmoid function σ, and m_i is computed with element-wise multiplication:
m_i = e_i ⊙ σ(v_i)
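A sketch of Models 3 and 4 with a learned position embedding matrix; in practice each model would have its own parameters, and the names and sizes here are illustrative:

```python
import torch
import torch.nn as nn

max_len, d = 80, 300
pos_embedding = nn.Embedding(max_len, d)    # position embedding matrix, jointly learned

def memory_model3(e_i, l_i):
    # Model 3: the location vector is simply added to the word embedding
    return e_i + pos_embedding(torch.tensor(l_i))

def memory_model4(e_i, l_i):
    # Model 4: the location vector acts as a gate on the word semantics
    return e_i * torch.sigmoid(pos_embedding(torch.tensor(l_i)))

e_i = torch.randn(d)                        # embedding of word w_i (stand-in)
m3, m4 = memory_model3(e_i, 3), memory_model4(e_i, 3)
```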
5. The need for multiple hops
It is widely accepted that computational models that are composed of multiple processing layers have the ability to learn representations of data with multiple levels of abstraction, which is why the model stacks multiple hops. The network is trained in a supervised manner by minimizing the cross-entropy error of sentiment classification:
loss = − Σ_{(s,a)∈T} Σ_{c∈C} P_c^g(s, a) · log P_c(s, a)
where T is the set of all training instances, C is the set of sentiment categories,
(s, a) denotes a sentence-aspect pair,
P_c(s, a) is the probability of predicting the class of (s, a) as c, and P_c^g(s, a) is 1 or 0, indicating whether the gold label is c.
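A sketch of this objective with stand-in tensors, where each row of P is the predicted distribution over C for one (s, a) pair:

```python
import torch

# P: predicted probabilities P_c(s, a) for each pair, shape (|T|, |C|)
# gold: index of the gold category c for each pair
P = torch.tensor([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
gold = torch.tensor([0, 1])

# loss = - sum_{(s,a) in T} sum_{c in C} P_c^g(s,a) * log P_c(s,a)
loss = -torch.log(P[torch.arange(P.size(0)), gold]).sum()
```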