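Abstract: CNNs have achieved state-of-the-art results in sentence modeling and classification, but they process word vectors sequentially and ignore long-distance dependencies. To combine deep learning with sentence structure, the paper proposes a dependency-based convolution approach that uses tree-based n-grams rather than surface ones, and so captures non-local interactions between words.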
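CNNs have been applied to NLP problems such as sequence labeling (Collobert et al., 2011), semantic parsing (Yih et al., 2014), and search query retrieval (Shen et al., 2014). More recently, CNN-based sentence modeling (Kalchbrenner et al., 2014; Kim, 2014) has been applied to many classification problems, such as sentiment, subjectivity, and question-type classification. One limitation remains, however: CNNs were designed for pixel matrices, so they consider only consecutive, sequential n-grams and ignore long-distance dependencies such as negation, subordination, and wh-extraction.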
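In sentiment analysis, researchers have incorporated long-distance information from syntactic parse trees. Some report small improvements, while others report none.
The authors suspect the cause is data sparsity: in their experiments, tree n-grams are much sparser than surface n-grams. Word embeddings alleviate this problem.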
Dependency-based Convolution:
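Sequential convolution operates on the concatenation of the word vectors from the i-th word through the (i+j)-th word: x_{i,j} = x_i ⊕ x_{i+1} ⊕ ... ⊕ x_{i+j}.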

However, this operation cannot capture long-distance relationships unless the window is enlarged, and a larger window in turn causes data sparsity.
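To make the window limitation concrete, here is a minimal sketch that enumerates surface n-grams, with plain words standing in for word vectors. The example sentence and the helper name `surface_ngrams` are illustrative assumptions and do not come from the paper.

```python
def surface_ngrams(words, n):
    """Return every run of n consecutive words in the surface string."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Illustrative sentence (an assumption, not an example from the paper).
words = ["I", "did", "not", "like", "the", "movie"]

bigrams = surface_ngrams(words, 2)
trigrams = surface_ngrams(words, 3)
fourgrams = surface_ngrams(words, 4)

# "not" and "movie" are three positions apart, so they only share a window
# once the window spans four words.
together_in_3 = any("not" in g and "movie" in g for g in trigrams)
together_in_4 = any("not" in g and "movie" in g for g in fourgrams)
print(together_in_3, together_in_4)
```

Each extra word in the window multiplies the number of distinct n-grams a filter may see, which is the sparsity cost mentioned above.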
Convolution on Ancestor Paths:

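Each word is concatenated with its ancestors in the dependency tree (parent, grandparent, and so on) rather than with its surface neighbors. Applying a filter to every word's ancestor path yields one activation per word, and together these form the feature map of the sentence, c = [c_1, c_2, ..., c_n].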
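The following is a rough sketch of how a feature map could be computed over ancestor paths, not the authors' implementation. The `parent` array encoding, the zero padding for paths that reach the root early, the `tanh` activation, and all numeric values are assumptions made for illustration.

```python
import math


def ancestor_path(i, parent, k):
    """Indices of word i followed by up to k-1 ancestors (parent, grandparent, ...)."""
    path = [i]
    while len(path) < k and parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def convolve_on_ancestors(vectors, parent, filt, bias, k):
    """Apply one filter to every word's ancestor path and return the feature map."""
    d = len(vectors[0])
    pad = [0.0] * d  # assumption: zero-pad paths that hit the root before k words
    feature_map = []
    for i in range(len(vectors)):
        path = ancestor_path(i, parent, k)
        rows = [vectors[j] for j in path] + [pad] * (k - len(path))
        concat = [x for row in rows for x in row]  # concatenated word vectors
        score = sum(w * x for w, x in zip(filt, concat)) + bias
        feature_map.append(math.tanh(score))  # assumption: tanh non-linearity
    return feature_map


# Hand-written toy parse of "I did not like the movie"; parent[i] is the head
# of word i and None marks the root ("like").
parent = [3, 3, 3, None, 5, 3]
vectors = [[0.1, 0.2], [0.0, 0.1], [-0.5, 0.3], [0.4, 0.4], [0.0, 0.0], [0.2, -0.1]]
filt = [0.5, -0.5, 1.0, 1.0]  # one filter covering k = 2 words of dimension d = 2

feature_map = convolve_on_ancestors(vectors, parent, filt, bias=0.0, k=2)
print(len(feature_map))  # one activation per word
```

With `k=3`, the path for "the" would be "the", "movie", "like", a tree trigram that skips over nothing in the tree even when the words are far apart in the surface string.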


Max-Over-Tree Pooling and Dropout:
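Equation 4, where the filter is applied to the concatenated words, can be viewed as pattern detection: only the most similar pattern between the words and the filter returns the maximum activation.
In sequential CNNs, max-over-time pooling (Collobert et al., 2011; Kim, 2014) operates on the feature map and takes the maximum activation to represent the entire feature map.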

The paper's DCNNs likewise pool the maximum activation from the feature map.
To capture enough variation, the filters are randomly initialized so that they detect different structural patterns.
Each filter's height is the number of words it covers, and its width is the dimensionality d of the word representation.
Each filter is represented by a single feature after max-over-tree pooling. After a series of convolutions with filters of different heights, the resulting features, which carry different structural information, become the final representation of the input sentence.
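As a small illustration of the pooling step, the sketch below reduces each filter's feature map to its largest activation. The function name `max_over_tree_pool` and the activation values are made up for this example.

```python
def max_over_tree_pool(feature_maps):
    """Keep only the largest activation from each filter's feature map."""
    return [max(feature_map) for feature_map in feature_maps]


# Three hypothetical filters, each with one activation per word of a 4-word sentence.
feature_maps = [
    [0.1, -0.4, 0.9, 0.3],
    [-0.2, 0.2, 0.0, -0.7],
    [0.5, 0.5, 0.1, 0.6],
]

sentence_representation = max_over_tree_pool(feature_maps)
print(sentence_representation)  # [0.9, 0.2, 0.6]
```

The output has one entry per filter regardless of sentence length, which is what lets sentences of different lengths share the same classifier.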
Then, this sentence representation is passed to a fully connected softmax layer, which outputs a distribution over the labels.
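A minimal sketch of such an output layer follows, assuming a single weight row per label. The weights, biases, and the two-label setup are invented for illustration and are not values from the paper.

```python
import math


def softmax_layer(features, weights, biases):
    """Fully connected layer followed by softmax; returns one probability per label."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [0.9, 0.2, 0.6]                     # hypothetical pooled sentence representation
weights = [[1.0, 0.0, 1.0], [-1.0, 0.5, 0.0]]  # hypothetical: two labels, three features
biases = [0.0, 0.0]

probabilities = softmax_layer(features, weights, biases)
print(probabilities)
```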
Convolution on Siblings:
Ancestor paths alone cannot capture enough linguistic phenomena, such as conjunction. Inspired by higher-order dependency parsing (McDonald and Pereira, 2006; Koo and Collins, 2010), the paper also convolves over siblings.
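To illustrate what a sibling relation looks like in a dependency tree, the sketch below pairs each word with its nearest preceding sibling, meaning the closest earlier word that shares the same head. This is a simplified assumption that shows only one possible sibling pattern; the `parent` array is a hand-written toy parse and `sibling_pairs` is a hypothetical helper.

```python
def sibling_pairs(parent):
    """Pair each word with its nearest preceding sibling (same head), if it has one."""
    last_child = {}  # head index -> most recent child seen so far
    pairs = []
    for i, head in enumerate(parent):
        if head is None:  # the root has no siblings
            continue
        if head in last_child:
            pairs.append((last_child[head], i))
        last_child[head] = i
    return pairs


# Hand-written toy parse of "I did not like the movie"; None marks the root.
words = ["I", "did", "not", "like", "the", "movie"]
parent = [3, 3, 3, None, 5, 3]

pairs = sibling_pairs(parent)
print([(words[a], words[b]) for a, b in pairs])
```

In this toy parse, "not" and "movie" come out as siblings under "like" even though they are three words apart on the surface, a relation that neither a bigram window nor an ancestor path would pair up.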
Combined Model:
Structural information cannot fully cover sequential information. In addition, parsing errors directly affect DCNN performance, whereas sequential n-grams are always correctly observed.
The simplest way to combine the two is to concatenate these representations and feed the result into a fully connected softmax layer.
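The concatenation itself is simple, as the sketch below suggests. The feature values and the name `combine_representations` are illustrative assumptions, and the classifier that would consume the combined vector is omitted.

```python
def combine_representations(tree_features, sequential_features):
    """Concatenate pooled tree-based and sequential features into one vector."""
    return list(tree_features) + list(sequential_features)


tree_features = [0.9, 0.2, 0.6]    # hypothetical pooled features from tree-based filters
sequential_features = [0.4, -0.1]  # hypothetical pooled features from sequential filters

combined = combine_representations(tree_features, sequential_features)
print(len(combined))  # 5
```

The only consequence for the classifier is a wider input: its weight rows need one entry per feature from both models.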

Experimental results:
