## 1) What is Sentiment Analysis?
Sentiment analysis, also known as tendency analysis, opinion extraction, opinion mining, sentiment mining, or subjectivity analysis, is the process of analyzing, processing, summarizing, and reasoning over subjective, emotionally colored text. For example, from review text we can analyze a user's sentiment toward attributes of a digital camera such as zoom, price, size, weight, flash, and ease of use.
Why does sentiment analysis matter? Some practical applications give an intuitive answer:
- **Movie:** is this review positive or negative?
- **Products:** what do people think about the new iPhone?
- **Public sentiment:** how is consumer confidence? Is despair increasing?
- **Politics:** what do people think about this candidate or issue?
- **Prediction:** predict election outcomes or market trends from sentiment
The main goal of sentiment analysis is to identify people's views of and attitudes toward things or persons (attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons). The main participants are:
- **Holder (source)** of attitude: the opinion holder
- **Target (aspect)** of attitude: the object being evaluated
- **Type of attitude:** either from a set of types (like, love, hate, value, desire, etc.), or, more commonly, a simple weighted polarity (positive, negative, neutral) together with a strength
- **Text** containing the attitude: usually a sentence or an entire document
Finer-grained, deeper analysis also covers evaluated aspects, sentiment/polarity words, evaluation collocations, and so on.
The sentiment analysis tasks we typically face fall into the following categories:
- **Simplest task:** is the attitude of this text positive or negative?
- **More complex:** rank the attitude of this text from 1 to 5
- **Advanced:** detect the target, source, or complex attitude types
The following sections use the simplest task as the running example.
## 2) A Baseline Algorithm
This section uses sentiment analysis of movie reviews to walk through a simple, practical sentiment analysis system. For details, see the papers: Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86; and Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.
The task is polarity detection: is an IMDB movie review positive or negative? The dataset is *Polarity Data 2.0*: http://www.cs.cornell.edu/people/pabo/movie-review-data.
The authors treat sentiment analysis as a classification task and split it into the following subtasks:
**Tokenization:** extract the body text; filter out dates, phone numbers, and the like; keep strings that start with a capital letter; keep emoticons; and split the text into tokens.
**Feature Extraction:** intuitively we might expect adjectives alone to determine a text's sentiment, but Pang and Lee's experiments show that using all words (unigrams) as features yields better sentiment classification.
Negated sentences need special handling: "I didn't like this movie" and "I really like this movie" differ by only one unigram yet have opposite meanings. To handle this, Das and Chen (2001) proposed: "Add NOT_ to every word between negation and following punctuation". Under this rule, "didn't like this movie , but I" becomes "didn't NOT_like NOT_this NOT_movie , but I".
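The Das and Chen (2001) rule can be sketched in a few lines of Python (a minimal sketch; the tokenizer and the list of negation words are simplified assumptions):

```python
import re

# A small, illustrative negation list; a real system would use a fuller one.
NEGATIONS = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't"}

def add_not_tags(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation."""
    tagged, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,!?;:]", tok):   # punctuation closes the negation scope
            negating = False
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True
    return tagged

print(add_not_tags(["didn't", "like", "this", "movie", ",", "but", "I"]))
```

This reproduces the transformation in the example above: "like", "this", and "movie" become NOT_-tagged, while "but I" after the comma is left untouched.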
Also, when extracting features, intuition says that word occurrence may matter more than word frequency: the most informative sentiment words often appear only once in a text fragment, so a frequency model helps little and can even hurt. Replacing the multinomial event space with a multivariate Bernoulli event space reflects this, and experiments confirm it. The paper therefore uses binary features, i.e., whether a word occurs or not, instead of traditional frequency features. Using log(freq(w)) is another way worth trying to damp the influence of raw frequency.
**Classification using different classifiers:** e.g., Naïve Bayes, MaxEnt, and SVM. Taking the Naïve Bayes classifier as an example, training estimates the prior P(c) = N_c / N and the add-1 smoothed likelihood P(w|c) = (count(w, c) + 1) / (count(c) + |V|); prediction then picks the class c that maximizes P(c) · ∏ P(w_i | c) over the words of the document.
Experiments show that MaxEnt and SVM achieve better results than Naïve Bayes.
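The baseline can be sketched as a binarized (Boolean) Naïve Bayes with add-1 smoothing (a minimal sketch on made-up toy reviews, not the paper's exact pipeline):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Binarize each doc, then count."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)           # word_counts[c][w] = doc count of w in class c
    vocab = set()
    for tokens, label in docs:
        for w in set(tokens):                    # binarization: presence, not frequency
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict_nb(tokens, label_counts, word_counts, vocab):
    n_docs = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for c in label_counts:
        lp = math.log(label_counts[c] / n_docs)  # log prior P(c)
        total = sum(word_counts[c].values())
        for w in set(tokens):
            if w in vocab:                       # add-1 smoothed log P(w|c)
                lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Hypothetical toy training data, for illustration only.
docs = [(["great", "fun", "film"], "pos"),
        (["loved", "great", "acting"], "pos"),
        (["boring", "bad", "film"], "neg"),
        (["bad", "awful", "plot"], "neg")]
model = train_nb(docs)
print(predict_nb(["great", "film"], *model))  # → pos
```

The only difference from a standard multinomial Naïve Bayes is the `set(tokens)` binarization, which implements the occurrence-over-frequency observation above.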
Finally, reviewing failure cases, what makes movie-review sentiment classification hard?
Subtle, understated language: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."; "She runs the gamut of emotions from A to B".
Thwarted expectations: a review first sets up high expectations (sparing no praise) and then expresses final disappointment, e.g., "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."; "Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised."
## 3) Sentiment Lexicons
Sentiment analysis models rely heavily on sentiment lexicons for extracting features or writing rules. Below are some popular, mature, openly available sentiment lexicon resources:
**GI (The General Inquirer):** this lexicon gives very comprehensive information for each entry, such as part of speech, antonyms, and polarity. For details, see: Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.
**LIWC (Linguistic Inquiry and Word Count):** this lexicon describes the patterns of different categories of sentiment words through a large set of regular expressions; its category system is largely consistent with GI (The General Inquirer). For details, see: Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX.
**MPQA Subjectivity Cues Lexicon:** contains 2718 positive words and 4912 negative words. For details, see: Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005; and Riloff and Wiebe (2003). Learning Extraction Patterns for Subjective Expressions. EMNLP-2003.
**Bing Liu Opinion Lexicon:** contains 2006 positive words and 4783 negative words. Notably, the lexicon includes not only standard words but also misspellings, morphological variants, slang, and social media markers. For details, see: Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
**SentiWordNet:** assigns sentiment classes to WordNet entries, annotating each with weights for the positive and negative classes. For details, see: Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010.
With this list of lexicon resources available, how do we choose the right one? One approach is to compare how the same entry is classified in different lexicons and measure the degree of disagreement between resources.
For entries whose labels disagree across lexicons, we can do at least two things: first, review these entries and correct them with a small amount of manual effort; second, collect the entries whose polarity is genuinely ambiguous.
Given a word, how do we determine the probability that it appears in text of a given sentiment class? Taking IMDB reviews under different ratings as an example, the simplest method is to compute the word's frequency in the reviews at each score (number of stars), e.g., the distribution of Count("bad") across ratings.
More commonly, the likelihood is used: P(w|c) = f(w, c) / Σ_w f(w, c). To make the probabilities of different words comparable across classes, the scaled likelihood P(w|c) / P(w) is usually used instead.
Listing the scaled likelihoods of words under the different classes lets us read off each word's polarity.
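The likelihood and scaled likelihood can be computed as follows (a minimal sketch; the per-class word counts here are made up for illustration):

```python
from collections import Counter

# Hypothetical word counts per sentiment class.
counts = {
    "pos": Counter({"good": 80, "bad": 5, "film": 50}),
    "neg": Counter({"good": 20, "bad": 60, "film": 45}),
}

def likelihood(w, c):
    """P(w|c) = f(w,c) / sum over all words of f(w',c)."""
    return counts[c][w] / sum(counts[c].values())

def scaled_likelihood(w, c):
    """P(w|c) / P(w): values above 1 mean w is over-represented in class c."""
    total = sum(sum(cnt.values()) for cnt in counts.values())
    p_w = sum(cnt[w] for cnt in counts.values()) / total
    return likelihood(w, c) / p_w

print(round(scaled_likelihood("bad", "neg"), 2))
```

With these toy counts, "bad" has a scaled likelihood well above 1 in the negative class and "good" above 1 in the positive class, which is exactly the polarity signal described above.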
A common question: are negation words (such as not, n't, no, never) more likely to appear in negative-sentiment text? Potts, Christopher (2011) answered this experimentally: there is more negation in negative sentiment.
## 4) Learning Sentiment Lexicons
While we are fortunate to have so many public sentiment lexicons, it is still worth understanding how they are built. On the one hand, a new sentiment analysis problem may require building or extending a lexicon for the task at hand; on the other hand, mature lexicon construction methods can be applied to other domains, since many of these techniques transfer across fields.
The common way to build a sentiment lexicon is semi-supervised bootstrapping, which has two parts:
- Use a small amount of information (the seed): a few labeled examples and/or a few hand-built patterns
- Bootstrap a lexicon from that seed
Next, several papers illustrate concrete methods for building sentiment lexicons.
**1. Hatzivassiloglou & McKeown:** see Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174-181. The method rests on a linguistic observation: adjectives conjoined by "and" have the same polarity, while adjectives conjoined by "but" do not. For example:
- fair **and** legitimate, corrupt **and** brutal
- *fair and brutal, *corrupt and legitimate (the * marks combinations that do not occur)
- fair **but** brutal
Hatzivassiloglou & McKeown (1997) proposed a bootstrapping-based learning method with four steps:
Step 1: label a seed set of 1336 adjectives (all occurring >20 times in a 21 million word WSJ corpus). The seed set contains 657 positive words (e.g., adequate, central, clever, famous, intelligent, remarkable, reputed, sensitive, slender, thriving, ...) and 679 negative words (e.g., contagious, drunken, ignorant, lanky, listless, primitive, strident, troublesome, unresolved, unsuspecting, ...).
Step 2: expand the seed set to conjoined adjectives.
Step 3: a supervised classifier assigns a "polarity similarity" to each word pair, resulting in a graph.
Step 4: clustering partitions the graph into two sets.
The output is a new sentiment lexicon; in the paper, the automatically mined entries are shown in bold:
Positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
Negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…
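The four steps above can be caricatured as label propagation over the conjunction graph. This is a toy sketch with made-up conjunction evidence and seeds: the paper uses a supervised similarity classifier and graph clustering, not this naive breadth-first spread.

```python
from collections import deque

# Hypothetical conjunction evidence mined from a corpus: (adj1, adj2, conjunction).
pairs = [("fair", "legitimate", "and"), ("corrupt", "brutal", "and"),
         ("fair", "brutal", "but"), ("helpful", "fair", "and")]
seeds = {"fair": +1, "corrupt": -1}   # +1 positive, -1 negative

def propagate(pairs, seeds):
    """Spread polarity over the conjunction graph:
    'and' links same-polarity adjectives, 'but' links opposite ones."""
    graph = {}
    for a, b, conj in pairs:
        sign = 1 if conj == "and" else -1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        w = queue.popleft()
        for nbr, sign in graph.get(w, []):
            if nbr not in polarity:                 # first label wins in this sketch
                polarity[nbr] = polarity[w] * sign
                queue.append(nbr)
    return polarity

print(propagate(pairs, seeds))
```

From the two seeds, "legitimate" and "helpful" inherit positive polarity through "and", while "brutal" is flipped to negative through "but".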
**2. Turney Algorithm:** see Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. The steps are:
Step 1: extract a phrasal lexicon from reviews, selecting candidate phrases with part-of-speech pattern rules.
Step 2: learn the polarity of each phrase. How do we assess a phrase's polarity? Intuitively, positive phrases co-occur more with "excellent" and negative phrases co-occur more with "poor", which turns the problem into measuring co-occurrence between terms. For this, pointwise mutual information (PMI) is used; it measures the association between two events x and y: PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ].
For two words, PMI(word1, word2) = log2 [ P(word1, word2) / (P(word1) P(word2)) ]. A common way to estimate this is to issue "word1", "word2", and "word1 NEAR word2" as search-engine queries and derive the probabilities from hit counts:

P(word) = hits(word) / N
P(word1, word2) = hits(word1 NEAR word2) / N²

which gives:

PMI(word1, word2) = log2 [ hits(word1 NEAR word2) / (hits(word1) × hits(word2)) ]
The polarity of a phrase is then computed as Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor") (other words of known polarity can be used in place of "excellent" and "poor").
Step 3: rate a review by the average polarity of its phrases.
On a dataset of 410 reviews from Epinions, of which 170 (41%) are negative and 240 (59%) positive, the Turney algorithm achieved 74% accuracy (against a 59% baseline that labels everything positive).
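The PMI-based polarity score can be sketched as follows (a minimal sketch; the hit counts are invented stand-ins for real search-engine results):

```python
import math

# Hypothetical search-engine hit counts over an N-page index.
N = 1_000_000
hits = {"great plot": 2000, "excellent": 50000, "poor": 40000,
        ("great plot", "excellent"): 120, ("great plot", "poor"): 10}

def pmi(w1, w2):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) P(w2)) ), with
    P(w) = hits(w)/N and P(w1, w2) = hits(w1 NEAR w2)/N^2."""
    p1, p2 = hits[w1] / N, hits[w2] / N
    p12 = hits[(w1, w2)] / N**2
    return math.log2(p12 / (p1 * p2))

def polarity(phrase):
    """Turney's semantic orientation: closeness to 'excellent' minus closeness to 'poor'."""
    return pmi(phrase, "excellent") - pmi(phrase, "poor")

print(polarity("great plot") > 0)  # → True
```

A phrase that appears near "excellent" far more often than near "poor" comes out positive, matching the intuition stated in Step 2.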
**3. Using WordNet to learn polarity:** see S.M. Kim and E. Hovy. 2004. Determining the Sentiment of Opinions. COLING 2004; and M. Hu and B. Liu. Mining and Summarizing Customer Reviews. In Proceedings of KDD, 2004. The method proceeds as follows:
- Create positive ("good") and negative seed-words ("terrible")
- Find synonyms and antonyms:
  - Positive set: add synonyms of positive words ("well") and antonyms of negative words
  - Negative set: add synonyms of negative words ("awful") and antonyms of positive words ("evil")
- Repeat, following chains of synonyms
- Filter
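The expansion loop can be sketched with toy synonym/antonym tables standing in for WordNet lookups (all word lists here are illustrative, not real WordNet data):

```python
# Toy synonym/antonym tables standing in for WordNet (entries are illustrative).
synonyms = {"good": ["well", "fine"], "well": ["fine"],
            "terrible": ["awful", "dreadful"], "awful": ["dire"]}
antonyms = {"good": ["evil"], "terrible": ["great"]}

def expand(pos_seeds, neg_seeds, rounds=2):
    """Grow the polarity sets: add synonyms of same-polarity words and
    antonyms of opposite-polarity words, repeating over synonym chains."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        pos |= {s for w in pos for s in synonyms.get(w, [])}
        pos |= {a for w in neg for a in antonyms.get(w, [])}
        neg |= {s for w in neg for s in synonyms.get(w, [])}
        neg |= {a for w in pos for a in antonyms.get(w, [])}
    return pos, neg

pos, neg = expand({"good"}, {"terrible"})
print(sorted(pos), sorted(neg))
```

Starting from single seeds, two rounds pull in "well" and "fine" via synonym chains and "evil" via the antonym link, mirroring the steps listed above; the final filtering step of the method is omitted here.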
All of these methods adapt well across domains and are robust. The basic idea can be summarized as "use seeds and semi-supervised learning to induce lexicons", i.e.:
- Start with a seed set of words ("good", "poor")
- Find other words that have similar polarity:
  - Using "and" and "but"
  - Using words that occur nearby in the same document
  - Using WordNet synonyms and antonyms