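Recently I have been using the Python library TextBlob to analyze the sentiment of e-commerce reviews. I ran into a few pitfalls along the way, so I am recording them here.
First, the official documentation: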
https://textblob.readthedocs.io/en/dev/
Basic usage, following the examples in the documentation, really is simple:
from textblob import TextBlob
train = [
('I love this sandwich.', 'pos'),
('this is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('this is my best work.', 'pos'),
("what an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('he is my sworn enemy!', 'neg'),
('my boss is horrible.', 'neg')
]
test = [
('the beer was good.', 'pos'),
('I do not enjoy my job', 'neg'),
("I ain't feeling dandy today.", 'neg'),
("I feel amazing!", 'pos'),
('Gary is a friend of mine.', 'pos'),
("I can't believe I'm doing this.", 'neg')
]
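from textblob.classifiers import NaiveBayesClassifier
# Train a naive Bayes classifier on the labelled sentences above
cl = NaiveBayesClassifier(train)
print(cl.classify("This is an amazing library!"))  # pos
# The test list above can be used to measure accuracy on unseen sentences
print(cl.accuracy(test))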
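All of this code comes from the official documentation. The classifier returns pos, meaning the sentence "This is an amazing library!" is judged positive.
However, for processing a large amount of text, relying on such a tiny training set is far too crude. So the documentation also offers a sentiment analyzer, NaiveBayesAnalyzer, which is trained on the movie_reviews corpus from NLTK and likewise has two classes, negative and positive. The code: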
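To see roughly why the classifier answers pos, here is a minimal stdlib-only sketch of Bernoulli naive Bayes over word-presence ("contains(word)") features with add-0.5 smoothing. It is my own simplified re-implementation, not TextBlob's code: TinyNaiveBayes and tokenize are hypothetical names, and the regex tokenizer is only a stand-in for TextBlob's. Its accuracy method plays the same role as cl.accuracy(test) on the test list.

```python
import math
import re
from collections import Counter, defaultdict

train = [
    ("I love this sandwich.", "pos"),
    ("this is an amazing place!", "pos"),
    ("I feel very good about these beers.", "pos"),
    ("this is my best work.", "pos"),
    ("what an awesome view", "pos"),
    ("I do not like this restaurant", "neg"),
    ("I am tired of this stuff.", "neg"),
    ("I can't deal with this", "neg"),
    ("he is my sworn enemy!", "neg"),
    ("my boss is horrible.", "neg"),
]
test = [
    ("the beer was good.", "pos"),
    ("I do not enjoy my job", "neg"),
    ("I ain't feeling dandy today.", "neg"),
    ("I feel amazing!", "pos"),
    ("Gary is a friend of mine.", "pos"),
    ("I can't believe I'm doing this.", "neg"),
]


def tokenize(text):
    """Set of lower-cased word tokens (a rough stand-in for TextBlob's tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


class TinyNaiveBayes:
    """Bernoulli naive Bayes over contains(word) features, for illustration only."""

    def __init__(self, train_set):
        docs = [(tokenize(text), label) for text, label in train_set]
        self.vocab = set().union(*(tokens for tokens, _ in docs))
        self.doc_count = Counter(label for _, label in docs)
        # label -> word -> number of training documents containing the word
        self.word_count = defaultdict(Counter)
        for tokens, label in docs:
            self.word_count[label].update(tokens)
        self.total = len(docs)

    def _log_score(self, tokens, label):
        n = self.doc_count[label]
        # Add-0.5 smoothed prior P(label)
        score = math.log((n + 0.5) / (self.total + 0.5 * len(self.doc_count)))
        for word in self.vocab:
            # Add-0.5 smoothed P(word present | label); absent words count too
            p = (self.word_count[label][word] + 0.5) / (n + 1)
            score += math.log(p if word in tokens else 1 - p)
        return score

    def classify(self, text):
        tokens = tokenize(text)
        return max(self.doc_count, key=lambda label: self._log_score(tokens, label))

    def accuracy(self, test_set):
        hits = sum(self.classify(text) == label for text, label in test_set)
        return hits / len(test_set)


cl = TinyNaiveBayes(train)
print(cl.classify("This is an amazing library!"))  # pos
print(round(cl.accuracy(test), 2))  # 0.83
```

In this sketch the words "an" and "amazing" occur only in positive training sentences, which is mostly what tips the example toward pos; "library" never appears in training, so it is simply ignored.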
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob("I love this library", analyzer=NaiveBayesAnalyzer())
blob.sentiment
# Sentiment(classification='pos', p_pos=0.7996209910191279, p_neg=0.2003790089808724)
But running it raised an error telling me to run:
import nltk
nltk.download('movie_reviews')
Running that, however, produces this error:
[nltk_data] Error loading movie_reviews: <urlopen error [WinError
[nltk_data] 10054] An existing connection was forcibly closed by the remote host.>
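This means the server hosting the data cannot be reached. Searching online, I found a workaround that installs the data manually: https://blog.csdn.net/qq_37891889/article/details/104418106
One addition: to open the downloader window that post refers to, run nltk.download() in Python with no arguments. The NLTK Downloader window pops up; then extract the manually downloaded archive into the matching folder (for movie_reviews, the corpora folder inside the Download Directory shown in that window).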
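If you have already downloaded movie_reviews.zip by hand, the unzip step can also be scripted. Below is a minimal stdlib-only sketch; install_corpus_zip is a hypothetical helper name, and it assumes the archive contains a single top-level folder named after the package (movie_reviews/), which is how I understand NLTK's data packages are laid out. Point it at the Download Directory shown in the downloader window, or at one of the folders NLTK searches (for example nltk_data in your home directory).

```python
import zipfile
from pathlib import Path


def install_corpus_zip(zip_path, nltk_data_dir, subdir="corpora"):
    """Hypothetical helper: unzip a manually downloaded NLTK package
    (e.g. movie_reviews.zip) into <nltk_data_dir>/<subdir>/.

    Assumes the archive holds one top-level folder named after the
    package, so movie_reviews.zip -> <nltk_data_dir>/corpora/movie_reviews/.
    """
    target = Path(nltk_data_dir) / subdir
    target.mkdir(parents=True, exist_ok=True)  # create corpora/ if missing
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    package_dir = target / Path(zip_path).stem  # "movie_reviews"
    if not package_dir.is_dir():
        raise FileNotFoundError(f"expected {package_dir} after extraction")
    return package_dir


# Example (placeholder paths, adjust to your machine):
# install_corpus_zip(r"C:\Users\me\Downloads\movie_reviews.zip",
#                    Path.home() / "nltk_data")
```

The function fails loudly if the expected folder does not appear, which catches archives with an unexpected layout before NLTK reports a confusing LookupError later.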
Then run
from nltk.book import *
to load the data.
After that, the official example code runs normally.
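Strictly speaking, from nltk.book import * loads the NLTK book texts (Gutenberg, Genesis and so on) rather than movie_reviews. As far as I know, the direct check with NLTK is nltk.data.find('corpora/movie_reviews'), which raises LookupError when the corpus is missing. The stdlib-only sketch below approximates that lookup; find_corpus is a hypothetical helper, and passing nltk.data.path as the search roots is an assumption about how you would use it. It checks for both the extracted folder and the .zip, since NLTK can also read a corpus straight from its archive.

```python
from pathlib import Path


def find_corpus(name, search_roots, subdir="corpora"):
    """Return the first <root>/<subdir>/<name> folder or <name>.zip that
    exists, roughly where NLTK looks; None if the corpus is not installed."""
    for root in search_roots:
        base = Path(root) / subdir
        for candidate in (base / name, base / f"{name}.zip"):
            if candidate.exists():
                return candidate
    return None


# With NLTK installed you could pass its own search path, e.g.:
# import nltk
# print(find_corpus("movie_reviews", nltk.data.path))
```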
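That said, NaiveBayesAnalyzer is very slow. My own guess is that something is being executed repeatedly, but I could not find any documentation on it, so I suggest trying the other analyzer, PatternAnalyzer, instead. It is TextBlob's default analyzer and returns a polarity score in [-1, 1] and a subjectivity score in [0, 1] rather than pos/neg labels.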
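A plausible explanation for the slowness, based on my reading of TextBlob's source rather than its documentation: NaiveBayesAnalyzer trains an NLTK naive Bayes classifier on the whole movie_reviews corpus the first time an instance analyzes any text. Writing TextBlob(text, analyzer=NaiveBayesAnalyzer()) inside a loop therefore retrains for every review, which fits the "repeated execution" guess. The toy class below (LazyAnalyzer is hypothetical, and its keyword check is a placeholder, not a real model) only illustrates that lazy-training pattern and how reusing one instance avoids it.

```python
class LazyAnalyzer:
    """Toy stand-in (hypothetical, not TextBlob code) for an analyzer that
    trains itself the first time analyze() is called."""

    train_calls = 0  # class-level counter shared by every instance

    def __init__(self):
        self._trained = False

    def train(self):
        # Stands in for the expensive step, e.g. fitting on a whole corpus.
        LazyAnalyzer.train_calls += 1
        self._trained = True

    def analyze(self, text):
        if not self._trained:  # lazy training on first use
            self.train()
        # Placeholder "model": just a keyword check.
        return "pos" if "good" in text.lower() else "neg"


reviews = ["Good phone", "Battery died in a day", "Good value for money"]

# Pattern A: a fresh analyzer per review -> one training run per review.
LazyAnalyzer.train_calls = 0
labels_a = [LazyAnalyzer().analyze(r) for r in reviews]
per_text_calls = LazyAnalyzer.train_calls

# Pattern B: one shared analyzer -> a single training run.
LazyAnalyzer.train_calls = 0
shared = LazyAnalyzer()
labels_b = [shared.analyze(r) for r in reviews]
shared_calls = LazyAnalyzer.train_calls

print(per_text_calls, shared_calls)  # 3 1
print(labels_a == labels_b)  # True
```

With TextBlob itself, the equivalent fix would be to create a single NaiveBayesAnalyzer() and pass that same object to every TextBlob, or to build one Blobber(analyzer=NaiveBayesAnalyzer()) and call it on each review. The first call still pays the training cost once per process.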