A Baseline Sentiment Analysis System Built on Yelp Review Data

Goal:

Build a simple sentiment analysis system that supports Aspect-Based Sentiment Analysis. Besides the overall sentiment of a review, it also extracts the user's opinion on each individual aspect of the product. A single review may mention several aspects. For example, from "I'm fairly happy with this product's battery, but it's too expensive!" we can derive 'battery': positive and 'price': negative.

Output:

Business Name: XXXXX

Overall Rating: X

Detailed Rating:

aspect1: {rating: XXX, pos: [XXX], neg: [XXX]}

aspect2: {rating: XXX, pos: [XXX], neg: [XXX]}

aspect3: {rating: XXX, pos: [XXX], neg: [XXX]}

aspect4: {rating: XXX, pos: [XXX], neg: [XXX]}

See the sample project output (quoted) at the end of this post for a concrete example.
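The output format above maps onto a nested dict. Below is a minimal sketch of assembling one record with the same keys as the sample output at the end of the post ('id', 'content', 'biz_name', 'stars', 'summary', 'aspect_pos_review', 'aspect_neg_review'). The function name build_record is hypothetical. The post does not explain how the per-aspect stars are computed, so they are simply passed in.

```python
def build_record(business_id, biz_name, overall_stars, aspects):
    """Assemble one output record shaped like the sample output.

    aspects: dict mapping aspect -> (stars, pos_snippets, neg_snippets).
    How the per-aspect stars are computed is left to the caller.
    """
    summary = {
        aspect: {
            "stars": stars,
            "aspect_pos_review": list(pos),
            "aspect_neg_review": list(neg),
        }
        for aspect, (stars, pos, neg) in aspects.items()
    }
    return {
        "id": business_id,
        "content": {
            "biz_name": biz_name,
            # The sample output stores the overall rating as a string, e.g. '4.0'.
            "stars": str(overall_stars),
            "summary": summary,
        },
    }
```

For example, json.dumps(build_record(...), indent=2) serializes a record for inspection.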

Process:

1. Preparation: dataset download / Python packages

Dataset: https://www.yelp.com/dataset/download

yelp_academic_dataset_business.json: describes a business, including its location, attributes, postal code and other information.

yelp_academic_dataset_review.json: one user's review of one business, including the review text and the star rating (stars).

Python: nltk (downloading the nltk corpora is best done through a proxy/VPN)
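Both Yelp files are in JSON Lines format, with one JSON object per line. The business file is only needed to turn a business_id into the Business Name shown in the output. Here is a minimal sketch of that lookup. It assumes the field names business_id and name from Yelp's business schema, and the function name load_business_names is hypothetical.

```python
import json

def load_business_names(path):
    """Map business_id -> name, reading the JSON Lines file one line at a time."""
    names = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            biz = json.loads(line)
            names[biz["business_id"]] = biz["name"]
    return names
```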

2. Design: project structure

--data/

        /yelp_academic_dataset_business.json

        /yelp_academic_dataset_review.json

--main.py: the main pipeline of the project

--model_trainning.py: trains and generates the classifier that predicts review polarity

--sentence.py: a class module that wraps and normalizes sentences (LEMMATIZER, ASPECT_EXTRACTOR, WORD_TOKENIZER, POS_tag)

--model.pickle: the serialized file of the trained classifier
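The file name model.pickle suggests the trained classifier is serialized with Python's pickle module. Below is a minimal sketch of saving it in model_trainning.py and loading it in main.py. The helper names save_model and load_model are hypothetical, and any picklable classifier object works. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

def save_model(model, path="model.pickle"):
    # Serialize the trained classifier once, right after training.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path="model.pickle"):
    # Deserialize it in the main pipeline instead of retraining on every run.
    with open(path, "rb") as f:
        return pickle.load(f)
```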


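3. Tricks: implementation notes and a few shortcuts

1. Memory management

The review file is large (about 9 GB), so you need to be careful about how you load it into memory. Otherwise memory leaks and out-of-memory errors will keep coming back.

In particular:

    1. Appending rows to a DataFrame one at a time with DataFrame.append(...) in a loop is extremely slow. (DataFrame.append was deprecated and then removed in pandas 2.0. Collect the rows first and build the DataFrame once, or combine chunks with pd.concat.)

    2. Forcing a DataFrame through np.array(...) consumes a lot of memory. Do not do it while loading the data.

    3. Python's garbage collector is not triggered by the size of objects. To keep memory from leaking, I call gc.collect() manually every 50,000 lines to free the string objects created inside the loop.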
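Below is a minimal, stdlib-only sketch of the loading strategy described above. It reads the JSON Lines file line by line, keeps only the fields that are needed, and hands rows over in batches. It also calls gc.collect() every 50,000 lines, as in point 3. The function name and the choice of fields are assumptions; the fields are business_id, stars and text from Yelp's review schema.

```python
import gc
import json

def iter_review_batches(path, batch_size=50_000, fields=("business_id", "stars", "text")):
    """Yield lists of review dicts, keeping only the needed fields."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            review = json.loads(line)
            batch.append({k: review[k] for k in fields})
            if len(batch) >= batch_size:
                yield batch
                batch = []
                # Mirrors the post's "gc.collect() every 50,000 lines".
                gc.collect()
    if batch:  # last, partially filled batch
        yield batch
```

Each batch can be turned into a DataFrame in a single call, pd.DataFrame(batch), and the pieces combined with pd.concat instead of appending row by row.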

2. Extracting the core entities: aspects

Extraction is rule-based: only the nouns and plural nouns in the reviews are extracted and used as aspect labels.
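Below is a minimal sketch of the noun rule. It assumes the input is already tokenized and tagged in the (word, tag) format returned by nltk.pos_tag, which uses Penn Treebank tags: NN is a singular noun and NNS a plural noun. Ranking candidates by frequency and keeping the top_k is also an assumption, because the post does not say how the final aspects are chosen. Aspects in the sample output such as 'chines' and 'breweri' look like Porter-stemmer output, so the nouns were probably stemmed as well.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS"}  # Penn Treebank: singular and plural common nouns

def extract_aspects(tagged_sentences, top_k=5):
    """Return the top_k most frequent nouns across (word, tag) sentences."""
    counts = Counter(
        word.lower()
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag in NOUN_TAGS and word.isalpha()  # skip tokens like "n't" or "1/2"
    )
    return [word for word, _ in counts.most_common(top_k)]
```

With nltk installed and its tokenizer and tagger data downloaded, the input could come from [nltk.pos_tag(nltk.word_tokenize(s)) for s in sentences].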

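3. Extracting the sub-text for each aspect from the reviews

Most reviews are long and mention several aspects, so the aspects have to be separated and the sentiment polarity of each one judged on its own. The sub-text for an aspect is extracted with a regular expression. It is the span between adjacent punctuation marks that contains the aspect word.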
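Below is a minimal sketch of this rule using Python's re module. It splits a review into clauses, where each clause is a run of non-punctuation characters plus the punctuation that closes it. It then keeps the clauses that contain the aspect word. Two details are assumptions: the punctuation set (. , ! ? ;) and matching the aspect as a word prefix, so 'taco' also matches 'tacos'. The snippets in the sample output also keep the preceding punctuation mark, which this version drops.

```python
import re

# One clause = a run of non-punctuation characters followed by the
# punctuation marks that close it (the last clause may have none).
CLAUSE_RE = re.compile(r"[^.,!?;]+[.,!?;]*")

def aspect_snippets(review, aspect):
    """Return the clauses of `review` that mention `aspect` (case-insensitive)."""
    # \b + aspect + \w* matches the aspect as a word prefix: "taco" also hits "tacos".
    aspect_re = re.compile(r"\b" + re.escape(aspect) + r"\w*", re.IGNORECASE)
    clauses = (clause.strip() for clause in CLAUSE_RE.findall(review))
    return [clause for clause in clauses if aspect_re.search(clause)]
```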

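4. Choosing the model

nltk's Naive Bayes classifier serves as the baseline, for three reasons:

1. It is cheap in both time and memory.

2. Without word or sentence embeddings, a statistical Bayesian method intuitively fits this project's modeling approach reasonably well.

3. Its experimental results are acceptable.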
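The post uses nltk's NaiveBayesClassifier. It is trained with nltk.NaiveBayesClassifier.train(labeled_featuresets), where each featureset is a dict such as {'great': True}, and queried with classify(featureset). To show the idea without depending on nltk, here is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing. It is not nltk's exact implementation. The class name and the 'pos'/'neg' labels are hypothetical. The post does not say how training labels were obtained; deriving them from review stars would be one option.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over token lists."""

    def train(self, docs):
        # docs: list of (tokens, label) pairs
        self.label_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        for tokens, label in docs:
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_scores(self, tokens):
        """Unnormalized log posterior for every label."""
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.label_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Working in log space avoids numeric underflow when a snippet contains many words.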

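4. Room for improvement / not yet implemented

1. Text cleaning: (1) a large amount of non-review text, such as links and emoji, is not cleaned; (2) spelling correction and normalization of words.

2. Aspect extraction: (1) unifying entities that refer to the same vague concept; (2) capturing gerund phrases.

3. Extracting the short text for each aspect: ideally the rule should take the longest sub-text between two adjacent aspects, delimited by common punctuation marks.

4. Taking into account the diversity of content in the positive and negative snippet sets under the same aspect.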
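Below is a minimal sketch that touches items 1 and 4. It strips links and emoji with regular expressions and collapses whitespace such as the \n\n visible in the sample output. It also drops duplicate snippets before they are listed under an aspect. The emoji code-point ranges are only a rough approximation of the Unicode emoji set. The function names are hypothetical. Lowercasing matches the all-lowercase snippets in the sample output.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji ranges (an approximation, not the complete Unicode emoji set).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
SPACE_RE = re.compile(r"\s+")

def clean_review(text):
    """Remove links and emoji, collapse whitespace, lowercase."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip().lower()

def dedupe(snippets):
    """Keep the first occurrence of each snippet, ignoring edge punctuation and case."""
    seen = set()
    unique = []
    for s in snippets:
        key = s.strip(" .,!?").lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(s)
    return unique
```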


Sample output

{
    'id': '--1UhMGODdWsrMastO9DZw',
    'content': {
        'biz_name': 'The Spicy Amigos',
        'stars': '4.0',
        'summary': {
            'taco': {
                'stars': 3.8747578276357726,
                'aspect_pos_review': [
                    '. we will definitely be coming back to get our taco fix!',
                    ', tacos,',
                    '. the shrimp especiale taco is unreal.',
                    ', tried the tacos,',
                    'we were in the mood for tacos,',
                    'i have been in search of good grilled steak tacos here in calgary for 2 years.'],
                'aspect_neg_review': [
                    ". the menus not extremely extensive so don't expect pages of choice but they have a great variety of tacos."]
            },
            'food': {
                'stars': 4.249734391600525,
                'aspect_pos_review': [
                    'if you are looking for authentic mexican street food,',
                    '.  very fresh and tasty mexican food.',
                    ', authentic mexican street food that gives appropriate portions relative to the prices.',
                    'great food,',
                    '! the decor is amazing and the food is to die for.',
                    ', but the food is still delicious.'],
                'aspect_neg_review': [
                    ". the sorry excuses for mexican food i've found in canada so far."]
            },
            'price': {
                'stars': 4.555049438951228,
                'aspect_pos_review': [
                    '.  a little on the pricey side but worth it.',
                    ', authentic mexican street food that gives appropriate portions relative to the prices.',
                    ', prices are great for real cooked food.',
                    ', service and price.',
                    ', authentic mexican food for a great price.'],
                'aspect_neg_review': [
                    '.  honestly i could have taken 1/2 home for later which is why i say good value for the price.']
            },
            'lunch': {
                'stars': 4.374453193350831,
                'aspect_pos_review': [
                    'fantastic spot for lunch with great value for your money.',
                    '! this spot opens at 11 and i can visualize it backed out the door for lunch.',
                    ", it might be a decent lunch spot but nothing spectacular and definitely wouldn't go again for dinner."],
                'aspect_neg_review': []
            }
        }
    }
}

{
    'id': '--6MefnULPED_I942VcFNA',
    'content': {
        'biz_name': "John's Chinese BBQ Restaurant",
        'stars': '3.0',
        'summary': {
            'pork': {
                'stars': 3.4998970618511223,
                'aspect_pos_review': [
                    'the bbq pork is very juicy and i only come here for that.',
                    ', is their bbq pork.',
                    '. the signature roasted pork was juicy and moist with the sweet tangy taste.',
                    ', minced pork meat pie,',
                    'if you want a quick fix for a scrumptious char siu or chinese pork bbq,',
                    ', the best roast pork and bbq pork on highway 7.'],
                'aspect_neg_review': [
                    'service with boss lady is horrible but the bbq pork is really tasty!']
            },
            'bbq': {
                'stars': 3.3527439562378683,
                'aspect_pos_review': [
                    'the bbq pork is very juicy and i only come here for that.',
                    '.  the decor itself is your usual chinese bbq house/restaurant and is nothing to go crazy over,',
                    'if you want a quick fix for a scrumptious char siu or chinese pork bbq,',
                    ', the best roast pork and bbq pork on highway 7.',
                    ". they're well known for the bbq pork,",
                    '.  the honey deep fried oysters and bbq pork were excellent too.'],
                'aspect_neg_review': [
                    'service with boss lady is horrible but the bbq pork is really tasty!',
                    'i walked by the restaurant more than 5 years ago when i witnessed from the window one of the employees drop a bbq chicken wing on the floor,']
            },
            'place': {
                'stars': 2.9998965552911967,
                'aspect_pos_review': [
                    '. the place was a little outdated,',
                    ', this is theeee place.',
                    '.this place has,',
                    'this place is a restaurant and a chinese bbq restaurant.',
                    '.  for the prices this place charges,',
                    '. compare to other places that sell bbq dishes,'],
                'aspect_neg_review': []
            },
            'chines': {
                'stars': 2.9992501874531365,
                'aspect_pos_review': [
                    'if you want a quick fix for a scrumptious char siu or chinese pork bbq,',
                    'this place is a restaurant and a chinese bbq restaurant.',
                    ', the chinese spare rib with onion,',
                    '. beef with gai lan (chinese broccoli?'],
                'aspect_neg_review': []
            }
        }
    }
}

{
    'id': '--7zmmkVg-IMGaXbuVd0SQ',
    'content': {
        'biz_name': 'Primal Brewery',
        'stars': '4.0',
        'summary': {
            'beer': {
                'stars': 3.9826899536214895,
                'aspect_pos_review': [
                    '. primal had great beer,',
                    ', fantastic beer (try the grim creeper!',
                    "if you're a harry potter fan then you will enjoy their variety of butter beers.",
                    ". \n\ni wasn't in the mood for beer,",
                    'the hubby and i stopped in for a quick beer before going home.',
                    ', the beer.'],
                'aspect_neg_review': [
                    "the beer is horrible and dave doesn't know the difference between a cat and a dog.",
                    '.  i sat at the bar and asked about the type of beer i prefer.']
            },
            'breweri': {
                'stars': 3.817834742296155,
                'aspect_pos_review': [
                    ', especially if you combine it with a trip to some of the breweries in cornelius.',
                    ". it's also conveniently located not far up the same road from crafty beer guys where you can find some of primal's and other local breweries on tap.",
                    'on a recent tour of lake norman area breweries,',
                    '. \n\nalso i love it when breweries have "activities".'],
                'aspect_neg_review': []
            },
            'food': {
                'stars': 3.6664629742792063,
                'aspect_pos_review': [
                    '.\n\nthey have a food truck on site,',
                    '.  there are often a few food trucks present,',
                    ", out front there's a few umbrella-ed tables and a slew of full-sun seats over by the food truck/corn hole arena if you're so inclined.",
                    "! the fried pickle chips at the food truck outside were some of the best i've ever had (i've had a lot).",
                    '. there was no food truck when we were there but it was raining.',
                    ', food trucks,'],
                'aspect_neg_review': []
            },
            'place': {
                'stars': 3.9998571479590015,
                'aspect_pos_review': [
                    'this is exactly the type of place that huntersville has always needed.',
                    '. a little popcorn as i sit by the fireplace drinking my beer!',
                    '. relaxing atmosphere with a fire place!',
                    ', as they have a delightful fireplace and sitting area.',
                    ', a firepit and indoor fireplace,',
                    'primal brewery is a quaint and small place up in huntersville/cornelius.'],
                'aspect_neg_review': [
                    "i'm a tad reluctant to write a review as i run the risk of spoiling one of the things i love about the place,",
                    "i've been meaning to write a review for this place for a while but now seems fitting."]
            }
        }
    }
}

{
    'id': '--9QQLMTbFzLJ_oT-ON3Xw',
    'content': {
        'biz_name': 'Great Clips',
        'stars': '3.0',
        'summary': {
            'hair': {
                'stars': 2.8459349280824555,
                'aspect_pos_review': [
                    '. affordable haircuts and pleasant hair cutters.',
                    ', i was greeted by the hair stylist and she asked,'],
                'aspect_neg_review': [
                    '. would it be ok for me to bring her in to get her hair washed and cut?',
                    ". ask for a hair cut i got seated at stephanie's corals chair.",
                    ". ask for a hair cut i got seated at stephanie's corals chair."]
            },
            'cut': {
                'stars': 2.7998133457769483,
                'aspect_pos_review': [
                    '. affordable haircuts and pleasant hair cutters.'],
                'aspect_neg_review': [
                    ". she told me that they do not cut extensions and that's great clips policy .",
                    'haircut was good and stylist nice but the manager was unpleasant from the time i stepped in until i left.',
                    '. would it be ok for me to bring her in to get her hair washed and cut?',
                    ", blah blah) she seemed uninterested on my answers (it's fine in a boring person) but she would get upset when i keep asking to cut more on the top,",
                    ". ask for a hair cut i got seated at stephanie's corals chair.",
                    ". ask for a hair cut i got seated at stephanie's corals chair."]
            },
            'time': {
                'stars': 2.3330741028774584,
                'aspect_pos_review': [
                    'tried this place one more time.'],
                'aspect_neg_review': [
                    'haircut was good and stylist nice but the manager was unpleasant from the time i stepped in until i left.',
                    'service was quick even though it said waiting time to be 30 mins,',
                    '. she slaps some shampoo on the top of my head several times in the same spot and barely moved it around with her finger tips as if she was grossed out by me.',
                    '. she slaps some shampoo on the top of my head several times in the same spot and barely moved it around with her finger tips as if she was grossed out by me.']
            },
            'didnt': {
                'stars': 0.0,
                'aspect_pos_review': [],
                'aspect_neg_review': []
            }
        }
    }
}

© Copyright belongs to the author. For reposting or content collaboration, please contact the author.
Platform statement: The content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.