Sentiment Analysis of Stock Comments and Its Relationship to Stock Price Changes

1. Introduction

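I recently learned basic web scraping and came across snownlp, a Python sentiment-analysis library, in a book. So I wrote a scraper that collects the comments and the daily price changes for one stock, and then looked at how the two relate.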

2. Scraping the stock comments

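I scraped the comments from the Guba forum on Eastmoney. The post list also contains some official announcements, and snownlp gives these fairly high sentiment scores, which skews the result a little. The effect is small, though, and the analysis still produces the result I was after.
The code is as follows: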

import requests
from lxml import etree
import pandas as pd



headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
    }
list_text = []
list_time = []
# Scrape the comments on each list page
for page in range(1,29):
    url = 'http://guba.eastmoney.com/list,600026_' + str(page) + '.html'
    page_text = requests.get(url=url,headers=headers).text
    tree = etree.HTML(page_text)
    # Each post is one div; read the title and the time from the same div
    # so that list_text and list_time always stay the same length
    list_div = tree.xpath('//*[@id="articlelistnew"]/div[@class="articleh normal_post"]')
    for i in list_div:
        text = i.xpath('./span[3]/a/@title')
        post_time = i.xpath('./span[5]/text()')
        # Skip posts that are missing either field instead of raising IndexError
        if text and post_time:
            list_text.append(text[0])
            list_time.append(post_time[0])
    print("Page "+str(page)+" scraped")
data = pd.DataFrame()
#print(list_text)
data['pl'] = list_text
data['time'] = list_time
#print(list_time)
#print(data)
# Save the result to a CSV file
data.to_csv('600026.csv',index=False,encoding='utf_8_sig')
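
The official announcements mentioned above could be filtered out before the sentiment step. Here is a minimal sketch, assuming such posts can be recognised by a few keywords in the title. The keyword list and the helper names `is_official` and `drop_official` are illustrative assumptions, not something taken from the Guba page itself:

```python
# Hypothetical keywords that tend to mark announcements rather than user comments.
# This list is an assumption for illustration; tune it against the real titles.
OFFICIAL_KEYWORDS = ('公告', '研报', '快讯')


def is_official(title):
    """Return True if the title contains any of the assumed announcement keywords."""
    return any(keyword in title for keyword in OFFICIAL_KEYWORDS)


def drop_official(titles, times):
    """Keep only the (title, time) pairs whose title does not look official."""
    kept = [(t, ts) for t, ts in zip(titles, times) if not is_official(t)]
    kept_titles = [t for t, _ in kept]
    kept_times = [ts for _, ts in kept]
    return kept_titles, kept_times


# Made-up sample titles, only to show the call
sample_titles = ['今天要涨了', '中远海能:关于股东减持的公告', '跌得好惨']
sample_times = ['10-16 09:31', '10-15 18:02', '10-15 14:20']
titles, times = drop_official(sample_titles, sample_times)
print(titles)
```

In the scraper, the same call could be applied to `list_text` and `list_time` just before the DataFrame is built.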

3. Fetching the stock's data for the last two months

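I use pandas_datareader to fetch the stock data and then compute the daily percentage change. The change is multiplied by five, because otherwise its movement is too small to compare easily with the sentiment scores later. The data is then saved.
The code is as follows: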

import pandas_datareader.data as webdata
import datetime


# Date range for the data
start_day = datetime.datetime(2020,8,3)
end_day = datetime.datetime(2020,10,16)
# Query the stock data (600026) from Yahoo Finance
stock_code = input("Enter the stock code followed by .sz/.ss: ")
stock_info = webdata.get_data_yahoo(stock_code, start_day,end_day)
# Compute the daily change and scale it by five; otherwise it is too small to compare with the sentiment
stock_info['p_change'] = stock_info['Close'].pct_change()*5
#print(stock_info)
# Save the data
stock_info.to_csv('60026.csv',encoding='utf_8_sig')
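
To make the scaling step concrete, here is a plain-Python sketch of what `pct_change()*5` computes: each day's close is compared with the previous day's close, and the fractional change is multiplied by five. The function names and the closing prices are made up for illustration:

```python
def pct_change(closes):
    """Return the day-over-day fractional change; the first day has no previous close."""
    changes = [None]
    for prev, curr in zip(closes, closes[1:]):
        changes.append((curr - prev) / prev)
    return changes


def scale(changes, factor=5):
    """Multiply every change by the factor, leaving missing values as None."""
    return [None if c is None else c * factor for c in changes]


# Made-up closing prices for three trading days
closes = [10.0, 10.5, 10.29]
scaled = scale(pct_change(closes))
print(scaled)  # a 5% rise becomes about 0.25, a 2% fall about -0.1
```

The first entry is missing because there is no earlier close to compare with, which is also why the first row of `p_change` in the saved file is empty.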

4. Visual comparison

I use the snownlp library to score the sentiment of each comment, read in the stock's price changes, and plot the two for comparison.
The code is as follows:

import pandas as pd
import matplotlib.pyplot as plt
from snownlp import SnowNLP

# Load the scraped comments
orig_comments = pd.read_csv('600026.csv')
#print(orig_comments.head())

# Score each comment with SnowNLP (closer to 1 is more positive, closer to 0 more negative)
orig_comments['sentiment'] = orig_comments['pl'].astype(str).apply(lambda text: SnowNLP(text).sentiments)
#print(orig_comments.head())

# Drop the clock time from the time column and keep only month and day (MM-DD)
orig_comments['time'] = orig_comments['time'].str[:5]
#print(orig_comments)

# Compute the daily sentiment total, comment count, and average score
markByDay = orig_comments.groupby('time')['sentiment'].agg(['sum', 'count'])
markByDay.columns = ['sentiment', 'count']
markByDay['avg_sentiment'] = markByDay['sentiment']/markByDay['count']
#print(markByDay.head())

# Turn the MM-DD index into a full date string (the year 2020 is hard-coded)
markByDay['order'] = markByDay.index
markByDay['date'] = '2020-' + markByDay['order']
#print(markByDay)

# Load the price changes for 600026
zyMarket = pd.read_csv('60026.csv',encoding='utf_8_sig')
#print(zyMarket)
Market = pd.DataFrame()
Market['date'] = zyMarket['Date']
Market['cosco_change'] = zyMarket['p_change']
#print(Market)

# Use the date as the index of both tables and join them
markByDay.set_index('date',inplace=True)
Market.set_index('date',inplace=True)
result = Market.join(markByDay)
#print(result)

# Plot the two series for comparison
plt.figure(figsize=(10,8))
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.plot(result['cosco_change'],'r-', label='COSCO price change (x5)',linewidth=3)
plt.plot(result['avg_sentiment'],'b-', label='Average sentiment',linewidth=3)
plt.title('Comparison of the two series')
plt.xlabel('Trading date', fontsize=20)
plt.ylabel('Fluctuation', fontsize=20)
plt.legend()
plt.show()
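
The daily averaging step in the listing above can also be sketched in plain Python, which makes it easier to see what the grouping does. This is only an illustration: it assumes the scraped time strings start with `MM-DD`, as the slicing in the listing does, and the function name, sample times, and sample scores are made up:

```python
from collections import defaultdict


def daily_average(times, scores, year='2020'):
    """Average the scores per day; each time string is assumed to start with MM-DD."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, score in zip(times, scores):
        day = year + '-' + ts[:5]  # e.g. '10-16 09:31' -> '2020-10-16'
        totals[day] += score
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in sorted(totals)}


# Made-up comment times and sentiment scores
times = ['10-16 09:31', '10-16 10:02', '10-15 14:20']
scores = [0.9, 0.5, 0.2]
averages = daily_average(times, scores)
print(averages)
```

The keys come out as full `YYYY-MM-DD` strings, which is the form needed to match the `Date` column of the price file when the two tables are joined.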

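The result is as follows:

[Figure: image.png]

Judging from the plot, the swings in average comment sentiment are related to the swings in the stock's daily price change.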
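
A visual impression like this could be backed up with a number. Below is a minimal sketch of the Pearson correlation coefficient in plain Python; it skips days where either value is missing, such as trading days without comments. The function name and the sample values are made up for illustration and are not results from this analysis:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, ignoring pairs with a None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError('need at least two complete pairs')
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError('one of the series is constant')
    return cov / math.sqrt(var_x * var_y)


# Made-up values: scaled price change and average sentiment on four days
price_change = [None, 0.25, -0.10, 0.05]
avg_sentiment = [0.55, 0.70, 0.40, None]
print(pearson(price_change, avg_sentiment))
```

A value near 1 or -1 would indicate a strong linear relationship and a value near 0 a weak one. With the joined table, pandas' `Series.corr` computes the same statistic on two of its columns.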
