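1. Introduction
I recently learned the basics of web scraping and, from a book, came across snownlp, a Python library for sentiment analysis. So I wrote a small crawler that collects the comments on one stock together with its daily price changes, and then looked at how the two are related.
2. Scraping the stock comments
I scraped the comments from the Guba stock forum on Eastmoney. The listing also contains some official news posts, and snownlp tends to give those a fairly high sentiment score, so they skew the result a little. The effect is small, though, and the final result still shows what I was looking for.
The code is as follows: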
import requests
from lxml import etree
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
}
list_text = []
list_time = []
# Scrape the comments on each listing page (pages 1 to 28)
for page in range(1, 29):
    url = 'http://guba.eastmoney.com/list,600026_' + str(page) + '.html'
    page_text = requests.get(url=url, headers=headers).text
    tree = etree.HTML(page_text)
    # Extract the comment titles
    list_text_span = tree.xpath('//*[@id="articlelistnew"]/div[@class="articleh normal_post"]/span[3]')
    for i in list_text_span:
        text = i.xpath('./a/@title')[0]
        list_text.append(text)
    # Extract the posting time of each comment
    list_span = tree.xpath('//*[@id="articlelistnew"]/div[@class="articleh normal_post"]')
    for i in list_span:
        post_time = i.xpath('./span[5]/text()')[0]
        list_time.append(post_time)
    print("Page " + str(page) + " scraped")
data = pd.DataFrame()
data['pl'] = list_text      # comment titles
data['time'] = list_time    # posting times
# Save the result to a CSV file
data.to_csv('600026.csv', index=False, encoding='utf_8_sig')
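Since official posts are mixed in with ordinary comments, one option is to drop them before scoring. The following is only a minimal sketch: the marker keywords and the helper names `is_official` and `drop_official` are assumptions made for illustration and are not part of the original crawler, so the list would need tuning against real Guba titles.

```python
# Hypothetical markers that hint at an official post; these are assumptions,
# not values taken from the article or from Eastmoney.
OFFICIAL_MARKERS = ("公告", "资讯", "研报", "新闻")


def is_official(title, markers=OFFICIAL_MARKERS):
    """Return True if the title contains any of the marker keywords."""
    return any(marker in title for marker in markers)


def drop_official(titles, times, markers=OFFICIAL_MARKERS):
    """Filter titles and times together so the two lists stay aligned."""
    kept = [(t, ts) for t, ts in zip(titles, times) if not is_official(t, markers)]
    kept_titles = [t for t, _ in kept]
    kept_times = [ts for _, ts in kept]
    return kept_titles, kept_times


# Made-up sample rows in the same shape as list_text / list_time
sample_titles = ["中远海能:2020年半年度报告公告", "今天又跌了,心态崩了", "明天肯定涨"]
sample_times = ["08-28 18:01", "08-28 14:30", "08-28 15:02"]
titles, times = drop_official(sample_titles, sample_times)
print(titles)
print(times)
```

Filtering the two lists as pairs keeps each title attached to its own timestamp, which matters because the later steps group the scores by date.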
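3. Getting the stock's data for roughly the last two months
I used pandas_datareader to download the stock data and computed the daily percentage change from the closing price. The change is multiplied by five, because otherwise its swings are too small to compare with the sentiment scores later on. The data is then saved to a file.
The code is as follows: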
import pandas_datareader.data as webdata
import datetime

# Date range of the data
start_day = datetime.datetime(2020, 8, 3)
end_day = datetime.datetime(2020, 10, 16)
# Look up the stock (600026) through Yahoo Finance
stock_code = input("Enter the stock code followed by .sz/.ss: ")
stock_info = webdata.get_data_yahoo(stock_code, start_day, end_day)
# Compute the daily change and scale it by five; otherwise the swings
# are too small to compare with the sentiment scores
stock_info['p_change'] = stock_info['Close'].pct_change() * 5
# Save the data (note: this file name differs from the comments file 600026.csv)
stock_info.to_csv('60026.csv', encoding='utf_8_sig')
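To make the scaled change explicit, here is a small pure-Python sketch of the same arithmetic: each day's change is (close - previous close) / previous close, then multiplied by five. The function names and the sample prices are made up for illustration; pandas' `pct_change` yields NaN for the first row, which is represented here as `None`.

```python
def pct_change(closes):
    """Day-over-day fractional change; the first day has no predecessor, so it is None."""
    if not closes:
        return []
    changes = [None]
    for prev, curr in zip(closes, closes[1:]):
        changes.append((curr - prev) / prev)
    return changes


def scaled_change(closes, factor=5):
    """Multiply each change by `factor` so that small moves are easier to see on a plot."""
    return [None if c is None else c * factor for c in pct_change(closes)]


# Made-up closing prices: +10% on day 2, then -10% on day 3
closes = [10.0, 11.0, 9.9]
print(scaled_change(closes))
```

The factor only stretches the curve vertically; it changes neither the direction of any daily move nor the correlation with another series.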
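4. Visual comparison
I used the snownlp library to score the sentiment of each comment, then read in the stock's price changes and plotted the two series together for comparison.
The code is as follows: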
import pandas as pd
import matplotlib.pyplot as plt
from snownlp import SnowNLP

# Read the scraped comments (columns: 'pl' = comment title, 'time' = posting time)
orig_comments = pd.read_csv('600026.csv')

# Compute a sentiment score for every comment
orig_comments['sentiment'] = orig_comments['pl'].apply(lambda text: SnowNLP(text).sentiments)

# Drop the clock time and keep only the month and day (MM-DD)
orig_comments['time'] = orig_comments['time'].str[0:5]

# Compute the average sentiment score per day
daily = orig_comments.groupby('time')['sentiment']
markByDay = pd.DataFrame()
markByDay['sentiment'] = daily.sum()
markByDay['count'] = daily.count()
markByDay['sentiment_mean'] = markByDay['sentiment'] / markByDay['count']

# Turn the MM-DD index into a full date string
markByDay['order'] = markByDay.index
markByDay['date'] = '2020-' + markByDay['order']

# Read the price changes of 600026 (same encoding as used when saving)
zyMarket = pd.read_csv('60026.csv', encoding='utf_8_sig')
Market = pd.DataFrame()
Market['date'] = zyMarket['Date']
Market['cosco_change'] = zyMarket['p_change']

# Use the date as the index of both tables, then join them
markByDay.set_index('date', inplace=True)
Market.set_index('date', inplace=True)
result = Market.join(markByDay)

# Plot the two series for comparison
plt.figure(figsize=(10, 8))
plt.rcParams['font.sans-serif'] = ['SimHei']  # only needed when labels contain Chinese text
plt.rcParams['axes.unicode_minus'] = False
plt.plot(result['cosco_change'], 'r-', label='COSCO price change (x5)', linewidth=3)
plt.plot(result['sentiment_mean'], 'b-', label='Average sentiment', linewidth=3)
plt.title('Comparison of the two series')
plt.xlabel('Trading date', fontsize=20)
plt.ylabel('Fluctuation', fontsize=20)
plt.legend()
plt.show()
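The per-day averaging in the listing above can also be illustrated without pandas. This is a minimal sketch under two assumptions: timestamps look like "MM-DD HH:MM", and every post falls in a single known year. The function name `daily_mean_sentiment` and the sample scores are made up for illustration.

```python
from collections import defaultdict


def daily_mean_sentiment(times, scores, year="2020"):
    """Group scores by calendar day and return {YYYY-MM-DD: mean score}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, score in zip(times, scores):
        day = f"{year}-{ts[:5]}"  # keep only MM-DD and prepend the assumed year
        totals[day] += score
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}


# Made-up timestamps and sentiment scores
times = ["10-16 09:31", "10-16 14:05", "10-15 20:11"]
scores = [0.9, 0.5, 0.2]
means = daily_mean_sentiment(times, scores)
print(means)
```

Because the timestamps carry no year, this approach would mislabel posts if the scraped pages spanned a year boundary.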
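The resulting plot is shown below:

[Figure: image.png]
From the plot, the swings in daily average sentiment appear to be related to the swings in the stock's daily price change.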
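To put a number on this visual impression, one could compute a Pearson correlation coefficient between the two daily series. The sketch below is not part of the original analysis: the `pearson` helper is written for illustration, the sample values are made up, and days where either series is missing are skipped. With pandas, `Series.corr` does a comparable job on two columns of the joined table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the positions where both values are present."""
    pairs = [
        (x, y)
        for x, y in zip(xs, ys)
        if x is not None and y is not None
        and not math.isnan(x) and not math.isnan(y)
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; NaN / None mark days without data
price_change = [float("nan"), 0.10, -0.05, 0.02, -0.08]
sentiment = [0.55, 0.70, 0.40, None, 0.35]
r = pearson(price_change, sentiment)
print(round(r, 3))
```

A value near +1 or -1 would indicate a strong linear relationship and a value near 0 a weak one; with only about two months of trading days, any such figure should be read cautiously.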