Overview
We are going to explain how to build an RNN model with LSTM cells to predict the prices of the S&P 500 index. The dataset can be downloaded from Yahoo! Finance. The example uses the S&P 500 data from January 3, 1950 (the earliest date that Yahoo! Finance goes back to) up to June 23, 2017. For simplicity, we only use the daily closing prices for prediction. Meanwhile, I will demonstrate how to use TensorBoard for easy debugging and model tracking.
About RNN and LSTM
RNNs are designed to handle sequence data. In a traditional neural network, signals flow from the input layer to the hidden layer and then to the output layer; adjacent layers are fully connected, while the nodes within one layer are not connected to each other. Such a plain network is powerless for many problems. For example, to predict the next word in a sentence you generally need the previous words, because the words in a sentence are not independent of each other. An RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes the earlier information and applies it to the computation of the current output: the nodes of the hidden layer are now connected across time steps, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, an RNN can process sequences of any length.
A Long Short-Term Memory network, usually just called an LSTM, is a special type of RNN. What distinguishes an LSTM from a plain RNN is that it adds a "processor" that judges whether a piece of information is useful or not; the structure this processor operates on is called a cell. Three gates are placed inside a cell: the input gate, the forget gate and the output gate. When information enters the LSTM network, it is checked against these rules; only the information that passes the check is kept, and the rest is discarded through the forget gate. This "one in, two out" mechanism sounds simple, yet under repeated computation it resolves the long-standing problem of long-term dependencies in neural networks. LSTM has been proven to be an effective technique for long-range dependency problems, and it generalizes so well that researchers have proposed many variants of it, which lets LSTM handle a wide variety of specialized problems.
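For reference, a standard LSTM cell can be summarized by the following gate equations. This is the common textbook formulation, added here for context rather than taken from the original post; [h_{t-1}, x_t] denotes the concatenation of the previous hidden state with the current input, and * denotes element-wise multiplication.

f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)      (forget gate)
i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i)      (input gate)
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)         (candidate cell state)
c_t = f_t * c_{t-1} + i_t * g_t                (cell state update)
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o)      (output gate)
h_t = o_t * tanh(c_t)                          (hidden state / cell output)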
Data Preparation
The stock prices form a time series of length N, defined as p_0, p_1, ..., p_{N-1}, where p_i is the closing price on day i, 0 ≤ i < N. We have a sliding window of a fixed size w (later referred to as input_size), and every time we move the window rightwards by w units, so that there is no overlap between the data in any two sliding windows.

We use the content of one sliding window to predict the next one, and there is no overlap between two consecutive windows.
We are going to build the RNN model with LSTM cells as the basic hidden units. We use the values from the very beginning, the first sliding window W_0, up to the window W_t at time t:

W_0 = (p_0, p_1, ..., p_{w-1})
W_1 = (p_w, p_{w+1}, ..., p_{2w-1})
...
W_t = (p_{tw}, p_{tw+1}, ..., p_{(t+1)w-1})

to predict the prices in the following window W_{t+1}:

W_{t+1} = (p_{(t+1)w}, p_{(t+1)w+1}, ..., p_{(t+2)w-1})

Essentially we try to learn an approximation function f(W_0, W_1, ..., W_t) ≈ W_{t+1}.
Unrolled RNN
Considering how back propagation through time (BPTT) works, we usually train an RNN in an "unrolled" version so that we do not have to propagate the computation too far back, which keeps the training tractable.
Here is the explanation of num_steps from the TensorFlow tutorial:
By design, the output of a recurrent neural network (RNN) depends on arbitrarily distant inputs. Unfortunately, this makes backpropagation computation difficult. In order to make the learning process tractable, it is common practice to create an “unrolled” version of the network, which contains a fixed number (num_steps) of LSTM inputs and outputs. The model is then trained on this finite approximation of the RNN. This can be implemented by feeding inputs of length num_steps at a time and performing a backward pass after each such input block.
The sequence of prices is first split into non-overlapping small windows. Each window contains input_size numbers, and each of them is treated as one independent input element. Then any num_steps consecutive input elements are grouped into one training input, forming an "unrolled" version of the RNN for training on TensorFlow. The corresponding label is the input element right after them.
For instance, if input_size = 3 and num_steps = 2, the first few training examples would look like:

Input1 = [[p_0, p_1, p_2], [p_3, p_4, p_5]], Label1 = [p_6, p_7, p_8]
Input2 = [[p_3, p_4, p_5], [p_6, p_7, p_8]], Label2 = [p_9, p_10, p_11]
Input3 = [[p_6, p_7, p_8], [p_9, p_10, p_11]], Label3 = [p_12, p_13, p_14]
Here is the key part of the data formatting:
# Split into non-overlapping windows, each containing `input_size` numbers
seq = [np.array(seq[i * self.input_size: (i + 1) * self.input_size]) for i in range(len(seq) // self.input_size)]
# Split into groups of `num_steps`
X = np.array([seq[i: i + self.num_steps] for i in range(len(seq) - self.num_steps)])
y = np.array([seq[i + self.num_steps] for i in range(len(seq) - self.num_steps)])
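As a quick sanity check, here is a self-contained sketch of the same formatting logic applied to a made-up price sequence (the toy numbers and standalone variable names are mine, not from the original code):

import numpy as np

input_size, num_steps = 3, 2
prices = list(range(15))  # toy "closing prices": 0, 1, ..., 14

# Split into non-overlapping windows of `input_size` numbers each
seq = [np.array(prices[i * input_size: (i + 1) * input_size])
       for i in range(len(prices) // input_size)]

# Group every `num_steps` consecutive windows into one training input;
# the label is the window that comes right after them.
X = np.array([seq[i: i + num_steps] for i in range(len(seq) - num_steps)])
y = np.array([seq[i + num_steps] for i in range(len(seq) - num_steps)])

print(X[0])  # the two windows [0 1 2] and [3 4 5]
print(y[0])  # [6 7 8]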
Train / Test Split
Since we always want to predict the future, we take the latest 10% of the data as the test data.
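A minimal sketch of this split, assuming X and y are the arrays produced by the formatting code above:

test_ratio = 0.1
train_size = int(len(X) * (1 - test_ratio))

# The earliest 90% is used for training; the latest 10% is held out as test data,
# so the model is always evaluated on data "from the future".
train_X, test_X = X[:train_size], X[train_size:]
train_y, test_y = y[:train_size], y[train_size:]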
Normalization
The S&P 500 index increases over time, so most of the values in the test set are out of the range of the training set, and the model would have to predict numbers it has never seen before. Unfortunately, that is not ideal.

To solve the out-of-scale issue, we normalize the prices in each sliding window. The task then becomes predicting the relative change rates instead of the absolute values. In the normalized sliding window W'_t at time t, all the values are divided by the last unknown price, which is the last price in W_{t-1}:

W'_t = (p_{tw} / p_{tw-1}, p_{tw+1} / p_{tw-1}, ..., p_{(t+1)w-1} / p_{tw-1})
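A minimal sketch of this normalization, assuming seq is the list of non-overlapping input_size-sized windows built in the data formatting step (the very first window has no preceding price to divide by, so it is simply dropped here):

# Divide every price in window t by the last price of window t-1,
# turning absolute prices into relative change rates.
normalized_seq = [curr / prev[-1] for prev, curr in zip(seq[:-1], seq[1:])]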
Model Construction
Definitions
- lstm_size: number of LSTM cells in one LSTM layer.
- num_layers: number of stacked LSTM layers.
- keep_prob: percentage of cell units kept in the dropout operation.
- init_learning_rate: the learning rate to start with.
- learning_rate_decay: decay ratio in later training epochs.
- init_epoch: number of epochs using the constant init_learning_rate.
- max_epoch: total number of epochs in training.
- input_size: size of the sliding window / one training data point.
- batch_size: number of data points used in one mini-batch.
The LSTM model has num_layers stacked LSTM layer(s), and each layer contains lstm_size LSTM cells. Then a dropout mask with keep probability keep_prob is applied to the output of every LSTM cell. The goal of dropout is to remove any potentially strong dependency on one dimension so as to prevent overfitting.
The training requires max_epoch epochs in total; an epoch is a single full pass over all the training data points. In one epoch, the training data points are split into mini-batches of size batch_size. We send one mini-batch to the model for one BPTT learning step. The learning rate is set to init_learning_rate during the first init_epoch epochs and then decays by learning_rate_decay during every succeeding epoch.
# Configuration is wrapped in one object for easy tracking and passing.
class RNNConfig():
    input_size = 1
    num_steps = 30
    lstm_size = 128
    num_layers = 1
    keep_prob = 0.8
    batch_size = 64
    init_learning_rate = 0.001
    learning_rate_decay = 0.99
    init_epoch = 5
    max_epoch = 50

config = RNNConfig()
Define the Graph
(1) Initialize a new graph first.
import tensorflow as tf
tf.reset_default_graph()
lstm_graph = tf.Graph()
(2) How the graph works should be defined within its scope.
with lstm_graph.as_default():
(3) Define the data required for computation. Here we need three input variables, all defined as tf.placeholder because we don't know what they are at the graph construction stage.
- inputs: the training data X, a tensor of shape (# data examples, num_steps, input_size); the number of data examples is unknown, so it is None. In our case, it would be batch_size in the training session. Check the input format example if confused.
- targets: the training label y, a tensor of shape (# data examples, input_size).
- learning_rate: a simple float.
# Dimension = (
# number of data examples,
# number of input in one computation step,
# number of numbers in one input
# )
# We don't know the number of examples beforehand, so it is None.
inputs = tf.placeholder(tf.float32, [None, config.num_steps, config.input_size])
targets = tf.placeholder(tf.float32, [None, config.input_size])
learning_rate = tf.placeholder(tf.float32, None)
(4) This function returns one LSTMCell with or without a dropout operation.
def _create_one_cell():
    lstm_cell = tf.contrib.rnn.LSTMCell(config.lstm_size, state_is_tuple=True)
    if config.keep_prob < 1.0:
        lstm_cell = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=config.keep_prob)
    return lstm_cell
(5) Let's stack the cells into multiple layers if needed. MultiRNNCell helps connect sequentially multiple simple cells to compose one cell.
cell = tf.contrib.rnn.MultiRNNCell(
    [_create_one_cell() for _ in range(config.num_layers)],
    state_is_tuple=True
) if config.num_layers > 1 else _create_one_cell()
(6) tf.nn.dynamic_rnn constructs a recurrent neural network specified by cell (an RNNCell). It returns a pair of (model outputs, state), where the outputs val is of size (batch_size, num_steps, lstm_size) by default. The state refers to the current state of the LSTM cell and is not consumed here.
val, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
(7) tf.transpose converts the outputs from the dimension (batch_size, num_steps, lstm_size) to (num_steps, batch_size, lstm_size). Then the last output is picked.
# Before transpose, val.get_shape() = (batch_size, num_steps, lstm_size)
# After transpose, val.get_shape() = (num_steps, batch_size, lstm_size)
val = tf.transpose(val, [1, 0, 2])
# last.get_shape() = (batch_size, lstm_size)
last = tf.gather(val, int(val.get_shape()[0]) - 1, name="last_lstm_output")
(8) Define weights and biases between the hidden and output layers.
weight = tf.Variable(tf.truncated_normal([config.lstm_size, config.input_size]))
bias = tf.Variable(tf.constant(0.1, shape=[config.input_size]))
prediction = tf.matmul(last, weight) + bias
(9) We use the mean squared error as the loss metric and the RMSPropOptimizer algorithm for gradient descent optimization.
loss = tf.reduce_mean(tf.square(prediction - targets))
optimizer = tf.train.RMSPropOptimizer(learning_rate)
minimize = optimizer.minimize(loss)
Start the Training Session
(1) To start training the graph with real data, we need to start a tf.Session first.
with tf.Session(graph=lstm_graph) as sess:
(2) Initialize the variables as defined.
tf.global_variables_initializer().run()
(0) The learning rates for training epochs should have been precomputed beforehand. The index refers to the epoch index.
learning_rates_to_use = [
    config.init_learning_rate * (
        config.learning_rate_decay ** max(float(i + 1 - config.init_epoch), 0.0)
    ) for i in range(config.max_epoch)]
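With the configuration above (init_learning_rate = 0.001, learning_rate_decay = 0.99, init_epoch = 5, max_epoch = 50), this schedule works out to approximately:

epochs 0-4: 0.001 (constant during the first init_epoch epochs)
epoch 5:    0.001 * 0.99   = 0.00099
epoch 6:    0.001 * 0.99^2 ≈ 0.00098
...
epoch 49:   0.001 * 0.99^45 ≈ 0.00064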
(3) Each loop below completes one epoch training.
for epoch_step in range(config.max_epoch):
    current_lr = learning_rates_to_use[epoch_step]

    # Check https://github.com/lilianweng/stock-rnn/blob/master/data_wrapper.py
    # if you are curious to know what StockDataSet is and how generate_one_epoch()
    # is implemented.
    for batch_X, batch_y in stock_dataset.generate_one_epoch(config.batch_size):
        train_data_feed = {
            inputs: batch_X,
            targets: batch_y,
            learning_rate: current_lr
        }
        train_loss, _ = sess.run([loss, minimize], train_data_feed)
(4) Don’t forget to save your trained model at the end.
saver.save(sess, "your_awesome_model_path_and_name", global_step=max_epoch_step)
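The saver above is assumed to be a tf.train.Saver created within the lstm_graph scope. To load the trained model back for prediction later, a sketch along the following lines could be used (the checkpoint path here is hypothetical and includes whatever global_step suffix was appended by saver.save):

with lstm_graph.as_default():
    saver = tf.train.Saver()  # collects the variables defined in lstm_graph

with tf.Session(graph=lstm_graph) as sess:
    # Restore the weights written by saver.save(...) above.
    saver.restore(sess, "your_awesome_model_path_and_name-50")
    test_prediction = sess.run(prediction, {inputs: test_X})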
Use TensorBoard
Building the graph without visualization is like drawing in the dark: very obscure and error-prone. TensorBoard provides an easy way to visualize the graph structure and the learning process. Check out the case below; it is very practical:
Brief Summary
- Use with [tf.name_scope](https://www.tensorflow.org/api_docs/python/tf/name_scope)("your_awesome_module_name"): to wrap elements working on a similar goal together.
- Many tf.* methods accept a name= argument. Assigning a customized name can make your life much easier when reading the graph.
- Methods like tf.summary.scalar and tf.summary.histogram help track the values of variables in the graph during iterations.
- In the training session, define a log file using tf.summary.FileWriter.
with tf.Session(graph=lstm_graph) as sess:
    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("location_for_keeping_your_log_files", sess.graph)
    writer.add_graph(sess.graph)
Later, write the training progress and summary results into the file.
_summary = sess.run([merged_summary], test_data_feed)
writer.add_summary(_summary, global_step=epoch_step) # epoch_step in range(config.max_epoch)
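For instance, the loss and optimizer from step (9) could be wrapped and tracked like this (a minimal sketch; the scope and summary names are arbitrary choices, not from the original post):

with lstm_graph.as_default():
    # Group the training ops under one name scope so they collapse into a
    # single node in TensorBoard's graph view.
    with tf.name_scope("train"):
        loss = tf.reduce_mean(tf.square(prediction - targets), name="loss_mse")
        optimizer = tf.train.RMSPropOptimizer(learning_rate)
        minimize = optimizer.minimize(loss, name="rmsprop_minimize")

    # Track the loss curve and the distribution of the last LSTM output
    # across training iterations.
    tf.summary.scalar("loss_mse", loss)
    tf.summary.histogram("last_lstm_output", last)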


Results
我們?cè)诶又惺褂昧艘韵屡渲谩?/p>
num_layers=1
keep_prob=0.8
batch_size = 64
init_learning_rate = 0.001
learning_rate_decay = 0.99
init_epoch = 5
max_epoch = 100
num_steps=30
Overall, predicting stock prices is not an easy task. Especially after normalization, the price trends look very noisy.
Prediction results for the last 200 days in the test data. The model was trained with input_size = 1 and lstm_size = 32.
Prediction results for the last 200 days in the test data. The model was trained with input_size = 1 and lstm_size = 128.
Prediction results for the last 200 days in the test data. The model was trained with input_size = 5 and lstm_size = 128.

Code:
stock-rnn/main.py
import os
import pandas as pd
import pprint
import tensorflow as tf
import tensorflow.contrib.slim as slim
from data_model import StockDataSet
from model_rnn import LstmRNN
flags = tf.app.flags
flags.DEFINE_integer("stock_count", 100, "Stock count [100]")
flags.DEFINE_integer("input_size", 5, "Input size [5]")
flags.DEFINE_integer("num_steps", 30, "Num of steps [30]")
flags.DEFINE_integer("num_layers", 1, "Num of layer [1]")
flags.DEFINE_integer("lstm_size", 128, "Size of one LSTM cell [128]")
flags.DEFINE_integer("batch_size", 64, "The size of batch images [64]")
flags.DEFINE_float("keep_prob", 0.8, "Keep probability of dropout layer. [0.8]")
flags.DEFINE_float("init_learning_rate", 0.001, "Initial learning rate at early stage. [0.001]")
flags.DEFINE_float("learning_rate_decay", 0.99, "Decay rate of learning rate. [0.99]")
flags.DEFINE_integer("init_epoch", 5, "Num. of epoches considered as early stage. [5]")
flags.DEFINE_integer("max_epoch", 50, "Total training epoches. [50]")
flags.DEFINE_integer("embed_size", None, "If provided, use embedding vector of this size. [None]")
flags.DEFINE_string("stock_symbol", None, "Target stock symbol [None]")
flags.DEFINE_integer("sample_size", 4, "Number of stocks to plot during training. [4]")
flags.DEFINE_boolean("train", False, "True for training, False for testing [False]")
FLAGS = flags.FLAGS
pp = pprint.PrettyPrinter()
if not os.path.exists("logs"):
os.mkdir("logs")
def show_all_variables():
model_vars = tf.trainable_variables()
slim.model_analyzer.analyze_vars(model_vars, print_info=True)
def load_sp500(input_size, num_steps, k=None, target_symbol=None, test_ratio=0.05):
    if target_symbol is not None:
        return [
            StockDataSet(
                target_symbol,
                input_size=input_size,
                num_steps=num_steps,
                test_ratio=test_ratio)
        ]

    # Load metadata of s & p 500 stocks
    info = pd.read_csv("data/constituents-financials.csv")
    info = info.rename(columns={col: col.lower().replace(' ', '_') for col in info.columns})
    info['file_exists'] = info['symbol'].map(lambda x: os.path.exists("data/{}.csv".format(x)))
    print info['file_exists'].value_counts().to_dict()

    info = info[info['file_exists'] == True].reset_index(drop=True)
    info = info.sort('market_cap', ascending=False).reset_index(drop=True)

    if k is not None:
        info = info.head(k)

    print "Head of S&P 500 info:\n", info.head()

    # Generate embedding meta file
    info[['symbol', 'sector']].to_csv(os.path.join("logs/metadata.tsv"), sep='\t', index=False)

    return [
        StockDataSet(row['symbol'],
                     input_size=input_size,
                     num_steps=num_steps,
                     test_ratio=0.05)
        for _, row in info.iterrows()]
def main(_):
    pp.pprint(flags.FLAGS.__flags)

    # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
    run_config = tf.ConfigProto()
    run_config.gpu_options.allow_growth = True

    with tf.Session(config=run_config) as sess:
        rnn_model = LstmRNN(
            sess,
            FLAGS.stock_count,
            lstm_size=FLAGS.lstm_size,
            num_layers=FLAGS.num_layers,
            num_steps=FLAGS.num_steps,
            input_size=FLAGS.input_size,
            keep_prob=FLAGS.keep_prob,
            embed_size=FLAGS.embed_size,
        )

        show_all_variables()

        stock_data_list = load_sp500(
            FLAGS.input_size,
            FLAGS.num_steps,
            k=FLAGS.stock_count,
            target_symbol=FLAGS.stock_symbol,
        )

        if FLAGS.train:
            rnn_model.train(stock_data_list, FLAGS)
        else:
            if not rnn_model.load()[0]:
                raise Exception("[!] Train a model first, then run test mode")

if __name__ == '__main__':
    tf.app.run()
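With the flags defined above, a training run could be launched with a command like the following (the flag values are only illustrative; using SP500 as the stock_symbol assumes a data/SP500.csv file exists):

python main.py --stock_symbol=SP500 --train --input_size=1 --lstm_size=128 --max_epoch=50

The summaries written under logs/ can then be inspected with tensorboard --logdir logs.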

