TensorFlow Wide & Deep Learning Tutorial#
In the previous TensorFlow Linear Model Tutorial, we trained a logistic regression model to predict the probability that an individual has an annual income of over 50,000 dollars using the Census Income Dataset. TensorFlow is great for training deep neural networks too, and you might be wondering which one you should choose. Well, why not both? Would it be possible to combine the strengths of both in one model?
In this tutorial, we'll introduce how to use the TF.Learn API to jointly train a wide linear model and a deep feed-forward neural network. This approach combines the strengths of memorization and generalization. It's useful for generic large-scale regression and classification problems with sparse input features (e.g., categorical features with a large number of possible feature values). If you're interested in learning more about how Wide & Deep Learning works, please check out our research paper.
![][01]
The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). At a high level, there are only 3 steps to configure a wide, deep, or Wide & Deep model using the TF.Learn API:
- Select features for the wide part: Choose the sparse base columns and crossed columns you want to use.
- Select features for the deep part: Choose the continuous columns, the embedding dimension for each categorical column, and the hidden layer sizes.
- Put them all together in a Wide & Deep model (DNNLinearCombinedClassifier).
And that's it! Let's go through a simple example.
Setup#
To try the code for this tutorial:
- Install TensorFlow if you haven't already.
- Download the tutorial code.
- Install the pandas data analysis library. tf.learn doesn't require pandas, but it does support it, and this tutorial uses pandas.
To install pandas:
Get pip:
Ubuntu/Linux 64-bit:

```
$ sudo apt-get install python-pip python-dev
```

Mac OS X:

```
$ sudo easy_install pip
$ sudo easy_install --upgrade six
```
Use **pip** to install pandas:
```
$ sudo pip install pandas
```
If you have trouble installing pandas, consult the instructions on the pandas site.
Execute the tutorial code with the following command to train the wide and deep model described in this tutorial:
```
$ python wide_n_deep_tutorial.py --model_type=wide_n_deep
```
Read on to find out how this code builds its wide and deep model.
Define Base Feature Columns#
First, let's define the base categorical and continuous feature columns that we'll use. These base columns will be the building blocks used by both the wide part and the deep part of the model.
```
import tensorflow as tf

# Categorical base columns.
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["Female", "Male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=[
  "Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)

# Continuous base columns.
age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
```
The Wide Model: Linear Model with Crossed Feature Columns#
The wide model is a linear model with a wide set of sparse and crossed feature columns:
```
wide_columns = [
  gender, native_country, education, occupation, workclass, relationship, age_buckets,
  tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
  tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
  tf.contrib.layers.crossed_column([age_buckets, education, occupation], hash_bucket_size=int(1e6))]
```
Wide models with crossed feature columns can memorize sparse interactions between features effectively. That being said, one limitation of crossed feature columns is that they do not generalize to feature combinations that have not appeared in the training data. Let's add a deep model with embeddings to fix that.
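To make the memorization point concrete, here is a minimal, purely illustrative sketch of the idea behind a crossed column (not the tf.contrib.layers implementation): each observed combination of values is hashed into one of hash_bucket_size sparse buckets, and the linear model learns a separate weight per bucket, so a combination that never appears in the training data never gets a weight of its own. The hash scheme and example values below are hypothetical.

```
# Illustrative only: a crossed feature maps a value combination to a sparse bucket id.
def cross_bucket(education_value, occupation_value, hash_bucket_size=int(1e4)):
  combined = "%s_x_%s" % (education_value, occupation_value)
  return hash(combined) % hash_bucket_size

print(cross_bucket("Bachelors", "Exec-managerial"))  # one bucket id
print(cross_bucket("Doctorate", "Prof-specialty"))   # a different bucket id
```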
The Deep Model: Neural Network with Embeddings#
The deep model is a feed-forward neural network, as shown in the previous figure. Each of the sparse, high-dimensional categorical features is first converted into a low-dimensional, dense real-valued vector, often referred to as an embedding vector. These low-dimensional dense embedding vectors are concatenated with the continuous features, and then fed into the hidden layers of the neural network in the forward pass. The embedding values are initialized randomly, and are trained along with all other model parameters to minimize the training loss. If you're interested in learning more about embeddings, check out the TensorFlow tutorial on [Vector Representations of Words](https://www.tensorflow.org/versions/r0.12/tutorials/word2vec/index.html), or [Word Embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.
We'll configure the embeddings for the categorical columns using embedding_column, and concatenate them with the continuous columns:
```
deep_columns = [
  tf.contrib.layers.embedding_column(workclass, dimension=8),
  tf.contrib.layers.embedding_column(education, dimension=8),
  tf.contrib.layers.embedding_column(gender, dimension=8),
  tf.contrib.layers.embedding_column(relationship, dimension=8),
  tf.contrib.layers.embedding_column(native_country, dimension=8),
  tf.contrib.layers.embedding_column(occupation, dimension=8),
  age, education_num, capital_gain, capital_loss, hours_per_week]
```
The higher the dimension of the embedding is, the more degrees of freedom the model will have to learn the representations of the features. For simplicity, we set the dimension to 8 for all feature columns here. Empirically, a more informed decision for the number of dimensions is to start with a value on the order of ![][log] or ![][k4] where ![][n] is the number of unique features in a feature column and ![][k] is a small constant (usually smaller than 10).
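As a rough back-of-the-envelope illustration of those heuristics, assume a hypothetical categorical column with n = 1000 unique values and pick k = 5 (both numbers are made up for the example):

```
import math

n = 1000
k = 5
print(math.log(n, 2))  # log2(n)      ~ 10.0
print(k * n ** 0.25)   # k * n^(1/4)  ~ 28.1
```

Either rule of thumb would suggest starting somewhere above the flat value of 8 used here for such a column, and then tuning from there.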
Through dense embeddings, deep models can generalize better and make predictions on feature pairs that were previously unseen in the training data. However, it is difficult to learn effective low-dimensional representations for feature columns when the underlying interaction matrix between two feature columns is sparse and high-rank. In such cases, the interaction between most feature pairs should be zero except a few, but dense embeddings will lead to nonzero predictions for all feature pairs, and thus can over-generalize. On the other hand, linear models with crossed features can memorize these “exception rules” effectively with fewer model parameters.
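As a toy illustration of that contrast (the embedding size, feature values, and weights below are made up for the example):

```
import numpy as np

# Dense embeddings: the score for a pair is a dot product, which is almost
# never exactly zero, even for an education/occupation pair that never
# co-occurred in training -- this is the over-generalization described above.
np.random.seed(0)
education_vec = np.random.randn(4)
occupation_vec = np.random.randn(4)
print(np.dot(education_vec, occupation_vec))  # some nonzero score

# Crossed features: only buckets that actually appeared in training carry a
# learned weight, so an unseen combination contributes exactly zero.
crossed_weights = {("Doctorate", "Handlers-cleaners"): -2.3}  # a memorized "exception rule"
print(crossed_weights.get(("Doctorate", "Armed-Forces"), 0.0))  # 0.0 for an unseen pair
```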
Now, let's see how to jointly train wide and deep models and allow them to complement each other’s strengths and weaknesses.
Combining Wide and Deep Models into One#
The wide models and deep models are combined by summing up their final output log odds as the prediction, then feeding the prediction to a logistic loss function. All the graph definition and variable allocations have already been handled for you under the hood, so you simply need to create a DNNLinearCombinedClassifier:
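In other words, as a minimal sketch of the combination (not the library's internals): the wide part's logit and the deep part's logit are added together, and a sigmoid turns the combined log odds into the predicted probability that the logistic loss is computed on. The numbers below are toy values.

```
import math

def combined_prediction(wide_logit, deep_logit, bias=0.0):
  logit = wide_logit + deep_logit + bias   # sum of log odds
  return 1.0 / (1.0 + math.exp(-logit))    # sigmoid -> probability

print(combined_prediction(0.3, -1.2))  # ~0.29
```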
```
import tempfile
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
```
Training and Evaluating The Model#
Before we train the model, let's read in the Census dataset as we did in the [TensorFlow Linear Model tutorial](https://www.tensorflow.org/versions/master/tutorials/wide/). The code for input data processing is provided here again for your convenience:
```
import pandas as pd
import urllib

# Define the column names for the data sets.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race", "gender",
           "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation",
                       "relationship", "race", "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss",
                      "hours_per_week"]

# Download the training and test data to temporary files.
# Alternatively, you can download them yourself and change train_file and
# test_file to your own paths.
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.urlretrieve("http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)

# Read the training and test data sets into Pandas dataframe.
df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)

def input_fn(df):
  # Creates a dictionary mapping from each continuous feature column name (k) to
  # the values of that column stored in a constant Tensor.
  continuous_cols = {k: tf.constant(df[k].values)
                     for k in CONTINUOUS_COLUMNS}
  # Creates a dictionary mapping from each categorical feature column name (k)
  # to the values of that column stored in a tf.SparseTensor.
  categorical_cols = {k: tf.SparseTensor(
      indices=[[i, 0] for i in range(df[k].size)],
      values=df[k].values,
      shape=[df[k].size, 1])
      for k in CATEGORICAL_COLUMNS}
  # Merges the two dictionaries into one.
  feature_cols = dict(continuous_cols.items() + categorical_cols.items())
  # Converts the label column into a constant Tensor.
  label = tf.constant(df[LABEL_COLUMN].values)
  # Returns the feature columns and the label.
  return feature_cols, label

def train_input_fn():
  return input_fn(df_train)

def eval_input_fn():
  return input_fn(df_test)
```
After reading in the data, you can train and evaluate the model:
```
m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
  print "%s: %s" % (key, results[key])
```
The first line of the output should be something like **accuracy: 0.84429705**. We can see that the accuracy was improved from about 83.6% using a wide-only linear model to about 84.4% using a Wide & Deep model. If you'd like to see a working end-to-end example, you can download our [example code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
Note that this tutorial is just a quick example on a small dataset to get you familiar with the API. Wide & Deep Learning will be even more powerful if you try it on a large dataset with many sparse feature columns that have a large number of possible feature values. Again, feel free to take a look at our [research paper](http://arxiv.org/abs/1606.07792) for more ideas about how to apply Wide & Deep Learning in real-world large-scale machine learning problems.
> Source: https://www.tensorflow.org/versions/master/tutorials/wide_and_deep/index.html
[01]:https://www.tensorflow.org/versions/master/images/wide_n_deep.svg
[n]:http://latex.codecogs.com/png.latex?n
[k]:http://latex.codecogs.com/png.latex?k
[log]:http://latex.codecogs.com/png.latex?\log_2(n)
[k4]:http://latex.codecogs.com/png.latex?k\sqrt[4]n