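We will not go over the principles of the wide & deep model here.
In this article, we use the model to predict customer churn on the Telco customer dataset, which can be downloaded from: https://www.kaggle.com/blastchar/telco-customer-churn/download
Assume we have already preprocessed the raw data and obtained the data shown in the figure below:
As the figure shows, the categories that were stored as strings in the raw data have been converted to integer codes.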
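As a rough illustration of the preprocessing assumed above, string categories can be mapped to integer codes along the following lines. The `encode_column` helper and the sample values are hypothetical; they only stand in for whatever preprocessing was actually applied to the dataset.

```python
def encode_column(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer code
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical sample values, for illustration only.
contract = ['Month-to-month', 'One year', 'Month-to-month', 'Two year']
encoded, mapping = encode_column(contract)
print(encoded)
print(mapping)
```

Keeping the returned mapping around is useful, because new samples have to be encoded with the same codes before they are passed to the model.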
Next, we prepare the training data.
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
data = pd.read_csv(r'D:\new-telco-customer-churn.csv')
train, test = train_test_split(data, test_size=0.2, random_state=40)
train_y = train.pop('Churn')
test_y = test.pop('Churn')
Next, we define the feature columns.
# Continuous numeric features
tenure = tf.feature_column.numeric_column('tenure')
MonthlyCharges = tf.feature_column.numeric_column('MonthlyCharges')
TotalCharges = tf.feature_column.numeric_column('TotalCharges')
# Categorical features
CATEGORICAL_COLUMNS = [
    'gender', 'SeniorCitizen', 'Partner',
    'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup',
    'DeviceProtection', 'TechSupport', 'StreamingTV',
    'StreamingMovies', 'Contract', 'PaperlessBilling',
    'PaymentMethod'
]
vocabulary = {}
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary[feature_name] = train[feature_name].unique()
tenure_buckets = tf.feature_column.bucketized_column(tenure, boundaries=[0, 6, 12, 24, 36, 48, 80])
MonthlyCharges_buckets = tf.feature_column.bucketized_column(MonthlyCharges, boundaries=[10, 25, 40, 55, 70, 90, 120])
TotalCharges_buckets = tf.feature_column.bucketized_column(TotalCharges, boundaries=[0, 500, 1000, 2000, 4000, 6000, 9000])
# Build base_columns: the bucketized numeric features plus one-hot categorical features
base_columns = [tenure_buckets, MonthlyCharges_buckets, TotalCharges_buckets]
for feature_name in CATEGORICAL_COLUMNS:
    temp_feature = tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            feature_name, vocabulary[feature_name])
    )
    base_columns.append(temp_feature)
crossed_columns = [
    tf.feature_column.crossed_column([tenure_buckets, MonthlyCharges_buckets], hash_bucket_size=36),
    tf.feature_column.crossed_column([tenure_buckets, TotalCharges_buckets], hash_bucket_size=16)
]
wide_columns = base_columns + crossed_columns
deep_columns = [tenure, MonthlyCharges, TotalCharges]
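To make the bucketized and crossed columns above more concrete, here is a small pure-Python sketch of what they compute. `bucket_index` follows the left-closed bucket convention of `bucketized_column`. `cross_index` uses `zlib.crc32` as a stand-in hash; that is an assumption for illustration and not the hash TensorFlow uses internally, so the real cross ids will differ. Both helper names are hypothetical.

```python
import bisect
import zlib

TENURE_BOUNDARIES = [0, 6, 12, 24, 36, 48, 80]
MONTHLY_BOUNDARIES = [10, 25, 40, 55, 70, 90, 120]


def bucket_index(value, boundaries):
    """Return the bucket id for value.

    n boundaries give n + 1 buckets:
    (-inf, b0), [b0, b1), ..., [b_last, +inf).
    """
    return bisect.bisect_right(boundaries, value)


def cross_index(bucket_ids, hash_bucket_size):
    """Hash a combination of bucket ids into a fixed number of slots.

    crc32 is a stand-in here, NOT TensorFlow's internal hash function.
    """
    key = '_X_'.join(str(b) for b in bucket_ids).encode('utf-8')
    return zlib.crc32(key) % hash_bucket_size


tenure_id = bucket_index(8, TENURE_BOUNDARIES)        # tenure of 8 months
monthly_id = bucket_index(59.9, MONTHLY_BOUNDARIES)   # monthly charge of 59.9
print(tenure_id, monthly_id)
print(cross_index([tenure_id, monthly_id], hash_bucket_size=36))
```

Each bucketized column here has 7 boundaries and therefore 8 buckets, so a cross of two of them has 64 possible combinations. With `hash_bucket_size=36` or `16`, several combinations necessarily share a slot, which is a trade-off between model size and collisions.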
Next, we create the model and the input function.
from tensorflow import keras
model_wd = tf.estimator.DNNLinearCombinedClassifier(
    # Wide (linear) part
    linear_feature_columns=wide_columns,
    linear_optimizer=keras.optimizers.Ftrl(learning_rate=0.001, l2_regularization_strength=1.0),
    # Deep (DNN) part
    dnn_feature_columns=deep_columns,
    dnn_optimizer=keras.optimizers.Adagrad(learning_rate=0.1),
    dnn_hidden_units=[64, 32]  # number of units in each hidden layer
)
def input_fn(X, y, n_epochs=None, shuffle=True):
    # Build a dataset of (feature dict, label) pairs
    dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
    if shuffle:
        dataset = dataset.shuffle(500)  # shuffle buffer of 500 samples
    dataset = dataset.repeat(n_epochs)  # n_epochs=None repeats indefinitely
    dataset = dataset.batch(100)
    return dataset
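The order of `shuffle`, `repeat`, and `batch` in `input_fn` determines what each training step sees. The pure-Python sketch below mimics that order on a toy list. `toy_input_fn` is a hypothetical helper that only approximates `tf.data`: it shuffles the whole list once per epoch instead of using a 500-element shuffle buffer, and it supports only a finite `n_epochs`.

```python
import itertools
import random


def toy_input_fn(rows, n_epochs=1, shuffle=True, batch_size=4, seed=0):
    """Yield batches in shuffle -> repeat -> batch order (finite n_epochs only)."""
    rng = random.Random(seed)

    def stream():
        for _ in range(n_epochs):
            epoch = list(rows)
            if shuffle:
                rng.shuffle(epoch)  # reshuffled at the start of every epoch
            yield from epoch

    it = stream()
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


batches = list(toy_input_fn(range(10), n_epochs=2, shuffle=False, batch_size=4))
print(batches)
```

Because repeating happens before batching, a batch can span the boundary between two epochs, as the third batch does in this toy run.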
Now we can train the model.
model_wd.train(input_fn=lambda: input_fn(train, train_y), max_steps=10000)
Then we evaluate the model on the test set.
result = model_wd.evaluate(input_fn=lambda: input_fn(test, test_y, shuffle=False, n_epochs=1))
If the results look good, we can use the model to predict on new samples.
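For the prediction step, `model_wd.predict` takes an input function that yields features only and returns an iterator of per-sample dictionaries. The sketch below shows one way to turn such dictionaries into churn decisions. It assumes each dictionary has a `'probabilities'` entry of the form `[p_no_churn, p_churn]`, as the canned binary classifier estimators produce. The `churn_decisions` helper and the `fake_predictions` list are hypothetical stand-ins so that the snippet runs without a trained model.

```python
def churn_decisions(predictions, threshold=0.5):
    """Convert per-sample prediction dicts into (churn probability, label) pairs."""
    decisions = []
    for pred in predictions:
        p_churn = float(pred['probabilities'][1])  # index 1 = positive class
        decisions.append((p_churn, int(p_churn >= threshold)))
    return decisions


# Hypothetical stand-in for the output of model_wd.predict(...).
fake_predictions = [
    {'probabilities': [0.9, 0.1]},
    {'probabilities': [0.3, 0.7]},
]
decisions = churn_decisions(fake_predictions)
print(decisions)
```

With the real model, `predictions` could come from something like `model_wd.predict(input_fn=lambda: tf.data.Dataset.from_tensor_slices(dict(new_X)).batch(100))`, where `new_X` is a hypothetical DataFrame with the same feature columns as `train`.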