Recommender Systems Meet Deep Learning series:
Recommender Systems Meet Deep Learning (1) -- FM Model Theory and Practice: http://www.itdecent.cn/p/152ae633fb00
Recommender Systems Meet Deep Learning (2) -- FFM Model Theory and Practice: http://www.itdecent.cn/p/781cde3d5f3d
Recommender Systems Meet Deep Learning (3) -- DeepFM Model Theory and Practice: http://www.itdecent.cn/p/6f1c2643d31b
Recommender Systems Meet Deep Learning (4) -- An Embedding Solution for Multi-Valued Categorical Features: http://www.itdecent.cn/p/4a7525c018b2
Recommender Systems Meet Deep Learning (5) -- Deep&Cross Network Model Theory and Practice: http://www.itdecent.cn/p/77719fc252fa
Recommender Systems Meet Deep Learning (6) -- PNN Model Theory and Practice: http://www.itdecent.cn/p/be784ab4abc2
Recommender Systems Meet Deep Learning (7) -- NFM Model Theory and Practice: http://www.itdecent.cn/p/4e65723ee632

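1. Introduction

In CTR prediction, the FM model was proposed to model interactions between features and so cope with sparse features. However, FM only captures pairwise feature combinations; it cannot model deeper relationships between two features or interactions among several features. Deep networks are therefore used to model higher-order feature interactions.

As a result, combining FM with a deep neural network (DNN) has become a mainstream approach to CTR prediction. There are two main ways to combine them, a parallel structure and a serial structure, summarized in the table below:

Structure	Description	Common models
Parallel	The FM part and the DNN part are computed separately and fused only once, at the output layer	DeepFM, DCN, Wide&Deep
Serial	The first-order and second-order FM results (or one of them) are fed into the DNN, which produces the final result	PNN, NFM, AFM

The AFM model (Attentional Factorization Machine) introduced here is one of the serial-structure models.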

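2. The AFM Model

Let us first review the FM model. FM describes feature interactions with n latent vectors. Note that n is the total number of features after one-hot expansion. For example, with three feature fields, two continuous and one categorical with 5 possible values, n = 7 rather than n = 3.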
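
For reference, the standard second-order FM prediction can be written as below. The symbols w_0, w_i and v_i are the usual ones from the FM literature and are assumed here; they are not taken from this article's own figures.

```latex
\hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j ,
\qquad v_i \in \mathbb{R}^{K}
```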

Let us also review how the second-order term is simplified:
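
A sketch of the usual simplification of the FM second-order term follows, assuming the standard notation in which v_{i,f} is the f-th component of the latent vector v_i and K is the embedding size:

```latex
\begin{aligned}
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j
&= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle v_i, v_j \rangle \, x_i x_j
   - \frac{1}{2} \sum_{i=1}^{n} \langle v_i, v_i \rangle \, x_i^2 \\
&= \frac{1}{2} \sum_{f=1}^{K} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^{2}
   - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]
\end{aligned}
```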

As you can see, if we leave out the outermost sum, what remains is a K-dimensional vector.
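
The claim about the K-dimensional vector can be checked numerically. The following is a minimal NumPy sketch, not the article's code; the sizes and the random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 7, 4                      # n features after one-hot expansion, K latent dimensions
V = rng.normal(size=(n, K))      # one latent vector v_i per feature (illustrative values)
x = rng.normal(size=n)           # feature values (illustrative values)

xv = V * x[:, None]              # row i holds v_i * x_i

# Direct form: sum of element-wise products over all pairs i < j (a K-dim vector).
pairwise = np.zeros(K)
for i in range(n):
    for j in range(i + 1, n):
        pairwise += xv[i] * xv[j]

# Simplified form, evaluated per latent dimension, before the outer sum over K.
simplified = 0.5 * (xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))

assert np.allclose(pairwise, simplified)
fm_second_order = simplified.sum()   # the outer sum over K gives the scalar FM term
```

The vector `simplified` is the quantity that AFM later replaces with an attention-weighted sum.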

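In prediction, FM ties each feature to one fixed vector and uses that same vector whenever the feature is crossed with any other feature. This is unreasonable, because crosses between different features differ in importance. The FFM model introduced earlier in this series is one way to express that difference; the AFM model, which adds an attention mechanism, is another.

This article does not go into detail about what an attention model is. All we need to know here is that attention acts as a weighted average: the attention values are the weights, and they measure how important each feature interaction is.

As just mentioned, attention amounts to a weighting step, so the prediction formula becomes: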
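
In the notation commonly used for AFM (assumed here: a_{ij} is the attention weight of the pair (i, j), and p is the K-dimensional projection vector mentioned in this section), the prediction can be written as:

```latex
\hat{y}_{AFM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
  + p^{T} \sum_{i=1}^{n} \sum_{j=i+1}^{n} a_{ij} \, (v_i \odot v_j) \, x_i x_j
```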

The symbol of a circle with a dot inside denotes the element-wise product, that is:
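
Written out per component, with f indexing the K latent dimensions (an assumed index name):

```latex
(v_i \odot v_j)_f = v_{i,f} \, v_{j,f}, \qquad f = 1, \dots, K
```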

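After the summation we therefore get a K-dimensional vector, which still has to be multiplied by a vector p to yield a single number.

The first two parts of AFM are the same as in FM; the last term is produced by the following network:

The first three parts in the figure, the sparse input, the embedding layer, and the pair-wise interaction layer, are the same as in FM. The last two parts are where AFM innovates: the attention net. The formulas behind the attention are as follows: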
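
A sketch of the attention formulas in the commonly used AFM notation, which is consistent with the W, b, h weights built in the code section. The intermediate score name a'_{ij} is an assumption.

```latex
a'_{ij} = h^{T} \, \mathrm{ReLU}\big( W \, (v_i \odot v_j) \, x_i x_j + b \big),
\qquad
a_{ij} = \frac{\exp(a'_{ij})}{\sum_{(i,j)} \exp(a'_{ij})}
```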

To sum up, AFM simply adds an attention mechanism on top of FM. In practice, because of the final weighted sum, the second-order term is not passed through a deeper network to learn non-linear cross features, so AFM does not exploit the strengths of a DNN; combining it with a DNN might give better results.

3. Code Implementation

Now for the hands-on code. Corrections and suggestions for improving the code are welcome.

The GitHub address for this article is:
https://github.com/princewen/tensorflow_practice/tree/master/recommendation/Basic-AFM-Demo

The code is adapted from the earlier DeepFM code. Only the model implementation is covered here; for the data-processing details, see the code on my GitHub.

Before we start, let us define a few dimensions used below:
Embedding Size: K
Batch Size: N
Attention Size: A
Field Size (note: this is the field size, not the feature size!): F

Model input

The model has the following main inputs:

# Inputs (TensorFlow 1.x placeholders); feat_index and feat_value both have shape N * F
self.feat_index = tf.placeholder(tf.int32,
                                 shape=[None,None],
                                 name='feat_index')
self.feat_value = tf.placeholder(tf.float32,
                                 shape=[None,None],
                                 name='feat_value')

self.label = tf.placeholder(tf.float32,shape=[None,1],name='label')
self.dropout_keep_deep = tf.placeholder(tf.float32,shape=[None],name='dropout_keep_deep')

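feat_index is the index of each feature and is used to select the embedding through embedding_lookup. feat_value is the corresponding feature value: 1 for a categorical feature, and the original value otherwise. label is the ground-truth value. A dropout placeholder is also defined to guard against overfitting.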
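
To make the two inputs concrete, here is a small sketch of how one sample might be encoded. The feature layout, the `encode` helper and the sample values are hypothetical and are not taken from the repository's data-processing code.

```python
import numpy as np

# Hypothetical layout: features 0 and 1 are continuous, features 2..6 are the
# five one-hot slots of one categorical field, so feature_size = 7 and field_size = 3.
CATEGORY_OFFSET = 2

def encode(cont_a, cont_b, category):
    """Return (feat_index, feat_value) for one sample; category is in 0..4."""
    feat_index = [0, 1, CATEGORY_OFFSET + category]
    feat_value = [cont_a, cont_b, 1.0]   # the active categorical slot carries value 1
    return feat_index, feat_value

samples = [(0.5, 1.2, 3), (0.1, 0.0, 0)]   # made-up samples
encoded = [encode(*s) for s in samples]
feat_index = np.array([e[0] for e in encoded], dtype=np.int32)    # N x F
feat_value = np.array([e[1] for e in encoded], dtype=np.float32)  # N x F
```

Each row has one entry per field (F = 3), while the indices range over all 7 features; this is why the dimension list distinguishes field size from feature size.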

Building the weights

The weights consist of the bias term, the first-order weights, the embeddings, and the weights of the attention part. The weights other than the attention part are as follows:

def _initialize_weights(self):
    weights = dict()

    # embeddings: one K-dimensional latent vector per feature
    weights['feature_embeddings'] = tf.Variable(
        tf.random_normal([self.feature_size,self.embedding_size],0.0,0.01),
        name='feature_embeddings')
    # first-order weight of each feature
    weights['feature_bias'] = tf.Variable(tf.random_normal([self.feature_size,1],0.0,1.0),name='feature_bias')
    # global bias term
    weights['bias'] = tf.Variable(tf.constant(0.1),name='bias')

The attention weights deserve a closer look. There are four of them, corresponding to W, b, h and p in the formulas.

weights['attention_w'] has shape K * A,
weights['attention_b'] has shape A,
weights['attention_h'] has shape A,
weights['attention_p'] has shape K * 1

# attention part
# scale of the normal initializer for W and b (Glorot-style, based on K + A)
glorot = np.sqrt(2.0 / (self.attention_size + self.embedding_size))

# W: K * A
weights['attention_w'] = tf.Variable(np.random.normal(loc=0,scale=glorot,size=(self.embedding_size,self.attention_size)),
                                     dtype=tf.float32,name='attention_w')

# b: A
weights['attention_b'] = tf.Variable(np.random.normal(loc=0,scale=glorot,size=(self.attention_size,)),
                                     dtype=tf.float32,name='attention_b')

# h: A
weights['attention_h'] = tf.Variable(np.random.normal(loc=0,scale=1,size=(self.attention_size,)),
                                     dtype=tf.float32,name='attention_h')

# p: K * 1, initialized to ones
weights['attention_p'] = tf.Variable(np.ones((self.embedding_size,1)),dtype=np.float32)

Embedding Layer
This part is simple: select the embeddings in weights['feature_embeddings'] according to feat_index, then multiply them by the corresponding feat_value:

# Embeddings
self.embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'],self.feat_index) # N * F * K
feat_value = tf.reshape(self.feat_value,shape=[-1,self.field_size,1])
self.embeddings = tf.multiply(self.embeddings,feat_value) # N * F * K
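
The same lookup-and-scale step can be sketched in NumPy, where integer-array indexing plays the role of embedding_lookup. The sizes and values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_size, K = 7, 4
feature_embeddings = rng.normal(scale=0.01, size=(feature_size, K))

feat_index = np.array([[0, 1, 5], [0, 1, 2]])              # N x F
feat_value = np.array([[0.5, 1.2, 1.0], [0.1, 0.0, 1.0]])  # N x F

embeddings = feature_embeddings[feat_index]        # integer indexing acts as the lookup: N x F x K
embeddings = embeddings * feat_value[:, :, None]   # broadcast each value over its K-dim vector
```

A continuous feature whose value is 0 ends up with an all-zero embedding, so it contributes nothing to any pairwise product.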

Attention Net
The attention part is implemented strictly according to the formulas given above:

We implement it step by step.

Given the embedding vectors, we first compute the element-wise product of every pair, that is:

The results collected in the nested loop have to be combined into one tensor with stack. At that point the shape is (F * (F - 1) / 2) * N * K, so a transpose is needed to get the element-wise product with shape N * (F * (F - 1) / 2) * K.

element_wise_product_list = []
for i in range(self.field_size):
    for j in range(i+1,self.field_size):
        element_wise_product_list.append(tf.multiply(self.embeddings[:,i,:],self.embeddings[:,j,:])) # None * K

self.element_wise_product = tf.stack(element_wise_product_list) # (F * (F - 1) / 2) * None * K
self.element_wise_product = tf.transpose(self.element_wise_product,perm=[1,0,2],name='element_wise_product') # None * (F * (F - 1) / 2) * K
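
As an alternative to the nested loop, the pair indices can be generated once and used for vectorized indexing. This is a NumPy sketch of that idea, not the repository's code; it produces the pairs in the same (i, j) order as the loop and needs no transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, K = 2, 3, 4
embeddings = rng.normal(size=(N, F, K))   # illustrative values

rows, cols = np.triu_indices(F, k=1)      # all index pairs (i, j) with i < j
element_wise_product = embeddings[:, rows, :] * embeddings[:, cols, :]   # N x F(F-1)/2 x K
```

With F = 3 there are 3 pairs: (0, 1), (0, 2) and (1, 2).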

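With the element-wise products in hand, we next compute the linear part of the attention network, W x + b, where x is each element-wise product:

Before this computation we reshape the element-wise product into a two-dimensional tensor, and afterwards reshape the result back into a three-dimensional tensor of shape N * (F * (F - 1) / 2) * A: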

num_interactions = self.field_size * (self.field_size - 1) // 2  # number of field pairs, F * (F - 1) / 2
self.attention_wx_plus_b = tf.reshape(tf.add(tf.matmul(tf.reshape(self.element_wise_product,shape=(-1,self.embedding_size)),
                                                       self.weights['attention_w']),
                                             self.weights['attention_b']),
                                      shape=[-1,num_interactions,self.attention_size]) # N * (F * (F - 1) / 2) * A

Then we compute the exponential of the attention score, exp(h^T ReLU(W x + b)):

The shape is now N * (F * (F - 1) / 2) * 1

self.attention_exp = tf.exp(tf.reduce_sum(tf.multiply(tf.nn.relu(self.attention_wx_plus_b),
                                               self.weights['attention_h']),
                                   axis=2,keep_dims=True)) # N * (F * (F - 1) / 2) * 1

Then we normalize by the sum over all pairs:

This layer is equivalent to a softmax, but we still write it out in its basic form:

self.attention_exp_sum = tf.reduce_sum(self.attention_exp,axis=1,keep_dims=True) # N * 1 * 1

self.attention_out = tf.div(self.attention_exp,self.attention_exp_sum,name='attention_out')  # N * (F * (F - 1) / 2) * 1
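
One caveat of writing the softmax by hand is that exp can overflow for large scores. A common remedy, sketched here in NumPy under the assumption that the scores have shape N x P x 1 (P being the number of pairs), is to subtract the per-sample maximum before exponentiating; the function name is illustrative.

```python
import numpy as np

def attention_softmax(scores):
    """Softmax over the pair axis (axis=1) of an N x P x 1 score tensor."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # shifting leaves the softmax unchanged
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.array([[[1000.0], [1001.0], [999.0]]])   # exp(1000) would overflow a float64
attention_out = attention_softmax(scores)             # finite weights that sum to 1
```

The shift cancels between numerator and denominator, so the weights are mathematically identical to the unshifted version.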

Finally, we compute the second-order term weighted by the attention net:

self.attention_x_product = tf.reduce_sum(tf.multiply(self.attention_out,self.element_wise_product),axis=1,name='afm') # N * K

self.attention_part_sum = tf.matmul(self.attention_x_product,self.weights['attention_p']) # N * 1
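
A useful sanity check on this step: if every pair gets weight 1 (no attention) and p is all ones, the weighted term reduces to the plain FM second-order term. The NumPy sketch below checks this on made-up data; it is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, K = 2, 4, 3
embeddings = rng.normal(size=(N, F, K))          # already scaled by feature values
rows, cols = np.triu_indices(F, k=1)
products = embeddings[:, rows, :] * embeddings[:, cols, :]    # N x P x K
P = products.shape[1]

attention_out = np.ones((N, P, 1))      # every pair weighted 1, i.e. attention switched off
p = np.ones((K, 1))                     # same value attention_p is initialized to
afm_term = (attention_out * products).sum(axis=1) @ p         # N x 1

# FM second-order term via the simplified formula.
summed = embeddings.sum(axis=1)                                # N x K
fm_term = 0.5 * (summed ** 2 - (embeddings ** 2).sum(axis=1)).sum(axis=1, keepdims=True)
assert np.allclose(afm_term, fm_term)
```

With softmax weights the pairs are instead weighted by values that sum to 1, so the learned attention redistributes importance across the pairs.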

Getting the prediction output
Besides the output of the attention part, the prediction needs two more parts, the bias term and the first-order term:

# first order term
self.y_first_order = tf.nn.embedding_lookup(self.weights['feature_bias'], self.feat_index)
self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2)

# bias
self.y_bias = self.weights['bias'] * tf.ones_like(self.label)

And the final output is:

# out
self.out = tf.add_n([tf.reduce_sum(self.y_first_order,axis=1,keep_dims=True),
                     self.attention_part_sum,
                     self.y_bias],name='out_afm')
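
Putting the steps together, the whole forward pass can be sketched in NumPy as follows. This is an illustrative re-implementation under the shapes defined earlier (K, A, F), with a hypothetical function name `afm_forward` and random weights; it is not the repository's TensorFlow code, and it omits dropout and training.

```python
import numpy as np

def afm_forward(feat_index, feat_value, w):
    """Return AFM outputs of shape N x 1 for index/value arrays of shape N x F."""
    emb = w["feature_embeddings"][feat_index] * feat_value[:, :, None]        # N x F x K
    rows, cols = np.triu_indices(feat_index.shape[1], k=1)
    products = emb[:, rows, :] * emb[:, cols, :]                               # N x P x K

    hidden = np.maximum(products @ w["attention_w"] + w["attention_b"], 0.0)   # ReLU, N x P x A
    scores = hidden @ w["attention_h"]                                         # N x P
    scores = scores - scores.max(axis=1, keepdims=True)                        # stabilize exp
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)                              # softmax over pairs

    second_order = (attn[:, :, None] * products).sum(axis=1) @ w["attention_p"]   # N x 1
    first_order = (w["feature_bias"][feat_index][:, :, 0] * feat_value).sum(axis=1, keepdims=True)
    return w["bias"] + first_order + second_order

rng = np.random.default_rng(0)
feature_size, K, A = 7, 4, 5
weights = {
    "feature_embeddings": rng.normal(scale=0.01, size=(feature_size, K)),
    "feature_bias": rng.normal(size=(feature_size, 1)),
    "bias": 0.1,
    "attention_w": rng.normal(size=(K, A)),
    "attention_b": rng.normal(size=A),
    "attention_h": rng.normal(size=A),
    "attention_p": np.ones((K, 1)),
}
feat_index = np.array([[0, 1, 5], [0, 1, 2]])
feat_value = np.array([[0.5, 1.2, 1.0], [0.1, 0.0, 1.0]])
out = afm_forward(feat_index, feat_value, weights)   # N x 1
```

For CTR prediction the output would typically be passed through a sigmoid and trained with log loss; that part is outside this sketch.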

The rest of the code is not covered here.
This article is only meant as a starting point; there is much more to learn about AFM.

References:

https://zhuanlan.zhihu.com/p/33540686

