tf.keras.layers.Attention

Dot-product attention layer, also known as Luong-style attention.

tf.keras.layers.Attention(
    use_scale=False, **kwargs
)

query has shape [batch_size, Tq, dim], value has shape [batch_size, Tv, dim], and key has shape [batch_size, Tv, dim]. The computation proceeds as follows:

  1. Compute the dot-product attention scores of shape [batch_size, Tq, Tv]: scores = tf.matmul(query, key, transpose_b=True)
  2. Apply softmax over the last (Tv) axis: distribution = tf.nn.softmax(scores)
  3. Take the weighted sum of value: tf.matmul(distribution, value), which gives an output of shape [batch_size, Tq, dim].
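
The three steps can be sketched in NumPy to make the shapes concrete. This is a minimal illustration, not the layer's actual implementation; the function name `dot_product_attention`, the helper `softmax`, and the random inputs are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def dot_product_attention(query, value, key):
    # Step 1: scores of shape [batch_size, Tq, Tv].
    scores = np.matmul(query, np.transpose(key, (0, 2, 1)))
    # Step 2: normalize over the Tv axis.
    distribution = softmax(scores, axis=-1)
    # Step 3: weighted sum of value, shape [batch_size, Tq, dim].
    return np.matmul(distribution, value), distribution

rng = np.random.default_rng(0)
query = rng.normal(size=(2, 3, 4))  # batch_size=2, Tq=3, dim=4
value = rng.normal(size=(2, 5, 4))  # Tv=5
key = rng.normal(size=(2, 5, 4))
output, distribution = dot_product_attention(query, value, key)
print(output.shape)        # (2, 3, 4)
print(distribution.shape)  # (2, 3, 5)
```

Each row of `distribution` sums to 1, so every output position is a convex combination of the value vectors.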
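Arguments
use_scale: If True, creates a scalar variable that is used to scale the attention scores.
causal: Boolean. Set to True for decoder self-attention. It adds a mask so that position i cannot see information from future positions.
dropout: Float between 0 and 1. The dropout rate applied to the attention scores.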
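
The effect of scaling and of the causal mask can be illustrated with a NumPy sketch. It is not the layer's implementation: the function `causal_attention_weights` is hypothetical, `scale` is a fixed number standing in for the scalar variable that use_scale=True would learn, and dropout is left out to keep the result deterministic.

```python
import numpy as np

def causal_attention_weights(query, key, scale=1.0):
    # scale stands in for the learned scalar (assumption for illustration).
    scores = scale * np.matmul(query, np.transpose(key, (0, 2, 1)))
    tq, tv = scores.shape[1], scores.shape[2]
    # Lower-triangular mask: position i may attend only to positions j <= i.
    allowed = np.tril(np.ones((tq, tv), dtype=bool))
    # Push disallowed scores to a large negative value so softmax gives 0.
    scores = np.where(allowed, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 8))  # self-attention: query and key are the same tensor
weights = causal_attention_weights(x, x, scale=0.5)
print(np.round(weights[0], 3))  # entries above the diagonal are 0
```

The first row puts all of its weight on position 0, since that is the only position it is allowed to see.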

Call arguments:

inputs:

  • query: [batch_size, Tq, dim]
  • value: [batch_size, Tv, dim]
  • key: [batch_size, Tv, dim]. If not given, key defaults to value.
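
The default for key can be sketched as follows. This NumPy function only mimics the list-style inputs described above; the name `attention` is an assumption for the example and is not the Keras layer itself.

```python
import numpy as np

def attention(inputs):
    # inputs is [query, value] or [query, value, key].
    query, value = inputs[0], inputs[1]
    key = inputs[2] if len(inputs) > 2 else value  # key defaults to value
    scores = np.matmul(query, np.transpose(key, (0, 2, 1)))
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return np.matmul(weights, value)

rng = np.random.default_rng(0)
query = rng.normal(size=(2, 3, 4))
value = rng.normal(size=(2, 5, 4))
# Omitting key gives the same result as passing value explicitly as key.
same = np.allclose(attention([query, value]), attention([query, value, value]))
print(same)  # True
```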

mask:

  • query_mask: [batch_size, Tq]. If given, the output is 0 at positions where mask==False.
  • value_mask: [batch_size, Tv]. If given, positions where mask==False do not contribute to the output.
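
The two masks act at different points of the computation, which a NumPy sketch can show. It is an illustration under assumptions, not the layer's code: `masked_attention` is a hypothetical name, key is taken to be value, and masking uses a large negative score before softmax.

```python
import numpy as np

def masked_attention(query, value, query_mask, value_mask):
    scores = np.matmul(query, np.transpose(value, (0, 2, 1)))  # key = value
    # value_mask: [batch_size, Tv] -> [batch_size, 1, Tv]; masked columns get weight 0.
    scores = np.where(value_mask[:, None, :], scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    output = np.matmul(weights, value)
    # query_mask: [batch_size, Tq] -> [batch_size, Tq, 1]; masked rows are zeroed.
    return output * query_mask[:, :, None], weights

rng = np.random.default_rng(0)
query = rng.normal(size=(1, 3, 4))
value = rng.normal(size=(1, 5, 4))
query_mask = np.array([[True, True, False]])
value_mask = np.array([[True, True, True, False, False]])
output, weights = masked_attention(query, value, query_mask, value_mask)
print(output[0, 2])       # zeros: the third query position is masked out
print(weights[0, :, 3:])  # zeros: the last two value positions contribute nothing
```

value_mask is applied before softmax so the remaining weights still sum to 1, while query_mask simply zeroes the finished output rows.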

training: Boolean. Whether dropout is enabled (training mode) or disabled (inference mode).

Example:

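import tensorflow as tf

# Illustrative values (assumptions, not from the original text):
# vocabulary size and embedding width.
max_tokens = 10000
dimension = 64

# Variable-length int sequences.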
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')

# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(max_tokens, dimension)
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)

# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
    filters=100,
    kernel_size=4,
    # Use 'same' padding so outputs have the same shape as inputs.
    padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)

# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.Attention()(
    [query_seq_encoding, value_seq_encoding])

# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
    query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
    query_value_attention_seq)

# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])

# Add DNN layers, and create Model.
# ...