20. RNN Model: Sentiment Analysis of Movie Reviews

  • Problem
    Predict whether a user's movie review is positive or negative.
  • Workflow
    1. Tokenize the text data: convert each word to an integer index
%%time
from tensorflow.keras.preprocessing.text import Tokenizer

num_words = 10000  # vocabulary size: keep only the 10,000 most frequent words
tokenizer = Tokenizer(num_words=num_words)
# data_text is the full corpus the vocabulary is built from
# (here assumed to be the train and test reviews combined)
data_text = x_train_text + x_test_text
tokenizer.fit_on_texts(data_text)
x_train_tokens = tokenizer.texts_to_sequences(x_train_text)
x_test_tokens = tokenizer.texts_to_sequences(x_test_text)

The %%time magic reports how long the cell took to run.
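To sanity-check the tokenization, one can print a raw review next to its index sequence (a quick check, assuming x_train_text and x_train_tokens from the cell above):

print(x_train_text[1])   # the original review text
print(x_train_tokens[1]) # the same review as a list of integer indices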
2. Padding and truncating
Convert texts of varying lengths into inputs of one fixed length, and build an idx2word dictionary for mapping indices back to words.

pad = "pre"
x_train_pad = pad_sequences(x_train_tokens, maxlen=max_tokens,
                           padding=pad, truncating=pad)
x_test_pad = pad_sequences(x_test_tokens, maxlen=max_tokens,
                           padding=pad, truncating=pad)
idx = tokenizer.word_index
inverse_map = dict(zip(idx.values(), idx.keys()))
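With inverse_map, a token sequence can be converted back into readable text, which helps verify the preprocessing. A minimal sketch (tokens_to_string is a hypothetical helper, not part of the original code):

def tokens_to_string(tokens):
    # skip the 0s inserted by padding; index 0 is never assigned to a word
    words = [inverse_map[token] for token in tokens if token != 0]
    return " ".join(words)

print(tokens_to_string(x_train_tokens[0]))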

3. Build the RNN model
The first layer is an Embedding, followed by three stacked GRU layers and a final fully-connected (FC) layer.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
embedding_size = 8  # each word index maps to an 8-dimensional vector
model.add(Embedding(input_dim=num_words,
                    output_dim=embedding_size,
                    input_length=max_tokens,
                    name="layer_embedding"))
model.add(GRU(units=16, return_sequences=True))
model.add(GRU(units=8, return_sequences=True))
model.add(GRU(units=4))  # last GRU returns only its final output
model.add(Dense(1, activation="sigmoid"))  # probability that the review is positive
optimizer = Adam(learning_rate=1e-3)
model.compile(loss="binary_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])
model.fit(x_train_pad, y_train,
          validation_split=0.05,
          epochs=1,
          batch_size=64)
# GRU parameters:
# units: positive integer, the dimensionality of the output space.
# return_sequences: Boolean. Whether to return the full output sequence,
#   or only the last output.
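Before predicting on new reviews, it is worth checking generalization on the held-out test set. A minimal sketch using Keras's evaluate (assuming y_test holds the 0/1 sentiment labels):

result = model.evaluate(x_test_pad, y_test)  # returns [loss, accuracy]
print("loss: {:.4f}, accuracy: {:.2%}".format(result[0], result[1]))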

4. Predicting on new data

text1 = "This movie is fantastic! I really like it because it is so good!"
text2 = "Good movie!"
text3 = "Maybe I like this movie."
text4 = "Meh ..."
text5 = "If I were a drunk teenager then this movie might be good."
text6 = "Bad movie!"
text7 = "Not a good movie!"
text8 = "This movie really sucks! Can I get my money back please?"
texts = [text1, text2, text3, text4, text5, text6, text7, text8]

tokens = tokenizer.texts_to_sequences(texts)
tokens_pad = pad_sequences(tokens, maxlen=max_tokens,
                           padding=pad,
                           truncating=pad)

model.predict(tokens_pad)
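predict returns one sigmoid score per review: values near 1.0 mean the model reads the text as positive, values near 0.0 as negative. A short sketch pairing each review with its score (variable names follow the cell above):

pred = model.predict(tokens_pad)  # shape: (len(texts), 1)
for text, score in zip(texts, pred[:, 0]):
    print("{:.2f}  {}".format(score, text))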

5. Inspecting the embedding layer's weights

layer_embedding = model.get_layer("layer_embedding")
weights_embedding = layer_embedding.get_weights()[0]  # shape: (num_words, embedding_size)
token_good = tokenizer.word_index["good"]
token_great = tokenizer.word_index["great"]
print(weights_embedding[token_good])
print(weights_embedding[token_great])
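If training worked, words used in similar contexts (such as "good" and "great") should end up with nearby embedding vectors. A rough check via cosine similarity (cos_sim is a hypothetical helper, not from the original):

import numpy as np

def cos_sim(a, b):
    # cosine similarity: values close to 1.0 mean the vectors point the same way
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim(weights_embedding[token_good], weights_embedding[token_great]))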