NLP (23): Using seqeval, an Evaluation Module for Sequence Labeling Algorithms

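Sequence labeling models are among the most common deep learning models in NLP. But how well do we really understand the way they are evaluated?
In this article, I introduce the evaluation methods for sequence labeling models and show how to use seqeval.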

Evaluating Sequence Labeling Models

A sequence labeling model typically produces a list of tags such as the following:

['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

Common tagging schemes for sequence labeling include BIO, IOBES, and BMES. An entity is a contiguous run of non-O tags of the same type (for example PER, LOC, or ORG) that starts with a B- tag.
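The entity definition above can be sketched in a few lines of Python. This is a minimal illustration under a strict reading of BIO (an entity must start at a B- tag); the function name extract_entities is hypothetical, and seqeval's own chunking may treat stray I- tags differently.

```python
def extract_entities(tags):
    """Return a list of (type, start, end) spans from a BIO tag sequence.

    Strict reading of the definition: an entity starts at a B- tag and
    extends over the following I- tags of the same type.
    """
    entities = []
    start, ent_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last entity
        prefix, _, label = tag.partition("-")
        # An open entity ends unless the current tag continues it.
        if start is not None and not (prefix == "I" and label == ent_type):
            entities.append((ent_type, start, i - 1))
            start, ent_type = None, None
        if prefix == "B":
            start, ent_type = i, label
    return entities


tags = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
print(extract_entities(tags))
# [('MISC', 2, 3), ('MISC', 4, 5), ('PER', 7, 8)]
```

The second B-MISC closes the first MISC entity, which is why two adjacent MISC spans come out rather than one.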
Common evaluation metrics for sequence labeling models are accuracy, precision, recall, and the F1 score, computed as follows:

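  • Accuracy: accuracy = number of correctly predicted tags / total number of tags
  • Precision: precision = number of correctly predicted entities / total number of predicted entities
  • Recall: recall = number of correctly predicted entities / total number of annotated entities
  • F1 score: F1 = 2 * precision * recall / (precision + recall)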

As an example, take the following true sequence y_true and predicted sequence y_pred:

y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

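The lists contain 9 tags in total, of which 6 are predicted correctly, so the accuracy is 6/9 = 2/3. There are 2 annotated entities and 3 predicted entities, of which 1 is correct (the PER entity; neither predicted MISC span matches the annotated MISC span exactly), so precision = 1/3, recall = 1/2, and F1 = 0.4.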
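The arithmetic above can be checked by hand with a short script. This is a sketch that assumes exact span-and-type matching and a strict BIO reading; the helper name spans is hypothetical, and guards for empty denominators are omitted because the example does not need them.

```python
from fractions import Fraction


def spans(tags):
    """Collect (type, start, end) entity spans from a BIO sequence as a set."""
    found, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        # Close the open span unless this tag is an I- tag of the same type.
        if start is not None and tag != "I-" + tags[start][2:]:
            found.add((tags[start][2:], start, i - 1))
            start = None
        if tag.startswith("B-"):
            start = i
    return found


y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

# Token-level accuracy: fraction of positions whose tags agree.
accuracy = Fraction(sum(t == p for t, p in zip(y_true, y_pred)), len(y_true))

# Entity-level scores: a prediction counts only on an exact span and type match.
true_spans, pred_spans = spans(y_true), spans(y_pred)
correct = len(true_spans & pred_spans)
precision = Fraction(correct, len(pred_spans))
recall = Fraction(correct, len(true_spans))
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 2/3 1/3 1/2 2/5
```

Accuracy is computed per tag, while precision, recall, and F1 are computed per entity, which is why a model can reach 2/3 accuracy and still have an F1 of only 0.4.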

Using seqeval

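Sequence labeling models are usually evaluated with the conlleval.pl script, which is written in Perl. Python has a third-party module for the same purpose, seqeval; its PyPI page is https://pypi.org/project/seqeval/0.0.3/ .
seqeval supports the BIO and IOBES tagging schemes and can be used to evaluate tasks such as named entity recognition, part-of-speech tagging, and semantic role labeling.
The official documentation gives two examples, which I have adapted as follows.
Example 1: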

# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report

y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

print("accuracy: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))

The output is as follows:

accuracy:  0.6666666666666666
p:  0.3333333333333333
r:  0.5
f1:  0.4
classification report: 
           precision    recall  f1-score   support

     MISC       0.00      0.00      0.00         1
      PER       1.00      1.00      1.00         1

micro avg       0.33      0.50      0.40         2
macro avg       0.50      0.50      0.50         2
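The two average rows in the report can be reproduced from per-type counts. The sketch below is not seqeval's implementation; the counts are read off the worked example (MISC: 0 correct of 2 predicted and 1 annotated; PER: 1 of 1 and 1), and the helper name prf is hypothetical.

```python
# Per-type counts: (correct entities, predicted entities, annotated entities)
counts = {"MISC": (0, 2, 1), "PER": (1, 1, 1)}


def prf(correct, predicted, gold):
    """Precision, recall and F1 from raw counts (0.0 when a denominator is 0)."""
    p = correct / predicted if predicted else 0.0
    r = correct / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


per_type = {name: prf(*c) for name, c in counts.items()}

# Micro average: pool the counts over all types, then compute the scores once.
micro = prf(*(sum(col) for col in zip(*counts.values())))

# Macro average: compute the scores per type, then take the unweighted mean.
macro = tuple(sum(col) / len(per_type) for col in zip(*per_type.values()))

print(per_type)
print("micro", [round(v, 2) for v in micro])  # micro [0.33, 0.5, 0.4]
print("macro", [round(v, 2) for v in macro])  # macro [0.5, 0.5, 0.5]
```

The micro average equals the overall precision, recall, and F1 printed before the report, while the macro average weights MISC and PER equally regardless of how many entities each has.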

Example 2, with the tags grouped into one list per sentence:

# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report

y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]
y_pred =  [['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))

The output is the same as above.

Using seqeval in Keras

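More than a year ago I wrote an article titled "Implementing Named Entity Recognition (NER) with Deep Learning". Here we modify the model training code from that article so that it reports the F1 score during training.
Download the DL_4_NER project from GitHub: https://github.com/percent4/DL_4_NER . Change the folder paths in utils.py, and edit the model training code (DL_4_NER/Bi_LSTM_Model_training.py) as follows: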

# -*- coding: utf-8 -*-
import pickle
import numpy as np
import pandas as pd
from utils import BASE_DIR, CONSTANTS, load_data
from data_processing import data_processing
from keras.utils import np_utils, plot_model
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Bidirectional, LSTM, Dense, Embedding, TimeDistributed


# Prepare the model input data
def input_data_for_model(input_shape):

    # Load the data
    input_data = load_data()
    # Preprocess the data
    data_processing()
    # Load the dictionaries
    with open(CONSTANTS[1], 'rb') as f:
        word_dictionary = pickle.load(f)
    with open(CONSTANTS[2], 'rb') as f:
        inverse_word_dictionary = pickle.load(f)
    with open(CONSTANTS[3], 'rb') as f:
        label_dictionary = pickle.load(f)
    with open(CONSTANTS[4], 'rb') as f:
        output_dictionary = pickle.load(f)
    vocab_size = len(word_dictionary.keys())
    label_size = len(label_dictionary.keys())

    # Build the model inputs: group rows by sentence, map to ids, pad to input_shape
    aggregate_function = lambda input: [(word, pos, label) for word, pos, label in
                                            zip(input['word'].values.tolist(),
                                                input['pos'].values.tolist(),
                                                input['tag'].values.tolist())]

    grouped_input_data = input_data.groupby('sent_no').apply(aggregate_function)
    sentences = [sentence for sentence in grouped_input_data]

    x = [[word_dictionary[word[0]] for word in sent] for sent in sentences]
    x = pad_sequences(maxlen=input_shape, sequences=x, padding='post', value=0)
    y = [[label_dictionary[word[2]] for word in sent] for sent in sentences]
    y = pad_sequences(maxlen=input_shape, sequences=y, padding='post', value=0)
    y = [np_utils.to_categorical(label, num_classes=label_size + 1) for label in y]

    return x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary


# Define the deep learning model: Bi-LSTM
def create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation):
    model = Sequential()
    model.add(Embedding(input_dim=vocab_size + 1, output_dim=output_dim,
                        input_length=input_shape, mask_zero=True))
    model.add(Bidirectional(LSTM(units=n_units, activation=activation,
                                 return_sequences=True)))
    model.add(TimeDistributed(Dense(label_size + 1, activation=out_act)))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model


# Model training
def model_train():

    # Split the dataset into training and test sets at a 9:1 ratio
    input_shape = 60
    x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary = input_data_for_model(input_shape)
    train_end = int(len(x)*0.9)
    train_x, train_y = x[0:train_end], np.array(y[0:train_end])
    test_x, test_y = x[train_end:], np.array(y[train_end:])

    # Model hyperparameters
    activation = 'selu'
    out_act = 'softmax'
    n_units = 100
    batch_size = 32
    epochs = 10
    output_dim = 20

    # Train the model
    lstm_model = create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation)
    lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs, batch_size=batch_size, verbose=1)


model_train()

The training output is as follows (intermediate epochs omitted):

......
12598/12598 [==============================] - 26s 2ms/step - loss: 0.0075 - acc: 0.9981 - val_loss: 0.2131 - val_acc: 0.9592

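We then modify the code at the lstm_model.fit line as follows:

    # F1Metrics is not imported in the listing above; in seqeval releases that ship
    # the Keras callback it can be imported with: from seqeval.callbacks import F1Metrics
    # The order of labels must match the ids used in label_dictionary.
    labels = ['O', 'B-MISC', 'I-MISC', 'B-ORG', 'I-ORG', 'B-PER', 'B-LOC', 'I-PER', 'I-LOC', 'sO']
    id2label = dict(zip(range(len(labels)), labels))
    callbacks = [F1Metrics(id2label)]
    lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs,
                   batch_size=batch_size, verbose=1, callbacks=callbacks)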

The output is now:

12598/12598 [==============================] - 26s 2ms/step - loss: 0.0089 - acc: 0.9978 - val_loss: 0.2145 - val_acc: 0.9560
 - f1: 95.40
           precision    recall  f1-score   support

     MISC     0.9707    0.9833    0.9769     15844
      PER     0.9080    0.8194    0.8614      1157
      LOC     0.7517    0.8095    0.7795       677
      ORG     0.8290    0.7289    0.7757       745
       sO     0.7757    0.8300    0.8019       100

micro avg     0.9524    0.9556    0.9540     18523
macro avg     0.9520    0.9556    0.9535     18523

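This is where seqeval is useful: besides token-level accuracy, every epoch now reports entity-level precision, recall, and F1 for each entity type.
If anything about using seqeval with Keras is unclear, see the project's GitHub repository: https://github.com/chakki-works/seqeval .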
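Before entity-level scores can be computed during training, the padded one-hot targets and softmax outputs have to be turned back into tag sequences. The sketch below illustrates that decoding step in plain Python; it is not seqeval's actual implementation. The function names, the toy id2label mapping, and the assumption that id 0 is reserved for padding (as with pad_sequences(..., value=0) in the training script) are all for illustration only.

```python
def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=scores.__getitem__)


def decode_pair(gold_scores, pred_scores, id2label, pad_id=0):
    """Convert per-token class scores into aligned tag sequences.

    Positions whose gold label is the padding id are dropped from both
    sequences, so y_true and y_pred stay the same length.
    """
    y_true, y_pred = [], []
    for gold_sent, pred_sent in zip(gold_scores, pred_scores):
        true_tags, pred_tags = [], []
        for gold, pred in zip(gold_sent, pred_sent):
            gold_id = argmax(gold)
            if gold_id == pad_id:  # padded position: skip it in both sequences
                continue
            true_tags.append(id2label[gold_id])
            # A predicted padding id has no tag; fall back to "O" (assumption).
            pred_tags.append(id2label.get(argmax(pred), "O"))
        y_true.append(true_tags)
        y_pred.append(pred_tags)
    return y_true, y_pred


# Hypothetical mapping: id 0 is padding, real labels start at 1.
id2label = {1: "O", 2: "B-PER", 3: "I-PER"}
gold = [[[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]]
pred = [[[0.1, 0.7, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1],
         [0.1, 0.6, 0.1, 0.2], [0.9, 0.05, 0.03, 0.02]]]

y_true, y_pred = decode_pair(gold, pred, id2label)
print(y_true)  # [['O', 'B-PER', 'I-PER']]
print(y_pred)  # [['O', 'B-PER', 'O']]
```

The resulting nested lists have the same shape as the inputs in Example 2, so they can be passed to functions such as f1_score and classification_report.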

Summary

Thank you for reading; that is all for this post.
You are welcome to follow my WeChat official account: Python爬蟲與算法 (Python Crawlers and Algorithms)

References

  1. Computing precision and recall for sequence labeling: https://zhuanlan.zhihu.com/p/56582082
  2. seqeval official documentation: https://pypi.org/project/seqeval/0.0.3/