
Training the rasa_nlu model

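Once the training data that rasa_nlu needs is ready, we can start training the rasa_nlu model. Two Chinese rasa_nlu configurations contributed on GitHub are in common use. The first, contributed by Rasa_NLU_Chi, is based on a pre-trained MITIE Chinese word-vector model. Its yml configuration file is as follows: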

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_classifier_mitie"

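Almost everyone uses the same total_word_feature_extractor_zh.dat, the model the original author trained on wiki encyclopedia data. The second option uses tensorflow_embedding directly: the text is converted to vectors, and intents are then told apart by cosine similarity. Its configuration file is as follows: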

 language: "zh"

 pipeline:
 - name: "tokenizer_jieba"
 - name: "ner_crf"
 - name: "intent_featurizer_count_vectors"
   OOV_token: oov
   token_pattern: '(?u)\b\w+\b'
 - name: "intent_classifier_tensorflow_embedding"

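The official site recommends the first approach when there are fewer than 1,000 training examples and the second when there are more than 1,000. The weakness of the second approach is the OOV (out-of-vocabulary) problem: words that never appeared in the training data.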
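To make the cosine-similarity idea and the OOV_token setting concrete, here is a minimal pure-Python sketch. It is not the code inside intent_classifier_tensorflow_embedding, which learns its embeddings during training; it only shows how count vectors with an OOV slot can be compared by cosine similarity. The vocabulary, the intent names, and the helper functions are all hypothetical.

```python
import math
from collections import Counter

OOV = "oov"  # mirrors the OOV_token value in the config above

def count_vector(tokens, vocab):
    """Count known tokens; every unknown token is counted in the OOV slot."""
    counts = Counter(tok if tok in vocab else OOV for tok in tokens)
    return [counts[word] for word in vocab]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy vocabulary and one reference vector per intent.
vocab = ["book", "flight", "weather", "today", OOV]
intents = {
    "book_flight": count_vector(["book", "flight"], vocab),
    "ask_weather": count_vector(["weather", "today"], vocab),
}

def classify(tokens):
    """Return the intent whose reference vector is closest by cosine similarity."""
    vec = count_vector(tokens, vocab)
    return max(intents, key=lambda name: cosine(vec, intents[name]))
```

In this sketch an unseen word such as "a" in classify(["book", "a", "flight"]) lands in the OOV slot instead of being dropped, so it slightly lowers the similarity score but does not change which intent wins.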

Custom rasa_nlu components

ner_bilstm_crf

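The two yml configurations above are the common ones, but their slot-filling accuracy is sometimes not good enough. So I wrote a custom component that implements two entity-recognition models, bilstm + crf and idcnn + crf, and published the code, rasa_nlu_gao, on PyPI. It can be installed with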

pip install rasa-nlu-gao

Both models can then be used in the rasa_chatbot_cn demo. The yml configuration is as follows:

language: "zh"

pipeline:
  - name: "tokenizer_jieba"

  - name: "intent_featurizer_count_vectors"
    token_pattern: '(?u)\b\w+\b'
  - name: "intent_classifier_tensorflow_embedding"

  - name: "ner_bilstm_crf"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "bilstm" # two model types are supported: idcnn (dilated convolution) or bilstm (bidirectional LSTM)
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 100
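The tag_schema: "iobes" setting refers to the standard IOBES labeling scheme (Begin, Inside, End, Single, Outside). As a rough illustration of what such labels look like, and not the component's own preprocessing code, the hypothetical helper below turns entity spans into one tag per character.

```python
def iobes_tags(length, entities):
    """Return one IOBES tag per position for a sequence of `length` tokens.

    `entities` is a list of (start, end, label) tuples with `end` exclusive.
    """
    tags = ["O"] * length
    for start, end, label in entities:
        if end - start == 1:
            # A one-token entity gets the Single tag.
            tags[start] = "S-" + label
        else:
            tags[start] = "B-" + label
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label
            tags[end - 1] = "E-" + label
    return tags

# "我要去北京市" has six characters; characters 3..5 ("北京市") form a city entity.
example = iobes_tags(6, [(3, 6, "city")])
```

Compared with plain IOB, the extra E- and S- tags mark entity boundaries explicitly, which is the usual reason for choosing this scheme with a CRF output layer.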

jieba_pseg_extractor

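The component above improves slot-filling accuracy, but it still needs a large amount of training data; with little training data, ner_crf remains the recommended choice. During the project I also got stuck on person-name recognition: filling the training data with every possible name is simply not realistic. Fortunately, jieba provides part-of-speech tagging, which can recognize person names for us. So I wrapped jieba.posseg in rasa_nlu as a custom component, jieba_pseg_extractor, which is also installed via pip install rasa-nlu-gao. The yml configuration file is as follows: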

language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "ner_crf"
- name: "jieba_pseg_extractor"
  part_of_speech: ["nr", "ns", "nt"]
- name: "intent_featurizer_count_vectors"
  OOV_token: oov
  token_pattern: '(?u)\b\w+\b'
- name: "intent_classifier_tensorflow_embedding"

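A pleasant surprise is that any entity type jieba can tag is supported here. Besides person names, the component can also recognize organization names, place names, and so on.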
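The idea behind filtering on part_of_speech can be sketched in a few lines. This is an assumption about the general approach, not the actual jieba_pseg_extractor source: the function name and the shape of the returned dictionaries are hypothetical. In jieba's tag set, nr marks person names, ns place names, and nt organization names.

```python
def extract_by_pos(tagged_words, wanted=("nr", "ns", "nt")):
    """Collect words whose POS flag is in `wanted`, with character offsets.

    `tagged_words` is an iterable of (word, flag) pairs in sentence order.
    """
    entities = []
    offset = 0
    for word, flag in tagged_words:
        if flag in wanted:
            entities.append({
                "entity": flag,
                "value": word,
                "start": offset,
                "end": offset + len(word),
            })
        # Advance the offset for every word, matched or not.
        offset += len(word)
    return entities

# Hand-tagged sample so the sketch runs without jieba installed.
sample = [("小明", "nr"), ("在", "p"), ("北京", "ns"), ("工作", "v")]
found = extract_by_pos(sample)
```

With jieba installed, the pairs yielded by jieba.posseg.cut(text) unpack into a word and a flag, so they should be usable as the tagged_words argument directly; the tags you actually get depend on jieba's dictionary.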

Summary

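These are the two custom components I wrote during the project and published on PyPI so that they are easy to download and use; I will keep maintaining them. The rasa framework itself is very good: writing custom components is particularly convenient, and the source code is easy to read. Later articles will cover the pitfalls I ran into with rasa-core and how I solved them. This is an original article; please credit the source when reposting.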

Rasa dialogue system pitfalls (Part 2)
