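- Elasticsearch's default analyzer handles Chinese poorly. Take the phrase 中华人民共和国 as an example:
1. With the default analyzer, the phrase is split into single characters, so the search looks for 中, 华, 人, 民, 共, 和, 国 individually.
2. The well-known ik analyzer is the recommended alternative: https://github.com/medcl/elasticsearch-analysis-ik/
3. Install it as described in the previous article. ik_smart is usually sufficient; with it, the phrase above is tokenized as 中华, 人民, 共和国.
4. Even with ik, one problem remains: many Chinese users do not switch to a Chinese input method when searching, so what they type is pinyin. ik does not support pinyin search, so the results are either English matches or inaccurate. A pinyin analysis plugin is therefore needed: https://github.com/medcl/elasticsearch-analysis-pinyin
5. Install it the same way as described earlier.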
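To check how a given analyzer actually splits a phrase, the `_analyze` API can be called directly. The following is a minimal sketch using only the Python standard library; the helper names `build_analyze_request` and `fetch_tokens` and the address `http://localhost:9200` are assumptions made for illustration, not part of the original setup.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text):
    """Build (but do not send) a POST request for the _analyze API.

    base_url is a hypothetical node address such as http://localhost:9200.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tokens(request):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The _analyze response carries a "tokens" list; each entry has a "token" key.
    return [item["token"] for item in payload["tokens"]]
```

With a node running and the ik plugin installed, calling `fetch_tokens(build_analyze_request("http://localhost:9200", "standard", "中华人民共和国"))` and then the same call with `"ik_smart"` should make the difference between the two analyzers visible.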
- After installing the plugins, the mapping needs to be updated so that pinyin search, including polyphonic characters (多音字), works:
topic = \
{
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"]
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "keep_first_letter": False,
                    "keep_full_pinyin": True,
                    "keep_none_chinese": True,
                    "keep_none_chinese_in_first_letter": True,
                    "keep_original": False,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "trim_whitespace": True,
                }
            }
        }
    },
    "mappings": {
        "topic": {
            "properties": {
                "creator": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "postCount": {
                    "type": "integer",
                    "index": "not_analyzed"
                },
                "followNum": {
                    "type": "integer",
                    "index": "not_analyzed"
                },
                "creatTime": {
                    "type": "date",
                    "index": "not_analyzed"
                },
                "tagName": {
                    "type": "text",
                    "index": "analyzed",
                    "store": "no",
                    "analyzer": "ik_pinyin_analyzer",
                    "term_vector": "with_positions_offsets",
                    "boost": 10,
                    "fields": {
                        "untouch": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }
}
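One way to apply the dict above is to send it as the body of an index-creation request and then run a `match` query against `tagName` with pinyin input. The sketch below only builds the request and the query body; the helper names `create_index_request` and `pinyin_match_query`, the index name, and the node address are assumptions for illustration, and whether a given pinyin string matches depends on the installed plugin versions.

```python
import json
import urllib.request


def create_index_request(base_url, index_name, index_body):
    """Build (but do not send) the PUT request that creates an index.

    index_body is a dict with "settings" and "mappings", like the one above.
    """
    return urllib.request.Request(
        url="{}/{}".format(base_url.rstrip("/"), index_name),
        data=json.dumps(index_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def pinyin_match_query(field, user_input):
    """Build a match query body for text the user typed (for example pinyin).

    The field's own analyzer is applied to user_input at search time.
    """
    return {"query": {"match": {field: user_input}}}
```

Sending the request with `urllib.request.urlopen(create_index_request("http://localhost:9200", "topic_index", topic))` would create the index, and `pinyin_match_query("tagName", "zhonghua")` gives a body that can be posted to the index's `_search` endpoint.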
- The resulting search behavior is shown in the screenshot below:
Paste_Image.png
