05-ElasticSearch Tokenization

Tokenization

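A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
For example, the whitespace tokenizer splits text whenever it encounters a whitespace character. It would split the text "Quick brown fox!" into [Quick, brown, fox!].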

The tokenizer is also responsible for recording the order, or position, of each term (used for phrase and word-proximity queries), and the start and end character offsets of the original word that the term represents (used for highlighting search matches).
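To make the position and offset bookkeeping concrete, here is a minimal Python sketch of a whitespace-style tokenizer. It is only an illustration of the idea, not Elasticsearch's implementation; the function name `whitespace_tokenize` is hypothetical, and the field names simply mirror those in the `_analyze` responses shown in this article.

```python
import re
from typing import Any, Dict, List


def whitespace_tokenize(text: str) -> List[Dict[str, Any]]:
    """Split text on whitespace, recording each token's position and offsets."""
    tokens = []
    # \S+ matches each maximal run of non-whitespace characters.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group(),
            "start_offset": match.start(),  # index of the token's first character
            "end_offset": match.end(),      # index one past its last character
            "position": position,           # ordinal of the token in the stream
        })
    return tokens


for tok in whitespace_tokenize("Quick brown fox!"):
    print(tok)
```

Because every token carries its offsets into the original text, a later stage can locate the matched word without re-scanning the input.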

Elasticsearch provides many built-in tokenizers, which can be used to build custom analyzers.
Analysis reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis.html
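As a rough sketch of what building a custom analyzer looks like, the Python snippet below assembles an index-settings body that combines the built-in `whitespace` tokenizer with the built-in `lowercase` token filter. The analyzer name `my_whitespace_lowercase` and the helper `custom_analyzer_settings` are made up for illustration; the printed JSON would be sent as the body of an index-creation request such as `PUT my_index`.

```python
import json
from typing import Any, Dict, List


def custom_analyzer_settings(name: str, tokenizer: str,
                             filters: List[str]) -> Dict[str, Any]:
    """Build an index-settings body defining one custom analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    name: {
                        "type": "custom",        # analyzer assembled from parts
                        "tokenizer": tokenizer,  # exactly one tokenizer
                        "filter": filters,       # zero or more token filters
                    }
                }
            }
        }
    }


# Hypothetical analyzer name; "whitespace" and "lowercase" are built-ins.
body = custom_analyzer_settings("my_whitespace_lowercase", "whitespace", ["lowercase"])
print(json.dumps(body, indent=2))
```

Once an index has been created with such settings, the analyzer can be tried out by passing its name in the `analyzer` field of an `_analyze` request against that index.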

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

Result:

{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "2",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "quick",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "brown",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "foxes",
      "start_offset" : 18,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "jumped",
      "start_offset" : 24,
      "end_offset" : 30,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "over",
      "start_offset" : 31,
      "end_offset" : 35,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "the",
      "start_offset" : 36,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "lazy",
      "start_offset" : 40,
      "end_offset" : 44,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "dog's",
      "start_offset" : 45,
      "end_offset" : 50,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "bone",
      "start_offset" : 51,
      "end_offset" : 55,
      "type" : "<ALPHANUM>",
      "position" : 10
    }
  ]
}
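Note that the tokens in the response are lowercased, while the offsets still point into the original text. The following sketch shows how those offsets could be used to highlight a match in the source string. It is an illustration only, not how Elasticsearch's highlighters are implemented; the `highlight` helper and the choice of `<em>` tags are assumptions, and the two tokens are copied from the response above.

```python
from typing import Any, Dict, List


def highlight(text: str, tokens: List[Dict[str, Any]], term: str) -> str:
    """Wrap every span of text whose token equals term in <em> tags."""
    pieces = []
    cursor = 0
    for tok in tokens:
        if tok["token"] != term:
            continue
        start, end = tok["start_offset"], tok["end_offset"]
        pieces.append(text[cursor:start])                   # untouched text before the match
        pieces.append("<em>" + text[start:end] + "</em>")   # original casing is preserved
        cursor = end
    pieces.append(text[cursor:])                            # remainder after the last match
    return "".join(pieces)


text = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
tokens = [
    {"token": "quick", "start_offset": 6, "end_offset": 11, "position": 2},
    {"token": "foxes", "start_offset": 18, "end_offset": 23, "position": 4},
]
print(highlight(text, tokens, "quick"))
```

The search term is compared against the normalized token, but the highlighted span is cut from the original text, so "QUICK" keeps its capitalization.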

Installing the IK analyzer

Text in every language is analyzed with the "Standard Analyzer" by default, but that analyzer handles Chinese poorly. A Chinese analyzer therefore needs to be installed.

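On a Mac, the .DS_Store file in the folder causes problems when installing the plugin. A workaround is to start the container first and then install the plugin from inside it: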

docker exec -it elasticsearch /bin/bash  # enter the container
cd /usr/share/elasticsearch/bin
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
./elasticsearch-plugin list  # list the installed plugins and check that ik is among them

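On CentOS, install as follows.
Download the release that matches your Elasticsearch version from https://github.com/medcl/elasticsearch-analysis-ik/releases
When Elasticsearch was installed earlier, the container's "/usr/share/elasticsearch/plugins" directory was mapped to /mydata/elasticsearch/plugins on the host, so the most convenient approach is to download "elasticsearch-analysis-ik-7.4.2.zip" and unzip it into that folder. Once the installation is done, restart the elasticsearch container.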

cd /mydata/elasticsearch/plugins
mkdir ik
cd /mydata/elasticsearch/plugins/ik
# if wget is missing, install it first: yum -y install wget
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
# if unzip reports "command not found", run: yum install -y unzip zip
unzip elasticsearch-analysis-ik-7.4.2.zip
chmod -R 777 /mydata/elasticsearch/plugins/ik
docker restart elasticsearch  # restart elasticsearch
docker exec -it elasticsearch /bin/bash  # enter the container
ls /usr/share/elasticsearch/plugins  # check that the ik directory is there
cd /usr/share/elasticsearch/bin
./elasticsearch-plugin -h
./elasticsearch-plugin list  # list the installed plugins and check that ik is among them

The plugin can also be installed as follows.
Check the Elasticsearch version:

[root@hadoop-104 ~]# curl http://localhost:9200
{
  "name" : "0adeb7852e00",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "9gglpP0HTfyOTRAaSe2rIg",
  "version" : {
    "number" : "7.6.2",      # the version is 7.6.2
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
[root@hadoop-104 ~]#
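Elasticsearch only loads a plugin that was built for the exact version it is running, so it is worth comparing the version reported above with the version in the plugin archive name before installing. The response above reports 7.6.2, whereas the download commands in this article reference 7.4.2. The sketch below does that comparison offline; the helper names are hypothetical, the JSON is a trimmed copy of the response above, and the regular expression assumes the archive follows the `elasticsearch-analysis-ik-<version>.zip` naming seen in this article.

```python
import json
import re
from typing import Optional


def es_version(root_response: str) -> str:
    """Extract version.number from the JSON returned by GET http://localhost:9200."""
    return json.loads(root_response)["version"]["number"]


def plugin_version(zip_name: str) -> Optional[str]:
    """Extract the version from an IK archive name, or None if it does not match."""
    match = re.search(r"elasticsearch-analysis-ik-(\d+\.\d+\.\d+)\.zip$", zip_name)
    return match.group(1) if match else None


# Trimmed copy of the response shown above (the inline comment removed).
response = '{"name": "0adeb7852e00", "version": {"number": "7.6.2"}}'
zip_name = "elasticsearch-analysis-ik-7.4.2.zip"

es, ik = es_version(response), plugin_version(zip_name)
if es != ik:
    print(f"version mismatch: Elasticsearch {es}, IK plugin {ik}")
else:
    print(f"versions match: {es}")
```

If the two differ, pick the release on the IK releases page that corresponds to the running Elasticsearch version.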

Open a shell inside the es container: docker exec -it <container id> /bin/bash

[root@hadoop-104 ~]# docker exec -it elasticsearch /bin/bash
[root@0adeb7852e00 elasticsearch]# 
[root@0adeb7852e00 elasticsearch]# pwd
/usr/share/elasticsearch
# download ik 7.4.2
[root@0adeb7852e00 elasticsearch]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
[root@0adeb7852e00 elasticsearch]# unzip elasticsearch-analysis-ik-7.4.2.zip -d ik
Archive:  elasticsearch-analysis-ik-7.4.2.zip
   creating: ik/config/
  inflating: ik/config/main.dic      
  inflating: ik/config/quantifier.dic  
  inflating: ik/config/extra_single_word_full.dic  
  inflating: ik/config/IKAnalyzer.cfg.xml  
  inflating: ik/config/surname.dic   
  inflating: ik/config/suffix.dic    
  inflating: ik/config/stopword.dic  
  inflating: ik/config/extra_main.dic  
  inflating: ik/config/extra_stopword.dic  
  inflating: ik/config/preposition.dic  
  inflating: ik/config/extra_single_word_low_freq.dic  
  inflating: ik/config/extra_single_word.dic  
  inflating: ik/elasticsearch-analysis-ik-7.6.2.jar  
  inflating: ik/httpclient-4.5.2.jar  
  inflating: ik/httpcore-4.4.4.jar   
  inflating: ik/commons-logging-1.2.jar  
  inflating: ik/commons-codec-1.9.jar  
  inflating: ik/plugin-descriptor.properties  
  inflating: ik/plugin-security.policy  
[root@0adeb7852e00 elasticsearch]#
# move it into the plugins directory
[root@0adeb7852e00 elasticsearch]# mv ik plugins/
[root@0adeb7852e00 elasticsearch]# rm -rf elasticsearch-analysis-ik-7.4.2.zip

Testing the analyzer

Using the default analyzer

GET my_index/_analyze
{
   "text":"我是中國人"
}

Result:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "國",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}

Using the IK analyzer

GET my_index/_analyze
{
   "analyzer": "ik_smart", 
   "text":"我是中國人"
}

or

GET my_index/_analyze
{
   "analyzer": "ik_max_word", 
   "text":"我是中國人"
}

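Output (this is the ik_smart result; ik_max_word is the finest-grained mode and typically returns additional overlapping tokens):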

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中國人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
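The difference between the two results comes from IK consulting a word dictionary, whereas the standard analyzer emits one token per Chinese character. The toy segmenter below illustrates the dictionary idea with a greedy longest-match scan. It is not IK's actual algorithm, and the three-entry dictionary is invented for this example.

```python
from typing import List, Set


def longest_match_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy forward longest-match segmentation against a word dictionary."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary, for illustration only.
words = {"中國", "國人", "中國人"}
print(longest_match_segment("我是中國人", words))
```

With an empty dictionary the same function falls back to one token per character, which mirrors what the default analyzer returned for this text.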
