Elasticsearch Analyzers


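Built-in analyzers

Standard Analyzer - the default analyzer; splits English text on word boundaries and lowercases it
Simple Analyzer - splits text into words at non-letter characters (symbols are discarded) and lowercases it
Stop Analyzer - lowercases and removes stop words (the, a, is)
Whitespace Analyzer - splits on whitespace; does not lowercase
Keyword Analyzer - does not tokenize; the whole input is emitted as a single token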
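
To make the differences in the list above concrete, here is a rough pure-Python approximation of the five analyzers. It is only a sketch, not how Lucene implements them: the regular expressions are simplifications chosen for illustration, the function names are hypothetical, and edge cases such as apostrophes or scripts other than basic Chinese are not handled the way the real tokenizers handle them.

```python
import re

# Lucene's default English stop-word set, which the Stop analyzer uses by default.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

# Assumption: only the basic CJK ideograph block is treated as Chinese here.
_CJK = "\u4e00-\u9fff"


def standard(text):
    """Approximate Standard: lowercase word runs, one token per Chinese character."""
    tokens = re.findall(rf"[{_CJK}]|[^\W{_CJK}]+", text)
    return [t.lower() for t in tokens]


def simple(text):
    """Approximate Simple: split at anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop(text):
    """Approximate Stop: Simple plus stop-word removal."""
    return [t for t in simple(text) if t not in ENGLISH_STOP_WORDS]


def whitespace(text):
    """Approximate Whitespace: split on whitespace only, keep case and symbols."""
    return text.split()


def keyword(text):
    """Approximate Keyword: the whole input is a single token."""
    return [text]


sample = "this is a , good Man 中华人民共和国"
for analyzer in (standard, simple, stop, whitespace, keyword):
    print(f"{analyzer.__name__:<10} {analyzer(sample)}")
```

For this sample, the sketch produces one token per Chinese character for the standard case and keeps the whole Chinese run as one token for the simple case. Results from a real cluster remain the reference.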

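Testing the built-in analyzers

Standard analyzer: splits on word boundaries, lowercases English, removes punctuation, and splits Chinese into single characters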

POST /_analyze
{
  "analyzer": "standard",
  "text": "this is a , good Man 中华人民共和国"
}

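Simple analyzer: splits English into words, lowercases it, and removes symbols; Chinese is split only at whitespace or symbols, so a continuous run of characters stays one token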

POST /_analyze
{
  "analyzer": "simple",
  "text": "this is a , good Man 中华人民共和国"
}

Whitespace analyzer: splits both Chinese and English on whitespace only; English is not lowercased and punctuation is kept

POST /_analyze
{
  "analyzer": "whitespace",
  "text": "this is a , good Man"
}
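
The same requests can be sent from a script, which also makes it easy to try the Stop and Keyword analyzers that have no request above. The following is a minimal sketch using only the Python standard library. It assumes a node reachable at http://localhost:9200 without authentication or TLS, and the helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth, no TLS


def build_analyze_request(analyzer, text, base_url=ES_URL):
    """Build a POST /_analyze request for the given analyzer and text."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text, base_url=ES_URL):
    """Send the request and return only the token strings from the response."""
    request = build_analyze_request(analyzer, text, base_url)
    with urllib.request.urlopen(request) as response:
        return [t["token"] for t in json.load(response)["tokens"]]


def compare_analyzers(text, base_url=ES_URL):
    """Print the tokens each built-in analyzer produces for the same text."""
    for name in ("standard", "simple", "stop", "whitespace", "keyword"):
        print(name, analyze(name, text, base_url))
```

With a node running, calling compare_analyzers("this is a , good Man 中华人民共和国") prints one line of tokens per analyzer.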

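Setting an analyzer when creating an index

Replace index_name with the name of your index. The analyzer parameter on the title field names the analyzer explicitly.

PUT /index_name
{
  "settings": {},
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}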
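
When several text fields need different analyzers, the request body can be generated instead of written by hand. The sketch below builds the same body as the request above. The function names are hypothetical, and the second helper relies on the analyze API accepting a field name when it is called on a specific index.

```python
import json


def build_index_body(field_analyzers):
    """Return a create-index body that maps each text field to its analyzer."""
    properties = {
        field: {"type": "text", "analyzer": analyzer}
        for field, analyzer in field_analyzers.items()
    }
    return {"settings": {}, "mappings": {"properties": properties}}


def build_field_analyze_body(field, text):
    """Body for POST /<index>/_analyze that uses the analyzer mapped to `field`."""
    return {"field": field, "text": text}


index_body = build_index_body({"title": "standard"})
print(json.dumps(index_body, indent=2, ensure_ascii=False))
print(json.dumps(build_field_analyze_body("title", "this is a , good Man")))
```

Sending the second body to the index's _analyze endpoint is one way to check which analyzer a mapped field actually uses.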

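Chinese analyzers

Elasticsearch supports many Chinese analyzers, such as smartCN and IK. The IK analyzer is the one recommended here.

The IK analyzer version must match the version of Elasticsearch you have installed.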
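
Because the versions must match, it can help to derive the download link from the version the node reports. The sketch below reads the version from the JSON that a node returns for GET / and fills it into the URL pattern of the release link used in step 1 of the installation. The sample response is hypothetical and trimmed, and a release is not guaranteed to exist at that URL for every Elasticsearch version, so check the link before relying on it.

```python
def es_version_from_root(root_response):
    """Extract the version string from the JSON returned by GET / on a node."""
    return root_response["version"]["number"]


def ik_release_url(es_version):
    """Build a download URL following the pattern of the v7.14.0 release link."""
    base = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"
    return f"{base}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


# Hypothetical, trimmed response from GET http://localhost:9200/
sample_root = {"name": "node-1", "version": {"number": "7.14.0"}}

version = es_version_from_root(sample_root)
print(version)
print(ik_release_url(version))
```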

# 1. Download the matching version
- [es@linux ~]$ wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.14.0/elasticsearch-analysis-ik-7.14.0.zip

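# 2. Unzip (the remaining commands use 6.2.4 in file and directory names; substitute the version you downloaded in step 1)
- [es@linux ~]$ unzip elasticsearch-analysis-ik-6.2.4.zip # install unzip first if needed: yum install -y unzip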

# 3. Move it into the plugins directory of the ES installation
- [es@linux ~]$ ls elasticsearch-6.2.4/plugins/
 [es@linux ~]$ mv elasticsearch elasticsearch-6.2.4/plugins/
 [es@linux ~]$ ls elasticsearch-6.2.4/plugins/
   elasticsearch
 [es@linux ~]$ ls elasticsearch-6.2.4/plugins/elasticsearch/
  commons-codec-1.9.jar    config                               httpclient-4.5.2.jar    plugin-descriptor.properties
  commons-logging-1.2.jar  elasticsearch-analysis-ik-6.2.4.jar  httpcore-4.4.4.jar
  
# 4. Restart ES for the change to take effect

# 5. For a local install, the IK configuration file is located at
- <ES installation directory>/plugins/analysis-ik/config/IKAnalyzer.cfg.xml
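
After the restart in step 4, one way to confirm that the plugin was loaded is to inspect the output of GET /_cat/plugins?format=json, which lists each plugin with its node name, component, and version. The sketch below checks such a listing for the IK plugin and compares its version with the Elasticsearch version. The sample rows are hypothetical, and the component name analysis-ik is assumed to be the name the plugin registers.

```python
IK_COMPONENT = "analysis-ik"  # assumption: name the IK plugin registers under


def find_ik_plugin(cat_plugins_rows):
    """Return the row describing the IK plugin, or None if it is not listed."""
    for row in cat_plugins_rows:
        if row.get("component") == IK_COMPONENT:
            return row
    return None


def ik_matches_es(cat_plugins_rows, es_version):
    """True only if the IK plugin is listed and its version equals es_version."""
    row = find_ik_plugin(cat_plugins_rows)
    return row is not None and row.get("version") == es_version


# Hypothetical rows as returned by GET /_cat/plugins?format=json
sample_rows = [{"name": "node-1", "component": "analysis-ik", "version": "7.14.0"}]

print(find_ik_plugin(sample_rows))
print(ik_matches_es(sample_rows, "7.14.0"))
```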

IK supports two levels of tokenization granularity.
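
The two levels are exposed by the IK plugin as the analyzer names ik_smart (coarsest-grained split) and ik_max_word (finest-grained split). As a small sketch, the code below prints an analyze request for each so the results can be compared on the same text. The helper name is hypothetical, and the requests only work on a node where the IK plugin is installed.

```python
import json

# Analyzer names registered by the IK plugin, with a short description of each.
IK_ANALYZERS = {
    "ik_smart": "coarsest-grained split",
    "ik_max_word": "finest-grained split",
}


def ik_analyze_bodies(text):
    """Return one _analyze request body per IK analyzer for the same text."""
    return {name: {"analyzer": name, "text": text} for name in IK_ANALYZERS}


for name, body in ik_analyze_bodies("中华人民共和国").items():
    print(f"# {name}: {IK_ANALYZERS[name]}")
    print("POST /_analyze")
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

In general ik_max_word returns more tokens, including overlapping ones, while ik_smart returns fewer and longer tokens.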