es7.x Es常用核心知识快捷版1（分词和text和keyword）-Toy模板网

这篇具有很好参考价值的文章主要介绍了es7.x Es常用核心知识快捷版1（分词和text和keyword）。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

一分词

1.1 分词

1.1.1 查看分词

standard标准分析器是将每个字都分出来；

而ik_max_word是最细粒度的分词，将所有可能的词都分出来；

ik_smart 是最粗粒度的分词；

ik_smart

优点:特征是粗略快速的将文字进行分词,占用空间小,查询速度快

缺点:分词的颗粒度大,可能跳过一些重要分词,导致查询结果不全面,查全率低。

ik_max_word

优点:特征是详细的文字片段进行分词,查询时查全率高,不容易遗漏数据

缺点:因为分词太过详细,导致有一些无用分词,占用空间较大,查询速度慢standard是ES默认的分词器,"analyzer": "standard"是可以省略的

1.1.2 几种分词比较

1.使用 ik_max_word 分词

### 请求

GET http://localhost:9200/_analyze

{

"text":"中华人民共和国人民大会堂",

"analyzer":"ik_max_word"

}

响应结果：

{

"tokens": [

{

"token": "中华人民共和国",

"start_offset": 0,

"end_offset": 7,

"type": "CN_WORD",

"position": 0

{

"token": "中华人民",

"start_offset": 0,

"end_offset": 4,

"type": "CN_WORD",

"position": 1

{

"token": "中华",

"start_offset": 0,

"end_offset": 2,

"type": "CN_WORD",

"position": 2

{

"token": "华人",

"start_offset": 1,

"end_offset": 3,

"type": "CN_WORD",

"position": 3

{

"token": "人民共和国",

"start_offset": 2,

"end_offset": 7,

"type": "CN_WORD",

"position": 4

{

"token": "人民",

"start_offset": 2,

"end_offset": 4,

"type": "CN_WORD",

"position": 5

{

"token": "共和国",

"start_offset": 4,

"end_offset": 7,

"type": "CN_WORD",

"position": 6

{

"token": "共和",

"start_offset": 4,

"end_offset": 6,

"type": "CN_WORD",

"position": 7

{

"token": "国人",

"start_offset": 6,

"end_offset": 8,

"type": "CN_WORD",

"position": 8

{

"token": "人民大会堂",

"start_offset": 7,

"end_offset": 12,

"type": "CN_WORD",

"position": 9

{

"token": "人民大会",

"start_offset": 7,

"end_offset": 11,

"type": "CN_WORD",

"position": 10

{

"token": "人民",

"start_offset": 7,

"end_offset": 9,

"type": "CN_WORD",

"position": 11

{

"token": "大会堂",

"start_offset": 9,

"end_offset": 12,

"type": "CN_WORD",

"position": 12

{

"token": "大会",

"start_offset": 9,

"end_offset": 11,

"type": "CN_WORD",

"position": 13

{

"token": "会堂",

"start_offset": 10,

"end_offset": 12,

"type": "CN_WORD",

"position": 14

}

]

}

2.使用standard分词器

### 请求

GET http://localhost:9200/_analyze

{

"text":"中华人民共和国人民大会堂",

"analyzer":"standard"

}

响应结果：

{

"tokens": [

{

"token": "中",

"start_offset": 0,

"end_offset": 1,

"type": "<IDEOGRAPHIC>",

"position": 0

{

"token": "华",

"start_offset": 1,

"end_offset": 2,

"type": "<IDEOGRAPHIC>",

"position": 1

{

"token": "人",

"start_offset": 2,

"end_offset": 3,

"type": "<IDEOGRAPHIC>",

"position": 2

{

"token": "民",

"start_offset": 3,

"end_offset": 4,

"type": "<IDEOGRAPHIC>",

"position": 3

{

"token": "共",

"start_offset": 4,

"end_offset": 5,

"type": "<IDEOGRAPHIC>",

"position": 4

{

"token": "和",

"start_offset": 5,

"end_offset": 6,

"type": "<IDEOGRAPHIC>",

"position": 5

{

"token": "国",

"start_offset": 6,

"end_offset": 7,

"type": "<IDEOGRAPHIC>",

"position": 6

{

"token": "人",

"start_offset": 7,

"end_offset": 8,

"type": "<IDEOGRAPHIC>",

"position": 7

{

"token": "民",

"start_offset": 8,

"end_offset": 9,

"type": "<IDEOGRAPHIC>",

"position": 8

{

"token": "大",

"start_offset": 9,

"end_offset": 10,

"type": "<IDEOGRAPHIC>",

"position": 9

{

"token": "会",

"start_offset": 10,

"end_offset": 11,

"type": "<IDEOGRAPHIC>",

"position": 10

{

"token": "堂",

"start_offset": 11,

"end_offset": 12,

"type": "<IDEOGRAPHIC>",

"position": 11

}

]

}

3.使用 ik_smart分词

### 请求

GET http://localhost:9200/_analyze

{

"text":"中华人民共和国人民大会堂",

"analyzer":"ik_smart"

}

响应结果：

{

"tokens": [

{

"token": "中华人民共和国",

"start_offset": 0,

"end_offset": 7,

"type": "CN_WORD",

"position": 0

{

"token": "人民大会堂",

"start_offset": 7,

"end_offset": 12,

"type": "CN_WORD",

"position": 1

}

]

}

https://www.jianshu.com/p/e8e6874799f6

https://www.bilibili.com/read/cv17912145/

1.1.3 入库和查询指定分词器

1.创建或者更新文档时，会对文档进行分词，可以指定分词

创建index mapping时指定search_analyzer

es7.x Es常用核心知识快捷版1（分词和text和keyword）

不指定分词时，会使用默认的standard

明确字段是否需要分词，不需要分词的字段将type设置为keyword，可以节省空间和提高写性能。

2.搜索：查询时，对查询语句分词

查询时通过analyzer指定分词器

es7.x Es常用核心知识快捷版1（分词和text和keyword）

es的分词器analyzer_51CTO博客_es分词器

1.2 text与keyword类型

1.2.1 两种类型说明

ES5.0及以后的版本取消了string类型，将原先的string类型拆分为text和keyword两种类型。

1.如果字段是text类型，存入的数据会先进行分词，然后将分完词的词组存入索引，但是text类型的数据不能用来过滤、排序和聚合等操作。

2.keyword则不会进行分词，直接存储。常常被用来过滤、排序和聚合。

3.es自动生成的该字段的mapping是text + keyword(es版本7.9.0)。

1.2.2 字段同时具有keyword和text属性

当直接保存一个字符串字段时，es自动生成的该字段的mapping是text + keyword(es版本7.9.0)。

{

"city_info": {

"mappings": {

"properties": {

"address": {

"type": "text",

"fields": {

"keyword": {

"type": "keyword",

"ignore_above": 256

}

"cityName": {

"type": "text",

"fields": {

"keyword": {

"type": "keyword",

"ignore_above": 256

}

1.创建索引

es7.x Es常用核心知识快捷版1（分词和text和keyword）

2.直接添加文档

es7.x Es常用核心知识快捷版1（分词和text和keyword）

3.查看mapping

es7.x Es常用核心知识快捷版1（分词和text和keyword）

1.2.3 给text类型添加keyword属性

如果在创建index的时候给某个字段指定了类型text，但是之后又想给它追加上keyword，方便以后按完整字符串搜索。可以通过PUT命令实现。

es7.x Es常用核心知识快捷版1（分词和text和keyword）

例子如下：

es7.x Es常用核心知识快捷版1（分词和text和keyword）文章来源地址https://www.toymoban.com/news/detail-490530.html

到了这里，关于es7.x Es常用核心知识快捷版1（分词和text和keyword）的文章就介绍完了。如果您还想了解更多内容，请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章，希望大家以后多多支持TOY模板网！

es7.x Es常用核心知识快捷版1（分词和text和keyword）

一分词

1.1 分词

1.1.1 查看分词

1.1.2 几种分词比较

1.1.3 入库和查询指定分词器

1.2 text与keyword类型

1.2.1 两种类型说明

1.2.2 字段同时具有keyword和text属性

1.2.3 给text类型添加keyword属性

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏

支付宝扫一扫领取红包，优惠每天领

二维码1

二维码2

es7.x Es常用核心知识快捷版1（分词和text和keyword）

一 分词

1.1 分词

1.1.1 查看分词

1.1.2 几种分词比较

1.1.3 入库和查询指定分词器

1.2 text与keyword类型

1.2.1 两种类型说明

1.2.2 字段同时具有keyword和text属性

1.2.3 给text类型添加keyword属性

相关文章

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏

支付宝扫一扫领取红包，优惠每天领

二维码1

二维码2

一分词