Elasticsearch provides a tokenizer called the N-gram tokenizer. The official documentation introduces it as follows:
N-gram tokenizer
The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word of the specified length.
N-grams are like a sliding window that moves across the word - a continuous sequence of characters of the specified length. They are useful for querying languages that don’t use spaces or that have long compound words, like German.
Example output
With the default settings, the ngram tokenizer treats the initial text as a single token and produces N-grams with minimum length 1 and maximum length 2:
POST _analyze
{
  "tokenizer": "ngram",
  "text": "Quick Fox"
}
The above sentence would produce the following terms:

[ Q, Qu, u, ui, i, ic, c, ck, k, "k ", " ", " F", F, Fo, o, ox, x ]
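The defaults are rarely what you want for real data. As a minimal sketch (the index name my_index and the names my_ngram_analyzer and my_ngram_tokenizer are illustrative, not from the original article), you can configure a custom ngram tokenizer with min_gram, max_gram, and token_chars, so that only letters and digits are kept in tokens and everything else acts as a word boundary:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_ngram_analyzer",
  "text": "2 Quick Foxes"
}

With token_chars set to letter and digit, the text is split on the spaces, each run of letters or digits is N-grammed independently, and runs shorter than min_gram (here the lone "2") produce no terms, so this request yields [ Qui, uic, ick, Fox, oxe, xes ]. Note that keeping min_gram and max_gram equal (or close) is also advisable because the allowed difference between them is capped by the index.max_ngram_diff index setting.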
This concludes the introduction to the N-gram tokenizer in Elasticsearch for tokenizing digits, English letters, and similar text.