Elasticsearch
Elasticsearch, together with Kibana, Logstash, and Beats, forms the Elastic Stack (ELK). It is widely used for log analytics, real-time monitoring, and similar scenarios.
What is elasticsearch?
- An open-source distributed search engine that can power search, log statistics and analytics, and system monitoring
What is the Elastic Stack (ELK)?
- A technology stack built around elasticsearch as its core, comprising Beats, Logstash, Kibana, and elasticsearch
What is Lucene?
- Apache's open-source search-engine library, which provides the core search-engine APIs
Forward index vs. inverted index
What are documents and terms?
- Each record of data is a document
- Splitting a document's content into words produces terms
What is a forward index?
- An index built on the document id. To search by term, you must first fetch each document and then check whether it contains the term
What is an inverted index?
- The document content is tokenized, an index is built on the terms, and each term records which documents it appears in. A query first looks up the document ids by term, then fetches those documents (a minimal sketch follows)
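The core data structure is easy to sketch in Java. This illustrative toy (a whitespace split stands in for a real analyzer such as ik_max_word) shows why term lookups are fast: the map goes straight from term to document ids.
import java.util.*;

public class InvertedIndexDemo {
    // term -> ids of the documents containing that term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Indexing: tokenize the document and record its id under every term.
    public void add(int docId, String content) {
        for (String term : content.split("\\s+")) {
            index.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Searching: jump straight from the term to the document ids -- no scan over documents.
    public Set<Integer> search(String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.add(1, "huawei phone");
        idx.add(2, "xiaomi phone");
        System.out.println(idx.search("phone")); // [1, 2]
    }
}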
Differences between ES and MySQL
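The concepts map onto each other roughly like this:
- MySQL table ↔ elasticsearch index
- MySQL row ↔ elasticsearch document (serialized as JSON)
- MySQL column ↔ elasticsearch field
- MySQL schema ↔ elasticsearch mapping
- SQL ↔ DSL
MySQL is the better fit for transactional (ACID) operations, while elasticsearch excels at searching, analyzing, and computing over large data sets; in practice the two are combined, with writes going to MySQL and searches going to elasticsearch.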
Analyzers
The ik analyzer
The ik plugin offers two modes: ik_smart (coarsest-grained splitting) and ik_max_word (finest-grained splitting). For details see https://github.com/medcl/elasticsearch-analysis-ik
POST /_analyze
{
"text": ["马化腾是一个人啊,奥力给啊额!"],
"analyzer": "ik_max_word"
}
The pinyin analyzer
Project and configuration reference: https://github.com/medcl/elasticsearch-analysis-pinyin
POST /_analyze
{
"text": ["马化腾是一个人啊,奥力给啊额!"],
"analyzer": "pinyin"
}
Custom analyzers
An analyzer in ES is composed of three parts:
- character filters: process the text before the tokenizer, e.g. deleting or replacing characters
- tokenizer: splits the text into terms according to some rule, e.g. keyword (no splitting at all) or ik_smart
- tokenizer filter: further processes the terms emitted by the tokenizer, e.g. lowercasing, synonym handling, pinyin conversion
A custom analyzer is configured through settings (the index-level configuration) when the index is created.
settings can specify the same three parts:
- character filter: the special-character preprocessor
- tokenizer: the tokenizer
- filter: the token filter, e.g. the pinyin filter
You are not required to use all three; use whichever the situation calls for.
The mapping must specify both the index-time analyzer and the search-time analyzer:
- "analyzer": "myAnalyzer",
- "search_analyzer": "ik_max_word"
Why specify them separately?
Because the pinyin analyzer should only run at index time. Take 狮子 (lion) and 柿子 (persimmon): when indexed, each is analyzed into shizi and sz plus the original word, so thanks to the pinyin filter both documents carry the terms shizi and sz. If searches also went through the pinyin analyzer, a search for 狮子 would be converted to shizi and would match both 柿子 and 狮子. So the search side must not use the pinyin analyzer; it should use the ik analyzer instead, whose output terms are matched against the pinyin-enriched inverted index.
# Custom analyzer
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"myAnalyzer":{
"tokenizer":"ik_max_word",
"filter": "py" //name of the pinyin token filter defined below
}
},
//definition of the pinyin token filter named above
"filter": {
"py":{
"type": "pinyin", //filter type
"keep_full_pinyin": false,//when enabled: 刘德华 > [liu, de, hua]; default: true
"keep_joined_full_pinyin": true,//when enabled: 刘德华 > [liudehua]; default: false
"keep_original": true,//also keep the original input as a term; default: false
"limit_first_letter_length": 16,//max length of the first_letter result; default: 16
"remove_duplicated_term": true,//remove duplicated terms to save index space, e.g. de的 > de; default: false; note: position-related queries may be affected
"none_chinese_pinyin_tokenize" :false //whether non-Chinese letters are split into separate pinyin tokens when they look like pinyin; default: true; note: keep_none_chinese and keep_none_chinese_together should be enabled first
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "myAnalyzer",
"search_analyzer": "ik_max_word"
}
}
}
}
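To verify what the custom analyzer actually produces, the _analyze API can also be called from Java. A sketch using the RestHighLevelClient initialized in the RestClient section below (AnalyzeRequest and AnalyzeResponse live in org.elasticsearch.client.indices):
// ask the "test" index to run "myAnalyzer" over a string and print the resulting terms
AnalyzeRequest analyzeRequest = AnalyzeRequest.withIndexAnalyzer("test", "myAnalyzer", "狮子");
AnalyzeResponse analyzeResponse = client.indices().analyze(analyzeRequest, RequestOptions.DEFAULT);
for (AnalyzeResponse.AnalyzeToken token : analyzeResponse.getTokens()) {
    System.out.println(token.getTerm()); // expect terms like 狮子, shizi, sz
}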
Index operations
Mapping properties
Common mapping properties: type (the field type: text, keyword, integer/long/double, boolean, date, object, geo_point, ...), analyzer (which analyzer a text field uses), index (whether the field is indexed and searchable; defaults to true), and properties (the sub-fields of an object field).
# Create an index
PUT /firsttable
{
"mappings": {
"properties": {
"info": {
"type": "text",
"analyzer": "ik_max_word"
},
"age": {
"type": "integer"
},
"Weight": {
"type": "double"
},
"isMarried": {
"type": "boolean"
},
"email": {
"type": "keyword",
"index": false
},
"score": {
"type": "double"
},
"name": {
"type": "object",
"properties": {
"firstName": {
"type": "keyword"
},
"lastName": {
"type": "keyword"
}
}
}
}
}
}
# Get an index
GET /firsttable
# Modify an index: existing field mappings cannot be changed, only new fields can be added
PUT /firsttable/_mapping
{
"properties":{
"age2":{
"type": "double"
}
}
}
# Delete an index
DELETE /firsttable
Document operations
# Insert a document
POST /firsttable/_doc/1
{
"info": "未婚男性",
"age": "20",
"Weight": "21.3",
"isMarried": false,
"email": "213@qq.com",
"score": "21.2",
"name": {
"firstName": "张",
"lastName": "三"
}
}
#Get a document
GET /firsttable/_doc/1
#Delete a document
DELETE /firsttable/_doc/1
#Update a document
#1. Full replacement: deletes the old document and indexes a new one
PUT /firsttable/_doc/1
{
"info": "未婚男性222",
"age": "20",
"Weight": "21.3",
"isMarried": false,
"email": "213@qq.com",
"score": "21.2",
"name": {
"firstName": "张",
"lastName": "三"
}
}
#2. Partial update: only the listed fields change
POST /firsttable/_update/1
{
"doc": {
"info": "未婚男性333"
}
}
RestClient operations
DSL statement
#hotel
PUT /hotel
{
"mappings":{
"properties":{
"id":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "ik_max_word"
},
"address":{
"type": "keyword",
"index": false,
"copy_to": "all"
},
"price":{
"type": "double"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword",
"copy_to": "all"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword"
},
"all":{
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
- Add the dependencies
<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
<!-- the elasticsearch RestHighLevelClient dependency -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
- Initialize the RestHighLevelClient
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.IOException;

@SpringBootTest
class HotelDemoApplicationTests {
private RestHighLevelClient client;
@Test
void contextLoads() {
System.out.println(client);
}
@BeforeEach
void setUp(){
this.client = new RestHighLevelClient(
RestClient.builder(
HttpHost.create("http://192.168.163.129:9200")));
}
@AfterEach
void clear() throws IOException {
this.client.close();
}
}
Index operations with the client
Create an index
@Test
void contextLoads() throws IOException {
// 1. create the request object
CreateIndexRequest request = new CreateIndexRequest("hotel");
// 2. prepare the DSL; MAPPING_HOTEL is a String constant holding the index-creation DSL for hotel
request.source(MAPPING_HOTEL,XContentType.JSON);
// 3. send the request; indices() exposes all the index-level operations: create, delete, get, exists, ...
client.indices().create(request,RequestOptions.DEFAULT);
}
Delete an index
@Test
public void testDel() throws IOException {
DeleteIndexRequest hotel = new DeleteIndexRequest("hotel");
client.indices().delete(hotel,RequestOptions.DEFAULT);
}
Check whether an index exists
@Test
public void testExists() throws IOException {
GetIndexRequest hotel = new GetIndexRequest("hotel");
System.out.println(client.indices().exists(hotel, RequestOptions.DEFAULT));
}
Document operations with the client
Insert a document
@Test
public void testAddData() throws IOException {
// fetch the row from the database
Hotel hotel = hotelService.getById(61083L);
// convert it into the structure stored in the index
HotelDoc hotelDoc = new HotelDoc(hotel);
// build the request with the index name and the document id
IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
// the document body, as a JSON string
request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
client.index(request, RequestOptions.DEFAULT);
}
Get a document
@Test
public void testGet() throws IOException {
GetRequest request = new GetRequest("hotel").id("61083");
GetResponse response = client.get(request, RequestOptions.DEFAULT);
String jsonStr = response.getSourceAsString();
HotelDoc hotelDoc = JSON.parseObject(jsonStr, HotelDoc.class);
System.out.println(hotelDoc);
}
Delete a document
@Test
public void testDel() throws IOException {
DeleteRequest request = new DeleteRequest("hotel").id("61083");
DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
System.out.println(response.status());
}
Update a document
@Test
public void testUpdate() throws IOException {
UpdateRequest request = new UpdateRequest("hotel","61083");
request.doc(
"score", "18",
"city", "东莞"
);
UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
System.out.println(response.status());
}
Bulk-insert documents
@Test
public void testBulk() throws IOException {
QueryWrapper<Hotel> wrapper = new QueryWrapper<>();
// wrapper.last("limit 5");
List<Hotel> list = hotelService.list(wrapper);
BulkRequest request = new BulkRequest("hotel");
for (Hotel item: list){
HotelDoc hotelDoc = new HotelDoc(item);
request.add(
new IndexRequest("hotel")
.id(item.getId().toString())
.source(JSON.toJSONString(hotelDoc),XContentType.JSON));
}
client.bulk(request,RequestOptions.DEFAULT);
}
DSL queries
Match-all query: returns all documents; generally used for testing. Example: match_all
Full-text queries: the user input is analyzed into terms, which are then matched against the inverted index. Examples:
match_query
multi_match_query
Term-level queries: match exact values, typically on keyword, numeric, date, or boolean fields. Examples:
ids
range: match by a range of values
term: match by an exact term value
Geo queries: search by latitude and longitude. Examples:
geo_distance
geo_bounding_box
Compound queries: combine the query conditions above into a single query. Examples:
bool
function_score
Match all
GET /hotel/_search
{
"explain":true, # also report which shard each hit is on
"query": {
"QUERY_TYPE": {
"QUERY_CONDITION": "VALUE"
}
}
}
//match all
GET /hotel/_search
{
"query": {
"match_all": {}
}
}
Full-text queries
# match query
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
}
}
# multi_match query. It differs slightly from match: match searches one field, while multi_match matches the query text against every field in the list. If the all field that match targets happens to be copy_to'd from exactly the fields multi_match lists, the two queries behave the same
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "上海如家",
"fields": ["brand", "name", "address"]
}
}
}
Term-level queries
# term query: match by an exact term value
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "上海"
}
}
}
}
# range query: match by a range of values
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 200
}
}
}
}
Geo queries
# geo_distance query: match documents within a given distance of a coordinate
GET /hotel/_search
{
"query": {
"geo_distance": {
"distance": "3km",
"location": "31.21, 121.5"
}
}
}
# geo_bounding_box query: match documents whose coordinates fall inside the rectangle defined by two corners
GET /hotel/_search
{
"query": {
"geo_bounding_box": {
"location":{
"top_left": {
"lat": 31.3,
"lon": 121.5
},
"bottom_right": {
"lat": 30.3,
"lon": 121.7
}
}
}
}
}
Compound queries
# function_score: query city=上海 and give brand=如家 hotels a weight of 10. The matched 如家 hotels get their relevance score multiplied by 10 while the others are unchanged; since results are displayed in score order, the 如家 hotels rank first
GET /hotel/_search
{
"query": {
"function_score": {
"query": {
"match": {
"city": "上海"
}
},
"functions": [
{
"filter": {
"term": {"brand": "如家"}
},
"weight":10
}
],
"boost_mode": "multiply"
}
}
}
Logical clauses of the bool query:
- must: clauses that must match, i.e. "AND"
- should: clauses that may match, i.e. "OR"
- must_not: clauses that must not match; they do not affect the score, i.e. "NOT"
- filter: clauses that must match but do not affect the score
# bool query: hotels whose name matches 如家, priced no more than 400, within 10km of 31.21,121.5
# clauses placed under filter or must_not do not participate in scoring; only clauses under must (like the match here) are scored, and the more clauses are scored the worse the performance
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "如家"
}
}
],
"must_not": [
{
"range": {
"price": {
"gt":400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
Relevance scoring
Early versions of elasticsearch scored hits with TF-IDF; since 5.x the default similarity is BM25, which bounds the influence of term frequency so that one frequently repeated word cannot dominate the score.
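A simplified sketch of the BM25 formula (boosts omitted; k_1 and b are tuning constants, typically k_1 ≈ 1.2 and b ≈ 0.75):
\mathrm{score}(q,d)=\sum_{t\in q}\mathrm{IDF}(t)\cdot\frac{f(t,d)\,(k_1+1)}{f(t,d)+k_1\left(1-b+b\cdot\frac{|d|}{\mathrm{avgdl}}\right)}
Here f(t,d) is how often term t occurs in document d, |d| is the document length, and avgdl is the average document length. Because f(t,d) appears in both numerator and denominator, the score grows with term frequency but saturates instead of increasing without bound.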
Sorting
Once an explicit sort is specified, ES no longer computes relevance scores.
# sort query: brand=如家, ordered by the score field descending, ties broken by price ascending
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"score": {
"order": "desc"
},
"price": {
"order": "asc"
}
}
]
}
# geo sort: order hotels by distance from the given coordinate, ascending, with distances reported in km
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 31.240417 ,
"lon": 121.503134
},
"order": "asc",
"unit": "km"
}
}
]
}
Pagination
By default ES returns only the top 10 hits; to see more, you must adjust the paging parameters.
ES controls which slice of results comes back through the from and size parameters.
Because of how the inverted index works, ES cannot jump straight to a page: every paged query ranks the whole result prefix and then cuts out the slice. To fetch results 990-1000 it must rank the top 1000 hits and take the last 10.
ES is distributed, and deployments usually spread data across shards, each holding its own slice. So how is a page like 990-1000 fetched? A single shard's top 1000 is not the global top 1000, so taking 10 hits from each of 10 shards would leave 100 candidates with no way to pick the right ones. What actually happens: every shard returns its local top 1000, the coordinating node merges all 10 × 1000 = 10,000 entries, re-sorts them, and takes entries 990-1000.
# paged query
# sort query with from/size
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"score": {
"order": "desc"
},
"price": {
"order": "asc"
}
}
],
"from": 0,
"size": 2
}
The deep-pagination problem
Solutions to deep pagination
search_after is the recommended option (a client-side sketch follows this list):
- Pros: no upper bound on how far you can page (a single query's size still must not exceed 10,000)
- Cons: can only page forward one page at a time; random page jumps are not supported
- Use case: searches without random page-jumping, e.g. scroll-down paging on a phone
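A minimal RestHighLevelClient sketch of search_after (field names follow the hotel index above; the sort must be deterministic, so the unique id field serves as a tiebreaker):
/**
 * search_after paging sketch
 */
@Test
public void testSearchAfter() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source()
            .query(QueryBuilders.matchAllQuery())
            .size(10)
            .sort("price", SortOrder.ASC)
            .sort("id", SortOrder.ASC); // unique keyword field as tiebreaker
    SearchResponse page1 = client.search(request, RequestOptions.DEFAULT);
    SearchHit[] hits = page1.getHits().getHits();
    if (hits.length > 0) {
        // feed the sort values of the last hit into the next request
        Object[] lastSortValues = hits[hits.length - 1].getSortValues();
        request.source().searchAfter(lastSortValues);
        SearchResponse page2 = client.search(request, RequestOptions.DEFAULT);
        System.out.println(page2.getHits().getHits().length);
    }
}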
Highlighting
The query below targets all (a field populated via copy_to), but the field listed under highlight fields is name. By default ES only highlights a field when it is the same field that was queried; setting require_field_match to false lifts that restriction.
# highlight query
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": "false"
}
}
}
}
Query operations with RestClient
Match all (matchAll)
@Test
public void testMatchALl() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.matchAllQuery());
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// parse the response: get the hits section
SearchHits searchHits = response.getHits();
// total number of matching documents
long value = searchHits.getTotalHits().value;
System.err.println("<===== total hits: " + value + " =====>");
// the array of hit documents inside hits
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String jsonStr = hit.getSourceAsString();
HotelDoc hotelDoc = JSONObject.parseObject(jsonStr, HotelDoc.class);
System.err.println("hotelDoc---> " + hotelDoc);
}
}
//match all
GET /hotel/_search
{
"query": {
"match_all": {}
}
}
Full-text search
/**
* match full-text query
* @throws IOException
*/
@Test
public void testMatch() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.matchQuery("all","上海如家"));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
}
}
/**
* multi_match full-text query
* @throws IOException
*/
@Test
public void testMultiMatch() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.multiMatchQuery("上海如家","brand","name","address"));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "上海如家",
"fields": ["brand", "name", "address"]
}
}
}
Term-level queries
/**
* term query (exact value)
* @throws IOException
*/
@Test
public void testTerm() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.termQuery("city","上海"));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "上海"
}
}
}
}
/**
* range query
* @throws IOException
*/
@Test
public void testRange() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.rangeQuery("price").gte(100).lte(200));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 200
}
}
}
}
Geo queries
/**
* geo_distance query
* @throws IOException
*/
@Test
public void testDistance() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.geoDistanceQuery("location").distance("3km").point(31.21,121.5));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
# geo_distance query: match documents within a distance of a coordinate
GET /hotel/_search
{
"query": {
"geo_distance": {
"distance": "3km",
"location": "31.21, 121.5"
}
}
}
Compound queries
/**
* bool compound query
* @throws IOException
*/
@Test
public void testBool() throws IOException {
SearchRequest request = new SearchRequest("hotel");
// build the DSL
// create the BoolQueryBuilder
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// add the must clause
boolQuery.must(QueryBuilders.matchQuery("name","如家"));
// add the must_not clause
boolQuery.mustNot(QueryBuilders.rangeQuery("price").gt("400"));
// add the filter clause (not scored)
boolQuery.filter(QueryBuilders.geoDistanceQuery("location").distance("10km").point( 31.21,121.5));
request.source().query(boolQuery);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "如家"
}
}
],
"must_not": [
{
"range": {
"price": {
"gt":400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
/**
* function_score compound query
* @throws IOException
*/
@Test
public void testFunctionScore() throws IOException {
SearchRequest request = new SearchRequest("hotel");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// the base match query
QueryBuilder queryBuilder = QueryBuilders.matchQuery("city", "上海");
// the score functions: filter brand=如家, weight 10
FunctionScoreQueryBuilder.FilterFunctionBuilder[] filterFunctionBuilders = {
new FunctionScoreQueryBuilder.FilterFunctionBuilder(
QueryBuilders.termQuery("brand", "如家"),
new WeightBuilder().setWeight(10)
)
};
// combine the base query and the functions into one functionScoreQuery
FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(queryBuilder, filterFunctionBuilders);
searchSourceBuilder.query(functionScoreQueryBuilder);
request.source(searchSourceBuilder);
SearchResponse response = client.search(request,RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"function_score": {
"query": {
"match": {
"city": "上海"
}
},
"functions": [
{
"filter": {
"term": {"brand": "如家"}
},
"weight":10
}
],
"boost_mode": "multiply"
}
}
}
Sorting
/**
* sort plus pagination
* @throws IOException
*/
@Test
public void testSort() throws IOException {
SearchRequest request = new SearchRequest("hotel");
MatchQueryBuilder query = QueryBuilders.matchQuery("brand", "如家");
// two sort criteria
FieldSortBuilder score = SortBuilders.fieldSort("score").order(SortOrder.DESC);
FieldSortBuilder price = SortBuilders.fieldSort("price").order(SortOrder.ASC);
// collect both criteria into one sort list
List<SortBuilder<?>> builders = new ArrayList<>();
builders.add(score);
builders.add(price);
request.source().sort(builders);
request.source().query(query);
request.source().from(0);
request.source().size(2);
SearchResponse response = client.search(request,RequestOptions.DEFAULT);
SearchHits searchHits = response.getHits();
System.out.println(searchHits.getTotalHits());
}
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"score": {
"order": "desc"
},
"price": {
"order": "asc"
}
}
],
"from": 0,
"size": 2
}
Highlighting
/**
* highlighting
* @throws IOException
*/
@Test
public void testHighLight() throws IOException {
SearchRequest request = new SearchRequest("hotel");
MatchQueryBuilder query = QueryBuilders.matchQuery("all", "如家");
HighlightBuilder highlightBuilder = new HighlightBuilder().field("name").requireFieldMatch(false);
request.source().highlighter(highlightBuilder);
request.source().query(query);
SearchResponse response = client.search(request,RequestOptions.DEFAULT);
SearchHits searchHits = response.getHits();
System.out.println(searchHits.getTotalHits());
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String jsonStr = hit.getSourceAsString();
HotelDoc hotelDoc = JSONObject.parseObject(jsonStr, HotelDoc.class);
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
HighlightField highlightField = highlightFields.get("name");
if (highlightField!=null){
String name = highlightField.getFragments()[0].string();
hotelDoc.setName(name);
}
System.out.println(hotelDoc);
}
}
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": "false"
}
}
}
}
Aggregations
Aggregations perform statistics, analysis, and computation over document data. Common kinds:
- Bucket aggregations: group documents into buckets
  - TermAggregation: group by the values of a field
  - Date Histogram: group by date intervals, e.g. one bucket per week or per month
- Metric aggregations: compute values
  - Avg: average
  - Max: maximum
  - Min: minimum
  - Stats: max, min, avg, sum, etc. all at once
- Pipeline aggregations: aggregate on top of the results of other aggregations
Fields taking part in an aggregation must not be analyzed: keyword, numeric, date, or boolean types.
Bucket aggregations
By default a bucket aggregation counts the documents inside each bucket, records the count as _count, and sorts the buckets by _count in descending order.
By default a bucket aggregation covers every document in the index; to limit the scope, simply add a query condition.
The three required elements of an aggregation:
- the aggregation name
- the aggregation type
- the field to aggregate on
Aggregation configuration properties:
- size: how many buckets to return
- order: how the buckets are sorted
- field: the field to aggregate on
# bucket aggregation
GET /hotel/_search
{
#limit which documents are aggregated
"query": {
"range": {
"price": {
"gte": 200,
"lte": 1000
}
}
},
"size": 1,
"aggs": {
"demo": {
"terms": {
"field": "brand",
# change the bucket ordering
"order": {
"_count": "asc"
},
"size": 20
}
}
}
}
Metric aggregations
# metric aggregation
GET /hotel/_search
{
"size": 0,
"aggs": {
#top-level aggregation: named demo, type terms, on the brand field; buckets are sorted by the sub-aggregation result metricsAgg.avg descending, and 20 buckets are returned
"demo": {
"terms": {
"field": "brand",
"order": {
"metricsAgg.avg": "desc"
},
"size": 20
},
#sub-aggregation: runs inside each bucket above; named metricsAgg, type stats, on the score field
#computes each brand's rating stats: min/max/avg/sum
"aggs": {
"metricsAgg": {
"stats": {
"field": "score"
}
}
}
}
}
}
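The same aggregation can be issued through the RestHighLevelClient. A sketch in the style of the client code above (assumed imports: AggregationBuilders and BucketOrder from org.elasticsearch.search.aggregations, Terms from its bucket.terms subpackage, Stats from its metrics subpackage):
/**
 * bucket + metric aggregation sketch
 */
@Test
public void testAggregation() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0); // only the aggregation results are wanted, not the hits
    request.source().aggregation(
            AggregationBuilders.terms("demo")
                    .field("brand")
                    .size(20)
                    .order(BucketOrder.aggregation("metricsAgg.avg", false)) // descending by avg score
                    .subAggregation(AggregationBuilders.stats("metricsAgg").field("score")));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // parse the buckets and the stats sub-aggregation inside each bucket
    Terms demo = response.getAggregations().get("demo");
    for (Terms.Bucket bucket : demo.getBuckets()) {
        Stats stats = bucket.getAggregations().get("metricsAgg");
        System.out.println(bucket.getKeyAsString() + " -> avg score " + stats.getAvg());
    }
}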
Autocomplete
Pinyin tokenization
elasticsearch provides the Completion Suggester query for autocomplete. It matches and returns terms that begin with what the user has typed so far. To make this lookup efficient, there are constraints on the fields involved:
- fields used in a completion query must be of type completion
#create the index
PUT /test2
{
"mappings":{
"properties": {
"title": {
"type": "completion"
}
}
}
}
POST /test2/_doc/1
{
"title":["Sony","WH1000"],
"id":1
}
POST /test2/_doc/2
{
"title":["SKny","PH1000"],
"id":1
}
POST /test2/_doc/3
{
"title":["Nony","sH1000"],
"id":1
}
#autocomplete query
GET /test2/_search
{
"suggest": {
"mySuggest": {
"text": "so",
"completion": {
"field": "title",
"skip_duplicates": true,
"size":10
}
}
}
}
#hotel
PUT /hotel
{
"mappings":{
"properties":{
"id":{
"type": "keyword"
},
"address":{
"type": "keyword",
"copy_to": "all"
},
"price":{
"type": "double"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword",
"copy_to": "all"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "text_analyzere",
"search_analyzer": "ik_smart",
"copy_to": "all"
},
#all is the combined search field: documents are indexed with text_analyzere (finest-grained ik splitting plus pinyin), while searches use ik_max_word to split whatever the user typed
"all":{
"type": "text",
"analyzer": "text_analyzere",
"search_analyzer": "ik_max_word"
},
#an extra field dedicated to autocomplete, of type completion; when documents are inserted, the data fetched from the database already carries the needed values in the suggestion array
"suggestion":{
"type": "completion",
"analyzer": "completion_analyzere"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"text_analyzere":{
"tokenizer":"ik_max_word",
"filter":"py"
},
"completion_analyzere":{
"tokenizer":"keyword",
"filter":"py"
}
},
"filter": {
"py":{
"type": "pinyin",
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"keep_original": true,
"limit_first_letter_length": 16,
"remove_duplicated_term": true,
"none_chinese_pinyin_tokenize" :false
}
}
}
}
}
RestClient operations
/**
* autocomplete query
*/
@Test
public void testSuggestion() {
try {
SearchRequest request = new SearchRequest("hotel");
request.source().suggest(new SuggestBuilder()
.addSuggestion("mySuggestion",
SuggestBuilders
.completionSuggestion("suggestion")
.prefix("s")
.skipDuplicates(true)
.size(10)));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
CompletionSuggestion mySuggestion = response.getSuggest().getSuggestion("mySuggestion");
List<CompletionSuggestion.Entry.Option> list = mySuggestion.getOptions();
for (CompletionSuggestion.Entry.Option option : list) {
System.err.println(option.getText().string());
}
} catch (IOException e) {
System.out.println(e);
}
}
@Data
@NoArgsConstructor
public class HotelDoc {
// ... other fields omitted
private Object distance;
private Boolean isAD;
private List<String> suggestion;
public HotelDoc(Hotel hotel) {
// ... other assignments omitted
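// business may contain several districts separated by 、 or /; split them and
// combine them with the brand to form the completion suggestions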
if (this.business.contains("、")){
String[] arr = this.business.split("、");
this.suggestion = new ArrayList<>();
this.suggestion.add(this.brand);
Collections.addAll(this.suggestion,arr);
}else if (this.business.contains("/")){
String[] arr = this.business.split("/");
this.suggestion = new ArrayList<>();
this.suggestion.add(this.brand);
Collections.addAll(this.suggestion,arr);
}else {
this.suggestion = Arrays.asList(this.brand,this.business);
}
}
}
Data synchronization
The hotel data in elasticsearch comes from a MySQL database, so whenever the data in MySQL changes, elasticsearch must change with it: this is the data-synchronization problem between elasticsearch and MySQL.
Asynchronous notification
Listening to the binlog
Synchronous calls:
- Pros: simple, direct implementation
- Cons: high coupling between the business services
Asynchronous notification (a sketch follows this list):
- Pros: low coupling, moderate implementation difficulty
- Cons: depends on the reliability of the message queue (MQ)
Listening to the binlog:
- Pros: completely decouples the services
- Cons: enabling the binlog adds load on the database, and the implementation is complex
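A sketch of the asynchronous-notification option with RabbitMQ (the queue names are hypothetical and IHotelService is the assumed MyBatis-Plus service behind hotelService above; requires Spring AMQP): the admin service publishes the changed hotel id to MQ, and the search service consumes the message and updates the index.
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import java.io.IOException;

// hypothetical consumer in the search service: on receiving a hotel id from MQ,
// re-read the row from MySQL and write it into the "hotel" index
@Component
public class HotelMqListener {

    @Autowired
    private IHotelService hotelService; // assumed service, as in the code above

    @Autowired
    private RestHighLevelClient client;

    @RabbitListener(queues = "hotel.insert.queue") // hypothetical queue name
    public void onHotelInsertOrUpdate(Long id) throws IOException {
        Hotel hotel = hotelService.getById(id);
        HotelDoc hotelDoc = new HotelDoc(hotel);
        IndexRequest request = new IndexRequest("hotel").id(id.toString());
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        client.index(request, RequestOptions.DEFAULT); // indexing an existing id overwrites it
    }

    @RabbitListener(queues = "hotel.delete.queue") // hypothetical queue name
    public void onHotelDelete(Long id) throws IOException {
        client.delete(new DeleteRequest("hotel").id(id.toString()), RequestOptions.DEFAULT);
    }
}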
ES cluster
Split brain in an ES cluster
Split brain occurs when master-eligible nodes lose contact with one another and elect separate masters. The classic safeguard is to require a quorum of (eligible nodes / 2) + 1 votes for an election; elasticsearch 7.0+ configures this automatically.
What is the role of master-eligible nodes?
- They take part in electing the master
- The elected master manages the cluster state, manages shard information, and handles requests to create and delete indexes
What is the role of data nodes?
- CRUD operations on the data
What is the role of the coordinating node?
- Routes requests to the other nodes
- Merges the query results and returns them to the user
Distributed storage in an ES cluster
Distributed search in an ES cluster
How does a distributed insert decide which shard to use?
- The coordinating node hashes the document id and takes the result modulo the number of shards; the remainder is the target shard (shard = hash(_routing) % number_of_shards, where _routing defaults to the document id)
Distributed search runs in two phases:
- Scatter phase: the coordinating node fans the query request out to the different shards
- Gather phase: the query results are collected at the coordinating node, merged, and returned to the user
Failover
When a node goes down, the master migrates that node's shards to the surviving nodes (promoting replicas as needed), so the data remains available.