Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页

这篇具有很好参考价值的文章主要介绍了Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

一、前言

前面我们已经将ES的基础操作（索引，映射，文档）学习过了，从这一章开始，我们便开始学习ES的最大的功能—搜索
ES为用户提供了丰富的搜索功能：既有基本的搜索功能，又有搜索建议功能；既有常用的普通类型的匹配功能，又有基于地理位置的搜索功能；既提供了分页搜索功能，又提供了搜索的调试分析功能等等。这些都会在这一大章中学习到。但是考虑到搜索涉及到的章节确实非常多，于是我仍然像之前基础操作一样，拆解成一些章节供大家更容易吸收学习
那么这一节我们主要学习ES的搜索辅助功能。例如，为优化搜索功能，需要指定搜索的一部分字段内容。为了更好地呈现结果，需要用到结果计数和分页功能；当遇到性能瓶颈时，需要剖析搜索各个环节的耗时；面对不符合预期的搜索结果时，需要分析各个文档的评分细节。

二、指定搜索返回字段

考虑到性能问题，需要对搜索结果进行“瘦身”----指定返回搜索字段。在ES中，通过_source子句可以设定返回结果的字段。_source指向一个JSON数组，数组中的元素是希望返回的字段名称。
在此之前，为了后面的学习，我们需要将hotel的索引彻底换一下，这里推荐大家先删除hotel索引，然后重新建立Hotel索引及映射关系，然后通过bulk批量插入值：
删除hotel索引后定义hotel索引的结构DSL如下：

DELETE /hotel

PUT /hotel
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "city": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "create_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "amenities": {
        "type": "text"
      },
      "full_room": {
        "type": "boolean"
      },
      "location": {
        "type": "geo_point"
      },
      "praise": {
        "type": "integer"
      }
    }
  }
}

然后在索引中批量新增如下数据：

POST /_bulk
{"index":{"_index":"hotel","_id":"001"}}
{"title":"文雅酒店","city":"北京","price":"558.00","create_time":"2020-03-29 21:00:00","amenities":"浴池,普通停车场/充电停车场","full_room":true,"location":{"lat":36.940243,"lon":120.39400},"praise":10}
{"index":{"_index":"hotel","_id":"002"}}
{"title":"京盛酒店","city":"北京","price":"337.00","create_time":"2020-07-29 13:00:00","amenities":"充电停车场/可升降停车场","full_room":false,"location":{"lat":39.911543,"lon":116.4030},"praise":60}
{"index":{"_index":"hotel","_id":"003"}}
{"title":"文雅文化酒店","city":"天津","price":"260.00","create_time":"2021-02-27 22:00:00","amenities":"提供假日party,免费早餐,浴池，充电停车场","full_room":true,"location":{"lat":39.186555,"lon":117.162767},"praise":30}
{"index":{"_index":"hotel","_id":"004"}}
{"title":"京盛集团酒店","city":"上海","price":"800.00","create_time":"2021-05-29 21:35:00","amenities":"浴池(假日需预订),室内游泳池,普通停车场/充电停车场","full_room":true,"location":{"lat":36.940243,"lon":120.39400},"praise":100}
{"index":{"_index":"hotel","_id":"005"}}
{"title":"京盛精选酒店","city":"南昌","price":"300.00","create_time":"2021-07-29 22:50:00","amenities":"室内游泳池,普通停车场","full_room":false,"location":{"lat":39.918229,"lon":116.422011},"praise":20}

下面的DSL指定搜索结果只返回title和city字段：

GET /hotel/_search
{
  "_source": ["title","city"],
  "query": {
    "term": {
      "city": {
        "value": "北京"
      }
    }
  }
}

执行上述DSL后，搜索结果如下：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页
在上述搜索结果中，每个命中文档的_source结构体中只包含指定的city和title两个字段的数据。
在Java客户端中，通过调用searchSourceBuilder.fetchSource()方法可以设定搜索返回的字段，该方法接收两个参数，即需要的字段数组和不需要的字段数组。
我们先在service创建一个搜索接口，并且设定只返回title,city两个字段：

public List<Hotel> queryBySource(HotelDocRequest hotelDocRequest) throws IOException {
		String indexName = hotelDocRequest.getIndexName();
		if (CharSequenceUtil.isBlank(indexName)) {
			throw new SearchException("索引名不能为空");
		}
		Hotel hotel = hotelDocRequest.getHotel();
		if (ObjectUtil.isEmpty(hotel)) {
			throw new SearchException("搜索条件不能为空");
		}
		SearchRequest searchRequest = new SearchRequest(indexName);
		String city = hotel.getCity();
		//创建搜索builder
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
		//构建query
		searchSourceBuilder.query(new TermQueryBuilder("city",city));
		//设定希望返回的字段数组
		searchSourceBuilder.fetchSource(new String[]{"title","city"},null);
		searchRequest.source(searchSourceBuilder);
		ArrayList<Hotel> resultList = new ArrayList<>();
		SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
		RestStatus status = searchResponse.status();
		if (status != RestStatus.OK) {
			return Collections.emptyList();
		}
		SearchHits searchHits = searchResponse.getHits();
		for (SearchHit searchHit : searchHits) {
			Hotel hotelResult = new Hotel();
			hotelResult.setId(searchHit.getId());   //文档_id
			hotelResult.setIndex(searchHit.getIndex());   //索引名称
			hotelResult.setScore(searchHit.getScore());   //文档得分
			//转换为Map
			Map<String, Object> dataMap = searchHit.getSourceAsMap();
			hotelResult.setTitle((String) dataMap.get("title"));
			hotelResult.setCity((String) dataMap.get("city"));
			resultList.add(hotelResult);
		}
		return resultList;
	}

然后在controller中调用service接口：

@PostMapping("/query/source")
	public FoundationResponse<String> queryHotelsBySource(@RequestBody HotelDocRequest hotelDocRequest) {
		try {
			List<Hotel> hotelList = esQueryService.queryBySource(hotelDocRequest);
			if (CollUtil.isNotEmpty(hotelList)) {
				return FoundationResponse.success(hotelList.toString());
			} else {
				return FoundationResponse.success("no data");
			}
		} catch (IOException e) {
			log.warn("搜索发生异常，原因为:{}", e.getMessage());
			return FoundationResponse.error(100, e.getMessage());
		} catch (Exception e) {
			log.error("服务发生异常，原因为:{}", e.getMessage());
			return FoundationResponse.error(100, e.getMessage());
		}
	}

postman调用该接口：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页

三、结果计数

为提升搜索体验，需要给前段传递搜索匹配结果的文档条数，即需要对搜索结果进行计数。针对这个要求，ES提供了_count功能，在该API中，用户提供query子句用于结果匹配，而ES会返回匹配的文档条数。类似于RDBMS中的SELECT COUNT(*) FROM XXX WHERE XXX…
下面的DSL将返回城市为"北京"的酒店条数：

GET /hotel/_count
{
  "query": {   //计数的查询条件
    "match": {
        "city": "北京"
    }
  }
}

执行上述DSL后，返回的信息如下：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页
由结果可知，ES不仅返回了匹配的文档数量（值为2），并且还返回了和分片相关的元数据，如总共扫描的分片个数，以及成功、失败、跳过的分片个数等。
在Java客户端中，通过CountRequest执行_count API,然后调用CountRequest对象的source()方法设置查询逻辑。countRequest.source()方法返回CountResponse对象，通过countResponse.getCount()方法可以得到匹配的文档条数。
我们首先在service层创建根据城市获取搜索条数的API：

	public long getCityCount(HotelDocRequest hotelDocRequest) throws IOException {
		String indexName = hotelDocRequest.getIndexName();
		if (CharSequenceUtil.isBlank(indexName)) {
			throw new SearchException("索引名不能为空");
		}
		Hotel hotel = hotelDocRequest.getHotel();
		if (ObjectUtil.isEmpty(hotel)) {
			throw new SearchException("搜索条件不能为空");
		}
		//客户端的count请求
		CountRequest countRequest = new CountRequest(indexName);
		String city = hotel.getCity();
		//创建搜索builder
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
		//构建query
		searchSourceBuilder.query(new TermQueryBuilder("city",city));
		countRequest.source(searchSourceBuilder); //设置查询
		CountResponse countResponse = client.count(countRequest, RequestOptions.DEFAULT);
		return countResponse.getCount();
	}

然后controller调用service:

@PostMapping("/query/count")
	public FoundationResponse<Long> queryCount(@RequestBody HotelDocRequest hotelDocRequest) {
		try {
			Long count = esQueryService.getCityCount(hotelDocRequest);
			return FoundationResponse.success(count);
		} catch (IOException e) {
			log.warn("搜索发生异常，原因为:{}", e.getMessage());
			return FoundationResponse.error(100, e.getMessage());
		} catch (Exception e) {
			log.error("服务发生异常，原因为:{}", e.getMessage());
			return FoundationResponse.error(100, e.getMessage());
		}
	}

postman调用该接口：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页

四、结果分页

在实际的搜索应用中，分页是必不可少的功能。在默认情况下，ES返回前10个搜索匹配的文档。用户可以通过设置from和size来定义搜索位置和每页显示的文档数量，from表示查询结果的起始下标，默认值为0，size表示从起始下标开始返回的文档个数，默认值为10.下面的DSL将返回下标从0开始的20个结果：

GET /hotel/_search
{
  "_source": ["title","city"],
  "from": 0,   //设置搜索的起始位置
  "size": 20,  //设置搜索返回的文档个数
  "query": {   //搜索条件
    "term": {
      "city": {
        "value": "北京"
      }
    }
  }
}

在默认情况下，用户最多可以取得10000个文档，即from为0时，size参数最大为10000，如果该请求超过该值，ES返回如下报错信息：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页
对于普通的搜索应用来说，size设为10000已经足够用了。如果确实需要返回多于10000条数据，可以适当修改max_result_window的值。以下示例将hotel索引的最大窗口值修改为20000：

PUT /hotel/_settings
{
  "index":{
    "max_result_window":20000   //设定搜索返回的文档个数
  }
}

注意，如果将配置修改得很大，一定要有足够强大的硬件作为支撑。
作为一个分布式搜索引擎，一个ES索引的数据分布在多个分片中，而这些分片又分配在不同的节点上。一个带有分页的搜索请求往往会跨越多个分片，每个分片必须在内存中构建一个长度为from+size的、按照得分排序的有序队列，用以存储命中的文档。然后这些分片对应的队列数据都会传递给协调节点，协调节点将各个队列的数据进行汇总，需要提供一个长度为（分片总数）*(from+size)的队列用以进行全局排序，然后再按照用户的请求从from位置开始查找，找到size个文档后进行返回。
基于上述原理，ES不适合深翻页。什么是深翻页呢？简而言之就是请求的from值很大。假设在一个3个分片的索引中进行搜索请求，参数from和size的值分别为1000和10，其响应过程如下图：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页
当深翻页的请求过多时会增加各个分片所在节点的内存和CPU消耗。尤其是协调节点，随着页码的增加和并发请求的增多，该节点需要对这些请求涉及的分片数据进行汇总和排序，过多的数据会导致协调节点资源耗尽而停止服务。
作为搜索引擎，ES更适合的场景是对数据进行搜索，而不是进行大规模的数据遍历。一般情况下，只需要返回前1000条数据即可，没有必要取到10000条数据。如果确实有大规模数据遍历的需求，可以参考使用scroll模式或者考虑使用其他存储引擎。
在Java客户端中，可以调用SearchSourceBuilder的from和size()方法来设定from和size参数。这里，我是用一种平常开发设置分页参数的一种方法,我们知道，类似mysql，我们都是通过offset,limit参数去控制从哪开始，查多少这样一个场景，其实ES和这个是一样的。我们可以建立一个共同的分页接口Pageable并写入获取Offset和Limit这两个参数的方法：

package com.mbw.request;

public interface Pagable {

	int getOffset();

	int getLimit();

	boolean isAutoCount();
}

然后就是写一个分页条件类，因为前端一般分页参数输入的是pageNo和pageSize来控制分页，熟悉分页的应该都了解，offset和limit可以通过这两个参数计算获取，下面是该条件类的主要代码：

package com.mbw.request;

import java.io.Serializable;


/**
 * 查询条件对象基类
 */
public class PageCondition implements Serializable, Pagable {

	private static final long serialVersionUID = 1L;

	public static final int DEFAULT_PAGE_NO = 1;
	public static final int DEFAULT_PAGE_SIZE = 10;

	protected int pageNo = DEFAULT_PAGE_NO;
	protected int pageSize = DEFAULT_PAGE_SIZE;
	protected boolean autoCount = true;

	public PageCondition() {
	}

	public PageCondition(int pageNo, int pageSize) {
		this.pageNo = pageNo < 1 ? DEFAULT_PAGE_NO : pageNo;
		this.pageSize = pageSize < 2 ? DEFAULT_PAGE_SIZE : pageSize;
	}

	public int getEnd(){
		return getLimit()+getOffset();
	}

	@Override
	public int getOffset() {
		return (pageNo - 1) * pageSize;
	}

	@Override
	public int getLimit() {
		return pageSize;
	}

	public void setPageNo(int pageNo) {
		this.pageNo = pageNo;
	}

	public void setPageSize(int pageSize) {
		this.pageSize = pageSize;
	}

	/**
	 * 查询对象时是否自动另外执行count查询获取总记录数, 默认为false.
	 */
	@Override
	public boolean isAutoCount() {
		return autoCount;
	}

	/**
	 * 查询对象时是否自动另外执行count查询获取总记录数.
	 */
	public void setAutoCount(final boolean autoCount) {
		this.autoCount = autoCount;
	}

	public int getPageNo() {
		return pageNo;
	}

	public int getPageSize() {
		return pageSize;
	}

}

这样我们就可以通过pageNo和pageSize去控制offset和limit了，然后我们只需要调用SearchSourceBuilder的from和size方法即可，我们这边沿用之前指定搜索返回字段的service接口：

	public List<Hotel> queryBySource(HotelDocRequest hotelDocRequest) throws IOException {
		String indexName = hotelDocRequest.getIndexName();
		if (CharSequenceUtil.isBlank(indexName)) {
			throw new SearchException("索引名不能为空");
		}
		Hotel hotel = hotelDocRequest.getHotel();
		if (ObjectUtil.isEmpty(hotel)) {
			throw new SearchException("搜索条件不能为空");
		}
		SearchRequest searchRequest = new SearchRequest(indexName);
		String city = hotel.getCity();
		//创建搜索builder
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
		//构建query
		searchSourceBuilder.query(new TermQueryBuilder("city",city));
		//设置分页参数
		searchSourceBuilder.from(hotelDocRequest.getOffset());
		searchSourceBuilder.size(hotelDocRequest.getLimit());
		//设定希望返回的字段数组
		searchSourceBuilder.fetchSource(new String[]{"title","city"},null);
		searchRequest.source(searchSourceBuilder);
		ArrayList<Hotel> resultList = new ArrayList<>();
		SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
		RestStatus status = searchResponse.status();
		if (status != RestStatus.OK) {
			return Collections.emptyList();
		}
		SearchHits searchHits = searchResponse.getHits();
		for (SearchHit searchHit : searchHits) {
			Hotel hotelResult = new Hotel();
			hotelResult.setId(searchHit.getId());   //文档_id
			hotelResult.setIndex(searchHit.getIndex());   //索引名称
			hotelResult.setScore(searchHit.getScore());   //文档得分
			//转换为Map
			Map<String, Object> dataMap = searchHit.getSourceAsMap();
			hotelResult.setTitle((String) dataMap.get("title"));
			hotelResult.setCity((String) dataMap.get("city"));
			resultList.add(hotelResult);
		}
		return resultList;
	}

那么如果我现在什么都不输入，那么肯定会是用默认值pageNo=1,pageSize=10,意味着Offset=0,limit=10.那这样查出来肯定还是之前的2条，假设前端把pageSize改成1，那么postman调用应该就只有第一条了：
Elasticsearch(八)搜索---搜索辅助功能（上）--指定搜索返回字段，结果计数和分页文章来源地址https://www.toymoban.com/news/detail-445901.html