ElasticSearch Series - Integrating ES with Spring Boot: Paginated Search with from+size, search after, and scroll

01. Data preparation

Index the following 12 documents into the my_index index:

PUT /my_index/_doc/1
{
  "title": "文雅酒店",
  "content": "青岛",
  "price": 556
}

PUT /my_index/_doc/2
{
  "title": "金都嘉怡假日酒店",
  "content": "北京",
  "price": 337
}

PUT /my_index/_doc/3
{
  "title": "金都欣欣酒店",
  "content": "天津",
  "price": 200
}

PUT /my_index/_doc/4
{
  "title": "金都酒店",
  "content": "上海",
  "price": 300
}

PUT /my_index/_doc/5
{
  "title": "自如酒店",
  "content": "南京",
  "price": 400
}

PUT /my_index/_doc/6
{
  "title": "如家酒店",
  "content": "杭州",
  "price": 500
}


PUT /my_index/_doc/7
{
  "title": "非常酒店",
  "content": "合肥",
  "price": 600
}


PUT /my_index/_doc/8
{
  "title": "金都酒店",
  "content": "淮北",
  "price": 700
}

PUT /my_index/_doc/9
{
  "title": "金都酒店",
  "content": "淮南",
  "price": 900
}

PUT /my_index/_doc/10
{
  "title": "丽舍酒店",
  "content": "阜阳",
  "price": 1000
}

PUT /my_index/_doc/11
{
  "title": "文轩酒店",
  "content": "蚌埠",
  "price": 1020
}

PUT /my_index/_doc/12
{
  "title": "大理酒店",
  "content": "长沙",
  "price": 1100
}

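02. How do you query all documents in ElasticSearch?

Query all documents with an empty search request:

GET /my_index/_search

The response shows that the index holds 12 documents (hits.total.value=12), yet the hits array contains only 10 of them. How do we get to see the rest?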

{
  "took" : 688,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都嘉怡假日酒店",
          "content" : "北京",
          "price" : 337
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都欣欣酒店",
          "content" : "天津",
          "price" : 200
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 1.0,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000
        }
      }
    ]
  }
}

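03. How do you specify the number of search results in ElasticSearch?

Elasticsearch accepts the from and size parameters:

from: the number of initial results to skip; defaults to 0
size: the number of results to return; defaults to 10

Because from and size default to 0 and 10, a request that specifies neither returns the first 10 results. That is why the hits array above holds only 10 documents even though hits.total.value=12.

To return more results, set the size parameter: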

GET /my_index/_search
{
  "size": 15
}

The index holds 12 documents in total, so size=15 returns all of them:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都嘉怡假日酒店",
          "content" : "北京",
          "price" : 337
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都欣欣酒店",
          "content" : "天津",
          "price" : 200
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 1.0,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 1.0,
        "_source" : {
          "title" : "文轩酒店",
          "content" : "蚌埠",
          "price" : 1020
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 1.0,
        "_source" : {
          "title" : "大理酒店",
          "content" : "长沙",
          "price" : 1100
        }
      }
    ]
  }
}

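04. What pagination methods does ElasticSearch offer?

Pagination with the from and size parameters.
Pagination with the scroll API.
Pagination with search_after.

05. How do you implement from+size pagination in ElasticSearch?

In ElasticSearch, the from and size parameters drive paginated search: from sets how many hits to skip and size sets how many hits to return. For example: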

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "from": 0, // start from the 1st hit
  "size": 3  // return 3 hits
}

The result is shown below: 12 hits match in total, and 3 are returned starting from the first:

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 0.075949445,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.075949445,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.075949445,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.075949445,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      }
    ]
  }
}

In the request above, from sets the offset of the first hit to return and size sets how many hits to return. For example, from=0 with size=10 returns hits 1 through 10, and from=10 with size=10 returns hits 11 through 20.
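
The window arithmetic can be sketched in plain Java. This is only an in-memory illustration of the from/size semantics, not Elasticsearch client code; the class PageWindow and its methods are hypothetical helpers introduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class PageWindow {
    /** Zero-based offset ("from") of a 1-based page number. */
    static int fromOffset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    /** Returns the slice [from, from + size) of an already-sorted hit list. */
    static <T> List<T> slice(List<T> sortedHits, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be >= 0");
        }
        int start = Math.min(from, sortedHits.size());
        int end = (int) Math.min((long) start + size, sortedHits.size());
        return new ArrayList<>(sortedHits.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> ranks = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            ranks.add(i); // 12 hits, ranked 1..12
        }
        System.out.println(slice(ranks, fromOffset(1, 10), 10)); // hits 1..10
        System.out.println(slice(ranks, fromOffset(2, 10), 10)); // hits 11..12, only 12 exist
    }
}
```

With 12 matching documents, the second page of 10 therefore holds just the 2 remaining hits.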

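06. How do you implement search after pagination in ElasticSearch?

search_after is suited to paging through large result sets. Unlike scroll, it keeps no search context on the server between requests: each request is an ordinary search that carries the sort values of the last hit from the previous page. You can therefore work through a large result set batch by batch instead of loading everything into memory at once.

search_after reads onward from a given position. It cannot jump to an arbitrary page; pages must be read one after another, and the results must be sorted by a key that is unique for each document. Otherwise documents that share a sort value at a page boundary can be skipped.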

POST /my_index/_search
{
  "size": 3,
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "sort": [
    {
      "price": "asc"
    }
  ],
  "track_total_hits": true
}

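The request above searches my_index for documents whose title contains 酒店, returns 3 hits per request, and sorts them by price in ascending order. Each hit in the response carries a sort value, which the next request uses. track_total_hits is set to true so that the total hit count is computed exactly.

The total hit count hits.total.value is 12, and 3 hits are returned: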

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "title" : "金都欣欣酒店",
          "content" : "天津",
          "price" : 200
        },
        "sort" : [
          200
        ]
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        },
        "sort" : [
          300
        ]
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "title" : "金都嘉怡假日酒店",
          "content" : "北京",
          "price" : 337
        },
        "sort" : [
          337
        ]
      }
    ]
  }
}

Next, pass the sort value of the last hit as search_after to fetch the next page:

POST /my_index/_search
{
  "size": 3,
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "sort": [
    {
      "price": "asc"
    }
  ],
  "search_after": [337]
}

The response contains the next 3 hits:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        },
        "sort" : [
          400
        ]
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        },
        "sort" : [
          500
        ]
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        },
        "sort" : [
          556
        ]
      }
    ]
  }
}
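
The cursor logic of search after can be mimicked in memory. The sketch below is an assumption-laden illustration (a linear scan over a sorted list of the sample prices, with a hypothetical class SearchAfterSketch), not how Elasticsearch executes the query internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SearchAfterSketch {
    /**
     * Returns up to `size` values strictly greater than `after` from an
     * ascending-sorted list. Pass null as `after` for the first page.
     */
    static List<Integer> nextPage(List<Integer> sortedPrices, Integer after, int size) {
        List<Integer> page = new ArrayList<>();
        for (int price : sortedPrices) {
            if (page.size() == size) {
                break;
            }
            if (after == null || price > after) {
                page.add(price);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        // The 12 price values from the sample documents, sorted ascending.
        List<Integer> prices = Arrays.asList(
                200, 300, 337, 400, 500, 556, 600, 700, 900, 1000, 1020, 1100);
        Integer cursor = null;
        List<Integer> page;
        while (!(page = nextPage(prices, cursor, 3)).isEmpty()) {
            System.out.println(page);
            cursor = page.get(page.size() - 1); // sort value of the last hit
        }
    }
}
```

Because the comparison is strictly greater than, two documents sharing a price at a page boundary would lose one of them, which is why the sort key has to be unique.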

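07. How do you implement scroll pagination in ElasticSearch?

The Scroll API is intended for processing large amounts of data in Elasticsearch. It keeps a search context alive across requests and returns a fixed number of results per request, so you can work through a large result set batch by batch instead of loading everything into memory at once.

The initial request saves a snapshot of the index and a cursor (scroll_id) that records where the previous batch ended; the next request continues from that position. This approach performs well and is typically used for bulk export or reindexing. The scroll_id has an expiry time, however: if it expires between two requests, the second request fails with an error saying that the search context for that scroll_id cannot be found.

To enable scrolling, set the scroll parameter on the search request to the keep-alive time you want. The keep-alive is refreshed on every scroll request, so it only needs to be long enough to process the current batch, not all documents in the result. Choosing this value matters because keeping the search context open consumes resources, which should be released as soon as they are no longer needed. With the timeout set, Elasticsearch releases the context automatically once it has been idle for that long.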
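
The keep-alive behaviour can be modelled with a few lines of Java. This is a simplified sketch under stated assumptions: the class ScrollKeepAlive is hypothetical, time is passed in explicitly as milliseconds, and expiry is treated as exact, whereas a real cluster releases expired contexts in the background.

```java
final class ScrollKeepAlive {
    private final long keepAliveMillis;
    private long expiresAtMillis;

    ScrollKeepAlive(long keepAliveMillis, long nowMillis) {
        this.keepAliveMillis = keepAliveMillis;
        this.expiresAtMillis = nowMillis + keepAliveMillis;
    }

    /** A fetch succeeds only before expiry, and pushes the deadline forward. */
    boolean fetch(long nowMillis) {
        if (nowMillis >= expiresAtMillis) {
            return false; // context already released; a real cluster reports an error
        }
        expiresAtMillis = nowMillis + keepAliveMillis;
        return true;
    }

    public static void main(String[] args) {
        ScrollKeepAlive ctx = new ScrollKeepAlive(60_000, 0); // scroll=1m, opened at t=0
        System.out.println(ctx.fetch(50_000));  // true: within 1m of the initial search
        System.out.println(ctx.fetch(100_000)); // true: within 1m of the previous fetch
        System.out.println(ctx.fetch(200_000)); // false: idle for more than 1m
    }
}
```

The second fetch succeeds even though more than a minute has passed since the scroll was opened, because each request resets the timer.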

① Run the initial search to obtain a scroll_id. The scroll parameter sets the keep-alive of the scroll, here 1 minute, and size requests 7 hits per batch.

POST /my_index/_search?scroll=1m
{
  "size": 7,
  "query": {
    "match": {
      "title": "酒店"
    }
  }
}

The response to this request includes a scroll_id for use in subsequent requests, similar to the following:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==",
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 0.06382885,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      }
    ]
  }
}

② Fetch the next batch with the scroll_id:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ=="
}

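This returns the next batch together with a scroll_id. The ID may or may not differ from the previous one, so always pass the most recently returned scroll_id: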

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==",
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 0.06382885,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000,
          "uploadTime" : 1678073241
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文轩酒店",
          "content" : "蚌埠",
          "price" : 1020
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "大理酒店",
          "content" : "长沙",
          "price" : 1100
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.05390298,
        "_source" : {
          "title" : "金都欣欣酒店",
          "content" : "天津",
          "price" : 200
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.046648744,
        "_source" : {
          "title" : "金都嘉怡假日酒店",
          "content" : "北京",
          "price" : 337
        }
      }
    ]
  }
}

③ Repeat step ② until all data has been retrieved, that is, until hits comes back empty:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ=="
}

The response has an empty hits array:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 0.06382885,
    "hits" : [ ]
  }
}

④ Once all data has been retrieved, clear the scroll_id with the clear_scroll API.

DELETE /_search/scroll
{
    "scroll_id": [
        "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==",
        "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ=="
    ]
}

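Note that a scroll holds resources on the Elasticsearch cluster, so keep performance in mind when using it. Scroll is also unsuitable for real-time queries, because it only sees the data that existed when the scroll started.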
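
The snapshot-plus-cursor behaviour of the four steps can be sketched in memory as follows. The class ScrollSketch and the doc-N identifiers are hypothetical; the copy in the constructor stands in for the point-in-time view that the initial search freezes.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollSketch {
    private final List<String> snapshot; // frozen view taken by the initial search
    private int position = 0;            // cursor kept on the "server" side

    ScrollSketch(List<String> liveIndex) {
        this.snapshot = new ArrayList<>(liveIndex); // copy: later writes are invisible
    }

    /** Returns the next batch of at most `size` documents; empty when exhausted. */
    List<String> next(int size) {
        int end = Math.min(position + size, snapshot.size());
        List<String> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return batch;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        for (int id = 1; id <= 12; id++) {
            index.add("doc-" + id);
        }

        ScrollSketch scroll = new ScrollSketch(index);
        index.add("doc-13"); // indexed after the scroll started, so never returned

        List<String> batch;
        while (!(batch = scroll.next(7)).isEmpty()) {
            System.out.println(batch.size() + " hits: " + batch);
        }
    }
}
```

With 12 documents and a batch size of 7, the loop yields batches of 7 and 5 and then stops on the empty batch, mirroring steps ① to ③.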

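08. What is deep pagination in ElasticSearch?

Deep pagination is the situation where a large number of hits must be skipped to reach the requested ones. It typically arises when a result set is very large, for example millions of matching documents, and the requested page lies far from the start. The mechanism resembles LIMIT with an offset in MySQL: to return the 10,001st hit, the first 10,000 have to be fetched and discarded first.

In ElasticSearch, deep pagination can cause performance problems: for every request, each shard has to collect and sort its top from + size hits, and the coordinating node has to merge the candidates from all shards before discarding the first from hits. This consumes a large amount of CPU and memory.

To avoid this, use the Scroll API or search_after for pagination. Both continue from the previous position instead of collecting all the skipped hits again on every request.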
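
To make the cost of a deep from+size page concrete, the sketch below estimates how many candidate entries are ranked. The class DeepPagingCost is hypothetical, the 5-shard index is an assumed example, and the figures are upper bounds, since a shard with fewer matches contributes fewer entries.

```java
final class DeepPagingCost {
    /** Entries each shard must rank in its own priority queue. */
    static long perShard(int from, int size) {
        return (long) from + size;
    }

    /** Entries the coordinating node receives and merges across all shards. */
    static long onCoordinator(int shards, int from, int size) {
        return shards * perShard(from, size);
    }

    public static void main(String[] args) {
        // Page 1 versus page 1001 at 10 hits per page on an assumed 5-shard index.
        System.out.println(onCoordinator(5, 0, 10));      // 50
        System.out.println(onCoordinator(5, 10_000, 10)); // 50050
    }
}
```

Both requests return 10 hits, but the deep one makes the cluster rank about a thousand times as many candidates.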

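09. What is the maximum limit for paginated queries in ElasticSearch?

Deep pagination occurs when the requested page is far into the results or the amount of data requested is large. By default, from + size may not exceed 10,000; a request beyond that is rejected with an error.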

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  }, 
  "from": 0,
  "size": 10001
}

The query fails with: Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

In other words, from+size pagination can reach at most the first 10,000 hits by default.
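
A client can apply the same check before sending a request. The following is a minimal sketch with a hypothetical ResultWindow helper; the constant mirrors the default of 10000 quoted in the error message.

```java
final class ResultWindow {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** True when from + size fits inside the result window. */
    static boolean fits(int from, int size, int maxResultWindow) {
        return (long) from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        System.out.println(fits(0, 10_001, DEFAULT_MAX_RESULT_WINDOW)); // false: the failing request
        System.out.println(fits(9_990, 10, DEFAULT_MAX_RESULT_WINDOW)); // true: the last allowed page of 10
        System.out.println(fits(9_991, 10, DEFAULT_MAX_RESULT_WINDOW)); // false: one hit too deep
    }
}
```

Note that the limit applies to the sum, so even a small page is rejected once from is large enough.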

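10. How do you lift the pagination limit in ElasticSearch?

The index.max_result_window setting caps the value of from + size for searches against an index; it defaults to 10000. To page further with from+size, raise index.max_result_window. Be aware that a larger window lets searches consume more memory and time, so raising it can hurt Elasticsearch performance.

Option 1: update the setting on an existing index, for example from Kibana: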

PUT /my_index/_settings
{
  "index.max_result_window":200000
}

Option 2: set it when creating the index:

PUT /my_index
{
  "settings": {
    "index": {
      "max_result_window": 200000
    }
  }
}

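11. What is the maximum total hit count a query reports in ElasticSearch?

The total number of matching documents can be read from hits.total.value in the search response, but by default the count is only tracked accurately up to 10,000. That is fine for small data sets. Once more than 10,000 documents match, hits.total.value stops growing and stays at 10000, so the reported count is no longer exact.

For example, with 30,000 documents in the index, a query for all of them still reports hits.total.value as 10000: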

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [
        // ...
     ]
  }
}

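12. How do you lift the limit on the total hit count in ElasticSearch?

The track_total_hits parameter controls how the total hit count is computed. By default the count is accurate only up to 10,000 hits; to get the exact number of matching documents beyond that, set track_total_hits to true.

track_total_hits accepts three kinds of values:

true: count all hits exactly.
false: do not compute the total hit count.
an integer n: count hits accurately up to n.

① Count all hits: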

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "track_total_hits": true
}

The total hit count hits.total.value is 12, and hits.hits contains 10 documents (from=0, size=10):

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 0.06382885,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000,
          "uploadTime" : 1678073241
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文轩酒店",
          "content" : "蚌埠",
          "price" : 1020
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "大理酒店",
          "content" : "长沙",
          "price" : 1100
        }
      }
    ]
  }
}

② Do not compute the total hit count:

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "track_total_hits": false
}

The response does not include hits.total, and hits.hits contains 10 documents (from=0, size=10):

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "max_score" : 0.06382885,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000,
          "uploadTime" : 1678073241
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文轩酒店",
          "content" : "蚌埠",
          "price" : 1020
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "大理酒店",
          "content" : "长沙",
          "price" : 1100
        }
      }
    ]
  }
}

③ Count hits accurately only up to 5:

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "track_total_hits": 5
}

hits.total.value is 5 with relation gte, meaning at least 5 documents match; hits.hits contains 10 documents (from=0, size=10):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "gte"
    },
    "max_score" : 0.06382885,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000,
          "uploadTime" : 1678073241
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文轩酒店",
          "content" : "蚌埠",
          "price" : 1020
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "大理酒店",
          "content" : "长沙",
          "price" : 1100
        }
      }
    ]
  }
}

④ Count hits accurately up to 15:

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "track_total_hits": 15
}

Since only 12 documents match, hits.total.value is 12 with relation eq; hits.hits contains 10 documents (from=0, size=10):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 0.06382885,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文雅酒店",
          "content" : "青岛",
          "price" : 556
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "上海",
          "price" : 300
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "自如酒店",
          "content" : "南京",
          "price" : 400
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "如家酒店",
          "content" : "杭州",
          "price" : 500
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "非常酒店",
          "content" : "合肥",
          "price" : 600
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮南",
          "price" : 900
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "金都酒店",
          "content" : "淮北",
          "price" : 700
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "丽舍酒店",
          "content" : "阜阳",
          "price" : 1000,
          "uploadTime" : 1678073241
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "文轩酒店",
          "content" : "蚌埠",
          "price" : 1020
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 0.06382885,
        "_source" : {
          "title" : "大理酒店",
          "content" : "长沙",
          "price" : 1100
        }
      }
    ]
  }
}
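
The four responses above follow one rule, which the sketch below restates in Java. It is a simplified model with a hypothetical TotalHitsSketch class: track_total_hits=true is represented by an effectively unlimited threshold, and the false case (no total at all) is left out.

```java
final class TotalHitsSketch {
    /** Models hits.total: the reported value plus its relation ("eq" or "gte"). */
    static String total(long actualMatches, long threshold) {
        if (actualMatches > threshold) {
            // Counting stopped at the threshold, so the value is only a lower bound.
            return "value=" + threshold + ", relation=gte";
        }
        return "value=" + actualMatches + ", relation=eq";
    }

    public static void main(String[] args) {
        System.out.println(total(12, 5));                  // value=5, relation=gte
        System.out.println(total(12, 15));                 // value=12, relation=eq
        System.out.println(total(30_000, 10_000));         // value=10000, relation=gte
        System.out.println(total(30_000, Long.MAX_VALUE)); // value=30000, relation=eq
    }
}
```

Reading relation together with value is what tells a client whether the count is exact or only a lower bound.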

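13. How can paginated queries be optimized in ElasticSearch?

Fetch as few fields as possible; request only the fields you need.
Fetch as little data as possible; request only the documents you need.
Use scroll or search_after instead of deep from+size pagination.
Tune the index layout, such as the number of shards and replicas, to improve query performance.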

14. How do you implement from+size pagination with Spring Boot and ES?

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "酒店"
    }
  }, 
  "from": 0, // 从第 1 条数据开始
  "size": 3  // 返回 3 条数据
}
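
The same request sent through RestHighLevelClient:

import java.io.IOException;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ElasticSearchImpl {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public void searchUser() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Query: match documents whose title contains "酒店"
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("title", "酒店");
        searchSourceBuilder.query(matchQueryBuilder);

        // Pagination: convert a 1-based page number into a 0-based offset
        int page = 1;     // page 1
        int pageSize = 3; // 3 documents per page
        searchSourceBuilder.from((page - 1) * pageSize);
        searchSourceBuilder.size(pageSize);

        SearchRequest searchRequest = new SearchRequest(new String[]{"my_index"}, searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Search results
        SearchHits searchHits = searchResponse.getHits();
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits) {
            // hits.hits._source: the original JSON of the matching document
            String sourceAsString = hit.getSourceAsString();
            log.info("hit: {}", sourceAsString);
        }
        System.out.println(searchResponse);
    }
}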
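
The page-to-offset conversion used with from+size can be pulled into a small helper. This is a minimal sketch: the class name PageOffset is hypothetical, and the limit assumes the index keeps the default index.max_result_window of 10000, beyond which ElasticSearch rejects a from+size request.

```java
public class PageOffset {

    // Default value of index.max_result_window; an index may be configured differently
    public static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Converts a 1-based page number into the 0-based "from" offset.
    public static int from(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        long from = (long) (page - 1) * pageSize; // long arithmetic avoids int overflow
        if (from + pageSize > DEFAULT_MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException(
                    "from + size exceeds " + DEFAULT_MAX_RESULT_WINDOW + "; use search after or scroll");
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 3)); // prints 0
        System.out.println(from(2, 3)); // prints 3
    }
}
```

Failing early in the application gives a clearer error than letting the request reach the cluster and be rejected there.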

15. How does SpringBoot integrate ES to implement search after pagination?

POST /my_index/_search
{
  "size": 3,
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "sort": [
    {
      "price": "asc"
    }
  ],
  "track_total_hits": true
}
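
With the sample data, the last hit of the first page has price 337, so the next request passes that sort value in search_after:

POST /my_index/_search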
{
  "size": 1000,
  "query": {
    "match": {
      "title": "酒店"
    }
  },
  "sort": [
    {
      "price": "asc"
    }
  ],
  "search_after": [337]
}
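
The same loop with RestHighLevelClient:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ElasticSearchImpl {
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public void searchUser() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Query: match documents whose title contains "酒店"
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("title", "酒店");
        searchSourceBuilder.query(matchQueryBuilder);

        // Count all matching documents: track_total_hits
        searchSourceBuilder.trackTotalHits(true);

        // Return 3 documents per request
        searchSourceBuilder.size(3);

        // Sort field. search_after resumes strictly after the given sort values, so they
        // should be unique per document; the sample prices are, otherwise add a tiebreaker.
        searchSourceBuilder.sort(SortBuilders.fieldSort("price").order(SortOrder.ASC));

        SearchRequest searchRequest = new SearchRequest(new String[]{"my_index"}, searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        List<Map<String, Object>> result = new ArrayList<>();
        while (searchResponse.getHits().getHits() != null && searchResponse.getHits().getHits().length > 0) {
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit hit : hits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                result.add(sourceAsMap);
            }
            // Take the sort values of the last hit; the next request continues from there
            Object[] lastSortValues = hits[hits.length - 1].getSortValues();
            searchSourceBuilder.searchAfter(lastSortValues);
            searchRequest.source(searchSourceBuilder);
            // Fetch the next page
            searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        }
        System.out.println(result);
    }
}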
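
The cursor logic behind search after can be tried without a cluster. The following is an in-memory sketch only: the class SearchAfterSketch and its methods are hypothetical, a sorted list of the sample prices stands in for the index, and "strictly greater than the last sort value" stands in for search_after. Because of that strict comparison, documents sharing the last sort value would be skipped, which is why real queries usually add a unique tiebreaker field to the sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SearchAfterSketch {

    // Returns up to `size` values strictly greater than `after` from an ascending list.
    // Pass null as `after` to get the first page.
    public static List<Integer> page(List<Integer> sortedAsc, Integer after, int size) {
        List<Integer> out = new ArrayList<>();
        for (int value : sortedAsc) {
            if (after != null && value <= after) {
                continue; // already returned on an earlier page
            }
            out.add(value);
            if (out.size() == size) {
                break;
            }
        }
        return out;
    }

    // Walks all pages, feeding the last value of each page back in as the cursor.
    public static List<List<Integer>> allPages(List<Integer> values, int size) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        List<List<Integer>> pages = new ArrayList<>();
        Integer cursor = null;
        while (true) {
            List<Integer> current = page(sorted, cursor, size);
            if (current.isEmpty()) {
                break; // an empty page means everything has been read
            }
            pages.add(current);
            cursor = current.get(current.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Prices of the 12 sample documents, in insertion order
        List<Integer> prices = Arrays.asList(556, 337, 200, 300, 400, 500, 600, 700, 900, 1000, 1020, 1100);
        System.out.println(allPages(prices, 3));
        // prints [[200, 300, 337], [400, 500, 556], [600, 700, 900], [1000, 1020, 1100]]
    }
}
```

The first page ends at 337, matching the search_after value used in the console request above.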

16. How does SpringBoot integrate ES to implement scroll pagination?

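import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
// Package used by earlier 7.x clients; from 7.15 on the class is org.elasticsearch.core.TimeValue
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ElasticSearchImpl {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public void search() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Query: match documents whose title contains "酒店"
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("title", "酒店");
        searchSourceBuilder.query(matchQueryBuilder);
        // Count all matching documents: track_total_hits
        searchSourceBuilder.trackTotalHits(true);
        // Return 7 documents per batch
        searchSourceBuilder.size(7);
        // Sort field
        searchSourceBuilder.sort(SortBuilders.fieldSort("price").order(SortOrder.ASC));

        SearchRequest searchRequest = new SearchRequest(new String[]{"my_index"}, searchSourceBuilder);
        // How long the scroll context is kept alive
        searchRequest.scroll(TimeValue.timeValueMinutes(1L));
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Get the scrollId
        String scrollId = searchResponse.getScrollId();
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        List<Map<String, Object>> result = new ArrayList<>();
        for (SearchHit hit : searchHits) {
            result.add(hit.getSourceAsMap());
        }
        while (true) {
            // Fetch the next batch with the scrollId
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            // Renew the keep-alive of the scroll context
            scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
            SearchResponse scrollResp = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
            // The scrollId may change between requests, so always use the most recent one
            scrollId = scrollResp.getScrollId();
            SearchHit[] hits = scrollResp.getHits().getHits();
            if (hits != null && hits.length > 0) {
                for (SearchHit hit : hits) {
                    result.add(hit.getSourceAsMap());
                }
            } else {
                break;
            }
        }
        System.out.println(result);
        // Once all batches are read, clear the scroll context to free server-side resources
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        boolean succeeded = clearScrollResponse.isSucceeded();
        System.out.println(succeeded);
        // Do not close restHighLevelClient here: it is a shared Spring bean used by other requests
    }
}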
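
A scroll reads from the state of the index at the time of the initial search, so documents indexed later do not appear in later batches. The sketch below imitates that behaviour in memory. ScrollSketch is a hypothetical class, not part of the ElasticSearch client: it copies the data once and then hands out fixed-size batches until an empty batch signals the end.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollSketch {

    private final List<String> snapshot; // point-in-time copy taken when the scroll starts
    private final int batchSize;
    private int position = 0;

    public ScrollSketch(List<String> liveData, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.snapshot = new ArrayList<>(liveData);
        this.batchSize = batchSize;
    }

    // Returns the next batch, or an empty list once the snapshot is exhausted.
    public List<String> nextBatch() {
        int end = Math.min(position + batchSize, snapshot.size());
        List<String> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return batch;
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>();
        for (int id = 1; id <= 12; id++) {
            live.add("doc-" + id);
        }
        ScrollSketch scroll = new ScrollSketch(live, 7);
        System.out.println(scroll.nextBatch().size()); // prints 7
        live.add("doc-13");                            // written after the scroll started
        System.out.println(scroll.nextBatch().size()); // prints 5, doc-13 is not visible
        System.out.println(scroll.nextBatch().size()); // prints 0, the scroll is finished
    }
}
```

With 12 documents and a batch size of 7, as in the service above, the loop sees batches of 7 and 5 and then stops on the empty one.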
