Elasticsearch 8.X: How to Sort by a Field of a Nested Type?

1. Source of the question

This is a real enterprise use case raised in the community.

https://elasticsearch.cn/question/13135

As shown below, the goal is to post-process the search results so that, within each returned document, the tags list is sorted by depth.

{
  "keyProperty": "22",
  "name": "测试内容",
  "_class": "com.xxxxxxxx.ElasticSearchContent",
  "contentType": "attractionArea",
  "content": "这是一条测试内容",
  "timestamp": 1701325254191,
  "tags": [
    {
      "path": "33^^35^^36^^38",
      "depth": 4,
      "id": 38,
      "label": "测试42"
    },
    {
      "path": "33^^35^^36^^37^^39",
      "depth": 5,
      "id": 39,
      "label": "测试51"
    },
    {
      "path": "33^^35",
      "depth": 2,
      "id": 35,
      "label": "测试22"
    }
  ]
}

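2. Analysis

The sorting methods Elasticsearch supports include, but are not limited to:

Sorting on a specific field
Sorting on a field of a Nested object
Sorting with a custom script

and so on.

See: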

https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html#nested-sorting

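Now back to the requirement stated at the start:

First: retrieve the results;
Second: within those results, sort by the depth sub-field of the tags array.

Of the sort types above, sorting on a specific field and sorting on a Nested object field both order the entire result set, that is, they decide the order in which the hits come back. In Elasticsearch they are normally applied to top-level document fields or to simple nested fields.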
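
To make that distinction concrete, here is a small Python sketch. The request body shows what a built-in nested sort on tags.depth looks like; the function below it is only a local simulation of the effect (it is not how Elasticsearch works internally), and the two trimmed-down documents are illustrative stand-ins for the sample data.

```python
import copy

# Request body for a built-in nested sort, as it could be sent to
# GET /example_index/_search. It orders the hits by each document's
# smallest tags.depth value.
nested_sort_request = {
    "query": {"match_all": {}},
    "sort": [
        {
            "tags.depth": {
                "order": "asc",
                "mode": "min",
                "nested": {"path": "tags"},
            }
        }
    ],
}

# Simplified stand-ins for the two sample documents (only depth is kept).
docs = [
    {"_id": "1", "tags": [{"depth": 4}, {"depth": 5}, {"depth": 2}]},
    {"_id": "2", "tags": [{"depth": 5}, {"depth": 3}]},
]


def simulate_nested_sort(documents):
    """Local simulation: order documents by min tags.depth, leave tags untouched."""
    return sorted(
        copy.deepcopy(documents),
        key=lambda doc: min(tag["depth"] for tag in doc["tags"]),
    )


hits = simulate_nested_sort(docs)
hit_ids = [hit["_id"] for hit in hits]
first_hit_depths = [tag["depth"] for tag in hits[0]["tags"]]

print(hit_ids)           # order of the hits
print(first_hit_depths)  # the tags inside the first hit keep their stored order
```

The hits are reordered, but the tags array inside each hit is returned exactly as it was indexed, which is why the built-in sort alone does not satisfy the requirement.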

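The scenario in our requirement is different, and so is the way to implement it: what matters is not the order of the hits but the order of the elements inside each hit's tags array. So what can we do?

We have to improvise: the only option left is a script-based approach.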

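To meet the requirement, that is, to sort the tags list of each document, the tags lists have to be processed in the returned results.

There are generally two broad approaches:

Use script fields (script_fields);
Process the results on the client after the query returns. In plain terms: handle it yourself in your own Java or Python code.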
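
The client-side approach can be sketched in a few lines of Python. This is a minimal sketch, assuming the search response has already been parsed into a dict; the response below is a hypothetical, trimmed-down example (only a few fields are kept), and sort_tags_by_depth is a helper name introduced here for illustration.

```python
from operator import itemgetter

# Hypothetical, trimmed-down search response (already parsed from JSON).
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "name": "测试内容1",
                    "tags": [
                        {"depth": 4, "id": 38},
                        {"depth": 5, "id": 39},
                        {"depth": 2, "id": 35},
                    ],
                },
            },
            {
                "_id": "2",
                "_source": {
                    "name": "测试内容2",
                    "tags": [
                        {"depth": 5, "id": 36},
                        {"depth": 3, "id": 37},
                    ],
                },
            },
        ]
    }
}


def sort_tags_by_depth(resp):
    """Sort the tags list of every hit by ascending depth, in place."""
    for hit in resp["hits"]["hits"]:
        source = hit["_source"]
        # Hits without tags (missing or empty) are left unchanged.
        if source.get("tags"):
            source["tags"] = sorted(source["tags"], key=itemgetter("depth"))
    return resp


sorted_response = sort_tags_by_depth(response)
depths_per_hit = [
    [tag["depth"] for tag in hit["_source"]["tags"]]
    for hit in sorted_response["hits"]["hits"]
]
print(depths_per_hit)
```

Sorting on the client keeps the query simple and moves the work off the cluster, at the cost of an extra step in every consumer of the results.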

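3. Working through an implementation

First we need some mock data, which takes two steps: creating the index and writing documents with a bulk request.

Create the index as follows: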

PUT /example_index
{
  "mappings": {
    "properties": {
      "keyProperty": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "_class": {
        "type": "keyword"
      },
      "contentType": {
        "type": "keyword"
      },
      "content": {
        "type": "text"
      },
      "timestamp": {
        "type": "date"
      },
      "tags": {
        "type": "nested",
        "properties": {
          "path": {
            "type": "keyword"
          },
          "depth": {
            "type": "integer"
          },
          "id": {
            "type": "integer"
          },
          "label": {
            "type": "text"
          }
        }
      }
    }
  }
}

Import the data:

POST /example_index/_bulk
{"index":{"_id":1}}
{"keyProperty":"22","name":"测试内容1","_class":"com.xxxxxxxx.ElasticSearchContent","contentType":"attractionArea","content":"这是一条测试内容","timestamp":1701325254191,"tags":[{"path":"33^^35^^36^^38","depth":4,"id":38,"label":"测试42"},{"path":"33^^35^^36^^37^^39","depth":5,"id":39,"label":"测试51"},{"path":"33^^35","depth":2,"id":35,"label":"测试22"}]}
{"index":{"_id":2}}
{"keyProperty":"23","name":"测试内容2","_class":"com.xxxxxxxx.ElasticSearchContent","contentType":"attractionArea","content":"这是另一条测试内容","timestamp":1701325254200,"tags":[{"path":"33^^35^^36","depth":5,"id":36,"label":"测试33"},{"path":"33^^35^^37","depth":3,"id":37,"label":"测试34"}]}
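
The bulk body above is newline-delimited JSON: one action line followed by one document line per document, with a newline after the last line. If the data comes from a program rather than the console, a body of that shape can be assembled as in the sketch below. The helper name build_bulk_body and the shortened documents are illustrative assumptions, and the sketch only builds the payload string; it does not send it anywhere.

```python
import json


def build_bulk_body(docs_by_id):
    """Build an NDJSON bulk body: an index action line, then the document line."""
    lines = []
    for doc_id, doc in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # ensure_ascii=False keeps the Chinese text readable in the payload.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


# Shortened versions of the two sample documents (hypothetical subset of fields).
docs_by_id = {
    "1": {
        "keyProperty": "22",
        "name": "测试内容1",
        "tags": [{"depth": 4, "id": 38}, {"depth": 5, "id": 39}, {"depth": 2, "id": 35}],
    },
    "2": {
        "keyProperty": "23",
        "name": "测试内容2",
        "tags": [{"depth": 5, "id": 36}, {"depth": 3, "id": 37}],
    },
}

bulk_body = build_bulk_body(docs_by_id)
print(bulk_body)
```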

3.1 Option 1: custom sorting with script fields (script_fields)

GET /example_index/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "match_all": {}
      }
    }
  },
  "script_fields": {
    "sorted_tags": {
      "script": {
        "lang": "painless",
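        "source": """
          if (!params._source.tags.empty) {
            // Copy the tags from _source so the sort works on its own list.
            def tags = new ArrayList(params._source.tags);
            boolean swapped;
            // Bubble sort: keep sweeping until a full pass makes no swap.
            do {
              swapped = false;
              for (int i = 0; i < tags.size() - 1; i++) {
                // Swap neighbours that are out of order (ascending depth).
                if (tags[i].depth > tags[i + 1].depth) {
                  def temp = tags[i];
                  tags[i] = tags[i + 1];
                  tags[i + 1] = temp;
                  swapped = true;
                }
              }
            } while (swapped);
            return tags;
          } else {
            return null;
          }
        """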
      }
    }
  }
}

The returned results are as follows:

[Figure: screenshot of the search response]

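Some readers may object: is this a joke? You have pulled out bubble sort for this.

Yes, it is the classic array sort written as a script. When there is no other way and performance is not a concern, a clumsy method is still a method.

Running complex scripts in Elasticsearch over large volumes of data can consume a lot of compute resources!

Bubble sort is also an inefficient algorithm: the number of comparisons grows quadratically with the length of the list, so it performs poorly on large lists.

Compared with using Elasticsearch's built-in sorting, implementing the sort by hand also makes the script more complex.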
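
To get a feel for the cost, the following Python sketch mirrors the do/while bubble sort from the script above and counts the comparisons it makes. It is an illustration run locally, not a measurement of Elasticsearch itself; the function name and the comparison counter are additions made for this sketch.

```python
def bubble_sort_by_depth(tags):
    """Python port of the script's bubble sort; also returns the comparison count."""
    tags = list(tags)  # work on a copy, like new ArrayList(...) in the script
    comparisons = 0
    swapped = True
    while swapped:  # runs at least once, like the do/while loop
        swapped = False
        for i in range(len(tags) - 1):
            comparisons += 1
            if tags[i]["depth"] > tags[i + 1]["depth"]:
                tags[i], tags[i + 1] = tags[i + 1], tags[i]
                swapped = True
    return tags, comparisons


# The tags of sample document 1, reduced to their depth values.
sample_tags = [{"depth": 4}, {"depth": 5}, {"depth": 2}]
sorted_sample, sample_comparisons = bubble_sort_by_depth(sample_tags)

# A reverse-ordered list is the worst case for this variant.
worst_case = [{"depth": d} for d in range(4, 0, -1)]
sorted_worst, worst_comparisons = bubble_sort_by_depth(worst_case)

print(sample_comparisons, worst_comparisons)
```

For a handful of tags per document the cost is negligible; it only becomes a concern when the tags lists are long or the script runs over a very large number of hits.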

3.2 Option 2: custom sorting with script fields, using a lambda expression

GET /example_index/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "match_all": {}
      }
    }
  },
  "script_fields": {
    "sorted_tags": {
      "script": {
        "lang": "painless",
        "source": """
          if (!params._source.tags.empty) {
            def tags = new ArrayList(params._source.tags);
            tags.sort((a, b) -> a.depth.compareTo(b.depth));
            return tags;
          } else {
            return null;
          }
        """
      }
    }
  },
  "size": 10
}

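Here the lambda expression (a, b) -> a.depth.compareTo(b.depth) is passed to sort as the comparator, which orders the tags by ascending depth. The script then returns the sorted tags.

See: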

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-fields.html

The execution result is as follows: