Using Elasticsearch: Histogram, Date Histogram, and Auto Date Histogram Aggregations

Histogram aggregation

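The histogram aggregation (pronounced [ˈhɪstəˌɡræm]) is a bucket aggregation.

It aggregates on a numeric field or a numeric range field, building buckets dynamically at a fixed interval.

field: the field to aggregate on.
interval: the width of each bucket. Must be a positive decimal (double) value.
keyed: defaults to false, which returns the buckets as an array; if set to true, the buckets are returned as an object keyed by bucket key.
missing: the value to use for documents that do not have the aggregated field. By default such documents are ignored.
min_doc_count: a bucket is returned only if its document count is greater than or equal to this value. Defaults to 0; if set above 0, empty buckets are not returned.
extended_bounds: extends the range of buckets. Takes min and max values; buckets within min to max are returned even if they are empty, which only makes sense when min_doc_count is 0.
hard_bounds: restricts the range of buckets. Takes min and max values; only buckets within min to max are returned.
order: the sort order of the buckets. Supports _key and _count. Defaults to ascending by _key.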
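
How min_doc_count and extended_bounds interact can be sketched with a small Python simulation. This is an illustrative model, not Elasticsearch's implementation: the function name fill_buckets and the sample counts are hypothetical, and the bounds are assumed to be aligned to bucket keys.

```python
from typing import Dict, List, Optional, Tuple


def fill_buckets(
    counts: Dict[float, int],
    interval: float,
    min_doc_count: int = 0,
    extended_bounds: Optional[Tuple[float, float]] = None,
) -> List[Tuple[float, int]]:
    """Model how empty buckets are filled in or filtered out (illustrative only)."""
    if not counts and extended_bounds is None:
        return []
    keys = list(counts)
    lo, hi = (min(keys), max(keys)) if keys else extended_bounds
    if extended_bounds is not None and min_doc_count == 0:
        # extended_bounds can only widen the bucket range, never narrow it.
        lo = min(lo, extended_bounds[0])
        hi = max(hi, extended_bounds[1])
    buckets = []
    key = lo
    while key <= hi:
        doc_count = counts.get(key, 0)
        # Buckets below min_doc_count (including empty ones) are dropped.
        if doc_count >= min_doc_count:
            buckets.append((key, doc_count))
        key += interval
    return buckets


# Hypothetical counts with a gap between the keys 0 and 1500.
sample = {0.0: 3, 1500.0: 1}
gaps_filled = fill_buckets(sample, 500)
no_empty = fill_buckets(sample, 500, min_doc_count=1)
extended = fill_buckets(sample, 500, extended_bounds=(0.0, 2500.0))
print(gaps_filled)
print(no_empty)
print(extended)
```

With min_doc_count at 0 the gap between the two populated buckets is filled with empty buckets, and extended_bounds adds further empty buckets beyond the data.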

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "FlightTimeMin": {
              "gte": 0,
              "lt": 500
            }
          }
        }
      ]
    }
  }
}

The query above matches every document whose FlightTimeMin is in the range [0, 500); it hits 6466 documents.

Next, we run a histogram aggregation with the interval parameter set to 500.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "FlightTimeMin_histrogram": {
      "histogram": {
        "field": "FlightTimeMin",
        "missing": 0,
        "min_doc_count": 0, 
        "interval": 500
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
    "FlightTimeMin_histrogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 6466
        },
        {
          "key" : 500.0,
          "doc_count" : 5581
        },
        {
          "key" : 1000.0,
          "doc_count" : 935
        },
        {
          "key" : 1500.0,
          "doc_count" : 77
        }
      ]
    }
  }

As shown, the bucket with key 0, which covers the range [0, 500), also contains 6466 documents.
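
The way a value is assigned to a bucket can be sketched in Python. The formula follows the one given in the Elasticsearch documentation; offset is an optional histogram parameter that shifts the bucket boundaries (default 0), and the sample values are made up for illustration.

```python
import math


def bucket_key(value: float, interval: float, offset: float = 0.0) -> float:
    """Return the key of the histogram bucket that `value` falls into."""
    return math.floor((value - offset) / interval) * interval + offset


# With interval=500, every value in [0, 500) maps to the bucket keyed 0.0.
samples = [0.0, 499.9, 500.0, 1234.5, 1500.0]
keys = [bucket_key(v, 500) for v in samples]
print(keys)
```

Each bucket is therefore closed on the left and open on the right: 499.9 lands in bucket 0.0, while 500.0 starts the next bucket.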

Use the hard_bounds parameter to restrict the range of returned buckets.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "FlightTimeMin_histrogram": {
      "histogram": {
        "field": "FlightTimeMin",
        "missing": 0,
        "min_doc_count": 0, 
        "interval": 500,
        "hard_bounds": {
          "min": 0,
          "max": 1000
        }
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
    "FlightTimeMin_histrogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 6466
        },
        {
          "key" : 500.0,
          "doc_count" : 5581
        },
        {
          "key" : 1000.0,
          "doc_count" : 935
        }
      ]
    }
  }

As shown, only the three buckets whose keys fall within 0 to 1000 are returned; the bucket with key 1500 is dropped.
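
The effect of hard_bounds on this result can be modelled as a filter over bucket keys. This is a sketch under the assumption that the bounds are compared against bucket keys, inclusive on both ends, which is consistent with the output above; apply_hard_bounds is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Dict

# Bucket keys and document counts copied from the unbounded result shown earlier.
all_buckets: Dict[float, int] = {0.0: 6466, 500.0: 5581, 1000.0: 935, 1500.0: 77}


def apply_hard_bounds(buckets: Dict[float, int], lower: float, upper: float) -> Dict[float, int]:
    """Keep only buckets whose key lies in [lower, upper] (assumed semantics)."""
    return {key: count for key, count in buckets.items() if lower <= key <= upper}


bounded = apply_hard_bounds(all_buckets, 0, 1000)
print(bounded)
```

Note that the bucket keyed 1000 keeps its full count of 935 even though it covers values up to 1500, because the whole bucket is either kept or dropped.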

Specify the order parameter to sort the buckets by document count in descending order.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "FlightTimeMin_histrogram": {
      "histogram": {
        "field": "FlightTimeMin",
        "missing": 0,
        "min_doc_count": 0, 
        "interval": 500,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

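The aggregation result is as follows (in this data the document counts already decrease as the key increases, so the order happens to match the default key order):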

"aggregations" : {
    "FlightTimeMin_histrogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 6466
        },
        {
          "key" : 500.0,
          "doc_count" : 5581
        },
        {
          "key" : 1000.0,
          "doc_count" : 935
        },
        {
          "key" : 1500.0,
          "doc_count" : 77
        }
      ]
    }
  }

A histogram aggregation can also be run on a field of type histogram. First, create an index with such a field.

PUT company-staff-001
{
  "mappings": {
    "properties": {
      "grade": {
        "type": "histogram"
      }
    }
  }
}

Next, index two documents.

PUT company-staff-001/_doc/1
{
  "grade": {
    "values": [1, 2, 3, 4, 5],
    "counts": [100, 120, 150, 200, 220]
  }
}

PUT company-staff-001/_doc/2
{
  "grade": {
    "values": [2, 3, 4, 5, 6],
    "counts": [100, 120, 150, 200, 220]
  }
}

Then run the histogram aggregation.

GET company-staff-001/_search
{
  "size": 0, 
  "aggs": {
    "latency_buckets": {
      "histogram": {
        "field": "grade",
        "interval": 5
      }
    }
  }
}

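The aggregation result is as follows (for a histogram field, doc_count is the sum of the counts that fall into each bucket, not the number of documents):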

"aggregations" : {
    "latency_buckets" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 940
        },
        {
          "key" : 5.0,
          "doc_count" : 640
        }
      ]
    }
  }
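
The totals 940 and 640 can be reproduced by hand. The following Python sketch re-creates the arithmetic for the two documents indexed above; aggregate_histogram_field is a hypothetical helper written for illustration, not part of any Elasticsearch client.

```python
import math
from collections import defaultdict

# The two documents indexed above: each pairs a value with a pre-aggregated count.
docs = [
    {"values": [1, 2, 3, 4, 5], "counts": [100, 120, 150, 200, 220]},
    {"values": [2, 3, 4, 5, 6], "counts": [100, 120, 150, 200, 220]},
]


def aggregate_histogram_field(docs, interval):
    """Sum the pre-aggregated counts into buckets of width `interval`."""
    totals = defaultdict(int)
    for doc in docs:
        for value, count in zip(doc["values"], doc["counts"]):
            key = math.floor(value / interval) * interval
            totals[float(key)] += count
    return dict(sorted(totals.items()))


result = aggregate_histogram_field(docs, 5)
print(result)  # {0.0: 940, 5.0: 640}
```

Bucket 0 collects the values 1 to 4 from the first document (570) and 2 to 4 from the second (370); bucket 5 collects the value 5 from the first document (220) and 5 and 6 from the second (420).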

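A histogram aggregation can be used as a sub-aggregation of a terms aggregation.

Group the documents by DestCountry into three buckets (the top three terms), then run a histogram aggregation on FlightTimeMin within each bucket, with an interval of 500, sorted by document count in descending order.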

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "DestCountry_terms": {
      "terms": {
        "field": "DestCountry",
        "size": 3
      },
      "aggs": {
        "FlightTimeMin_histogram": {
          "histogram": {
            "field": "FlightTimeMin",
            "interval": 500,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
    "DestCountry_terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 7605,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371,
          "FlightTimeMin_histogram" : {
            "buckets" : [
              {
                "key" : 0.0,
                "doc_count" : 1368
              },
              {
                "key" : 500.0,
                "doc_count" : 897
              },
              {
                "key" : 1000.0,
                "doc_count" : 101
              },
              {
                "key" : 1500.0,
                "doc_count" : 5
              }
            ]
          }
        },
        {
          "key" : "US",
          "doc_count" : 1987,
          "FlightTimeMin_histogram" : {
            "buckets" : [
              {
                "key" : 500.0,
                "doc_count" : 933
              },
              {
                "key" : 0.0,
                "doc_count" : 905
              },
              {
                "key" : 1000.0,
                "doc_count" : 142
              },
              {
                "key" : 1500.0,
                "doc_count" : 7
              }
            ]
          }
        },
        {
          "key" : "CN",
          "doc_count" : 1096,
          "FlightTimeMin_histogram" : {
            "buckets" : [
              {
                "key" : 500.0,
                "doc_count" : 563
              },
              {
                "key" : 0.0,
                "doc_count" : 434
              },
              {
                "key" : 1000.0,
                "doc_count" : 90
              },
              {
                "key" : 1500.0,
                "doc_count" : 9
              }
            ]
          }
        }
      ]
    }
  }
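
The nested response can be sanity-checked with a few lines of Python: within each country, the histogram counts should add up to the parent bucket's doc_count, and the buckets should appear in descending count order. The numbers are copied from the response above; the dictionary layout and the check_country helper are simplifications made for this sketch.

```python
# Sub-aggregation buckets copied from the response above, in the order returned.
response = {
    "IT": {"doc_count": 2371, "histogram": [(0.0, 1368), (500.0, 897), (1000.0, 101), (1500.0, 5)]},
    "US": {"doc_count": 1987, "histogram": [(500.0, 933), (0.0, 905), (1000.0, 142), (1500.0, 7)]},
    "CN": {"doc_count": 1096, "histogram": [(500.0, 563), (0.0, 434), (1000.0, 90), (1500.0, 9)]},
}


def check_country(bucket):
    """Return (counts sum to the parent doc_count, counts are in descending order)."""
    counts = [count for _, count in bucket["histogram"]]
    sums_match = sum(counts) == bucket["doc_count"]
    sorted_desc = counts == sorted(counts, reverse=True)
    return sums_match, sorted_desc


checks = {country: check_country(bucket) for country, bucket in response.items()}
print(checks)
```

The sums match for all three countries, which suggests that no document in these buckets lacks a FlightTimeMin value. For US and CN the bucket keyed 500 comes first, showing that the _count ordering really does differ from the default key ordering.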

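Date histogram aggregation

The date histogram aggregation ([ˈhɪstəˌɡræm]) is a bucket aggregation.

It is similar to the histogram aggregation, but it can only be used on date fields or date range fields.

field: the field to aggregate on.
format: the date format used for key_as_string in the response, and for the min and max values of the extended_bounds and hard_bounds parameters.
time_zone: the time zone used for bucketing and rounding, for example -01:00 or +08:00. Defaults to UTC.
keyed: defaults to false, which returns the buckets as an array; if set to true, the buckets are returned as an object keyed by bucket key.
missing: the value to use for documents that do not have the aggregated field.
min_doc_count: a bucket is returned only if its document count is greater than or equal to this value. If set above 0, empty buckets are not returned.
extended_bounds: extends the range of buckets. Takes min and max values; buckets within min to max are returned even if they are empty.
hard_bounds: restricts the range of buckets. Takes min and max values; only buckets within min to max are returned.
order: the sort order of the buckets. Supports _key and _count. Defaults to ascending by _key.
calendar_interval: a calendar-aware interval. Only a single unit is allowed. Supported values:
- minute, 1m
- hour, 1h
- day, 1d
- week, 1w
- month, 1M
- quarter, 1q
- year, 1y
fixed_interval: a fixed-length interval, written as a number n followed by a unit. Supported units:
- milliseconds (ms)
- seconds (s)
- minutes (m)
- hours (h)
- days (d)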
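
The effect of time_zone on day-sized buckets can be illustrated with a short Python sketch. It is a simplified model of the rounding, not Elasticsearch code; day_bucket_start is a hypothetical helper and the timestamp is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_start(ts: datetime, tz: timezone) -> datetime:
    """Round `ts` down to midnight in the given time zone."""
    local = ts.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


# A hypothetical event at 20:00 UTC on 31 July 2022.
ts = datetime(2022, 7, 31, 20, 0, tzinfo=timezone.utc)
utc_start = day_bucket_start(ts, timezone.utc)
cst_start = day_bucket_start(ts, timezone(timedelta(hours=8)))
print(utc_start.isoformat())
print(cst_start.isoformat())
```

The same event falls into the 31 July bucket in UTC but into the 1 August bucket with a time zone of +08:00, which also moves it into a different month.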

Aggregate on the timestamp field with a calendar interval of one month.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "timestamp_histogram": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "1M",
        "format": "yyyy-MM-dd"
      }
    }
  }  
}

The aggregation result is as follows:

"aggregations" : {
    "timestamp_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-07-01",
          "key" : 1656633600000,
          "doc_count" : 4379
        },
        {
          "key_as_string" : "2022-08-01",
          "key" : 1659312000000,
          "doc_count" : 8680
        }
      ]
    }
  }
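
A calendar month bucket starts at the first instant of the month. The Python sketch below reproduces the bucket keys above for UTC, the default when time_zone is not set; month_bucket_key is a hypothetical helper and the sample timestamp is chosen only for illustration.

```python
from datetime import datetime, timezone


def month_bucket_key(ts_ms: int) -> int:
    """Return the epoch-millisecond key of the UTC calendar month containing ts_ms."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    start = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(start.timestamp()) * 1000


# 2022-08-15T00:00:00Z, an arbitrary timestamp inside August 2022.
sample_ms = 1660521600000
key = month_bucket_key(sample_ms)
key_as_string = datetime.fromtimestamp(key / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
print(key, key_as_string)
```

The resulting key, 1659312000000, matches the key of the 2022-08-01 bucket in the response.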

Aggregate on the timestamp field with a fixed interval of 30 days.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "timestamp_histogram": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "30d",
        "format": "yyyy-MM-dd"
      }
    }
  }  
}

The aggregation result is as follows:

"aggregations" : {
    "timestamp_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-06-27",
          "key" : 1656288000000,
          "doc_count" : 2881
        },
        {
          "key_as_string" : "2022-07-27",
          "key" : 1658880000000,
          "doc_count" : 9305
        },
        {
          "key_as_string" : "2022-08-26",
          "key" : 1661472000000,
          "doc_count" : 873
        }
      ]
    }
  }
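
Unlike calendar months, fixed 30-day buckets do not start on the first of the month. The bucket keys above are whole multiples of 30 days counted from the Unix epoch, which the following Python sketch reproduces. It assumes UTC and no offset; fixed_bucket_key and as_date are hypothetical helpers.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000


def fixed_bucket_key(ts_ms: int, interval_ms: int) -> int:
    """Round an epoch-millisecond timestamp down to a multiple of the interval."""
    return (ts_ms // interval_ms) * interval_ms


def as_date(ts_ms: int) -> str:
    """Format an epoch-millisecond timestamp as yyyy-MM-dd in UTC."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


# 2022-07-01T00:00:00Z falls in the 30-day bucket that starts on 2022-06-27.
key = fixed_bucket_key(1656633600000, 30 * DAY_MS)
bucket_dates = [as_date(key + n * 30 * DAY_MS) for n in range(3)]
print(key, bucket_dates)
```

The three dates produced are the same key_as_string values as in the response: 2022-06-27, 2022-07-27, and 2022-08-26.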

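Auto date histogram aggregation

The auto date histogram aggregation ([ˈhɪstəˌɡræm]) is a bucket aggregation.

It is similar to the date histogram aggregation, but instead of an interval you specify a target number of buckets, and the interval that best meets that target is chosen automatically.

field: the field to aggregate on.
buckets: the target number of buckets. Defaults to 10. The number of buckets actually returned is always less than or equal to this value.
format: the date format used for key_as_string in the response.
time_zone: the time zone used for bucketing and rounding, for example -01:00 or +08:00.
missing: the value to use for documents that do not have the aggregated field.
minimum_interval: the minimum rounding interval to use. Supports year, month, day, hour, minute, and second.

Split the timestamp field into at most 10 buckets, with a minimum interval of one day.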

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true, 
  "size": 0,
  "aggs": {
    "timestamp_auto_date_histogram": {
      "auto_date_histogram": {
        "field": "timestamp",
        "format": "yyyy-MM-dd",
        "buckets": 10,
        "minimum_interval": "day"
      }
    }
  }  
}

The aggregation result is as follows:

"aggregations" : {
    "timestamp_auto_date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-07-18",
          "key" : 1658102400000,
          "doc_count" : 2202
        },
        {
          "key_as_string" : "2022-07-25",
          "key" : 1658707200000,
          "doc_count" : 2177
        },
        {
          "key_as_string" : "2022-08-01",
          "key" : 1659312000000,
          "doc_count" : 2142
        },
        {
          "key_as_string" : "2022-08-08",
          "key" : 1659916800000,
          "doc_count" : 2187
        },
        {
          "key_as_string" : "2022-08-15",
          "key" : 1660521600000,
          "doc_count" : 2188
        },
        {
          "key_as_string" : "2022-08-22",
          "key" : 1661126400000,
          "doc_count" : 2163
        }
      ],
      "interval" : "7d"
    }
  }
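
The reported interval of 7d can be checked against the bucket keys, and a little arithmetic shows why a one-day interval could not satisfy the target of 10 buckets. The keys in this Python sketch are copied from the response above; the variable names are chosen for illustration.

```python
DAY_MS = 24 * 60 * 60 * 1000

# Bucket keys copied from the response above.
keys = [
    1658102400000,
    1658707200000,
    1659312000000,
    1659916800000,
    1660521600000,
    1661126400000,
]

# Distance between consecutive bucket keys, in days.
gaps_in_days = [(b - a) // DAY_MS for a, b in zip(keys, keys[1:])]

# Number of one-day buckets needed just to span the first to the last key.
daily_buckets_needed = (keys[-1] - keys[0]) // DAY_MS + 1

print(gaps_in_days, len(keys), daily_buckets_needed)
```

Every gap is 7 days and only 6 buckets are returned, which is within the target of 10. Daily buckets would need at least 36 buckets to cover the same span, so a coarser interval had to be chosen.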