[Blog 669] Best practices for choosing the range for Prometheus rate()



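1. Scenario

We use rate() to compute the per-second rate of a counter, so which range should rate() use? Is there a fixed answer, such as 30s or 1m?

Answer: no. It depends on your scrape_interval.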

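2. Best practice for choosing the rate() range

The general rule is that the range should be at least 4x the scrape interval. This allows for various race conditions and makes the query resilient to a failed scrape.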
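As a rough illustration of the 4x rule, the sketch below computes the minimum range for a given scrape interval and builds the matching rate() expression. The helper names and the choice to round up to a whole number of minutes are assumptions made for this example, not something Prometheus enforces.

```python
import math


def min_safe_range_seconds(scrape_interval_s: int, factor: int = 4) -> int:
    """Smallest range the 4x rule allows for a given scrape interval."""
    return factor * scrape_interval_s


def rate_expr(metric: str, scrape_interval_s: int, round_to_s: int = 60) -> str:
    """Build a rate() expression whose range is the 4x minimum, rounded up
    to a multiple of round_to_s (an illustrative convention only)."""
    minimum = min_safe_range_seconds(scrape_interval_s)
    rounded = math.ceil(minimum / round_to_s) * round_to_s
    return f"rate({metric}[{rounded}s])"


if __name__ == "__main__":
    # Print the minimum and the rounded expression for a few scrape intervals.
    for interval in (10, 15, 30):
        print(interval, min_safe_range_seconds(interval),
              rate_expr("my_counter_total", interval))
```

With a 10s scrape interval the minimum is 40s, which the helper rounds up to a 60s range.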

Analysis (quoted from the reference, what-range-should-i-use-with-rate):

Let's say you had a 10s scrape interval, and scrapes started at t=0. The rate function needs at least two samples to work, so for a query at t=10 you'd need 1x the scrape interval. At t=20, the scrape at that time may not have been fully ingested yet, so 2x will cover you for two samples back to t=0. At t=29 that scrape might still not have been ingested, so you'd need ~3x to be safe. Finally, you want to be resilient to a failed scrape. If the t=20 scrape fails and you're at t=39 but the t=30 scrape is still ongoing, then you'd need ~4x to see both the t=0 and t=10 samples. So a 40s rate (i.e. rate(my_counter_total[40s])) would be the minimum safe range. Usually you would round this up to 60s for a 1m rate.
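The walkthrough above can be replayed with a small simulation. This is only a sketch: the function names are made up for the example, scrapes are assumed to land exactly on multiples of the interval, and the lookback window is treated as closed on both ends to match the walkthrough's arithmetic.

```python
def visible_samples(query_time, range_s, scrape_interval, failed=(), in_flight=()):
    """Timestamps of the samples a range selector ending at query_time would see.

    failed:    scrape times that produced no sample at all
    in_flight: scrape times whose sample is not ingested yet at query time
    """
    missing = set(failed) | set(in_flight)
    window_start = query_time - range_s
    scrape_times = range(0, query_time + 1, scrape_interval)
    return [t for t in scrape_times
            if t not in missing and window_start <= t <= query_time]


def rate_has_data(query_time, range_s, scrape_interval, failed=(), in_flight=()):
    """rate() needs at least two samples inside the window to return a value."""
    samples = visible_samples(query_time, range_s, scrape_interval, failed, in_flight)
    return len(samples) >= 2


if __name__ == "__main__":
    # Worst case from the walkthrough: t=20 failed, t=30 still being ingested.
    for range_s in (10, 20, 30, 40):
        seen = visible_samples(39, range_s, 10, failed=[20], in_flight=[30])
        print(f"[{range_s}s] at t=39 sees {seen}")
```

In this worst case only the 40s range still contains two samples, which is where the 4x figure comes from.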


3. Summary of best practices for choosing the rate() range

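Think carefully before choosing a very large range. If you take a rate over an hour and alert on it, everything is fine until the underlying condition stops, and then you have to wait up to an hour for the alert to clear. Longer ranges make trends easier to spot, but the averaging effect also increases reaction time.
If you do want averages over different ranges (and a rate is essentially an average), do not create a recording rule for every possible range. That is wasteful, causes confusion, and is hard to maintain, because rates over different ranges cannot be compared directly (for example, a 5m rate and a 10m rate), and you have to keep track of which one is used where.
Use a range that is at least 4x your scrape interval, pick one consistent range for recording rules across your organization, and use avg_over_time if you need an average over a longer period for graphs and alerts.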
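To make the reaction-time trade-off concrete, here is a minimal sketch that averages a recorded rate series over a short and a long window, roughly what avg_over_time would do over the output of a recording rule. The data, the threshold, and the function names are all hypothetical: one recorded value per minute, 10/s while the problem lasts, then 0/s.

```python
def avg_over_window(values, end_index, window):
    """Mean of the last `window` points ending at end_index (inclusive),
    a rough stand-in for avg_over_time over a recorded rate series."""
    start = max(0, end_index - window + 1)
    chunk = values[start:end_index + 1]
    return sum(chunk) / len(chunk)


def minutes_until_clear(values, stop_index, window, threshold):
    """Minutes after stop_index until the windowed average is no longer
    above threshold, or None if it never clears within the data."""
    for i in range(stop_index, len(values)):
        if avg_over_window(values, i, window) <= threshold:
            return i - stop_index
    return None


# Hypothetical recorded rate: the condition lasts 120 minutes, then stops.
recorded_rate = [10.0] * 120 + [0.0] * 120

if __name__ == "__main__":
    for window in (5, 60):
        delay = minutes_until_clear(recorded_rate, 120, window, threshold=1.0)
        print(f"{window}-minute average clears after {delay} minutes")
```

The long window keeps the average above the threshold for most of an hour after the condition has stopped, while the short window clears within a few minutes. Both averages come from the same recorded series, so only one recording rule is needed.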
