as_count() in Monitor Evaluations
Overview
Queries using as_count() and as_rate() modifiers are calculated in ways that can yield different results in monitor evaluations. Monitors involving arithmetic and at least 1 as_count() modifier use a separate evaluation path that changes the order in which arithmetic and time aggregation are performed.
Error rate example
Suppose you want to monitor an error rate over 5 minutes using the metrics requests.error and requests.total. Consider a single evaluation performed with these aligned timeseries points for the 5-minute timeframe:
Numerator: sum:requests.error{*}
| Timestamp | Value |
|:--------------------|:------|
| 2018-03-13 11:00:30 | 1 |
| 2018-03-13 11:01:30 | 2 |
| 2018-03-13 11:02:40 | 3 |
| 2018-03-13 11:03:30 | 4 |
| 2018-03-13 11:04:40 | 5 |
Denominator: sum:requests.total{*}
| Timestamp | Value |
|:--------------------|:------|
| 2018-03-13 11:00:30 | 10 |
| 2018-03-13 11:01:30 | 10 |
| 2018-03-13 11:02:40 | 10 |
| 2018-03-13 11:03:30 | 10 |
| 2018-03-13 11:04:40 | 10 |
2 ways to calculate
Refer to this query as classic_eval_path:
sum(last_5m): sum:requests.error{*}.as_rate() / sum:requests.total{*}.as_rate()
and this query as as_count_eval_path:
sum(last_5m): sum:requests.error{*}.as_count() / sum:requests.total{*}.as_count()
Compare the result of the evaluation depending on the path:
| Path | Behavior | Expanded expression | Result |
|:-------------------|:---------------------------------------------|:--------------------------------------|:-------|
| classic_eval_path | Aggregation function applied after division | (1/10 + 2/10 + 3/10 + 4/10 + 5/10) | 1.5 |
| as_count_eval_path | Aggregation function applied before division | (1+2+3+4+5) / (10+10+10+10+10) | 0.3 |
Note that both evaluations above are mathematically correct. Choose a method that suits your intentions.
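The arithmetic in the comparison table can be reproduced with a short Python sketch. The function names classic_eval and as_count_eval are illustrative only and are not part of any Datadog API. The sketch models just the order of division and time aggregation, not how a monitor is actually evaluated.

```python
# Aligned points from the numerator and denominator tables, in timestamp order.
errors = [1, 2, 3, 4, 5]
totals = [10, 10, 10, 10, 10]


def classic_eval(numerator, denominator):
    """Divide point by point, then sum the quotients over the window."""
    return sum(n / d for n, d in zip(numerator, denominator))


def as_count_eval(numerator, denominator):
    """Sum each series over the window, then divide the two sums."""
    return sum(numerator) / sum(denominator)


classic_result = classic_eval(errors, totals)
as_count_result = as_count_eval(errors, totals)

print(f"classic_eval_path:  {classic_result:.1f}")
print(f"as_count_eval_path: {as_count_result:.1f}")
```

The only difference between the two functions is whether the division happens inside or outside the sum, which is what the "Behavior" column describes.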
It may be helpful to visualize the classic_eval_path as:
sum(last_5m):error/total
and the as_count_eval_path as:
sum(last_5m):error
-----------------
sum(last_5m):total
In general, avg time aggregation with .as_rate() is reasonable, but sum aggregation with .as_count() is recommended for error rates. Aggregation methods other than sum are not meaningful with .as_count() and cannot be used with it.
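One way to see why summing counts is the safer choice for error rates is to look at a window where traffic is uneven. The sketch below uses hypothetical values that are not taken from the tables in this guide. It also assumes that avg on the classic path is applied after division, in the same way as sum in the example earlier.

```python
# Hypothetical window (not from the tables in this guide): one quiet interval
# where half the requests failed, and one busy interval with no failures.
errors = [5, 0]
totals = [10, 990]

# Classic-style: divide per point, then average the quotients.
# Each interval counts equally, regardless of how much traffic it carried.
avg_of_ratios = sum(e / t for e, t in zip(errors, totals)) / len(errors)

# as_count-style: sum each series first, then divide.
# Each request counts equally, so busy intervals carry more weight.
ratio_of_sums = sum(errors) / sum(totals)

print(f"average of per-point ratios: {avg_of_ratios}")
print(f"ratio of summed counts:      {ratio_of_sums}")
```

With these values, the per-point average reports 0.25 even though only 5 of 1000 requests failed (0.005). When the denominator is the same at every point, as in the error rate example, the two calculations give the same result.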
Reach out to the Datadog support team if you have any questions.