---
title: Anomaly Monitor
description: Detects anomalous behavior for a metric based on historical data
---

# Anomaly Monitor

## Overview{% #overview %}

Anomaly detection is an algorithmic feature that identifies when a metric is behaving differently than it has in the past, taking into account trends and seasonal day-of-week and time-of-day patterns. It is suited for metrics with strong trends and recurring patterns that are hard to monitor with threshold-based alerting.

For example, anomaly detection can help you discover when your web traffic is unusually low on a weekday afternoon, even though that same level of traffic is normal later in the evening. Or consider a metric measuring the number of logins to your steadily growing site. Because the number increases daily, any fixed threshold would quickly become outdated, whereas anomaly detection can alert you if there is an unexpected drop, potentially indicating an issue with the login system.

## Monitor creation{% #monitor-creation %}

To create an [anomaly monitor](https://app.datadoghq.com/monitors/create/anomaly) in Datadog, use the main navigation: *Monitors > New Monitor > Anomaly*.

### Define the metric{% #define-the-metric %}

Any metric reporting to Datadog is available for monitors. For more information, see the [Metric Monitor](https://docs.datadoghq.com/monitors/types/metric/#define-the-metric) page. **Note**: The `anomalies` function uses the past to predict what is expected in the future, so using it on a new metric may yield poor results.

After defining the metric, the anomaly detection monitor provides two preview graphs in the editor:

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/context.3928f33bab7607b6ad04a380808455bc.png?auto=format"
   alt="historical context" /%}


- The **Historical View** allows you to explore the monitored query at different time scales to better understand why data may be considered anomalous or non-anomalous.
- The **Evaluation Preview** is longer than the alerting window and provides insight on what the anomalies algorithm takes into account when calculating the bounds.

### Set alert conditions{% #set-alert-conditions %}

Trigger an alert if the values have been `above or below`, `above`, or `below` the bounds for the last `15 minutes`, `1 hour`, or another preset duration, or select `custom` to set a value between 15 minutes and 2 weeks. Recover if the values have been within the bounds for at least `15 minutes`, `1 hour`, or another preset duration, or select `custom` to set a value between 15 minutes and 2 weeks.

{% dl %}

{% dt %}
Anomaly detection
{% /dt %}

{% dd %}
With the default option (`above or below`) a metric is considered to be anomalous if it is outside of the gray anomaly band. Optionally, you can specify whether being only `above` or `below` the bands is considered anomalous.
{% /dd %}

{% dt %}
Trigger window
{% /dt %}

{% dd %}
The amount of time the metric must be anomalous before the alert triggers. **Note**: If the trigger window is too short, you might get false alarms due to spurious noise.
{% /dd %}

{% dt %}
Recovery window
{% /dt %}

{% dd %}
The amount of time required for the metric to no longer be considered anomalous, allowing the alert to recover. It is recommended to set the **Recovery Window** to the same value as the **Trigger Window**.
{% /dd %}

{% /dl %}

**Note**: The range of accepted values for the **Recovery Window** depends on the **Trigger Window** and the **Alert Threshold**, which ensures the monitor can't satisfy both the recovery and the alert conditions at the same time. Example:

- `Threshold`: 50%
- `Trigger window`: 4h

The range of accepted values for the recovery window is between 121 minutes (`4h*(1-0.5) +1 min = 121 minutes`) and 4 hours. Setting a recovery window below 121 minutes could lead to a 4-hour timeframe in which 50% of the points are anomalous while the last 120 minutes contain no anomalous points, so both conditions would be met at once.

Another example:

- `Threshold`: 80%
- `Trigger window`: 4h

The range of accepted values for the recovery window is between 49 minutes (`4h*(1-0.8) +1 min = 49 minutes`) and 4 hours.
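The lower bound in both examples follows the same arithmetic. The following Python sketch reproduces it, assuming the `+1 min` formula shown above also applies to other trigger windows and thresholds. The function name and the rounding of fractional minutes are illustrative assumptions and are not part of any Datadog API.

```python
from fractions import Fraction
import math


def min_recovery_window_minutes(trigger_window_minutes: int, threshold: str) -> int:
    """Smallest accepted recovery window, in minutes, per the formula above.

    `threshold` is passed as a string such as "0.5" so Fraction can represent
    it exactly and avoid floating-point rounding errors.
    """
    # Longest stretch that can be free of anomalies while the alert threshold is still met.
    non_anomalous_minutes = trigger_window_minutes * (1 - Fraction(threshold))
    # Assumption: fractional minutes are rounded down before the extra minute is added.
    return math.floor(non_anomalous_minutes) + 1


print(min_recovery_window_minutes(240, "0.5"))  # 121
print(min_recovery_window_minutes(240, "0.8"))  # 49
```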

### Advanced options{% #advanced-options %}

Datadog automatically analyzes your chosen metric and sets several parameters for you. You can still edit these parameters under **Advanced Options**.

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/advanced_options.f2ed43d71d50032192207fb595d9387f.png?auto=format"
   alt="The Advanced Options menu in the Anomaly monitor configuration page with the configuration set to detect anomalies 2 deviations from the predicted data using the agile algorithm with weekly seasonality, to take daylight savings into effect, and to use a rollup interval of 60 seconds" /%}

{% dl %}

{% dt %}
Deviations
{% /dt %}

{% dd %}
The width of the gray band. This is equivalent to the bounds parameter used in the [anomalies function](https://docs.datadoghq.com/dashboards/functions/algorithms/#anomalies).
{% /dd %}

{% dt %}
Algorithm
{% /dt %}

{% dd %}
The anomaly detection algorithm (`basic`, `agile`, or `robust`).
{% /dd %}

{% dt %}
Seasonality
{% /dt %}

{% dd %}
The seasonality (`hourly`, `daily`, or `weekly`) of the cycle for the `agile` or `robust` algorithm to analyze the metric.
{% /dd %}

{% dt %}
Daylight savings
{% /dt %}

{% dd %}
Available for `agile` or `robust` anomaly detection with `weekly` or `daily` seasonality. For more information, see [Anomaly Detection and Time Zones](https://docs.datadoghq.com/monitors/guide/how-to-update-anomaly-monitor-timezone/).
{% /dd %}

{% dt %}
Rollup
{% /dt %}

{% dd %}
The [rollup interval](https://docs.datadoghq.com/dashboards/functions/rollup/).
{% /dd %}

{% dt %}
Thresholds
{% /dt %}

{% dd %}
The percentage of points that need to be anomalous for alerting, warning, and recovery.
{% /dd %}

{% /dl %}

### Seasonality{% #seasonality %}

{% dl %}

{% dt %}
Hourly
{% /dt %}

{% dd %}
The algorithm expects a given minute after the hour to behave like the same minute in past hours. For example, 5:15 behaves like 4:15, 3:15, and so on.
{% /dd %}

{% dt %}
Daily
{% /dt %}

{% dd %}
The algorithm expects a given time of day to behave like the same time on past days. For example, 5pm today behaves like 5pm yesterday.
{% /dd %}

{% dt %}
Weekly
{% /dt %}

{% dd %}
The algorithm expects a given day of the week to behave like the same day in past weeks. For example, this Tuesday behaves like past Tuesdays.
{% /dd %}

{% /dl %}

**Required data history for Anomaly Detection algorithm**: Machine learning algorithms require at least three times as much historical data as the chosen seasonality period to compute the baseline. For example:

- *weekly* seasonality requires at least three weeks of data
- *daily* seasonality requires at least three days of data
- *hourly* seasonality requires at least three hours of data

All of the seasonal algorithms may use up to six weeks of historical data when calculating a metric's expected normal range of behavior. By using a significant amount of past data, the algorithms avoid giving too much weight to abnormal behavior that might have occurred in the recent past.
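The three-season minimum described above can be expressed as a quick check before enabling a seasonal algorithm on a metric. This is a minimal sketch: the helper names are hypothetical, and how you determine the amount of available history for a metric is left to you.

```python
from datetime import timedelta

# Length of one season for each seasonality option.
SEASON_LENGTH = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def minimum_history(seasonality: str) -> timedelta:
    """At least three seasons of data are needed to compute the baseline."""
    return 3 * SEASON_LENGTH[seasonality]


def has_enough_history(seasonality: str, available: timedelta) -> bool:
    """Return True if the metric has reported for long enough."""
    return available >= minimum_history(seasonality)


print(minimum_history("weekly"))                        # 21 days, 0:00:00
print(has_enough_history("daily", timedelta(days=2)))   # False
```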

### Anomaly detection algorithms{% #anomaly-detection-algorithms %}

{% dl %}

{% dt %}
Basic
{% /dt %}

{% dd %}
Use when metrics have no repeating seasonal pattern. Basic uses a simple lagging rolling quantile computation to determine the range of expected values. It uses little data and adjusts quickly to changing conditions but has no knowledge of seasonal behavior or longer trends.
{% /dd %}

{% dt %}
Agile
{% /dt %}

{% dd %}
Use when metrics are seasonal and expected to shift. The algorithm quickly adjusts to metric level shifts. A robust version of the [SARIMA](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) algorithm, it incorporates the immediate past into its predictions, allowing quick updates for level shifts at the expense of being less robust to recent, long-lasting anomalies.
{% /dd %}

{% dt %}
Robust
{% /dt %}

{% dd %}
Use when seasonal metrics are expected to be stable, and slow level shifts are considered anomalies. A [seasonal-trend decomposition](https://en.wikipedia.org/wiki/Decomposition_of_time_series) algorithm, it is stable and predictions remain constant even through long-lasting anomalies at the expense of taking longer to respond to intended level shifts (for example, if the level of a metric shifts due to a code change).
{% /dd %}

{% /dl %}
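To build intuition for the `basic` description above, the following Python sketch computes a lagging rolling quantile band: the bounds for each point come only from the points that precede it. This is not Datadog's implementation. The window size and the 5th and 95th percentile levels are assumptions chosen for illustration.

```python
from statistics import quantiles


def rolling_quantile_bounds(values, window=10, lower_pct=5, upper_pct=95):
    """Return a (lower, upper) band per point, or None without enough history."""
    bounds = []
    for i in range(len(values)):
        # Lagging window: only points strictly before index i are used.
        history = values[max(0, i - window):i]
        if len(history) < 2:
            bounds.append(None)
            continue
        # 99 cut points; cuts[k - 1] is the k-th percentile of the history.
        cuts = quantiles(history, n=100, method="inclusive")
        bounds.append((cuts[lower_pct - 1], cuts[upper_pct - 1]))
    return bounds


def flag_anomalies(values, window=10, lower_pct=5, upper_pct=95):
    """Mark points that fall outside the band computed from their own past."""
    bands = rolling_quantile_bounds(values, window, lower_pct, upper_pct)
    return [
        band is not None and not (band[0] <= value <= band[1])
        for value, band in zip(values, bands)
    ]


values = [10] * 20 + [100]          # flat metric followed by a spike
print(flag_anomalies(values)[-1])   # True
```

Because the band depends only on a short trailing window, it adapts quickly but cannot anticipate a recurring daily or weekly pattern, which matches the trade-off described for `basic`.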

## Examples{% #examples %}

The graphs below illustrate how and when these three algorithms behave differently from one another.

#### Anomaly detection comparison for hourly seasonality{% #anomaly-detection-comparison-for-hourly-seasonality %}

In this example, `basic` successfully identifies anomalies that spike out of the normal range of values, but it does not incorporate the repeating, seasonal pattern into its predicted range of values. By contrast, `robust` and `agile` both recognize the seasonal pattern and can detect more nuanced anomalies, for example, if the metric were to flatline near its minimum value. The trend also shows an hourly pattern, so the hourly seasonality works best in this case.

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/alg_comparison_1.4880f1bd5edec262d75c5661be057017.png?auto=format"
   alt="anomaly detection algorithm comparison with daily seasonality" /%}

#### Anomaly detection comparison for weekly seasonality{% #anomaly-detection-comparison-for-weekly-seasonality %}

In this example, the metric exhibits a sudden level shift. `Agile` adjusts more quickly to the level shift than `robust`. Also, the width of `robust`'s bounds increases to reflect greater uncertainty after the level shift; the width of `agile`'s bounds remains unchanged. `Basic` is clearly a poor fit for this scenario, where the metric exhibits a strong weekly seasonal pattern.

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/alg_comparison_2.4bc7b11f5c188914c3e479d64b2eb7f1.png?auto=format"
   alt="anomaly detection algorithm comparison with weekly seasonality" /%}

#### Comparison of algorithm reactions to change{% #comparison-of-algorithm-reactions-to-change %}

This example shows how the algorithms react to an hour-long anomaly. `Robust` does not adjust the bounds for the anomaly in this scenario since it reacts more slowly to abrupt changes. The other algorithms start to behave as if the anomaly is the new normal. `Agile` even identifies the metric's return to its original level as an anomaly.

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/alg_comparison_3.f2ae848ceea406b4cb737f49d3c9163d.png?auto=format"
   alt="anomaly detection algorithm comparison with hourly seasonality" /%}

#### Comparison of algorithm reactions to scale{% #comparison-of-algorithm-reactions-to-scale %}

The algorithms deal with scale differently. `Basic` and `robust` are scale-insensitive, while `agile` is not. The graphs on the left below show both `agile` and `robust` marking the level shift as anomalous. On the right, 1000 is added to the same metric, and `agile` no longer calls out the level shift as anomalous, whereas `robust` continues to do so.

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/alg_comparison_scale.53687cd1dba0c7079864602e4132049e.png?auto=format"
   alt="algorithm comparison scale" /%}

#### Anomaly detection comparison for new metrics{% #anomaly-detection-comparison-for-new-metrics %}

This example shows how each algorithm handles a new metric. `Robust` and `agile` do not show any bounds during the first few seasons (weekly). `Basic` starts showing bounds shortly after the metric first appears.

{% image
   source="https://datadog-docs.imgix.net/images/monitors/monitor_types/anomaly/alg_comparison_new_metric.e06d77a7e5dc254b891fe96a6e5602f9.png?auto=format"
   alt="algorithm comparison new metric" /%}

## Advanced alert conditions{% #advanced-alert-conditions %}

For detailed instructions on the advanced alert options (auto resolve, evaluation delay, etc.), see the [Monitor configuration](https://docs.datadoghq.com/monitors/configuration/#advanced-alert-conditions) page. For the metric-specific full data window option, see the [Metric monitor](https://docs.datadoghq.com/monitors/types/metric/#data-window) page.

## Notifications{% #notifications %}

For detailed instructions on the **Configure notifications and automations** section, see the [Notifications](https://docs.datadoghq.com/monitors/notify/) page.

## API{% #api %}

Customers on an enterprise plan can create anomaly detection monitors using the [create-monitor API endpoint](https://docs.datadoghq.com/api/v1/monitors/#create-a-monitor). Datadog **strongly recommends** [exporting a monitor's JSON](https://docs.datadoghq.com/monitors/status/#settings) to build the query for the API. By using the [monitor creation page](https://app.datadoghq.com/monitors/create/anomaly) in Datadog, customers benefit from the preview graph and automatic parameter tuning to help avoid a poorly configured monitor.

Anomaly monitors are managed using the [same API](https://docs.datadoghq.com/api/v1/monitors/) as other monitors. These fields are unique for anomaly monitors:

### `query`{% #query %}

The `query` property in the request body should contain a query string in the following format:

```text
avg(<query_window>):anomalies(<metric_query>, '<algorithm>', <deviations>, direction='<direction>', alert_window='<alert_window>', interval=<interval>, count_default_zero='<count_default_zero>' [, seasonality='<seasonality>']) >= <threshold>
```

{% dl %}

{% dt %}
`query_window`
{% /dt %}

{% dd %}
A time frame like `last_4h` or `last_7d`. This parameter controls the time range of data shown in notification graphs. The `query_window` determines how much historical data appears in the visualization but does not affect alert evaluation. Datadog recommends setting the `query_window` to around five times the `alert_window` to provide additional context. **Note**: The `query_window` must be at least as large as the `alert_window`.
{% /dd %}

{% dt %}
`metric_query`
{% /dt %}

{% dd %}
A standard Datadog metric query (for example, `sum:trace.flask.request.hits{service:web-app}.as_count()`).
{% /dd %}

{% dt %}
`algorithm`
{% /dt %}

{% dd %}
`basic`, `agile`, or `robust`.
{% /dd %}

{% dt %}
`deviations`
{% /dt %}

{% dd %}
A positive number; controls the sensitivity of the anomaly detection.
{% /dd %}

{% dt %}
`direction`
{% /dt %}

{% dd %}
The directionality of anomalies that should trigger an alert: `above`, `below`, or `both`.
{% /dd %}

{% dt %}
`alert_window`
{% /dt %}

{% dd %}
The timeframe to be checked for anomalies (for example, `last_5m`, `last_1h`).
{% /dd %}

{% dt %}
`interval`
{% /dt %}

{% dd %}
A positive integer representing the number of seconds in the rollup interval. It should be less than or equal to one fifth of the `alert_window` duration.
{% /dd %}

{% dt %}
`count_default_zero`
{% /dt %}

{% dd %}
Use `true` for most monitors. Set to `false` only if submitting a count metric in which the lack of a value should *not* be interpreted as a zero.
{% /dd %}

{% dt %}
`seasonality`
{% /dt %}

{% dd %}
`hourly`, `daily`, or `weekly`. Exclude this parameter when using the `basic` algorithm.
{% /dd %}

{% dt %}
`threshold`
{% /dt %}

{% dd %}
A positive number no larger than 1. The fraction of points in the `alert_window` that must be anomalous in order for a critical alert to trigger.
{% /dd %}

{% /dl %}
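Two of the constraints above relate parameters to each other: `query_window` must be at least as large as `alert_window`, and `interval` should not exceed one fifth of `alert_window`. The following Python sketch checks both before you submit a query. The helper names are hypothetical, and the parser only handles the `m`, `h`, and `d` suffixes that appear in the examples on this page.

```python
import re

# Seconds per unit for the timeframe suffixes used on this page.
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def timeframe_seconds(timeframe: str) -> int:
    """Convert a timeframe string such as 'last_5m' or 'last_7d' to seconds."""
    match = re.fullmatch(r"last_(\d+)([mhd])", timeframe)
    if match is None:
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_windows(query_window: str, alert_window: str, interval: int) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    alert_seconds = timeframe_seconds(alert_window)
    if timeframe_seconds(query_window) < alert_seconds:
        problems.append("query_window is smaller than alert_window")
    if interval * 5 > alert_seconds:
        problems.append("interval is larger than one fifth of alert_window")
    return problems


print(check_windows("last_1h", "last_5m", 20))  # []
```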

Below is an example query for an anomaly detection monitor, which alerts when the average Cassandra node's CPU is three standard deviations above the ordinary value over the last 5 minutes:

```text
avg(last_1h):anomalies(avg:system.cpu.system{name:cassandra}, 'basic', 3, direction='above', alert_window='last_5m', interval=20, count_default_zero='true') >= 1
```

This query uses `avg` in two places:

- `avg(last_1h)` - Aggregates anomaly data points over the query window for notification graphs
- `avg:system.cpu.system{name:cassandra}` - Aggregates the CPU metric across Cassandra nodes before anomaly detection
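If you generate queries programmatically, a small helper can assemble the string from the parameters described above. The following Python sketch is one way to do it. The function name and argument order are hypothetical, and exporting a monitor's JSON remains the recommended way to obtain a known-good query.

```python
def build_anomaly_query(
    metric_query,
    algorithm,
    deviations,
    direction,
    alert_window,
    interval,
    threshold,
    query_window,
    count_default_zero=True,
    seasonality=None,
):
    """Assemble an anomaly monitor query string in the format shown above."""
    if algorithm == "basic" and seasonality is not None:
        # The seasonality parameter is excluded for the basic algorithm.
        raise ValueError("seasonality must be omitted for the basic algorithm")
    args = [
        metric_query,
        f"'{algorithm}'",
        str(deviations),
        f"direction='{direction}'",
        f"alert_window='{alert_window}'",
        f"interval={interval}",
        f"count_default_zero='{str(count_default_zero).lower()}'",
    ]
    if seasonality is not None:
        args.append(f"seasonality='{seasonality}'")
    return f"avg({query_window}):anomalies({', '.join(args)}) >= {threshold}"


# Reproduces the Cassandra example query above.
print(build_anomaly_query(
    "avg:system.cpu.system{name:cassandra}", "basic", 3,
    direction="above", alert_window="last_5m", interval=20,
    threshold=1, query_window="last_1h",
))
```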

### `options`{% #options %}

Most of the properties under `options` in the request body are the same as for other query alerts, except for `thresholds` and `threshold_windows`.

{% dl %}

{% dt %}
`thresholds`
{% /dt %}

{% dd %}
Anomaly monitors support `critical`, `critical_recovery`, `warning`, and `warning_recovery` thresholds. Thresholds are expressed as numbers from 0 to 1, and are interpreted as the fraction of the associated window that is anomalous. For example, a `critical` threshold value of `0.9` means that a critical alert triggers when at least 90% of the points in the `trigger_window` (described below) are anomalous. Or, a `warning_recovery` value of 0 means that the monitor recovers from the warning state only when 0% of the points in the `recovery_window` are anomalous.
{% /dd %}

{% dd %}
The `critical` `threshold` should match the `threshold` used in the `query`.
{% /dd %}

{% dt %}
`threshold_windows`
{% /dt %}

{% dd %}
Anomaly monitors have a `threshold_windows` property in `options`. `threshold_windows` must include both properties: `trigger_window` and `recovery_window`. These windows are expressed as timeframe strings, such as `last_10m` or `last_1h`. The `trigger_window` must match the `alert_window` from the `query`. The `trigger_window` is the time range that is analyzed for anomalies when evaluating whether a monitor should trigger. The `recovery_window` is the time range that is analyzed for anomalies when evaluating whether a triggered monitor should recover.
{% /dd %}

{% /dl %}
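The following Python sketch illustrates how the threshold fractions described above can be read. It is a simplified model and not Datadog's evaluation logic: it assumes each point in a window has already been labeled anomalous or not, and it assumes that recovery uses a "less than or equal to" comparison.

```python
def fraction_anomalous(flags):
    """Fraction of points in a window that are anomalous (True)."""
    if not flags:
        raise ValueError("window contains no points")
    return sum(flags) / len(flags)


def evaluate_trigger(trigger_window_flags, thresholds):
    """Compare the anomalous fraction of the trigger window to the thresholds."""
    fraction = fraction_anomalous(trigger_window_flags)
    if fraction >= thresholds["critical"]:
        return "alert"
    warning = thresholds.get("warning")
    if warning is not None and fraction >= warning:
        return "warn"
    return "ok"


def has_recovered(recovery_window_flags, thresholds):
    """Assumption: recover when the anomalous fraction is at or below critical_recovery."""
    return fraction_anomalous(recovery_window_flags) <= thresholds["critical_recovery"]


thresholds = {"critical": 0.9, "critical_recovery": 0}
print(evaluate_trigger([True] * 9 + [False], thresholds))  # alert
print(has_recovered([False] * 10, thresholds))             # True
```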

A standard configuration of thresholds and threshold windows looks like:

```json
"options": {
  ...
  "thresholds": {
    "critical": 1,
    "critical_recovery": 0
  },
  "threshold_windows": {
    "trigger_window": "last_30m",
    "recovery_window": "last_30m"
  }
}
```
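Because the `trigger_window` must match the `alert_window` and the `critical` threshold should match the `threshold` in the `query`, it can help to verify both before sending a request. The following Python sketch does this with regular expressions. The helper name is hypothetical, and the patterns assume the query follows the format shown in the `query` section.

```python
import re


def check_query_matches_options(query: str, options: dict) -> list:
    """Return a list of mismatches between the query string and options."""
    problems = []
    windows = options.get("threshold_windows", {})
    for key in ("trigger_window", "recovery_window"):
        if key not in windows:
            problems.append(f"threshold_windows is missing {key}")

    # Extract alert_window='...' and the trailing ">= <threshold>" from the query.
    alert_window = re.search(r"alert_window='([^']+)'", query)
    threshold = re.search(r">=\s*([0-9.]+)\s*$", query)

    if alert_window and windows.get("trigger_window") != alert_window.group(1):
        problems.append("trigger_window does not match alert_window")

    critical = options.get("thresholds", {}).get("critical")
    if threshold and critical is not None and float(threshold.group(1)) != float(critical):
        problems.append("critical threshold does not match the query threshold")
    return problems


query = (
    "avg(last_4h):anomalies(avg:system.cpu.system{name:cassandra}, 'basic', 3, "
    "direction='above', alert_window='last_30m', interval=60, "
    "count_default_zero='true') >= 1"
)
options = {
    "thresholds": {"critical": 1, "critical_recovery": 0},
    "threshold_windows": {"trigger_window": "last_30m", "recovery_window": "last_30m"},
}
print(check_query_matches_options(query, options))  # []
```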

## Troubleshooting{% #troubleshooting %}

- [Anomaly Monitor FAQ](https://docs.datadoghq.com/monitors/guide/anomaly-monitor/)
- [Update anomaly monitor timezone](https://docs.datadoghq.com/monitors/guide/how-to-update-anomaly-monitor-timezone/)
- [Contact Datadog support](https://docs.datadoghq.com/help/)

## Further Reading{% #further-reading %}

- [Monitor Arista VeloCloud SD-WAN performance with Datadog](https://www.datadoghq.com/blog/velocloud-sdwan-integration)
- [Configure your monitor notifications](https://docs.datadoghq.com/monitors/notify/)
- [Schedule a downtime to mute a monitor](https://docs.datadoghq.com/monitors/downtimes/)
- [Consult your monitor status](https://docs.datadoghq.com/monitors/status/)
- [Anomalies function](https://docs.datadoghq.com/dashboards/functions/algorithms/#anomalies)
- [Anomaly detection, predictive correlations - Using AI-assisted metrics monitoring](https://www.datadoghq.com/blog/ai-powered-metrics-monitoring/)