Alerting

Overview

Monitoring all of your infrastructure in one place wouldn’t be complete without the ability to know when critical changes are occurring. Datadog gives you the ability to create monitors that actively check metrics, integration availability, network endpoints, and more.

Once a monitor is created, you are notified when its conditions are met. You can also notify team members via email, third-party services (e.g. PagerDuty or Stride), or other custom endpoints via Webhooks.

Triggered monitors appear in the event stream, allowing collaboration around active issues in your applications or infrastructure. Datadog provides a high-level view of open issues on the Triggered Monitors page as well as general monitor management on the Manage Monitors page.

Monitors can also be managed programmatically. Refer to the Datadog API docs for detailed information on managing monitors through the API, using either the available libraries or cURL.
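As a minimal sketch of what programmatic management can look like, the following Python builds (but does not send) a request that would create a metric monitor. It assumes the v1 `monitor` endpoint and the `DD-API-KEY` / `DD-APPLICATION-KEY` headers. The helper name, query, monitor name, and keys are placeholders chosen for illustration, so treat the API docs as the authoritative reference for the accepted fields.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the Datadog API docs for your site.
API_URL = "https://api.datadoghq.com/api/v1/monitor"


def build_create_monitor_request(api_key, app_key, monitor):
    """Build (without sending) a POST request that would create a monitor."""
    body = json.dumps(monitor).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
    )


# Placeholder monitor definition; the query and names are illustrative only.
example_monitor = {
    "name": "High CPU on web hosts",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{role:web} > 90",
    "message": "CPU is high. Notify: @ops-team",
    "tags": ["team:ops"],
}

request = build_create_monitor_request("<API_KEY>", "<APP_KEY>", example_monitor)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or log before anything is created in your account.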

Glossary

Here is a quick overview of the different terms used:

  • Status: Each check run submits a status of OK, WARNING or CRITICAL.
  • Check: Emits one or more statuses.
  • Monitor: Sends notifications based on a sequence of check statuses, metric threshold or other alerting conditions.
  • Monitor type: log, forecast, host, metric, integration, process, outlier, anomaly, APM, composite, network, event-based, and custom. See the side navigation to drill into a specific type.
  • Tags: Configurable labels that can be applied to each metric and host. See the Tagging page for more details.
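To make the relationship between statuses, checks, and monitors concrete, here is a small illustrative model in Python. It is not how Datadog evaluates monitors internally. The `Status` enum, the `should_notify` function, and the "three consecutive CRITICAL statuses" rule are assumptions chosen only to show a monitor acting on a sequence of check statuses.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def should_notify(statuses, threshold=3):
    """Return True when the last `threshold` check statuses are all CRITICAL."""
    recent = statuses[-threshold:]
    return len(recent) == threshold and all(s is Status.CRITICAL for s in recent)


history = [Status.OK, Status.WARNING, Status.CRITICAL, Status.CRITICAL, Status.CRITICAL]
print(should_notify(history))      # True: three consecutive CRITICAL statuses
print(should_notify(history[:4]))  # False: only two consecutive CRITICAL statuses
```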

Creating a Monitor

Navigate to the Create Monitors page by hovering over Monitors in the main menu and clicking New Monitor in the sub-menu (depending on your chosen theme and screen resolution, the main menu may be at the top or on the left). You are presented with a list of monitor types on the left. See the Monitoring Reference to learn more about all monitor types.

Export your monitor

Export the JSON configuration for a monitor right from the create screen, or from the upper right corner of your monitor status page. If you manage and deploy monitors programmatically, it is often easier to define the monitor in the UI and then export its JSON.
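Once exported, the JSON can be kept in version control and loaded by your deployment tooling. The sketch below parses a hypothetical export and checks for a small set of keys before it is used. The sample document, the `REQUIRED_KEYS` set, and the `load_monitor_definition` helper are assumptions for illustration, since the exact fields in a real export depend on the monitor type.

```python
import json

# Assumed minimal set of keys; a real export may contain more fields.
REQUIRED_KEYS = {"name", "type", "query", "message"}

# Hypothetical exported monitor; values are illustrative only.
exported = """
{
  "name": "Disk usage is high",
  "type": "metric alert",
  "query": "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.9",
  "message": "Disk usage above 90%. @ops-team",
  "tags": ["team:ops"],
  "options": {"notify_audit": false}
}
"""


def load_monitor_definition(text):
    """Parse exported monitor JSON and check that the assumed core keys exist."""
    definition = json.loads(text)
    missing = REQUIRED_KEYS - set(definition)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return definition


monitor = load_monitor_definition(exported)
print(monitor["name"])
```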

Auditing Monitors

Any change to a monitor creates an event in the event stream that explains the change and shows the user who made it.

Assuming you’ve made changes to your Monitors, you can see examples with the following event search:

https://app.datadoghq.com/event/stream?per_page=30&query=tags:audit%20status:all
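If you want to generate this kind of search link from a script, the query string can be assembled with the standard library. The helper name `audit_search_url` is hypothetical. The base URL and parameters are taken from the link above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://app.datadoghq.com/event/stream"


def audit_search_url(query="tags:audit status:all", per_page=30):
    """Build an event stream search URL, keeping ':' literal and encoding spaces as %20."""
    params = urlencode({"per_page": per_page, "query": query}, quote_via=quote, safe=":")
    return f"{BASE_URL}?{params}"


print(audit_search_url())
```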

You can also be notified of changes to a monitor you create. At the bottom of the Monitor Editor, there is an option to notify alert recipients of all changes to the monitor.

Setting this option to Notify sends an email for the monitor's audit events to everyone who is alerted by that monitor.

Manually resolve your monitor

The monitor Resolve function artificially switches the monitor status to OK for its next evaluation. The evaluation after that is performed normally on the data the monitor is based on.

If a monitor is alerting because its current data still corresponds to its ALERT state, Resolve makes the monitor go through the state sequence ALERT -> OK -> ALERT. It is therefore not appropriate for acknowledging that you have seen the alert or for telling Datadog to ignore the alert.

Manually resolving a monitor is appropriate when data is reported intermittently: after triggering an alert, the monitor receives no further data, so it can no longer evaluate its alerting conditions and recover to the OK state. In that case, the Resolve function or the "Automatically resolve monitor after X hours" option switches the monitor back to the OK state.

Typical use case: a monitor based on an error metric that is not generated when there are no errors (e.g. aws.elb.httpcode_elb_5xx, or any DogStatsD counter in your code that reports only when an error occurs).
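The two situations described above can be illustrated with a simplified simulation. This is a toy model written for this page, not Datadog's evaluation engine. The `evaluate` and `run` functions, the threshold of 0, and the use of `None` for "no data received" are all assumptions.

```python
def evaluate(previous_state, value, threshold=0):
    """Return the next state; with no data (None) the state cannot change."""
    if value is None:
        return previous_state
    return "ALERT" if value > threshold else "OK"


def run(values, resolve_at=None, threshold=0):
    """Evaluate `values` in order; at index `resolve_at` the state is forced to OK."""
    state, history = "OK", []
    for i, value in enumerate(values):
        if i == resolve_at:
            state = "OK"  # manual Resolve overrides this one evaluation
        else:
            state = evaluate(state, value, threshold)
        history.append(state)
    return history


# Errors keep arriving: Resolve only produces ALERT -> OK -> ALERT.
print(run([1, 1, 1], resolve_at=1))        # ['ALERT', 'OK', 'ALERT']
# The error metric stops reporting: the monitor stays in ALERT until resolved.
print(run([1, None, None]))                # ['ALERT', 'ALERT', 'ALERT']
print(run([1, None, None], resolve_at=1))  # ['ALERT', 'OK', 'OK']
```

In the first run the alert returns as soon as the next normal evaluation sees the same data. In the last run the forced OK state persists because no new data arrives to contradict it.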

Managing Monitors

There are multiple community projects for maintaining or managing monitors, along with some other Datadog components, via the API: