Monitoring all of your infrastructure in one place wouldn’t be complete without the ability to know when critical changes are occurring. Datadog gives you the ability to create monitors that actively check metrics, integration availability, network endpoints, and more.
Once a monitor is created, you are notified when its conditions are met. You can also notify team members via email, third-party services (e.g. PagerDuty), or other custom endpoints via webhooks.
Triggered monitors appear in the event stream, allowing collaboration around active issues in your applications or infrastructure. Datadog provides a high-level view of open issues on the Triggered Monitors page as well as general monitor management on the Manage Monitors page.
Here is a quick overview of the different terms used:
|Status|Each check run submits a status of OK, WARNING or CRITICAL.|
|Check|Emits one or more statuses.|
|Monitor|Sends notifications based on a sequence of check statuses, metric threshold or other alerting conditions.|
|Monitor type|The different monitor types available: log, forecast, host, metric, integration, process, outlier, anomaly, apm, composite, network, event, and custom.|
|Tags|Configurable labels that can be applied to each metric and host. See the Tagging page for more details.|
Navigate to the Create Monitors page by hovering over Monitors in the main menu and clicking New Monitor in the sub-menu (depending on your chosen theme and screen resolution, the main menu may be at the top or on the left). You are presented with a list of monitor types on the left. See the Monitoring Reference to learn more about all monitor types.
Export the JSON configuration for a monitor right from the create screen, or on your monitor status page in the upper right corner. If you manage and deploy monitors programmatically, it’s easier to define the monitor in the UI and export the JSON right away:
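As a rough illustration of the programmatic workflow, the sketch below takes a monitor definition exported as JSON and wraps it in an HTTP request. It assumes the v1 monitor endpoint on the default Datadog site and the `DD-API-KEY` / `DD-APPLICATION-KEY` headers. The monitor definition, the key placeholders, and the helper name `build_create_monitor_request` are hypothetical and only for illustration. The request is built but not sent, so the example runs offline.

```python
import json
import urllib.request

# Assumption: default Datadog site; other sites use a different host name.
MONITOR_API_URL = "https://api.datadoghq.com/api/v1/monitor"

# Hypothetical stand-in for JSON exported from the monitor editor.
EXPORTED_MONITOR = """
{
  "name": "High CPU on web hosts",
  "type": "metric alert",
  "query": "avg(last_5m):avg:system.cpu.user{role:web} > 90",
  "message": "CPU is high on web hosts.",
  "tags": ["team:web"],
  "options": {"notify_audit": true}
}
"""


def build_create_monitor_request(exported_json, api_key, app_key):
    """Build (but do not send) a POST request that would create the monitor."""
    # Parsing first catches malformed JSON before anything reaches the API.
    definition = json.loads(exported_json)
    return urllib.request.Request(
        MONITOR_API_URL,
        data=json.dumps(definition).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


request = build_create_monitor_request(EXPORTED_MONITOR, "<API_KEY>", "<APP_KEY>")
# Passing `request` to urllib.request.urlopen() would submit it; that step is
# deliberately left out of this sketch.
```

Keeping the exported JSON in version control and replaying it this way means the definition tested in the UI is the same one that gets deployed.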
Any change to a monitor creates an event in the event stream that explains the change and shows the user who made it.
Assuming you’ve made changes to your monitors, you can see examples with the following event search:
Datadog also provides the ability to be notified on changes to a monitor you create. At the bottom of the monitor editor, there’s an option to notify alert recipients for all changes to the monitor:
Setting this option to Notify sends an email for each monitor audit event to everyone who is alerted by that monitor.
The monitor Resolve function artificially switches the monitor status to OK for its next evaluation. The following monitor evaluation is performed normally on the data the monitor is based on.
If a monitor is alerting because its current data corresponds to its ALERT state, Resolve has the monitor follow the state switch ALERT -> OK -> ALERT. Thus, it’s not appropriate for acknowledging that you have seen the alert or telling Datadog to ignore the alert.
Manually resolving a monitor is appropriate for cases where data is reported intermittently: after triggering an alert, the monitor doesn’t receive further data, so it can no longer evaluate alerting conditions and recover to the OK state. In that case the Resolve function or the Automatically resolve monitor after X hours option switches the monitor back to the OK state.
Typical use case: a monitor based on error metrics that are not generated when there are no errors (e.g. aws.elb.httpcode_elb_5xx, or any DogStatsD counter in your code that reports only when there is an error).
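The difference between the two situations can be illustrated with a toy model. This is not Datadog's evaluation logic, only a simplified sketch under stated assumptions: a monitor alerts when a value exceeds a threshold, keeps its previous state when no data arrives, and a manual Resolve forces OK for exactly one evaluation. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(values, threshold, resolve_before=None):
    """Return the monitor state after each evaluation (toy model).

    values: one entry per evaluation; None means no data arrived.
    resolve_before: index of the evaluation that a manual Resolve forces to OK.
    """
    states = []
    state = "OK"
    for i, value in enumerate(values):
        if i == resolve_before:
            # Resolve: the status is forced to OK for this evaluation only.
            state = "OK"
        elif value is not None:
            # Normal evaluation against the data.
            state = "ALERT" if value > threshold else "OK"
        # With no data there is nothing to evaluate, so the state is kept.
        states.append(state)
    return states


# Errors keep arriving: Resolve only produces ALERT -> OK -> ALERT.
continuous = simulate([5, 5, 5, 5], threshold=0, resolve_before=1)

# The error metric stops reporting: Resolve moves the monitor back to OK for good.
intermittent = simulate([5, None, None, None], threshold=0, resolve_before=2)
```

In the first case the monitor re-alerts on the very next evaluation, which is why Resolve is not a way to acknowledge an alert. In the second case nothing would ever bring the monitor back to OK without it.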