Monitoring all of your infrastructure in one place wouldn’t be complete without the ability to know when critical changes are occurring. Datadog gives you the ability to create monitors that actively check metrics, integration availability, network endpoints, and more.
Once a monitor is created, you are notified when its conditions are met. You can notify team members via email, third-party services (e.g., PagerDuty or HipChat), or other custom endpoints via webhooks.
Triggered monitors appear in the event stream, allowing collaboration around active issues in your applications or infrastructure. Datadog provides a high-level view of open issues on the Triggered Monitors page as well as general monitor management on the Manage Monitors page.
In this section you can:
Here is a quick overview of the different terms used:
Navigate to the Create Monitors page by hovering over Monitors in the main menu and clicking New Monitor in the sub-menu (depending on your chosen theme and screen resolution, the main menu may be at the top or on the left). You are presented with a list of monitor types on the left. See the Monitoring Reference to learn more about all monitor types.
You can export the configuration JSON for a monitor right from the create screen.
If you manage and deploy monitors programmatically, it’s easier to define the monitor in the UI and export the JSON right away:
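As a rough illustration of that workflow, the sketch below loads an exported monitor definition and prepares a request that would create it through the API, without sending it. The monitor values, the `REQUIRED_KEYS` check, and the helper names `load_monitor` and `build_create_request` are hypothetical and chosen for this example. The JSON layout and the `notify_audit` option are assumed to match what the export produces, so compare them with your own exported JSON before relying on them.

```python
import json
import urllib.request

# Hypothetical exported monitor definition; all values are illustrative only.
EXPORTED_MONITOR = """
{
  "name": "High CPU on web hosts",
  "type": "metric alert",
  "query": "avg(last_5m):avg:system.cpu.user{role:web} > 90",
  "message": "CPU is high on web hosts.",
  "tags": ["team:web"],
  "options": {"notify_audit": false, "thresholds": {"critical": 90}}
}
"""

# Fields this sketch chooses to insist on (an assumption, not an official list).
REQUIRED_KEYS = {"name", "type", "query"}


def load_monitor(raw: str) -> dict:
    """Parse exported monitor JSON and check that the basic fields are present."""
    monitor = json.loads(raw)
    missing = REQUIRED_KEYS - monitor.keys()
    if missing:
        raise ValueError(f"monitor definition is missing: {sorted(missing)}")
    return monitor


def build_create_request(monitor: dict, api_key: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create the monitor."""
    return urllib.request.Request(
        "https://api.datadoghq.com/api/v1/monitor",
        data=json.dumps(monitor).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


monitor = load_monitor(EXPORTED_MONITOR)
request = build_create_request(monitor, api_key="<API_KEY>", app_key="<APP_KEY>")
print(request.get_method(), request.full_url)
```

Keeping the definition as a JSON document under version control, and validating it before any request is sent, makes it easier to review monitor changes the same way you review code.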
Any change to a monitor creates an event in the event stream that explains the change and shows the user who made it.
Assuming you’ve made changes to your Monitors, you can see examples with the following event search:
You can also be notified of changes to a monitor you create. At the bottom of the Monitor Editor there’s an option to notify alert recipients of all changes to the monitor:
Setting this option to Notify sends an email for each monitor audit event to everyone who is alerted by that monitor.
Manually resolving a monitor only makes sense in a couple of cases:
Otherwise the monitor picks up the current state on the next evaluation.
In other words, if the value is still above/below the configured threshold then the monitor may re-trigger upon the next evaluation (in about 60 seconds).
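The re-trigger behavior can be pictured with a small simulation. This is only a sketch of the idea, not how Datadog evaluates monitors internally: the function names, the sample values, and the threshold of 90 are all hypothetical, and each list element stands for one evaluation cycle.

```python
import operator

# Comparison operators a threshold check might use.
COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def evaluate(value: float, comparator: str, threshold: float) -> str:
    """Return "Alert" if the value breaches the threshold, otherwise "OK"."""
    return "Alert" if COMPARATORS[comparator](value, threshold) else "OK"


def states_after_resolve(values, comparator, threshold):
    """Start from a manually resolved monitor and re-evaluate each new value."""
    states = ["OK"]  # state immediately after the manual resolve
    for value in values:
        states.append(evaluate(value, comparator, threshold))
    return states


# The value is still above 90 for two evaluations after the manual resolve.
print(states_after_resolve([95, 97, 60], ">", 90))
# → ['OK', 'Alert', 'Alert', 'OK']
```

The manual resolve only holds until the next evaluation: because the first value after it is still above the threshold, the monitor goes straight back to the alert state.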
There are multiple community projects for maintaining or managing monitors, along with some other Datadog components, via the APIs: