Getting Started with Monitors

Overview

A metric monitor provides alerts and notifications if a specific metric is above or below a certain threshold. This page provides instructions for setting up a metric monitor to alert on low disk space.

Prerequisites

Before getting started, you need a Datadog account linked to a host with the Datadog Agent installed. To verify, check your Infrastructure List in Datadog.

Setup

To create a metric monitor in Datadog, use the main navigation: Monitors > New Monitor > Metric.

Choose the detection method

When you create a metric monitor, Threshold Alert is automatically selected as the detection method. A threshold alert compares metric values against user-defined thresholds. The goal for this monitor is to alert on a static threshold, so no change is necessary.

Define the metric

To get an alert on low disk space, use the system.disk.in_use metric from the Disk integration and average the metric over host and device.

After this is set, the monitor automatically updates to a Multi Alert, which triggers a separate alert for each host and device combination reporting the metric.
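The selection above can also be expressed as a monitor query string. The following is a minimal sketch that assembles one in Python. The helper name build_query and its parameters are illustrative, not part of any Datadog library, and the 5-minute window and 0.9 threshold are assumptions borrowed from the example configuration on this page. Compare the result with the query shown in your own monitor before relying on it.

```python
def build_query(metric, group_by, threshold, window="last_5m", comparator=">"):
    """Assemble a metric monitor query string (illustrative helper, not a Datadog API)."""
    groups = ",".join(group_by)
    # Doubled braces are literal braces in an f-string: {*} is the scope,
    # and "by {host,device}" is what makes the monitor a Multi Alert.
    return f"avg({window}):avg:{metric}{{*}} by {{{groups}}} {comparator} {threshold}"


query = build_query("system.disk.in_use", ["host", "device"], 0.9)
print(query)
```

Grouping by both host and device is what produces one alert per combination rather than one alert for the whole fleet.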

Set alert conditions

According to the Disk integration documentation, system.disk.in_use is the amount of disk space in use as a fraction of the total. So, when this metric is reporting a value of 0.7, the device is 70% full.
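As a quick illustration of the fraction described above, the sketch below turns a used/total pair into the 0 to 1 value and a percentage. The function names and the byte counts are made up for the example; the Agent reports the metric for you, so this is only the arithmetic, not how the Agent collects it.

```python
def in_use_fraction(used_bytes, total_bytes):
    """Return disk usage as a fraction of the total (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def as_percent(fraction):
    """Format a 0-1 fraction as a whole-number percentage string."""
    return f"{fraction * 100:.0f}%"


# Hypothetical device: 350 GB used out of 500 GB.
fraction = in_use_fraction(350 * 10**9, 500 * 10**9)
print(fraction, as_percent(fraction))
```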

To alert on low disk space, the monitor should trigger when the metric is above the threshold. The threshold values are based on your preference. For this metric, values between 0 and 1 are appropriate:

In the example configuration, Multi Alert is selected and the monitor alerts on the average of the query over the last 5 minutes for each host and device reporting the metric. The Set alert conditions section is configured to trigger when the evaluated value is above the threshold for any host or device, with the Alert threshold set to 0.9 and the Warning threshold set to 0.8, and the monitor is configured not to notify if data is missing.

For this example, the other settings in this section are left on the defaults. For more details, see the Metric Monitors documentation.
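To make the alert conditions concrete, here is a rough sketch of how an "average over the window, above the threshold" check behaves for each host and device group. It assumes the 0.9 alert and 0.8 warning thresholds from the example; the evaluate function, the status labels, and the sample data are hypothetical and only approximate what Datadog evaluates on its side.

```python
from statistics import fmean


def evaluate(points, critical=0.9, warning=0.8):
    """Average the points in the window, then compare against the thresholds."""
    value = fmean(points)
    if value > critical:
        return "ALERT"
    if value > warning:
        return "WARN"
    return "OK"


# Hypothetical last-5-minute samples, keyed by (host, device).
window = {
    ("web-01", "/dev/sda1"): [0.88, 0.92, 0.95],
    ("web-01", "/dev/sdb1"): [0.79, 0.83, 0.85],
    ("db-01", "/dev/sda1"): [0.50, 0.60],
}

# Multi Alert: each group gets its own status.
statuses = {group: evaluate(points) for group, points in window.items()}
for (host, device), status in statuses.items():
    print(f"{host} {device}: {status}")
```

Because the comparison is made on the averaged value, a single spike above 0.9 does not trigger an alert unless it pulls the window average over the threshold.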

Say what’s happening

Before a monitor can be saved, it must have a title and message.

Title

The title must be unique for each monitor. Since this is a Multi Alert monitor, you can include the name of each group element (host and device) in the title using message template variables:

Disk space is low on {{device.name}} / {{host.name}}
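Datadog fills in these variables when the monitor triggers. To preview what a title might look like for a given group, the sketch below imitates the substitution locally. The render function and the host and device values are hypothetical, and the real rendering may differ in details such as how missing variables are handled.

```python
import re


def render(template, variables):
    """Replace {{name}} placeholders with values; leave unknown ones untouched."""
    def substitute(match):
        key = match.group(1)
        return str(variables.get(key, match.group(0)))

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)


title = "Disk space is low on {{device.name}} / {{host.name}}"
rendered = render(title, {"device.name": "/dev/sda1", "host.name": "web-01"})
print(rendered)
```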

Message

Use the message to tell your team how to resolve the issue, for example:

Steps to free up disk space:
1. Remove unused packages
2. Clear APT cache
3. Uninstall unnecessary applications
4. Remove duplicate files

For different messages based on alert vs. warning thresholds, see the Notification documentation.
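As a hedged sketch of what threshold-specific messages can look like, the snippet below builds a message body that wraps text in is_alert and is_warning conditional blocks and then appends the steps listed above. The conditional helper and the wording of the two lead sentences are illustrative; check the Notification documentation for the conditional variables your account supports.

```python
STEPS = [
    "Remove unused packages",
    "Clear APT cache",
    "Uninstall unnecessary applications",
    "Remove duplicate files",
]


def conditional(condition, text):
    """Wrap text in a {{#condition}}...{{/condition}} block."""
    return f"{{{{#{condition}}}}}{text}{{{{/{condition}}}}}"


lines = [
    conditional("is_alert", "Disk usage is above the alert threshold."),
    conditional("is_warning", "Disk usage is above the warning threshold."),
    "Steps to free up disk space:",
]
lines += [f"{number}. {step}" for number, step in enumerate(STEPS, start=1)]

message = "\n".join(lines)
print(message)
```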

Notify your team

Use this section to send notifications to your team through email, Slack, PagerDuty, and other channels. You can search for team members and connected accounts with the dropdown box. When an @notification is added to this box, it is automatically added to the message box as well.

Removing the @notification from either section removes it from both sections.
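The two boxes stay in sync because the @notification handle is part of the message text itself. The sketch below shows how the pieces from this page might fit together as a monitor definition, without sending anything. The field names follow the Datadog Monitors API as best understood here and should be checked against the API reference; the @slack-ops-alerts handle is a hypothetical example.

```python
import json

CRITICAL = 0.9
WARNING = 0.8

payload = {
    "name": "Disk space is low on {{device.name}} / {{host.name}}",
    "type": "metric alert",
    # The threshold in the query is assumed to match the critical threshold below.
    "query": f"avg(last_5m):avg:system.disk.in_use{{*}} by {{host,device}} > {CRITICAL}",
    # The @handle lives in the message, so removing it here removes the notification.
    "message": "Steps to free up disk space: see the runbook. @slack-ops-alerts",
    "options": {
        "thresholds": {"critical": CRITICAL, "warning": WARNING},
        "notify_no_data": False,
    },
}

print(json.dumps(payload, indent=2))
```

Keeping the thresholds in named constants avoids the query and the options drifting apart when one of them is edited.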

Permissions

RBAC Restricted Monitor

Use this option to restrict the editing of your monitor to its creator and to specific roles in your org. For more information about roles, see Role Based Access Control.

View Monitors and Triage Alerts on Mobile

With the Datadog Mobile App, available on the Apple App Store and Google Play Store, you can view Monitor Saved Views from your mobile home screen and view and mute monitors. This helps with triage when you are away from your laptop or desktop.

