To deploy a Datadog monitor, use the Datadog Operator and the DatadogMonitor custom resource definition (CRD).

Prerequisites

Setup

  1. Create a file containing the spec of your DatadogMonitor.

    Example:

    The following spec creates a metric monitor that alerts on the query avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5.

    datadog-metric-monitor.yaml

       apiVersion: datadoghq.com/v1alpha1
       kind: DatadogMonitor
       metadata:
         name: datadog-monitor-test
         namespace: datadog
       spec:
         query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
         type: "metric alert"
         name: "Test monitor made from DatadogMonitor"
         message: "1-2-3 testing"
         tags:
           - "test:datadog"
         priority: 5
         controllerOptions:
           disableRequiredTags: false
         options:
           evaluationDelay: 300
           includeTags: true
           locked: false
           newGroupDelay: 300
           notifyNoData: true
           noDataTimeframe: 30
           renotifyInterval: 1440
           thresholds:
             critical: "0.5"
             warning: "0.28"
       

    See the complete list of configuration fields.

  2. Deploy your DatadogMonitor:

    kubectl apply -f /path/to/your/datadog-metric-monitor.yaml
    

Additional examples

Metric monitors

Other monitors

All available configuration fields

The following list describes all available configuration fields for the DatadogMonitor custom resource. Each entry gives the field name, its type (and whether it is required), and a description.

message
required - string
A message to include with notifications for this monitor.
name
required - string
The monitor name.
query
required - string
The monitor query.
type
required - enum
The type of the monitor.
Allowed enum values: metric alert, query alert, service check, event alert, log alert, process alert, rum alert, trace-analytics alert, slo alert, event-v2 alert, audit alert, composite
controllerOptions.disableRequiredTags
boolean
Disables the automatic addition of required tags to monitors.
priority
int64
An integer from 1 (high) to 5 (low) indicating alert severity.
restrictedRoles
[string]
A list of unique role identifiers to define which roles are allowed to edit the monitor. The unique identifiers for all roles can be pulled from the Roles API and are located in the data.id field.
tags
[string]
Tags associated with your monitor.
options
object
List of options associated with your monitor. See Options.
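
As an illustration of the top-level fields above, here is a minimal sketch that sets priority, tags, and restrictedRoles alongside the required fields. The resource name and the role ID are placeholders made up for this example; real role IDs come from the data.id field of the Roles API.

```yaml
# Sketch only: the resource name and role ID are hypothetical placeholders.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-restricted
  namespace: datadog
spec:
  # Required fields
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Disk usage monitor with restricted editing"
  message: "Disk usage is above 50%"
  # Optional fields
  priority: 2                 # 1 (high) to 5 (low)
  tags:
    - "test:datadog"
  restrictedRoles:
    - "00000000-0000-0000-0000-000000000000"   # placeholder role ID
  controllerOptions:
    disableRequiredTags: false
```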

Options

The following fields are set in the options property.

For example:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-test
  namespace: datadog
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Test monitor made from DatadogMonitor"
  message: "1-2-3 testing"
  options:
    enableLogsSample: true
    thresholds:
      critical: "0.5"
      warning: "0.28"

enableLogsSample
boolean
Whether or not to send a log sample when the log monitor triggers.
escalationMessage
string
A message to include with a re-notification.
evaluationDelay
int64
Time (in seconds) to delay evaluation, as a non-negative integer. For example: if the value is set to 300 (5min), the timeframe is set to last_5m, and the time is 7:00, then the monitor evaluates data from 6:50 to 6:55. This is useful for AWS CloudWatch and other backfilled metrics to ensure the monitor always has data during evaluation.
groupRetentionDuration
string
The time span after which groups with missing data are dropped from the monitor state. The minimum value is one hour, and the maximum value is 72 hours. Example values are: 60m, 1h, and 2d. This option is only available for APM Trace Analytics, Audit Trail, CI, Error Tracking, Event, Logs, and RUM monitors.
groupbySimpleMonitor
boolean
DEPRECATED: Whether the log alert monitor triggers a single alert or multiple alerts when any group breaches a threshold. Use notifyBy instead.
includeTags
boolean
A Boolean indicating whether notifications from this monitor automatically insert its triggering tags into the title.
locked
boolean
DEPRECATED: Whether or not the monitor is locked (only editable by creator and admins). Use restrictedRoles instead.
newGroupDelay
int64
Time (in seconds) to allow a host to boot and applications to fully start before starting the evaluation of monitor results. Should be a non-negative integer.
noDataTimeframe
int64
The number of minutes before a monitor notifies after data stops reporting. Datadog recommends at least 2x the monitor timeframe for metric alerts or 2 minutes for service checks. If omitted, 2x the evaluation timeframe is used for metric alerts, and 24 hours is used for service checks.
notificationPresetName
enum
Toggles the display of additional content sent in the monitor notification.
Allowed enum values: show_all, hide_query, hide_handles, hide_all
Default: show_all
notifyAudit
boolean
A Boolean indicating whether tagged users are notified on changes to this monitor.
notifyBy
[string]
A list of tags indicating the granularity a monitor alerts on. Only available for monitors with groupings. For example, if you have a monitor grouped by cluster, namespace, and pod, and you set notifyBy to ["cluster"], then your monitor only notifies on each new cluster violating the alert conditions.
Tags mentioned in notifyBy must be a subset of the grouping tags in the query. For example, a query grouped by cluster and namespace cannot notify on region.
Setting notifyBy to [*] configures the monitor to notify as a simple alert.
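
The notifyBy behavior described above can be sketched as follows. The metric name, the grouping tags (kube_cluster_name, kube_namespace, pod_name), the threshold, and the resource name are assumptions chosen for illustration; substitute the metric and tags from your own environment.

```yaml
# Sketch only: metric, tags, threshold, and names are illustrative assumptions.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-notify-by
  namespace: datadog
spec:
  # The query groups by three tags: cluster, namespace, and pod.
  query: "avg(last_5m):avg:kubernetes.cpu.usage.total{*} by {kube_cluster_name,kube_namespace,pod_name} > 1000000000"
  type: "metric alert"
  name: "CPU usage by cluster"
  message: "CPU usage is high in this cluster"
  options:
    # Must be a subset of the grouping tags used in the query.
    notifyBy:
      - "kube_cluster_name"
    thresholds:
      critical: "1000000000"
```

With this configuration, the monitor still evaluates each pod, but notifies once per cluster that violates the condition.
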
notifyNoData
boolean
A Boolean indicating whether this monitor notifies when data stops reporting.
Default: false.
onMissingData
enum
Controls how groups or monitors are treated if an evaluation does not return any data points. The default option results in different behavior depending on the monitor query type. For monitors using Count queries, an empty monitor evaluation is treated as 0 and is compared to the threshold conditions. For monitors using any query type other than Count, for example Gauge, Measure, or Rate, the monitor shows the last known status. This option is only available for APM Trace Analytics, Audit Trail, CI, Error Tracking, Event, Logs, and RUM monitors.
Allowed enum values: default, show_no_data, show_and_notify_no_data, resolve
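
For the monitor types that support it, onMissingData can be combined with groupRetentionDuration and enableLogsSample. The following is a minimal sketch for a log monitor; the log search query, the threshold, and the resource name are assumptions for illustration.

```yaml
# Sketch only: the log query, threshold, and names are illustrative assumptions.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-logs
  namespace: datadog
spec:
  query: 'logs("status:error").index("*").rollup("count").by("service").last("15m") > 100'
  type: "log alert"
  name: "Error logs by service"
  message: "More than 100 error logs in the last 15 minutes"
  options:
    enableLogsSample: true          # attach a log sample to the notification
    onMissingData: "show_no_data"   # one of: default, show_no_data, show_and_notify_no_data, resolve
    groupRetentionDuration: "2d"    # drop groups with missing data after two days
    thresholds:
      critical: "100"
```
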
renotifyInterval
int64
The number of minutes after the last notification before a monitor re-notifies on the current status. It only re-notifies if it’s not resolved.
renotifyOccurrences
int64
The number of times re-notification messages should be sent on the current status at the provided re-notification interval.
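
As a sketch of how the re-notification fields fit together, the following monitor re-notifies every 60 minutes, at most three times, and uses a separate escalation message. The resource name, message text, and the specific values are assumptions for illustration.

```yaml
# Sketch only: names, messages, and values are illustrative assumptions.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-renotify
  namespace: datadog
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Disk usage monitor with re-notification"
  message: "Disk usage is above 50%"
  options:
    renotifyInterval: 60       # minutes between re-notifications
    renotifyOccurrences: 3     # stop after three re-notifications
    escalationMessage: "Disk usage is still above 50%"
    thresholds:
      critical: "0.5"
```
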
renotifyStatuses
[string]
The types of monitor statuses for which re-notification messages are sent.
If renotifyInterval is null, defaults to null.
If renotifyInterval is not null, defaults to ["Alert", "No Data"].
Values for monitor status: Alert, No Data, Warn
requireFullWindow
boolean
A Boolean indicating whether this monitor needs a full window of data before it’s evaluated. Datadog highly recommends you set this to false for sparse metrics; otherwise, some evaluations are skipped.
Default: false.
schedulingOptions
object
Configuration options for scheduling:
schedulingOptions.customSchedule
object
Configuration options for the custom schedule:
schedulingOptions.customSchedule.recurrence
[object]
Array of custom schedule recurrences.
schedulingOptions.customSchedule.recurrence.rrule
string
The recurrence rule in iCalendar format. For example, FREQ=MONTHLY;BYMONTHDAY=28,29,30,31;BYSETPOS=-1.
schedulingOptions.customSchedule.recurrence.start
string
The start date of the recurrence rule defined in YYYY-MM-DDThh:mm:ss format. If omitted, the monitor creation time is used.
schedulingOptions.customSchedule.recurrence.timezone
string
The timezone in tz database format, in which the recurrence rule is defined. For example, America/New_York or UTC.
schedulingOptions.evaluationWindow
object
Configuration options for the evaluation window. If hourStarts is set, no other fields may be set. Otherwise, dayStarts and monthStarts must be set together.
schedulingOptions.evaluationWindow.dayStarts
string
The time of the day at which a one day cumulative evaluation window starts. Must be defined in UTC time in HH:mm format.
schedulingOptions.evaluationWindow.hourStarts
integer
The minute of the hour at which a one hour cumulative evaluation window starts.
schedulingOptions.evaluationWindow.monthStarts
integer
The day of the month at which a one month cumulative evaluation window starts.
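
The nesting of the scheduling fields can be sketched with two separate fragments, each meant to be merged under the spec of an existing DatadogMonitor. The first uses the recurrence rule quoted above (the last day of each month); its start date is a made-up value. The second sets a cumulative evaluation window using dayStarts and monthStarts together, with illustrative values. Whether a given monitor type or query accepts these settings is not covered here.

```yaml
# Fragment 1 (sketch): custom schedule on the last day of each month.
# The start date is a hypothetical value.
spec:
  options:
    schedulingOptions:
      customSchedule:
        recurrence:
          - rrule: "FREQ=MONTHLY;BYMONTHDAY=28,29,30,31;BYSETPOS=-1"
            start: "2024-01-31T09:00:00"
            timezone: "America/New_York"
---
# Fragment 2 (sketch): cumulative evaluation window.
# hourStarts is left unset because it cannot be combined with other fields.
spec:
  options:
    schedulingOptions:
      evaluationWindow:
        dayStarts: "04:00"   # UTC, HH:mm
        monthStarts: 1       # day of the month
```
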
thresholdWindows
object
Alerting time window options:
thresholdWindows.recoveryWindow
string
Describes how long an anomalous metric must be normal before the alert recovers.
thresholdWindows.triggerWindow
string
Describes how long a metric must be anomalous before an alert triggers.
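
Because thresholdWindows applies to anomaly detection, a sketch of an anomaly monitor may help. The query below, its anomalies() parameters, the window values, and the resource name are assumptions for illustration; adjust them to match the anomaly monitor you would otherwise build in the Datadog UI.

```yaml
# Sketch only: the anomaly query, windows, and names are illustrative assumptions.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-anomaly
  namespace: datadog
spec:
  query: "avg(last_4h):anomalies(avg:system.cpu.user{*}, 'basic', 2, direction='both', interval=60, alert_window='last_15m', count_default_zero='true') >= 1"
  type: "query alert"
  name: "Anomalous CPU usage"
  message: "CPU usage deviates from its expected range"
  options:
    thresholdWindows:
      triggerWindow: "last_15m"    # how long the metric must be anomalous
      recoveryWindow: "last_15m"   # how long it must be normal to recover
    thresholds:
      critical: "1"
      criticalRecovery: "0"
```
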
thresholds
object
List of the different monitor thresholds available:
thresholds.critical
string
The monitor CRITICAL threshold.
thresholds.criticalRecovery
string
The monitor CRITICAL recovery threshold.
thresholds.ok
string
The monitor OK threshold.
thresholds.unknown
string
The monitor UNKNOWN threshold.
thresholds.warning
string
The monitor WARNING threshold.
thresholds.warningRecovery
string
The monitor WARNING recovery threshold.
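
As a sketch of thresholds combined with recovery thresholds, the following monitor alerts above 0.5 and warns above 0.28, and only recovers once the value drops below the slightly lower recovery values. The recovery values and the resource name are assumptions for illustration. Threshold values are written as quoted strings, matching the string type listed above.

```yaml
# Sketch only: recovery values and names are illustrative assumptions.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-recovery
  namespace: datadog
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Disk usage monitor with recovery thresholds"
  message: "Disk usage is above threshold"
  options:
    thresholds:
      critical: "0.5"            # matches the value in the query
      criticalRecovery: "0.45"   # recover from CRITICAL below this value
      warning: "0.28"
      warningRecovery: "0.25"    # recover from WARNING below this value
```
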
timeoutH
int64
The number of hours of the monitor not reporting data before it automatically resolves from a triggered state.

Further reading

Additional helpful documentation, links, and articles: