DatadogMonitor CRD


To deploy a Datadog monitor, you can use the Datadog Operator and the DatadogMonitor custom resource definition (CRD).

Prerequisites

Setup

Create a file with the spec of your DatadogMonitor deployment configuration.

Example:

The following spec creates a metric monitor that alerts on the query avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5.

datadog-metric-monitor.yaml

   apiVersion: datadoghq.com/v1alpha1
   kind: DatadogMonitor
   metadata:
     name: datadog-monitor-test
     namespace: datadog
   spec:
     query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
     type: "metric alert"
     name: "Test monitor made from DatadogMonitor"
     message: "1-2-3 testing"
     tags:
       - "test:datadog"
     priority: 5
     controllerOptions:
       disableRequiredTags: false
     options:
       evaluationDelay: 300
       includeTags: true
       locked: false
       newGroupDelay: 300
       notifyNoData: true
       noDataTimeframe: 30
       renotifyInterval: 1440
       thresholds:
         critical: "0.5"
         warning: "0.28"
   

See the complete list of configuration fields.

Deploy your DatadogMonitor:

kubectl apply -f /path/to/your/datadog-metric-monitor.yaml

Additional examples

Metric monitors

Other monitors

All available configuration fields

The following list describes all available configuration fields for the DatadogMonitor custom resource.

message: required - string
A message to include with notifications for this monitor.
name: required - string
The monitor name.
query: required - string
The monitor query.
type: required - enum
The type of the monitor.
Allowed enum values: metric alert, query alert, service check, event alert, log alert, process alert, rum alert, trace-analytics alert, slo alert, event-v2 alert, audit alert, composite
controllerOptions.disableRequiredTags: boolean
Disables the automatic addition of required tags to monitors.
priority: int64
An integer from 1 (high) to 5 (low) indicating alert severity.
restrictedRoles: [string]
A list of unique role identifiers to define which roles are allowed to edit the monitor. The unique identifiers for all roles can be pulled from the Roles API and are located in the data.id field.
tags: [string]
Tags associated with your monitor.
options: object
List of options associated with your monitor. See Options.
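
As a rough illustration of the top-level fields, the fragment below sets a priority, a tag, and a role restriction alongside the four required fields. The tag and the role identifier are made-up placeholders, not real values; a real role ID would come from the data.id field returned by the Roles API.

```yaml
# Sketch of a DatadogMonitor spec using the top-level fields listed above.
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"   # required
  type: "metric alert"                                                # required
  name: "Disk usage above 50%"                                        # required
  message: "Disk usage is above 50%"                                  # required
  priority: 2              # 1 (high) to 5 (low)
  tags:
    - "team:infra"         # hypothetical tag
  restrictedRoles:
    - "00000000-0000-0000-0000-000000000000"   # placeholder role ID
```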

Options

The following fields are set in the options property.

For example:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-test
  namespace: datadog
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Test monitor made from DatadogMonitor"
  message: "1-2-3 testing"
  options:
    enableLogsSample: true
    thresholds:
      critical: "0.5"
      warning: "0.28"

enableLogsSample

boolean
Whether or not to send a log sample when the log monitor triggers.

escalationMessage

string
A message to include with a re-notification.

evaluationDelay

int64
Time (in seconds) to delay evaluation, as a non-negative integer. For example: if the value is set to 300 (5min), the timeframe is set to last_5m, and the time is 7:00, then the monitor evaluates data from 6:50 to 6:55. This is useful for AWS CloudWatch and other backfilled metrics to ensure the monitor always has data during evaluation.
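
A minimal sketch of the case described above, assuming a CloudWatch-backed metric. The metric name, monitor name, and threshold are illustrative choices, not values taken from this page.

```yaml
# Sketch: delay evaluation by 5 minutes for a backfilled metric.
spec:
  query: "avg(last_5m):avg:aws.ec2.cpuutilization{*} by {host} > 90"
  type: "metric alert"
  name: "EC2 CPU high (delayed evaluation)"
  message: "CPU utilization is above 90%"
  options:
    # With a last_5m timeframe and a 300-second delay, an evaluation
    # at 7:00 looks at data from 6:50 to 6:55.
    evaluationDelay: 300
    thresholds:
      critical: "90"
```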

groupRetentionDuration

string
The time span after which groups with missing data are dropped from the monitor state. The minimum value is one hour, and the maximum value is 72 hours. Example values are: 60m, 1h, and 2d. This option is only available for APM Trace Analytics, Audit Trail, CI, Error Tracking, Event, Logs, and RUM monitors.

groupbySimpleMonitor

boolean
DEPRECATED: Whether the log alert monitor triggers a single alert or multiple alerts when any group breaches a threshold. Use notifyBy instead.

includeTags

boolean
A Boolean indicating whether notifications from this monitor automatically insert its triggering tags into the title.

locked

boolean
DEPRECATED: Whether or not the monitor is locked (only editable by creator and admins). Use restrictedRoles instead.

newGroupDelay

int64
Time (in seconds) to allow a host to boot and applications to fully start before starting the evaluation of monitor results. Should be a non-negative integer.

noDataTimeframe

int64
The number of minutes before a monitor notifies after data stops reporting. Datadog recommends at least 2x the monitor timeframe for metric alerts or 2 minutes for service checks. If omitted, 2x the evaluation timeframe is used for metric alerts, and 24 hours is used for service checks.

notificationPresetName

enum
Toggles the display of additional content sent in the monitor notification.
Allowed enum values: show_all, hide_query, hide_handles, hide_all
Default: show_all

notifyAudit

boolean
A Boolean indicating whether tagged users are notified on changes to this monitor.

notifyBy

[string]
A list of tags indicating the granularity a monitor alerts on. Only available for monitors with groupings. For example, if you have a monitor grouped by cluster, namespace, and pod, and you set notifyBy to ["cluster"], then your monitor only notifies on each new cluster violating the alert conditions.
Tags mentioned in notifyBy must be a subset of the grouping tags in the query. For example, a query grouped by cluster and namespace cannot notify on region.
Setting notifyBy to ["*"] configures the monitor to notify as a simple-alert.
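
The grouping example above could be written roughly as follows. The metric name my_app.errors is hypothetical, and the tag keys cluster, namespace, and pod are taken from the prose; your own tag keys may differ.

```yaml
# Sketch: query grouped by three tags, notifications rolled up per cluster.
spec:
  query: "sum(last_5m):sum:my_app.errors{*} by {cluster,namespace,pod} > 100"   # hypothetical metric
  type: "query alert"
  name: "Application errors by cluster"
  message: "Error count is above 100"
  options:
    # Must be a subset of the tags in the query's "by" clause.
    notifyBy:
      - "cluster"
    thresholds:
      critical: "100"
```

With this setting, a second pod breaching the threshold in a cluster that is already alerting should not produce a separate notification.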

notifyNoData

boolean
A Boolean indicating whether this monitor notifies when data stops reporting.
Default: false.
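
notifyNoData is usually set together with noDataTimeframe. The sketch below follows the 2x recommendation given under noDataTimeframe: the query evaluates a 10-minute window, so the no-data notification is set to 20 minutes. The monitor name and message are illustrative.

```yaml
# Sketch: notify when a metric stops reporting for 20 minutes.
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Disk usage with no-data notification"
  message: "Disk usage is high or the host stopped reporting"
  options:
    notifyNoData: true
    noDataTimeframe: 20    # minutes; 2x the last_10m evaluation window
    thresholds:
      critical: "0.5"
```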

onMissingData

enum
Controls how groups or monitors are treated if an evaluation does not return any data points. The default option results in different behavior depending on the monitor query type. For monitors using Count queries, an empty monitor evaluation is treated as 0 and is compared to the threshold conditions. For monitors using any query type other than Count, for example Gauge, Measure, or Rate, the monitor shows the last known status. This option is only available for APM Trace Analytics, Audit Trail, CI, Error Tracking, Event, Logs, and RUM monitors.
Allowed enum values: default, show_no_data, show_and_notify_no_data, resolve
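
A sketch of how onMissingData might look on a log monitor, one of the monitor types the option applies to. The log search, threshold, and names are illustrative, and pairing it with groupRetentionDuration here is an assumption made for the example rather than a requirement.

```yaml
# Sketch: log monitor that shows "No Data" for silent groups
# and drops them from the monitor state after two hours.
spec:
  query: 'logs("status:error").index("*").rollup("count").by("service").last("5m") > 10'
  type: "log alert"
  name: "Error logs by service"
  message: "More than 10 error logs in the last 5 minutes"
  options:
    onMissingData: "show_no_data"
    groupRetentionDuration: "2h"   # between one hour and 72 hours
    thresholds:
      critical: "10"
```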

renotifyInterval

int64
The number of minutes after the last notification before a monitor re-notifies on the current status. It only re-notifies if it’s not resolved.

renotifyOccurrences

int64
The number of times re-notification messages should be sent on the current status at the provided re-notification interval.

renotifyStatuses

[string]
The types of monitor statuses for which re-notification messages are sent.
If renotifyInterval is null, defaults to null.
If renotifyInterval is not null, defaults to ["Alert", "No Data"]
Values for monitor status: Alert, No Data, Warn
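
The re-notification fields are typically used together. The fragment below is a sketch with illustrative values: it re-notifies every hour, at most three times, with a separate escalation message. renotifyStatuses is deliberately left unset so that the default described above applies.

```yaml
# Sketch: re-notify hourly, up to three times, while the monitor is unresolved.
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Disk usage with re-notification"
  message: "Disk usage is above 50%"
  options:
    renotifyInterval: 60        # minutes after the last notification
    renotifyOccurrences: 3      # number of re-notifications to send
    escalationMessage: "Disk usage is still above 50%"
    thresholds:
      critical: "0.5"
```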

requireFullWindow

boolean
A Boolean indicating whether this monitor needs a full window of data before it’s evaluated. Datadog highly recommends you set this to false for sparse metrics, otherwise some evaluations are skipped.
Default: false.

schedulingOptions

object
Configuration options for scheduling:

customSchedule

object
Configuration options for the custom schedule:

recurrence

[object]
Array of custom schedule recurrences.

rrule: string
The recurrence rule in iCalendar format. For example, FREQ=MONTHLY;BYMONTHDAY=28,29,30,31;BYSETPOS=-1.
start: string
The start date of the recurrence rule defined in YYYY-MM-DDThh:mm:ss format. If omitted, the monitor creation time is used.
timezone: string
The timezone in tz database format, in which the recurrence rule is defined. For example, America/New_York or UTC.
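
A sketch of a custom schedule built from the three recurrence fields above. The rule, start date, and timezone are illustrative values; the field nesting follows the names used in this section and should be checked against the CRD version you have installed.

```yaml
# Sketch (fragment of spec.options): evaluate on weekdays, New York time.
options:
  schedulingOptions:
    customSchedule:
      recurrence:
        - rrule: "FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR"   # iCalendar recurrence rule
          start: "2024-01-01T09:00:00"                # YYYY-MM-DDThh:mm:ss
          timezone: "America/New_York"                # tz database name
```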

evaluationWindow

object
Configuration options for the evaluation window. If hourStarts is set, no other fields may be set. Otherwise, dayStarts and monthStarts must be set together.

dayStarts: string
The time of the day at which a one day cumulative evaluation window starts. Must be defined in UTC time in HH:mm format.
hourStarts: integer
The minute of the hour at which a one hour cumulative evaluation window starts.
monthStarts: integer
The day of the month at which a one month cumulative evaluation window starts.
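
The sketch below shows one way a monthly cumulative window might be configured, with dayStarts and monthStarts set together. It assumes the query uses a cumulative timeframe (current_1mo here); the metric and threshold are illustrative.

```yaml
# Sketch: one-month cumulative window starting on day 1 at 04:00 UTC.
spec:
  query: "avg(current_1mo):avg:system.load.5{*} > 0.5"
  type: "query alert"
  name: "Monthly cumulative load"
  message: "Average load for the month is above 0.5"
  options:
    schedulingOptions:
      evaluationWindow:
        dayStarts: "04:00"   # HH:mm, UTC
        monthStarts: 1       # day of the month
    thresholds:
      critical: "0.5"
```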

thresholdWindows

object
Alerting time window options:

recoveryWindow: string
Describes how long an anomalous metric must be normal before the alert recovers.
triggerWindow: string
Describes how long a metric must be anomalous before an alert triggers.
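
thresholdWindows applies to anomaly-style queries. The following is a sketch that assumes an anomaly monitor on a CPU metric; the query parameters and window values are illustrative, and the trigger window is chosen to match the alert_window in the query.

```yaml
# Sketch: anomaly monitor with 15-minute trigger and recovery windows.
spec:
  query: "avg(last_4h):anomalies(avg:system.cpu.user{*}, 'basic', 2, direction='both', interval=60, alert_window='last_15m', count_default_zero='true') >= 1"
  type: "query alert"
  name: "Anomalous CPU usage"
  message: "CPU usage is outside its expected range"
  options:
    thresholdWindows:
      triggerWindow: "last_15m"    # how long the metric must be anomalous
      recoveryWindow: "last_15m"   # how long it must be normal to recover
    thresholds:
      critical: "1"
```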

thresholds