Monitor-based SLOs

Overview

Select a monitor-based source to build your SLO from existing or new Datadog monitors. For more information about monitors, see the Monitor documentation. Monitor-based SLOs suit a time-based stream of data in which you distinguish periods of good behavior from periods of bad behavior. Dividing the total good time by the total time gives the Service Level Indicator (SLI).
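As a rough illustration of that ratio, the following Python sketch computes an SLI, as a percentage, from a list of monitor state intervals. The `StateInterval` type and the `sli_percent` function are hypothetical and not part of any Datadog API; the sketch assumes each interval has already been labeled good (monitor OK) or bad (monitor alerting).

```python
from dataclasses import dataclass


@dataclass
class StateInterval:
    """One stretch of time during which a monitor held a single state."""
    start: int  # seconds since an arbitrary origin
    end: int    # exclusive end, in the same units as start
    good: bool  # True if the monitor was OK, False if it was alerting


def sli_percent(intervals: list[StateInterval]) -> float:
    """Return the SLI as 100 * (sum of good time) / (sum of total time)."""
    total = sum(i.end - i.start for i in intervals)
    if total == 0:
        raise ValueError("the intervals cover no time")
    good = sum(i.end - i.start for i in intervals if i.good)
    return 100.0 * good / total


if __name__ == "__main__":
    # A hypothetical day: 23 hours OK, then 1 hour alerting.
    day = [
        StateInterval(0, 23 * 3600, True),
        StateInterval(23 * 3600, 24 * 3600, False),
    ]
    print(round(sli_percent(day), 3))  # 95.833
```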

Setup

On the SLO status page, select New SLO +. Then select Monitor.

Define queries

To start, you need to be using Datadog monitors. To set up a new monitor, go to the monitor creation page and select one of the monitor types supported by SLOs. Search for monitors by name and click a monitor to add it to the source list. As an example of a monitor-based SLO, the latency of all user requests should be less than 250ms 99% of the time in any 30-day window. To set this up, you can:

  1. Select a single monitor,
  2. Select multiple monitors (up to 20), or
  3. Select a single multi-alert monitor and choose specific monitor groups (up to 20) to include in the SLO calculation.
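The sketch below encodes these three selection options as a hypothetical validation helper and shows one plausible way to merge several sources into a single good/bad timeline. The rule that a minute is bad whenever any selected monitor or group is bad is an assumption made for illustration; check the SLO documentation for exactly how Datadog computes the overall status. The monitor IDs are hypothetical.

```python
from typing import Optional

MAX_SOURCES = 20  # limit on monitors or monitor groups, per the options above


def validate_selection(monitor_ids: list[int], groups: Optional[list[str]] = None) -> None:
    """Raise ValueError unless the selection matches one of the three options."""
    if not monitor_ids:
        raise ValueError("select at least one monitor")
    if groups:
        # Option 3: specific groups come from exactly one multi-alert monitor.
        if len(monitor_ids) != 1:
            raise ValueError("groups require a single multi-alert monitor")
        if len(groups) > MAX_SOURCES:
            raise ValueError(f"select at most {MAX_SOURCES} groups")
    elif len(monitor_ids) > MAX_SOURCES:
        # Options 1 and 2: one monitor, or up to 20 monitors.
        raise ValueError(f"select at most {MAX_SOURCES} monitors")


def combined_good(per_source: list[list[bool]]) -> list[bool]:
    """Assumed rule: a minute is good only if every selected source is good."""
    return [all(minute) for minute in zip(*per_source)]


if __name__ == "__main__":
    validate_selection([101, 102])  # hypothetical monitor IDs
    two_monitors = [[True, True, False, True], [True, False, True, True]]
    print(combined_good(two_monitors))  # [True, False, False, True]
```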

Example: You might be tracking the uptime of a physical device. You have already configured a metric monitor on host:foo using a custom metric. This monitor might also page your on-call team when the host is no longer reachable. To avoid on-call burnout, you want to track how often this host is down.
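To make "how often this host is down" concrete, this Python sketch counts outages and total downtime from per-minute up/down samples and derives an uptime percentage. The per-minute sampling and the `outage_summary` name are assumptions for illustration; Datadog derives the SLO from the monitor's own state history rather than from samples you supply.

```python
def outage_summary(minutes_up: list[bool]) -> tuple[int, int]:
    """Return (number of outages, total down minutes) from per-minute samples."""
    outages = 0
    down_minutes = 0
    previous_up = True  # assume the window starts with the host up
    for up in minutes_up:
        if not up:
            down_minutes += 1
            if previous_up:
                outages += 1  # an up -> down transition starts a new outage
        previous_up = up
    return outages, down_minutes


if __name__ == "__main__":
    # Hypothetical samples of the host:foo monitor, one per minute.
    samples = [True, False, False, True, False, True]
    outages, down = outage_summary(samples)
    uptime = 100.0 * (len(samples) - down) / len(samples)
    print(outages, down, uptime)  # 2 3 50.0
```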

Set your SLO targets

The SLO target is the percentage of good time that the SLI must reach over the time window for the SLO to be met.

First select your target value, for example: the monitor should be in a good (OK) state 95% of the time over the last 7 days.

You can optionally include a warning value that is greater than the target value to indicate when you are approaching an SLO breach.
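The relationship between a target, an optional warning value, and the amount of bad time you can afford is simple arithmetic. This Python sketch uses hypothetical helper names and an assumed set of 7-, 30-, and 90-day windows to compute the error budget in minutes and to classify an SLI value against the target and warning.

```python
from typing import Optional

# Assumed time windows, expressed in minutes.
WINDOW_MINUTES = {"7d": 7 * 24 * 60, "30d": 30 * 24 * 60, "90d": 90 * 24 * 60}


def error_budget_minutes(target: float, timeframe: str) -> float:
    """Minutes of bad time allowed in the window before the target is missed."""
    return (100.0 - target) * WINDOW_MINUTES[timeframe] / 100.0


def slo_status(sli: float, target: float, warning: Optional[float] = None) -> str:
    """Classify an SLI (in percent) against a target and optional warning value."""
    if warning is not None and warning <= target:
        raise ValueError("warning must be greater than target")
    if sli < target:
        return "breached"
    if warning is not None and sli < warning:
        return "warning"  # still meeting the target, but close to a breach
    return "ok"


if __name__ == "__main__":
    print(error_budget_minutes(99.0, "30d"))  # 432.0
    print(slo_status(99.2, target=99.0, warning=99.5))  # warning
```

For the 250ms latency example, a 99% target over 30 days leaves 432 minutes of bad time in the window.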

Identify this indicator

In this step, add context about the purpose of the SLO: a description containing any related information, and the tags you want to associate with the SLO.
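If you manage SLOs as code, the same information (monitors, target, warning, description, tags) can be sent to Datadog's SLO API instead of entered in the UI. The sketch below only assembles a JSON request body and does not call the API. The `POST /api/v1/slo` endpoint and the field names (`type`, `monitor_ids`, `groups`, `thresholds`, `timeframe`, `target`, `warning`) reflect our understanding of the v1 SLO API and should be verified against the Datadog API reference; the monitor ID and tag values are hypothetical.

```python
import json
from typing import Optional


def build_monitor_slo_payload(
    name: str,
    monitor_ids: list[int],
    target: float,
    timeframe: str = "30d",
    warning: Optional[float] = None,
    description: str = "",
    tags: Optional[list[str]] = None,
    groups: Optional[list[str]] = None,
) -> dict:
    """Assemble a request body for a monitor-based SLO (field names assumed)."""
    threshold = {"timeframe": timeframe, "target": target}
    if warning is not None:
        threshold["warning"] = warning
    payload = {
        "name": name,
        "type": "monitor",
        "monitor_ids": monitor_ids,
        "thresholds": [threshold],
        "description": description,
        "tags": tags or [],
    }
    if groups:
        payload["groups"] = groups  # only meaningful for one multi-alert monitor
    return payload


if __name__ == "__main__":
    body = build_monitor_slo_payload(
        name="host:foo uptime",
        monitor_ids=[12345],  # hypothetical monitor ID
        target=95.0,
        timeframe="7d",
        warning=97.0,
        description="Tracks how often host:foo is unreachable.",
        tags=["team:infra"],  # hypothetical tag
    )
    print(json.dumps(body, indent=2))
```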

Underlying monitor and SLO histories

Changing a monitor that an SLO uses causes the SLO history to be recalculated. As a result, the monitor history and the SLO history may not match after a monitor update.

Datadog recommends against using monitors with an Alert Recovery Threshold or a Warning Recovery Threshold, because these thresholds affect your SLO calculations and prevent you from cleanly distinguishing an SLI’s good behavior from its bad behavior.
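The following Python sketch shows why a recovery threshold blurs the good/bad boundary. It uses a simplified, hypothetical model of an "alert when above threshold" latency monitor, not Datadog's actual evaluation logic: once alerting, the monitor recovers only when the value drops to the recovery threshold, so samples that already meet the 250ms goal can still count as bad time.

```python
from typing import Optional


def simulate_states(
    values: list[float],
    alert_threshold: float,
    recovery_threshold: Optional[float] = None,
) -> list[str]:
    """Simplified 'alert when above threshold' monitor with optional hysteresis."""
    if recovery_threshold is None:
        # Without a recovery threshold, the monitor recovers at the alert threshold.
        recovery_threshold = alert_threshold
    alerting = False
    states = []
    for value in values:
        if not alerting and value > alert_threshold:
            alerting = True
        elif alerting and value <= recovery_threshold:
            alerting = False
        states.append("ALERT" if alerting else "OK")
    return states


if __name__ == "__main__":
    latencies_ms = [180, 300, 220, 230, 190]
    print(simulate_states(latencies_ms, 250))       # ['OK', 'ALERT', 'OK', 'OK', 'OK']
    print(simulate_states(latencies_ms, 250, 200))  # ['OK', 'ALERT', 'ALERT', 'ALERT', 'OK']
```

Only one of the five samples exceeds 250ms. Without a recovery threshold, the sketch marks one evaluation as bad (80% good); with a 200ms recovery threshold it marks three as bad (40% good), even though two of those samples met the latency goal.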

SLO calculations do not take into account when a monitor is resolved manually or by the "After x hours automatically resolve this monitor from a triggered state" setting. If these are important tools in your workflow, consider cloning your monitor, removing the auto-resolve setting and @-notifications from the clone, and using the clone for your SLO.

Confirm you are using the preferred SLI type for your use case. Datadog supports monitor-based SLIs and metric-based SLIs as described in the SLO metric documentation.
