To build an SLO from new or existing Datadog monitors, create a monitor-based SLO. Using a monitor-based SLO, you can calculate the Service Level Indicator (SLI) by dividing the amount of time your system exhibits good behavior by the total time.
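The time-based SLI described above can be sketched as a simple ratio. This is a minimal illustration, not Datadog's implementation; the function name `uptime_sli` and the 30 minutes of ALERT time are assumptions made for the example.

```python
def uptime_sli(good_seconds: float, total_seconds: float) -> float:
    """Return the SLI as the percentage of time spent in good behavior."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * good_seconds / total_seconds


# Hypothetical 7-day window in which the monitor spent 30 minutes in ALERT.
window_seconds = 7 * 24 * 60 * 60
alert_seconds = 30 * 60
sli = uptime_sli(window_seconds - alert_seconds, window_seconds)
print(f"{sli:.2f}%")  # prints "99.70%"
```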
To create a monitor-based SLO, you need an existing Datadog monitor. To set up a new monitor, go to the monitor creation page.
Datadog monitor-based SLOs support the following monitor types:
On the SLO status page, click + New SLO. Then, select By Monitor Uptime.
In the search box, start typing the name of a monitor. A list of matching monitors appears. Click on a monitor name to add it to the source list.
Notes:
Select a target percentage, time window, and optional warning level.
The target percentage specifies the portion of time the underlying monitor(s) of the SLO should not be in the ALERT state. The time window specifies the rolling period over which the SLO runs its calculation.
Depending on the value of the SLI, the Datadog UI displays the SLO status in a different color:
The time window you choose changes the available precision for your monitor-based SLOs:
In the details UI for the SLO, Datadog displays two decimal places for SLOs configured with 7-day and 30-day time windows and three decimal places for SLOs configured with 90-day time windows.
The following example demonstrates why Datadog displays a limited number of decimal places for SLO calculations. A 99.999% target for a 7-day or 30-day time window results in an error budget of 6 seconds or 26 seconds, respectively. Monitors evaluate every minute, so the granularity of a monitor-based SLO is also 1 minute. Therefore, one alert would fully consume and overspend the 6 second or 26 second error budget in the previous example. In practice, teams cannot satisfy such small error budgets.
If you need finer granularity than the once-a-minute monitor evaluation, consider using metric-based SLOs instead.
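The error budget figures in the example above follow from simple arithmetic. The sketch below reproduces them; the names `WINDOW_SECONDS`, `MONITOR_GRANULARITY_SECONDS`, and `error_budget_seconds` are illustrative and not part of any Datadog API.

```python
# Rolling time windows expressed in seconds.
WINDOW_SECONDS = {"7d": 7 * 86400, "30d": 30 * 86400, "90d": 90 * 86400}

# Monitors evaluate every minute, so one alert costs at least 60 seconds.
MONITOR_GRANULARITY_SECONDS = 60


def error_budget_seconds(target_pct: float, window: str) -> float:
    """Return the allowed downtime for a target percentage and time window."""
    return WINDOW_SECONDS[window] * (100.0 - target_pct) / 100.0


for window in ("7d", "30d"):
    budget = error_budget_seconds(99.999, window)
    overspent = MONITOR_GRANULARITY_SECONDS > budget
    print(f"{window}: budget of about {round(budget)} s, "
          f"one alert overspends it: {overspent}")
```

With a 99.999% target, the budgets come out to roughly 6 seconds and 26 seconds, both smaller than a single one-minute monitor evaluation.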
Choose a name and extended description for your SLO. Select any tags you would like to associate with your SLO. Select Create or Create & Set Alert to save your new SLO.
Datadog calculates the overall SLO status as the uptime percentage across all monitors or monitor groups, unless specific groups have been selected:
Note: For monitor-based SLOs with groups, all groups can be displayed for any SLOs containing up to 5,000 groups. For SLOs containing more than 5,000 groups, the SLO is calculated based on all groups but no groups are displayed in the UI.
Monitor-based SLOs treat the WARN state as OK. The definition of an SLO requires a binary distinction between good and bad behavior. SLO calculations treat WARN as good behavior since WARN is not severe enough to indicate bad behavior.
Consider the following example for a monitor-based SLO containing 3 monitors. The calculation for a monitor-based SLO based on a single multi-alert monitor would look similar.
Monitor | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | Status |
---|---|---|---|---|---|---|---|---|---|---|---|
Monitor 1 | OK | OK | OK | OK | ALERT | OK | OK | OK | OK | OK | 90% |
Monitor 2 | OK | OK | OK | OK | OK | OK | OK | OK | ALERT | OK | 90% |
Monitor 3 | OK | OK | ALERT | OK | ALERT | OK | OK | OK | OK | OK | 80% |
Overall Status | OK | OK | ALERT | OK | ALERT | OK | OK | OK | ALERT | OK | 70% |
In this example, the overall status is lower than the average of the individual statuses.
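The table above can be reproduced with a short sketch. It assumes, as the table suggests, that the overall status is in ALERT for any interval in which at least one monitor is in ALERT; the helper names are illustrative only.

```python
# Monitor states for intervals t1..t10, copied from the table above.
timeline = {
    "Monitor 1": ["OK", "OK", "OK", "OK", "ALERT", "OK", "OK", "OK", "OK", "OK"],
    "Monitor 2": ["OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "ALERT", "OK"],
    "Monitor 3": ["OK", "OK", "ALERT", "OK", "ALERT", "OK", "OK", "OK", "OK", "OK"],
}


def uptime_pct(states):
    """Percentage of intervals not in ALERT (WARN counts as good behavior)."""
    return 100.0 * sum(state != "ALERT" for state in states) / len(states)


def overall_states(monitors):
    """An interval is ALERT overall if any monitor is in ALERT during it."""
    return ["ALERT" if "ALERT" in column else "OK"
            for column in zip(*monitors.values())]


individual = {name: uptime_pct(states) for name, states in timeline.items()}
overall = uptime_pct(overall_states(timeline))
average = sum(individual.values()) / len(individual)

print(individual)                 # 90.0, 90.0, and 80.0 per monitor
print(overall)                    # 70.0
print(overall < average)          # True
```

Because alerts on different monitors rarely overlap perfectly, the overall uptime (70%) falls below the average of the individual uptimes (about 86.7%).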
In certain cases, there is an exception to the status calculation for monitor-based SLOs that are composed of one grouped Synthetic test. Synthetic tests have optional special alerting conditions that change when the test enters the ALERT state and consequently affect the overall uptime:
If you change any of these conditions to something other than their defaults, the overall status for a monitor-based SLO using one Synthetic test could appear better than the aggregated statuses of the Synthetic test’s individual groups.
For more information on Synthetic test alerting conditions, see Synthetic Monitoring.
When a monitor is resolved manually or as a result of the After x hours automatically resolve this monitor from a triggered state setting, SLO calculations do not change. If these are important tools for your workflow, consider cloning your monitor, removing auto-resolve settings and @-notification settings, and using the clone for your SLO.
Datadog recommends against basing an SLO on monitors that use Alert Recovery Threshold and Warning Recovery Threshold. These settings make it difficult to cleanly differentiate between an SLI’s good behavior and bad behavior.
Muting a monitor does not affect the SLO calculation.
To exclude time periods from an SLO calculation, use the SLO status corrections feature.
When you create a metric monitor, you choose how the monitor will handle missing data. This configuration affects how a monitor-based SLO calculation interprets missing data:
Monitor configuration | SLO calculation of missing data |
---|---|
Evaluate as zero | Depends on the monitor alert threshold. For instance, a threshold of > 10 results in Uptime (since the monitor status would be OK), while a threshold of < 10 results in Downtime. |
Show last known status | Keep last state of SLO |
Show NO DATA | Uptime |
Show NO DATA and notify | Downtime |
Show OK | Uptime |
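The table above can be read as a lookup from the monitor's missing-data setting to its effect on the SLO. The following is a minimal sketch of that mapping; `missing_data_effect` is a hypothetical helper, not a Datadog API, and the "Evaluate as zero" branch assumes the monitor alerts when the value 0 satisfies its threshold comparison.

```python
import operator

_COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

# Settings whose effect does not depend on the alert threshold.
_FIXED_EFFECTS = {
    "Show last known status": "Keep last state of SLO",
    "Show NO DATA": "Uptime",
    "Show NO DATA and notify": "Downtime",
    "Show OK": "Uptime",
}


def missing_data_effect(setting, comparator=None, threshold=None):
    """Return how a monitor-based SLO counts a period of missing data."""
    if setting == "Evaluate as zero":
        if comparator is None or threshold is None:
            raise ValueError("comparator and threshold are required")
        # Missing data is treated as 0, so check whether 0 breaches the threshold.
        breached = _COMPARATORS[comparator](0, threshold)
        return "Downtime" if breached else "Uptime"
    return _FIXED_EFFECTS[setting]


print(missing_data_effect("Evaluate as zero", ">", 10))  # Uptime
print(missing_data_effect("Evaluate as zero", "<", 10))  # Downtime
print(missing_data_effect("Show NO DATA and notify"))    # Downtime
```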
When you create a service check monitor, you choose whether it sends an alert when data is missing. This configuration affects how a monitor-based SLO calculation interprets missing data. For monitors configured to ignore missing data, time periods with missing data are treated as OK (uptime) by the SLO. For monitors configured to alert on missing data, time periods with missing data are treated as ALERT (downtime) by the SLO.
If you pause a Synthetic test, the SLO removes the time period with missing data from its calculation. In the UI, these time periods are marked light gray on the SLO status bar.
SLOs based on the metric monitor types have a feature called SLO Replay that backfills SLO statuses with historical data pulled from the underlying monitors’ metrics and query configurations. When you create a new Metric Monitor and set an SLO on that new monitor, you do not have to wait a full 7, 30, or 90 days to view the SLO status. Instead, SLO Replay triggers when you create the new SLO and looks at the history of the monitor’s underlying metric and query to fill in the status.
SLO Replay also triggers when you change the underlying metric monitor’s query, so that the SLO status is corrected based on the new monitor configuration. Because SLO Replay recalculates an SLO’s status history, the monitor’s status history and the SLO’s status history may not match after a monitor update.
Note: SLOs based on Synthetic tests or Service Checks do not support SLO Replay.