Service Level Objectives

Overview

Service Level Objectives, or SLOs, are a key part of the site reliability engineering toolkit. SLOs provide a framework for defining clear targets around application performance, which ultimately help teams provide a consistent customer experience, balance feature development with platform stability, and improve communication with internal and external users.

Setup

Use the SLO and uptime widget to track your SLOs (Service Level Objectives) and uptime on screenboards and timeboards. You can work with SLOs either by adding the widget to a dashboard or by going to Datadog’s Service Level Objectives status page, where you can create new SLOs and view all existing ones. In the widget, select an existing SLO from the dropdown to display it on any dashboard.

Uptime is defined as the amount of time a monitor was in an up state (OK) compared to a down state (non-OK). The status is represented in bars as green (up) and red (down). Example: 99% of the time, latency is less than 200 ms.

You can also track success rate and event-based SLIs (Service Level Indicators). Example: 99% of requests are successful.
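The two kinds of indicator described above reduce to simple ratios. The following is a minimal sketch, not Datadog's implementation: the function names (`uptime_percent`, `event_sli_percent`, `meets_target`) are hypothetical, and the inputs are assumed to be pre-aggregated totals for the chosen time window.

```python
def uptime_percent(ok_seconds, total_seconds):
    """Time-based indicator: share of the window a monitor spent in the OK state."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * ok_seconds / total_seconds


def event_sli_percent(good_events, total_events):
    """Event-based indicator: share of events that counted as successful."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return 100.0 * good_events / total_events


def meets_target(sli_percent, target_percent):
    """True when the measured indicator is at or above the SLO target."""
    return sli_percent >= target_percent


# Hypothetical numbers: 9,950 successful requests out of 10,000.
success_rate = event_sli_percent(9_950, 10_000)
print(success_rate)                      # 99.5
print(meets_target(success_rate, 99.0))  # True
```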

Configuration

  1. On the SLO status page, select New SLO +.
  2. Define the source for your SLOs. SLO types are Event-based and Monitor-based.
  3. Set your target uptime and time window. Available windows are: 7 days, month-to-date, 30 days (rolling), previous month, and 90 days (rolling). For 7 days, the widget is restricted to two decimal places. For 30 days and up, it’s restricted to two to three decimal places.
  4. Finally, give the SLO a title, describe it in more detail, add tags, and save it.
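The information collected in these steps can be pictured as a small record. The sketch below is purely illustrative: `SLODefinition`, its field names, and the validation rules are assumptions made for this example and do not mirror the Datadog UI or API. Only the type and window values come from the steps above.

```python
from dataclasses import dataclass, field

# Values listed in the configuration steps; the constant names are hypothetical.
SLO_TYPES = {"event-based", "monitor-based"}
TIME_WINDOWS = {
    "7 days",
    "month-to-date",
    "30 days (rolling)",
    "previous month",
    "90 days (rolling)",
}


@dataclass
class SLODefinition:
    """Hypothetical container for what the four steps ask for."""
    title: str
    slo_type: str           # step 2: source of the SLO
    window: str             # step 3: time window
    target: float           # step 3: target percentage, e.g. 99.9
    description: str = ""   # step 4
    tags: list = field(default_factory=list)  # step 4

    def __post_init__(self):
        if self.slo_type not in SLO_TYPES:
            raise ValueError(f"unknown SLO type: {self.slo_type!r}")
        if self.window not in TIME_WINDOWS:
            raise ValueError(f"unknown time window: {self.window!r}")
        if not 0 < self.target <= 100:
            raise ValueError("target must be a percentage in (0, 100]")


# Example values are made up for illustration.
slo = SLODefinition(
    title="Checkout latency",
    slo_type="monitor-based",
    window="30 days (rolling)",
    target=99.9,
    description="99.9% of the time, checkout latency is under 200 ms",
    tags=["env:prod", "team:payments"],
)
```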

Once you have monitors set up, the Service Level Objectives status page can show either the overall uptime percentage alone or the overall percentage plus the uptime for each monitor.

Edit an SLO

To edit an SLO, hover over it and click the edit pencil icon on the right.

Searching SLOs

The Service Level Objectives status page lets you run an advanced search of all SLOs so you can view, delete or edit service tags for selected SLOs in bulk. You can also clone or fully edit any individual SLO in the search results.

Advanced search lets you query SLOs by any combination of SLO attributes:

  • name and description - text search
  • time window - *, 7d, 30d, 90d
  • type - metric, monitor
  • creator
  • tags - datacenter, env, service, team, etc.

To run a search, use the checkboxes on the left and the search bar. When you check the boxes, the search bar updates with the equivalent query. Likewise, when you modify the search bar query (or write one from scratch), the checkboxes update to reflect the change. Query results update in real time as you edit the query; there’s no “Search” button to click.
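Conceptually, combining attributes narrows the result set: an SLO is listed only if it matches every attribute you specify. The sketch below illustrates that idea over an in-memory list. It is not Datadog's query syntax or search engine; the `search_slos` function, the record layout, and the sample data are all hypothetical.

```python
def search_slos(slos, text=None, window=None, slo_type=None, creator=None, tags=()):
    """Return the SLOs that match every attribute that was supplied."""
    results = []
    for slo in slos:
        haystack = f"{slo['name']} {slo['description']}".lower()
        if text and text.lower() not in haystack:
            continue  # name/description text search
        if window and window != "*" and window not in slo["windows"]:
            continue  # "*" is treated here as "any time window"
        if slo_type and slo["type"] != slo_type:
            continue
        if creator and slo["creator"] != creator:
            continue
        if not set(tags) <= set(slo["tags"]):
            continue  # every requested tag must be present
        results.append(slo)
    return results


# Made-up records for illustration.
slos = [
    {"name": "Checkout latency", "description": "p99 under 200ms",
     "windows": ["7d", "30d"], "type": "monitor",
     "creator": "alice", "tags": ["env:prod", "team:payments"]},
    {"name": "API success rate", "description": "99% of requests succeed",
     "windows": ["30d", "90d"], "type": "metric",
     "creator": "bob", "tags": ["env:prod", "service:api"]},
]

matches = search_slos(slos, window="30d", slo_type="metric", tags=["env:prod"])
print([s["name"] for s in matches])  # ['API success rate']
```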

To edit an individual SLO, hover over it and use the buttons to the far right in its row: Edit, Clone, Delete. To see more detail on an SLO, click its table row to visit its status page.

SLO Tags

When you create or edit an SLO, you can add tags for filtering on the SLO status page.

Overall Uptime Calculation

The overall uptime can be considered as a percentage of the time where all monitors are in the OK state. It is not the average of the aggregated monitors.

Consider the following example for 3 monitors:

Monitor         t1  t2  t3     t4  t5     t6  t7  t8  t9     t10  Uptime
Monitor 1       OK  OK  OK     OK  ALERT  OK  OK  OK  OK     OK   90%
Monitor 2       OK  OK  OK     OK  OK     OK  OK  OK  ALERT  OK   90%
Monitor 3       OK  OK  ALERT  OK  ALERT  OK  OK  OK  OK     OK   80%
Overall Uptime  OK  OK  ALERT  OK  ALERT  OK  OK  OK  ALERT  OK   70%

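This can result in the overall uptime being lower than the average of the individual uptimes. In this example, the individual uptimes average about 86.7%, but the overall uptime is 70% because at least one monitor is in ALERT during three of the ten intervals.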
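The calculation for the three-monitor example can be sketched as follows. This is an illustrative reconstruction, assuming ten equally weighted time slots per monitor; the helper names (`uptime`, `overall_states`) are hypothetical and this is not Datadog's implementation.

```python
# Monitor states for slots t1..t10, copied from the example table.
monitors = {
    "Monitor 1": ["OK", "OK", "OK", "OK", "ALERT", "OK", "OK", "OK", "OK", "OK"],
    "Monitor 2": ["OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "ALERT", "OK"],
    "Monitor 3": ["OK", "OK", "ALERT", "OK", "ALERT", "OK", "OK", "OK", "OK", "OK"],
}


def uptime(states):
    """Percentage of slots in the OK state."""
    return 100.0 * states.count("OK") / len(states)


def overall_states(monitors):
    """A slot is OK overall only if every monitor is OK in that slot."""
    return [
        "OK" if all(state == "OK" for state in slot) else "ALERT"
        for slot in zip(*monitors.values())
    ]


individual = {name: uptime(states) for name, states in monitors.items()}
overall = uptime(overall_states(monitors))
average = sum(individual.values()) / len(individual)

print(individual)         # {'Monitor 1': 90.0, 'Monitor 2': 90.0, 'Monitor 3': 80.0}
print(overall)            # 70.0
print(round(average, 2))  # 86.67
```

Because `overall_states` combines the monitors slot by slot before counting, an alert on any single monitor marks the whole slot as down, which is why the result differs from the plain average.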

View your SLOs

From the SLO status page, you can view and edit your SLO and its properties, and see its status over time and its history.

SLO Widgets

After creating your SLO, you can use the SLO dashboard widget to visualize the status of your SLOs along with your dashboard metrics, logs and APM data. For more information about SLO Widgets, see the SLO Widgets documentation page.
