SLO checklist

Getting started

  1. Navigate to the SLO page: Monitors › Service Level Objectives

  2. Start thinking from the perspective of your user:

    • How are your users interacting with your application?
    • What is their journey through the application?
    • Which parts of your infrastructure do these journeys interact with?
    • What are they expecting from your systems and what are they hoping to accomplish?

Select the relevant SLI(s)

STEP 1

Response/Request

Availability: Could the server respond to the request successfully?
Latency: How long did it take for the server to respond to the request?
Throughput: How many requests can be handled?

Storage

Availability: Can the data be accessed on demand?
Latency: How long does it take to read or write data?
Durability: Is the data still there when it is needed?

Pipeline

Correctness: Was the right data returned?
Freshness: How long does it take for new data or processed results to appear?

STEP 2

Do you require a time-based or count-based SLI?

Time-based SLIs use Datadog monitors:

Example: the latency of all user requests should be less than 250 ms 99% of the time in any 30-day window.

  1. Select a single monitor,
  2. Select multiple monitors (up to 20), or
  3. Select a single multi-alert monitor and pick specific monitor groups (up to 20) to include in the SLO calculation

If you need to create a new monitor, go to the Monitor create page.
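As a rough illustration of how a time-based SLI is evaluated, the sketch below computes the share of a 30-day window during which a monitor was not alerting. The function name `time_based_sli` and the sample alert intervals are hypothetical, and the intervals are assumed not to overlap. This is not Datadog's internal calculation, only the arithmetic behind the idea.

```python
WINDOW_SECONDS = 30 * 24 * 60 * 60  # 30-day window, in seconds


def time_based_sli(alert_intervals, window_seconds):
    """Return the percentage of the window in which the monitor was not alerting.

    alert_intervals: list of (start, end) offsets in seconds, assumed non-overlapping.
    """
    alerting_seconds = sum(end - start for start, end in alert_intervals)
    return 100.0 * (window_seconds - alerting_seconds) / window_seconds


# Hypothetical data: two one-hour periods where latency exceeded 250 ms.
alerts = [(0, 3_600), (100_000, 103_600)]

time_sli = time_based_sli(alerts, WINDOW_SECONDS)
meets_target = time_sli >= 99.0  # the 99% target from the example above
```

With two hours of alerting in a 30-day window, the SLI stays above the 99% target.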

Count-based SLIs use metrics in your Datadog account and do not require a monitor:

Example: 99% of requests should complete in less than 250 ms over a 30-day window.
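A count-based SLI is the ratio of good events to total events. The sketch below shows that ratio for the latency example, using a hypothetical helper `count_based_sli` and made-up latency samples. Returning `None` when there are no events is an assumption made here for illustration, not documented Datadog behavior.

```python
def count_based_sli(good_events, total_events):
    """Return good events as a percentage of total events, or None if there were none."""
    if total_events == 0:
        return None  # assumption: no traffic means the SLI is undefined
    return 100.0 * good_events / total_events


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 180, 240, 260, 90, 300, 110, 200, 230, 249]

# A request is "good" when it completes in less than 250 ms.
good = sum(1 for ms in latencies_ms if ms < 250)
count_sli = count_based_sli(good, len(latencies_ms))
```

Unlike the time-based variant, each request counts equally here, so a short burst of slow requests during peak traffic weighs more than a long quiet period.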

Implementing your SLIs

  1. Custom metrics (e.g., counters)
  2. Integration metrics (e.g., load balancer, HTTP requests)
  3. Datadog APM (e.g., errors, latency on services and resources)
  4. Datadog Logs (e.g., metrics generated from logs that count a particular occurrence)

Set your target objective and time window

  1. Select your target: 99%, 99.5%, 99.9%, 99.95%, or whatever makes sense for your requirements.
  2. Select your time window: the last 7, 30, or 90 days.
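To get a feel for what a target and time window imply together, it can help to translate them into an error budget: the amount of time the SLO may be violated within the window. The sketch below does that arithmetic; the function name `error_budget_minutes` is hypothetical.

```python
def error_budget_minutes(target_percent, window_days):
    """Return the minutes of allowed SLO violation for a target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - target_percent) / 100.0


# Print the budget for each target and window combination from the list above.
for target in (99.0, 99.5, 99.9, 99.95):
    for days in (7, 30, 90):
        budget = error_budget_minutes(target, days)
        print(f"{target}% over {days} days: {budget:.1f} minutes")
```

For example, a 99.9% target over 30 days leaves roughly 43 minutes of budget, while 99% over the same window leaves about 7.2 hours.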

Name, describe, and tag your SLOs

  1. Name your SLO.
  2. Add a description: describe what the SLO is tracking and why it is important for your end user experience. You can also add links to dashboards for reference.
  3. Add tags: tagging by team and service is a common practice.

Use tags to search for your SLOs from the SLO list view.
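The sketch below illustrates why consistent team and service tags pay off when searching. It is a plain in-memory model, not the Datadog API: the `SloDefinition` class, the `find_by_tag` helper, and all SLO names and tags are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SloDefinition:
    """Illustrative SLO record holding the name, description, and tags."""
    name: str
    description: str
    tags: list = field(default_factory=list)


def find_by_tag(slos, tag):
    """Return the SLOs that carry the given tag."""
    return [slo for slo in slos if tag in slo.tags]


slos = [
    SloDefinition(
        name="Checkout latency",
        description="Checkout requests complete in under 250 ms.",
        tags=["team:payments", "service:checkout"],
    ),
    SloDefinition(
        name="Search availability",
        description="Search requests return successfully.",
        tags=["team:discovery", "service:search"],
    ),
]

payments_slos = find_by_tag(slos, "team:payments")
```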
