Start a Bits AI SRE investigation
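You can start a Bits AI SRE investigation manually, from a monitor alert or an APM latency view, or configure Bits to investigate automatically when a monitor alerts.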
Manually start an investigation
Monitor alerts
You can invoke Bits on an individual monitor alert or warn event from several entry points:
Option 1: Bits AI SRE Monitors list
- Go to Bits AI SRE > Monitors > Supported.
- Click Investigate Recent Alerts and select an alert.
Option 2: Monitor status page
Navigate to the monitor status page of a Bits AI SRE-supported monitor and click Investigate with Bits AI SRE in the top-right corner.
Option 3: Monitor event side panel
In the monitor event side panel of a Bits AI SRE-supported monitor, click Investigate with Bits AI SRE.
Option 4: Slack
To use the Slack integration, connect your Slack workspace to Bits AI SRE.
In Slack, reply to a monitor notification with @Datadog Investigate this alert.
APM latency
Bits AI SRE investigations from APM latency graphs and APM Watchdog stories are in Preview.
APM latency graphs on service pages
- In Datadog, navigate to APM and open the service or resource page you want to investigate. Next to the latency graph, click Investigate.
- Click and drag over the point plot visualization to draw a rectangular selection around a region that shows unusual latency. This selection seeds the analysis. Initial diagnostics on the latency issue appear, including the observed user impact, anomalous tags contributing to the issue, and recent changes. For more information, see APM Investigator.
- Click Investigate with Bits AI SRE to run a deeper investigation.
APM latency Watchdog stories
On a Watchdog APM latency story, click Investigate with Bits AI SRE.
Enable automatic investigations
In addition to manual investigations, you can configure Bits to run automatically when a monitor transitions to the alert state:
From the Bits AI SRE Monitors list
- Go to Bits AI SRE > Monitors > Supported.
- Toggle Auto-Investigate on for a single monitor, or select multiple monitors and click Auto-Investigate All to enable it in bulk.
For a single monitor
- Open the monitor’s status page and click Edit.
- Scroll to Configure notifications & automations and toggle Investigate with Bits AI SRE.
Notes on automatic investigations:
- Enabling automatic investigations using the Datadog API or Terraform is not supported.
- An investigation initiates when a monitor transitions to the alert state.
- Transitions to the warn or no data state, renotifications, and test notifications do not trigger automatic investigations.
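The trigger rules above can be summarized as a small predicate. The following is a minimal sketch for illustration only, not Datadog's implementation: the `MonitorEvent` class, its field names, and the state strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorEvent:
    """Hypothetical model of a monitor notification event (not a Datadog type)."""
    new_state: str                 # assumed labels: "alert", "warn", "no data", "ok"
    previous_state: str            # state the monitor was in before this event
    is_renotification: bool = False
    is_test: bool = False


def triggers_auto_investigation(event: MonitorEvent, auto_investigate: bool) -> bool:
    """Return True only for a fresh transition into the alert state."""
    if not auto_investigate:
        return False               # the monitor has not opted in
    if event.is_renotification or event.is_test:
        return False               # renotifications and test notifications never trigger
    # Warn and no-data transitions fall through to False here.
    return event.new_state == "alert" and event.previous_state != "alert"
```

Under these assumptions, an `ok` to `alert` transition returns `True`, while an `ok` to `warn` transition or a renotification of an existing alert returns `False`.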
Supported monitors
Bits can run investigations on the following monitor types:
- Metric
- Anomaly
- Forecast
- Integration
- Outlier
- Logs
- APM (APM Metrics type only; Trace Analytics is not supported)
- Synthetics (API tests only)
Best practices: Add investigation context to your monitors
Think of onboarding Bits as you would a new teammate: the more context you provide, the better it can investigate.
Include Datadog telemetry links: Add at least one helpful telemetry link in the monitor message. Think about the first place you’d normally look in Datadog when this monitor triggers. It could be a link to any of the following:
- Datadog dashboard
- Logs
- Traces
- Datadog notebook with helpful widgets
- Confluence runbook page containing Datadog telemetry links (requires a configured Confluence integration)
Bits uses these links during the Runbook steps of the initial investigation to identify potential problem areas. Because these links are user-defined, you control what Bits reviews, so it focuses on the same data you would, and you can tailor investigations to your team’s workflows. The links need no special formatting; plain links work.
Add service scoping: For monitors associated with a service, add a service tag to the monitor, or filter or group the monitor query by service.
For additional suggestions on how to optimize investigations, see Help Bits learn.
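As a rough illustration of these two practices, the sketch below checks a monitor definition for a link in its message and for service scoping. It assumes the monitor is available as a dictionary with `message`, `tags`, and `query` keys. The helper name, the regular expressions, and the example values are hypothetical, and the query check is a heuristic rather than a real query parser.

```python
import re

# Any http(s) link counts; the text above says plain links are enough.
URL_PATTERN = re.compile(r"https?://\S+")

# Heuristic: a "service:" filter, or "service" inside a "by {...}" grouping.
SERVICE_SCOPE_PATTERN = re.compile(
    r"(?:^|[\s{,(\"])service:|by\s*\{[^}]*\bservice\b[^}]*\}"
)


def investigation_context_gaps(monitor: dict) -> list:
    """Return a list of missing context items (empty if none were found)."""
    gaps = []
    if not URL_PATTERN.search(monitor.get("message", "")):
        gaps.append("message has no telemetry link")
    has_service_tag = any(t.startswith("service:") for t in monitor.get("tags", []))
    has_service_scope = bool(SERVICE_SCOPE_PATTERN.search(monitor.get("query", "")))
    if not (has_service_tag or has_service_scope):
        gaps.append("no service tag or service scope in query")
    return gaps


# Hypothetical monitor definition; the dashboard URL is a placeholder.
example_monitor = {
    "message": "High CPU on checkout. Start here: https://app.datadoghq.com/dashboard/abc-123-xyz",
    "tags": ["team:payments", "service:checkout"],
    "query": "avg(last_5m):avg:system.cpu.user{env:prod} by {host} > 90",
}
print(investigation_context_gaps(example_monitor))
```

A check like this could run as part of a monitor review so that gaps are caught before the monitor first alerts.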
How Bits AI SRE investigates
Investigations happen in two phases:
In the first phase, Bits gathers initial context on the problem and any information that might help it troubleshoot further. Depending on the starting point of the investigation, you may see one or more of the following types of steps:
- Runbook: If the starting point is a monitor alert, Bits begins by parsing Datadog or Confluence links that you have added to the monitor’s message, and uses them as entry points into the investigation.
- Memory: If you have previously interacted with an investigation for the same monitor, Bits recalls any memories associated with the monitor to inform and accelerate the current investigation.
- General search: Bits automatically scans your Datadog environment to gather additional context about what’s happening around the alert.
- Trace Analysis: If the starting point is an APM latency graph, Bits automatically inspects anomalous traces to identify the specific services or tags contributing to latency hotspots.
In the second phase, Bits uses the collected context to build multiple root cause hypotheses and tests them concurrently.
Bits looks at the following data sources:
- Metrics
- Traces
- Logs
- Dashboards
- Change events
- Kubernetes events
Each hypothesis ends in one of three states: validated, invalidated, or inconclusive. When a hypothesis is validated, Bits generates sub-hypotheses and repeats the same investigative process on them.
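The hypothesis lifecycle described above can be pictured as a recursive tree walk. The following is a conceptual sketch only, not how Bits is implemented: the `Hypothesis` class, the `test` and `refine` callbacks, and the `max_depth` limit are all assumptions, and the sketch tests hypotheses sequentially for simplicity, whereas Bits tests them concurrently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    INCONCLUSIVE = "inconclusive"


@dataclass
class Hypothesis:
    """Hypothetical node in an investigation tree."""
    description: str
    status: Optional[Status] = None
    children: List["Hypothesis"] = field(default_factory=list)


def investigate(
    hypothesis: Hypothesis,
    test: Callable[[Hypothesis], Status],        # assumed: checks telemetry, returns a verdict
    refine: Callable[[Hypothesis], List[str]],   # assumed: proposes sub-hypothesis descriptions
    max_depth: int = 3,                          # assumed limit so the sketch always terminates
) -> Hypothesis:
    """Test a hypothesis; if it is validated, generate and test sub-hypotheses."""
    hypothesis.status = test(hypothesis)
    if hypothesis.status is Status.VALIDATED and max_depth > 0:
        for description in refine(hypothesis):
            child = Hypothesis(description)
            hypothesis.children.append(child)
            investigate(child, test, refine, max_depth - 1)
    # Invalidated and inconclusive hypotheses are leaves: no further refinement.
    return hypothesis
```

In this model only validated hypotheses branch further, so the tree narrows toward progressively more specific candidate root causes.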
Reports
The Reports tab lets you track the number of investigations run over time by monitor, user, service, and team. You can also track the mean time to initial findings and the mean time to conclusion to assess the impact of Bits on your on-call efficiency.