Investigate Alerts

Get started with alert investigations

You can investigate alerts with Bits AI SRE in two ways:

  • Manually: Trigger an investigation on an individual monitor alert
  • Automatically: Configure monitors so Bits runs an investigation whenever they alert

Manually start an investigation

You can manually invoke Bits on an individual monitor alert or warn event from several entry points:

Option 1: Bits AI SRE Monitors list

  1. Go to Bits AI SRE > Monitors > Ready for Bits.
  2. Click the Investigate Recent Alerts dropdown and select an alert.

Option 2: Monitor Status page

  1. For a monitor that is ready for Bits, navigate to its status page and click Investigate with Bits AI SRE in the top-right corner.
  2. Alternatively, select an alert from the event timeline and click Investigate with Bits AI SRE on the right.

Option 3: Monitor Event side panel

From the monitor event side panel, click Investigate with Bits AI SRE.

Option 4: Slack

In Slack, reply to a monitor notification with @Datadog Investigate this alert.

Enable automatic investigations

You can configure monitors so Bits runs automatically whenever they transition to the alert state:

Option 1: Bits AI SRE Monitors list

  1. Go to Bits AI SRE > Monitors > Ready for Bits.
  2. Toggle Enable under Automatic investigations for a single monitor. To bulk-edit, select a set of monitors and click Edit automatic investigations.

Option 2: Configure for a single monitor

  1. Open the monitor’s status page and click Edit.
  2. Scroll to Configure notifications & automations and toggle Investigate with Bits AI SRE.

Note: Enabling automatic investigations using the Datadog API or Terraform is not supported.

An investigation initiates when a monitor transitions to the alert state. Transitions to the warn or no data state, renotifications, and test notifications do not trigger automatic investigations.
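The trigger rule above can be read as a simple predicate. The following is a minimal sketch for reasoning about which events start an automatic investigation; the `MonitorEvent` class, its field names, and the state strings are hypothetical and are not part of any Datadog API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorEvent:
    """Hypothetical, simplified view of a monitor event (not a Datadog type)."""
    previous_state: str            # for example "ok", "warn", "no data", "alert"
    new_state: str
    is_renotification: bool = False
    is_test_notification: bool = False


def triggers_automatic_investigation(event: MonitorEvent) -> bool:
    """Return True only for a real transition into the alert state."""
    # Renotifications and test notifications never start an investigation.
    if event.is_renotification or event.is_test_notification:
        return False
    # Only a change *into* "alert" counts; warn and no data are ignored.
    return event.new_state == "alert" and event.previous_state != "alert"
```

For example, an event that moves from `"ok"` to `"alert"` satisfies the predicate, while a transition to `"warn"` or a renotification of an existing alert does not.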

Supported monitors

Note: Prior to general availability, monitor requirements may change.

Bits can run investigations on the following monitor types:

  • Metric
  • Anomaly
  • Forecast
  • Integration
  • Outlier
  • Logs
  • APM (APM Metrics type only)

Best practices: Add investigation context to your monitors

Think of onboarding Bits as you would a new teammate: the more context you provide, the better it can investigate.

  1. Include telemetry links: Add at least one helpful telemetry link to the monitor message. The link can point to a Datadog dashboard, a logs query, a trace query, a Datadog notebook with helpful widgets, or a Confluence runbook page that contains these links. Think about the first place you would normally look in Datadog when this monitor triggers. Bits uses these links during the “Executing Runbook” step of the initial investigation to identify potential problem areas. Because the links are user-defined, you control what Bits reviews. This ensures that Bits focuses on the same data you would, and lets you tailor investigations to your team’s workflows.

  2. Add service scoping: For monitors associated with a service, add a service tag to the monitor, or filter or group the monitor query by service.

    Figure: Example monitor with optimization steps applied.
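As an illustration of both practices, the sketch below assembles a monitor definition whose message carries telemetry links and whose query and tags are scoped to a service. The field names (`name`, `type`, `query`, `message`, `tags`) follow the general shape of a Datadog monitor definition, but the metric name, service name, and every URL are hypothetical placeholders. The sketch only builds a dictionary; it does not call any API and does not enable automatic investigations.

```python
def build_monitor_message(summary, telemetry_links):
    """Compose a monitor message that lists where to look first.

    telemetry_links is a list of (label, url) pairs supplied by the caller.
    """
    lines = [summary, "", "Where to look first:"]
    for label, url in telemetry_links:
        lines.append(f"- {label}: {url}")
    return "\n".join(lines)


# All names and URLs below are placeholders for illustration only.
monitor_definition = {
    "name": "High error rate on checkout",
    "type": "query alert",
    # The query is both filtered and grouped by service (service scoping).
    "query": "avg(last_5m):avg:checkout.request.errors{service:checkout} by {service} > 5",
    "message": build_monitor_message(
        "Checkout error rate is above the threshold.",
        [
            ("Service dashboard", "https://app.datadoghq.com/dashboard/abc-123-xyz"),
            ("Runbook", "https://example.atlassian.net/wiki/spaces/OPS/pages/123456"),
        ],
    ),
    # A service tag on the monitor is another way to provide service scoping.
    "tags": ["service:checkout", "team:payments"],
}
```

Keeping the links in a short, labeled list makes it easy to review which data Bits is pointed at when the monitor fires.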

For additional suggestions on how to optimize investigations, see the section on Memories.

Configure where investigation findings are sent

By default, Bits’ investigation findings appear in two places:

  • Full investigation findings are available on the Bits AI Investigations page.
  • A summary of the findings is available on the status page for the monitor.

Additionally, if your monitor already includes @slack, @case, or @oncall notifications, Bits automatically posts its findings to those destinations. If it does not, you can add them as destinations for investigation findings:

Slack

  1. Ensure the Datadog Slack app is installed in your Slack workspace.
  2. In your monitor, go to Configure notifications and automations and add the @slack-{channel-name} handle. This sends monitor notifications to your chosen Slack channel.
  3. Go to Bits AI SRE > Settings > Integrations and connect your Slack workspace. This allows Bits to write its findings directly under the monitor notification in Slack. Note: Each Slack workspace can be connected to only one Datadog organization.

Case Management

In the Configure notifications and automations section, add the @case-{project-name} handle. Case Management also supports optional two-way syncing with ticketing platforms like Jira and ServiceNow.

On-Call

In the Configure notifications and automations section, add the @oncall-{team} handle. Bits’ findings appear on the On-Call page in the Datadog mobile app, helping your teams triage issues on the go.
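The three handle formats above share one pattern: a prefix followed by a channel, project, or team name. The sketch below is a hypothetical helper that builds such handles and reports which destination kinds already appear in a monitor message. The function names and the regular expression are assumptions made for illustration, and the pattern is deliberately simple rather than a full description of valid handle syntax.

```python
import re

# Prefixes taken from the handle formats described above.
HANDLE_PREFIXES = {"slack": "@slack-", "case": "@case-", "oncall": "@oncall-"}


def notification_handle(kind: str, target: str) -> str:
    """Build a handle such as @slack-{channel-name} for a destination kind."""
    if kind not in HANDLE_PREFIXES:
        raise ValueError(f"unsupported destination kind: {kind!r}")
    return HANDLE_PREFIXES[kind] + target


def destinations_in(message: str) -> list:
    """Return the sorted destination kinds whose handles appear in a message."""
    # Simplified pattern: prefix, a hyphen, then word characters, dots, or hyphens.
    found = re.findall(r"@(slack|case|oncall)-[\w.-]+", message)
    return sorted(set(found))
```

A message that contains `@slack-ops-alerts` and `@oncall-payments` reports the `oncall` and `slack` kinds, which shows that a `@case-{project-name}` handle would still need to be added for findings to reach Case Management.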

Configure knowledge base integrations

Bits integrates with Confluence to find relevant documentation and runbooks to support its investigations, and also allows you to interact with your Confluence content directly through chat.

  1. Connect your Confluence Cloud account by following the instructions in the Confluence integration tile.
  2. Optionally, enable account crawling to make Confluence a data source within Bits’ chat interface. This is not required for Bits to use Confluence when generating its investigation plan.
  3. You can view all connected Confluence accounts on the Bits Settings page.

Best practices: Optimize Bits’ understanding of your knowledge

Help Bits interpret and act on your documentation by following these best practices:

  • Include relevant Datadog telemetry links in your Confluence pages. Bits queries these links to extract information for its investigation.
  • Provide clear, step-by-step instructions for resolving monitor issues. Bits follows these instructions precisely, so being specific leads to more accurate outcomes.
  • Document the services or systems involved in detail. Bits uses this information to understand the environment and guide investigations effectively.

Tip: The more precisely your Confluence page matches the issue at hand, the more helpful Bits can be.

Configure permissions

There are two RBAC permissions that apply to Bits AI SRE:

Name | Description | Default Role
Bits Investigations Read (bits_investigations_read) | Read Bits investigations. | Datadog Read Only Role
Bits Investigations Write (bits_investigations_write) | Run and configure Bits investigations. | Datadog Standard Role

These permissions are added by default to Managed Roles. If your organization uses Custom Roles or has previously modified the default roles, an admin with the User Access Manage permission must manually add the permissions to the appropriate roles. For details, see Access Control.
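When auditing custom roles, it can help to list which roles lack one of the two permissions. The sketch below works on a plain dictionary that maps role names to sets of permission names. The role names and the data are hypothetical, and the sketch does not query Datadog; how you export role permissions from your organization is left open.

```python
from typing import Dict, List, Set

# Permission names as listed in the table above.
BITS_READ = "bits_investigations_read"
BITS_WRITE = "bits_investigations_write"


def roles_missing_permission(roles: Dict[str, Set[str]], permission: str) -> List[str]:
    """Return the names of roles that do not include the given permission."""
    return sorted(name for name, perms in roles.items() if permission not in perms)


# Hypothetical custom roles, for illustration only.
example_roles = {
    "sre-oncall": {BITS_READ, BITS_WRITE},
    "support-viewer": {BITS_READ},
    "contractor": set(),
}
```

With this example data, `roles_missing_permission(example_roles, BITS_WRITE)` lists `contractor` and `support-viewer` as the roles an admin would need to update before their members can run investigations.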

How Bits AI SRE investigates

Investigations happen in two phases:

  1. Initial context gathering
    • Bits begins by identifying any Datadog or Confluence links that you have added to the monitor’s message and uses them as entry points into the investigation.
    • It also automatically queries your Datadog environment to gather additional context about what’s happening around the alert.
    • If you have previously interacted with an investigation for the same monitor, Bits will recall any memories associated with the monitor to inform and accelerate the current investigation.
  2. Root cause hypothesis generation and testing
    • Using the gathered context, Bits performs a more thorough investigation by building multiple root cause hypotheses and testing them in parallel. Today, Bits is able to query:
    • Hypotheses can end in one of three states: validated, invalidated, or inconclusive.
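To make the three end states concrete, the following sketch models hypotheses that are tested in parallel and each resolve to validated, invalidated, or inconclusive. It is a conceptual illustration only and does not describe how Bits is implemented. The function names, the use of a thread pool, and the convention that a check returns `True`, `False`, or `None` are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Dict, Optional


class HypothesisState(Enum):
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    INCONCLUSIVE = "inconclusive"


def evaluate(check: Callable[[], Optional[bool]]) -> HypothesisState:
    """Run one check and map its outcome to a hypothesis state."""
    try:
        result = check()
    except Exception:
        # A check that fails to run gives no evidence either way.
        return HypothesisState.INCONCLUSIVE
    if result is None:
        return HypothesisState.INCONCLUSIVE
    return HypothesisState.VALIDATED if result else HypothesisState.INVALIDATED


def run_hypotheses(checks: Dict[str, Callable[[], Optional[bool]]]) -> Dict[str, HypothesisState]:
    """Evaluate all hypotheses concurrently and collect their final states."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(evaluate, check) for name, check in checks.items()}
        return {name: future.result() for name, future in futures.items()}
```

Treating errors and missing data as inconclusive, rather than as invalidated, keeps a hypothesis from being ruled out just because the evidence could not be gathered.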

Chat with Bits AI SRE about the investigation

On the Bits AI Investigations page, you can chat with Bits to gather additional information about the investigation or the services involved. Click the Suggested replies bubble for examples.

Functionality | Example prompts | Data source
Understand the status of its investigation | What's the latest status of the investigation? | Investigation findings
Ask for elaborations of its findings | Tell me more about the {issue}. | Investigation findings
Look up information about a service | Are there any ongoing incidents for {example-service}? | Software Catalog service definitions
Find recent changes for a service | Were there any recent changes on {example-service}? | Change Tracking events
Find a dashboard | Give me the {example-service} dashboard. | Dashboards
Query APM request, error, and duration metrics | What's the current error rate for {example-service}? | APM metrics

Help Bits AI SRE learn

Reviewing Bits’ findings not only validates their accuracy, but also helps Bits learn from any mistakes it makes, enabling it to produce faster and more accurate investigations in the future.

During the investigation

You can guide Bits’ learning by:

  • Improving a step: Share a link to a better query Bits should have made.
  • Remembering a step: Tell Bits to remember any helpful queries it generated. This instructs Bits to prioritize running these queries the next time the same monitor fires.

After the investigation

At the end of an investigation, tell Bits whether its conclusion was correct. If it was inaccurate, provide Bits with the correct root cause so that it can learn from the discrepancy.

Figure: An investigation conclusion, with the buttons for rating it helpful or unhelpful highlighted.

Manage memories

Every piece of feedback you give generates a memory. Bits uses these memories to enhance future investigations by recalling relevant patterns, queries, and corrections. To view or delete memories, go to the Monitor Management page and use the Memories column.