Bits AI Kubernetes Remediation

Bits AI Kubernetes Remediation analyzes and fixes Kubernetes errors in your infrastructure.

The following Kubernetes errors are supported:

  • CrashLoopBackOff
  • ErrImagePull
  • ImagePullBackOff
  • OOMKilled
  • CreateContainerError
  • CreateContainerConfigError
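Each of these names is a reason string that Kubernetes reports in a pod's container statuses. CrashLoopBackOff, ErrImagePull, ImagePullBackOff, CreateContainerError, and CreateContainerConfigError appear under `state.waiting.reason`, and OOMKilled appears under `terminated.reason` (usually in `lastState` once the container has restarted). The following minimal Python sketch shows where to look for them. It is not how Datadog detects these errors. It assumes the pod is available as a dict shaped like the output of `kubectl get pod -o json`, and the function name `find_supported_errors` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Reasons Kubernetes reports while a container is waiting to (re)start.
WAITING_REASONS = {
    "CrashLoopBackOff",
    "ErrImagePull",
    "ImagePullBackOff",
    "CreateContainerError",
    "CreateContainerConfigError",
}
# Reasons Kubernetes reports when a container has terminated.
TERMINATED_REASONS = {"OOMKilled"}


def find_supported_errors(pod: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (container name, reason) pairs for a pod dict."""
    status = pod.get("status", {})
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    findings = []
    for cs in statuses:
        name = cs.get("name", "<unknown>")
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in WAITING_REASONS:
            findings.append((name, waiting["reason"]))
        # OOMKilled is on the current state if the container has not restarted yet,
        # otherwise on lastState (the previous termination).
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") in TERMINATED_REASONS:
                findings.append((name, terminated["reason"]))
                break
    return findings


# Example: a container that was OOMKilled and is now in CrashLoopBackOff.
pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}
print(find_supported_errors(pod))  # [('api', 'CrashLoopBackOff'), ('api', 'OOMKilled')]
```

As the example shows, a single workload can report more than one of these errors at once, for instance a CrashLoopBackOff caused by repeated OOMKilled terminations.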

Usage

You can launch Bits AI Kubernetes Remediation from multiple locations within Datadog:

  • From a Kubernetes monitor: In the Troubleshooting section, select a workload under Problematic Workloads.
  • From Kubernetes Explorer: Hover over a pod status that shows an error to see more information about the alert and the affected workloads, then click Start Remediation.
  • From the Kubernetes Remediation tab: Select a workload from the list.

Any one of these actions opens a Remediation side panel that displays:

  • An AI-powered explanation of the root cause, based on collected telemetry and known patterns
  • Recommended next steps, which you may be able to perform directly from Datadog
  • Related information over an adjustable timeframe, such as recent deployments, error logs, and Kubernetes events, along with metrics relevant to the specific issue type

[Image: Remediation side panel for a workload with a CrashLoopBackOff error. A What Happened section shows a Bits AI-powered explanation of the error's root cause. Below it, a Recommended Next Steps section lets the user inspect the workload manifest and shows step-by-step instructions for a suggested fix.]

Remediate from Datadog

Join the Preview!

Automated fixes from Bits AI Kubernetes Remediation are in Preview. To sign up, click Request Access and complete the form.

If your repositories are connected to Datadog and an error can be fixed by changing code in one of them, you can use Bits AI to perform the remediation directly from Datadog. For other scenarios, Bits AI provides a detailed list of remediation steps to follow.

Example: Increasing memory limit for a deployment

When a container is terminated (OOMKilled) because its memory usage exceeded its limit, you may be able to fix the error by increasing the container’s memory limit.

  1. Click Edit Memory Limit.
  2. Adjust your limit so that it is higher than what your container normally uses.
  3. Click Fix with Bits AI.
  4. On the next page, select the repository where your deployment is defined, and review the proposed changes. Click Fix with Bits to create a pull request.
  5. You are redirected to a Bits Code Session, where you can verify that the Bits AI Dev Agent identified the configuration file where your memory limits are defined. Click Create Pull Request to create the pull request.
  6. Click View Pull Request to view the pull request in GitHub.
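In a Deployment manifest, the value this fix changes is `spec.template.spec.containers[].resources.limits.memory`. To illustrate what the edit amounts to, here is a minimal Python sketch. It assumes the manifest is already loaded as a dict (for example with a YAML parser) and that you know the container's peak memory usage from your metrics. The helper names `parse_memory` and `raise_memory_limit`, the 25% headroom, and rounding the result up to whole `Mi` are hypothetical choices, not what Bits AI does.

```python
import math
from typing import Any, Dict

# Binary and decimal suffixes used in Kubernetes memory quantities.
# Other forms (for example exponents such as "129e6") are not handled here.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def raise_memory_limit(deployment: Dict[str, Any], container_name: str,
                       peak_bytes: int, headroom: float = 1.25) -> str:
    """Set the container's memory limit to peak usage plus headroom; never lower it."""
    new_limit_mi = math.ceil(peak_bytes * headroom / 1024**2)
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        if container["name"] != container_name:
            continue
        limits = container.setdefault("resources", {}).setdefault("limits", {})
        current = limits.get("memory")
        if current is not None and parse_memory(current) >= new_limit_mi * 1024**2:
            return current  # existing limit already covers peak usage plus headroom
        limits["memory"] = f"{new_limit_mi}Mi"
        return limits["memory"]
    raise KeyError(f"container {container_name!r} not found")


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "api", "image": "example/api:1.0",
         "resources": {"limits": {"memory": "256Mi"}}},
    ]}}},
}
# Suppose metrics show the container peaking at about 300 MiB before it was OOMKilled.
print(raise_memory_limit(deployment, "api", parse_memory("300Mi")))  # 375Mi
```

The helper never lowers an existing limit, which matches step 2: the new limit should sit above what the container normally uses. If the container also sets a memory request, keep the request at or below the new limit.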
