
Overview

The Flaky Test Management page provides a centralized view to track, triage, and remediate flaky tests across your organization. You can view every test’s status along with key impact metrics such as the number of pipeline failures, CI time wasted, and failure rate.

From this UI, you can act on flaky tests to mitigate their impact. Quarantine or disable problematic tests to keep known flakes from breaking builds, and create cases and Jira issues to track work toward fixes.

Overview of the Flaky Test Management UI

Change a flaky test’s status

Use the status drop-down to change how a flaky test is handled in your CI pipeline. This can help reduce CI noise while retaining traceability and control. Available statuses are:

Status | Description
------ | -----------
Active | The test is known to be flaky and is running in CI.
Quarantined | The test keeps running in the background, but its failures don’t affect CI status or break pipelines. This is useful for isolating flaky tests without blocking merges.
Disabled | The test is skipped entirely in CI. Use this when a test is no longer relevant or needs to be temporarily removed from the pipeline.
Fixed | The test has passed consistently and is no longer flaky. Where supported, use the remediation flow to confirm the fix and apply this status automatically instead of changing it manually.
Note: Status actions have minimum version requirements for each programming language's instrumentation library. See Compatibility for details.

Track evolution of flaky tests

Track how the number of flaky tests evolves with the out-of-the-box metric test_optimization.test_management.flaky_tests. The metric is enriched with the following tags to help you investigate the counts in more detail:

  • repository_id
  • branch
  • flaky_status
  • test_codeowners
  • flaky_category

The branch tag is only present when the test has flaked in the repository’s default branch during the last 30 days. This helps you filter out flaky tests that have only exhibited flakiness in feature branches, as these may not be relevant. You can configure the default branch of your repositories under Repository Settings.
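
For example, a dashboard or monitor query along the following lines counts flaky tests per repository. This is an illustrative sketch: the flaky_status:active tag value and the sum aggregation are assumptions, so adjust them to match your data.

    sum:test_optimization.test_management.flaky_tests{flaky_status:active} by {repository_id}

Adding branch or flaky_category to the by clause slices the counts further, for example to focus on the default branch or on a specific root cause.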

Investigate a flaky test

For more information about a specific flaky test, use these options in the actions menu at the end of each row:

  • View Last Failed Test Run: Open the side panel with the details of the test’s most recent failed run.
  • View related test executions: Open the Test Optimization Explorer populated with all of the test’s recent runs.

Create cases for flaky tests

For any flaky test, you can create a case and use Case Management to track work toward remediation. Click the Create Case button or use the actions menu at the end of the row.

Confirm fixes for flaky tests

When you fix a flaky test, Test Optimization’s remediation flow can confirm the fix by retrying the test multiple times. If successful, the test’s status is automatically updated to Fixed. To enable the remediation flow:

  1. For the test you are fixing, click Fix this test in the Flaky Test Management UI.
  2. Copy the unique flaky test key that is displayed (for example, DD_ABC123).
  3. Include the test key in your Git commit title or message for the fix (for example, git commit -m "DD_ABC123"; see the fuller example after this list).
  4. When Datadog detects the test key in your commit, it automatically triggers the remediation flow for that test:
    • Retries any tests you’re attempting to fix 20 times.
    • Runs tests even if they are marked as Disabled.
    • If all retries pass, updates the test’s status to Fixed.
    • If any retry fails, keeps the test’s current status (Active, Quarantined, or Disabled).
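
For example, assuming DD_ABC123 is the key copied in step 2, a commit like the following would trigger the flow. The descriptive part of the message is illustrative; only the key itself needs to appear in the commit title or message.

    git commit -m "Fix async wait in checkout test (DD_ABC123)"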

AI-powered flaky test categorization

Flaky Test Management uses AI to automatically assign a root cause category to each flaky test based on execution patterns and error signals. This helps you filter, triage, and prioritize flaky tests more effectively.

Note: A test must have at least one failed execution that includes both @error.message and @error.stack tags to be eligible for categorization. If the test was recently detected, categorization may take several minutes to complete.
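
As a rough illustration of this requirement, consider the hypothetical Python test below. When its assertion fails, the raised AssertionError carries a message and a traceback, which the test instrumentation typically reports as @error.message and @error.stack on that failed execution (exact behavior varies by language and framework):

    import random

    # Hypothetical flaky test: when the random choice is 0, the assertion fails, and the
    # AssertionError's message and traceback become the failed execution's
    # @error.message and @error.stack.
    def test_discount_is_applied():
        discount = random.choice([0, 5, 10])
        assert discount > 0, f"unexpected discount: {discount}"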

Categories

Category | Description
-------- | -----------
Concurrency | Test that invokes multiple threads interacting in an unsafe or unanticipated manner. Flakiness is caused by, for example, race conditions resulting from implicit assumptions about the ordering of execution, leading to deadlocks in certain test runs.
Randomness | Test uses the result of a random data generator. If the test does not account for all possible cases, it may fail intermittently, e.g., only when the result of a random number generator is zero.
Floating Point | Test uses the result of a floating-point operation. Floating-point operations can suffer from precision overflows and underflows, non-associative addition, and similar issues, which, if not properly accounted for, can result in inconsistent outcomes (e.g., comparing a floating-point result to an exact real value in an assertion).
Unordered Collection | Test assumes a particular iteration order for an unordered-collection object. Since no order is specified, tests that assume a fixed order will likely be flaky for various reasons (e.g., collection-class implementation).
Too Restrictive Range | Test whose assertions accept only part of the valid output range. It intermittently fails on unhandled corner cases.
Timeout | Test fails due to time limitations, either at the individual test level or as part of a suite. This includes tests that exceed their execution time limit (e.g., for a single test or the whole suite) and fail intermittently due to varying execution times.
Order Dependency | Test depends on a shared value or resource modified by another test. Changing the test-run order can break those dependencies and produce inconsistent outcomes.
Resource Leak | Test improperly handles an external resource (e.g., failing to release memory). Subsequent tests that reuse the resource may become flaky.
Asynchronous Wait | Test makes an asynchronous call or waits for elements to load or render, and does not explicitly wait for completion (often using a fixed delay instead). If the call or rendering takes longer than the delay, the test fails.
IO | Test is flaky due to its handling of input/output, for example, failing when disk space runs out during a write.
Network | Test depends on network availability (e.g., querying a server). If the network is unavailable or congested, the test may fail.
Time | Test relies on system time and may be flaky due to precision or timezone discrepancies (e.g., failing when midnight passes in UTC).
Environment Dependency | Test depends on a specific OS, library versions, or hardware. It may pass in one environment but fail in another, especially in cloud CI environments where machines vary nondeterministically.
Unknown | Test is flaky for an unknown reason.
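
As a concrete sketch of one of these patterns, the hypothetical Python test below falls under Asynchronous Wait: it relies on a fixed delay instead of explicitly waiting for the background work to finish, so it fails whenever the work takes longer than the delay.

    import random
    import threading
    import time

    # Hypothetical "Asynchronous Wait" flake: the test sleeps for a fixed delay instead of
    # joining the worker thread, so it fails whenever the background work runs long.
    def test_background_job_completes():
        results = []
        work_duration = random.uniform(0.0, 0.02)  # background work time varies between runs
        worker = threading.Thread(
            target=lambda: (time.sleep(work_duration), results.append("done"))
        )
        worker.start()

        time.sleep(0.01)  # fixed delay; the flaky part
        assert results == ["done"], "background job did not finish before the assertion"

Replacing the fixed delay with an explicit wait (worker.join() here) removes the flakiness.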

Compatibility

To use Flaky Test Management features, you must use Datadog’s native instrumentation for your test framework. The table below outlines the minimum versions of each Datadog tracing library required to quarantine, disable, and attempt to fix flaky tests. Click a language name for setup information:

Language | Quarantine & Disable | Attempt to fix
-------- | -------------------- | --------------
.NET | 3.13.0+ | 3.17.0+
Go | 1.73.0+ | Not available
Java | 1.48.0+ | 1.50.0+
JavaScript | 5.44.0+ | 5.52.0+
Python | 3.3.0+ | 3.8.0+
Ruby | 1.13.0+ | 1.17.0+

Further reading