This product is not supported for your selected
Datadog site. (
).
Join the Preview!
Change Failure Detection is in Preview.
Overview
Datadog Change Failure Detection automatically identifies deployments that remediate previously failed deployments. By connecting deployment data with failure events, it provides a complete view of delivery performance, helping teams balance release velocity with operational stability.
A change failure is a deployment that causes issues in production and requires remediation. Change failures are used to calculate the following metrics:
- Change Failure Rate
: The percentage of deployments causing a failure in production, calculated as the following:
$$\text"Change Failure Rate" = \text"Number of change failures" / (\text"Total deployments" - \text"Rollback deployments")$$
- Failed Deployment Recovery Time
: The median duration between a failed deployment and its remediation, either through a rollback or rollforward deployment.
Change Failure Detection identifies two types of remediation deployments:
- Rollbacks: Automatically detected when a previously deployed version is redeployed
- Rollforwards: Detected through custom rules that match metadata patterns (such as revert PRs and hotfix labels)
Rollbacks
A rollback occurs when a previously deployed version is redeployed to restore the system after a failed or faulty change.
How rollback classification works
A deployment is classified as a rollback when it deploys a version that matches a previously deployed version but differs from the immediately preceding deployment.
- If Git metadata is present, the match is based on the commit SHA.
- If Git metadata is not present, the match is based on the version tag.
When a rollback is detected, the change failure is the first deployment after the rollback target (the version you reverted to).
Example: Rollback detection
For the sequence V1 → V2 → V3 → V1, the rollback target is the original V1, so V2 is marked as the change failure and V1 as a rollback deployment.
Note: Redeploying the same version back‑to‑back (for example, V1 → V1) is not considered a rollback.
Rollforwards
A rollforward occurs when a new deployment is made to fix or override a failed or faulty change. Unlike rollbacks (which redeploy a previous version), rollforwards deploy new code to remediate issues. This can include revert pull requests that restore previous behavior through a new release.
Rollforwards are detected through custom rules that match deployment metadata patterns. Custom rules are configured in the DORA Settings page.
Custom rules
You can define custom rules to automatically classify rollforward deployments based on repository or release metadata. Rules can operate in two ways:
- Linking deployments: Match deployments through shared variable values (for example, PR number or version)
- Static patterns: Match metadata patterns without variables (for example, labels or branch names)
Rules linked to failed deployments
Use these rules to identify rollforward deployments that should be linked to a specific earlier failed deployment. These rules use regular expression (regex) patterns with variables to match deployments through shared references.
You can enter regex rules that include one of these variables:
| Variable | Description |
|---|
$pr_title | Matches PR titles |
$pr_number | Matches PR numbers |
$version | Matches version tags |
How variable-based classification works
When a rule matches a deployment, the following actions occur:
- The variable value is extracted from the current deployment.
- The system finds the earlier deployment with the same extracted value.
- The current deployment is marked as a rollforward linked to that earlier deployment.
- The earlier deployment is marked as the change failure.
These rules work best when the failed deployment can be identified by a shared commit SHA, version tag, or PR reference.
Example: Revert pull requests
Revert pull requests are a common recovery pattern. For example, a PR titled Revert "Add feature X" references the original PR.
When a PR title matches this pattern, the following actions occur:
- The system extracts the original PR title from the revert PR (the value of
$pr_title). - It finds the earlier deployment that includes that original PR title.
- The current deployment (with the revert) is marked as the rollforward.
- The earlier deployment is marked as the change failure.
Note: If the original PR isn’t found in any prior deployment, or if both the original PR and its revert are in the same deployment, no classification is applied.
Static rules
Static rules classify rollforward deployments based on metadata patterns without using variables. These rules match broad indicators of remediation.
You can define regex rules that match specific types of metadata. The following table shows some example patterns you can use, but you may adjust them to fit your processes:
| Metadata Type | Example Regex Pattern | Description |
|---|
| PR title | .*rollforward.* | Matches PR titles containing rollforward |
| PR label | .*hotfix.* | Matches PR labels containing hotfix |
| PR branch name | recovery/.* | Matches branch names starting with recovery/ |
| Commit message | ^Revert ".*"$ | Matches commit messages starting with Revert and ending with " |
| Version tag | .*_hotfix | Matches version tags ending with _hotfix |
How static rule classification works
When a static rule matches a deployment, the following actions occur:
- The current deployment is marked as a rollforward.
- The immediately preceding deployment is marked as the change failure.
Use static rules for broad remediation indicators like hotfix labels, branch prefixes, or version tag conventions.
Default rules
Datadog provides default rules that are automatically enabled:
- Revert PRs: PR titles following revert naming conventions (for example, “Revert” referencing a prior PR) are treated as rollforwards. The earlier deployment containing the original change is marked as the change failure, using the variable-based linking rules described above.
- Hotfix indicators: PR labels, titles, or branch names containing “hotfix” are treated as rollforwards, with the preceding deployment marked as the change failure.
These default rules are fully configurable in the DORA metrics settings page. They are intended as opinionated starting points that interpret common signals as likely rollforward activity. You should adapt the patterns (such as naming conventions, labels, or version tags) as needed to reflect your own workflows and improve accuracy over time.
Further reading
Additional helpful documentation, links, and articles: