This page explains how to redrive failed AWS Step Functions executions directly from Datadog. A redrive continues an execution from its point of failure instead of restarting the state machine from the beginning.

A visualization of a failed Step Function execution.

Enable redrive within Datadog

To enable redrive within Datadog, configure an AWS Connection with Datadog App Builder. Ensure that your IAM roles include the permission for each action you want to use: states:StartExecution for the retry action, or states:RedriveExecution for the redrive action.
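
As a rough sketch, the permissions described above could be expressed as an IAM policy like the one built below. The action names states:StartExecution and states:RedriveExecution are the Step Functions IAM actions. The region, account ID, and state machine name are hypothetical placeholders, and scoping each statement to a specific resource ARN is an assumption to adapt to your own setup.

```python
import json

# Hypothetical placeholder values; replace with your own.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
STATE_MACHINE = "my-state-machine"


def build_redrive_policy(region, account_id, state_machine):
    """Return an IAM policy document that allows retry and redrive."""
    arn_prefix = f"arn:aws:states:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Retry starts a new execution of the state machine.
                "Sid": "AllowRetry",
                "Effect": "Allow",
                "Action": "states:StartExecution",
                "Resource": f"{arn_prefix}:stateMachine:{state_machine}",
            },
            {
                # Redrive acts on existing executions of the state machine.
                "Sid": "AllowRedrive",
                "Effect": "Allow",
                "Action": "states:RedriveExecution",
                "Resource": f"{arn_prefix}:execution:{state_machine}:*",
            },
        ],
    }


policy = build_redrive_policy(REGION, ACCOUNT_ID, STATE_MACHINE)
print(json.dumps(policy, indent=2))
```

If you only need one of the two actions, keep only the matching statement.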

Usage

To take action on a Step Function in Datadog:

  1. Go to the Step Functions page.
  2. Find the Step Function you wish to redrive.
  3. Open this Step Function’s side panel. On the Executions tab, locate the failed execution you wish to redrive.
  4. Click on the Failed pill to open a redrive modal.
  5. Click the Redrive button.
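
For reference, the redrive action corresponds to the AWS RedriveExecution API. The sketch below shows roughly equivalent calls made through a boto3 Step Functions client. It illustrates the AWS side only and is not how Datadog implements the button. The helper name redrive_if_possible is hypothetical, and the check on the redriveStatus field assumes that DescribeExecution reports whether the execution can be redriven.

```python
def redrive_if_possible(sfn_client, execution_arn):
    """Redrive an execution if Step Functions reports it as redrivable.

    sfn_client is expected to be a boto3 Step Functions client, for
    example boto3.client("stepfunctions"). Returns the redrive response,
    or None when the execution is not reported as redrivable.
    """
    description = sfn_client.describe_execution(executionArn=execution_arn)
    if description.get("redriveStatus") != "REDRIVABLE":
        return None
    return sfn_client.redrive_execution(executionArn=execution_arn)


# Example usage (requires boto3 and AWS credentials; ARN is a placeholder):
#   import boto3
#   client = boto3.client("stepfunctions")
#   redrive_if_possible(
#       client,
#       "arn:aws:states:us-east-1:123456789012:execution:my-state-machine:my-run",
#   )
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.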

Tracing redrives

When monitoring redriven executions, use the Waterfall view. The large time gap between the original execution and its redrive can make spans in the Flame Graph view too small to read.

Troubleshooting missing redrive traces

If a redrive is triggered within one minute of the original execution’s failure, its corresponding trace may not appear.
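
To check whether a particular redrive falls inside this window, you can compare the failure time with the redrive time. The helper below is a minimal sketch. Its name is hypothetical, and it assumes you already have both timestamps as timezone-aware datetimes, for example from the stopDate and redriveDate fields that AWS returns for an execution.

```python
from datetime import datetime, timedelta, timezone

# The one-minute window described above.
MIN_GAP = timedelta(minutes=1)


def redrive_trace_may_be_missing(failed_at, redriven_at, min_gap=MIN_GAP):
    """Return True if the redrive started within min_gap of the failure."""
    return (redriven_at - failed_at) < min_gap


failed_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(redrive_trace_may_be_missing(failed_at, failed_at + timedelta(seconds=30)))
print(redrive_trace_may_be_missing(failed_at, failed_at + timedelta(minutes=5)))
```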

Also, a redrive does not always share the sampling decision of the original execution. To ensure that the redriven execution is retained as well, reference the @redrive:true span tag in a retention filter query.
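
As an illustration, a retention filter that keeps every span tagged @redrive:true could be described by a payload like the one below. This is a sketch under assumptions: the field names follow the general shape of Datadog's APM retention filters API, and the filter name and rate are hypothetical. Verify the fields against the current API reference before sending anything. You can also create the same filter in the Datadog UI by using @redrive:true as the query.

```python
import json


def build_redrive_retention_filter(
    name="Keep redriven Step Functions executions",  # hypothetical name
    rate=1.0,  # assumed: retain 100% of matching spans
):
    """Build a request body for a retention filter matching redriven spans."""
    return {
        "data": {
            "type": "apm_retention_filter",
            "attributes": {
                "name": name,
                "enabled": True,
                "filter_type": "spans-sampling-processor",
                "filter": {"query": "@redrive:true"},
                "rate": rate,
            },
        }
    }


payload = build_redrive_retention_filter()
print(json.dumps(payload, indent=2))
```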
