Troubleshooting Serverless Monitoring for AWS Step Functions

I cannot see any traces

Verify that your Step Function is configured to send all logs

  • Ensure that the DD_TRACE_ENABLED environment variable is set to true.
  • In your AWS console, open your Step Function’s logging tab. Ensure that Log level is set to ALL, and that Include execution data is selected.
  • Ensure that the CloudWatch log group (also found on the logging tab) has a subscription filter to the Datadog Lambda Forwarder in the same region.
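
The log level and execution data checks above can also be scripted. The following is a minimal sketch, assuming you already have the dictionary returned by the boto3 Step Functions client's describe_state_machine call. The helper name logging_problems is hypothetical and not part of any Datadog or AWS library, and the sketch does not inspect the subscription filter on the log group.

```python
def logging_problems(state_machine: dict) -> list:
    """Return a list of logging misconfigurations found in a
    describe_state_machine response. An empty list means none were found."""
    config = state_machine.get("loggingConfiguration") or {}
    problems = []

    # Log level must be ALL for every execution event to be logged.
    if config.get("level") != "ALL":
        problems.append(
            f"Log level is {config.get('level', 'unset')!r}, expected 'ALL'"
        )

    # "Include execution data" corresponds to includeExecutionData.
    if not config.get("includeExecutionData", False):
        problems.append("Include execution data is not enabled")

    # At least one CloudWatch log group destination must be configured.
    if not config.get("destinations"):
        problems.append("No CloudWatch log group destination is configured")

    return problems


# Usage (assumption: boto3 is installed and AWS credentials are configured):
#   import boto3
#   sfn = boto3.client("stepfunctions")
#   response = sfn.describe_state_machine(stateMachineArn="<state machine ARN>")
#   print(logging_problems(response))
```

An empty result only confirms the state machine's own logging settings. The subscription filter to the Datadog Lambda Forwarder still needs to be checked on the log group.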

Verify that logs are forwarded successfully to Datadog

  • Check the Datadog Lambda Forwarder for error messages. Ensure that you have correctly set your API key and Datadog site.
  • Enable DEBUG logs on the Datadog Lambda Forwarder by setting the environment variable DD_LOG_LEVEL to debug.

In Datadog, go to Logs > Live Tail. Search for source:stepfunction. You may need to trigger the state machine a few times.

Search historic logs

To enable searching historic logs, add a temporary index to the forwarded logs. In Datadog, go to Logs > Configuration and then open the Indexes tab. Click the New Index button in the upper right.

Choose a name, set the index filter to source:stepfunction, leave everything else with default values, and save.


If your organization has an existing all-encompassing index with a low limit, place your new index at the top of the index list so that it is evaluated first.

Note: Indexing logs is not a requirement for getting traces and may incur additional cost. If you are troubleshooting a specific issue, you may wish to temporarily send logs to an index, debug, and delete the index afterwards. See Indexes for more information.

Lambda traces are not merging with Step Function traces

  • Verify that you can see both Lambda traces and Step Function traces in Datadog.
  • Verify that you are using Python layer v75+ or Node.js layer v94+.
  • In your AWS console, open your Step Function and ensure that your state machine has "Payload.$": "States.JsonMerge($$, $, false)" on the Lambda steps.
  • Execute your Step Function once and verify that the TaskScheduled event log of the Lambda step has the payload containing data from the Step Function context object.
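
The Payload check above can be run against a state machine definition without opening each step in the console. The following is a minimal sketch, assuming the definition is available as an Amazon States Language JSON string that uses JSONPath with a Parameters block. The helper names iter_states and lambda_steps_missing_context are hypothetical, not part of any Datadog or AWS library.

```python
import json

LAMBDA_INVOKE_PREFIX = "arn:aws:states:::lambda:invoke"
EXPECTED_PAYLOAD = "States.JsonMerge($$,$,false)"  # compared with whitespace removed


def iter_states(definition: dict):
    """Yield (name, state) pairs, descending into Parallel branches
    and Map processors (ItemProcessor, or the older Iterator field)."""
    for name, state in definition.get("States", {}).items():
        yield name, state
        for branch in state.get("Branches", []):
            yield from iter_states(branch)
        for key in ("ItemProcessor", "Iterator"):
            if key in state:
                yield from iter_states(state[key])


def lambda_steps_missing_context(definition_json: str) -> list:
    """Return the names of Lambda invoke steps whose Payload.$ does not
    merge the Step Function context object into the Lambda input."""
    definition = json.loads(definition_json)
    missing = []
    for name, state in iter_states(definition):
        resource = state.get("Resource", "")
        if state.get("Type") != "Task" or not resource.startswith(LAMBDA_INVOKE_PREFIX):
            continue
        payload = state.get("Parameters", {}).get("Payload.$")
        # Ignore whitespace differences such as "($$,$,false)".
        normalized = "".join(payload.split()) if isinstance(payload, str) else None
        if normalized != EXPECTED_PAYLOAD:
            missing.append(name)
    return missing
```

Any step name returned by lambda_steps_missing_context is a candidate for adding "Payload.$": "States.JsonMerge($$, $, false)". Steps that use a Lambda function ARN directly as their Resource are skipped by this sketch.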

I can see the aws.stepfunctions root span but I cannot see any step spans

Enable the Include execution data option in the state machine’s logging configuration. With this option enabled, the execution input, the data passed between states, and the execution output are logged. The Datadog backend uses these logs to construct the step spans.

Some step spans are missing in the traces

  • For actions, we support basic Lambda and DynamoDB actions, such as Lambda Invoke, DynamoDB GetItem, DynamoDB PutItem, and DynamoDB UpdateItem.
  • For flows, we do not support Wait, Choice, Map, Success, Fail, and Pass. For the Parallel flow, the spans of the steps that execute in parallel are stacked on top of each other, but no Parallel span appears on the flame graph.

Notes

Lambda steps that use the legacy Lambda API cannot be merged. If your Lambda step’s definition is "Resource": "<Lambda function ARN>" instead of "Resource": "arn:aws:states:::lambda:invoke", then your step is using the legacy Lambda API.
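
The distinction above comes down to the shape of the Resource string. The following is a minimal sketch, assuming standard ARN formatting. The helper name uses_legacy_lambda_api is hypothetical and not part of any Datadog or AWS library.

```python
def uses_legacy_lambda_api(resource: str) -> bool:
    """Return True if a Task state's Resource is a Lambda function ARN
    (legacy Lambda API) rather than the lambda:invoke service integration."""
    # A Lambda function ARN has the form
    #   arn:<partition>:lambda:<region>:<account-id>:function:<name>
    # while the service integration has the form
    #   arn:aws:states:::lambda:invoke
    parts = resource.split(":")
    return (
        len(parts) >= 6
        and parts[0] == "arn"
        and parts[2] == "lambda"
        and parts[5] == "function"
    )


# Example values (the account ID and function name are placeholders):
print(uses_legacy_lambda_api("arn:aws:lambda:us-east-1:123456789012:function:my-function"))
print(uses_legacy_lambda_api("arn:aws:states:::lambda:invoke"))
```

The first call reports a legacy step, which cannot be merged. The second reports a step that uses the lambda:invoke integration.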

If your Lambda has the DD_TRACE_EXTRACTOR environment variable set, its traces cannot be merged.