---
title: AWS Step Functions
description: >-
  Coordinate the components of distributed applications and microservices using
  visual workflows.
---

# AWS Step Functions

## Overview{% #overview %}

AWS Step Functions enables you to coordinate the components of distributed applications and microservices using visual workflows.

This integration enables you to see basic AWS Step Functions metrics in Datadog. For tracing and enhanced metrics, see [Datadog Serverless Monitoring for AWS Step Functions](https://docs.datadoghq.com/serverless/step_functions).

## Setup{% #setup %}

### Installation{% #installation %}

If you haven't already, set up the [Amazon Web Services integration](https://docs.datadoghq.com/integrations/amazon_web_services/) first. Then, add the following permissions to the policy document for your AWS/Datadog Role:

```text
states:ListStateMachines,
states:DescribeStateMachine
```
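
As an illustration, the two permissions above could sit in a policy statement like the one built below. This is a minimal sketch, not the required policy: the helper name `build_step_functions_policy` is hypothetical, and the `"Resource": "*"` scope is an assumption that you may want to narrow for your account.

```python
import json


def build_step_functions_policy() -> dict:
    """Return an IAM policy document holding the two permissions listed above.

    Assumption: "Resource": "*" is used for brevity; scope it as needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "states:ListStateMachines",
                    "states:DescribeStateMachine",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the role's policy document.
    print(json.dumps(build_step_functions_policy(), indent=2))
```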

### Metric collection{% #metric-collection %}

1. In the [AWS integration page](https://app.datadoghq.com/integrations/amazon-web-services), ensure that `States` is enabled under the `Metric Collection` tab. If your state machines use AWS Lambda, also ensure that `Lambda` is checked.
1. Install the [Datadog - AWS Step Functions integration](https://app.datadoghq.com/integrations/amazon-step-functions).

#### Augmenting AWS Lambda metrics{% #augmenting-aws-lambda-metrics %}

If your Step Functions states are Lambda functions, installing this integration adds the [tags](https://docs.datadoghq.com/getting_started/tagging/) `statemachinename`, `statemachinearn`, and `stepname` to your Lambda metrics. These tags show which state machines your Lambda functions belong to, and you can visualize this relationship on the [Serverless page](https://docs.datadoghq.com/serverless/).

### Enhanced metric collection{% #enhanced-metric-collection %}

Datadog can also generate [enhanced metrics](https://docs.datadoghq.com/serverless/step_functions/enhanced-metrics) for your Step Functions to help you track the average or p99 of individual step durations. To make use of these enhanced metrics, see [Datadog Serverless Monitoring for AWS Step Functions](https://docs.datadoghq.com/serverless/step_functions).

### Log collection{% #log-collection %}

**Note**: Datadog's [automatic trigger setup](https://docs.datadoghq.com/logs/guide/send-aws-services-logs-with-the-datadog-lambda-function/?tab=awsconsole#automatically-set-up-triggers) is available for CloudWatch log groups only. For S3 buckets, use the [manual trigger setup](https://docs.datadoghq.com/logs/guide/send-aws-services-logs-with-the-datadog-lambda-function/#collecting-logs-from-s3-buckets).

1. Configure AWS Step Functions to [send logs to CloudWatch](https://docs.aws.amazon.com/step-functions/latest/dg/cw-logs.html). **Note**: Use the default CloudWatch log group prefix `/aws/vendedlogs/states` for Datadog to identify the source of the logs and parse them automatically.
1. [Send the logs to Datadog](https://docs.datadoghq.com/integrations/amazon_web_services/#log-collection).
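
Because automatic parsing depends on the log group prefix, it can help to check your log group names before sending logs. The following is a minimal sketch; the helper names and the example log group names are hypothetical, and only the prefix `/aws/vendedlogs/states` comes from the step above.

```python
# Prefix that Datadog uses to identify Step Functions logs (from the step above).
DEFAULT_PREFIX = "/aws/vendedlogs/states"


def uses_default_prefix(log_group_name: str) -> bool:
    """Return True if the log group name starts with the default prefix."""
    return log_group_name.startswith(DEFAULT_PREFIX)


def find_nonconforming(log_group_names: list[str]) -> list[str]:
    """Return the log group names that do not use the default prefix."""
    return [name for name in log_group_names if not uses_default_prefix(name)]


if __name__ == "__main__":
    # Hypothetical log group names, for illustration only.
    names = [
        "/aws/vendedlogs/states/order-workflow-Logs",
        "/custom/step-functions/order-workflow",
    ]
    for name in find_nonconforming(names):
        print(f"Log group may not be parsed automatically: {name}")
```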

### Trace collection{% #trace-collection %}

You can enable trace collection in two ways: through [Datadog APM for AWS Step Functions](https://docs.datadoghq.com/serverless/step_functions), or through AWS X-Ray.

#### Enable tracing through Datadog APM for AWS Step Functions{% #enable-tracing-through-datadog-apm-for-aws-step-functions %}

To enable distributed tracing for your AWS Step Functions, see [Datadog Serverless Monitoring for AWS Step Functions](https://docs.datadoghq.com/serverless/step_functions).

#### Enable tracing through AWS X-Ray{% #enable-tracing-through-aws-x-ray %}

{% alert level="warning" %}
This option does not collect [enhanced metrics for AWS Step Functions](https://docs.datadoghq.com/serverless/step_functions/enhanced-metrics). For these metrics, you must enable tracing through [Datadog APM for AWS Step Functions](https://docs.datadoghq.com/serverless/step_functions).
{% /alert %}

To collect traces from your AWS Step Functions through AWS X-Ray:

1. Enable the [Datadog AWS X-Ray integration](https://docs.datadoghq.com/tracing/guide/serverless_enable_aws_xray/).
1. Log in to the AWS Console.
1. Browse to **Step Functions**.
1. Select one of your Step Functions and click **Edit**.
1. Scroll to the **Tracing** section at the bottom of the page and check the box to **Enable X-Ray tracing**.
1. Recommended: [Install the AWS X-Ray tracing library](https://docs.datadoghq.com/tracing/guide/serverless_enable_aws_xray/#installing-the-x-ray-client-libraries) in your functions for more detailed traces.
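
The console steps above can also be scripted. The following is a minimal sketch, assuming boto3's Step Functions client and its `update_state_machine` call with a `tracingConfiguration` argument. The helper name `enable_xray_tracing` and the example ARN are hypothetical, and the caller is assumed to have permission to update the state machine.

```python
def enable_xray_tracing(sfn_client, state_machine_arn: str) -> dict:
    """Turn on X-Ray tracing for one state machine.

    `sfn_client` is expected to be a boto3 Step Functions client,
    for example boto3.client("stepfunctions").
    """
    return sfn_client.update_state_machine(
        stateMachineArn=state_machine_arn,
        tracingConfiguration={"enabled": True},
    )


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#
#   import boto3
#   client = boto3.client("stepfunctions")
#   enable_xray_tracing(
#       client,
#       "arn:aws:states:us-east-1:123456789012:stateMachine:example",
#   )
```

Passing the client in as an argument keeps the helper easy to test with a stub and lets you reuse one client across several state machines.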

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **aws.states.activities\_failed**(count)                             | The number of activities that failed.                                                                                                                |
| **aws.states.activities\_heartbeat\_timed\_out**(count)              | The number of activities that were timed out due to a heartbeat timeout.                                                                             |
| **aws.states.activities\_scheduled**(count)                          | The number of activities that were scheduled.                                                                                                        |
| **aws.states.activities\_started**(count)                            | The number of activities that were started.                                                                                                          |
| **aws.states.activities\_succeeded**(count)                          | The number of activities that completed successfully.                                                                                                |
| **aws.states.activities\_timed\_out**(count)                         | The number of activities that were timed out on close.                                                                                               |
| **aws.states.activity\_run\_time**(gauge)                            | The average time interval, in milliseconds, between the time the activity was started and when it was closed. *Shown as millisecond*                 |
| **aws.states.activity\_run\_time.maximum**(gauge)                    | The maximum time interval, in milliseconds, between the time the activity was started and when it was closed. *Shown as millisecond*                 |
| **aws.states.activity\_run\_time.minimum**(gauge)                    | The minimum time interval, in milliseconds, between the time the activity was started and when it was closed. *Shown as millisecond*                 |
| **aws.states.activity\_run\_time.p95**(gauge)                        | The 95th percentile time interval, in milliseconds, between the time the activity was started and when it was closed. *Shown as millisecond*         |
| **aws.states.activity\_run\_time.p99**(gauge)                        | The 99th percentile time interval, in milliseconds, between the time the activity was started and when it was closed. *Shown as millisecond*         |
| **aws.states.activity\_schedule\_time**(gauge)                       | The average time interval, in milliseconds, that the activity stayed in the schedule state. *Shown as millisecond*                                   |
| **aws.states.activity\_schedule\_time.maximum**(gauge)               | The maximum time interval, in milliseconds, that the activity stayed in the schedule state. *Shown as millisecond*                                   |
| **aws.states.activity\_schedule\_time.minimum**(gauge)               | The minimum time interval, in milliseconds, that the activity stayed in the schedule state. *Shown as millisecond*                                   |
| **aws.states.activity\_schedule\_time.p95**(gauge)                   | The 95th percentile time interval, in milliseconds, that the activity stayed in the schedule state. *Shown as millisecond*                           |
| **aws.states.activity\_schedule\_time.p99**(gauge)                   | The 99th percentile time interval, in milliseconds, that the activity stayed in the schedule state. *Shown as millisecond*                           |
| **aws.states.activity\_time**(gauge)                                 | The average time interval, in milliseconds, between the time the activity was scheduled and when it was closed. *Shown as millisecond*               |
| **aws.states.activity\_time.maximum**(gauge)                         | The maximum time interval, in milliseconds, between the time the activity was scheduled and when it was closed. *Shown as millisecond*               |
| **aws.states.activity\_time.minimum**(gauge)                         | The minimum time interval, in milliseconds, between the time the activity was scheduled and when it was closed. *Shown as millisecond*               |
| **aws.states.activity\_time.p95**(gauge)                             | The 95th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed. *Shown as millisecond*       |
| **aws.states.activity\_time.p99**(gauge)                             | The 99th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed. *Shown as millisecond*       |
| **aws.states.execution\_throttled**(count)                           | The number of StateEntered events and retries that have been throttled.                                                                              |
| **aws.states.execution\_time**(gauge)                                | The average time interval, in milliseconds, between the time the execution started and the time it closed. *Shown as millisecond*                    |
| **aws.states.execution\_time.maximum**(gauge)                        | The maximum time interval, in milliseconds, between the time the execution started and the time it closed. *Shown as millisecond*                    |
| **aws.states.execution\_time.minimum**(gauge)                        | The minimum time interval, in milliseconds, between the time the execution started and the time it closed. *Shown as millisecond*                    |
| **aws.states.execution\_time.p95**(gauge)                            | The 95th percentile time interval, in milliseconds, between the time the execution started and the time it closed. *Shown as millisecond*            |
| **aws.states.execution\_time.p99**(gauge)                            | The 99th percentile time interval, in milliseconds, between the time the execution started and the time it closed. *Shown as millisecond*            |
| **aws.states.executions\_aborted**(count)                            | The number of executions that were aborted/terminated.                                                                                               |
| **aws.states.executions\_failed**(count)                             | The number of executions that failed.                                                                                                                |
| **aws.states.executions\_started**(count)                            | The number of executions started.                                                                                                                    |
| **aws.states.executions\_succeeded**(count)                          | The number of executions that completed successfully.                                                                                                |
| **aws.states.executions\_timed\_out**(count)                         | The number of executions that timed out for any reason.                                                                                              |
| **aws.states.lambda\_function\_run\_time**(gauge)                    | The average time interval, in milliseconds, between the time the lambda function was started and when it was closed. *Shown as millisecond*          |
| **aws.states.lambda\_function\_run\_time.maximum**(gauge)            | The maximum time interval, in milliseconds, between the time the lambda function was started and when it was closed. *Shown as millisecond*          |
| **aws.states.lambda\_function\_run\_time.minimum**(gauge)            | The minimum time interval, in milliseconds, between the time the lambda function was started and when it was closed. *Shown as millisecond*          |
| **aws.states.lambda\_function\_run\_time.p95**(gauge)                | The 95th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed. *Shown as millisecond*  |
| **aws.states.lambda\_function\_run\_time.p99**(gauge)                | The 99th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed. *Shown as millisecond*  |
| **aws.states.lambda\_function\_schedule\_time**(gauge)               | The average time interval, in milliseconds, that the lambda function stayed in the schedule state. *Shown as millisecond*                            |
| **aws.states.lambda\_function\_schedule\_time.maximum**(gauge)       | The maximum time interval, in milliseconds, that the lambda function stayed in the schedule state. *Shown as millisecond*                            |
| **aws.states.lambda\_function\_schedule\_time.minimum**(gauge)       | The minimum time interval, in milliseconds, that the lambda function stayed in the schedule state. *Shown as millisecond*                            |
| **aws.states.lambda\_function\_schedule\_time.p95**(gauge)           | The 95th percentile time interval, in milliseconds, that the lambda function stayed in the schedule state. *Shown as millisecond*                    |
| **aws.states.lambda\_function\_schedule\_time.p99**(gauge)           | The 99th percentile time interval, in milliseconds, that the lambda function stayed in the schedule state. *Shown as millisecond*                    |
| **aws.states.lambda\_function\_time**(gauge)                         | The average time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed. *Shown as millisecond*        |
| **aws.states.lambda\_function\_time.maximum**(gauge)                 | The maximum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed. *Shown as millisecond*        |
| **aws.states.lambda\_function\_time.minimum**(gauge)                 | The minimum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed. *Shown as millisecond*        |
| **aws.states.lambda\_function\_time.p95**(gauge)                     | The 95th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed. *Shown as millisecond* |
| **aws.states.lambda\_function\_time.p99**(gauge)                     | The 99th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed. *Shown as millisecond* |
| **aws.states.lambda\_functions\_failed**(count)                      | The number of lambda functions that failed.                                                                                                          |
| **aws.states.lambda\_functions\_heartbeat\_timed\_out**(count)       | The number of lambda functions that were timed out due to a heartbeat timeout.                                                                       |
| **aws.states.lambda\_functions\_scheduled**(count)                   | The number of lambda functions that were scheduled.                                                                                                  |
| **aws.states.lambda\_functions\_started**(count)                     | The number of lambda functions that were started.                                                                                                    |
| **aws.states.lambda\_functions\_succeeded**(count)                   | The number of lambda functions that completed successfully.                                                                                          |
| **aws.states.lambda\_functions\_timed\_out**(count)                  | The number of lambda functions that were timed out on close.                                                                                         |
| **aws.states.enhanced.execution.execution\_time**(gauge)             | The average execution time of the state machine. *Shown as nanosecond*                                                                               |
| **aws.states.enhanced.execution.execution\_time.maximum**(gauge)     | The maximum execution time of the state machine. *Shown as nanosecond*                                                                               |
| **aws.states.enhanced.execution.execution\_time.minimum**(gauge)     | The minimum execution time of the state machine. *Shown as nanosecond*                                                                               |
| **aws.states.enhanced.execution.execution\_time.p95**(gauge)         | The 95th percentile of the execution time of the state machine. *Shown as nanosecond*                                                                |
| **aws.states.enhanced.execution.execution\_time.p99**(gauge)         | The 99th percentile of the execution time of the state machine. *Shown as nanosecond*                                                                |
| **aws.states.enhanced.execution.failed**(count)                      | The number of state machine executions that failed.                                                                                                  |
| **aws.states.enhanced.execution.started**(count)                     | The number of state machine executions started.                                                                                                      |
| **aws.states.enhanced.execution.succeeded**(count)                   | The number of state machine executions that succeeded.                                                                                               |
| **aws.states.enhanced.task.execution.task\_duration**(gauge)         | The average duration of one task in the state machine. *Shown as nanosecond*                                                                         |
| **aws.states.enhanced.task.execution.task\_duration.maximum**(gauge) | The maximum duration of one task in the state machine. *Shown as nanosecond*                                                                         |
| **aws.states.enhanced.task.execution.task\_duration.minimum**(gauge) | The minimum duration of one task in the state machine. *Shown as nanosecond*                                                                         |
| **aws.states.enhanced.task.execution.task\_duration.p95**(gauge)     | The 95th percentile of the duration of one task in the state machine. *Shown as nanosecond*                                                          |
| **aws.states.enhanced.task.execution.task\_duration.p99**(gauge)     | The 99th percentile of the duration of one task in the state machine. *Shown as nanosecond*                                                          |
| **aws.states.enhanced.task.execution.task\_failed**(count)           | The number of state machine task executions that failed.                                                                                             |
| **aws.states.enhanced.task.execution.task\_started**(count)          | The number of state machine task executions started.                                                                                                 |
| **aws.states.enhanced.task.execution.task\_succeeded**(count)        | The number of state machine task executions that succeeded.                                                                                          |
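
The table above mixes units: the standard `aws.states.*` timing metrics are shown in milliseconds, while the `aws.states.enhanced.*` timing metrics are shown in nanoseconds. If you export raw values and compare the two families yourself, a conversion along these lines may help. This is a minimal sketch; the function names and sample values are hypothetical.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def nanoseconds_to_milliseconds(value_ns: float) -> float:
    """Convert an enhanced-metric duration (nanoseconds) to milliseconds."""
    return value_ns / NANOSECONDS_PER_MILLISECOND


def milliseconds_to_nanoseconds(value_ms: float) -> float:
    """Convert a standard-metric duration (milliseconds) to nanoseconds."""
    return value_ms * NANOSECONDS_PER_MILLISECOND


if __name__ == "__main__":
    # Hypothetical raw values, for illustration only.
    enhanced_execution_time_ns = 2_500_000
    standard_execution_time_ms = 3

    print(nanoseconds_to_milliseconds(enhanced_execution_time_ns), "ms")
    print(milliseconds_to_nanoseconds(standard_execution_time_ms), "ns")
```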

### Events{% #events %}

The AWS Step Functions integration does not include any events.

### Service Checks{% #service-checks %}

The AWS Step Functions integration does not include any service checks.

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
