AWS Step Functions

Overview

AWS Step Functions lets you coordinate the components of distributed applications and microservices using visual workflows.

This integration lets you view basic AWS Step Functions metrics in Datadog. For tracing and enhanced metrics, see Datadog Serverless Monitoring for AWS Step Functions.

Setup

Installation

If you haven't already, set up the Amazon Web Services integration first. Then add the following permissions to the policy document for your AWS/Datadog role:

states:ListStateMachines,
states:DescribeStateMachine
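
As a minimal sketch, the two permissions above could be expressed as an IAM policy statement like the one built below. The `Effect` and `Resource` values and the helper name `build_policy_statement` are assumptions for illustration and are not taken from this page; adapt them to the policy your role already uses.

```python
import json

# The two permissions listed above for the AWS/Datadog role.
STEP_FUNCTIONS_ACTIONS = ["states:ListStateMachines", "states:DescribeStateMachine"]


def build_policy_statement(resource="*"):
    """Return an IAM policy statement granting the Step Functions permissions.

    The wildcard resource is an assumption; narrow it if your policy requires it.
    """
    return {
        "Effect": "Allow",
        "Action": list(STEP_FUNCTIONS_ACTIONS),
        "Resource": resource,
    }


# Wrap the statement in a standard IAM policy document and print it as JSON.
policy_document = {"Version": "2012-10-17", "Statement": [build_policy_statement()]}
print(json.dumps(policy_document, indent=2))
```

The printed JSON can be merged into the existing policy document rather than replacing it.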

Metric collection

  1. On the AWS integration page, make sure that States is enabled under the Metric Collection tab. If your state machines use AWS Lambda, also make sure that Lambda is enabled.
  2. Install the AWS Step Functions integration in Datadog.

Augmenting AWS Lambda metrics

If your Step Functions states are Lambda functions, installing this integration adds the tags statemachinename, statemachinearn, and stepname to your Lambda metrics. This lets you see which state machines your Lambda functions belong to, and you can visualize this on the Serverless page.
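
To illustrate how the added tags might be used, the sketch below builds a Datadog metric query string that filters a Lambda metric by statemachinename and groups it by stepname. The helper name, the choice of `aws.lambda.duration`, and the example state machine name are assumptions for illustration; the tag value must match what appears on your own metrics.

```python
def lambda_duration_query(state_machine_name):
    """Build a query for average Lambda duration per step of one state machine.

    Assumes the statemachinename and stepname tags described above are present
    on the Lambda metrics.
    """
    return (
        f"avg:aws.lambda.duration{{statemachinename:{state_machine_name}}}"
        " by {stepname}"
    )


# Hypothetical state machine name, used only as an example.
print(lambda_duration_query("my-state-machine"))
```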

Enhanced metric collection

Datadog can also generate enhanced metrics for your Step Functions to help you track the average or p99 of individual step durations. To use these enhanced metrics, see Datadog Serverless Monitoring for AWS Step Functions.

Log collection

  1. Configure AWS Step Functions to send logs to CloudWatch. Note: Use the default CloudWatch log group prefix /aws/vendedlogs/states so that Datadog can identify the source of the logs and parse them automatically.
  2. Send the logs to Datadog.
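
A quick way to verify the note in step 1 is to check log group names against the default prefix before forwarding them. This is a minimal sketch; the function name and the sample log group names are hypothetical.

```python
# Default CloudWatch log group prefix mentioned in step 1.
DEFAULT_PREFIX = "/aws/vendedlogs/states"


def uses_default_prefix(log_group_name):
    """Return True if the log group name starts with the default prefix."""
    return log_group_name.startswith(DEFAULT_PREFIX)


# Hypothetical log group names, used only as examples.
for name in ["/aws/vendedlogs/states/my-state-machine-Logs", "/custom/step-functions"]:
    print(name, uses_default_prefix(name))
```

Log groups that fail this check can still be forwarded, but per the note above they may not be identified and parsed automatically.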

Trace collection

You can enable trace collection in two ways: through Datadog APM for AWS Step Functions or through AWS X-Ray.

Enable tracing through Datadog APM for AWS Step Functions

To enable distributed tracing for your AWS Step Functions, see Datadog Serverless Monitoring for AWS Step Functions.

Enable tracing through AWS X-Ray

This option does not collect enhanced metrics for AWS Step Functions. To collect these metrics, you must enable tracing through Datadog APM for AWS Step Functions.

To collect traces from your AWS Step Functions through AWS X-Ray:

  1. Enable the AWS X-Ray integration in Datadog.
  2. Log in to the AWS console.
  3. Go to Step Functions.
  4. Select one of your Step Functions and click Edit.
  5. Scroll to the Tracing section at the bottom of the page and check the box to Enable X-Ray tracing.
  6. Recommended: Install the AWS X-Ray tracing library in your functions for more detailed traces.
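
Steps 2 through 5 use the console, but the same setting can likely be applied programmatically. The sketch below is one possible approach using the Step Functions UpdateStateMachine API through a boto3 client; this page does not describe it, and the function name and ARN are placeholders. The client is passed in as a parameter so the function itself has no dependency on AWS credentials.

```python
def enable_xray_tracing(sfn_client, state_machine_arn):
    """Turn on X-Ray tracing for one state machine.

    sfn_client is expected to be a boto3 Step Functions client
    (boto3.client("stepfunctions")) or an object with the same method.
    """
    return sfn_client.update_state_machine(
        stateMachineArn=state_machine_arn,
        tracingConfiguration={"enabled": True},
    )


# Usage (requires boto3 and AWS credentials; the ARN below is a placeholder):
#   import boto3
#   enable_xray_tracing(
#       boto3.client("stepfunctions"),
#       "arn:aws:states:us-east-1:123456789012:stateMachine:example",
#   )
```

The calling role needs permission to update the state machine, which goes beyond the read-only permissions listed under Installation.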

Data collected

Metrics

aws.states.activities_failed
(count)
The number of activities that failed.
aws.states.activities_heartbeat_timed_out
(count)
The number of activities that were timed out due to a heartbeat timeout.
aws.states.activities_scheduled
(count)
The number of activities that were scheduled.
aws.states.activities_started
(count)
The number of activities that were started.
aws.states.activities_succeeded
(count)
The number of activities that completed successfully.
aws.states.activities_timed_out
(count)
The number of activities that were timed out on close.
aws.states.activity_run_time
(gauge)
The average time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_schedule_time
(gauge)
The average time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.maximum
(gauge)
The maximum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.minimum
(gauge)
The minimum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_time
(gauge)
The average time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.enhanced.execution.execution_time
(gauge)
The average execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.maximum
(gauge)
The maximum execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.minimum
(gauge)
The minimum execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.p95
(gauge)
The 95th percentile of the execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.p99
(gauge)
The 99th percentile of the execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.failed
(count)
The number of state machine executions that failed.
aws.states.enhanced.execution.started
(count)
The number of state machine executions started.
aws.states.enhanced.execution.succeeded
(count)
The number of state machine executions that succeeded.
aws.states.enhanced.task.execution.task_duration
(gauge)
The average duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.maximum
(gauge)
The maximum duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.minimum
(gauge)
The minimum duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.p95
(gauge)
The 95th percentile of the duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.p99
(gauge)
The 99th percentile of the duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_failed
(count)
The number of state machine task executions that failed.
aws.states.enhanced.task.execution.task_started
(count)
The number of state machine task executions started.
aws.states.enhanced.task.execution.task_succeeded
(count)
The number of state machine task executions that succeeded.
aws.states.execution_throttled
(count)
The number of StateEntered events and retries that were throttled.
aws.states.execution_time
(gauge)
The average time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.executions_aborted
(count)
The number of executions that were aborted/terminated.
aws.states.executions_failed
(count)
The number of executions that failed.
aws.states.executions_started
(count)
The number of executions started.
aws.states.executions_succeeded
(count)
The number of executions that completed successfully.
aws.states.executions_timed_out
(count)
The number of executions that timed out for any reason.
aws.states.lambda_function_run_time
(gauge)
The average time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_schedule_time
(gauge)
The average time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.maximum
(gauge)
The maximum time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.minimum
(gauge)
The minimum time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_time
(gauge)
The average time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_functions_failed
(count)
The number of lambda functions that failed.
aws.states.lambda_functions_heartbeat_timed_out
(count)
The number of lambda functions that were timed out due to a heartbeat timeout.
aws.states.lambda_functions_scheduled
(count)
The number of lambda functions that were scheduled.
aws.states.lambda_functions_started
(count)
The number of lambda functions that were started.
aws.states.lambda_functions_succeeded
(count)
The number of lambda functions that completed successfully.
aws.states.lambda_functions_timed_out
(count)
The number of lambda functions that were timed out on close.
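
Note that the list above uses two different units: the standard metrics (for example, aws.states.execution_time) are shown in milliseconds, while the enhanced metrics (for example, aws.states.enhanced.execution.execution_time) are shown in nanoseconds. When comparing the two outside Datadog, a conversion such as the following sketch may help; the function name is hypothetical.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def nanoseconds_to_milliseconds(value_ns):
    """Convert an enhanced-metric value (nanoseconds) to milliseconds."""
    return value_ns / NANOSECONDS_PER_MILLISECOND


# Example: an enhanced execution time of 2,500,000 ns corresponds to 2.5 ms.
print(nanoseconds_to_milliseconds(2_500_000))
```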

Events

The AWS Step Functions integration does not include any events.

Service checks

The AWS Step Functions integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.