概要

AWS Step Functions では、ビジュアルなワークフローを使用して、分散アプリケーションおよびマイクロサービスのコンポーネントを調整できます。

このインテグレーションを有効にすると、Datadog にすべての Step Functions メトリクスを表示できます。

Datadog のネイティブ AWS Step Function モニタリングは、公開ベータ版で利用可能です。強化されたメトリクスとトレースで Step Function をインスツルメンテーションするには、サーバーレスのドキュメントをご覧ください。

計画と使用

インフラストラクチャーリスト

Amazon Web Services インテグレーションをまだセットアップしていない場合は、最初にセットアップします。次に、AWS/Datadog ロールのポリシードキュメントに以下のアクセス許可を追加します。

states:ListStateMachines,
states:DescribeStateMachine

メトリクスの収集

  1. AWS インテグレーションページMetric Collection タブで、States を有効にします。ステートマシンが AWS Lambda を使用している場合は、Lambda がチェックされていることも確認してください。
  2. Datadog - AWS Step Functions インテグレーションをインストールします。

AWS Lambda メトリクスの増強

Step Functions ステートが Lambda 関数である場合、このインテグレーションをインストールすると、Lambda メトリクスにタグ statemachinenamestatemachinearnstepname が追加されます。これにより、Lambda 関数がどのステートマシンに属しているかを確認でき、サーバーレスページでこれを視覚化できます。

収集データ

  1. AWS Step Functions を CloudWatch にログを送信するように構成します。: Datadog がログのソースを識別し、自動的にパースするために、CloudWatch のロググループのデフォルトのプレフィックス /aws/vendedlogs/states を使用します。
  2. Datadog にログを送信します

トレースの収集

AWS X-Ray トレーシングを有効にする

AWS Step Functions の分散型トレーシングを有効にするには

  1. Datadog AWS X-Ray インテグレーションを有効にします。
  2. AWS コンソールにログインします。
  3. Step Functions にアクセスします。
  4. Step Functions の 1 つを選択して、Edit をクリックします。
  5. ページの下部にある Tracing セクションまでスクロールし、Enable X-Ray tracing チェックボックスをオンにします。
  6. 推奨: より詳細なトレースを行うには、関数に AWS X-Ray トレーシングライブラリをインストールしてください。

リアルユーザーモニタリング

データセキュリティ

aws.states.activities_failed
(count)
The number of activities that failed.
aws.states.activities_heartbeat_timed_out
(count)
The number of activities that were timed out due to a heartbeat timeout.
aws.states.activities_scheduled
(count)
The number of activities that were scheduled.
aws.states.activities_started
(count)
The number of activities that were started.
aws.states.activities_succeeded
(count)
The number of activities that completed successfully.
aws.states.activities_timed_out
(count)
The number of activities that were timed out on close.
aws.states.activity_run_time
(gauge)
The average time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_schedule_time
(gauge)
The avg time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.maximum
(gauge)
The maximum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.minimum
(gauge)
The minimum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_time
(gauge)
The average time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.enhanced.execution.execution_time
(gauge)
The average execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.maximum
(gauge)
The maximum execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.minimum
(gauge)
The minimum execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.p95
(gauge)
The 95th percentile of the execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.p99
(gauge)
The 99th percentile of the execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.failed
(count)
The number of state machine executions that failed.
aws.states.enhanced.execution.started
(count)
The number of state machine executions started.
aws.states.enhanced.execution.succeeded
(count)
The number of state machine executions that succeeded.
aws.states.enhanced.task.execution.task_duration
(gauge)
The average duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.maximum
(gauge)
The maximum duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.minimum
(gauge)
The minimum duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.p95
(gauge)
The 95th percentile of the duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.p99
(gauge)
The 99th percentile of the duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_failed
(count)
The number of state machine task executions that failed.
aws.states.enhanced.task.execution.task_started
(count)
The number of state machine task executions started.
aws.states.enhanced.task.execution.task_succeeded
(count)
The number of state machine task executions that succeeded.
aws.states.execution_throttled
(count)
The number of StateEntered events in addition to retries
aws.states.execution_time
(gauge)
The average time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the execution started and the time it closed.il
Shown as millisecond
aws.states.executions_aborted
(count)
The number of executions that were aborted/terminated.
aws.states.executions_failed
(count)
The number of executions that failed.
aws.states.executions_started
(count)
The number of executions started.
aws.states.executions_succeeded
(count)
The number of executions that completed successfully.
aws.states.executions_timed_out
(count)
The number of executions that timed out for any reason.
aws.states.lambda_function_run_time
(gauge)
The average time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_schedule_time
(gauge)
The avg time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.maximum
(gauge)
The maximum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.minimum
(gauge)
The minimum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_time
(gauge)
The average time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_functions_failed
(count)
The number of lambda functions that failed.
aws.states.lambda_functions_heartbeat_timed_out
(count)
The number of lambda functions that were timed out due to a heartbeat timeout.
aws.states.lambda_functions_scheduled
(count)
The number of lambda functions that were scheduled.
aws.states.lambda_functions_started
(count)
The number of lambda functions that were started.
aws.states.lambda_functions_succeeded
(count)
The number of lambda functions that completed successfully.
aws.states.lambda_functions_timed_out
(count)
The number of lambda functions that were timed out on close.

ヘルプ

AWS Step Functions インテグレーションには、イベントは含まれません。

ヘルプ

AWS Step Functions インテグレーションには、サービスのチェック機能は含まれません。

ヘルプ

ご不明な点は、Datadog のサポートチームまでお問合せください。