AWS Step Functions

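Overview

AWS Step Functions lets you coordinate the components of distributed applications and microservices using visual workflows.

Enable this integration to see all of your Step Functions metrics in Datadog.

Native AWS Step Functions monitoring in Datadog is available in public beta. To instrument your Step Functions with enhanced metrics and traces, see the Serverless documentation.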

Setup

Installation

If you haven't already, set up the Amazon Web Services integration first. Then add the following permissions to the policy document for your AWS/Datadog role:

states:ListStateMachines,
states:DescribeStateMachine
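
As a rough illustration of where these permissions go, the sketch below builds a policy document containing only the two actions listed above. The helper name `build_policy_document` and the wildcard `Resource` are assumptions made for this example, not part of the integration; adapt the statement to your own role's existing policy.

```python
import json

# The two permissions named in this section.
STEP_FUNCTIONS_PERMISSIONS = [
    "states:ListStateMachines",
    "states:DescribeStateMachine",
]


def build_policy_document(actions):
    """Return an IAM policy document (as a dict) that allows the given actions.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: a wildcard resource is used to keep the sketch short.
                "Resource": "*",
            }
        ],
    }


policy = build_policy_document(STEP_FUNCTIONS_PERMISSIONS)
print(json.dumps(policy, indent=2))
```

In practice you would merge the `Action` entries into the statement your Datadog role already uses rather than attach a separate policy.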

Metric collection

  1. On the AWS integration page, ensure that States is enabled under the Metric Collection tab. If your state machines use AWS Lambda, also ensure that Lambda is checked.
  2. Install the Datadog - AWS Step Functions integration.

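Augmenting AWS Lambda metrics

If your Step Functions states are Lambda functions, installing this integration adds the tags statemachinename, statemachinearn, and stepname to your Lambda metrics. These tags let you see which state machines your Lambda functions belong to, and you can visualize this on the Serverless page.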
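
To make the use of these tags concrete, here is a minimal sketch that assembles a Datadog metric query string scoped by `statemachinename` and grouped by `stepname`. The function name, the choice of `aws.lambda.duration` as the metric, and the state machine name `order-processing` are assumptions chosen for illustration.

```python
def lambda_query_for_state_machine(metric, state_machine_name, group_by="stepname"):
    """Build a metric query scoped to one state machine.

    Hypothetical helper: it only formats a query string and calls no API.
    """
    # Doubled braces produce literal braces in an f-string.
    return f"avg:{metric}{{statemachinename:{state_machine_name}}} by {{{group_by}}}"


# Assumed metric and state machine name, for illustration only.
query = lambda_query_for_state_machine("aws.lambda.duration", "order-processing")
print(query)
```

The resulting string can be pasted into a dashboard widget or monitor to compare Lambda metrics across the steps of a single state machine.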

Log collection

  1. Configure AWS Step Functions to send logs to CloudWatch. Note: Use the default CloudWatch log group prefix /aws/vendedlogs/states so that Datadog can identify the source of the logs and parse them automatically.
  2. Send the logs to Datadog.
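
The logging setup in step 1 can be sketched as the configuration structure that the Step Functions API accepts. The example below only builds that structure and checks the log group prefix mentioned above; the helper name, region, account ID, and log group name are placeholders assumed for illustration, and `includeExecutionData` is set to `True` as an assumption you may want to change if your executions carry sensitive input.

```python
LOG_GROUP_PREFIX = "/aws/vendedlogs/states"


def build_logging_configuration(region, account_id, log_group_name, level="ALL"):
    """Return a loggingConfiguration dict for a state machine.

    Hypothetical helper: rejects log groups outside the prefix named in this section.
    """
    if not log_group_name.startswith(LOG_GROUP_PREFIX):
        raise ValueError(f"log group should start with {LOG_GROUP_PREFIX}")
    # The log group ARN passed to Step Functions ends with ':*'.
    arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*"
    return {
        "level": level,
        "includeExecutionData": True,  # assumption, see lead-in
        "destinations": [{"cloudWatchLogsLogGroup": {"logGroupArn": arn}}],
    }


# Placeholder region, account ID, and log group name.
logging_config = build_logging_configuration(
    "us-east-1", "123456789012", "/aws/vendedlogs/states/my-state-machine-logs"
)
print(logging_config)
```

With boto3 installed, a dict of this shape can be passed as the `loggingConfiguration` argument of the Step Functions client's `update_state_machine` call.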

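Trace collection

Enable AWS X-Ray tracing

To enable distributed tracing for your AWS Step Functions:

  1. Enable the Datadog AWS X-Ray integration.
  2. Log in to the AWS Console.
  3. Navigate to Step Functions.
  4. Select one of your Step Functions and click Edit.
  5. Scroll to the Tracing section at the bottom of the page and check the box to enable X-Ray tracing.
  6. Recommended: Install the AWS X-Ray tracing library in your functions for more detailed traces.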
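
If you prefer to script steps 2 through 5 rather than use the console, the same setting can be expressed as arguments to the Step Functions `UpdateStateMachine` API. The sketch below only prepares those arguments; the helper name and the state machine ARN are placeholders assumed for illustration, and the commented-out call requires boto3 and valid AWS credentials.

```python
def build_tracing_update(state_machine_arn, enabled=True):
    """Return keyword arguments that turn X-Ray tracing on or off.

    Hypothetical helper: it prepares arguments and does not call AWS.
    """
    return {
        "stateMachineArn": state_machine_arn,
        "tracingConfiguration": {"enabled": enabled},
    }


# Placeholder ARN for illustration.
update_kwargs = build_tracing_update(
    "arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine"
)
print(update_kwargs)

# With boto3 installed and credentials configured, the update could be applied as:
#   import boto3
#   boto3.client("stepfunctions").update_state_machine(**update_kwargs)
```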

Data Collected

Metrics

aws.states.activities_failed
(count)
The number of activities that failed.
aws.states.activities_heartbeat_timed_out
(count)
The number of activities that were timed out due to a heartbeat timeout.
aws.states.activities_scheduled
(count)
The number of activities that were scheduled.
aws.states.activities_started
(count)
The number of activities that were started.
aws.states.activities_succeeded
(count)
The number of activities that completed successfully.
aws.states.activities_timed_out
(count)
The number of activities that were timed out on close.
aws.states.activity_run_time
(gauge)
The average time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_run_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the activity was started and when it was closed.
Shown as millisecond
aws.states.activity_schedule_time
(gauge)
The average time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.maximum
(gauge)
The maximum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.minimum
(gauge)
The minimum time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_schedule_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, that the activity stayed in the schedule state.
Shown as millisecond
aws.states.activity_time
(gauge)
The average time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.activity_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the activity was scheduled and when it was closed.
Shown as millisecond
aws.states.enhanced.execution.execution_time
(gauge)
The average execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.maximum
(gauge)
The maximum execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.minimum
(gauge)
The minimum execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.p95
(gauge)
The 95th percentile of the execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.execution_time.p99
(gauge)
The 99th percentile of the execution time of the state machine.
Shown as nanosecond
aws.states.enhanced.execution.failed
(count)
The number of state machine executions that failed.
aws.states.enhanced.execution.started
(count)
The number of state machine executions started.
aws.states.enhanced.execution.succeeded
(count)
The number of state machine executions that succeeded.
aws.states.enhanced.task.execution.task_duration
(gauge)
The average duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.maximum
(gauge)
The maximum duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.minimum
(gauge)
The minimum duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.p95
(gauge)
The 95th percentile of the duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_duration.p99
(gauge)
The 99th percentile of the duration of one task in the state machine.
Shown as nanosecond
aws.states.enhanced.task.execution.task_failed
(count)
The number of state machine task executions that failed.
aws.states.enhanced.task.execution.task_started
(count)
The number of state machine task executions started.
aws.states.enhanced.task.execution.task_succeeded
(count)
The number of state machine task executions that succeeded.
aws.states.execution_throttled
(count)
The number of StateEntered events and retries that were throttled.
aws.states.execution_time
(gauge)
The average time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.execution_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the execution started and the time it closed.
Shown as millisecond
aws.states.executions_aborted
(count)
The number of executions that were aborted/terminated.
aws.states.executions_failed
(count)
The number of executions that failed.
aws.states.executions_started
(count)
The number of executions started.
aws.states.executions_succeeded
(count)
The number of executions that completed successfully.
aws.states.executions_timed_out
(count)
The number of executions that timed out for any reason.
aws.states.lambda_function_run_time
(gauge)
The average time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_run_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the lambda function was started and when it was closed.
Shown as millisecond
aws.states.lambda_function_schedule_time
(gauge)
The average time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.maximum
(gauge)
The maximum time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.minimum
(gauge)
The minimum time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_schedule_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, that the lambda function stayed in the schedule state.
Shown as millisecond
aws.states.lambda_function_time
(gauge)
The average time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.maximum
(gauge)
The maximum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.minimum
(gauge)
The minimum time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.p95
(gauge)
The 95th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_function_time.p99
(gauge)
The 99th percentile time interval, in milliseconds, between the time the lambda function was scheduled and when it was closed.
Shown as millisecond
aws.states.lambda_functions_failed
(count)
The number of lambda functions that failed.
aws.states.lambda_functions_heartbeat_timed_out
(count)
The number of lambda functions that were timed out due to a heartbeat timeout.
aws.states.lambda_functions_scheduled
(count)
The number of lambda functions that were scheduled.
aws.states.lambda_functions_started
(count)
The number of lambda functions that were started.
aws.states.lambda_functions_succeeded
(count)
The number of lambda functions that completed successfully.
aws.states.lambda_functions_timed_out
(count)
The number of lambda functions that were timed out on close.
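
Note that the enhanced duration metrics above are shown in nanoseconds, while the other duration metrics are shown in milliseconds. When comparing the two families side by side, a small conversion such as the sketch below may help. The function name and sample values are assumptions for illustration; the metric name prefixes come from the list above.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000

# Enhanced duration metrics from the list above; their suffixed variants
# (.maximum, .minimum, .p95, .p99) share these prefixes.
ENHANCED_DURATION_PREFIXES = (
    "aws.states.enhanced.execution.execution_time",
    "aws.states.enhanced.task.execution.task_duration",
)


def duration_in_milliseconds(metric_name, value):
    """Normalize a duration metric value to milliseconds.

    Hypothetical helper: enhanced duration metrics are converted from
    nanoseconds, and any other value is returned unchanged.
    """
    if metric_name.startswith(ENHANCED_DURATION_PREFIXES):
        return value / NANOSECONDS_PER_MILLISECOND
    return value


# Sample values assumed for illustration.
print(duration_in_milliseconds("aws.states.enhanced.execution.execution_time.p95", 2_500_000_000))
print(duration_in_milliseconds("aws.states.execution_time.p95", 2500))
```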

Events

The AWS Step Functions integration does not include any events.

Service Checks

The AWS Step Functions integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.