Amazon Managed Workflows for Apache Airflow (MWAA)

Overview

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed service for Apache Airflow that makes it easier to build and manage workflows in the cloud.

Enable this integration to see all of your MWAA metrics in Datadog.

Setup

Installation

If you haven't already, set up the Amazon Web Services integration first.

Metric collection

  1. On the AWS integration page, ensure that MWAA is enabled under the Metric Collection tab.
  2. Install the Datadog - Amazon Managed Workflows for Apache Airflow (MWAA) integration.

Log collection

  1. Configure AWS MWAA to send logs to CloudWatch.
  2. Send the logs to Datadog.
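
Step 1 above can also be done programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. It builds the `LoggingConfiguration` structure accepted by the MWAA `UpdateEnvironment` API and enables every log category. The helper names and the environment name shown in the usage note are hypothetical, not part of the Datadog integration.

```python
# Log categories accepted by the MWAA LoggingConfiguration structure.
LOG_CATEGORIES = (
    "DagProcessingLogs",
    "SchedulerLogs",
    "TaskLogs",
    "WebserverLogs",
    "WorkerLogs",
)


def build_logging_configuration(log_level="INFO"):
    """Return a LoggingConfiguration dict that enables every log category."""
    return {
        category: {"Enabled": True, "LogLevel": log_level}
        for category in LOG_CATEGORIES
    }


def apply_logging_configuration(environment_name, log_level="INFO"):
    """Hypothetical helper: push the configuration to an existing environment.

    boto3 is imported lazily so the dict builder above can be used
    (and tested) without AWS access.
    """
    import boto3

    client = boto3.client("mwaa")
    return client.update_environment(
        Name=environment_name,
        LoggingConfiguration=build_logging_configuration(log_level),
    )
```

Calling `apply_logging_configuration("my-mwaa-environment")` would update an environment with that (placeholder) name. Once the update completes, MWAA writes each category to its own CloudWatch log group, which can then be forwarded to Datadog as described in step 2.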

Data Collected

Metrics

aws.mwaa.collect_dbdags
(gauge)
Average milliseconds taken for fetching all Serialized Dags from DB. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.collect_dbdags.maximum
(gauge)
Maximum milliseconds taken for fetching all Serialized Dags from DB. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.collect_dbdags.minimum
(gauge)
Minimum milliseconds taken for fetching all Serialized Dags from DB. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.critical_section_busy
(count)
Count of times a scheduler process tried to get a lock on the critical section (needed to send tasks to the executor) and found it locked by another process. Only available in Airflow v2.
Shown as unit
aws.mwaa.critical_section_duration
(gauge)
Average milliseconds spent in the critical section of scheduler loop - only a single scheduler can enter this loop at a time. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.critical_section_duration.maximum
(gauge)
Maximum milliseconds spent in the critical section of scheduler loop - only a single scheduler can enter this loop at a time. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.critical_section_duration.minimum
(gauge)
Minimum milliseconds spent in the critical section of scheduler loop - only a single scheduler can enter this loop at a time. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.dag_bag_size
(count)
Number of DAGs found when the scheduler ran a scan based on its configuration. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.dag_callback_exceptions
(count)
Number of exceptions raised from DAG callbacks. When this happens, it means a DAG callback is not working. Only available in Airflow v2.
Shown as unit
aws.mwaa.dagdependency_check
(gauge)
Average milliseconds taken to check DAG dependencies. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagdependency_check.maximum
(gauge)
Maximum milliseconds taken to check DAG dependencies. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagdependency_check.minimum
(gauge)
Minimum milliseconds taken to check DAG dependencies. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagduration_failed
(gauge)
Milliseconds taken for a DagRun to reach failed state. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagduration_success
(gauge)
Milliseconds taken for a DagRun to reach success state. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_duration
(gauge)
Average milliseconds taken to load the given DAG file. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_duration.maximum
(gauge)
Maximum milliseconds taken to load the given DAG file. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_duration.minimum
(gauge)
Minimum milliseconds taken to load the given DAG file. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_run_seconds_ago
(gauge)
Seconds since the DAG file was last processed. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.dagfile_refresh_error
(count)
Number of failures loading any DAG files. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.dagschedule_delay
(gauge)
Milliseconds of delay between the scheduled DagRun start date and the actual DagRun start date. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.exception_failures
(count)
Number of failures caused by exception in the previous smart sensor poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.failed_slaemail_attempts
(count)
Number of failed SLA miss email notification attempts. Only available in Airflow v2.
Shown as unit
aws.mwaa.first_task_scheduling_delay
(gauge)
Milliseconds elapsed between first task start_date and dagrun expected start. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.import_errors
(count)
Number of errors from trying to parse DAG files. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.infra_failures
(count)
Number of infrastructure failures in the previous smart sensor poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.job_end
(count)
Number of ended jobs, e.g. SchedulerJob, LocalTaskJob. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.job_heartbeat_failure
(count)
Number of failed heartbeats for a job, e.g. SchedulerJob, LocalTaskJob. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.job_start
(count)
Number of started jobs, e.g. SchedulerJob, LocalTaskJob. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.manager_stalls
(count)
Number of stalled DagFileProcessorManager. Only available in Airflow v2.
Shown as unit
aws.mwaa.open_slots
(count)
Number of open slots on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.operator_failures
(count)
Operator failures. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.operator_successes
(count)
Operator successes. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.orphaned_tasks_adopted
(count)
Number of orphaned tasks adopted by the scheduler. Only available in Airflow v2.
Shown as unit
aws.mwaa.orphaned_tasks_cleared
(count)
Number of orphaned tasks cleared by the scheduler. Only available in Airflow v2.
Shown as unit
aws.mwaa.poked_exceptions
(count)
Number of exceptions in the previous smart sensor poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.poked_success
(count)
Number of newly succeeded tasks poked by the smart sensor in the previous poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.poked_tasks
(count)
Number of tasks poked by the smart sensor in the previous poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_open_slots
(count)
Number of open slots in the pool. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_queued_slots
(count)
Number of queued slots in the pool. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_running_slots
(count)
Number of running slots in the pool. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_starving_tasks
(count)
Number of starving tasks in the pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.pool_used_slots
(count)
Number of used slots in the pool. Only available in Airflow v1.
Shown as unit
aws.mwaa.processor_timeouts
(count)
Number of file processors that have been killed due to taking too long. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks
(count)
Sum number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks.average
(gauge)
Average number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks.max
(gauge)
Max number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks.min
(gauge)
Min number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks
(count)
Sum number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks.average
(gauge)
Average number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks.max
(gauge)
Max number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks.min
(gauge)
Min number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.scheduler_heartbeat
(count)
Scheduler heartbeats. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_instance_created_using_operator
(count)
Number of task instances created for a given Operator. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_instance_duration
(gauge)
Milliseconds taken to finish a task. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.task_instance_failures
(count)
Overall task instance failures. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_instance_finished
(count)
Number of completed tasks in a given DAG. Similar to job_end, but for tasks. Only available in Airflow v2.
Shown as unit
aws.mwaa.task_instance_previously_succeeded
(count)
Number of previously succeeded task instances. Only available in Airflow v2.
Shown as unit
aws.mwaa.task_instance_started
(count)
Number of started tasks in a given DAG. Similar to job_start, but for tasks. Only available in Airflow v2.
Shown as unit
aws.mwaa.task_instance_successes
(count)
Overall task instance successes. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_removed_from_dag
(count)
Number of tasks removed for a given dag (i.e. task no longer exists in DAG). Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_restored_to_dag
(count)
Number of tasks restored for a given dag (i.e. task instance which was previously in REMOVED state in the DB is added to DAG file). Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_timeout_error
(count)
Number of AirflowTaskTimeout errors raised when publishing Task to Celery Broker. Only available in Airflow v2.
Shown as unit
aws.mwaa.tasks_executable
(count)
Sum number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_executable.average
(gauge)
Average number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_executable.max
(gauge)
Max number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_executable.min
(gauge)
Min number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally
(count)
Sum number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally.average
(count)
Average number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally.max
(count)
Max number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally.min
(count)
Min number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_pending
(count)
Sum number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_pending.average
(gauge)
Average number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_pending.max
(gauge)
Max number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_pending.min
(gauge)
Min number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_running
(count)
Sum number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_running.average
(gauge)
Average number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_running.max
(gauge)
Max number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_running.min
(gauge)
Min number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving
(count)
Sum number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving.average
(gauge)
Average number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving.max
(gauge)
Max number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving.min
(gauge)
Min number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_without_dag_run
(count)
Number of tasks without DagRuns or with DagRuns not in Running state. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.total_parse_time
(gauge)
Average seconds taken to scan and import all DAG files once. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.total_parse_time.maximum
(gauge)
Maximum seconds taken to scan and import all DAG files once. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.total_parse_time.minimum
(gauge)
Minimum seconds taken to scan and import all DAG files once. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.zombies_killed
(count)
Zombie tasks killed. Available in both Airflow v1 and v2.
Shown as unit
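
Any of the metrics listed above can be used in a Datadog metric monitor. The following is a minimal sketch that assembles a monitor query string in Datadog's standard `aggregation(window):aggregation:metric{scope} comparator threshold` form. The helper name, the default window, and the example threshold are illustrative assumptions rather than recommended values.

```python
def build_monitor_query(metric, threshold, window="last_5m", scope="*"):
    """Build a Datadog metric monitor query that alerts when the
    average of `metric` over `window` exceeds `threshold`.

    `scope` is a tag filter; "*" matches every reporting source.
    """
    return f"avg({window}):avg:{metric}{{{scope}}} > {threshold}"


# Example: alert when the executor queue stays above 10 tasks on average.
query = build_monitor_query("aws.mwaa.queued_tasks.average", 10)
```

The resulting string can be pasted into the monitor editor or passed as the `query` field when creating a monitor through the Datadog API.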

Events

The Amazon Managed Workflows for Apache Airflow (MWAA) integration does not include any events.

Service Checks

The Amazon Managed Workflows for Apache Airflow (MWAA) integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.