Amazon Managed Workflows for Apache Airflow (MWAA)

Overview

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed service for Apache Airflow that makes it easy for you to build and manage your workflows in the cloud.

Enable this integration to see all your Amazon MWAA metrics in Datadog.

Setup

Installation

If you haven’t already, set up the Amazon Web Services integration first.

Metric collection

  1. On the AWS integration page, ensure that MWAA is enabled under the Metric Collection tab.
  2. Install the Datadog - Amazon Managed Workflows for Apache Airflow (MWAA) Integration.

Log collection

  1. Configure Amazon MWAA to send logs to CloudWatch.
  2. Send the logs to Datadog.
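
As a rough illustration of step 1, the sketch below builds the logging settings that the MWAA `UpdateEnvironment` API accepts in its `LoggingConfiguration` parameter. The helper name `build_logging_configuration` and the environment name `my-environment` are hypothetical, and the boto3 call is shown only as a comment; treat this as one possible approach rather than the required procedure.

```python
import json

# Log categories in the MWAA LoggingConfiguration structure.
LOG_CATEGORIES = (
    "DagProcessingLogs",
    "SchedulerLogs",
    "TaskLogs",
    "WebserverLogs",
    "WorkerLogs",
)

# Log levels accepted for each category.
VALID_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}


def build_logging_configuration(level="INFO", categories=LOG_CATEGORIES):
    """Return a LoggingConfiguration dict (hypothetical helper).

    Categories listed in `categories` are enabled at `level`;
    every other category is explicitly disabled.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    unknown = set(categories) - set(LOG_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown log categories: {sorted(unknown)}")
    return {
        name: {"Enabled": name in categories, "LogLevel": level}
        for name in LOG_CATEGORIES
    }


if __name__ == "__main__":
    config = build_logging_configuration("WARNING")
    print(json.dumps(config, indent=2))
    # Assumption: with boto3 installed and AWS credentials configured,
    # the dict could be applied to an environment like this:
    #   boto3.client("mwaa").update_environment(
    #       Name="my-environment", LoggingConfiguration=config)
```

Enabling every category sends the most data to CloudWatch; passing a shorter `categories` tuple limits what is collected.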

Data Collected

Metrics

aws.mwaa.collect_dbdags
(gauge)
Average milliseconds taken for fetching all Serialized Dags from DB. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.collect_dbdags.maximum
(gauge)
Maximum milliseconds taken for fetching all Serialized Dags from DB. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.collect_dbdags.minimum
(gauge)
Minimum milliseconds taken for fetching all Serialized Dags from DB. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.critical_section_busy
(count)
Count of times a scheduler process tried to get a lock on the critical section (needed to send tasks to the executor) and found it locked by another process. Only available in Airflow v2.
Shown as unit
aws.mwaa.critical_section_duration
(gauge)
Average milliseconds spent in the critical section of scheduler loop - only a single scheduler can enter this loop at a time. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.critical_section_duration.maximum
(gauge)
Maximum milliseconds spent in the critical section of scheduler loop - only a single scheduler can enter this loop at a time. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.critical_section_duration.minimum
(gauge)
Minimum milliseconds spent in the critical section of scheduler loop - only a single scheduler can enter this loop at a time. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.dag_bag_size
(count)
Number of DAGs found when the scheduler ran a scan based on its configuration. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.dag_callback_exceptions
(count)
Number of exceptions raised from DAG callbacks. When this happens, it means the DAG callback is not working. Only available in Airflow v2.
Shown as unit
aws.mwaa.dagdependency_check
(gauge)
Average milliseconds taken to check DAG dependencies. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagdependency_check.maximum
(gauge)
Maximum milliseconds taken to check DAG dependencies. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagdependency_check.minimum
(gauge)
Minimum milliseconds taken to check DAG dependencies. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagduration_failed
(gauge)
Milliseconds taken for a DagRun to reach failed state. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagduration_success
(gauge)
Milliseconds taken for a DagRun to reach success state. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_duration
(gauge)
Average milliseconds taken to load the given DAG file. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_duration.maximum
(gauge)
Maximum milliseconds taken to load the given DAG file. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_duration.minimum
(gauge)
Minimum milliseconds taken to load the given DAG file. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.dagfile_processing_last_run_seconds_ago
(gauge)
Seconds since the DAG file was last processed. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.dagfile_refresh_error
(count)
Number of failures loading any DAG files. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.dagschedule_delay
(gauge)
Milliseconds of delay between the scheduled DagRun start date and the actual DagRun start date. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.exception_failures
(count)
Number of failures caused by exceptions in the previous smart sensor poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.failed_slaemail_attempts
(count)
Number of failed SLA miss email notification attempts. Only available in Airflow v2.
Shown as unit
aws.mwaa.first_task_scheduling_delay
(gauge)
Milliseconds elapsed between first task start_date and dagrun expected start. Only available in Airflow v2.
Shown as millisecond
aws.mwaa.import_errors
(count)
Number of errors from trying to parse DAG files. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.infra_failures
(count)
Number of infrastructure failures in the previous smart sensor poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.job_end
(count)
Number of ended jobs, e.g. SchedulerJob, LocalTaskJob. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.job_heartbeat_failure
(count)
Number of failed heartbeats for a job, e.g. SchedulerJob, LocalTaskJob. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.job_start
(count)
Number of started jobs, e.g. SchedulerJob, LocalTaskJob. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.manager_stalls
(count)
Number of stalled DagFileProcessorManager. Only available in Airflow v2.
Shown as unit
aws.mwaa.open_slots
(count)
Number of open slots on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.operator_failures
(count)
Operator failures. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.operator_successes
(count)
Operator successes. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.orphaned_tasks_adopted
(count)
Number of Orphaned tasks adopted by the Scheduler. Only available in Airflow v2.
Shown as unit
aws.mwaa.orphaned_tasks_cleared
(count)
Number of Orphaned tasks cleared by the Scheduler. Only available in Airflow v2.
Shown as unit
aws.mwaa.poked_exceptions
(count)
Number of exceptions in the previous smart sensor poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.poked_success
(count)
Number of newly succeeded tasks poked by the smart sensor in the previous poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.poked_tasks
(count)
Number of tasks poked by the smart sensor in the previous poking loop. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_open_slots
(count)
Number of open slots in the pool. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_queued_slots
(count)
Number of queued slots in the pool. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_running_slots
(count)
Number of running slots in the pool. Only available in Airflow v2.
Shown as unit
aws.mwaa.pool_starving_tasks
(count)
Number of starving tasks in the pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.pool_used_slots
(count)
Number of used slots in the pool. Only available in Airflow v1.
Shown as unit
aws.mwaa.processor_timeouts
(count)
Number of file processors that have been killed due to taking too long. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks
(count)
Sum number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks.average
(gauge)
Average number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks.max
(gauge)
Max number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.queued_tasks.min
(gauge)
Min number of queued tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks
(count)
Sum number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks.average
(gauge)
Average number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks.max
(gauge)
Max number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.running_tasks.min
(gauge)
Min number of running tasks on executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.scheduler_heartbeat
(count)
Scheduler heartbeats. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_instance_created_using_operator
(count)
Number of task instances created for a given Operator. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_instance_duration
(gauge)
Milliseconds taken to finish a task. Available in both Airflow v1 and v2.
Shown as millisecond
aws.mwaa.task_instance_failures
(count)
Overall task instance failures. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_instance_finished
(count)
Number of completed tasks in a given DAG. Similar to job_end, but for tasks. Only available in Airflow v2.
Shown as unit
aws.mwaa.task_instance_previously_succeeded
(count)
Number of previously succeeded task instances. Only available in Airflow v2.
Shown as unit
aws.mwaa.task_instance_started
(count)
Number of started tasks in a given DAG. Similar to job_start, but for tasks. Only available in Airflow v2.
Shown as unit
aws.mwaa.task_instance_successes
(count)
Overall task instance successes. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_removed_from_dag
(count)
Number of tasks removed for a given dag (i.e. task no longer exists in DAG). Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_restored_to_dag
(count)
Number of tasks restored for a given dag (i.e. task instance which was previously in REMOVED state in the DB is added to DAG file). Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.task_timeout_error
(count)
Number of AirflowTaskTimeout errors raised when publishing Task to Celery Broker. Only available in Airflow v2.
Shown as unit
aws.mwaa.tasks_executable
(count)
Sum number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_executable.average
(gauge)
Average number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_executable.max
(gauge)
Max number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_executable.min
(gauge)
Min number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally
(count)
Sum number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally.average
(count)
Average number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally.max
(count)
Max number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_killed_externally.min
(count)
Min number of tasks killed externally. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_pending
(count)
Sum number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_pending.average
(gauge)
Average number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_pending.max
(gauge)
Max number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_pending.min
(gauge)
Min number of tasks pending. Only available in Airflow v1.
Shown as unit
aws.mwaa.tasks_running
(count)
Sum number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_running.average
(gauge)
Average number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_running.max
(gauge)
Max number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_running.min
(gauge)
Min number of tasks running in executor. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving
(count)
Sum number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving.average
(gauge)
Average number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving.max
(gauge)
Max number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_starving.min
(gauge)
Min number of tasks that cannot be scheduled because of no open slot in pool. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.tasks_without_dag_run
(count)
Number of tasks without DagRuns or with DagRuns not in Running state. Available in both Airflow v1 and v2.
Shown as unit
aws.mwaa.total_parse_time
(gauge)
Average seconds taken to scan and import all DAG files once. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.total_parse_time.maximum
(gauge)
Maximum seconds taken to scan and import all DAG files once. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.total_parse_time.minimum
(gauge)
Minimum seconds taken to scan and import all DAG files once. Available in both Airflow v1 and v2.
Shown as second
aws.mwaa.zombies_killed
(count)
Zombie tasks killed. Available in both Airflow v1 and v2.
Shown as unit
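
The metric names above can be used in Datadog metric queries of the form `aggregator:metric{scope}`. The following is a minimal sketch of composing such query strings; the helper `metric_query` and the tag `environment:my-airflow-env` are hypothetical, so check which tags your MWAA metrics actually carry before scoping a query.

```python
# Space aggregators supported in Datadog metric queries.
AGGREGATORS = {"avg", "sum", "min", "max"}


def metric_query(metric, aggregator="avg", scope="*"):
    """Compose a Datadog metric query string (hypothetical helper).

    `scope` is a comma-separated tag filter, or "*" for all sources.
    """
    if aggregator not in AGGREGATORS:
        raise ValueError(f"unsupported aggregator: {aggregator}")
    if not metric.startswith("aws.mwaa."):
        raise ValueError(f"not an MWAA metric name: {metric}")
    # Doubled braces emit literal { and } around the scope.
    return f"{aggregator}:{metric}{{{scope}}}"


if __name__ == "__main__":
    # Average queue depth across all environments.
    print(metric_query("aws.mwaa.queued_tasks.average"))
    # Slowest full DAG parse for one environment (tag name is an assumption).
    print(metric_query("aws.mwaa.total_parse_time.maximum", "max",
                       "environment:my-airflow-env"))
```

Note that most duration metrics in this list are reported in milliseconds, while `aws.mwaa.total_parse_time` and `aws.mwaa.dagfile_processing_last_run_seconds_ago` are reported in seconds, so convert units before comparing them on one graph.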

Events

The Amazon Managed Workflows for Apache Airflow (MWAA) integration does not include any events.

Service Checks

The Amazon Managed Workflows for Apache Airflow (MWAA) integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.