airflow.can_connect (count) | 1 if the check can connect to Airflow, otherwise 0 |
airflow.celery.execute_command.failure (count) | Number of non-zero exit codes from Celery tasks. |
airflow.celery.task_timeout_error (count) | Number of AirflowTaskTimeout errors raised when publishing a task to the Celery broker. Shown as error |
airflow.collect_db_dags (gauge) | Milliseconds taken to fetch all serialized DAGs from the DB. Shown as millisecond |
airflow.dag.callback_exceptions (count) | Number of exceptions raised from DAG callbacks. When this happens, the DAG callback is not working. Shown as error |
airflow.dag.loading_duration (gauge) | DAG loading duration in seconds (deprecated). Shown as second |
airflow.dag.queue_duration (gauge) | Milliseconds a task spends in the queue (deprecated, use queued_duration). Shown as millisecond |
airflow.dag.queued_duration (gauge) | Milliseconds a task spends in the Queued state, before being Running. Shown as millisecond |
airflow.dag.scheduled_duration (gauge) | Milliseconds a task spends in the Scheduled state, before being Queued. Shown as millisecond |
airflow.dag.task.duration (gauge) | Milliseconds taken to finish a task. Shown as millisecond |
airflow.dag.task.ongoing_duration (gauge) | Current duration for ongoing DAG tasks. Shown as second |
airflow.dag.task.total_running (gauge) | Total number of running tasks |
airflow.dag.task_removed (gauge) | Tasks removed from DAG. Shown as second |
airflow.dag.task_restored (gauge) | Tasks restored to DAG. Shown as second |
airflow.dag_file_processor_timeouts (gauge) | (DEPRECATED) same behavior as dag_processing.processor_timeouts |
airflow.dag_file_refresh_error (count) | Number of failures loading any DAG files. Shown as error |
airflow.dag_processing.file_path_queue_size (gauge) | Number of DAG files to be considered for the next scan. |
airflow.dag_processing.file_path_queue_update_count (count) | Number of times we’ve scanned the filesystem and queued all existing DAGs. |
airflow.dag_processing.import_errors (gauge) | Number of errors from trying to parse DAG files. Shown as error |
airflow.dag_processing.last_duration (gauge) | Milliseconds taken to load the given DAG file. Shown as millisecond |
airflow.dag_processing.last_num_of_db_queries (gauge) | Number of queries to Airflow database during parsing per dag_file. |
airflow.dag_processing.last_run.seconds_ago (gauge) | Seconds since <dag_file> was last processed. Shown as second |
airflow.dag_processing.last_runtime (gauge) | Seconds spent processing <dag_file> (in most recent iteration). Shown as second |
airflow.dag_processing.manager_stalls (count) | Number of DagFileProcessorManager stalls. |
airflow.dag_processing.other_callback_count (count) | Number of non-SLA callbacks received. |
airflow.dag_processing.processes (count) | Number of currently running DAG parsing processes |
airflow.dag_processing.processor_timeouts (gauge) | Number of file processors that have been killed due to taking too long |
airflow.dag_processing.sla_callback_count (count) | Number of SLA callbacks received. |
airflow.dag_processing.total_parse_time (gauge) | Seconds taken to scan and import all DAG files once. Shown as second |
airflow.dagbag_size (gauge) | Number of DAGs in the DagBag. |
airflow.dagrun.dependency_check (gauge) | Milliseconds taken to check DAG dependencies. Shown as millisecond |
airflow.dagrun.duration.failed (gauge) | Milliseconds taken for a DagRun to reach failed state. Shown as millisecond |
airflow.dagrun.duration.success (gauge) | Milliseconds taken for a DagRun to reach success state. Shown as millisecond |
airflow.dagrun.first_task_scheduling_delay (gauge) | Milliseconds elapsed between the first task's start_date and the DagRun's expected start. Shown as millisecond |
airflow.dagrun.schedule_delay (gauge) | Milliseconds of delay between the scheduled DagRun start date and the actual DagRun start date. Shown as millisecond |
airflow.dataset.orphaned (gauge) | Number of datasets marked as orphans because they are no longer referenced by any DAG. |
airflow.dataset.triggered_dagruns (count) | Number of DAG runs triggered by a dataset update. |
airflow.dataset.updates (count) | Number of updated datasets. |
airflow.executor.open_slots (gauge) | Number of open slots on executor |
airflow.executor.queued_tasks (gauge) | Number of queued tasks on executor. Shown as task |
airflow.executor.running_tasks (gauge) | Number of running tasks on executor. Shown as task |
airflow.healthy (count) | 1 if Airflow is healthy, otherwise 0 |
airflow.job.end (count) | Number of ended <job_name> jobs, e.g. SchedulerJob, LocalTaskJob. Shown as job |
airflow.job.heartbeat.failure (count) | Number of failed heartbeats for a <job_name> job, e.g. SchedulerJob, LocalTaskJob. Shown as error |
airflow.job.start (count) | Number of started <job_name> jobs, e.g. SchedulerJob, LocalTaskJob. Shown as job |
airflow.kubernetes_executor.adopt_task_instances.duration (gauge) | Milliseconds taken to adopt the task instances in Kubernetes Executor. Shown as millisecond |
airflow.kubernetes_executor.clear_not_launched_queued_tasks.duration (gauge) | Milliseconds taken for clearing not launched queued tasks in Kubernetes Executor. Shown as millisecond |
airflow.local_task_job.task_exit (count) | Number of LocalTaskJob terminations with a return_code. |
airflow.operator_failures (count) | Operator <operator_name> failures |
airflow.operator_successes (count) | Operator <operator_name> successes |
airflow.pool.deferred_slots (gauge) | Number of deferred slots in the pool. |
airflow.pool.open_slots (gauge) | Number of open slots in the pool |
airflow.pool.queued_slots (gauge) | Number of queued slots in the pool |
airflow.pool.running_slots (gauge) | Number of running slots in the pool |
airflow.pool.scheduled_slots (gauge) | Number of scheduled slots in the pool. |
airflow.pool.starving_tasks (gauge) | Number of starving tasks in the pool. Shown as task |
airflow.pool.used_slots (gauge) | Number of used slots in the pool |
airflow.previously_succeeded (count) | Number of previously succeeded task instances. Shown as task |
airflow.scheduler.critical_section_busy (count) | Count of times a scheduler process tried to get a lock on the critical section (needed to send tasks to the executor) and found it locked by another process. Shown as operation |
airflow.scheduler.critical_section_duration (gauge) | Milliseconds spent in the critical section of the scheduler loop; only a single scheduler can enter this section at a time. Shown as millisecond |
airflow.scheduler.critical_section_query_duration (gauge) | Milliseconds spent running the critical section task instance query. Shown as millisecond |
airflow.scheduler.orphaned_tasks.adopted (count) | Number of orphaned tasks adopted by the scheduler. Shown as task |
airflow.scheduler.orphaned_tasks.cleared (count) | Number of orphaned tasks cleared by the scheduler. Shown as task |
airflow.scheduler.scheduler_loop_duration (gauge) | Milliseconds spent running one scheduler loop. Shown as millisecond |
airflow.scheduler.tasks.executable (count) | Number of tasks that are ready for execution (set to queued) with respect to pool limits, dag concurrency, executor state, and priority. Shown as task |
airflow.scheduler.tasks.killed_externally (count) | Number of tasks killed externally. Shown as task |
airflow.scheduler.tasks.running (count) | Number of tasks running in executor. Shown as task |
airflow.scheduler.tasks.starving (count) | Number of tasks that cannot be scheduled because of no open slot in pool. Shown as task |
airflow.scheduler.tasks.without_dagrun (count) | Number of tasks without DagRuns or with DagRuns not in Running state. Shown as task |
airflow.scheduler_heartbeat (count) | Scheduler heartbeats |
airflow.sla_callback_notification_failure (count) | Number of failed SLA miss callback notification attempts. |
airflow.sla_email_notification_failure (count) | Number of failed SLA miss email notification attempts. Shown as task |
airflow.sla_missed (count) | Number of SLA misses. |
airflow.smart_sensor_operator.exception_failures (count) | Number of failures caused by exception in the previous smart sensor poking loop. Shown as error |
airflow.smart_sensor_operator.infra_failures (count) | Number of infrastructure failures in the previous smart sensor poking loop. Shown as error |
airflow.smart_sensor_operator.poked_exception (count) | Number of exceptions in the previous smart sensor poking loop. Shown as error |
airflow.smart_sensor_operator.poked_success (count) | Number of newly succeeded tasks poked by the smart sensor in the previous poking loop. Shown as task |
airflow.smart_sensor_operator.poked_tasks (count) | Number of tasks poked by the smart sensor in the previous poking loop. Shown as task |
airflow.task.cpu_usage (gauge) | Percentage of CPU used by a task. Shown as percent |
airflow.task.duration (gauge) | Milliseconds taken to run a task. Shown as millisecond |
airflow.task.instance_created (gauge) | Task instances created. Shown as second |
airflow.task.mem_usage (gauge) | Percentage of memory used by a task. Shown as percent |
airflow.task.queued_duration (gauge) | Milliseconds a task spends in the Queued state, before being Running. Shown as millisecond |
airflow.task.scheduled_duration (gauge) | Milliseconds a task spends in the Scheduled state, before being Queued. Shown as millisecond |
airflow.task_instance_created (count) | Number of task instances created for a given operator. Shown as task |
airflow.task_removed_from_dag (count) | Number of tasks removed for a given DAG (i.e. the task no longer exists in the DAG). Shown as task |
airflow.task_restored_to_dag (count) | Number of tasks restored for a given DAG (i.e. a task instance that was previously in REMOVED state in the DB is added back to the DAG file). Shown as task |
airflow.ti.finish (count) | Number of completed tasks in a given DAG. Shown as task |
airflow.ti.start (count) | Number of started tasks in a given DAG. Shown as task |
airflow.ti_failures (count) | Overall task instance failures. Shown as task |
airflow.ti_successes (count) | Overall task instance successes. Shown as task |
airflow.triggerer_heartbeat (count) | Triggerer heartbeats. |
airflow.triggers.blocked_main_thread (count) | Number of triggers that blocked the main thread (likely due to not being async). |
airflow.triggers.failed (count) | Number of triggers that errored before they could fire an event. |
airflow.triggers.running (gauge) | Number of triggers currently running for a triggerer (described by hostname). |
airflow.triggers.succeeded (count) | Number of triggers that have fired at least one event. |
airflow.zombies_killed (count) | Zombie tasks killed. Shown as task |
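
Each row above follows the same layout: metric name, metric type in parentheses, a pipe, the description, and an optional trailing "Shown as <unit>". As a minimal sketch of how that layout could be read programmatically, the Python below splits one row into its parts. The names `MetricRow` and `parse_metric_row` are hypothetical helpers introduced for illustration only; they are not part of Airflow or of any monitoring agent, and the sketch assumes the type is always `count` or `gauge`, as it is in this table.

```python
import re
from typing import NamedTuple, Optional


class MetricRow(NamedTuple):
    """One parsed row of the metrics table (hypothetical helper type)."""
    name: str
    kind: str            # "count" or "gauge", as written in the table
    description: str
    unit: Optional[str]  # value after "Shown as", or None when absent


# name (type) | description [Shown as unit] |
ROW_RE = re.compile(
    r"^(?P<name>[\w.]+)\s+\((?P<kind>count|gauge)\)\s*\|\s*(?P<body>.*?)\s*\|?\s*$"
)
# Optional unit suffix at the end of the description cell.
UNIT_RE = re.compile(r"\s*Shown as (?P<unit>\w+)$")


def parse_metric_row(line: str) -> MetricRow:
    """Split a single table row into name, type, description, and unit."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric row: {line!r}")
    body = match.group("body")
    unit = None
    unit_match = UNIT_RE.search(body)
    if unit_match:
        unit = unit_match.group("unit")
        body = body[: unit_match.start()].rstrip()
    return MetricRow(match.group("name"), match.group("kind"), body, unit)


if __name__ == "__main__":
    sample = "airflow.pool.open_slots (gauge) | Number of open slots in the pool |"
    print(parse_metric_row(sample))
```

Rows without a "Shown as" suffix come back with `unit` set to `None`, which keeps unitless counters and gauges distinguishable from rows that declare a unit.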