---
title: 'Enable Data Observability: Jobs Monitoring for Apache Airflow'
description: >-
  Monitor Apache Airflow DAG workflows with Data Observability: Jobs Monitoring
  using OpenLineage provider across Kubernetes, Amazon MWAA, and other
  platforms.
breadcrumbs: >-
  Docs > Data Observability Overview > Data Observability: Jobs Monitoring >
  Enable Data Observability: Jobs Monitoring for Apache Airflow
---

# Enable Data Observability: Jobs Monitoring for Apache Airflow

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}

[Data Observability: Jobs Monitoring](https://docs.datadoghq.com/data_jobs) provides visibility into the performance and reliability of workflows run by Apache Airflow DAGs.

{% tab title="Kubernetes" %}
### Requirements{% #requirements %}

- [Apache Airflow 2.7](https://github.com/apache/airflow/releases/tag/2.7.0) or later, including Airflow 3
- [apache-airflow-providers-openlineage](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html)

### Setup{% #setup %}

To get started, follow the instructions below.

1. Install the `openlineage` provider for **both Airflow schedulers and Airflow workers** by adding the following to your `requirements.txt` file, or wherever your Airflow dependencies are managed:

   ```text
   apache-airflow-providers-openlineage
   ```

1. Configure the `openlineage` provider. Choose one of the following configuration options and set the environment variables, making them available to the pods where you run Airflow schedulers and Airflow workers:

   **Option 1: Datadog Transport (Recommended)**

   **Requirements**: `apache-airflow-providers-openlineage` version 2.7.3 or later and `openlineage-python` version 1.37.0 or later.

   ```shell
   export DD_API_KEY=<DD_API_KEY>
   export DD_SITE=<DD_SITE>
   export OPENLINEAGE__TRANSPORT__TYPE=datadog
   # OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value
   export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
   ```

   - Replace `<DD_API_KEY>` with your valid [Datadog API key](https://docs.datadoghq.com/account_management/api-app-keys/#api-keys).
   - Replace `<DD_SITE>` with your Datadog site (for example, ).

   **Option 2: Composite Transport**

   **Requirements**: `apache-airflow-providers-openlineage` version 1.11.0 or later and `openlineage-python` version 1.37.0 or later.

   Use this option if you already use OpenLineage with another system and want to add Datadog as an additional destination. The composite transport sends events to all configured transports.

   For example, if you are using an HTTP transport to send events to another system:

   ```shell
   # Your existing HTTP transport configuration
   export OPENLINEAGE__TRANSPORT__TYPE=composite
   export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__TYPE=http
   export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__URL=<YOUR_EXISTING_URL>
   export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__AUTH__TYPE=api_key
   export OPENLINEAGE__TRANSPORT__TRANSPORTS__EXISTING__AUTH__API_KEY=<YOUR_EXISTING_API_KEY>
   
   # Add Datadog as an additional transport
   export DD_API_KEY=<DD_API_KEY>
   export DD_SITE=<DD_SITE>
   export OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=datadog
   # OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value
   export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
   ```

   - Replace `<DD_API_KEY>` with your valid [Datadog API key](https://docs.datadoghq.com/account_management/api-app-keys/#api-keys).
   - Replace `<DD_SITE>` with your Datadog site (for example, ).
   - Replace `<YOUR_EXISTING_URL>` and `<YOUR_EXISTING_API_KEY>` with your existing OpenLineage transport configuration.

   In this example, OpenLineage events are sent to both your existing system and Datadog. You can configure multiple transports by giving each one a unique name (such as `EXISTING` and `DATADOG` in the example above).
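The double-underscore variable names map onto a nested configuration tree: each `__`-separated segment after the `OPENLINEAGE` prefix becomes one level of nesting. As a rough illustration of that convention (a simplified sketch, not the OpenLineage client's actual parser):

```python
# Simplified sketch of how OPENLINEAGE__* environment variable names map to a
# nested config tree. The real parsing happens inside openlineage-python.
def nest(env: dict) -> dict:
    config = {}
    for name, value in env.items():
        if not name.startswith("OPENLINEAGE__"):
            continue
        # Drop the OPENLINEAGE prefix; each remaining segment is one nesting level.
        keys = [k.lower() for k in name.split("__")[1:]]
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config

env = {
    "OPENLINEAGE__TRANSPORT__TYPE": "composite",
    "OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE": "datadog",
}
print(nest(env))
# {'transport': {'type': 'composite', 'transports': {'datadog': {'type': 'datadog'}}}}
```

This is why `TRANSPORTS__EXISTING__*` and `TRANSPORTS__DATADOG__*` define two sibling transports under the composite transport.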

   **Option 3: Simple Configuration**

   This option uses the URL-based configuration and works with all versions of the OpenLineage provider:

   ```shell
   export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
   export OPENLINEAGE_API_KEY=<DD_API_KEY>
   # OPENLINEAGE_NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value
   export OPENLINEAGE_NAMESPACE=${AIRFLOW_ENV_NAME}
   ```

   - Replace `<DD_DATA_OBSERVABILITY_INTAKE>` with `https://data-obs-intake.`.

   - Replace `<DD_API_KEY>` with your valid [Datadog API key](https://docs.datadoghq.com/account_management/api-app-keys/#api-keys).

   - If you're using **Airflow v2.7 or v2.8**, also add these two environment variables along with the previous ones. They work around an OpenLineage configuration issue that was fixed in `apache-airflow-providers-openlineage` v1.7; Airflow v2.7 and v2.8 ship with earlier provider versions.

     ```shell
     #!/bin/sh
     # Required for Airflow v2.7 & v2.8 only
     export AIRFLOW__OPENLINEAGE__CONFIG_PATH=""
     export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS=""
     ```

   For other supported configurations of the `openlineage` provider, see the official [OpenLineage configuration reference](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/configurations-ref.html#configuration-openlineage).
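   As an alternative to environment variables, the OpenLineage client can read the same settings from a YAML config file referenced by the `OPENLINEAGE_CONFIG` environment variable. A sketch equivalent to Option 3, using the same placeholders:

   ```yaml
   # Sketch of an openlineage.yml equivalent to Option 3. Point OPENLINEAGE_CONFIG
   # at this file on both Airflow schedulers and Airflow workers.
   transport:
     type: http
     url: <DD_DATA_OBSERVABILITY_INTAKE>
     auth:
       type: api_key
       apiKey: <DD_API_KEY>
   ```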

1. Trigger an update to your Airflow pods and wait for them to finish.

1. Optionally, set up log collection to correlate task logs with DAG run executions in Data Observability: Jobs Monitoring. Correlation requires the logs directory to follow the [default log filename format](https://airflow.apache.org/docs/apache-airflow/2.9.3/configurations-ref.html#log-filename-template).

   The `PATH_TO_AIRFLOW_LOGS` value is `$AIRFLOW_HOME/logs` in standard deployments, but may differ if customized. Add the following annotation to your pod:

   ```yaml
   ad.datadoghq.com/base.logs: '[{"type": "file", "path": "PATH_TO_AIRFLOW_LOGS/*/*/*/*.log", "source": "airflow"}]'
   ```

   Adding `"source": "airflow"` lets the [Airflow integration](https://docs.datadoghq.com/integrations/airflow/?tab=containerized) logs pipeline extract the attributes required for correlation.

   These file paths are relative to the Agent container. Mount the directory containing the log files into both the application and Agent containers so the Agent can access it. For details, see [Collect logs from a container local log file](https://docs.datadoghq.com/containers/kubernetes/log/?tab=datadogoperator#from-a-container-local-log-file).
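   For example, one common pattern is to expose the logs directory through a `hostPath` volume that the Agent also mounts. The names and paths below are illustrative only; see the linked guide for the authoritative setup:

   ```yaml
   # Illustrative Airflow pod spec fragment. The Agent must mount the same
   # volume so it can read the log files; adapt names and paths to your deployment.
   volumes:
     - name: airflow-logs
       hostPath:
         path: /var/log/airflow
   containers:
     - name: base
       volumeMounts:
         - name: airflow-logs
           mountPath: /opt/airflow/logs   # for example, $AIRFLOW_HOME/logs
   ```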

   **Note**: Log collection requires the Datadog Agent to be installed on your Kubernetes cluster. If you haven't installed it yet, see the [Kubernetes installation documentation](https://docs.datadoghq.com/containers/kubernetes/installation/?tab=datadogoperator#installation).

   For more ways to set up log collection on Kubernetes, see the [Kubernetes and Integrations configuration section](https://docs.datadoghq.com/containers/kubernetes/integrations/?tab=annotations#configuration).

### Validation{% #validation %}

In Datadog, view the [Data Observability: Jobs Monitoring](https://app.datadoghq.com/data-jobs/) page to see a list of your Airflow job runs after the setup.

### Troubleshooting{% #troubleshooting %}

Set `OPENLINEAGE_CLIENT_LOGGING` to `DEBUG`, alongside the environment variables set previously, to enable debug-level logging for the OpenLineage client and its child modules. This can be useful when troubleshooting the `openlineage` provider configuration.
{% /tab %}

{% tab title="Amazon MWAA" %}
### Requirements{% #requirements %}

- [Apache Airflow 2.7.0](https://github.com/apache/airflow/releases/tag/2.7.0) or later, including Airflow 3
- [apache-airflow-providers-openlineage](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html)

### Setup{% #setup %}

{% alert level="info" %}
**If you are using Airflow 2.7.2, 2.8.1, or 2.9.2**: MWAA default constraints pin older `apache-airflow-providers-openlineage` versions. These versions include known issues that can degrade the Data Observability experience. To upgrade to provider versions with fixes, see [Upgrade OpenLineage provider on Amazon MWAA for Airflow 2.7.2, 2.8.1, and 2.9.2](https://docs.datadoghq.com/data_observability/jobs_monitoring/airflow_mwaa_upgrade/).
{% /alert %}

To get started, follow the instructions below.

1. Install the `openlineage` provider by adding the following to your `requirements.txt` file:

   ```text
   apache-airflow-providers-openlineage
   ```

1. Configure the `openlineage` provider. The simplest option is to set the following environment variables in your [Amazon MWAA start script](https://docs.aws.amazon.com/mwaa/latest/userguide/using-startup-script.html):

   ```shell
   #!/bin/sh
   export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
   export OPENLINEAGE_API_KEY=<DD_API_KEY>
   # AIRFLOW__OPENLINEAGE__NAMESPACE sets the 'env' tag value in Datadog. You can hardcode this to a different value
   export AIRFLOW__OPENLINEAGE__NAMESPACE=${AIRFLOW_ENV_NAME}
   ```

   - Replace `<DD_DATA_OBSERVABILITY_INTAKE>` with `https://data-obs-intake.`.
   - Replace `<DD_API_KEY>` with your valid [Datadog API key](https://docs.datadoghq.com/account_management/api-app-keys/#api-keys).
   - If you're using **Airflow v2.7 or v2.8**, also add these two environment variables to the startup script. They work around an OpenLineage configuration issue that was fixed in `apache-airflow-providers-openlineage` v1.7; Airflow v2.7 and v2.8 ship with earlier provider versions.
     ```shell
     #!/bin/sh
     # Required for Airflow v2.7 & v2.8 only
     export AIRFLOW__OPENLINEAGE__CONFIG_PATH=""
     export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS=""
     ```

   For other supported configurations of the `openlineage` provider, see the official [OpenLineage configuration reference](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/configurations-ref.html#configuration-openlineage).

1. Deploy your updated `requirements.txt` and [Amazon MWAA startup script](https://docs.aws.amazon.com/mwaa/latest/userguide/using-startup-script.html) to your Amazon S3 folder configured for your Amazon MWAA Environment.

1. Optionally, set up log collection to correlate task logs with DAG run executions in Data Observability: Jobs Monitoring:

   1. Configure Amazon MWAA to [send logs to CloudWatch](https://docs.aws.amazon.com/mwaa/latest/userguide/monitoring-airflow.html#monitoring-airflow-enable).
   1. [Send the logs to Datadog](https://docs.datadoghq.com/integrations/amazon_web_services/?tab=roledelegation#log-collection).

### Validation{% #validation %}

In Datadog, view the [Data Observability: Jobs Monitoring](https://app.datadoghq.com/data-jobs/) page to see a list of your Airflow job runs after the setup.

### Troubleshooting{% #troubleshooting %}

Ensure the execution role configured for your Amazon MWAA environment has permissions to read the `requirements.txt` file and the [Amazon MWAA start script](https://docs.aws.amazon.com/mwaa/latest/userguide/using-startup-script.html). This is required if you manage your own execution role and are adding these supporting files for the first time. See the official [Amazon MWAA execution role](https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-create-role.html) guide for details.

Set `OPENLINEAGE_CLIENT_LOGGING` to `DEBUG` in the [Amazon MWAA start script](https://docs.aws.amazon.com/mwaa/latest/userguide/using-startup-script.html) to enable debug-level logging for the OpenLineage client and its child modules. This can be useful when troubleshooting the `openlineage` provider configuration.
{% /tab %}

{% tab title="Astronomer" %}

{% alert level="danger" %}
For Astronomer customers using Astro, [Astro offers lineage features that rely on the Airflow OpenLineage provider](https://www.astronomer.io/docs/learn/airflow-openlineage#lineage-on-astro). Data Observability: Jobs Monitoring depends on the same OpenLineage provider and uses the [Composite](https://openlineage.io/docs/client/python#composite) transport to add Datadog as an additional destination.
{% /alert %}

### Requirements{% #requirements %}

- [Astro Runtime 12.1.0+](https://www.astronomer.io/docs/astro/runtime-release-notes#astro-runtime-1210)
- [`apache-airflow-providers-openlineage` 1.11.0+](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html)
- [`openlineage-python` 1.23.0+](https://github.com/OpenLineage/OpenLineage/releases/tag/1.23.0)

### Setup{% #setup %}

1. To set up the OpenLineage provider, define the following environment variables. You can configure these variables in your Astronomer deployment using either of the following methods:

   - [From the Astro UI](https://www.astronomer.io/docs/astro/manage-env-vars#using-the-astro-ui): Navigate to your deployment settings and add the environment variables directly.
   - [In the Dockerfile](https://www.astronomer.io/docs/astro/manage-env-vars#using-your-dockerfile): Define the environment variables in your `Dockerfile` to ensure they are included during the build process.

   ```shell
   OPENLINEAGE__TRANSPORT__TYPE=composite
   OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=http
   OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__URL=<DD_DATA_OBSERVABILITY_INTAKE>
   OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__TYPE=api_key
   OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__API_KEY=<DD_API_KEY>
   OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__COMPRESSION=gzip
   ```

   - Replace `<DD_DATA_OBSERVABILITY_INTAKE>` with `https://data-obs-intake.`.
   - Replace `<DD_API_KEY>` with your valid [Datadog API key](https://docs.datadoghq.com/account_management/api-app-keys/#api-keys).
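   For the Dockerfile method, the variables can be set with `ENV` instructions. A sketch follows; the runtime tag is an example, and you should avoid baking the real API key into the image (set it as a secret in the Astro UI instead):

   ```dockerfile
   FROM quay.io/astronomer/astro-runtime:12.1.0
   ENV OPENLINEAGE__TRANSPORT__TYPE=composite
   ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=http
   ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__URL=<DD_DATA_OBSERVABILITY_INTAKE>
   ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__TYPE=api_key
   ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__COMPRESSION=gzip
   # Set OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__API_KEY as a secret
   # environment variable in the Astro UI rather than hardcoding it here.
   ```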

   **Optional:**

   - Set `AIRFLOW__OPENLINEAGE__NAMESPACE` with a unique name for the `env` tag on all DAGs in the Airflow deployment. This allows Datadog to logically separate this deployment's jobs from those of other Airflow deployments.
   - Set `OPENLINEAGE_CLIENT_LOGGING` to `DEBUG` for the OpenLineage client and its child modules to log at a `DEBUG` logging level. This can be useful for troubleshooting during the configuration of an OpenLineage provider.

   See the [Astronomer official guide](https://www.astronomer.io/docs/astro/environment-variables/#management-options) for managing environment variables for a deployment. See Apache Airflow's [OpenLineage Configuration Reference](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/configurations-ref.html#configuration-openlineage) for other supported configurations of the OpenLineage provider.

1. Trigger an update to your deployment and wait for it to finish.

### Validation{% #validation %}

In Datadog, view the [Data Observability: Jobs Monitoring](https://app.datadoghq.com/data-jobs/) page to see a list of your Airflow job runs after the setup.

### Troubleshooting{% #troubleshooting %}

Check that the OpenLineage environment variables are correctly set on the Astronomer deployment.

**Note**: Using the `.env` file to add the environment variables does not work because the variables are only applied to the local Airflow environment.
{% /tab %}

{% tab title="Google Cloud Composer" %}

{% alert level="danger" %}
Data Observability: Jobs Monitoring for Airflow is not yet compatible with [Dataplex](https://cloud.google.com/composer/docs/composer-2/lineage-integration) data lineage. Setting up OpenLineage for Data Observability: Jobs Monitoring overrides your existing Dataplex transport configuration.
{% /alert %}

### Requirements{% #requirements %}

- [Cloud Composer 2](https://cloud.google.com/composer/docs/composer-versioning-overview) or later
- [apache-airflow-providers-openlineage](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html)

### Setup{% #setup %}

To get started, follow the instructions below.

In the Advanced Configuration tab, under **Airflow configuration override**, click **Add Airflow configuration override** and configure these settings:

- In Section 1, enter `openlineage`.

- In Key 1, enter `disabled`.

- In Value 1, enter `False` to make sure OpenLineage is activated.

- In Section 2, enter `openlineage`.

- In Key 2, enter `transport`.

- In Value 2, enter the following:

  ```json
  {
    "type": "http",
    "url": "<DD_DATA_OBSERVABILITY_INTAKE>",
    "auth": {
      "type": "api_key",
      "api_key": "<DD_API_KEY>"
    }
  }
  ```

- Replace `<DD_DATA_OBSERVABILITY_INTAKE>` fully with `https://data-obs-intake.`.
- Replace `<DD_API_KEY>` fully with your valid [Datadog API key](https://docs.datadoghq.com/account_management/api-app-keys/#api-keys).
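Because the override value is parsed as JSON, a quick way to catch formatting mistakes before saving is to run it through a JSON parser. The values below are placeholders:

```python
import json

# Paste your Value 2 override between the triple quotes (placeholder values shown).
value = """
{
  "type": "http",
  "url": "<DD_DATA_OBSERVABILITY_INTAKE>",
  "auth": {
    "type": "api_key",
    "api_key": "<DD_API_KEY>"
  }
}
"""

transport = json.loads(value)  # raises json.JSONDecodeError if the JSON is malformed
assert transport["type"] == "http"
```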

**Optional:** Configure the OpenLineage namespace to set the `env` tag value in Datadog:

- In Section 3, enter `openlineage`.
- In Key 3, enter `namespace`.
- In Value 3, enter a value for your Composer environment (for example, `prod`, `dev`, `staging`, or `test`).

**Note:** If [Dataplex](https://cloud.google.com/composer/docs/composer-2/lineage-integration) is enabled, the namespace defaults to the Composer environment name. Setting the OpenLineage namespace here lets you override that default.

See the official [Airflow](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/configurations-ref.html#configuration-openlineage) and [Composer](https://cloud.google.com/composer/docs/airflow-configurations) documentation for other supported configurations of the `openlineage` provider in Google Cloud Composer.

### Validation{% #validation %}

In Datadog, view the [Data Observability: Jobs Monitoring](https://app.datadoghq.com/data-jobs/) page to see a list of your Airflow job runs after the setup.

### Troubleshooting{% #troubleshooting %}

Set `OPENLINEAGE_CLIENT_LOGGING` to `DEBUG` in the **Environment variables** tab of the Composer page to enable debug-level logging for the OpenLineage client and its child modules. This can be useful when troubleshooting the `openlineage` provider configuration.
{% /tab %}

## Advanced Configuration{% #advanced-configuration %}

### Link your dbt jobs with Airflow tasks{% #link-your-dbt-jobs-with-airflow-tasks %}

You can monitor dbt jobs that run in Airflow by connecting dbt telemetry with the corresponding Airflow tasks, using the [OpenLineage dbt integration](https://openlineage.io/docs/integrations/dbt/).

To link Airflow tasks with dbt jobs, follow these steps:

1. Install `openlineage-dbt`. See [Using dbt with Amazon MWAA](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-dbt.html) to set up dbt in a virtual environment.

   ```shell
   pip3 install "openlineage-dbt>=1.36.0"
   ```

1. Change the dbt invocation to `dbt-ol` (the OpenLineage wrapper for dbt). Add the `--consume-structured-logs` flag to view dbt jobs while the command is still running:

   ```bash
   dbt-ol run --consume-structured-logs --project-dir=$TEMP_DIR --profiles-dir=$PROFILES_DIR
   ```

1. In your DAG file, add the `OPENLINEAGE_PARENT_ID` variable to the environment of the Airflow task that runs the dbt process:
   ```python
   dbt_run = BashOperator(
       task_id="dbt_run",
       dag=dag,
       bash_command="dbt-ol run --consume-structured-logs --project-dir=$TEMP_DIR --profiles-dir=$PROFILES_DIR",
       append_env=True,
       env={
           "OPENLINEAGE_PARENT_ID": "{{ macros.OpenLineageProviderPlugin.lineage_parent_id(task_instance) }}",
       },
   )
   ```
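The macro produces a slash-separated parent identifier (`namespace/job_name/run_id`) that `dbt-ol` reads from `OPENLINEAGE_PARENT_ID` to link the dbt run to its Airflow parent task. A minimal sketch of that convention, with entirely hypothetical values:

```python
# Hypothetical example of the slash-separated value carried by OPENLINEAGE_PARENT_ID.
parent_id = "prod/my_dag.dbt_run/0195c203-8d34-7202-b4a1-1e4f2e9f0a11"

# dbt-ol splits this into the parent's namespace, job name, and run id.
namespace, job_name, run_id = parent_id.split("/")
print(namespace, job_name, run_id)
```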

### Link your Spark jobs with Airflow tasks{% #link-your-spark-jobs-with-airflow-tasks %}

The OpenLineage integration can automatically inject Airflow's parent job information (namespace, job name, run ID) into Spark application properties. This creates a parent-child relationship between Airflow tasks and Spark jobs, enabling you to troubleshoot both systems in one place.

**Note**: This feature requires `apache-airflow-providers-openlineage` version 2.1.0 or later (supported from Airflow 2.9+).

1. **Verify operator compatibility**: Check the [Apache Airflow OpenLineage documentation](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#passing-parent-job-information-to-spark-jobs) to confirm your Spark operators are supported. This feature only works with specific operators, such as `SparkSubmitOperator` and `LivyOperator`.

1. Make sure your Spark jobs are actively monitored through [Data Observability: Jobs Monitoring](https://app.datadoghq.com/data-jobs/).

1. Enable automatic parent job information injection by setting the following configuration:

   ```shell
   AIRFLOW__OPENLINEAGE__SPARK_INJECT_PARENT_JOB_INFO=true
   ```

   This automatically injects parent job properties for all supported Spark operators. To disable injection for specific operators, set `openlineage_inject_parent_job_info=False` on the operator.
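Conceptually, injection merges the task's lineage coordinates into the submitted job's Spark configuration. The sketch below is not the provider's actual code, and the values are hypothetical; the `spark.openlineage.parent*` property names follow the OpenLineage Spark integration's parent run settings:

```python
# Rough sketch of what "parent job info injection" means: the Airflow task's
# lineage coordinates are merged into the Spark job's conf before submission.
def inject_parent_info(conf: dict, namespace: str, job_name: str, run_id: str) -> dict:
    parent = {
        "spark.openlineage.parentJobNamespace": namespace,
        "spark.openlineage.parentJobName": job_name,
        "spark.openlineage.parentRunId": run_id,
    }
    # User-provided conf values take precedence over injected ones.
    return {**parent, **conf}

conf = inject_parent_info(
    {"spark.executor.memory": "4g"},        # existing user conf
    "prod",                                 # hypothetical namespace ('env' tag)
    "my_dag.spark_task",                    # hypothetical parent job name
    "0195c203-8d34-7202-b4a1-1e4f2e9f0a11", # hypothetical parent run id
)
```

With these properties set, the OpenLineage events emitted by the Spark job reference the Airflow task as their parent run.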

## Further Reading{% #further-reading %}

- [Troubleshoot and optimize data processing workloads with Data Jobs Monitoring](https://www.datadoghq.com/blog/data-jobs-monitoring/)
- [Observing the data lifecycle with Datadog](https://www.datadoghq.com/blog/data-observability-monitoring)
- [Data Observability: Jobs Monitoring](https://docs.datadoghq.com/data_jobs)
