Self-hosted Airflow

Requirements
Setup
To get started, follow the instructions below.
Install the openlineage provider for both Airflow schedulers and Airflow workers by adding the following to your requirements.txt file, or wherever your Airflow dependencies are managed:
For Airflow 2.7 or later:
apache-airflow-providers-openlineage
For Airflow 2.5 & 2.6:
openlineage-airflow
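To verify the installation afterwards (a quick check, assuming shell access to a scheduler or worker container):

# The openlineage package should appear with its version.
pip freeze | grep -i openlineage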
Configure the openlineage provider. The simplest option is to set the following environment variables and make them available to the pods where you run Airflow schedulers and Airflow workers (a kubectl sketch follows below):
export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
export OPENLINEAGE_API_KEY=<DD_API_KEY>
export AIRFLOW__OPENLINEAGE__NAMESPACE=${AIRFLOW_ENV_NAME}
- Replace <DD_DATA_OBSERVABILITY_INTAKE> with https://data-obs-intake.
- Replace <DD_API_KEY> with your valid Datadog API key.
- If you're using Airflow v2.7 or v2.8, also add the following two environment variables along with the previous ones. This works around an OpenLineage config issue that was fixed in apache-airflow-providers-openlineage v1.7; Airflow v2.7 and v2.8 ship earlier versions of the provider.

#!/bin/sh
# Required for Airflow v2.7 & v2.8 only
export AIRFLOW__OPENLINEAGE__CONFIG_PATH=""
export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS=""
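If your schedulers and workers run as Kubernetes Deployments, one way to inject these variables is kubectl set env. This is only a sketch: the Deployment names airflow-scheduler and airflow-worker are placeholders for your own, and in practice the API key is better sourced from a Kubernetes Secret.

#!/bin/sh
# Placeholder Deployment names; repeat for every Deployment running schedulers or workers.
kubectl set env deployment/airflow-scheduler \
  OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE> \
  OPENLINEAGE_API_KEY=<DD_API_KEY> \
  AIRFLOW__OPENLINEAGE__NAMESPACE=<AIRFLOW_ENV_NAME>
kubectl set env deployment/airflow-worker \
  OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE> \
  OPENLINEAGE_API_KEY=<DD_API_KEY> \
  AIRFLOW__OPENLINEAGE__NAMESPACE=<AIRFLOW_ENV_NAME>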
Check the official documentation (configuration-openlineage) for other supported configurations of the openlineage provider.
Trigger an update to your Airflow pods and wait for them to finish.
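If you set the variables through kubectl set env as sketched above, the change itself triggers a rollout; if you applied them another way (for example, through Helm values), with the same placeholder Deployment names a rolling restart can be triggered and watched with:

#!/bin/sh
kubectl rollout restart deployment/airflow-scheduler
kubectl rollout restart deployment/airflow-worker
kubectl rollout status deployment/airflow-scheduler
kubectl rollout status deployment/airflow-worker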
Validation
After setup, view the Data Jobs Monitoring page in Datadog to see a list of your Airflow job runs.
Troubleshooting
Set OPENLINEAGE_CLIENT_LOGGING to DEBUG, alongside the other environment variables set previously, so that the OpenLineage client and its child modules log at the DEBUG level. This can be useful when troubleshooting the configuration of the openlineage provider.
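For example, in the same environment as the variables above:

export OPENLINEAGE_CLIENT_LOGGING=DEBUG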
Amazon MWAA

Requirements
Setup
To get started, follow the instructions below.
Install the openlineage provider by adding the following to your requirements.txt file:
For Airflow 2.7 or later:
apache-airflow-providers-openlineage
For Airflow 2.5 & 2.6:
openlineage-airflow
Configure the openlineage provider. The simplest option is to set the following environment variables in your Amazon MWAA start script:
#!/bin/sh
export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
export OPENLINEAGE_API_KEY=<DD_API_KEY>
export AIRFLOW__OPENLINEAGE__NAMESPACE=${AIRFLOW_ENV_NAME}
- Replace <DD_DATA_OBSERVABILITY_INTAKE> in full with https://data-obs-intake.
- Replace <DD_API_KEY> in full with your valid Datadog API key.
- If you're using Airflow v2.7 or v2.8, also add the following two environment variables to the start script. This works around an OpenLineage config issue that was fixed in apache-airflow-providers-openlineage v1.7; Airflow v2.7 and v2.8 ship earlier versions of the provider.

#!/bin/sh
# Required for Airflow v2.7 & v2.8 only
export AIRFLOW__OPENLINEAGE__CONFIG_PATH=""
export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS=""
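Taken together, a start script for this setup might look like the following sketch (keep the last two exports only for Airflow v2.7 & v2.8; AIRFLOW_ENV_NAME is assumed to be set in your environment):

#!/bin/sh
# Send OpenLineage events to Datadog Data Jobs Monitoring.
export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
export OPENLINEAGE_API_KEY=<DD_API_KEY>
export AIRFLOW__OPENLINEAGE__NAMESPACE=${AIRFLOW_ENV_NAME}

# Required for Airflow v2.7 & v2.8 only
export AIRFLOW__OPENLINEAGE__CONFIG_PATH=""
export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS=""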
Check the official documentation (configuration-openlineage) for other supported configurations of the openlineage provider.
Deploy your updated requirements.txt file and Amazon MWAA start script to the Amazon S3 folder configured for your Amazon MWAA environment.
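For example, with a hypothetical bucket and file names (use the S3 paths configured for your environment):

#!/bin/sh
aws s3 cp requirements.txt s3://<your-mwaa-bucket>/requirements.txt
aws s3 cp startup.sh s3://<your-mwaa-bucket>/startup.sh
# MWAA pins these files by S3 object version; update the environment
# to point at the newly uploaded versions so the changes take effect.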
Optionally, set up Log Collection to correlate task logs with DAG run executions in DJM:
- Configure Amazon MWAA to send logs to CloudWatch.
- Send the logs to Datadog.
Validation
After setup, view the Data Jobs Monitoring page in Datadog to see a list of your Airflow job runs.
Troubleshooting
Ensure the execution role configured for your Amazon MWAA environment has the right permissions to read the requirements.txt file and the Amazon MWAA start script. This is required if you manage your own execution role and are adding these supporting files for the first time. See the official Amazon MWAA execution role guide for details if needed.
Set OPENLINEAGE_CLIENT_LOGGING to DEBUG in the Amazon MWAA start script so that the OpenLineage client and its child modules log at the DEBUG level. This can be useful when troubleshooting the configuration of the openlineage provider.
Astronomer

Requirements
Setup
Install the OpenLineage provider (apache-airflow-providers-openlineage) 1.11.0+ and openlineage-python 1.23.0+ by adding the following to the requirements.txt file inside your Astro project:
apache-airflow-providers-openlineage>=1.11.0
openlineage-python>=1.23.0
To set up the OpenLineage provider, define the following environment variables. You can configure these variables in your Astronomer deployment using either of the following methods:
- From the Astro UI: Navigate to your deployment settings and add the environment variables directly.
- In the Dockerfile: Define the environment variables in your Dockerfile to ensure they are included during the build process (a Dockerfile sketch appears below).
OPENLINEAGE__TRANSPORT__TYPE=composite
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=http
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__URL=<DD_DATA_OBSERVABILITY_INTAKE>
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__TYPE=api_key
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__API_KEY=<DD_API_KEY>
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__COMPRESSION=gzip
- Replace <DD_DATA_OBSERVABILITY_INTAKE> with https://data-obs-intake.
- Replace <DD_API_KEY> with your valid Datadog API key.
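If you choose the Dockerfile method, a minimal sketch of the corresponding instructions follows, with the same placeholders as above. Since the API key is a secret, configuring it as a secret environment variable in the Astro UI may be preferable to baking it into the image.

# Dockerfile: make the OpenLineage transport settings part of the image.
ENV OPENLINEAGE__TRANSPORT__TYPE=composite
ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=http
ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__URL=<DD_DATA_OBSERVABILITY_INTAKE>
ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__TYPE=api_key
ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__API_KEY=<DD_API_KEY>
ENV OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__COMPRESSION=gzip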
Optional:
- Set AIRFLOW__OPENLINEAGE__NAMESPACE to a unique name for your Airflow deployment. This allows Datadog to logically separate this deployment's jobs from those of other Airflow deployments.
- Set OPENLINEAGE_CLIENT_LOGGING to DEBUG so the OpenLineage client and its child modules log at the DEBUG level. This can be useful for troubleshooting during the configuration of an OpenLineage provider.
See the Astronomer official guide for managing environment variables for a deployment. See Apache Airflow’s OpenLineage Configuration Reference for other supported configurations of the OpenLineage provider.
Trigger an update to your deployment and wait for it to finish.
Validation
After setup, view the Data Jobs Monitoring page in Datadog to see a list of your Airflow job runs.
Troubleshooting
Check that the OpenLineage environment variables are correctly set on the Astronomer deployment.
Note: Using the .env file to add the environment variables does not work, because those variables are only applied to the local Airflow environment.
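One quick sanity check, assuming you can open a shell in a running scheduler or worker container for the deployment:

# List the OpenLineage-related variables visible to the Airflow process.
env | grep -E '^(OPENLINEAGE|AIRFLOW__OPENLINEAGE)' | sort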