To try the preview for Airflow monitoring, follow the setup instructions below.
Data Jobs Monitoring provides visibility into the performance and reliability of workflows run by Apache Airflow DAGs.
Data Jobs Monitoring supports Apache Airflow deployments that have apache-airflow-providers-openlineage installed.
To get started with Airflow schedulers and workers that run in pods, follow the instructions below.
Install the openlineage provider by adding the following to your requirements.txt file, or wherever your Airflow dependencies are managed:
apache-airflow-providers-openlineage>=1.11.0
Configure the openlineage provider. The simplest option is to set the following environment variables and make them available to the pods where you run Airflow schedulers and Airflow workers:
OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
OPENLINEAGE_API_KEY=<DD_API_KEY>
- Install and configure the openlineage provider for both Airflow schedulers and Airflow workers.
- Replace <DD_DATA_OBSERVABILITY_INTAKE> with https://data-obs-intake.
- Replace <DD_API_KEY> with your valid Datadog API key.
Optional:
- Set AIRFLOW__OPENLINEAGE__NAMESPACE to a unique name for your Airflow deployment. This allows Datadog to logically separate this deployment’s jobs from those of other Airflow deployments.
- Set OPENLINEAGE_CLIENT_LOGGING to DEBUG so that the OpenLineage client and its child modules log at the DEBUG level. This can be useful for troubleshooting while you configure the openlineage provider.
See the official configuration-openlineage documentation for other supported configurations of the openlineage provider.
Trigger an update to your Airflow pods and wait for them to finish.
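Before triggering the update, it can help to confirm that the variables from the configuration step are visible to the scheduler and worker processes. The following is a minimal sketch, not part of the Datadog or OpenLineage tooling: the function name check_openlineage_env and the specific checks are assumptions made for illustration. It only inspects the two variables named above and flags placeholders that were never replaced.

```python
import os
from typing import Mapping

# Variables that the configuration step above treats as required.
REQUIRED_VARS = ("OPENLINEAGE_URL", "OPENLINEAGE_API_KEY")


def check_openlineage_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif value.startswith("<") and value.endswith(">"):
            # The placeholder from the instructions was never replaced.
            problems.append(f"{name} still contains the placeholder {value}")
    url = env.get("OPENLINEAGE_URL", "").strip()
    if url and not url.startswith("<") and not url.startswith("https://"):
        # The intake URL in the instructions uses the https scheme.
        problems.append("OPENLINEAGE_URL should start with https://")
    return problems


if __name__ == "__main__":
    for problem in check_openlineage_env(os.environ):
        print(f"WARNING: {problem}")
```

Running the script inside a scheduler or worker pod prints one warning per problem and prints nothing when both variables look plausible. It does not verify that the API key is valid or that the intake endpoint is reachable.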
On Amazon MWAA, Data Jobs Monitoring supports Apache Airflow deployments that have apache-airflow-providers-openlineage installed.
To get started, follow the instructions below.
Install the openlineage provider by adding the following to your requirements.txt file:
apache-airflow-providers-openlineage>=1.11.0
Ensure the openlineage provider version is compatible with your constraints file. If no constraints file is specified in requirements.txt, ensure compatibility with the default Apache Airflow constraints for your Airflow version. Refer to the Amazon MWAA User Guide for guidance on specifying Python dependencies in requirements.txt.
Configure the openlineage provider. The simplest option is to set the following environment variables in your Amazon MWAA start script:
#!/bin/sh
export OPENLINEAGE_URL=<DD_DATA_OBSERVABILITY_INTAKE>
export OPENLINEAGE_API_KEY=<DD_API_KEY>
- Replace <DD_DATA_OBSERVABILITY_INTAKE> fully with https://data-obs-intake.
- Replace <DD_API_KEY> fully with your valid Datadog API key.
Optional:
- Set AIRFLOW__OPENLINEAGE__NAMESPACE to a unique name for your Airflow deployment. This allows Datadog to logically separate this deployment’s jobs from those of other Airflow deployments.
- Set OPENLINEAGE_CLIENT_LOGGING to DEBUG so that the OpenLineage client and its child modules log at the DEBUG level. This can be useful for troubleshooting while you configure the openlineage provider.
See the official configuration-openlineage documentation for other supported configurations of the openlineage provider.
Deploy your updated requirements.txt and Amazon MWAA start script to the Amazon S3 folder configured for your Amazon MWAA Environment.
Ensure the Execution role configured for your Amazon MWAA Environment has the right permissions to the requirements.txt file and the Amazon MWAA start script. This is required if you manage your own Execution role and this is the first time you are adding those supporting files. See the official Amazon MWAA execution role guide for details.
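The requirement apache-airflow-providers-openlineage>=1.11.0 is a numeric comparison on each version component, so 1.9.4 does not satisfy it even though "1.9.4" sorts after "1.11.0" as a string. The sketch below illustrates that comparison and, when run as a script, checks the version installed in the current Python environment. The helper names parse_version and meets_minimum are hypothetical, and the parser is a simplification that handles only plain dotted numeric versions, not pre-release or local version tags.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.11.0' into (1, 11, 0). Only plain dotted numeric versions are handled."""
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.11' compares equal to '1.11.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    package = "apache-airflow-providers-openlineage"
    try:
        installed = version(package)
        status = "ok" if meets_minimum(installed, "1.11.0") else "too old"
        print(package, installed, status)
    except PackageNotFoundError:
        print(package, "is not installed")
    except ValueError as exc:
        # Raised for versions this simplified parser does not understand.
        print(exc)
```

This check is only meaningful in an environment where you can run Python against the same set of installed packages as your Airflow deployment. It does not replace checking the provider version against your constraints file.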
Install the OpenLineage provider (apache-airflow-providers-openlineage) 1.11.0+ and openlineage-python 1.23.0+. Add the following to the requirements.txt file inside your Astro project:
apache-airflow-providers-openlineage>=1.11.0
openlineage-python>=1.23.0
Configure the OpenLineage provider. You can do this by setting the following environment variables using the Astro UI:
OPENLINEAGE__TRANSPORT__TYPE=composite
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__TYPE=http
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__URL=<DD_DATA_OBSERVABILITY_INTAKE>
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__TYPE=api_key
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__AUTH__API_KEY=<DD_API_KEY>
OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__COMPRESSION=gzip
- Replace <DD_DATA_OBSERVABILITY_INTAKE> with https://data-obs-intake.
- Replace <DD_API_KEY> with your valid Datadog API key.
Optional:
- Set AIRFLOW__OPENLINEAGE__NAMESPACE to a unique name for your Airflow deployment. This allows Datadog to logically separate this deployment’s jobs from those of other Airflow deployments.
- Set OPENLINEAGE_CLIENT_LOGGING to DEBUG so that the OpenLineage client and its child modules log at the DEBUG level. This can be useful for troubleshooting while you configure the OpenLineage provider.
See the Astronomer official guide for managing environment variables for a deployment. See Apache Airflow’s OpenLineage Configuration Reference for other supported configurations of the OpenLineage provider.
Trigger an update to your deployment and wait for it to finish.
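The six transport variables above share one prefix and differ only in their final segment, which makes them easy to mistype when entering them one at a time. The sketch below generates the same set from the two values you supply, so you can paste the output into the Astro UI. The function names datadog_transport_env and to_env_lines are hypothetical, and the URL and key passed in the example run are the placeholders from the instructions, not real values.

```python
# Shared prefix of the per-transport variables listed in the instructions.
PREFIX = "OPENLINEAGE__TRANSPORT__TRANSPORTS__DATADOG__"


def datadog_transport_env(intake_url: str, api_key: str) -> dict[str, str]:
    """Build the environment variables from the configuration step as a dict."""
    return {
        "OPENLINEAGE__TRANSPORT__TYPE": "composite",
        PREFIX + "TYPE": "http",
        PREFIX + "URL": intake_url,
        PREFIX + "AUTH__TYPE": "api_key",
        PREFIX + "AUTH__API_KEY": api_key,
        PREFIX + "COMPRESSION": "gzip",
    }


def to_env_lines(env: dict[str, str]) -> list[str]:
    """Render the variables as KEY=VALUE lines, in insertion order."""
    return [f"{key}={value}" for key, value in env.items()]


if __name__ == "__main__":
    # Placeholders only: substitute your own intake URL and API key.
    env = datadog_transport_env("<DD_DATA_OBSERVABILITY_INTAKE>", "<DD_API_KEY>")
    print("\n".join(to_env_lines(env)))
```

Treat the generated API key line as a secret: avoid committing the output to version control.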
In Datadog, view the Data Jobs Monitoring page to see a list of your Airflow job runs after the setup.
You can troubleshoot Airflow tasks that run Spark jobs more efficiently by connecting the Spark job run info and telemetry with the respective Airflow task.
Prerequisites: your Spark jobs are currently monitored through Data Jobs Monitoring and are submitted through SparkSubmitOperators from your Airflow jobs.
To see the link between an Airflow task and the Spark application it submitted, follow these steps:
Configure Airflow to turn off lazy loading of Airflow plugins by setting the lazy_load_plugins config to False in your airflow.cfg, or by exporting the following environment variable where your Airflow schedulers and Airflow workers run:
export AIRFLOW__CORE__LAZY_LOAD_PLUGINS='False'
Update your Airflow job’s DAG file by adding the following Spark configurations to your SparkSubmitOperator where you submit your Spark Application:
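from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

SparkSubmitOperator(
    # Keep your existing arguments (task_id, application, and so on) here.
    conf={
        # Templated values that link the Spark application to this Airflow task.
        "spark.openlineage.parentJobNamespace": "{{ macros.OpenLineageProviderPlugin.lineage_job_namespace() }}",
        "spark.openlineage.parentJobName": "{{ macros.OpenLineageProviderPlugin.lineage_job_name(task_instance) }}",
        "spark.openlineage.parentRunId": "{{ macros.OpenLineageProviderPlugin.lineage_run_id(task_instance) }}",
    },
)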
See Lineage job & run macros for the definitions of referenced macros.
Once you have re-deployed your Airflow environment with the updated lazy_load_plugins config and the updated DAG file, and your Airflow DAG has been re-run, go to the Data Jobs Monitoring page. You can then find your latest Airflow job run and see a SpanLink in the Airflow Job Run trace to the trace of the launched Spark Application. This makes it possible to debug issues in Airflow or Spark all in one place.
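If several DAGs submit Spark applications, repeating the three spark.openlineage.parent* settings in every SparkSubmitOperator is error-prone. One possible approach, sketched below, is to keep the settings in a single place and merge them into whatever Spark configuration each task already uses. The helper name with_openlineage_parent is hypothetical; the three keys and their templated values are copied from the SparkSubmitOperator example above. The helper treats the values as plain strings, and Airflow renders the templates when the task runs.

```python
from typing import Mapping, Optional

# Jinja-templated values copied from the SparkSubmitOperator example above.
OPENLINEAGE_PARENT_CONF = {
    "spark.openlineage.parentJobNamespace": "{{ macros.OpenLineageProviderPlugin.lineage_job_namespace() }}",
    "spark.openlineage.parentJobName": "{{ macros.OpenLineageProviderPlugin.lineage_job_name(task_instance) }}",
    "spark.openlineage.parentRunId": "{{ macros.OpenLineageProviderPlugin.lineage_run_id(task_instance) }}",
}


def with_openlineage_parent(spark_conf: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a new dict with the parent-run settings merged into spark_conf.

    The input mapping is not modified. The parent-run settings win if the
    same key is already present in spark_conf.
    """
    merged = dict(spark_conf or {})
    merged.update(OPENLINEAGE_PARENT_CONF)
    return merged


if __name__ == "__main__":
    # Example with one pre-existing Spark setting.
    for key, value in with_openlineage_parent({"spark.executor.memory": "4g"}).items():
        print(f"{key} = {value}")
```

In a DAG file, the result would be passed as the conf argument, for example conf=with_openlineage_parent({"spark.executor.memory": "4g"}), alongside your existing SparkSubmitOperator arguments.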