Data Jobs Monitoring gives visibility into the performance and reliability of your Apache Spark and Databricks jobs.
Follow these steps to enable Data Jobs Monitoring for Databricks.
In your Databricks workspace, click on your profile in the top right corner and go to Settings. Select Developer in the left side bar. Next to Access tokens, click Manage.
Click Generate new token, enter “Datadog Integration” in the Comment field, remove the default value in Lifetime (days), and click Generate. Take note of your token.
Important:
As an alternative, follow the official Databricks documentation to generate an access token for a service principal.
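Before entering the token in Datadog, it can help to confirm that the workspace accepts it. The following is a minimal sketch, not part of the official setup: `check_databricks_token` is a hypothetical helper name, and it assumes the token is allowed to call the Databricks `GET /api/2.0/clusters/list` REST endpoint from the machine where you run it.

```shell
#!/bin/bash
# Hypothetical helper: ask the Databricks REST API whether a token is accepted.
# Arguments: workspace URL (no trailing slash) and the access token.
check_databricks_token() {
  local workspace_url="$1"
  local token="$2"
  local status
  # -s hides progress, -o discards the body, -w prints only the HTTP status code
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${token}" \
    "${workspace_url}/api/2.0/clusters/list")"
  if [ "$status" = "200" ]; then
    echo "token accepted"
  else
    echo "token rejected or workspace unreachable (HTTP ${status})"
    return 1
  fi
}
```

For example, `check_databricks_token "https://<YOUR WORKSPACE URL>" "<YOUR TOKEN>"` prints a one-line result. A non-200 status can also come from IP access lists or missing permissions, so treat a failure as a prompt to investigate rather than proof that the token is invalid.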
In Datadog, open the Databricks integration tile.
On the Configure tab, click Add Databricks Workspace.
Enter a workspace name, your Databricks workspace URL, and the Databricks token you generated.
In the Select products to set up integration section, make sure the Data Jobs Monitoring product is Enabled.
In the Datadog Agent Setup section, choose either to let Datadog manage the Agent installation or to install the Agent manually.
The Datadog Agent must be installed on Databricks clusters to monitor Databricks jobs that run on all-purpose or job clusters.
Datadog can install and manage a global init script in the Databricks workspace. With this option, the Datadog Agent is installed on every cluster in the workspace when the cluster starts.
In the Select products to set up integration section, make sure the Data Jobs Monitoring product is Enabled.
In the Datadog Agent Setup section, select the Managed by Datadog toggle button.
Click Select API Key to either select an existing Datadog API key or create a new Datadog API key.
Click Save Databricks Workspace.
On the Configure tab, click the workspace in the list of workspaces.
Click the Configured Products tab.
Make sure the Data Jobs Monitoring product is Enabled.
In the Datadog Agent Setup section, select the Managed by Datadog toggle button.
Click Select API Key to either select an existing Datadog API key or create a new Datadog API key.
Click Save at the bottom of the browser window.
In Databricks, click your display name (email address) in the upper right corner of the page.
Select Settings and click the Compute tab.
In the All purpose clusters section, next to Global init scripts, click Manage.
Click Add. Name your script. Then, in the Script field, copy and paste the following script, remembering to replace the placeholders with your parameter values.
#!/bin/bash
# Required parameters
export DD_API_KEY=<YOUR API KEY>
export DD_SITE=<YOUR DATADOG SITE>
export DATABRICKS_WORKSPACE="<YOUR WORKSPACE NAME>"
# Download and run the latest init script
bash -c "$(curl -L https://dd-data-jobs-monitoring-setup.s3.amazonaws.com/scripts/databricks/databricks_init_latest.sh)" || true
The script above sets the required parameters, then downloads and runs the latest init script for Data Jobs Monitoring in Databricks. To pin your script to a specific version, replace the file name in the URL with databricks_init_1.5.1.sh to use the last stable version.
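As a sketch of how pinning works, the file name in the download URL can be derived from a version string. The helper name `djm_init_url` is hypothetical; the URL pattern and the version number 1.5.1 are taken from the text above.

```shell
#!/bin/bash
# Hypothetical helper: build the init script URL for a given version.
# Pass a version such as "1.5.1", or nothing to get the "latest" script.
djm_init_url() {
  local version="${1:-latest}"
  echo "https://dd-data-jobs-monitoring-setup.s3.amazonaws.com/scripts/databricks/databricks_init_${version}.sh"
}

# Print the pinned URL for version 1.5.1
djm_init_url "1.5.1"
```

In a global init script, the printed URL would take the place of the `databricks_init_latest.sh` URL in the `curl -L` call.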
To enable the script for all new and restarted clusters, toggle Enabled.
Click Add.
Provide the values for the init script parameters at the beginning of the global init script.
export DD_API_KEY=<YOUR API KEY>
export DD_SITE=<YOUR DATADOG SITE>
export DATABRICKS_WORKSPACE="<YOUR WORKSPACE NAME>"
Optionally, you can also set other init script parameters and Datadog environment variables here, such as DD_ENV and DD_SERVICE. The script can be configured using the following parameters:
Variable | Description | Default |
---|---|---|
DD_API_KEY | Your Datadog API key. | |
DD_SITE | Your Datadog site. | |
DATABRICKS_WORKSPACE | Name of your Databricks Workspace. It should match the name provided in the Datadog-Databricks integration step. Enclose the name in double quotes if it contains whitespace. | |
DRIVER_LOGS_ENABLED | Collect Spark driver logs in Datadog. | false |
WORKER_LOGS_ENABLED | Collect Spark worker logs in Datadog. | false |
DD_DJM_ADD_LOGS_TO_FAILURE_REPORT | Include init script logs for debugging when reporting a failure back to Datadog. | false |
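To illustrate the parameters in the table, the top of a global init script with log collection turned on might look like the following. This is a sketch only: the DD_ENV and DD_SERVICE values are made-up examples, and the three required values remain placeholders that you replace with your own.

```shell
#!/bin/bash
# Required parameters (replace the placeholders with your own values)
export DD_API_KEY="<YOUR API KEY>"
export DD_SITE="<YOUR DATADOG SITE>"
export DATABRICKS_WORKSPACE="<YOUR WORKSPACE NAME>"

# Optional init script parameters from the table above (both default to false)
export DRIVER_LOGS_ENABLED=true
export WORKER_LOGS_ENABLED=true

# Optional Datadog environment variables; these values are illustrative assumptions
export DD_ENV="staging"
export DD_SERVICE="nightly-etl"
```

The line that downloads and runs the init script follows these exports, so the parameters are already set when the script runs.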
In Databricks, create an init script file in Workspace with the following content. Be sure to make note of the file path.
#!/bin/bash
# Download and run the latest init script
bash -c "$(curl -L https://dd-data-jobs-monitoring-setup.s3.amazonaws.com/scripts/databricks/databricks_init_latest.sh)" || true
The script above downloads and runs the latest init script for Data Jobs Monitoring in Databricks. To pin your script to a specific version, replace the file name in the URL with databricks_init_1.3.1.sh to use the last stable version.
On the cluster configuration page, click the Advanced options toggle.
At the bottom of the page, go to the Init Scripts tab.
Under the Destination drop-down, select Workspace.
Under Init script path, enter the path to your init script.
Click Add.
In Databricks, on the cluster configuration page, click the Advanced options toggle.
At the bottom of the page, go to the Spark tab.
In the Environment variables textbox, provide the values for the init script parameters.
DD_API_KEY=<YOUR API KEY>
DD_SITE=<YOUR DATADOG SITE>
DATABRICKS_WORKSPACE="<YOUR WORKSPACE NAME>"
Optionally, you can also set other init script parameters and Datadog environment variables here, such as DD_ENV and DD_SERVICE. The script can be configured using the following parameters:
Variable | Description | Default |
---|---|---|
DD_API_KEY | Your Datadog API key. | |
DD_SITE | Your Datadog site. | |
DATABRICKS_WORKSPACE | Name of your Databricks Workspace. It should match the name provided in the Datadog-Databricks integration step. Enclose the name in double quotes if it contains whitespace. | |
DRIVER_LOGS_ENABLED | Collect Spark driver logs in Datadog. | false |
WORKER_LOGS_ENABLED | Collect Spark worker logs in Datadog. | false |
DD_DJM_ADD_LOGS_TO_FAILURE_REPORT | Include init script logs for debugging when reporting a failure back to Datadog. | false |
The init script installs the Agent when clusters start.
Already-running all-purpose clusters or long-lived job clusters must be manually restarted for the init script to install the Datadog Agent.
For scheduled jobs that run on job clusters, the init script installs the Datadog Agent automatically on the next run.
In Datadog, view the Data Jobs Monitoring page to see a list of all your Databricks jobs.
If you don’t see any data in DJM after installing the product, follow these steps.
The init script installs the Datadog Agent. To make sure it is properly installed, SSH into the cluster and run the Agent status command:
sudo datadog-agent status
If the Agent is not installed, view the installation logs located in /tmp/datadog-djm-init.log.
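The two checks above can be combined into one step to run after you SSH into the cluster. This is a minimal sketch: `djm_check_install` is a hypothetical helper name, and the choice to show the last 50 log lines is arbitrary. The status command and the default log path come from the text above.

```shell
#!/bin/bash
# Hypothetical helper: show the Agent status if the Agent is installed,
# otherwise show the end of the init script installation log.
# Optional argument: path to the log file.
djm_check_install() {
  local log_file="${1:-/tmp/datadog-djm-init.log}"
  if command -v datadog-agent >/dev/null 2>&1; then
    sudo datadog-agent status
  elif [ -f "$log_file" ]; then
    echo "Agent not found; last lines of ${log_file}:"
    tail -n 50 "$log_file"
  else
    echo "Agent not found and ${log_file} does not exist"
  fi
}
```

Calling `djm_check_install` with no argument uses the default log location.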
If you need further assistance from Datadog support, add the following environment variable to the init script. This ensures that logs are sent to Datadog when a failure occurs.
export DD_DJM_ADD_LOGS_TO_FAILURE_REPORT=true
You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.
// Add a tag to all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)
To remove a runtime tag:
// Remove the tag from all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)
Use this configuration if you want cluster resource utilization data for your jobs and you create a new job and cluster for each run through the one-time run API endpoint (common with orchestration tools outside of Databricks, such as Airflow or Azure Data Factory).
If you submit Databricks jobs through the one-time run API endpoint, each job run has a unique job ID. This can make it difficult to group and analyze cluster metrics for jobs that use ephemeral clusters. To aggregate cluster utilization from the same job and assess performance across multiple runs, set the DD_JOB_NAME variable inside the spark_env_vars of every new_cluster to the same value as your request payload’s run_name.
Here’s an example of a one-time job run request body:
{
  "run_name": "Example Job",
  "idempotency_token": "8f018174-4792-40d5-bcbc-3e6a527352c8",
  "tasks": [
    {
      "task_key": "Example Task",
      "description": "Description of task",
      "depends_on": [],
      "notebook_task": {
        "notebook_path": "/Path/to/example/task/notebook",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "num_workers": 1,
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "spark_env_vars": {
          "DD_JOB_NAME": "Example Job"
        }
      }
    }
  ]
}
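Because DD_JOB_NAME has to match run_name on every submission, one option is to generate the request body from a single variable so the two values cannot drift apart. The following is a sketch under stated assumptions: `build_run_payload` is a hypothetical helper, the task and cluster values are copied from the example above, and no JSON escaping is performed.

```shell
#!/bin/bash
# Hypothetical helper: print a one-time run request body in which
# DD_JOB_NAME always equals run_name. No JSON escaping is done, so the
# name must not contain double quotes or backslashes.
build_run_payload() {
  local run_name="$1"
  cat <<EOF
{
  "run_name": "${run_name}",
  "tasks": [
    {
      "task_key": "Example Task",
      "notebook_task": {
        "notebook_path": "/Path/to/example/task/notebook",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "num_workers": 1,
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "spark_env_vars": {
          "DD_JOB_NAME": "${run_name}"
        }
      }
    }
  ]
}
EOF
}
```

The output of `build_run_payload "Example Job"` can then be sent as the body of your one-time run request by whatever client or orchestration tool you already use.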
If your Databricks workspace has networking restrictions, Datadog may not be able to reach your Databricks APIs. That access is required to collect traces for Databricks job executions, along with tags and other metadata.
If you control Databricks API access through IP access lists, allow-listing Datadog’s specific IP addresses lets your cluster perform all of these interactions with Datadog services. See the Databricks documentation for details on how to manage IP access lists in Databricks.
If you are using Databricks Private Connectivity, the steps to configure the connection depend on your cloud provider.
Refer to the guide for your cloud environment:
For further assistance, contact the Datadog support team.