Monitor your Databricks clusters with the Datadog Spark integration.
Monitor Databricks Spark applications with the Datadog Spark integration. Install the Datadog Agent on your clusters following the Configuration instructions for your appropriate cluster.
Configure the Spark integration to monitor your Apache Spark cluster on Databricks and collect system and Spark metrics.
Determine the best init script below for your Databricks cluster environment.
Copy and run the contents into a notebook. The notebook creates an init script that installs a Datadog Agent on your clusters. The notebook only needs to be run once to save the script as a global configuration. For more information about the Databricks Datadog Init scripts, see Apache Spark Cluster Monitoring with Databricks and Datadog.
Set the <init-script-folder> path to where you want your init scripts to be saved.
Configure a new Databricks cluster with the cluster-scoped init script path using the UI, the Databricks CLI, or by invoking the Clusters API (a command-line sketch follows the list below):
- Set the DD_API_KEY environment variable in the cluster's Advanced Options with your Datadog API key.
- Set the DD_ENV environment variable under Advanced Options to add a global environment tag to better identify your clusters.
- Set DD_SITE to your Datadog site URL.
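For reference, the same configuration can also be applied from the command line instead of the UI. The following is a minimal sketch, assuming the Databricks CLI is installed and authenticated; the cluster name, Spark version, node type, and placeholder values are illustrative only and should be adapted to your workspace and to whichever init script you choose below.

# Sketch only: create a cluster that runs a Datadog init script at startup.
# <init-script-folder>, <your-api-key>, and <your-environment> are placeholders.
databricks clusters create --json '{
  "cluster_name": "datadog-monitored-cluster",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/<init-script-folder>/datadog-install-driver-workers.sh" } }
  ],
  "spark_env_vars": {
    "DD_API_KEY": "<your-api-key>",
    "DD_ENV": "<your-environment>",
    "DD_SITE": "datadoghq.com"
  }
}'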
Install the Datadog Agent on the driver node of the cluster. This is an updated version of the Datadog Init Script Databricks notebook example. After creating the datadog-install-driver-only.sh script, add the init script path in the cluster configuration page.
%python
dbutils.fs.put("dbfs:/<init-script-folder>/datadog-install-driver-only.sh","""
#!/bin/bash
echo "Running on the driver? $DB_IS_DRIVER"
echo "Driver ip: $DB_DRIVER_IP"
cat <<EOF >> /tmp/start_datadog.sh
#!/bin/bash
if [[ \${DB_IS_DRIVER} = "TRUE" ]]; then
echo "On the driver. Installing Datadog ..."
# CONFIGURE HOST TAGS FOR CLUSTER
DD_TAGS="environment:\${DD_ENV}","databricks_cluster_id:\${DB_CLUSTER_ID}","databricks_cluster_name:\${DB_CLUSTER_NAME}","spark_host_ip:\${SPARK_LOCAL_IP}","spark_node:driver"
# INSTALL THE LATEST DATADOG AGENT 7
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=\$DD_API_KEY DD_HOST_TAGS=\$DD_TAGS bash -c "\$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
# WAIT FOR DATADOG AGENT TO BE INSTALLED
while [ -z \$datadoginstalled ]; do
if [ -e "/etc/datadog-agent/datadog.yaml" ]; then
datadoginstalled=TRUE
fi
sleep 2
done
echo "Datadog Agent is installed"
# ENABLE LOGS IN datadog.yaml TO COLLECT DRIVER LOGS
echo "logs_enabled: true" >> /etc/datadog-agent/datadog.yaml
# WAITING UNTIL MASTER PARAMS ARE LOADED, THEN GRABBING IP AND PORT
while [ -z \$gotparams ]; do
if [ -e "/tmp/master-params" ]; then
DB_DRIVER_PORT=\$(cat /tmp/master-params | cut -d' ' -f2)
gotparams=TRUE
fi
sleep 2
done
hostip=\$(hostname -I | xargs)
# WRITING CONFIG FILE FOR SPARK INTEGRATION WITH STRUCTURED STREAMING METRICS ENABLED AND LOGS CONFIGURATION
# MODIFY TO INCLUDE OTHER OPTIONS IN spark.d/conf.yaml.example
echo "init_config:
instances:
- spark_url: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
spark_cluster_mode: spark_standalone_mode
cluster_name: \${hostip}
streaming_metrics: true
logs:
- type: file
path: /databricks/driver/logs/*.log
source: spark
service: databricks
log_processing_rules:
- type: multi_line
name: new_log_start_with_date
pattern: \d{2,4}[\-\/]\d{2,4}[\-\/]\d{2,4}.*" > /etc/datadog-agent/conf.d/spark.yaml
# RESTARTING AGENT
sudo service datadog-agent restart
fi
EOF
# CLEANING UP
if [ \$DB_IS_DRIVER ]; then
chmod a+x /tmp/start_datadog.sh
/tmp/start_datadog.sh >> /tmp/datadog_start.log 2>&1 & disown
fi
""", True)
Install the Datadog Agent on both the driver and worker nodes of the cluster. After creating the datadog-install-driver-workers.sh script, add the init script path in the cluster configuration page.
%python
dbutils.fs.put("dbfs:/<init-script-folder>/datadog-install-driver-workers.sh","""
#!/bin/bash
cat <<EOF >> /tmp/start_datadog.sh
#!/bin/bash
hostip=$(hostname -I | xargs)
if [[ \${DB_IS_DRIVER} = "TRUE" ]]; then
echo "Installing Datadog agent in the driver (master node) ..."
# CONFIGURE HOST TAGS FOR DRIVER
DD_TAGS="environment:\${DD_ENV}","databricks_cluster_id:\${DB_CLUSTER_ID}","databricks_cluster_name:\${DB_CLUSTER_NAME}","spark_host_ip:\${SPARK_LOCAL_IP}","spark_node:driver"
# INSTALL THE LATEST DATADOG AGENT 7 ON DRIVER AND WORKER NODES
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=\$DD_API_KEY DD_HOST_TAGS=\$DD_TAGS bash -c "\$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
# WAIT FOR DATADOG AGENT TO BE INSTALLED
while [ -z \$datadoginstalled ]; do
if [ -e "/etc/datadog-agent/datadog.yaml" ]; then
datadoginstalled=TRUE
fi
sleep 2
done
echo "Datadog Agent is installed"
# ENABLE LOGS IN datadog.yaml TO COLLECT DRIVER LOGS
echo "logs_enabled: true" >> /etc/datadog-agent/datadog.yaml
while [ -z \$gotparams ]; do
if [ -e "/tmp/driver-env.sh" ]; then
DB_DRIVER_PORT=\$(grep -i "CONF_UI_PORT" /tmp/driver-env.sh | cut -d'=' -f2)
gotparams=TRUE
fi
sleep 2
done
# WRITING CONFIG FILE FOR SPARK INTEGRATION WITH STRUCTURED STREAMING METRICS ENABLED
# MODIFY TO INCLUDE OTHER OPTIONS IN spark.d/conf.yaml.example
echo "init_config:
instances:
- spark_url: http://\${DB_DRIVER_IP}:\${DB_DRIVER_PORT}
spark_cluster_mode: spark_driver_mode
cluster_name: \${hostip}
streaming_metrics: true
logs:
- type: file
path: /databricks/driver/logs/*.log
source: spark
service: databricks
log_processing_rules:
- type: multi_line
name: new_log_start_with_date
pattern: \d{2,4}[\-\/]\d{2,4}[\-\/]\d{2,4}.*" > /etc/datadog-agent/conf.d/spark.yaml
else
# CONFIGURE HOST TAGS FOR WORKERS
DD_TAGS="environment:\${DD_ENV}","databricks_cluster_id:\${DB_CLUSTER_ID}","databricks_cluster_name:\${DB_CLUSTER_NAME}","spark_host_ip:\${SPARK_LOCAL_IP}","spark_node:worker"
# INSTALL THE LATEST DATADOG AGENT 7 ON DRIVER AND WORKER NODES
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=\$DD_API_KEY DD_HOST_TAGS=\$DD_TAGS bash -c "\$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
fi
# RESTARTING AGENT
sudo service datadog-agent restart
EOF
# CLEANING UP
chmod a+x /tmp/start_datadog.sh
/tmp/start_datadog.sh >> /tmp/datadog_start.log 2>&1 & disown
""", True)
After creating the datadog-install-job-driver-mode.sh script, add the init script path in the cluster configuration page. Note: Job clusters are monitored in spark_driver_mode with the Spark UI port. A hypothetical job definition referencing this script is sketched after the notebook below.
%python
dbutils.fs.put("dbfs:/<init-script-folder>/datadog-install-job-driver-mode.sh","""
#!/bin/bash
echo "Running on the driver? $DB_IS_DRIVER"
echo "Driver ip: $DB_DRIVER_IP"
cat <<EOF >> /tmp/start_datadog.sh
#!/bin/bash
if [ \$DB_IS_DRIVER ]; then
echo "On the driver. Installing Datadog ..."
# CONFIGURE HOST TAGS FOR DRIVER
DD_TAGS="environment:\${DD_ENV}","databricks_cluster_id:\${DB_CLUSTER_ID}","databricks_cluster_name:\${DB_CLUSTER_NAME}","spark_host_ip:\${SPARK_LOCAL_IP}","spark_node:driver"
# INSTALL THE LATEST DATADOG AGENT 7 ON DRIVER AND WORKER NODES
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=\$DD_API_KEY DD_HOST_TAGS=\$DD_TAGS bash -c "\$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
# WAIT FOR DATADOG AGENT TO BE INSTALLED
while [ -z \$datadoginstalled ]; do
if [ -e "/etc/datadog-agent/datadog.yaml" ]; then
datadoginstalled=TRUE
fi
sleep 2
done
echo "Datadog Agent is installed"
# ENABLE LOGS IN datadog.yaml TO COLLECT DRIVER LOGS
echo "logs_enabled: true" >> /etc/datadog-agent/datadog.yaml
while [ -z \$gotparams ]; do
if [ -e "/tmp/driver-env.sh" ]; then
DB_DRIVER_PORT=\$(grep -i "CONF_UI_PORT" /tmp/driver-env.sh | cut -d'=' -f2)
gotparams=TRUE
fi
sleep 2
done
current=\$(hostname -I | xargs)
# WRITING SPARK CONFIG FILE
echo "init_config:
instances:
- spark_url: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
spark_cluster_mode: spark_driver_mode
cluster_name: \$current
logs:
- type: file
path: /databricks/driver/logs/*.log
source: spark
service: databricks
log_processing_rules:
- type: multi_line
name: new_log_start_with_date
pattern: \d{2,4}[\-\/]\d{2,4}[\-\/]\d{2,4}.*" > /etc/datadog-agent/conf.d/spark.yaml
# RESTARTING AGENT
sudo service datadog-agent restart
fi
EOF
# CLEANING UP
if [ \$DB_IS_DRIVER ]; then
chmod a+x /tmp/start_datadog.sh
/tmp/start_datadog.sh >> /tmp/datadog_start.log 2>&1 & disown
fi
""", True)
Run the Agent's status subcommand and look for spark under the Checks section.
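For example, assuming shell access to the driver node (such as through a %sh notebook cell or the cluster web terminal), a quick check might look like this:

%sh
# Print the Agent status and surface the spark check section, if it is present
sudo datadog-agent status | grep -A 10 spark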
See the Spark integration documentation for a list of metrics collected.
See the Spark integration documentation for the list of service checks collected.
The Databricks integration does not include any events.
Need help? Contact Datadog support.
Additional helpful documentation, links, and articles: