Enable Data Jobs Monitoring for Databricks

Join the Beta!

Data Jobs Monitoring is in private beta. Fill out this form to join the waitlist.

Data Jobs Monitoring gives visibility into the performance and reliability of your Apache Spark and Databricks jobs.

Setup

Follow these steps to enable Data Jobs Monitoring for Databricks.

  1. Configure the Datadog-Databricks integration with your Databricks API token.
  2. Install the Datadog Agent on your Databricks cluster(s).
  3. Add your Datadog API key in Databricks.

Configure the Datadog-Databricks integration

  1. In your Databricks workspace, go to Settings > Developer. Next to Access tokens, click Manage.
  2. Click Generate new token, enter a comment, and click Generate. Take note of your token.
  3. In Datadog, open the Databricks integration tile.
  4. On the Configure tab, click Add New.
  5. Enter a workspace name, your Databricks workspace URL, and the Databricks token you generated.
    (Screenshot: the Databricks integration tile listing the configured workspace with its name, URL, and API token.)
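
    If you prefer a terminal over the UI, you can also generate the access token with the Databricks CLI. A minimal sketch, assuming the CLI is installed and authenticated against your workspace; the comment and lifetime values are illustrative:

    # Create a personal access token valid for 90 days; the token value appears in the output
    databricks tokens create --comment "datadog-integration" --lifetime-seconds 7776000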

Install the Datadog Agent on your Databricks cluster(s)

You can install the Agent globally or on a specific Databricks cluster. Use the init script that matches your choice.
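
In either case, use the init script provided by Datadog. As a rough, hypothetical sketch of what such a script does, it installs the Agent on each node using the DD_API_KEY and DD_SITE environment variables you configure later in this guide (the URL below is Datadog's standard Agent install script, not the Data Jobs Monitoring init script itself):

#!/bin/bash
# Hypothetical sketch only -- use the init script provided by Datadog.
# Installs the Datadog Agent on each node, using the cluster's DD_API_KEY and DD_SITE.
DD_API_KEY="$DD_API_KEY" DD_SITE="$DD_SITE" \
  bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"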

Global init script

  1. In Databricks, go to Settings > Compute. In the All purpose clusters section, next to Global init scripts, click Manage.

  2. Click Add. Name your script. Then, in the Script field, copy and paste the init script.

  3. To enable the script for all new and restarted clusters, toggle Enabled.

    (Screenshot: the Global init scripts list showing an enabled script named 'install-datadog-agent'.)
  4. Click Add.
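
If you manage global init scripts programmatically, the same script can be registered through the Databricks Global Init Scripts API. A sketch, assuming a base64-encoded copy of the script and placeholder values for the workspace URL and token:

# Register an enabled global init script (the name mirrors the UI example above)
curl -X POST "https://<WORKSPACE URL>/api/2.0/global-init-scripts" \
  -H "Authorization: Bearer <DATABRICKS TOKEN>" \
  -d '{"name": "install-datadog-agent", "enabled": true, "script": "<BASE64-ENCODED SCRIPT>"}'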

Cluster-scoped init script

  1. Download the init script.

  2. In Databricks, on the cluster configuration page, click the Advanced options toggle.

  3. At the bottom of the page, go to the Init Scripts tab.

    (Screenshot: the Init Scripts tab, showing a Destination drop-down and an Init script path selector.)
    • Under the Destination drop-down, select Workspace.
    • Under Init script path, enter the path to your init script.
    • Click Add.

Add your Datadog API key in Databricks

  1. Find your Datadog API key.

  2. In Databricks, on the cluster configuration page, click the Advanced options toggle.

  3. At the bottom of the page, go to the Spark tab.

    (Screenshot: the Spark tab, with DD_API_KEY and DD_SITE set in the Environment variables textbox.)

    In the Environment variables textbox, set values for DD_API_KEY and DD_SITE.

    For example, if your Datadog site is datadoghq.com, paste the following into the box:

    DD_API_KEY=<YOUR API KEY>
    DD_SITE=datadoghq.com

    Optionally, you can also set other Datadog environment variables here, such as DD_ENV and DD_SERVICE (see the example after these steps).

  4. Click Confirm.
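
For example, the environment variables in step 3 for a cluster reporting to the US1 site might look like the following. The DD_ENV and DD_SERVICE values are illustrative placeholders; use names that match your own environments and services.

DD_API_KEY=<YOUR API KEY>
DD_SITE=datadoghq.com
DD_ENV=prod
DD_SERVICE=etl-pipeline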
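
To check that the API key from step 1 is valid before restarting the cluster, one option is Datadog's key validation endpoint. A sketch, assuming the US1 site (adjust the domain to match your DD_SITE):

# A valid key returns {"valid": true}
curl -s -H "DD-API-KEY: <YOUR API KEY>" "https://api.datadoghq.com/api/v1/validate"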

Validation

In Datadog, view the Data Jobs Monitoring page to see a list of all your Databricks jobs.

Tag spans at runtime

You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.

// Tag all Spark computations that start after this call
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)

To remove a runtime tag:

// Remove the tag from subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)

Further Reading

Additional helpful documentation, links, and articles: