The Quality Monitoring integration with Databricks only supports Unity Catalog. Reach out to your account representative if you use Hive Metastore.

Overview

The Databricks integration connects Datadog to your Databricks workspace to sync metadata, query history, and table-level metrics. Use it to monitor data freshness, detect anomalies, and trace lineage across your data stack.

Note: The instructions below are for Quality Monitoring. For Jobs Monitoring, see Enable Data Jobs Monitoring for Databricks.

Set up your account in Databricks

Step 1 - Connect the Databricks integration tile

  1. Complete the installation instructions in the Databricks integration documentation using Datadog’s integration tile. Note the service principal’s application ID; you need it in Step 2 to grant access.

    Note: Workspace Admin permissions are not required for Quality Monitoring.

  2. When configuring the integration, turn on the Data Observability toggle.

  3. Click Save Databricks Workspace.

Step 2 - Grant access

Grant read-only access to the data you want to monitor, using one of the following options:

Use the full catalog access option for a simpler setup. Tables added to the catalog later are covered automatically, so you don't need to update permissions.

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<application_id>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<application_id>`;

Use the specific tables option for least-privilege access or if you only need to monitor a subset of your data. With this option, you must grant access to each new table you want to monitor.

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<application_id>`;
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<application_id>`;
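
As an illustration, a specific-tables setup that monitors two tables in one schema might look like the sketch below. The catalog, schema, and table names (main, sales, orders, customers) are hypothetical; substitute your own, along with your service principal's application ID. Because USE_CATALOG and USE_SCHEMA are granted once per catalog and schema, monitoring another table in the same schema later should only need one additional GRANT SELECT statement.

```sql
-- Hypothetical names: replace main, sales, orders, and customers
-- with your own catalog, schema, and tables.
GRANT USE_CATALOG ON CATALOG main TO `<application_id>`;
GRANT USE_SCHEMA ON SCHEMA main.sales TO `<application_id>`;
GRANT SELECT ON TABLE main.sales.orders TO `<application_id>`;

-- Later, to start monitoring another table in the same schema:
GRANT SELECT ON TABLE main.sales.customers TO `<application_id>`;
```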

These permissions are needed for the following reasons:

  • GRANT USE_CATALOG is required to navigate into the catalog and discover schemas.
  • GRANT USE_SCHEMA is required to enumerate tables and monitor schema-level health.
  • GRANT SELECT is required for data quality monitoring, such as custom SQL or distribution checks.
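
To check that the grants were applied, you can list the privileges the service principal holds with the Databricks SQL SHOW GRANTS statement. The following is a sketch that reuses the placeholders from the statements above; run only the lines that match the option you chose.

```sql
-- List the privileges held by the service principal at each level.
SHOW GRANTS `<application_id>` ON CATALOG <catalog_name>;
SHOW GRANTS `<application_id>` ON SCHEMA <catalog_name>.<schema_name>;
SHOW GRANTS `<application_id>` ON TABLE <catalog_name>.<schema_name>.<table_name>;
```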

Next steps

After you configure the integration, Datadog begins syncing your metadata and query history in the background. Initial syncs can take several hours depending on the size of your Databricks deployment.
