Connect Databricks for Warehouse-Native Experiment Analysis


Overview

Warehouse-native experiment analysis lets you run statistical computations directly in your data warehouse.

To set this up for Databricks, connect a Databricks service principal to Datadog and configure your experiment settings. This guide walks you through the required setup.

Prerequisites

Datadog Experiments connects to Databricks through the Datadog Databricks integration. If you already have a Databricks integration configured for the workspace you plan to use, skip to Step 1. Otherwise, expand the section below to create a service principal.

In your Databricks Workspace:

  1. Click your profile in the top right corner and select Settings.
  2. In the Settings menu, click Identity and access.
  3. On the Service principals row, click Manage, then:
    1. Click Add service principal, then Add new.
    2. Enter a service principal name and click Add.
  4. Click the name of the new service principal to open its details page.
  5. Select the Permissions tab, then:
    1. Click Grant access.
    2. Under User, Group or Service Principal, enter the service principal name.
    3. Using the Permission dropdown, select Manage.
    4. Click Save.
  6. Select the Secrets tab, then:
    1. Click Generate secret.
    2. Set the Lifetime (days) value to the maximum allowed (for example, 730).
    3. Click Generate.
    4. Note your Secret and Client ID.
    5. Click Done.
  7. In the Settings menu, click Identity and access.
  8. On the Groups row, click Manage, then:
    1. Click admins, then Add members.
    2. Enter the service principal name and click Add.

After you create the service principal, continue to Step 1 to grant the required permissions.

If you plan to use other warehouse observability functionality in Datadog, see Datadog's Databricks integration documentation to determine which resources to enable.

Step 1: Grant permissions to the service principal

You must be an account admin to grant these permissions.

In your Databricks Workspace, open the SQL Editor to run the following commands and grant the service principal permissions for warehouse-native experiment analysis.

Screenshot: the Databricks Workspace SQL Editor, opened from the SQL section of the left navigation, with a new query tab ready to run.

Grant read access to source tables

Grant the service principal read access to the tables containing your experiment metrics. Run both GRANT USE commands, then run the GRANT SELECT option that matches your access needs. Replace <catalog>, <schema>, <table>, and <principal> with the appropriate values.

GRANT USE CATALOG ON CATALOG <catalog> TO `<principal>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<principal>`;

-- Option 1: Give read access to a single table
GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `<principal>`;

-- Option 2: Give read access to all tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA <catalog>.<schema> TO `<principal>`;
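As a concrete illustration, assuming a catalog named analytics, a schema named experiments, and a service principal named dd-experiments-sp (all hypothetical names), the statements look like this:

-- Hypothetical example values: catalog "analytics", schema "experiments",
-- service principal "dd-experiments-sp"
GRANT USE CATALOG ON CATALOG analytics TO `dd-experiments-sp`;
GRANT USE SCHEMA ON SCHEMA analytics.experiments TO `dd-experiments-sp`;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.experiments TO `dd-experiments-sp`;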

Create an output schema

Run the following commands to create a schema where Datadog Experiments can write intermediate results and temporary tables. Replace datadog_experiments_output with your output schema name, and <catalog> and <principal> with the appropriate values.

CREATE SCHEMA IF NOT EXISTS <catalog>.datadog_experiments_output;
GRANT USE SCHEMA ON SCHEMA <catalog>.datadog_experiments_output TO `<principal>`;
GRANT CREATE TABLE ON SCHEMA <catalog>.datadog_experiments_output TO `<principal>`;

Configure a volume for temporary data staging

Datadog Experiments uses a volume to temporarily save exposure data before copying it into a Databricks table. Run the following commands to create and grant access to this volume. Replace datadog_experiments_output with your output schema name, and <catalog> and <principal> with the appropriate values.

CREATE VOLUME IF NOT EXISTS <catalog>.datadog_experiments_output.datadog_experiments_volume;
GRANT READ VOLUME ON VOLUME <catalog>.datadog_experiments_output.datadog_experiments_volume TO `<principal>`;
GRANT WRITE VOLUME ON VOLUME <catalog>.datadog_experiments_output.datadog_experiments_volume TO `<principal>`;
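To verify that the grants in this step were applied, you can optionally run SHOW GRANTS against each securable. Replace <catalog> and <principal> as before; the output should list the privileges granted above:

-- Optional check: list the privileges held by the service principal
SHOW GRANTS `<principal>` ON SCHEMA <catalog>.datadog_experiments_output;
SHOW GRANTS `<principal>` ON VOLUME <catalog>.datadog_experiments_output.datadog_experiments_volume;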

Grant SQL warehouse access

Grant the service principal access to the SQL warehouse that Datadog Experiments uses to run queries.

  1. Navigate to SQL Warehouses in your Databricks Workspace.
  2. Select the warehouse for Datadog Experiments.
  3. At the top right corner, click Permissions.
  4. Grant the service principal the Can use permission.
  5. Close the Manage permissions modal.

Step 2: Connect Databricks to Datadog

To connect your Databricks Workspace to Datadog for warehouse-native experiment analysis:

  1. Navigate to Datadog’s integrations page and search for Databricks.
  2. Click the Databricks tile to open its modal.
  3. Select the Configure tab and click Add Databricks Workspace. If this is your first Databricks account, the setup form appears automatically.
  4. Under the Connect a new Databricks Workspace section, enter:
    • Workspace Name.
    • Workspace URL.
    • Client ID.
    • Client Secret.
    • System Tables SQL Warehouse ID.
  5. Toggle off Jobs Monitoring and all other products.
  6. Toggle off the Metrics - Model Serving resource.
  7. Click Save Databricks Workspace.

Step 3: Configure experiment settings

Datadog supports one warehouse connection per organization. Connecting Databricks replaces any existing warehouse connection (for example, Snowflake).

After you set up your Databricks integration and workspace, configure the experiment settings in Datadog:

  1. Open Datadog Product Analytics.
  2. In the left navigation, hover over Settings and click Experiments.
  3. Select the Warehouse Connections tab.
  4. Click Connect a data warehouse. If you already have a warehouse connected, click Edit instead.
  5. Select the Databricks tile.
  6. Using the Account dropdown, select the Databricks Workspace you configured in Step 2.
  7. Enter the Catalog, Schema, and Volume name you configured in Step 1. If your catalog and schema do not appear in the dropdown, enter them manually to add them to the list.
  8. Click Save.

Screenshot: the Edit Data Warehouse modal with Databricks selected, showing input fields for Account, Catalog, Schema, and Volume Name.

After you save your warehouse connection, create experiment metrics using your Databricks data.

Further reading