---
title: Databricks (Zerobus) Destination
description: Datadog, the leading service for cloud-scale monitoring.
---

# Databricks (Zerobus) Destination

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md).
{% /alert %}

{% /callout %}
Available for:
{% icon name="icon-logs" /%}
 Logs 
{% callout %}
##### Join the Preview!

The Databricks (Zerobus) destination is in Preview. Contact your account manager to request access.
{% /callout %}

## Overview{% #overview %}

Use Observability Pipelines' Databricks (Zerobus) destination to send logs to a Databricks Unity Catalog table. The destination streams logs to the [Zerobus Ingest API](https://docs.databricks.com/aws/en/ingestion/zerobus-overview) and authenticates to Databricks with an OAuth service principal.

## Prerequisites{% #prerequisites %}

Before you configure the Databricks (Zerobus) destination, you must:

- Set up a Unity Catalog schema and table that the Observability Pipelines Worker writes logs to.
- Set up a service principal that the Worker uses to authenticate to Databricks. The service principal needs permission to read from and write to the table.

### Set up a schema and table{% #set-up-a-schema-and-table %}

The SQL examples in this section use the following placeholders:

| Placeholder               | Description                                | Example                       |
| ------------------------- | ------------------------------------------ | ----------------------------- |
| `<USER>`                  | The user who creates the schema and table. | `databricks-user@example.com` |
| `<CATALOG_NAME>`          | The Unity Catalog name.                    | `main`                        |
| `<SCHEMA_NAME>`           | The schema name.                           | `obs_pipelines`               |
| `<TABLE_NAME>`            | The table name.                            | `apache_common_logs`          |
| `<YOUR_MANAGED_LOCATION>` | (Optional) The managed location URI.       | `s3://your-bucket/managed`    |

**Note**: The `GRANT` commands must be run by a Databricks workspace admin.

In the Databricks workspace:

1. If you're not a Databricks workspace admin, have an admin run the following command to grant your user permission to create a schema:

   ```sql
   -- Keep the backticks around the user name; it contains special characters such as @.
   GRANT CREATE SCHEMA ON CATALOG <CATALOG_NAME> TO `<USER>`;
   ```

1. Create the schema:

   ```sql
   CREATE SCHEMA IF NOT EXISTS <CATALOG_NAME>.<SCHEMA_NAME>
   MANAGED LOCATION '<YOUR_MANAGED_LOCATION>';
   ```

   - **Note**: `MANAGED LOCATION` is optional. See Databricks' [Create Schemas](https://docs.databricks.com/aws/en/schemas/create-schema) documentation for more information.

1. If you're not a Databricks workspace admin, have an admin run the following command to grant your user permission to create a table in the schema:

   ```sql
   -- Keep the backticks around the user name; it contains special characters such as @.
   GRANT CREATE TABLE ON SCHEMA <CATALOG_NAME>.<SCHEMA_NAME> TO `<USER>`;
   ```

1. Run the following command to create the table that Observability Pipelines writes log data to:

   ```sql
   CREATE TABLE <CATALOG_NAME>.<SCHEMA_NAME>.<TABLE_NAME> (
     host STRING,
     message STRING,
     service STRING,
     source_type STRING,
     timestamp TIMESTAMP
   );
   ```

   - See Databricks' [Create a Unity Catalog Managed Table](https://docs.databricks.com/aws/en/tables/managed#create-a-managed-table) documentation for more information.

The fully qualified table name is `catalog.schema.table`, for example `main.obs_pipelines.apache_common_logs`. This is the value you enter for **Table Name** when you set up the Observability Pipelines Databricks destination.
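As a quick sanity check before you enter the value for **Table Name**, the three-part format can be validated with a few lines of Python. This is a minimal sketch: `split_table_name` is a hypothetical helper, not part of any Datadog or Databricks library, and it assumes that none of the three names contains a dot or needs backtick quoting.

```python
def split_table_name(full_name: str) -> tuple[str, str, str]:
    """Split 'catalog.schema.table' into its three parts.

    Hypothetical helper for illustration only. Assumes plain identifiers
    (no dots or backtick quoting inside a name).
    """
    parts = full_name.split(".")
    # Exactly three non-empty parts are required.
    if len(parts) != 3 or any(not part.strip() for part in parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    catalog, schema, table = parts
    return catalog, schema, table


print(split_table_name("main.obs_pipelines.apache_common_logs"))
```

A two-part value such as `obs_pipelines.apache_common_logs` raises a `ValueError` in this sketch, which mirrors the requirement that the catalog name is included.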

### Set up a service principal{% #set-up-a-service-principal %}

The Databricks [Zerobus Ingest API](https://docs.databricks.com/aws/en/ingestion/zerobus-overview) uses OAuth authentication. The OAuth client ID is the service principal's UUID (its application ID), and the OAuth client secret is the secret you generate for the service principal after you create it.

To create a service principal:

1. In your Databricks workspace, navigate to **User Settings** > **Identity and access** > **Service principals**.
1. Click **Add service principal**.
1. After the service principal is created, generate an OAuth secret for it.
   - Take note of the service principal's **Application ID** (client ID) and the OAuth client secret. You need both of them when you configure the Observability Pipelines Databricks destination.
1. Run this SQL in Databricks to grant the service principal access to the catalog, schema, and table. Replace `<SERVICE_PRINCIPAL_UUID>` with the service principal's application ID from the previous step. Keep the backticks around the ID, because it contains hyphens:

   ```sql
   GRANT USE CATALOG ON CATALOG <CATALOG_NAME> TO `<SERVICE_PRINCIPAL_UUID>`;
   GRANT USE SCHEMA ON SCHEMA <CATALOG_NAME>.<SCHEMA_NAME> TO `<SERVICE_PRINCIPAL_UUID>`;
   GRANT SELECT, MODIFY ON TABLE <CATALOG_NAME>.<SCHEMA_NAME>.<TABLE_NAME> TO `<SERVICE_PRINCIPAL_UUID>`;
   ```

See Databricks' [Add service principals to your account](https://docs.databricks.com/aws/en/admin/users-groups/manage-service-principals#-add-service-principals-to-your-account) and [Grant permissions on an object](https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=Catalog%C2%A0Explorer#-grant-permissions-on-an-object) documentation for more information.

## Setup{% #setup %}

Configure the Databricks (Zerobus) destination when you [set up a pipeline](https://docs.datadoghq.com/observability_pipelines/configuration/set_up_pipelines.md). You can set up a pipeline in the [UI](https://app.datadoghq.com/observability-pipelines), using the [API](https://docs.datadoghq.com/api/latest/observability-pipelines.md), or with [Terraform](https://registry.terraform.io/providers/datadog/datadog/latest/docs/resources/observability_pipeline). The steps in this section are configured in the UI.

**Note**: Log fields that are not present in the table schema are dropped. For example, if a log has the fields `id`, `name`, and `host`, and the table schema only contains the columns `name` and `host`, then the `id` field is dropped and not written to the table.
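The field-dropping behavior in the note above can be pictured as a projection of each log onto the table's columns. The following Python sketch only illustrates that behavior and is not the Worker's implementation. `project_to_schema` is a hypothetical name, and the sketch assumes flat, top-level fields.

```python
def project_to_schema(log: dict, columns: set[str]) -> dict:
    """Keep only the log fields that have a matching table column.

    Illustration only: assumes flat logs and exact, case-sensitive
    name matches between fields and columns.
    """
    return {field: value for field, value in log.items() if field in columns}


log = {"id": 42, "name": "checkout", "host": "web-01"}
table_columns = {"name", "host"}

# The `id` field has no matching column, so it is not written.
print(project_to_schema(log, table_columns))
```

If a field you need is missing from the table, add the column to the table schema before you deploy the pipeline.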

After you select the Databricks (Zerobus) destination in the pipeline UI:

{% alert level="warning" %}
Databricks (Zerobus) doesn't convert timestamps in string format to Databricks' [`TIMESTAMP` type](https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-type). If your table uses a timestamp column, see [Convert string timestamps to timestamp format](#convert-string-timestamps-to-timestamp-format) for more information.
{% /alert %}

{% alert level="danger" %}
Only enter the identifier for the OAuth client secret. Do not enter the actual value.
{% /alert %}

1. Enter the **Ingestion Endpoint** for your Databricks workspace, such as `https://<workspace_id>.zerobus.<region>.cloud.databricks.com`. The Worker sends logs to this endpoint.
1. Enter the **Table Name** in the format `catalog.schema.table`, such as `main.obs_pipelines.apache_common_logs`.
1. Enter the **Unity Catalog Endpoint** for your Databricks workspace, such as `https://<workspace>.cloud.databricks.com`. The Worker uses this endpoint to read the table's schema.
1. In the **Auth - Client ID** field, enter the application ID of the service principal, such as `abcdefgh-1234-5678-abcd-ef0123456789`.
1. In the **Auth - Client Secret** field, enter the identifier for your OAuth client secret. If you leave the field blank, the default identifier is used. See [Secret defaults](#secret-defaults).

### Optional settings{% #optional-settings %}

#### Buffering{% #buffering %}

Toggle the switch to enable **Buffering Options**. Enable a configurable buffer on your destination to ensure intermittent latency or an outage at the destination doesn't create immediate backpressure, and allow events to continue to be ingested from your source. Disk buffers can also increase pipeline durability by writing data to disk, ensuring buffered data persists through a Worker restart. See [Destination buffers](https://docs.datadoghq.com/observability_pipelines/scaling_and_performance/buffering_and_backpressure.md#destination-buffers) for more information.

- If left unconfigured, your destination uses a memory buffer with a capacity of 500 events.
- To configure a buffer on your destination:
  1. Select the buffer type you want to set (**Memory** or **Disk**).
  1. Enter the buffer size and select the unit.
     - The maximum memory buffer size is 128 GB.
     - The maximum disk buffer size is 500 GB.
  1. In the **Behavior on full buffer** dropdown menu, select whether you want to **block** events or **drop new events** when the buffer is full.
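The difference between the two full-buffer behaviors can be sketched with a toy bounded queue. This is a simplified illustration, not the Worker's buffer implementation. `BoundedBuffer`, `offer`, and the `"drop_newest"` label are hypothetical names, and capacity is counted in events here (as with the default memory buffer) rather than in bytes.

```python
from collections import deque


class BoundedBuffer:
    """Toy event buffer for illustration. Capacity is counted in events."""

    def __init__(self, capacity: int = 500, when_full: str = "block"):
        if when_full not in ("block", "drop_newest"):
            raise ValueError("when_full must be 'block' or 'drop_newest'")
        self.capacity = capacity
        self.when_full = when_full
        self.events = deque()

    def offer(self, event) -> str:
        """Try to enqueue an event and report what happened to it."""
        if len(self.events) < self.capacity:
            self.events.append(event)
            return "accepted"
        if self.when_full == "drop_newest":
            return "dropped"  # the incoming event is discarded
        return "blocked"  # the caller must wait and retry (backpressure)


buffer = BoundedBuffer(capacity=2, when_full="drop_newest")
print([buffer.offer(n) for n in range(3)])
```

With **block**, a full buffer pushes backpressure upstream and no events are lost at this stage. With **drop new events**, ingestion continues but incoming events are discarded until space frees up.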

### Convert string timestamps to timestamp format{% #convert-string-timestamps-to-timestamp-format %}

If your logs have timestamps in string format and your Databricks table has a column declared as the [`TIMESTAMP` type](https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-type), you must convert the string to timestamp format before the logs reach the Databricks (Zerobus) destination. The destination only writes values that are already in timestamp format to a `TIMESTAMP` column; it does not parse strings.

If you do not convert the string timestamp, the Worker throws an error similar to:

```
Protobuf encoding failed: Error converting timestamp field: Can't convert '2012-04-23T10[41]15Z' to i64: invalid digit found in string
```

To convert timestamps in string format to timestamp format:

1. Add a [Custom Processor](https://docs.datadoghq.com/observability_pipelines/processors/custom_processor.md#setup) to your pipeline.
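1. Add a function with the following custom script:

   ```
   .timestamp = parse_timestamp!(.timestamp, format: "%+")
   ```

   - The `%+` format specifier matches ISO 8601 / RFC 3339 timestamps, such as `2024-01-15T10:30:00Z`. If your logs use a different layout, adjust the format string to match.

See [parse_timestamp](https://docs.datadoghq.com/observability_pipelines/processors/custom_processor.md#parse_timestamp) for more information.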

## Secret defaults{% #secret-defaults %}

These are the defaults used for secret identifiers and environment variables.

**Note**: If you enter secret identifiers and then choose to use environment variables, the environment variable name is the identifier you entered, prefixed with `DD_OP_`. For example, if you entered `PASSWORD_1` for a password identifier, the environment variable for that password is `DD_OP_PASSWORD_1`.

{% tab title="Secrets Management" %}

- Databricks OAuth client secret identifier:
  - References the OAuth client secret for the service principal the Observability Pipelines Worker uses to authenticate to Databricks.
  - The default identifier is `DESTINATION_DATABRICKS_ZEROBUS_OAUTH_CLIENT_SECRET`.

{% /tab %}

{% tab title="Environment Variables" %}

- Databricks OAuth client secret:
  - The OAuth client secret for the service principal the Observability Pipelines Worker uses to authenticate to Databricks.
  - The default environment variable is `DD_OP_DESTINATION_DATABRICKS_ZEROBUS_OAUTH_CLIENT_SECRET`.

{% /tab %}
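The naming rule from the note above is simple enough to express in one line. The sketch below uses a hypothetical helper, `env_var_for`, to show how an identifier maps to its environment variable name; it is not a Datadog API.

```python
def env_var_for(identifier: str) -> str:
    """Return the environment variable name for a secret identifier.

    Hypothetical helper: applies the `DD_OP_` prefix described in the note.
    """
    return f"DD_OP_{identifier}"


print(env_var_for("PASSWORD_1"))
print(env_var_for("DESTINATION_DATABRICKS_ZEROBUS_OAUTH_CLIENT_SECRET"))
```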

## How the destination works{% #how-the-destination-works %}

### Event batching{% #event-batching %}

A batch of events is flushed when any one of these limits is reached. See [event batching](https://docs.datadoghq.com/observability_pipelines/destinations.md#event-batching) for more information.

| Maximum Events | Maximum Size (MB) | Timeout (seconds) |
| -------------- | ----------------- | ----------------- |
| None           | 10                | 1                 |
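The flush rule in the table can be read as "whichever limit is hit first". The following Python sketch illustrates that logic under stated assumptions and is not the Worker's implementation. `BatchLimits` and `should_flush` are hypothetical names, and treating 1 MB as 1,000,000 bytes is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchLimits:
    max_events: Optional[int] = None  # "None" in the table: no event-count limit
    max_bytes: int = 10 * 1_000_000   # 10 MB, assuming 1 MB = 1,000,000 bytes
    timeout_secs: float = 1.0         # 1 second


def should_flush(event_count: int, batch_bytes: int, age_secs: float,
                 limits: BatchLimits = BatchLimits()) -> bool:
    """Return True when any configured limit has been reached."""
    if event_count == 0:
        return False  # nothing to send
    if limits.max_events is not None and event_count >= limits.max_events:
        return True
    return batch_bytes >= limits.max_bytes or age_secs >= limits.timeout_secs


# A small, young batch keeps accumulating; an old or large one is flushed.
print(should_flush(event_count=5, batch_bytes=2_000, age_secs=0.2))
print(should_flush(event_count=5, batch_bytes=2_000, age_secs=1.0))
```

With these defaults, low-volume pipelines are typically flushed by the 1-second timeout, while high-volume pipelines are more likely to reach the 10 MB size limit first.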
