---
title: Enrichment Table Processor
description: Enrich logs with lookup datasets stored in Datadog Reference Tables, local files, or MaxMind GeoIP tables.
breadcrumbs: Docs > Observability Pipelines > Processors > Enrichment Table Processor
---

# Enrichment Table Processor

{% callout %}
# Important note for users on the following Datadog site: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}
Available for: {% icon name="icon-logs" /%} Logs

## Overview{% #overview %}

Logs can contain information like IP addresses, user IDs, or service names that often need additional context. With the Enrichment Table processor, you can add context to your logs using lookup datasets stored in Datadog [Reference Tables](https://docs.datadoghq.com/reference_tables/?tab=cloudstorage), local files, or MaxMind GeoIP tables. The processor matches logs based on a specified key and appends information from your lookup file to the log. If you use Reference Tables, you can connect to and enrich logs with SaaS-based datasets stored directly in ServiceNow, Snowflake, S3, and more.

### When to use this processor{% #when-to-use-this-processor %}

The following are use cases for enriching logs from integrations.

#### Cloud object storage{% #cloud-object-storage %}

Cloud object storage services (Amazon S3, Azure Blob Storage, Google Cloud Storage) are scalable storage services for large volumes of structured and unstructured reference data.

Use the Enrichment Table processor to enrich logs with externally maintained reference datasets, such as threat intelligence feeds, allow- and denylists, asset inventories, compliance mappings stored as CSVs, or other types of files that are updated regularly.

#### Databricks{% #databricks %}

Databricks is a cloud-based data lakehouse used for machine learning (ML), advanced analytics, and big data workloads.

Use the Enrichment Table processor to:

- Add predictions or scores generated by ML models, such as fraud likelihoods and anomaly detection results.
- Reference datasets stored in Databricks, such as customer profiles, device information, or security info.

In Datadog's Databricks integration documentation, see [Reference Tables Configuration](https://docs.datadoghq.com/integrations/databricks/?tab=useaserviceprincipalforoauth#reference-table-configuration) for information on how to set up Reference Tables for Databricks.

#### Salesforce{% #salesforce %}

Salesforce is a Customer Relationship Management (CRM) tool used to track and store sales opportunities, accounts, contacts, deals, and contracts.

Use the Enrichment Table processor to:

- Attach customer and account information, such as the industry type, ARR, and owner, to operational logs for prioritizing incidents.
- Enrich marketing- or sales-focused dashboards with operational signals like latency spikes tied to customers.

In Datadog's Salesforce integration documentation, see [Enable ingestion of reference tables](https://docs.datadoghq.com/integrations/salesforce/#optional-enable-ingestion-of-reference-tables) for information on how to set up Reference Tables for Salesforce.

#### ServiceNow (CMDB){% #servicenow-cmdb %}

ServiceNow is an IT service management platform with a Configuration Management Database (CMDB) that tracks infrastructure assets, applications, and dependencies.

Use the Enrichment Table processor to:

- Enrich logs with infrastructure ownership and dependency context, such as which team owns the host and which business unit that team supports.
- Add information directly from CMDB records to telemetry.

In Datadog's ServiceNow CMDB documentation, see [Reference Tables](https://docs.datadoghq.com/integrations/guide/servicenow-cmdb-enrichment-setup/#reference-tables) for information on how to set up Reference Tables for ServiceNow CMDB.

#### Snowflake{% #snowflake %}

Snowflake is a cloud-native data warehouse/lake that centralizes structured and semi-structured data.

Use the Enrichment Table processor to:

- Add customer metadata (account tier, region, SLA) to logs.
- Join security events with user or asset attributes stored in Snowflake.

In Datadog's Snowflake integration documentation, see [Reference Tables](https://docs.datadoghq.com/integrations/snowflake-web/#reference-tables) for information on how to set up Reference Tables for Snowflake.

## Setup{% #setup %}

To set up the Enrichment Table processor:

1. Click **Add enrichment**.
1. Define a **filter query**. Only logs that match the specified filter query are sent through the processor. **Note**: All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline. See [Search Syntax](https://docs.datadoghq.com/observability_pipelines/search_syntax/logs/) for more information.
1. In the **Set lookup mapping** section, select the type of lookup dataset you want to use.
   {% tab title="Reference Table" %}

   1. Select the Reference Table from the dropdown menu. See [Using Reference Tables](#using-reference-tables) for more information.
   1. Click **Manage** to go to the Reference Tables configuration page.
   1. (Optional) Select specific columns with which to enrich your logs.
      - Observability Pipelines enriches logs with all columns in the table by default. Each column in the table is added as an attribute to the log, where the attribute name is the column name and the attribute value is the column value.
      - If you want to enrich your logs with specific columns from your Reference Table, select the columns' corresponding attributes in the dropdown menu.
   1. Enter a Datadog application key identifier. Observability Pipelines uses [application keys](https://docs.datadoghq.com/account_management/api-app-keys/#application-keys) to access Datadog's programmatic API when enriching data. Ensure your application key is:
      - Associated with a [Service Account](https://docs.datadoghq.com/account_management/org_settings/service_accounts#service-account-application-keys) (not a personal Datadog user account).
      - Limited to the [`reference_tables_read`](https://docs.datadoghq.com/account_management/rbac/permissions/#reference-tables) scope.
   1. Enter the source attribute of the log. The source attribute's value is what you want Observability Pipelines to find in the Reference Table. See the Enrichment file example for more information.
   1. Enter the target attribute. The target attribute's value stores, as a JSON object, the information found in the Reference Table. See the Enrichment file example for more information.
   1. Click **Save**.

   {% /tab %}

   {% tab title="File" %}

   1. Enter the file path.
      - **Note**: All file paths are made relative to the configuration data directory, which is `/var/lib/observability-pipelines-worker/config/` by default. The file must be owned by the `observability-pipelines-worker` group and `observability-pipelines-worker` user, or at least readable by the group or user. See [Advanced Worker Configurations](https://docs.datadoghq.com/observability_pipelines/configuration/install_the_worker/advanced_worker_configurations/) for more information.
   1. Enter the column name. The column name in the enrichment table is used for matching the source attribute value. See the Enrichment file example for more information.
   1. Enter the source attribute of the log. The source attribute's value is what you want Observability Pipelines to find in the enrichment file.
   1. Enter the target attribute. The target attribute's value stores the information found in the enrichment file as a JSON object.
   1. Click **Save**.

   {% /tab %}

   {% tab title="GeoIP" %}

   1. Enter the path to your GeoIP `.mmdb` database file, relative to the `<DD_OP_DATA_DIR>/config` directory.
      - **Note**: All file paths are made relative to the configuration data directory, which is `/var/lib/observability-pipelines-worker/config/` by default. The file must be owned by the `observability-pipelines-worker` group and `observability-pipelines-worker` user, or at least readable by the group or user. See [Advanced Worker Configurations](https://docs.datadoghq.com/observability_pipelines/configuration/install_the_worker/advanced_worker_configurations/) for more information.
   1. Enter the source attribute of the log. The source attribute's value is what you want Observability Pipelines to look up in the GeoIP database. See the Enrichment file example for more information.
   1. Enter the target attribute. The target attribute's value stores the information found in the GeoIP database as a JSON object. See the Enrichment file example for more information.
   1. Click **Save**.

   {% /tab %}

### Enrichment file example{% #enrichment-file-example %}

For this example:

- The enrichment processor uses this Reference Table:

  | merch_id | merchant_name   | city      | state    |
  | -------- | --------------- | --------- | -------- |
  | 803      | Andy's Ottomans | Boise     | Idaho    |
  | 536      | Cindy's Couches | Boulder   | Colorado |
  | 235      | Debra's Benches | Las Vegas | Nevada   |

- `merchant_id` is used as the source attribute and `merchant_info` as the target attribute.
- `merch_id` is set as the column name the processor uses to find the source attribute's value. **Note**: The source attribute's name does not have to match the column name.

If the enrichment processor receives a log with `"merchant_id":"536"`:

- The processor looks for the value `536` in the Reference Table's `merch_id` column.
- After it finds the value, it adds the entire row of information from the Reference Table to the `merchant_info` attribute as a JSON object:

```
merchant_info {
    "merchant_name":"Cindy's Couches",
    "city":"Boulder",
    "state":"Colorado"
}
```
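Conceptually, the lookup works like the following sketch. This is a hypothetical Python illustration, not the Worker's implementation; the table data comes from the example above, and all function names are made up for this sketch:

```python
import csv
import io

# Hypothetical enrichment table: the same data as the example above.
TABLE_CSV = """merch_id,merchant_name,city,state
803,Andy's Ottomans,Boise,Idaho
536,Cindy's Couches,Boulder,Colorado
235,Debra's Benches,Las Vegas,Nevada
"""

def build_index(csv_text, key_column):
    """Index the table's rows by the value of the lookup column."""
    return {row[key_column]: row for row in csv.DictReader(io.StringIO(csv_text))}

def enrich(log, index, key_column, source_attr, target_attr):
    """Attach the matched row (minus the key column) under the target attribute."""
    row = index.get(log.get(source_attr))
    if row is not None:
        log[target_attr] = {k: v for k, v in row.items() if k != key_column}
    return log

index = build_index(TABLE_CSV, "merch_id")
log = enrich({"merchant_id": "536"}, index, "merch_id", "merchant_id", "merchant_info")
# log now carries merchant_name, city, and state under "merchant_info".
```

Note that the source attribute name (`merchant_id`) and the table column name (`merch_id`) differ; only the values have to match.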

## How the processor works{% #how-the-processor-works %}

### Using Reference Tables{% #using-reference-tables %}

[Reference Tables](https://docs.datadoghq.com/reference_tables/?tab=cloudstorage#reference-table-limits) allow you to store information like customer details, asset lists, and service dependency information in Datadog. The Enrichment Table processor pulls rows from Reference Tables on demand and caches them locally. Table rows persist in the cache for about 10 minutes (30 minutes for a negative lookup, where the row was not found in the table). After that, they are evicted or refreshed.
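The caching behavior described above can be sketched as a small cache with separate time-to-live values for found and not-found rows. This is a hypothetical Python illustration (the TTLs come from the text; the class and method names are invented for this sketch):

```python
import time

# TTLs mirroring the documented cache behavior.
POSITIVE_TTL = 10 * 60  # seconds a found row stays cached
NEGATIVE_TTL = 30 * 60  # seconds a "row not found" result stays cached

class RowCache:
    """Tiny row cache with separate TTLs for hits and negative lookups."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (row_or_None, expiry_timestamp)

    def put(self, key, row):
        """Cache a row; row=None records a negative lookup."""
        ttl = POSITIVE_TTL if row is not None else NEGATIVE_TTL
        self._entries[key] = (row, self._clock() + ttl)

    def get(self, key):
        """Return (cached, row); cached=False means a fresh lookup is needed."""
        entry = self._entries.get(key)
        if entry is None or entry[1] <= self._clock():
            self._entries.pop(key, None)  # evict the expired entry
            return False, None
        return True, entry[0]
```

Caching negative lookups (the `row=None` case) keeps logs that repeatedly reference a missing key from triggering a new API request on every occurrence.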

When the processor encounters a log that does not have a corresponding row in the cache, the log data is buffered in memory until the row is retrieved from the Reference Table. If the buffer reaches its maximum capacity, it begins sending the oldest buffered logs downstream without enrichment. The processor does not exert upstream backpressure.

A request to read the Reference Tables is sent every second or when 250 keys are queued for a lookup.
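The batching rule above can be sketched as follows. This is a hypothetical Python illustration: the thresholds (250 keys, 1 second) come from the text, but a real implementation would also flush on a timer even when no new keys arrive, whereas this sketch only checks when a key is added:

```python
import time

MAX_BATCH_KEYS = 250     # flush once this many keys are queued
FLUSH_INTERVAL_S = 1.0   # or once a second has elapsed since the last flush

class LookupBatcher:
    """Collects pending lookup keys and decides when to send a read request."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = []
        self._last_flush = self._clock()

    def add(self, key):
        """Queue a key; return the batch to send if a flush is due, else None."""
        self._pending.append(key)
        now = self._clock()
        if (len(self._pending) >= MAX_BATCH_KEYS
                or now - self._last_flush >= FLUSH_INTERVAL_S):
            batch, self._pending = self._pending, []
            self._last_flush = now
            return batch
        return None
```

Batching amortizes the cost of Reference Tables API calls: under steady load a request carries up to 250 keys, while under light load keys wait at most about a second.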

If an authentication error occurs while connecting to the Reference Table or after a series of failed requests, Datadog flushes buffered logs downstream without enrichment, to prevent the logs from waiting indefinitely, and the buffer stops accepting new logs. The processor periodically retries requests and automatically resumes normal operations when a request succeeds.

If an error causes a log to be sent without enrichment, the error appears in the Worker logs and increments the `pipelines.component_errors_total` metric.

Datadog does not recommend using the processor on a log field with high cardinality (on the order of 10,000 or more possible values within a time frame of 10 minutes). The Reference Tables API is subject to rate limits and might deny Worker requests. Reach out to [Datadog support](https://docs.datadoghq.com/help/) if you continue to notice rate limit warnings in the Worker logs while running the processor.

### Metrics{% #metrics %}

#### Processor metrics{% #processor-metrics %}

To see metrics about your Enrichment Table processor, add the tags `component_type:enrichment_table` and `component_id:<processor_id>` to processor metrics:

{% dl %}

{% dt %}
`pipelines.enrichment_rows_not_found_total`
{% /dt %}

{% dd %}
Number of processed logs that do not have corresponding rows in the table.
{% /dd %}

{% dt %}
`pipelines.component_errors_total`
{% /dt %}

{% dd %}
Number of logs that cannot be enriched because of an error. These errors are reported with the tag `error_code=did_not_enrich_event`.
{% /dd %}

{% dd %}
The tag `reason` may contain the following values:

- `target_exists`: The target attribute that would store the enriched data already exists and is not an object.
- `too_many_pending_lookups`: The buffer or lookup queue is full.
- `lookup_failed`: The lookup key was not found in the log, or its value is not a string or an integer.
{% /dd %}

{% /dl %}

#### Buffer metrics (when enabled){% #buffer-metrics-when-enabled %}

To see buffer metrics for your Enrichment Table processor, add these tags to buffer metrics:

- `component_type:enrichment_table`
- `component_id:<processor_id>`
- `buffer_id:enrichment_table_buffer`

These metrics are specific to processor buffers, located upstream of a processor. Each processor emits its own respective buffer metrics. **Note**: Processor buffers are not configurable, but these metrics can help monitor backpressure as it propagates through your pipeline's processors.

- Use the `component_id` tag to filter or group by individual components.
- Use the `component_type` tag to filter or group by the processor type, such as `quota` for the Quota processor.

{% dl %}

{% dt %}
`pipelines.transform_buffer_utilization`
{% /dt %}

{% dd %}
**Description**: Histogram of how many events are buffered in a processor.
{% /dd %}

{% dd %}
**Metric type**: histogram
{% /dd %}

{% dt %}
`pipelines.transform_buffer_utilization_level`
{% /dt %}

{% dd %}
**Description**: Event count in a processor's buffer.
{% /dd %}

{% dd %}
**Metric type**: gauge
{% /dd %}

{% dt %}
`pipelines.transform_buffer_utilization_mean`
{% /dt %}

{% dd %}
**Description**: The exponentially weighted moving average (EWMA) of the number of events in a processor's buffer.
{% /dd %}

{% dd %}
**Metric type**: gauge
{% /dd %}

{% dt %}
`pipelines.transform_buffer_max_size_events`
{% /dt %}

{% dd %}
**Description**: A processor buffer's maximum event capacity.
{% /dd %}

{% dd %}
**Metric type**: gauge
{% /dd %}

{% /dl %}

#### Reference Table metrics{% #reference-table-metrics %}

To see metrics about your Enrichment Table processor using a Reference Table, add the tags `component_type:enrichment_table` and `component_id:<processor_id>` to the metrics below. The tag `reference_table:<table_uuid>` can also be used to aggregate across all processors using the same Reference Table.

{% dl %}

{% dt %}
`pipelines.enrichment_rows_not_found_total`
{% /dt %}

{% dd %}
This counter is incremented for each processed log that does not have a corresponding row in the table.
{% /dd %}

{% dt %}
`pipelines.enrichment_cache_hits_total`
{% /dt %}

{% dd %}
Number of cache hits, that is, logs that could be enriched without being buffered.
{% /dd %}

{% dt %}
`pipelines.enrichment_cache_misses_total`
{% /dt %}

{% dd %}
Number of cache misses, that is, logs that required buffering and sending a request to the Reference Tables API.
{% /dd %}

{% dt %}
`pipelines.component_errors_total`
{% /dt %}

{% dd %}
Number of logs that cannot be enriched because of an error. These errors are reported with the tag `error_code=did_not_enrich_event`.
{% /dd %}

{% dd %}
The tag `reason` may contain the following values:

- `target_exists`: The target attribute that would store the enriched data already exists and is not an object.
- `too_many_pending_lookups`: The buffer or lookup queue is full.
- `lookup_failed`: The lookup key was not found in the log, or its value is not a string or an integer.
- `reference_table_read_error`: Unrecoverable errors, or too many consecutive errors, occurred while trying to read the Reference Table.
{% /dd %}

{% /dl %}

The metrics below are common to all processors consuming the same Reference Table and use the tags `component_type:enrichment_table`, `component_id:reference_table_<table_uuid>`, and `reference_table:<table_uuid>`.

{% dl %}

{% dt %}
`pipelines.reference_table_cached_rows`
{% /dt %}

{% dd %}
This gauge metric reports the number of rows stored in the local cache. The tag `found:true` reports rows existing in the table, and `found:false` reports rows that do not exist in the table.
{% /dd %}

{% dt %}
`pipelines.reference_table_queued_keys`
{% /dt %}

{% dd %}
This gauge metric reports the number of row keys waiting to be read from the Reference Tables API. The queue has a maximum capacity of 5,000 keys. When a log attempts to insert a key that would exceed this limit, the log is immediately sent downstream without enrichment.
{% /dd %}

{% dt %}
`pipelines.reference_table_fetched_keys_total`
{% /dt %}

{% dd %}
For each request sent to the Reference Tables API, this counter is incremented with the number of rows fetched in that request.
{% /dd %}

{% /dl %}
