Enrichment Table Processor
Join the Preview!
The Enrichment Table processor using Reference Tables is in Preview. Use this form to request access.
Overview
Logs can contain information like IP addresses, user IDs, or service names that often needs additional context. With the Enrichment Table processor, you can add that context to your logs using lookup datasets stored in Datadog Reference Tables, local files, or MaxMind GeoIP tables. The processor matches logs based on a specified key and appends information from the lookup dataset to the log. If you use Reference Tables, you can connect to and enrich logs with SaaS-based datasets stored directly in ServiceNow, Snowflake, S3, and more.
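Conceptually, the enrichment is a key-based lookup: the processor reads a key from the log, finds the matching row in the lookup dataset, and writes that row onto the log. The Python sketch below illustrates the idea only; the lookup table, attribute names, and log are hypothetical, and this is not the Worker's implementation.

```python
# Conceptual sketch of key-based log enrichment; not the Worker's implementation.
# The lookup table, attribute names, and log below are hypothetical examples.

lookup_table = {
    "checkout-api": {"team": "payments", "owner": "alice@example.com"},
    "auth-service": {"team": "identity", "owner": "bob@example.com"},
}

def enrich(log, source_attribute, target_attribute):
    """Look up the source attribute's value and attach the matching row to the log."""
    row = lookup_table.get(log.get(source_attribute))
    if row is not None:
        log[target_attribute] = row  # the matched row is added as a JSON object
    return log

print(enrich({"service": "checkout-api", "message": "request failed"},
             source_attribute="service", target_attribute="service_info"))
```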
When to use this processor
The following are use cases for enriching logs from integrations.
Cloud object storage
Cloud object storage services (Amazon S3, Azure Blob Storage, Google Cloud Storage) are scalable storage services for large volumes of structured and unstructured reference data.
Use the Enrichment Table processor to enrich logs with externally maintained reference datasets, such as threat intelligence feeds, allow- and denylists, asset inventories, compliance mappings stored as CSVs, or other types of files that are updated regularly.
Databricks
Databricks is a cloud-based data lakehouse used for machine learning (ML), advanced analytics, and big data workloads.
Use the Enrichment Table processor to:
- Add predictions or scores generated by ML models, such as fraud likelihoods and anomaly detection results.
- Reference datasets stored in Databricks, such as customer profiles, device information, or security info.
In Datadog’s Databricks integration documentation, see Reference Tables Configuration for information on how to set up Reference Tables for Databricks.
Salesforce
Salesforce is a Customer Relationship Management (CRM) tool used to track and store sales opportunities, accounts, contacts, deals, and contracts.
Use the Enrichment Table processor to:
- Attach customer and account information, such as the industry type, ARR, and owner, to operational logs for prioritizing incidents.
- Enrich marketing- or sales-focused dashboards with operational signals like latency spikes tied to customers.
In Datadog’s Salesforce integration documentation, see Enable ingestion of reference tables for information on how to set up Reference Tables for Salesforce.
ServiceNow (CMDB)
ServiceNow is an IT service management platform with a Configuration Management Database (CMDB) that tracks infrastructure assets, applications, and dependencies.
Use the Enrichment Table processor to:
- Enrich logs with infrastructure ownership and dependency context, such as which team owns the host and which business unit that team supports.
- Add information directly from CMDB records to telemetry.
In Datadog’s ServiceNow CMDB documentation, see Reference Tables for information on how to set up Reference Tables for ServiceNow CMDB.
Snowflake
Snowflake is a cloud-native data warehouse/lake that centralizes structured and semi-structured data.
Use the Enrichment Table processor to:
- Add customer metadata (account tier, region, SLA) to logs.
- Join security events with user or asset attributes stored in Snowflake.
In Datadog’s Snowflake integration documentation, see Reference Tables for information on how to set up Reference Tables for Snowflake.
Setup
To set up the Enrichment Table processor:
- Click Add enrichment.
- Define a filter query. Only logs that match the specified filter query are sent through the processor. Note: All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
- In the Set lookup mapping section, select the type of lookup dataset you want to use.
- For a Reference Table, select the Reference Table in the dropdown menu. See Using reference tables for more information.
- Click Manage to go to the Reference Tables configuration page.
- (Optional) Select specific columns with which to enrich your logs.
- Observability Pipelines enriches logs with all columns in the table by default. Each column in the table is added as an attribute to the log, where the attribute name is the column name and the attribute value is the column value.
- If you want to enrich your logs with specific columns from your Reference Table, select the columns’ corresponding attributes in the dropdown menu.
- Enter a Datadog Application key identifier. Observability Pipelines uses application keys to access Datadog’s programmatic API when enriching data. Ensure your application key is:
- Enter the source attribute of the log. The source attribute’s value is what you want Observability Pipelines to find in the Reference Table. See the Enrichment file example for more information.
- Enter the target attribute. The target attribute’s value stores, as a JSON object, the information found in the Reference Table. See the Enrichment file example for more information.
- Click Save.
- For a local file, enter the file path.
- Note: All file paths are made relative to the configuration data directory, which is /var/lib/observability-pipelines-worker/config/ by default. The file must be owned by the observability-pipelines-worker group and observability-pipelines-worker user, or at least readable by the group or user. See Advanced Worker Configurations for more information.
- Enter the column name. The column name in the enrichment table is used for matching the source attribute value. See the Enrichment file example for more information.
- Enter the source attribute of the log. The source attribute’s value is what you want Observability Pipelines to find in the enrichment table.
- Enter the target attribute. The target attribute’s value stores the information found in the enrichment table as a JSON object.
- Click Save.
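As an illustration of the file-based lookup, the sketch below assumes a hypothetical merchants.csv placed under the default configuration data directory. The Python code only mimics what the Worker does internally: it matches the source attribute’s value against the configured column and stores the matching row in the target attribute.

```python
# Illustration only: how a CSV enrichment file relates to the column name,
# source attribute, and target attribute. merchants.csv, its columns, and the
# log are hypothetical; the Worker performs this lookup itself.
import csv

def load_table(path, column_name):
    """Index the CSV rows by the column used for matching."""
    with open(path, newline="") as f:
        return {row[column_name]: row for row in csv.DictReader(f)}

table = load_table("/var/lib/observability-pipelines-worker/config/merchants.csv", "merch_id")

log = {"merchant_id": "536"}
row = table.get(log["merchant_id"])   # source attribute value matched against the merch_id column
if row is not None:
    log["merchant_info"] = row        # target attribute stores the matching row as a JSON object
```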
- For GeoIP, enter the path to your .mmdb file relative to the <DD_OP_DATA_DIR>/config directory.
- Note: All file paths are made relative to the configuration data directory, which is /var/lib/observability-pipelines-worker/config/ by default. The file must be owned by the observability-pipelines-worker group and observability-pipelines-worker user, or at least readable by the group or user. See Advanced Worker Configurations for more information.
- Enter the source attribute of the log. The source attribute’s value is what you want Observability Pipelines to find in the GeoIP database. See the Enrichment file example for more information.
- Enter the target attribute. The target attribute’s value stores the information found in the GeoIP database as a JSON object. See the Enrichment file example for more information.
- Click Save.
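To get a sense of what a GeoIP lookup can add to a log, here is a hedged sketch using the geoip2 Python package against a MaxMind .mmdb file. The file path and IP address are placeholders, and the Worker does not use this package; the sketch only shows the kind of data that ends up in the target attribute.

```python
# Illustration of the data a MaxMind .mmdb file can provide for an IP address.
# The database path and IP below are placeholders; the Worker resolves the
# lookup itself and stores the result in the configured target attribute.
import geoip2.database
from geoip2.errors import AddressNotFoundError

db_path = "/var/lib/observability-pipelines-worker/config/GeoLite2-City.mmdb"  # placeholder path

with geoip2.database.Reader(db_path) as reader:
    try:
        response = reader.city("203.0.113.5")  # placeholder IP taken from the source attribute
        geo = {
            "country": response.country.iso_code,
            "city": response.city.name,
            "latitude": response.location.latitude,
            "longitude": response.location.longitude,
        }
    except AddressNotFoundError:
        geo = None  # IPs not present in the database are left unenriched
```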
Enrichment file example
For this example:
- This is the Reference Table that the enrichment processor uses:
| merch_id | merchant_name | city | state |
|---|---|---|---|
| 803 | Andy’s Ottomans | Boise | Idaho |
| 536 | Cindy’s Couches | Boulder | Colorado |
| 235 | Debra’s Benches | Las Vegas | Nevada |
- merchant_id is used as the source attribute and merchant_info as the target attribute.
- merch_id is set as the column name the processor uses to find the source attribute’s value. Note: The source attribute’s name does not have to match the column name.
If the enrichment processor receives a log with "merchant_id":"536":
- The processor looks for the value 536 in the Reference Table’s merch_id column.
- After it finds the value, it adds the entire row of information from the Reference Table to the merchant_info attribute as a JSON object:
merchant_info {
"merchant_name":"Cindy's Couches",
"city":"Boulder",
"state":"Colorado"
}
How the processor works
Using Reference Tables
Reference Tables allow you to store information like customer details, asset lists, and service dependency information in Datadog. The Enrichment Table processor pulls rows from Reference Tables on demand and caches them locally. Table rows persist in the cache for about 10 minutes; after that, they are evicted or refreshed.
When the processor encounters a log that does not have a corresponding row in the cache, the log data is buffered in memory until the row is retrieved from the Reference Table. If the buffer reaches its maximum capacity, it begins sending the oldest buffered logs downstream without enrichment. The processor does not exert upstream backpressure.
If an authentication error occurs while connecting to the Reference Table or after a series of failed requests, Datadog flushes buffered logs downstream without enrichment, to prevent the logs from waiting indefinitely and causing the buffer to stop accepting new logs. The processor periodically retries requests and automatically resumes normal operations when a request succeeds.
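One way to picture the behavior described above is a time-bounded cache in front of a bounded queue of pending logs. The sketch below is a simplified conceptual model; the TTL, capacity, and attribute names are illustrative and are not the Worker's actual internals.

```python
# Simplified conceptual model of the cache-and-buffer behavior described above.
# The TTL, capacity, and names are illustrative; this is not the Worker's code.
import time
from collections import deque

CACHE_TTL_SECONDS = 600      # rows persist in the cache for about 10 minutes
MAX_PENDING_LOGS = 5000      # illustrative maximum buffer capacity

cache = {}                   # key -> (fetched_at, row or None)
pending = deque()            # (log, key) pairs waiting for a row to be fetched

def process(log, key, forward):
    entry = cache.get(key)
    if entry is not None and time.monotonic() - entry[0] < CACHE_TTL_SECONDS:
        fetched_at, row = entry
        if row is not None:
            log["enrichment"] = row       # cached row: enrich immediately
        forward(log)                      # and send the log downstream
        return
    if len(pending) >= MAX_PENDING_LOGS:
        oldest_log, _ = pending.popleft()
        forward(oldest_log)               # buffer full: oldest log goes downstream unenriched
    pending.append((log, key))            # otherwise wait until the row is fetched from the API
```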
If an error causes a log to be sent downstream without enrichment, the error appears in the Worker logs and increments the pipelines.component_errors_total metric.
Datadog does not recommend using the processor on a log field with high cardinality (more than 5,000 possible values). The Reference Tables API is subject to rate limits and might deny Worker requests. Reach out to Datadog support if you continue to notice rate limit warnings in the Worker logs while running the processor.
Metrics
Processor metrics
To see metrics about your Enrichment Table processor, add the tags component_type=enrichment_table and component_id=<processor_id> to processor metrics:
- pipelines.enrichment_rows_not_found_total: Number of processed logs that do not have corresponding rows in the table.
- pipelines.component_errors_total: Number of logs that cannot be enriched because of an error. These errors are reported with the tag error_code=did_not_enrich_event. The reason tag may contain the following values:
- target_exists: The target value to store the enriched data already exists and is not an object.
- too_many_pending_lookups: The buffer or lookup queue is full.
- lookup_failed: The lookup key was either not found in the log, not a string, or an authentication error occurred.
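If you prefer to check these metrics programmatically, here is a hedged sketch using the datadog-api-client Python package to query enrichment errors broken down by reason. The component_id value is a placeholder, and the query is only an example.

```python
# Hedged example: query enrichment errors by reason through the Datadog metrics API.
# Requires the datadog-api-client package and DD_API_KEY / DD_APP_KEY in the environment.
# "my_enrichment_processor" is a placeholder component_id.
import time

from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.metrics_api import MetricsApi

query = (
    "sum:pipelines.component_errors_total"
    "{component_type:enrichment_table,component_id:my_enrichment_processor} by {reason}"
)

with ApiClient(Configuration()) as api_client:
    now = int(time.time())
    result = MetricsApi(api_client).query_metrics(_from=now - 3600, to=now, query=query)
    print(result)
```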
Buffer metrics (when buffering is enabled)
To see buffer metrics for your Enrichment Table processor, add these tags to buffer metrics:
- component_type=enrichment_table
- component_id=<processor_id>
- buffer_id=enrichment_table_buffer
Use these metrics to analyze buffer performance. All metrics are emitted on a one-second interval, unless otherwise stated. Note: counter metrics, such as pipelines.buffer_received_events_total, represent the count per second and not the cumulative total, even though total is in the metric name.
Tags for metrics
- Use the component_id tag to filter or group by individual components.
- Use the component_type tag to filter or group by sources, processors, or destinations. Note: For processors, use component_type:transform.
Destination buffer metrics
These metrics are specific to destination buffers, located upstream of a destination. Each destination emits its own respective buffer metrics.
- pipelines.buffer_size_events
  - Description: Number of events in a destination’s buffer.
  - Metric type: gauge
- pipelines.buffer_size_bytes
  - Description: Number of bytes in a destination’s buffer.
  - Metric type: gauge
- pipelines.buffer_received_events_total
  - Description: Events received by a destination’s buffer.
  - Metric type: counter
- pipelines.buffer_received_bytes_total
  - Description: Bytes received by a destination’s buffer.
  - Metric type: counter
- pipelines.buffer_sent_events_total
  - Description: Events sent downstream by a destination’s buffer.
  - Metric type: counter
- pipelines.buffer_sent_bytes_total
  - Description: Bytes sent downstream by a destination’s buffer.
  - Metric type: counter
- pipelines.buffer_discarded_events_total
  - Description: Events discarded by the buffer.
  - Metric type: counter
  - Additional tags: intentional:true means an incoming event was dropped because the buffer was configured to drop the newest logs when it’s full. intentional:false means the event was dropped due to an error.
- pipelines.buffer_discarded_bytes_total
  - Description: Bytes discarded by the buffer.
  - Metric type: counter
  - Additional tags: intentional:true means an incoming event was dropped because the buffer was configured to drop the newest logs when it’s full. intentional:false means the event was dropped due to an error.
Source buffer metrics
These metrics are specific to source buffers, located downstream of a source. Each source emits its own respective buffer metrics. Note: Source buffers are not configurable, but these metrics can help monitor backpressure as it propagates to your pipeline’s source.
- pipelines.source_buffer_utilization
  - Description: Event count in a source’s buffer.
  - Metric type: histogram
- pipelines.source_buffer_utilization_level
  - Description: Number of events in a source’s buffer.
  - Metric type: gauge
- pipelines.source_buffer_utilization_mean
  - Description: The exponentially weighted moving average (EWMA) of the number of events in the source’s buffer.
  - Metric type: gauge
- pipelines.source_buffer_max_size_events
  - Description: A source buffer’s maximum event capacity.
  - Metric type: gauge
Processor buffer metrics
These metrics are specific to processor buffers, located upstream of a processor. Each processor emits its own respective buffer metrics. Note: Processor buffers are not configurable, but these metrics can help monitor backpressure as it propagates through your pipeline’s processors.
- pipelines.transform_buffer_utilization
  - Description: Event count in a processor’s buffer.
  - Metric type: histogram
- pipelines.transform_buffer_utilization_level
  - Description: Event count in a processor’s buffer.
  - Metric type: gauge
- pipelines.transform_buffer_utilization_mean
  - Description: The exponentially weighted moving average (EWMA) of the number of events in a processor’s buffer.
  - Metric type: gauge
- pipelines.transform_buffer_max_size_events
  - Description: A processor buffer’s maximum event capacity.
  - Metric type: gauge
Deprecated buffer metrics
These metrics are still emitted by the Observability Pipelines Worker for backwards compatibility. Datadog recommends using the replacements when possible.
- pipelines.buffer_events
  - Description: Number of events in a destination’s buffer. Use pipelines.buffer_size_events instead.
  - Metric type: gauge
- pipelines.buffer_byte_size
  - Description: Number of bytes in a destination’s buffer. Use pipelines.buffer_size_bytes instead.
  - Metric type: gauge
Reference Table metrics
To see metrics about your Enrichment Table processor using a Reference Table, add the tags component_type:enrichment_table and component_id:reference_table_<table-id> to the metrics:
- pipelines.enrichment_rows_not_found_total: This counter is incremented for each processed log that does not have a corresponding row in the table.
- pipelines.reference_table_cached_rows: This gauge metric reports the number of rows stored in the local cache. The tag found:true reports rows existing in the table, and found:false reports rows that do not exist in the table.
- pipelines.reference_table_queued_keys: This gauge metric reports the number of row keys waiting to be read from the Reference Tables API. The queue has a maximum capacity of 5,000 keys. When a log attempts to insert a key that would exceed this limit, the log is immediately sent downstream without enrichment.
- pipelines.reference_table_fetched_keys_total: For each request sent to the Reference Tables API, this counter is incremented with the number of rows fetched in that request.
Filter query syntax
Each processor has a corresponding filter query in its fields. Processors only process logs that match their filter query. For all processors except the Filter processor, logs that do not match the query are sent to the next step of the pipeline. For the Filter processor, logs that do not match the query are dropped.
The following are logs filter query examples:
- NOT (status:debug): This filters for logs that do not have the status DEBUG.
- status:ok service:flask-web-app: This filters for all logs with the status OK from your flask-web-app service.
  - This query can also be written as: status:ok AND service:flask-web-app.
- host:COMP-A9JNGYK OR host:COMP-J58KAS: This filter query only matches logs from the labeled hosts.
- user.status:inactive: This filters for logs with the status inactive nested under the user attribute.
- http.status:[200 TO 299] or http.status:{300 TO 399}: These two filters represent the syntax to query a range for http.status. Ranges can be used across any attribute.
Learn more about writing filter queries in Observability Pipelines Search Syntax.