---
title: Sensitive Data Scanner Processor
description: Datadog, the leading service for cloud-scale monitoring.
---


# Sensitive Data Scanner Processor

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md) ({% placeholder "user-datadog-site-name" /%}).
{% /alert %}

{% /callout %}

Available for: {% icon name="icon-logs" /%} Logs

## Overview{% #overview %}

The Sensitive Data Scanner processor scans logs to detect sensitive information, such as PII, PCI data, and custom sensitive data, and then redacts or hashes it. You can choose from Datadog's library of predefined rules or enter custom regex rules.

You can set up the pipeline and processor in the UI, [API](https://docs.datadoghq.com/api/latest/observability-pipelines.md#create-a-new-pipeline), or Terraform.

See [Best practices to optimize performance](#best-practices-to-optimize-performance) for tips on reducing resource usage.

## Set up the processor in the UI{% #set-up-the-processor-in-the-ui %}

To set up the processor:

1. Define a filter query. Only logs that match the specified filter query are scanned and processed. All logs are sent to the next step in the pipeline, regardless of whether they match the filter query. See [Search Syntax](https://docs.datadoghq.com/observability_pipelines/search_syntax/logs.md) for more information.
1. Click Add Scanning Rule.
1. Select one of the following:

{% tab title="Library rules" %}

1. In the dropdown menu, select the library rule you want to use.
1. Recommended keywords are automatically added based on the library rule selected. After the scanning rule has been added, you can add additional keywords or remove recommended keywords.
1. In the Define rule target and conditions section, use the dropdown menu to select whether to scan the Entire Event, Specific Attributes, or Exclude Attributes.
   - If you are scanning the entire event, you can optionally exclude specific attributes from being scanned. Use path notation (`outer_key.inner_key`) to access nested keys. If a specified attribute contains nested data, all of its nested data is excluded.
   - If you are scanning specific attributes, specify which attributes you want to scan. Use path notation (`outer_key.inner_key`) to access nested keys. If a specified attribute contains nested data, all of its nested data is scanned.
1. For Define actions on match, select the action you want to take for the matched information. **Note**: Redaction, partial redaction, and hashing are all irreversible actions.
   - Redact: Replaces all matching values with the text you specify in the Replacement text field.
   - Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters you want to redact and which part of the matched data to redact.
   - Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
1. Optionally, click Add Field to add tags you want to associate with the matched events.
1. Add a name for the scanning rule.
1. Optionally, add a description for the rule.
1. Click Save.

### Add additional keywords{% #add-additional-keywords %}

After adding scanning rules from the library, you can edit each rule separately and add additional keywords to the keyword dictionary.

1. Navigate to your [pipeline](https://app.datadoghq.com/observability-pipelines).
1. In the Sensitive Data Scanner processor with the rule you want to edit, click Manage Scanning Rules.
1. Toggle Use recommended keywords if you want the rule to use them. Otherwise, add your own keywords to the Create keyword dictionary field. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
1. Click Update.

{% /tab %}

{% tab title="Custom rules" %}

1. In the Define match conditions section, enter the regex pattern to match against events in the Define the regex field. See [Writing Effective Grok Parsing Rules with Regular Expressions](https://docs.datadoghq.com/logs/guide/regex_log_parsing.md) for more information. Sensitive Data Scanner supports Perl Compatible Regular Expressions (PCRE), except for the following features:
   - Backreferences and capturing sub-expressions (lookarounds)
   - Arbitrary zero-width assertions
   - Subroutine references and recursive patterns
   - Conditional patterns
   - Backtracking control verbs
   - The `\C` "single-byte" directive (which breaks UTF-8 sequences)
   - The `\R` newline match
   - The `\K` start of match reset directive
   - Callouts and embedded code
   - Atomic grouping and possessive quantifiers
1. Enter sample data in the Add sample data field to verify that your regex pattern is valid.
1. For Create keyword dictionary, add keywords to refine detection accuracy when matching regex conditions. For example, if you are scanning for a sixteen-digit Visa credit card number, you can add keywords like `visa`, `credit`, and `card`. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
1. In the Define rule target and conditions section, use the dropdown menu to select whether to scan the Entire Event, Specific Attributes, or Exclude Attributes.
   - If you are scanning the entire event, you can optionally exclude specific attributes from being scanned. Use path notation (`outer_key.inner_key`) to access nested keys. If a specified attribute contains nested data, all of its nested data is excluded.
   - If you are scanning specific attributes, specify which attributes you want to scan. Use path notation (`outer_key.inner_key`) to access nested keys. If a specified attribute contains nested data, all of its nested data is scanned.
1. For Define actions on match, select the action you want to take for the matched information. **Note**: Redaction, partial redaction, and hashing are all irreversible actions.
   - Redact: Replaces all matching values with the text you specify in the Replacement text field.
   - Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters you want to redact and which part of the matched data to redact.
   - Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
1. Optionally, click Add Field to add tags you want to associate with the matched events.
1. Add a name for the scanning rule.
1. Optionally, add a description for the rule.
1. Click Add Rule.

{% /tab %}

### Delete a rule{% #delete-a-rule %}

To delete a rule in the Sensitive Data Scanner:

1. Navigate to [Observability Pipelines](https://app.datadoghq.com/observability-pipelines).
1. Select your pipeline.
1. Click the Sensitive Data Scanner processor to expand it.
1. Click Manage Scanning Rules.
1. Select the rule you want to delete.
1. Click Delete.

### Path notation example{% #path-notation-example %}

Given the following example log structure:

```json
{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}
```

Reference nested keys as follows:

- Use `outer_key.inner_key` to reference the key with the value `inner_value`.
- Use `outer_key.a.double_inner_key` to reference the key with the value `double_inner_value`.

To specify a nested field with a literal `.` in the attribute key, wrap the key in escaped quotes in the search query. For example, the search query `"service.status":disabled` matches the event `{"service.status": "disabled"}`.
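
If you define rules in Terraform instead of the UI (see the next section), the same path notation would go in a rule's `scope` block. The fragments below are a sketch only: the `include` and `exclude` blocks and their `fields` attribute are assumed from the provider's nested schema for the Sensitive Data Scanner processor, and the rule names are hypothetical. Verify the attribute names against your provider version.

```terraform
# Sketch only: two alternative rule fragments for a sensitive_data_scanner block.
# The pattern and on_match blocks are omitted here; a real rule needs both.

# Option A: scan only the listed attributes of the example log.
# Listing "outer_key.a" covers everything nested under it
# (double_inner_key and b).
rule {
  name = "Scan selected attributes" # hypothetical rule name
  scope {
    include {
      fields = ["outer_key.inner_key", "outer_key.a"]
    }
  }
}

# Option B: scan the whole event except the top-level attribute "d".
rule {
  name = "Scan everything except d" # hypothetical rule name
  scope {
    exclude {
      fields = ["d"]
    }
  }
}
```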

## Set up the processor using Terraform{% #set-up-the-processor-using-terraform %}

You can use the [Datadog Observability Pipeline Terraform resource](https://registry.terraform.io/providers/DataDog/datadog/latest/docs/resources/observability_pipeline) to set up a pipeline with the Sensitive Data Scanner processor. To add a rule to the Sensitive Data Scanner processor using Terraform:

1. Use the [Datadog Sensitive Data Scanner Standard Pattern](https://registry.terraform.io/providers/DataDog/datadog/latest/docs/data-sources/sensitive_data_scanner_standard_pattern) data source to retrieve the rule ID of the Sensitive Data Scanner [library rule](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md).

   ```terraform
   data "datadog_sensitive_data_scanner_standard_pattern" "<RULE_IDENTIFIER>" {
     filter = "<RULE_NAME>"
   }
   ```

   Replace the placeholders:

   - `<RULE_IDENTIFIER>` with a name to use when you later set up the Sensitive Data Scanner processor in the Observability Pipeline resource.
   - `<RULE_NAME>` with the exact name of the rule. See [Library Rules](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md) for the full list of rules.

   For example, if you want to use the [AWS Access Key ID Scanner](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md?search=AWS+Access+Key+ID+Scanner), configure the data source as follows:

   ```terraform
   data "datadog_sensitive_data_scanner_standard_pattern" "aws_access_key" {
     filter = "AWS Access Key ID Scanner"
   }
   ```

   See the [full configuration example](#full-configuration-example) for how to add data sources for multiple rules.


1. Add a [rule](https://registry.terraform.io/providers/DataDog/datadog/latest/docs/resources/observability_pipeline#nested-schema-for-configprocessor_groupprocessorsensitive_data_scanner) block in your Observability Pipeline resource for the library rule.

   ```terraform
   ...
     sensitive_data_scanner {
       rule {
         name = "<YOUR_RULE_NAME>"
         tags = []
         on_match {
           redact {
             replace = "***"
           }
         }
         pattern {
           library {
             id                       = data.datadog_sensitive_data_scanner_standard_pattern.<RULE_IDENTIFIER>.id
             use_recommended_keywords = true
           }
         }
         scope {
           all = true
         }
       }
     }
   ```

   Replace the placeholders:

   - `<YOUR_RULE_NAME>` with a name for the rule. This name is shown in the Pipelines UI.
   - `<RULE_IDENTIFIER>` with the rule identifier you used in the data source in step 1.

   For example, if you use the [AWS Access Key ID Scanner](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md?search=AWS+Access+Key+ID+Scanner) data source from step 1, configure the rule block as follows:

   ```terraform
   ...
     sensitive_data_scanner {
       rule {
         name = "Redact AWS Access Key IDs"
         tags = []
         on_match {
           redact {
             replace = "***"
           }
         }
         pattern {
           library {
             id                       = data.datadog_sensitive_data_scanner_standard_pattern.aws_access_key.id
             use_recommended_keywords = true
           }
         }
         scope {
           all = true
         }
       }
     }
   ```

   See the [full configuration example](#full-configuration-example) for how to add multiple rules.

1. Repeat steps 1 and 2 for all library rules you want to add.
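
The steps above cover library rules. For a custom regex rule, such as the sixteen-digit Visa card example described for the UI, a rule block might look like the following sketch. It assumes the `custom` pattern block, the `keyword_options` block, and the `partial_redact` action from the provider's nested schema for this processor. The rule name, regex, and redaction settings are illustrative, and `proximity = 30` simply mirrors the UI default mentioned earlier. Check the attribute names and the meaning of `proximity` against your provider version before relying on them.

```terraform
# Sketch only: goes inside the processor block of your
# datadog_observability_pipeline resource.
sensitive_data_scanner {
  rule {
    name = "Partially redact Visa card numbers" # hypothetical rule name
    tags = []

    # Keywords that must appear near a match (assumed schema).
    keyword_options {
      keywords  = ["visa", "credit", "card"]
      proximity = 30
    }

    # Custom regex instead of a library rule. Backslashes are doubled
    # because this is an HCL string; the resulting regex is \b4[0-9]{15}\b.
    pattern {
      custom {
        rule = "\\b4[0-9]{15}\\b"
      }
    }

    # Redact the first 12 characters of each match. Like redaction and
    # hashing, this action is irreversible.
    on_match {
      partial_redact {
        characters = 12
        direction  = "first"
      }
    }

    scope {
      all = true
    }
  }
}
```

The regex avoids the unsupported PCRE features listed for custom rules in the UI, such as lookarounds and backreferences.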

### Full configuration example{% #full-configuration-example %}

{% image
   source="https://docs.dd-static.net/images/observability_pipelines/processors/sds_tf_ui.7727712c35cf1ff6544b402a8b815fb2.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/observability_pipelines/processors/sds_tf_ui.7727712c35cf1ff6544b402a8b815fb2.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="The Sensitive Data Scanner processor panel showing two scanning rules: Redact AWS Access Key IDs and Redact US SSNs" /%}

To use the Sensitive Data Scanner processor to scan for AWS Access Key IDs and US Social Security Numbers and redact each match with the string `***`:

1. Use the [Datadog Sensitive Data Scanner Standard Pattern](https://registry.terraform.io/providers/DataDog/datadog/latest/docs/data-sources/sensitive_data_scanner_standard_pattern) data source to retrieve the rule IDs for the [AWS Access Key ID Scanner](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md?search=AWS+Access+Key+ID+Scanner) and the [US Social Security Number Scanner](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md?search=US+Social+Security+Number+Scanner).
1. In your [Datadog Observability Pipeline](https://registry.terraform.io/providers/DataDog/datadog/latest/docs/resources/observability_pipeline) resource's Sensitive Data Scanner processor, use the Sensitive Data Scanner rules defined in the data sources.

```terraform
data "datadog_sensitive_data_scanner_standard_pattern" "aws_access_key" {
  filter = "AWS Access Key ID Scanner"
}
data "datadog_sensitive_data_scanner_standard_pattern" "us_ssn" {
  filter = "US Social Security Number Scanner"
}

resource "datadog_observability_pipeline" "sensitive_data_pipeline" {
  name = "Sensitive Data Pipeline"

  config {
    source {
      id = "source-0"
      datadog_agent {}
    }

    processor_group {
      display_name = "Processors"
      enabled      = true
      id           = "group-0"
      include      = "*"
      inputs       = ["source-0"]

      processor {
        display_name = "Sensitive Data Scanner"
        enabled      = true
        id           = "processor-sds-0"
        include      = "*"

        sensitive_data_scanner {
          rule {
            name = "Redact AWS Access Key IDs"
            tags = []
            on_match {
              redact {
                replace = "***"
              }
            }
            pattern {
              library {
                id                       = data.datadog_sensitive_data_scanner_standard_pattern.aws_access_key.id
                use_recommended_keywords = true
              }
            }
            scope {
              all = true
            }
          }
          rule {
            name = "Redact US SSNs"
            tags = []
            on_match {
              redact {
                replace = "***"
              }
            }
            pattern {
              library {
                id                       = data.datadog_sensitive_data_scanner_standard_pattern.us_ssn.id
                use_recommended_keywords = true
              }
            }
            scope {
              all = true
            }
          }
        }
      }
    }

    destination {
      id     = "destination-0"
      inputs = ["group-0"]
      datadog_logs {}
    }
  }
}
```
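
If you add many library rules that share the same action and scope, one possible alternative to writing each `data` and `rule` block by hand is Terraform's `for_each` together with a `dynamic` block. This is a sketch under that assumption, not a pattern taken from the provider documentation. The local value `sds_library_rules` and the data source label `library` are hypothetical names.

```terraform
locals {
  # Map of rule display name => exact library rule name (hypothetical local).
  sds_library_rules = {
    "Redact AWS Access Key IDs" = "AWS Access Key ID Scanner"
    "Redact US SSNs"            = "US Social Security Number Scanner"
  }
}

# One data source instance per library rule, keyed by the display name.
data "datadog_sensitive_data_scanner_standard_pattern" "library" {
  for_each = local.sds_library_rules
  filter   = each.value
}

# Place this inside the sensitive_data_scanner block, in place of the
# hand-written rule blocks. It generates one rule per data source instance.
dynamic "rule" {
  for_each = data.datadog_sensitive_data_scanner_standard_pattern.library
  content {
    name = rule.key
    tags = []
    on_match {
      redact {
        replace = "***"
      }
    }
    pattern {
      library {
        id                       = rule.value.id
        use_recommended_keywords = true
      }
    }
    scope {
      all = true
    }
  }
}
```

Terraform iterates a map in lexical key order, so the generated rules follow the sorted display names rather than the order in which they are written. Keep explicit `rule` blocks if rules need different actions or scopes.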

## Best practices to optimize performance{% #best-practices-to-optimize-performance %}

The Sensitive Data Scanner processor is CPU intensive. Use the following best practices to optimize performance.

### Only enable rules you need{% #only-enable-rules-you-need %}

Rules that are enabled but not used consume unnecessary resources. Check the Sensitive Data Scanner processor to view how many matches each rule has had over the past 24 hours.

1. Navigate to [Observability Pipelines](https://app.datadoghq.com/observability-pipelines).
1. Select your pipeline.
1. Click the Sensitive Data Scanner processor to expand it.
1. Click View Scanning Rules to open the side panel and see Matches in the last 24 hours for each rule.

See [Delete a rule](#delete-a-rule) to remove an unused rule.

### Only scan the events and fields that need to be scanned for sensitive data{% #only-scan-the-events-and-fields-that-need-to-be-scanned-for-sensitive-data %}

The time it takes the Sensitive Data Scanner to scan an event roughly scales with the size of the event. To optimize processor performance:

- If you know which types of events you want to scan, define a filter query for the processor so that only those events are scanned.

- Reduce scanning time by targeting specific event attributes for scanning or excluding event attributes from being scanned. See the Define rule target and conditions step in [Set up the processor in the UI](#set-up-the-processor-in-the-ui).
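
In Terraform, both recommendations might be combined as in the sketch below. The filter query `service:payments` and the attribute `message` are hypothetical. The `include` block with `fields` inside `scope` is assumed from the provider's nested schema. The data source `aws_access_key` is the one defined in the full configuration example.

```terraform
# Sketch only: a processor block inside a processor_group.
processor {
  display_name = "Sensitive Data Scanner"
  enabled      = true
  id           = "processor-sds-0"

  # Only events matching this query are scanned (hypothetical query).
  # Non-matching events still pass through to the next step.
  include = "service:payments"

  sensitive_data_scanner {
    rule {
      name = "Redact AWS Access Key IDs"
      tags = []
      on_match {
        redact {
          replace = "***"
        }
      }
      pattern {
        library {
          id                       = data.datadog_sensitive_data_scanner_standard_pattern.aws_access_key.id
          use_recommended_keywords = true
        }
      }
      # Scan a single attribute instead of the entire event
      # (hypothetical attribute name).
      scope {
        include {
          fields = ["message"]
        }
      }
    }
  }
}
```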

### Evaluate and benchmark performance optimizations{% #evaluate-and-benchmark-performance-optimizations %}

Use the `pipelines.component_latency_seconds` metric to:

- Benchmark processor performance when you add a rule
- Evaluate performance after making optimization changes, such as reducing the number of fields being scanned and removing unused rules

To view the `pipelines.component_latency_seconds` metric:

1. Navigate to [Metrics Explorer](https://app.datadoghq.com/metric/explorer).
1. In the metric field, enter `pipelines.component_latency_seconds`.
1. In the from field, enter the tag `component_id:<COMPONENT_ID>`, where `<COMPONENT_ID>` is the ID for your Sensitive Data Scanner processor.

**Note**: `pipelines.component_latency_seconds` is a distribution metric, so you must enable percentiles for it. See [Enabling advanced query functionality](https://docs.datadoghq.com/metrics/distributions.md#enabling-advanced-query-functionality) for instructions.

## Metrics{% #metrics %}

For [component metrics](https://docs.datadoghq.com/observability_pipelines/monitoring_and_troubleshooting/pipeline_usage_metrics.md#component-metrics) and [processor buffer metrics](https://docs.datadoghq.com/observability_pipelines/monitoring_and_troubleshooting/pipeline_usage_metrics.md#processor-buffer-metrics) emitted by all processors, see the [Pipelines Usage Metrics](https://docs.datadoghq.com/observability_pipelines/monitoring_and_troubleshooting/pipeline_usage_metrics.md) documentation.

### Sensitive Data Scanner metrics{% #sensitive-data-scanner-metrics %}

- Use the `component_id` tag to filter or group by individual components.
- The `component_type` tag is `sensitive_data_scanner` for Sensitive Data Scanner processor metrics.

{% dl %}

{% dt %}
`pipelines.sds_rule_matched_total`
{% /dt %}

{% dd %}
**Description**: The number of events that matched a Sensitive Data Scanner rule. Tagged with the matching rule name.
{% /dd %}

{% dd %}
**Metric type**: count
{% /dd %}

{% dt %}
`pipelines.scanned_events`
{% /dt %}

{% dd %}
**Description**: The number of events scanned by the Sensitive Data Scanner engine.
{% /dd %}

{% dd %}
**Metric type**: count
{% /dd %}

{% dt %}
`pipelines.scanning.match_count`
{% /dt %}

{% dd %}
**Description**: The number of matches found by the Sensitive Data Scanner.
{% /dd %}

{% dd %}
**Metric type**: count
{% /dd %}

{% dt %}
`pipelines.scanning.suppressed_match_count`
{% /dt %}

{% dd %}
**Description**: The number of matches suppressed by the Sensitive Data Scanner.
{% /dd %}

{% dd %}
**Metric type**: count
{% /dd %}

{% dt %}
`pipelines.scanning.duration`
{% /dt %}

{% dd %}
**Description**: Accumulated wall-clock time, in seconds, spent scanning events. Use this metric to benchmark processor performance and evaluate optimizations.
{% /dd %}

{% dd %}
**Metric type**: count
{% /dd %}

{% dt %}
`pipelines.scanning.cpu_duration`
{% /dt %}

{% dd %}
**Description**: Accumulated CPU time, in seconds, spent scanning events.
{% /dd %}

{% dd %}
**Metric type**: count
{% /dd %}

{% dt %}
`pipelines.scanner.total_count`
{% /dt %}

{% dd %}
**Description**: The number of Sensitive Data Scanner processors currently running.
{% /dd %}

{% dd %}
**Metric type**: gauge
{% /dd %}

{% dt %}
`pipelines.scanner.total_regexes`
{% /dt %}

{% dd %}
**Description**: The number of regexes held across all Sensitive Data Scanners.
{% /dd %}

{% dd %}
**Metric type**: gauge
{% /dd %}

{% /dl %}

## Further reading{% #further-reading %}

- [Writing Effective Grok Parsing Rules with Regular Expressions](https://docs.datadoghq.com/logs/guide/regex_log_parsing.md)