Overview

Sensitive Data Scanner uses scanning rules to identify, tag, and optionally redact sensitive data in your logs, APM events, and RUM events. Use out-of-the-box scanning rules or create custom rules using regular expression (regex) patterns. This guide goes over best practices for creating custom rules using regex patterns.

Use precise regex patterns

Define regex patterns that are as precise as possible because generic patterns result in more false positives. To refine your regex pattern, add test data in the sample data tester when creating a custom rule. For more information, see step 2 in Add a custom scanning rule.
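As a rough illustration of why precision matters, the sketch below compares a generic pattern with a more precise one on a few sample strings. The `EMP-` identifier format, the sample messages, and the helper name `matching_samples` are hypothetical. Python's `re` module stands in for the scanner's regex engine, so syntax support may differ from what the scanner accepts.

```python
import re

# Generic: any run of six digits. Likely to match order numbers, dates, and so on.
GENERIC = re.compile(r"\d{6}")

# More precise: a hypothetical employee ID format, "EMP-" followed by exactly six digits.
PRECISE = re.compile(r"\bEMP-\d{6}\b")

# Hypothetical sample data, similar to what you might paste into a sample data tester.
SAMPLES = [
    "order 123456 shipped",
    "user EMP-204981 logged in",
    "build 20240115 ok",
]


def matching_samples(pattern, samples):
    """Return the samples in which the pattern matches at least once."""
    return [s for s in samples if pattern.search(s)]


generic_hits = matching_samples(GENERIC, SAMPLES)
precise_hits = matching_samples(PRECISE, SAMPLES)

print(f"generic pattern matched {len(generic_hits)} of {len(SAMPLES)} samples")
print(f"precise pattern matched {len(precise_hits)} of {len(SAMPLES)} samples")
```

In this sketch the generic pattern flags every sample, while the precise pattern flags only the line that contains the identifier.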

Refine regex pattern matching

Add a list of keywords to the keyword dictionary to refine regex pattern matching. With a keyword dictionary, the scanner checks for the matching pattern only within a defined proximity of those keywords. For example, if you are scanning for passwords, you can add keywords like password, token, secret, and credential. You can also set how many characters may separate a keyword from a match. By default, a keyword must appear within the 30 characters before a matched value. See step 2 in Add a custom scanning rule for more information.

A keyword dictionary with password, token, secret, credential
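The proximity check described above can be sketched as follows. This is a simplified model and not the scanner's actual implementation: the value pattern, the function name `matches_near_keyword`, and the rule that a keyword must fall entirely inside the window are all assumptions made for illustration.

```python
import re

# Keywords from the example above.
KEYWORDS = ("password", "token", "secret", "credential")

# Number of characters before a match in which a keyword must appear
# (mirrors the default of 30 described above).
PROXIMITY = 30

# Hypothetical pattern: 8 to 32 non-space characters following '=' or ':'.
VALUE = re.compile(r"(?<=[=:])\S{8,32}")


def matches_near_keyword(text, pattern=VALUE, keywords=KEYWORDS, proximity=PROXIMITY):
    """Return regex matches that have a keyword in the preceding window."""
    hits = []
    for m in pattern.finditer(text):
        # Look only at the characters immediately before the matched value.
        window = text[max(0, m.start() - proximity):m.start()].lower()
        if any(keyword in window for keyword in keywords):
            hits.append(m.group())
    return hits


print(matches_near_keyword("password=hunter2hunter2"))
print(matches_near_keyword("trace_id=abcdef1234567890"))
```

The first call reports the value because `password` precedes it. The second call reports nothing, even though the regex itself matches, because no keyword is nearby.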

To make matches more precise, you can also do one of the following:

  • Scan the entire event but exclude certain attributes from being scanned. For example, if you are scanning for personally identifiable information (PII) like names, you might want to exclude attributes such as resource_name and namespace.
  • Scan only specific attributes to narrow the scope of the data that is scanned. For example, if you are scanning for names, you can choose specific attributes such as first_name and last_name.

See step 3 in Add a custom scanning rule for more information.
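The two scoping options above can be sketched on a flat dictionary. This is only a model of the idea: real events are nested and are configured through the rule settings, and the `scan_event` function, the sample event, and the deliberately loose name pattern are hypothetical.

```python
import re

# Hypothetical, deliberately loose pattern: any capitalized word.
NAME = re.compile(r"\b[A-Z][a-z]+\b")


def scan_event(event, pattern, include=None, exclude=()):
    """Return {attribute: [matches]} for the attributes that were scanned.

    include: if given, scan only these attributes.
    exclude: attributes to skip.
    """
    hits = {}
    for key, value in event.items():
        if include is not None and key not in include:
            continue
        if key in exclude:
            continue
        found = pattern.findall(str(value))
        if found:
            hits[key] = found
    return hits


event = {
    "message": "ticket opened by Jane Doe",
    "resource_name": "Checkout",
    "namespace": "Payments",
    "first_name": "Jane",
    "last_name": "Doe",
}

# Option 1: scan the entire event, but skip attributes that cause false positives.
excluded = scan_event(event, NAME, exclude=("resource_name", "namespace"))

# Option 2: scan only the attributes expected to hold names.
included = scan_event(event, NAME, include=("first_name", "last_name"))

print(excluded)
print(included)
```

Without any scoping, the loose pattern would also flag `Checkout` and `Payments`, which is the kind of false positive that excluding or selecting attributes avoids.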

Use out-of-the-box rules

Whenever possible, use Datadog’s out-of-the-box library rules. These predefined rules detect common patterns such as email addresses, credit card numbers, API keys, authorization tokens, network and device information, and more. Each rule has recommended keywords for the keyword dictionary to refine matching accuracy. You can also add your own keywords.

If there is a rule that you want to use and you think other users would also benefit from it, contact support.
