Configure the Datadog Agent or Tracer for Data Security
Overview
The performance data and traces that you’re collecting with Datadog can contain sensitive information that you want to filter out, obfuscate, scrub, modify, or just not collect. Additionally, they may contain synthetic traffic that might cause your error counts to be inaccurate or prevent Datadog from accurately indicating the health of your services.
The Datadog Agent and some tracing libraries have options to address these situations by modifying or discarding spans. This page covers several common methods for configuring the Tracer and Agent to meet these security requirements.
If your fine-tuning needs aren’t covered and you need assistance, reach out to the Datadog support team.
Generalizing resource names and filtering baseline
Datadog enforces several filtering mechanisms on spans as a baseline, to provide sound defaults for basic security and generalize resource names to facilitate grouping during analysis. In particular:
Environment variables are not collected by the Agent
SQL variables are obfuscated, even when not using prepared statements: For example, the following sql.query attribute: SELECT data FROM table WHERE key=123 LIMIT 10 has its variables obfuscated, to become the following Resource name: SELECT data FROM table WHERE key = ? LIMIT ?
SQL strings are identified using standard ANSI SQL quotes: This means strings should be surrounded in single quotes ('). Some SQL variants optionally support double-quotes (") for strings, but most treat double-quoted things as identifiers. The Datadog obfuscator treats these as identifiers rather than strings and does not obfuscate them.
Numbers in Resource names (for example, request URLs) are obfuscated: For example, the following elasticsearch attribute:
Elasticsearch : {
method : GET,
url : /user.0123456789/friends/_count
}
has its number in the URL obfuscated, to become the following Resource name: GET /user.?/friends/_count
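The baseline behavior described above can be approximated with a few regular expressions. The following Python sketch is illustrative only and is not the Agent's implementation: the function names obfuscate_sql and obfuscate_url_numbers are hypothetical, and unlike the Agent this sketch does not normalize whitespace around operators (so key=123 becomes key=? rather than key = ?).

```python
import re

# One pass over the query, trying three alternatives in order.
_TOKEN = re.compile(
    r'("(?:[^"]|"")*")'                      # group 1: double-quoted identifier, kept as-is
    r"|('(?:[^']|'')*')"                     # group 2: single-quoted (ANSI) string literal
    r"|((?<![\w.])\d+(?:\.\d+)?(?![\w.]))"   # group 3: standalone numeric literal
)


def obfuscate_sql(query: str) -> str:
    """Replace string and numeric literals with '?', leaving identifiers untouched."""
    def _replace(match: re.Match) -> str:
        # Double-quoted text is treated as an identifier and returned unchanged.
        if match.group(1) is not None:
            return match.group(1)
        return "?"
    return _TOKEN.sub(_replace, query)


def obfuscate_url_numbers(resource: str) -> str:
    """Replace every run of digits in a resource name with '?'."""
    return re.sub(r"\d+", "?", resource)


if __name__ == "__main__":
    print(obfuscate_sql("SELECT data FROM table WHERE key=123 LIMIT 10"))
    print(obfuscate_url_numbers("GET /user.0123456789/friends/_count"))
```

Handling double-quoted text as a separate alternative is what keeps digits inside quoted identifiers from being replaced, which mirrors the identifier rule stated above.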
Agent trace obfuscation
Agent trace obfuscation is disabled by default. Enable it in your datadog.yaml configuration file to obfuscate all information attached to your traces.
This option works with the following services:
mongodb
elasticsearch
redis
memcached
http
remove_stack_traces
Note: You can use automatic scrubbing for multiple types of services at the same time. Configure each in the obfuscation section of your datadog.yaml file.
Applies to spans of type mongodb, more specifically, to the mongodb.query span tags:
apm_config:
  enabled: true
  ## (...)
  obfuscation:
    # MongoDB obfuscation rules. Applies to spans of type "mongodb".
    # More specifically, to the "mongodb.query" tag.
    mongodb:
      enabled: true
      # Values for the keys listed here will not be obfuscated.
      keep_values:
        - document_id
        - template_id
keep_values: defines a set of keys to exclude from Agent trace obfuscation.
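To make the effect of keep_values concrete, here is a minimal Python sketch. It assumes the query tag holds a JSON document and that every leaf value is replaced with "?" unless it sits under a key listed in keep_values. The function name obfuscate_document is hypothetical, and the Agent's real handling of nested values under a kept key may differ from this assumption.

```python
import json


def obfuscate_document(raw_query: str, keep_values: set) -> str:
    """Replace leaf values with '?', except under keys listed in keep_values."""
    def scrub(node, keep=False):
        if isinstance(node, dict):
            # Assumption: once a key is kept, everything nested below it is kept too.
            return {k: scrub(v, keep or k in keep_values) for k, v in node.items()}
        if isinstance(node, list):
            return [scrub(item, keep) for item in node]
        return node if keep else "?"

    return json.dumps(scrub(json.loads(raw_query)))


if __name__ == "__main__":
    query = '{"document_id": 42, "email": "a@b.c", "filter": {"age": 30}}'
    print(obfuscate_document(query, {"document_id", "template_id"}))
```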
Applies to spans of type elasticsearch, more specifically, to the elasticsearch.body span tags:
apm_config:
  enabled: true
  ## (...)
  obfuscation:
    # ElasticSearch obfuscation rules. Applies to spans of type "elasticsearch".
    # More specifically, to the "elasticsearch.body" tag.
    elasticsearch:
      enabled: true
      # Values for the keys listed here will not be obfuscated.
      keep_values:
        - client_id
        - product_id
Applies to spans of type redis, more specifically, to the redis.raw_command span tags:
apm_config:
  enabled: true
  ## (...)
  obfuscation:
    redis:
      enabled: true
Applies to spans of type memcached, more specifically, to the memcached.command span tags:
apm_config:
  enabled: true
  ## (...)
  obfuscation:
    memcached:
      enabled: true
HTTP obfuscation rules for http.url metadata in spans of type http:
apm_config:
  enabled: true
  ## (...)
  obfuscation:
    http:
      remove_query_string: true
      remove_paths_with_digits: true
remove_query_string: If true, obfuscates query strings in URLs.
remove_paths_with_digits: If true, path segments in URLs containing digits are replaced by “?”.
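The effect of the two HTTP options can be sketched in Python as follows. This is an approximation under stated assumptions, not the Agent's code: the function name obfuscate_url is hypothetical, and the sketch simply drops the query string, whereas the exact form the Agent leaves behind may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def obfuscate_url(url: str,
                  remove_query_string: bool = False,
                  remove_paths_with_digits: bool = False) -> str:
    """Apply simplified versions of the two http obfuscation options to a URL."""
    parts = urlsplit(url)
    path, query = parts.path, parts.query

    if remove_query_string:
        # Assumption: the whole query string is discarded.
        query = ""

    if remove_paths_with_digits:
        # Any path segment that contains at least one digit becomes "?".
        path = "/".join(
            "?" if any(ch.isdigit() for ch in segment) else segment
            for segment in path.split("/")
        )

    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(obfuscate_url("http://example.com/users/123/profile?token=abc",
                        remove_query_string=True,
                        remove_paths_with_digits=True))
```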
Set the remove_stack_traces parameter to true to remove stack traces and replace them with ?.
apm_config:
  enabled: true
  ## (...)
  obfuscation:
    remove_stack_traces: true
HTTP data collected
Datadog is standardizing the tags collected for web spans across the supported tracing libraries. Check your library’s release notes to see if it has implemented collecting these tags. For fully standardized libraries, see Span Tags Semantics.
Redacting the query in the URL
The http.url tag is assigned the full URL value, including the query string. The query string could contain sensitive data, so by default Datadog parses it and redacts suspicious-looking values. This redaction process is configurable. To modify the regular expression used for redaction, set the DD_TRACE_OBFUSCATION_QUERY_STRING_REGEXP environment variable to a valid regex of your choice. Valid regex is platform-specific. When the regex finds a suspicious key-value pair, it replaces it with <redacted>.
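As a rough illustration of this redaction step, the Python sketch below replaces each regex match in the query string with <redacted>. The pattern shown is a made-up example, not Datadog's default, and the function name redact_query_string is hypothetical. Because valid regex syntax is platform-specific, a pattern that works with Python's re module may need adjusting for your tracing library.

```python
import re

# Hypothetical pattern for illustration only; this is NOT the library default.
EXAMPLE_PATTERN = r"(?i)(?:password|token|api_key)=[^&]+"


def redact_query_string(url: str, pattern: str = EXAMPLE_PATTERN) -> str:
    """Replace every match of `pattern` in the URL's query string with <redacted>."""
    base, separator, query = url.partition("?")
    if not separator:
        return url  # no query string, nothing to redact
    return base + separator + re.sub(pattern, "<redacted>", query)


if __name__ == "__main__":
    print(redact_query_string("https://example.com/login?user=bob&password=hunter2"))
```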
If you do not want to collect the query string, set the DD_HTTP_SERVER_TAG_QUERY_STRING environment variable to false. The default value is true.
To collect trace header tags, set the DD_TRACE_HEADER_TAGS environment variable with a map of case-insensitive header keys to tag names. The library applies matching header values as tags on root spans. The setting also accepts entries without a specified tag name, for example:
DD_TRACE_HEADER_TAGS=CASE-insensitive-Header:my-tag-name,User-ID:userId,My-Header-And-Tag-Name
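The format of this value can be illustrated with a small parser. The Python sketch below is an assumption about how such a string might be read, not the tracing library's code: parse_header_tags is a hypothetical name, header keys are lowercased to model case-insensitive matching, and entries without a tag name map to None because the default tag name is chosen by the library.

```python
from typing import Dict, Optional


def parse_header_tags(value: str) -> Dict[str, Optional[str]]:
    """Parse 'Header:tag,Header2' into {lowercased header: tag name or None}."""
    mapping: Dict[str, Optional[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        header, separator, tag = entry.partition(":")
        # None means "no tag name given"; the library decides the default name.
        mapping[header.strip().lower()] = tag.strip() if separator else None
    return mapping


if __name__ == "__main__":
    raw = "CASE-insensitive-Header:my-tag-name,User-ID:userId,My-Header-And-Tag-Name"
    for header, tag in parse_header_tags(raw).items():
        print(header, "->", tag)
```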
Scrub sensitive data from your spans
To scrub sensitive data from your span’s tags, use the replace_tags setting in your datadog.yaml configuration file or the DD_APM_REPLACE_TAGS environment variable. The value of the setting or environment variable is a list of one or more groups of parameters that specify how to replace sensitive data in your tags. These parameters are:
name: The key of the tag to replace. To match all tags, use *. To match the resource, use resource.name.
pattern: The regexp pattern to match against.
repl: The replacement string.
For example:
apm_config:
  replace_tags:
    # Replace all characters starting at the `token/` string in the tag "http.url" with "?"
    - name: "http.url"
      pattern: "token/(.*)"
      repl: "?"
    # Remove trailing "/" character in resource names
    - name: "resource.name"
      pattern: "(.*)\/$"
      repl: "$1"
    # Replace all the occurrences of "foo" in any tag with "bar"
    - name: "*"
      pattern: "foo"
      repl: "bar"
    # Remove all "error.stack" tag's value
    - name: "error.stack"
      pattern: "(?s).*"
    # Replace series of numbers in error messages
    - name: "error.msg"
      pattern: "[0-9]{10}"
      repl: "[REDACTED]"
DD_APM_REPLACE_TAGS=[
  {
    "name": "http.url",
    "pattern": "token/(.*)",
    "repl": "?"
  },
  {
    "name": "resource.name",
    "pattern": "(.*)\/$",
    "repl": "$1"
  },
  {
    "name": "*",
    "pattern": "foo",
    "repl": "bar"
  },
  {
    "name": "error.stack",
    "pattern": "(?s).*"
  },
  {
    "name": "error.msg",
    "pattern": "[0-9]{10}",
    "repl": "[REDACTED]"
  }
]
Put this environment variable in the trace-agent container if you are using the daemonset configuration, or use agents.containers.traceAgent.env in the values.yaml file if you are using the Helm chart.
- name: DD_APM_REPLACE_TAGS
  value: '[
            {
              "name": "http.url",
              "pattern": "token/(.*)",
              "repl": "?"
            },
            {
              "name": "resource.name",
              "pattern": "(.*)\/$",
              "repl": "$1"
            },
            {
              "name": "*",
              "pattern": "foo",
              "repl": "bar"
            },
            {
              "name": "error.stack",
              "pattern": "(?s).*"
            },
            {
              "name": "error.msg",
              "pattern": "[0-9]{10}",
              "repl": "[REDACTED]"
            }
          ]'
- DD_APM_REPLACE_TAGS=[{"name":"http.url","pattern":"token/(.*)","repl":"?"},{"name":"resource.name","pattern":"(.*)\/$","repl": "$1"},{"name":"*","pattern":"foo","repl":"bar"},{"name":"error.stack","pattern":"(?s).*"}, {"name": "error.msg", "pattern": "[0-9]{10}", "repl": "[REDACTED]"}]
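To see what the example rules do to a span, the following Python sketch applies them to a sample span represented as a plain dictionary. It is an approximation rather than the Agent's implementation: the span layout and the names apply_replace_tags and _to_python_repl are hypothetical, a missing repl is assumed to mean an empty replacement, and the $1 group reference is translated into Python's \g<1> syntax because the two regex dialects differ.

```python
import re


def _to_python_repl(repl: str) -> str:
    # The rules use $1-style group references; Python's re module expects \g<1>.
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def apply_replace_tags(span: dict, rules: list) -> dict:
    """Apply each rule in order to the span's resource name or matching tags."""
    for rule in rules:
        pattern = re.compile(rule["pattern"])
        repl = _to_python_repl(rule.get("repl", ""))  # assumption: no repl means ""
        name = rule["name"]
        if name == "resource.name":
            span["resource"] = pattern.sub(repl, span["resource"])
            continue
        for key, value in span["tags"].items():
            if name == "*" or name == key:
                span["tags"][key] = pattern.sub(repl, value)
    return span


RULES = [
    {"name": "http.url", "pattern": "token/(.*)", "repl": "?"},
    # In the JSON form, "\/" decodes to a plain "/".
    {"name": "resource.name", "pattern": "(.*)/$", "repl": "$1"},
    {"name": "*", "pattern": "foo", "repl": "bar"},
    {"name": "error.stack", "pattern": "(?s).*"},
    {"name": "error.msg", "pattern": "[0-9]{10}", "repl": "[REDACTED]"},
]

if __name__ == "__main__":
    sample = {
        "resource": "GET /users/",
        "tags": {
            "http.url": "http://example.com/api/token/abc123",
            "error.stack": "line1\nline2",
            "error.msg": "user 0123456789 failed",
        },
    }
    print(apply_replace_tags(sample, RULES))
```

Rules are applied in list order, so a later rule sees the output of earlier ones.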
Exclude resources from being collected
For an in-depth overview of the options to avoid tracing specific resources, see Ignoring Unwanted Resources.
If your services include simulated traffic such as health checks, you may want to exclude these traces from being collected so the metrics for your services match production traffic.
The Agent can be configured to exclude a specific resource from traces sent by the Agent to Datadog. To prevent the submission of specific resources, use the ignore_resources setting in the datadog.yaml file. Then create a list of one or more regular expressions, specifying which resources the Agent filters out based on their resource name.
If you are running in a containerized environment, set DD_APM_IGNORE_RESOURCES on the container with the Datadog Agent instead. See the Docker APM Agent environment variables for details.
## @param ignore_resources - list of strings - optional
## A list of regular expressions can be provided to exclude certain traces based on their resource name.
## All entries must be surrounded by double quotes and separated by commas.
# ignore_resources: ["(GET|POST) /healthcheck","API::NotesController#index"]
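Before deploying a list of patterns, it can help to check which resource names they would match. The Python sketch below does that with the two expressions from the commented example. It assumes the patterns are matched anywhere in the resource name (unanchored), which is an assumption about the Agent's behavior, and should_ignore is a hypothetical helper name.

```python
import re
from typing import Iterable


def should_ignore(resource: str, ignore_resources: Iterable[str]) -> bool:
    """Return True if any pattern matches somewhere in the resource name."""
    # Assumption: unanchored search; add ^ and $ to a pattern to anchor it.
    return any(re.search(pattern, resource) for pattern in ignore_resources)


IGNORE_RESOURCES = ["(GET|POST) /healthcheck", "API::NotesController#index"]

if __name__ == "__main__":
    for name in ["GET /healthcheck", "GET /users", "API::NotesController#index"]:
        verdict = "ignored" if should_ignore(name, IGNORE_RESOURCES) else "kept"
        print(f"{name}: {verdict}")
```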
Submit Traces directly to the Agent API
If you require tailored instrumentation for a specific application, consider using the Agent-side tracing API to select individual spans to include in traces. See the API documentation for additional information.
Modifying spans with the Datadog tracer
While this page deals with modifying data once it has reached the Datadog Agent, some tracing libraries are extensible. You can write a custom post-processor to intercept spans and adjust or discard them accordingly (for example, based on a regular expression match). View the Custom Instrumentation documentation for your language for more information.
Telemetry collection
Datadog may gather environmental and diagnostic information about your tracing libraries for processing; this may include information about the host running an application, operating system, programming language and runtime, APM integrations used, and application dependencies. Additionally, Datadog may collect information such as diagnostic logs, crash dumps with obfuscated stack traces, and various system performance metrics.
To disable this telemetry collection, set the DD_INSTRUMENTATION_TELEMETRY_ENABLED environment variable to false in your instrumented application.
PCI DSS compliance for APM
PCI compliance for APM is only available for new Datadog organizations created in the US1 site.
PCI compliance for APM is available when you create a new Datadog organization. To set up a PCI-compliant Datadog org, follow these steps:
- Set up a new Datadog org in the US1 site. PCI DSS compliance is only supported for new orgs created in US1.
- Contact Datadog support or your Customer Success Manager to request that the new org be configured as a PCI-compliant org.
- Enable Audit Trail in the new org. Audit Trail must be enabled and remain enabled for PCI DSS compliance.
- After Datadog support or Customer Success confirms that the new org is PCI DSS compliant, configure the Agent configuration file to send spans to the dedicated PCI-compliant endpoint (https://trace-pci.agent.datadoghq.com):
apm_config:
  apm_dd_url: https://trace-pci.agent.datadoghq.com
To enable PCI compliance for logs, see PCI DSS compliance for Log Management.
PCI compliance for APM is not available for sites other than US1.