Advanced Log Collection
Datadog's Research Report: The State of Serverless Report: The State of Serverless

Advanced Log Collection

Apply log processing rules to a specific log collection configurations to:

Note: If you set up multiple processing rules, they are applied sequentially and each rule is applied on the result of the previous one.

To apply a processing rule to all logs collected by a Datadog Agent, see the Global processing rules section.

Filter logs

To send only a specific subset of logs to Datadog use the log_processing_rules parameter in your configuration file with the exclude_at_match or include_at_match type.

ParameterDescription
exclude_at_matchIf the specified pattern is contained in the message, the log is excluded and not sent to Datadog.

For example, filtering out logs that contain a Datadog email address:

logs:
  - type: file
    path: /my/test/file.log
    service: cardpayment
    source: java
    log_processing_rules:
    - type: exclude_at_match
      name: exclude_datadoghq_users
      ## Regexp can be anything
      pattern: \w+@datadoghq.com
ParameterDescription
include_at_matchOnly logs with a message that includes the specified pattern are sent to Datadog.

For example, sending only logs that contain a Datadog email address:

logs:
  - type: file
    path: /my/test/file.log
    service: cardpayment
    source: java
    log_processing_rules:
    - type: include_at_match
      name: include_datadoghq_users
      ## Regexp can be anything
      pattern: \w+@datadoghq.com

Scrub sensitive data from your logs

If your logs contain sensitive information that need redacting, configure the Datadog Agent to scrub sensitive sequences by using the log_processing_rules parameter in your configuration file with the mask_sequences type.

This replaces all matched groups with the value of the replace_placeholder parameter.

For example, redact credit card numbers:

logs:
 - type: file
   path: /my/test/file.log
   service: cardpayment
   source: java
   log_processing_rules:
      - type: mask_sequences
        name: mask_credit_cards
        replace_placeholder: "[masked_credit_card]"
        ##One pattern that contains capture groups
        pattern: (?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})

Note: With Agent version 7.17+, the replace_placeholder string can expand references to capture groups such as $1, $2 and so forth. If you want a string to follow the capture group with no space in between, use the format ${<GROUP_NUMBER>}. For instance, to scrub user information from the log User email: foo.bar@example.com, the following configuration:

# (...)
pattern: "(User email: )[^@]*@(.*)"
replace_placeholder: "$1 masked_user@${2}"

Would send the following log to Datadog: User email: masked_user@example.com

Commonly Used Log Processing Rules

Generic String: “sensitive-info”

Lines containing the string sensitive-info are not sent to Datadog.

  - type: exclude_at_match
    name: exclude_sensitive_info
    pattern: (?:sensitive\-info)
my_key=value

When the string “my_key=” is found, letters, numbers, spaces, and underscores following the string are redacted with my_key=[VALUE REDACTED].

- type: mask_sequences
  name: redact_key_match_letters_numbers_spaces_unders
  replace_placeholder: "my_key=[VALUE REDACTED]"
  pattern: (?:my_key=[A-Za-z0-9\s_]*[A-Za-z0-9][A-Za-z0-9\s_])

When the string “my_key=” is found, all characters following the string until the next period are redacted with my_key=[VALUE REDACTED].

- type: mask_sequences
  name: redact_key_match_to_period
  replace_placeholder: "my_key=[VALUE REDACTED]"
  pattern: (?:my_key=[^.])
Social Ssecurity Numbers

Redact Social Security Numbers.

  - type: mask_sequences
    name: social_security_number_basic
    pattern: (?:\d{3}-?\d{2}-?\d{4})
    replace_placeholder: "[SSN REDACTED]"
Email Address

Redact email addresses using the RFC 5322 regex specification.

  - type: mask_sequences
    name: RFC_5322_email
    pattern: (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
    replace_placeholder: "[EMAIL REDACTED]"
Credit Card (Visa, MC, AMEX, DINERS, DISCOVER, JCB)

Redact credit card numbers for Visa, Mastercard, American Express, Diner’s Club, Discover Card, and JCB.

- type: mask_sequences
  name: visa_mc_amex_diners_discover_jcb_credit_card
  replace_placeholder: "[CREDIT CARD REDACTED]"
  pattern: (?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})
Postal Codes

Redact postal codes (US).

- type: mask_sequences
  name: postal_codes
  replace_placeholder: "[POSTAL CODE REDACTED]"
  pattern: (?:\d{5}-\d{4}|\d{5}|[A-Z]\d[A-Z] \d[A-Z]\d)
Between Parentheses

Redact characters after string ExampleConfig( until the closing paranthesis.

- type: mask_sequences
  name: Example_config_redaction
  replace_placeholder: "ExampleConfig([REDACTED, REDACTED]"
  pattern: (?:ExampleConfig\([^\)]+)
Between Brackets

Redact characters after string on Example [ until the closing bracket.

- type: mask_sequences
  name: on_Example_redaction
  replace_placeholder: "on Example [Example REDACTED]"
  pattern: (?:on Example\s?[^\s]+)
Class A IP Address

Redact Class A IP Addresses, range 1.0.0.1 to 126.255.255.254.

- type: mask_sequences
  name: simple_ip_address
  replace_placeholder: "[IP REDACTED]"
  pattern: (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

Multi-line aggregation

If your logs are not sent in JSON and you want to aggregate several lines into a single entry, configure the Datadog Agent to detect a new log using a specific regex pattern instead of having one log per line. This is accomplished by using the log_processing_rules parameter in your configuration file with the multi_line type which aggregates all lines into a single entry until the given pattern is detected again.

For example, every Java log line starts with a timestamp in yyyy-dd-mm format. These lines include a stack trace that can be sent as two logs:

2018-01-03T09:24:24.983Z UTC Exception in thread "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
2018-01-03T09:26:24.365Z UTC starting upload of /my/file.gz

To send the example logs above with a configuration file, use the following log_processing_rules:

logs:
 - type: file
   path: /var/log/pg_log.log
   service: database
   source: postgresql
   log_processing_rules:
      - type: multi_line
        name: new_log_start_with_date
        pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])

In a Docker environment, use the label com.datadoghq.ad.logs on your container to specify the log_processing_rules, for example:

 labels:
    com.datadoghq.ad.logs: '[{"source": "postgresql", "service": "database", "log_processing_rules": [{"type": "multi_line", "name": "log_start_with_date", "pattern" : "\\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])"}]}]'

In a Kubernetes environment, use the pod annotation ad.datadoghq.com on your pod to specify the log_processing_rules, for example:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: postgres
spec:
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      annotations:
        ad.datadoghq.com/postgres.logs: '[{"source": "postgresql", "service": "database", "log_processing_rules": [{"type": "multi_line", "name": "log_start_with_date", "pattern" : "\\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])"}]}]'
      labels:
        app: database
      name: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:latest

Note: Escape regex characters in your patterns when performing multi-line aggregation with pod annotations. For example, \d becomes \\d, \w becomes \\w, etc.

More examples:

Raw stringPattern
14:20:15\d{2}:\d{2}:\d{2}
11/10/2014\d{2}\/\d{2}\/\d{4}
Thu Jun 16 08:29:03 2016\w{3}\s+\w{3}\s+\d{2}\s\d{2}:\d{2}:\d{2}
20180228\d{8}

Note: Regex patterns for multi-line logs must start at the beginning of a log. Patterns cannot be matched mid-line.

Tail directories by using wildcards

If your log files are labeled by date or all stored in the same directory, configure your Datadog Agent to monitor them all and automatically detect new ones by using wildcards in the path attribute. If you want to exclude some files matching the chosen path, list them in the exclude_paths attribute.

  • Using path: /var/log/myapp/*.log:

    • Matches all .log file contained in the /var/log/myapp/ directory.
    • Doesn’t match /var/log/myapp/myapp.conf.
  • Using path: /var/log/myapp/*/*.log:

    • Matches /var/log/myapp/log/myfile.log.
    • Matches /var/log/myapp/errorLog/myerrorfile.log
    • Doesn’t match /var/log/myapp/mylogfile.log.

Configuration example:

logs:
 - type: file
   path: /var/log/myapp/*.log
   exclude_paths:
     - /var/log/myapp/debug.log
     - /var/log/myapp/trace.log
   service: mywebapp
   source: go

The example above will match /var/log/myapp/log/myfile.log but /var/log/myapp/log/debug.log and /var/log/myapp/log/trace.log will never be tailed.

Note: The Agent requires read and execute permissions on a directory to list all the available files in it.

Global processing rules

For Datadog Agent v6.10+, the exclude_at_match, include_at_match, and mask_sequences processing rules can be defined globally in the Agent’s main configuration file or through an environment variable:

In the datadog.yaml file:

logs_config:
  processing_rules:
     - type: exclude_at_match
       name: exclude_healthcheck
       pattern: healthcheck
     - type: mask_sequences
       name: mask_user_email
       pattern: \w+@datadoghq.com
       replace_placeholder: "MASKED_EMAIL"

Use the environment variable DD_LOGS_CONFIG_PROCESSING_RULES to configure global processing rules, for example:

DD_LOGS_CONFIG_PROCESSING_RULES='[{"type": "mask_sequences", "name": "mask_user_email", "replace_placeholder": "MASKED_EMAIL", "pattern" : "\\w+@datadoghq.com"}]'

Use the env parameter in the helm chart to set the DD_LOGS_CONFIG_PROCESSING_RULES environment variable to configure global processing rules, for example:

env:
  - name: DD_LOGS_CONFIG_PROCESSING_RULES
    value: '[{"type": "mask_sequences", "name": "mask_user_email", "replace_placeholder": "MASKED_EMAIL", "pattern" : "\\w+@datadoghq.com"}]'
All the logs collected by the Datadog Agent are impacted by the global processing rules.

Note: The Datadog Agent does not start the log collector if there is a format issue in the global processing rules. Run the Agent’s status subcommand to troubleshoot any issues.

Further Reading


*Logging without Limits is a trademark of Datadog, Inc.