Advanced Log Collection Configurations

After you set up log collection, you can customize your collection configuration as described in the sections below.

To apply a processing rule to all logs collected by a Datadog Agent, see the Global processing rules section.

Notes:

  • If you set up multiple processing rules, they are applied sequentially, and each rule is applied to the result of the previous one.
  • Processing rule patterns must conform to Golang regexp syntax.
  • The log_processing_rules parameter is used in integration configurations to customize your log collection configuration, whereas the processing_rules parameter is used in the Agent’s main configuration to define global processing rules.
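
The sequential behavior described in the first note can be sketched in a few lines. This is an illustration only, not the Agent's implementation: it uses Python's re module as a stand-in for Go's regexp, and the function name apply_rules and the rule dictionaries are hypothetical.

```python
import re

def apply_rules(message, rules):
    """Apply rules in order. Return the processed message, or None if it is dropped."""
    for rule in rules:
        found = re.search(rule["pattern"], message) is not None
        if rule["type"] == "exclude_at_match" and found:
            return None
        if rule["type"] == "include_at_match" and not found:
            return None
        if rule["type"] == "mask_sequences":
            # A function replacement keeps the placeholder text literal.
            message = re.sub(rule["pattern"], lambda _m: rule["replace_placeholder"], message)
    return message

# Hypothetical rules, written as dictionaries for the sketch.
mask_email = {"type": "mask_sequences", "pattern": r"\w+@datadoghq\.com",
              "replace_placeholder": "MASKED_EMAIL"}
drop_email = {"type": "exclude_at_match", "pattern": r"\w+@datadoghq\.com"}

log = "login by jane@datadoghq.com"
masked_first = apply_rules(log, [mask_email, drop_email])   # exclusion sees the masked text
dropped_first = apply_rules(log, [drop_email, mask_email])  # exclusion sees the raw text
```

In the first ordering the mask runs before the exclusion, so the exclusion rule no longer finds an email address and the log is kept. In the second ordering the log is dropped before it can be masked.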

Filter logs

To send only a specific subset of logs to Datadog, use the log_processing_rules parameter in your configuration file with the exclude_at_match or include_at_match type.

Exclude at match

Parameter           Description
exclude_at_match    If the specified pattern is contained in the message, the log is excluded and not sent to Datadog.

For example, to filter out logs that contain a Datadog email address, use the following log_processing_rules:

logs:
  - type: file
    path: /my/test/file.log
    service: cardpayment
    source: java
    log_processing_rules:
    - type: exclude_at_match
      name: exclude_datadoghq_users
      ## Regexp can be anything
      pattern: \w+@datadoghq.com

In a Docker environment, use the label com.datadoghq.ad.logs on the container sending the logs you want to filter in order to specify the log_processing_rules, for example:

 labels:
    com.datadoghq.ad.logs: >-
      [{
        "source": "java",
        "service": "cardpayment",
        "log_processing_rules": [{
          "type": "exclude_at_match",
          "name": "exclude_datadoghq_users",
          "pattern" : "\\w+@datadoghq.com"
        }]
      }]      

Note: Escape regex characters in your patterns when using labels. For example, \d becomes \\d, \w becomes \\w.

Note: The label value must follow JSON syntax, which means you should not include any trailing commas or comments.
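
To see what the two notes above mean in practice, the label value can be run through any JSON parser before it is deployed. The sketch below uses Python's json module as a stand-in for the parser the Agent uses, and the helper names decoded_pattern and is_valid_json are hypothetical.

```python
import json

# The label value exactly as it would be written in the label (raw string, no Python escaping).
label_value = r'''[{
  "source": "java",
  "service": "cardpayment",
  "log_processing_rules": [{
    "type": "exclude_at_match",
    "name": "exclude_datadoghq_users",
    "pattern": "\\w+@datadoghq.com"
  }]
}]'''

def decoded_pattern(value):
    """Return the regex string that a JSON parser extracts from the label value."""
    return json.loads(value)[0]["log_processing_rules"][0]["pattern"]

def is_valid_json(value):
    """Return True if the value parses as JSON."""
    try:
        json.loads(value)
        return True
    except json.JSONDecodeError:
        return False
```

The doubled backslash in the label decodes to the single backslash that the regex needs. A single backslash before w is not a valid JSON escape, and a trailing comma is rejected as well.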

In a Kubernetes environment, Autodiscovery identifies containers by name, not image: it matches <CONTAINER_IDENTIFIER> against .spec.containers[0].name, not .spec.containers[0].image. To apply log_processing_rules to the logs of a given <CONTAINER_IDENTIFIER> within your pod, add the following annotation to your pod:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: cardpayment
spec:
  selector:
    matchLabels:
      app: cardpayment
  template:
    metadata:
      annotations:
        ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: >-
          [{
            "source": "java",
            "service": "cardpayment",
            "log_processing_rules": [{
              "type": "exclude_at_match",
              "name": "exclude_datadoghq_users",
              "pattern" : "\\w+@datadoghq.com"
            }]
          }]          
      labels:
        app: cardpayment
      name: cardpayment
    spec:
      containers:
        - name: '<CONTAINER_IDENTIFIER>'
          image: cardpayment:latest

Note: Escape regex characters in your patterns when using pod annotations. For example, \d becomes \\d, \w becomes \\w.

Note: The annotation value must follow JSON syntax, which means you should not include any trailing commas or comments.

Include at match

Parameter           Description
include_at_match    Only logs with a message that includes the specified pattern are sent to Datadog. If multiple include_at_match rules are defined, all rule patterns must match in order for the log to be included.

For example, use the following log_processing_rules configuration to filter in logs that contain a Datadog email address:

logs:
  - type: file
    path: /my/test/file.log
    service: cardpayment
    source: java
    log_processing_rules:
    - type: include_at_match
      name: include_datadoghq_users
      ## Regexp can be anything
      pattern: \w+@datadoghq.com

To include logs that match any one of several patterns, define the patterns as alternatives in a single expression:

logs:
  - type: file
    path: /my/test/file.log
    service: cardpayment
    source: java
    log_processing_rules:
    - type: include_at_match
      name: include_datadoghq_users
      pattern: abc|123
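
The difference between several include_at_match rules and one rule with alternation can be sketched as follows. This is an illustration under the semantics described above (every include rule must match), using Python's re module as a stand-in for Go's regexp; the function name passes_include_rules is hypothetical.

```python
import re

def passes_include_rules(message, patterns):
    """A log is kept only if every include_at_match pattern is found in it."""
    return all(re.search(pattern, message) is not None for pattern in patterns)

two_rules = ["abc", "123"]   # two separate include_at_match rules: both must match
one_rule = ["abc|123"]       # one rule with alternation: either alternative is enough
```

A message containing only abc passes the single alternation rule but is dropped by the pair of separate rules.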

If the patterns are too long to fit legibly on a single line, you can break them into multiple lines:

logs:
  - type: file
    path: /my/test/file.log
    service: cardpayment
    source: java
    log_processing_rules:
    - type: include_at_match
      name: include_datadoghq_users
      pattern: "abc\
|123\
|\\w+@datadoghq.com"

In a Docker environment, use the label com.datadoghq.ad.logs on the container that is sending the logs you want to filter, to specify the log_processing_rules. For example:

 labels:
    com.datadoghq.ad.logs: >-
      [{
        "source": "java",
        "service": "cardpayment",
        "log_processing_rules": [{
          "type": "include_at_match",
          "name": "include_datadoghq_users",
          "pattern" : "\\w+@datadoghq.com"
        }]
      }]      

Note: Escape regex characters in your patterns when using labels. For example, \d becomes \\d, \w becomes \\w.

Note: The label value must follow JSON syntax, which means you should not include any trailing commas or comments.

In a Kubernetes environment, use the pod annotation ad.datadoghq.com on your pod to specify the log_processing_rules. For example:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: cardpayment
spec:
  selector:
    matchLabels:
      app: cardpayment
  template:
    metadata:
      annotations:
        ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: >-
          [{
            "source": "java",
            "service": "cardpayment",
            "log_processing_rules": [{
              "type": "include_at_match",
              "name": "include_datadoghq_users",
              "pattern" : "\\w+@datadoghq.com"
            }]
          }]          
      labels:
        app: cardpayment
      name: cardpayment
    spec:
      containers:
        - name: '<CONTAINER_IDENTIFIER>'
          image: cardpayment:latest

Note: Escape regex characters in your patterns when using pod annotations. For example, \d becomes \\d, \w becomes \\w.

Note: The annotation value must follow JSON syntax, which means you should not include any trailing commas or comments.

Scrub sensitive data from your logs

If your logs contain sensitive information that needs redacting, configure the Datadog Agent to scrub sensitive sequences by using the log_processing_rules parameter in your configuration file with the mask_sequences type.

This replaces every sequence that matches the pattern with the value of the replace_placeholder parameter.

For example, to redact credit card numbers:

logs:
 - type: file
   path: /my/test/file.log
   service: cardpayment
   source: java
   log_processing_rules:
      - type: mask_sequences
        name: mask_credit_cards
        replace_placeholder: "[masked_credit_card]"
        ##One pattern that contains capture groups
        pattern: (?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})
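
Before deploying a masking rule, it can help to try the pattern against sample messages. The sketch below does this with Python's re module, which is only a stand-in for the Go regexp engine the Agent uses; the function name mask is hypothetical, and the card numbers are well-known test values.

```python
import re

CARD_PATTERN = (
    r"(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}"
    r"|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})"
)

def mask(message, pattern=CARD_PATTERN, placeholder="[masked_credit_card]"):
    """Replace every sequence matching the pattern with the placeholder."""
    # A function replacement keeps the placeholder text literal.
    return re.sub(pattern, lambda _m: placeholder, message)

single = mask("paid with 4111111111111111 today")
double = mask("5105105105105100 and 378282246310005")
longer = mask("41111111111111119999")
```

The pattern has no boundary anchors, so in this sketch a longer digit run is masked only partially: the last call leaves the trailing 9999 in place.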

In a Docker environment, use the label com.datadoghq.ad.logs on your container to specify the log_processing_rules. For example:

 labels:
    com.datadoghq.ad.logs: >-
      [{
        "source": "java",
        "service": "cardpayment",
        "log_processing_rules": [{
          "type": "mask_sequences",
          "name": "mask_credit_cards",
          "replace_placeholder": "[masked_credit_card]",
          "pattern" : "(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\\d{3})\\d{11})"
        }]
      }]      

Note: Escape regex characters in your patterns when using labels. For example, \d becomes \\d, \w becomes \\w.

Note: The label value must follow JSON syntax, which means you should not include any trailing commas or comments.

In a Kubernetes environment, use the pod annotation ad.datadoghq.com on your pod to specify the log_processing_rules. For example:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: cardpayment
spec:
  selector:
    matchLabels:
      app: cardpayment
  template:
    metadata:
      annotations:
        ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: >-
          [{
            "source": "java",
            "service": "cardpayment",
            "log_processing_rules": [{
              "type": "mask_sequences",
              "name": "mask_credit_cards",
              "replace_placeholder": "[masked_credit_card]",
              "pattern" : "(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\\d{3})\\d{11})"
            }]
          }]          
      labels:
        app: cardpayment
      name: cardpayment
    spec:
      containers:
        - name: '<CONTAINER_IDENTIFIER>'
          image: cardpayment:latest

Note: Escape regex characters in your patterns when using pod annotations. For example, \d becomes \\d, \w becomes \\w.

Note: The annotation value must follow JSON syntax, which means you should not include any trailing commas or comments.

With Agent version 7.17+, the replace_placeholder string can expand references to capture groups such as $1, $2 and so forth. If you want a string to follow the capture group with no space in between, use the format ${<GROUP_NUMBER>}.

For instance, to scrub user information from the log User email: foo.bar@example.com, use:

  • pattern: "(User email: )[^@]*@(.*)"
  • replace_placeholder: "${1}masked_user@${2}"

This sends the following log to Datadog: User email: masked_user@example.com
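
To illustrate why the braces matter, the sketch below approximates the expansion rules of Go's regexp templates in Python: a reference is written $name or ${name}, and in the bare form the name extends over as many letters, digits, and underscores as possible. It assumes the Agent expands replace_placeholder with these Go rules; the function expand_placeholder is hypothetical, and the escape $$ is not handled.

```python
import re

# $name or ${name}; in the bare form the name runs as long as possible.
_REFERENCE = re.compile(r"\$(?:\{([0-9A-Za-z_]+)\}|([0-9A-Za-z_]+))")

def expand_placeholder(pattern, placeholder, message):
    """Replace each match of pattern with the placeholder, expanding group references."""
    regex = re.compile(pattern)

    def replace_match(match):
        def resolve(ref):
            name = ref.group(1) or ref.group(2)
            if name.isdigit():
                index = int(name)
                # Out-of-range or unmatched groups expand to an empty string.
                return (match.group(index) or "") if index <= regex.groups else ""
            return (match.group(name) or "") if name in regex.groupindex else ""
        return _REFERENCE.sub(resolve, placeholder)

    return regex.sub(replace_match, message)

log = "User email: foo.bar@example.com"
pattern = r"(User email: )[^@]*@(.*)"
with_braces = expand_placeholder(pattern, "${1}masked_user@${2}", log)
without_braces = expand_placeholder(pattern, "$1masked_user@${2}", log)
```

Without braces, $1masked_user is read as a reference to a group named 1masked_user, which does not exist, so it expands to nothing and the prefix is lost.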

Multi-line aggregation

If your logs are not sent in JSON and you want to aggregate several lines into a single entry, configure the Datadog Agent to detect a new log using a specific regex pattern instead of having one log per line. Use the multi_line type in the log_processing_rules parameter to aggregate all lines into a single entry until the given pattern is detected again.

For example, every Java log line starts with a timestamp in yyyy-mm-dd format. The following lines, which include a stack trace, can be sent as two logs:

2018-01-03T09:24:24.983Z UTC Exception in thread "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
2018-01-03T09:26:24.365Z UTC starting upload of /my/file.gz

To send the example logs above with a configuration file, use the following log_processing_rules:

logs:
 - type: file
   path: /var/log/pg_log.log
   service: database
   source: postgresql
   log_processing_rules:
      - type: multi_line
        name: new_log_start_with_date
        pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
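
The aggregation logic can be sketched as follows: a line that matches the pattern at its start opens a new entry, and every other line is appended to the current entry. This is a simplified illustration, not the Agent's implementation; it uses Python's re module in place of Go's regexp, and the function name aggregate is hypothetical.

```python
import re

def aggregate(lines, pattern):
    """Group lines into entries; a line matching the pattern at its start opens a new entry."""
    start = re.compile(pattern)
    entries = []
    for line in lines:
        if start.match(line) or not entries:
            entries.append([line])
        else:
            entries[-1].append(line)
    return entries

sample = [
    '2018-01-03T09:24:24.983Z UTC Exception in thread "main" java.lang.NullPointerException',
    "        at com.example.myproject.Book.getTitle(Book.java:16)",
    "        at com.example.myproject.Author.getBookTitles(Author.java:25)",
    "        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)",
    "2018-01-03T09:26:24.365Z UTC starting upload of /my/file.gz",
]

entries = aggregate(sample, r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")
```

With the sample lines, the stack trace stays attached to the exception line and the upload message becomes a second entry.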

In a Docker environment, use the label com.datadoghq.ad.logs on your container to specify the log_processing_rules. For example:

 labels:
    com.datadoghq.ad.logs: >-
      [{
        "source": "postgresql",
        "service": "database",
        "log_processing_rules": [{
          "type": "multi_line",
          "name": "log_start_with_date",
          "pattern" : "\\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])"
        }]
      }]      

In a Kubernetes environment, use the pod annotation ad.datadoghq.com on your pod to specify the log_processing_rules. For example:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: postgres
spec:
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      annotations:
        ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: >-
          [{
            "source": "postgresql",
            "service": "database",
            "log_processing_rules": [{
              "type": "multi_line",
              "name": "log_start_with_date",
              "pattern" : "\\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])"
            }]
          }]          
      labels:
        app: database
      name: postgres
    spec:
      containers:
        - name: '<CONTAINER_IDENTIFIER>'
          image: postgres:latest

Note: Escape regex characters in your patterns when performing multi-line aggregation with pod annotations. For example, \d becomes \\d, \w becomes \\w.

Note: The annotation value must follow JSON syntax, which means you should not include any trailing commas or comments.

Important! Regex patterns for multi-line logs must start at the beginning of a log. Patterns cannot be matched mid-line. A pattern that never matches may cause log lines to be lost.

More examples:

Raw string                   Pattern
14:20:15                     \d{2}:\d{2}:\d{2}
11/10/2014                   \d{2}\/\d{2}\/\d{4}
Thu Jun 16 08:29:03 2016     \w{3}\s+\w{3}\s+\d{2}\s\d{2}:\d{2}:\d{2}\s\d{4}
20180228                     \d{8}
2020-10-27 05:10:49.657      \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}
{"date": "2018-01-02"        \{"date": "\d{4}-\d{2}-\d{2}
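
A quick way to check a candidate pattern is to test it against the start of a real log line before adding it to the configuration. The sketch below does this for the pairs listed above, using Python's re module as an approximation of Go's regexp; the helper name starts_new_log is hypothetical.

```python
import re

EXAMPLES = [
    ("14:20:15", r"\d{2}:\d{2}:\d{2}"),
    ("11/10/2014", r"\d{2}\/\d{2}\/\d{4}"),
    ("Thu Jun 16 08:29:03 2016", r"\w{3}\s+\w{3}\s+\d{2}\s\d{2}:\d{2}:\d{2}\s\d{4}"),
    ("20180228", r"\d{8}"),
    ("2020-10-27 05:10:49.657", r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}"),
    ('{"date": "2018-01-02"', r'\{"date": "\d{4}-\d{2}-\d{2}'),
]

def starts_new_log(line, pattern):
    """Return True if the pattern matches at the very beginning of the line."""
    return re.match(pattern, line) is not None

all_match = all(starts_new_log(raw, pattern) for raw, pattern in EXAMPLES)
```

An indented continuation line, such as a stack trace frame, should not match any of these patterns.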

Automatic multi-line aggregation

With Agent 7.37+, auto_multi_line_detection can be enabled, which allows the Agent to detect common multi-line patterns automatically.

Enable auto_multi_line_detection globally in the datadog.yaml file:

logs_config:
  auto_multi_line_detection: true

For containerized deployments, you can enable auto_multi_line_detection with the DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true environment variable.

It can also be enabled or disabled (overriding the global config) per log configuration:

logs:
  - type: file
    path: /my/test/file.log
    service: testApp
    source: java
    auto_multi_line_detection: true

Automatic multi-line detection uses a list of common regular expressions to attempt to match logs. If the built-in list is not sufficient, you can also add custom patterns in the datadog.yaml file:

logs_config:
  auto_multi_line_detection: true
  auto_multi_line_extra_patterns:
   - \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
   - '[A-Za-z_]+ \d+, \d+ \d+:\d+:\d+ (AM|PM)'

In a Docker environment, use the label com.datadoghq.ad.logs on your container to enable auto_multi_line_detection. For example:

 labels:
    com.datadoghq.ad.logs: >-
      [{
        "source": "java",
        "service": "testApp",
        "auto_multi_line_detection": true
      }]      
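
In a Kubernetes environment, use the pod annotation ad.datadoghq.com on your pod to enable auto_multi_line_detection. For example:

apiVersion: apps/v1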
kind: ReplicaSet
metadata:
  name: testapp
spec:
  selector:
    matchLabels:
      app: testApp
  template:
    metadata:
      annotations:
        ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: >-
          [{
            "source": "java",
            "service": "testApp",
            "auto_multi_line_detection": true
          }]          
      labels:
        app: testApp
      name: testapp
    spec:
      containers:
        - name: '<CONTAINER_IDENTIFIER>'
          image: testapp:latest

With this feature enabled, when a new log file is opened the Agent tries to detect a pattern. During this process the logs are sent as single lines. After the detection threshold is met, all future logs for that source are aggregated with the detected pattern, or as single lines if no pattern is found. Detection takes at most 30 seconds or the first 500 logs (whichever comes first).

Note: If you can control the naming pattern of the rotated log, ensure that the rotated file replaces the previously active file with the same name. The Agent reuses a previously detected pattern on the newly rotated file to avoid re-running detection.

Automatic multi-line detection detects logs that begin and comply with the following date/time formats: RFC3339, ANSIC, Unix Date Format, Ruby Date Format, RFC822, RFC822Z, RFC850, RFC1123, RFC1123Z, RFC3339Nano, and default Java logging SimpleFormatter date format.

Commonly used log processing rules

See the dedicated Commonly Used Log Processing Rules FAQ for a list of examples.

Tail directories using wildcards

If your log files are labeled by date or all stored in the same directory, configure your Datadog Agent to monitor them all and automatically detect new ones using wildcards in the path attribute. If you want to exclude some files matching the chosen path, list them in the exclude_paths attribute.

  • Using path: /var/log/myapp/*.log:

    • Matches all .log files contained in the /var/log/myapp/ directory.
    • Doesn’t match /var/log/myapp/myapp.conf.
  • Using path: /var/log/myapp/*/*.log:

    • Matches /var/log/myapp/log/myfile.log.
    • Matches /var/log/myapp/errorLog/myerrorfile.log.
    • Doesn’t match /var/log/myapp/mylogfile.log.
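
The matching behavior in the bullets above can be reproduced with a short sketch. It assumes the usual glob convention in which * does not cross a directory separator, uses Python's pathlib as a stand-in for the Agent's matching, and the function name is_tailed is hypothetical.

```python
from pathlib import PurePosixPath

def is_tailed(path, include, exclude_paths=()):
    """Return True if path matches the include wildcard and no exclude_paths entry."""
    candidate = PurePosixPath(path)
    if not candidate.match(include):
        return False
    return not any(candidate.match(excluded) for excluded in exclude_paths)

flat = "/var/log/myapp/*.log"
nested = "/var/log/myapp/*/*.log"
```

For example, is_tailed("/var/log/myapp/app.log", flat) is true, while the same call with a file one directory deeper is false because each * covers a single path component.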

Configuration example for Linux:

logs:
  - type: file
    path: /var/log/myapp/*.log
    exclude_paths:
      - /var/log/myapp/debug.log
      - /var/log/myapp/trace.log
    service: mywebapp
    source: go

The example above matches /var/log/myapp/myfile.log and excludes /var/log/myapp/debug.log and /var/log/myapp/trace.log.

Configuration example for Windows:

logs:
  - type: file
    path: C:\\MyApp\\*.log
    exclude_paths:
      - C:\\MyApp\\MyLog.*.log
    service: mywebapp
    source: csharp

The example above matches C:\\MyApp\\MyLog.log and excludes C:\\MyApp\\MyLog.20230101.log and C:\\MyApp\\MyLog.20230102.log.

Note: The Agent requires read and execute permissions on a directory to list all the available files in it.

Note: The path and exclude_paths values are case sensitive.

Tail most recently modified files first

When prioritizing files to tail, the Datadog Agent sorts the filenames in the directory path by reverse lexicographic order. To sort files based on file modification time, set the configuration option logs_config.file_wildcard_selection_mode to the value by_modification_time.

This option is helpful when the number of total log file matches exceeds logs_config.open_files_limit. Using by_modification_time ensures that the most recently updated files are tailed first in the defined directory path.

To restore the default behavior, set the configuration option logs_config.file_wildcard_selection_mode to the value by_name.

This feature requires Agent version 7.40.0 or above.
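
The effect of the two selection modes can be sketched with a small example. This is a simplified model based only on the description above: the function files_to_tail and the sample file list are hypothetical, and modification times are plain numbers.

```python
def files_to_tail(files, open_files_limit, mode="by_name"):
    """files is a list of (filename, modification_time) pairs."""
    if mode == "by_modification_time":
        # Most recently modified files first.
        ordered = sorted(files, key=lambda item: item[1], reverse=True)
    else:
        # Default: reverse lexicographic order of the filenames.
        ordered = sorted(files, key=lambda item: item[0], reverse=True)
    return [name for name, _mtime in ordered[:open_files_limit]]

matches = [("app-a.log", 300), ("app-b.log", 100), ("app-c.log", 200)]
by_name = files_to_tail(matches, open_files_limit=2)
by_mtime = files_to_tail(matches, open_files_limit=2, mode="by_modification_time")
```

With a limit of two files, the default mode skips app-a.log even though it was modified most recently, while by_modification_time tails it first.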

Log file encodings

By default, the Datadog Agent assumes that logs use UTF-8 encoding. If your application logs use a different encoding, specify the encoding parameter in the logs configuration setting.

The list below gives the supported encoding values. If you provide an unsupported value, the Agent ignores the value and reads the file as UTF-8.

  • utf-16-le - UTF-16 little-endian (Datadog Agent v6.23/v7.23)
  • utf-16-be - UTF-16 big-endian (Datadog Agent v6.23/v7.23)
  • shift-jis - Shift-JIS (Datadog Agent v6.34/v7.34)

Configuration example:

logs:
  - type: file
    path: /test/log/hello-world.log
    tags: key:value
    service: utf-16-logs
    source: mysql
    encoding: utf-16-be

Note: The encoding parameter is only applicable when the type parameter is set to file.
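
The sketch below illustrates why the encoding parameter matters and how the fallback described above behaves. It is not the Agent's decoder: it relies on Python's built-in codecs, and the function name decode_log is hypothetical.

```python
SUPPORTED = {"utf-16-le", "utf-16-be", "shift-jis"}

def decode_log(raw_bytes, encoding=None):
    """Decode with a supported encoding; otherwise fall back to UTF-8."""
    codec = encoding if encoding in SUPPORTED else "utf-8"
    return raw_bytes.decode(codec, errors="replace")

raw = "connection ok".encode("utf-16-be")
correct = decode_log(raw, "utf-16-be")
garbled = decode_log(raw)  # read as UTF-8: NUL characters are interleaved with the text
```

Reading UTF-16 bytes as UTF-8 does not fail, but the resulting text is interleaved with NUL characters, which is why the matching encoding value should be set.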

Global processing rules

For Datadog Agent v6.10+, the exclude_at_match, include_at_match, and mask_sequences processing rules can be defined globally in the Agent’s main configuration file or through an environment variable:

In the datadog.yaml file:

logs_config:
  processing_rules:
    - type: exclude_at_match
      name: exclude_healthcheck
      pattern: healthcheck
    - type: mask_sequences
      name: mask_user_email
      pattern: \w+@datadoghq.com
      replace_placeholder: "MASKED_EMAIL"

Use the environment variable DD_LOGS_CONFIG_PROCESSING_RULES to configure global processing rules, for example:

DD_LOGS_CONFIG_PROCESSING_RULES='[{"type": "mask_sequences", "name": "mask_user_email", "replace_placeholder": "MASKED_EMAIL", "pattern" : "\\w+@datadoghq.com"}]'

Use the env parameter in the helm chart to set the DD_LOGS_CONFIG_PROCESSING_RULES environment variable to configure global processing rules. For example:

env:
  - name: DD_LOGS_CONFIG_PROCESSING_RULES
    value: '[{"type": "mask_sequences", "name": "mask_user_email", "replace_placeholder": "MASKED_EMAIL", "pattern" : "\\w+@datadoghq.com"}]'

All the logs collected by the Datadog Agent are impacted by the global processing rules.

Note: The Datadog Agent does not start the log collector if there is a format issue in the global processing rules. Run the Agent’s status subcommand to troubleshoot any issues.
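
Because a format issue prevents the log collector from starting, it can be worth checking the value of DD_LOGS_CONFIG_PROCESSING_RULES before deploying it. The sketch below is a hypothetical pre-flight check, not a Datadog tool: it only verifies that the value is a JSON array, that each rule uses one of the three types listed above, and that each pattern compiles in Python's re module, which accepts some constructs that Go's regexp rejects.

```python
import json
import re

GLOBAL_RULE_TYPES = {"exclude_at_match", "include_at_match", "mask_sequences"}

def check_global_rules(value):
    """Return a list of problems found in a global processing rules value."""
    try:
        rules = json.loads(value)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(rules, list):
        return ["expected a JSON array of rules"]
    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {index}: expected a JSON object")
            continue
        if rule.get("type") not in GLOBAL_RULE_TYPES:
            problems.append(f"rule {index}: unsupported type {rule.get('type')!r}")
        pattern = rule.get("pattern")
        if not isinstance(pattern, str) or not pattern:
            problems.append(f"rule {index}: missing pattern")
            continue
        try:
            re.compile(pattern)
        except re.error as err:
            problems.append(f"rule {index}: pattern does not compile: {err}")
    return problems

good = r'[{"type": "mask_sequences", "name": "mask_user_email", "replace_placeholder": "MASKED_EMAIL", "pattern" : "\\w+@datadoghq.com"}]'
```

An empty list means the sketch found nothing to report; the Agent's status subcommand remains the authoritative check.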
