Automatic Multi-line Detection and Aggregation
Overview
Automatic multi-line detection allows the Agent to detect and aggregate common multi-line logs automatically.
Getting started
To enable automatic multi-line detection in your Agent configuration, set auto_multi_line_detection to true in your configuration file, or set the DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true environment variable:
logs_config:
  auto_multi_line_detection: true

DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION=true
Default settings
By default, the following features are enabled:
enable_datetime_detection: Configures automatic datetime aggregation. A line that begins with a datetime format is treated as the start of a new log, and the lines that follow are aggregated into it.
enable_json_detection: Configures JSON detection and rejection. JSON-structured logs are never aggregated.
You can disable these features by setting the following to false in your configuration file or with environment variables:
logs_config:
  auto_multi_line:
    enable_datetime_detection: false
    enable_json_detection: false

DD_LOGS_CONFIG_AUTO_MULTI_LINE_ENABLE_DATETIME_DETECTION=false
DD_LOGS_CONFIG_AUTO_MULTI_LINE_ENABLE_JSON_DETECTION=false
Enable multi-line aggregation per integration
You can enable or disable multi-line aggregation for a specific integration’s log collection:
logs:
  - type: file
    path: /my/test/file.log
    service: testApp
    source: java
    auto_multi_line_detection: false
Auto multi-line detection uses an algorithm to detect any datetime format that occurs in the first 60 bytes of a log line. To prevent false positives, the algorithm requires enough context to consider a datetime format a match.
Your datetime format must include both a date and time component to be detected.
Examples of valid formats that include enough context to be detected:
2021-03-28 13:45:30
2023-03-28T14:33:53.743350Z
Jun 14 15:16:01
2024/05/16 19:46:15
Examples of formats that do not have enough context to be detected:
12:30:2017
12:30:20
2024/05/16
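The examples above can be checked with a small sketch. This is not the Agent's detection algorithm, which is not described on this page. It is a simplified regex stand-in that only illustrates the two stated rules: the match must occur within the first 60 bytes, and it must contain both a date and a time component. The names MAX_BYTES, DATETIME, and looks_like_datetime are hypothetical.

```python
import re

# Assumption: 60 bytes, taken from the text above.
MAX_BYTES = 60

# Simplified stand-in patterns (NOT the Agent's real detector).
DATE = (
    r"(?:\d{4}[-/]\d{2}[-/]\d{2}"
    r"|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2})"
)
TIME = r"\d{2}:\d{2}:\d{2}(?!\d)"
DATETIME = re.compile(DATE + r"[T ]" + TIME)


def looks_like_datetime(line: str) -> bool:
    """Return True if a date followed by a time appears in the first 60 bytes."""
    head = line.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")
    return DATETIME.search(head) is not None


valid = [
    "2021-03-28 13:45:30",
    "2023-03-28T14:33:53.743350Z",
    "Jun 14 15:16:01",
    "2024/05/16 19:46:15",
]
not_enough_context = ["12:30:2017", "12:30:20", "2024/05/16"]

for sample in valid + not_enough_context:
    print(f"{sample!r}: {looks_like_datetime(sample)}")
```

A timestamp that starts after byte 60 is not seen by this sketch, which mirrors the 60-byte limit described above.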
Custom pattern configuration
If datetime aggregation is insufficient or your format is too short to be detected automatically, you can customize the feature in two ways:
Custom samples
A custom sample is a sample of a log on which you want to aggregate. For example, if you want to aggregate a stack trace, the first line of the stack trace would be a good sample to provide. Custom samples are an easier way to aggregate logs than regex patterns.
To configure custom samples, use logs_config in your datadog.yaml file or set an environment variable. In the following example, multi-line detection looks for the sample "SEVERE Main main Exception occurred":
logs_config:
  auto_multi_line_detection_custom_samples:
    - sample: "SEVERE Main main Exception occurred"

DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION_CUSTOM_SAMPLES='[{"sample": "SEVERE Main main Exception occurred"}]'
This aggregates logs whose first line matches "SEVERE Main main Exception occurred". For example:
SEVERE Main main Exception occurred
java.lang.Exception: Something bad happened!
    at Main.funcd(Main.java:50)
    at Main.funcc(Main.java:49)
    at Main.funcb(Main.java:48)
    at Main.funca(Main.java:47)
    at Main.main(Main.java:29)
How custom samples work
The Agent tokenizes the first 60 bytes of each log line and tokenizes the provided sample in the same way.
Tokens include:
- words and their length
- whitespace
- numbers and their length
- special characters
- datetime components.
Each log token is compared to each token in the sample. If 75% of the log’s tokens match the sample’s, the log is marked for aggregation.
Datadog recommends using sample-based matching if your logs have a stable format. If you need more flexible matching, you can use regex.
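The tokenize-and-compare steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Agent's implementation: datetime tokens are omitted, tokens are compared position by position, and the match ratio is computed against the number of sample tokens. The names tokenize, matches_sample, MAX_BYTES, and THRESHOLD are hypothetical.

```python
import re

MAX_BYTES = 60     # bytes of each line that are tokenized (from the text above)
THRESHOLD = 0.75   # fraction of tokens that must match (from the text above)

TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z]+)|(?P<number>\d+)|(?P<space>\s+)|(?P<special>.)",
    re.DOTALL,
)


def tokenize(text: str) -> list:
    """Turn the first MAX_BYTES of text into coarse tokens.

    Words and numbers keep only their kind and length, whitespace collapses
    to one token, and any other character is kept as-is.
    """
    head = text.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")
    tokens = []
    for match in TOKEN_RE.finditer(head):
        kind, piece = match.lastgroup, match.group()
        if kind in ("word", "number"):
            tokens.append((kind, len(piece)))
        elif kind == "space":
            tokens.append(("space",))
        else:
            tokens.append(("special", piece))
    return tokens


def matches_sample(line: str, sample: str) -> bool:
    """Assumption: positional comparison, ratio relative to the sample length."""
    log_tokens, sample_tokens = tokenize(line), tokenize(sample)
    if not sample_tokens:
        return False
    hits = sum(1 for a, b in zip(log_tokens, sample_tokens) if a == b)
    return hits / len(sample_tokens) >= THRESHOLD


SAMPLE = "SEVERE Main main Exception occurred"
print(matches_sample("SEVERE Main main Exception occurred", SAMPLE))
print(matches_sample("java.lang.Exception: Something bad happened!", SAMPLE))
```

Because words are reduced to their length in this sketch, a line such as "SEVERE Main test Exception occurred" also matches the sample. That is the kind of tolerance a token-based comparison offers over an exact string match.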
Regex patterns
Regex patterns work similarly to a multi_line rule. If the regex pattern matches the log, it is used for aggregation.
To configure custom regex patterns, use logs_config in your datadog.yaml file or set an environment variable:
logs_config:
  auto_multi_line_detection_custom_samples:
    - regex: "\\[\\w+\\] Main main Exception occurred"

DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION_CUSTOM_SAMPLES='[{"regex": "\\[\\w+\\] Main main Exception occurred"}]'
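Before deploying a pattern, it can help to try it against a few sample lines. The sketch below does this with Python's re module. Two assumptions apply: the Agent may use a different regex engine whose syntax differs from Python's in places (this pattern uses only widely supported syntax), and the pattern is assumed to be tried at the start of the line. The function name starts_new_log is hypothetical.

```python
import re

# After YAML or JSON unescaping, "\\[\\w+\\] ..." becomes \[\w+\] ...
# A Python raw string expresses that unescaped pattern directly.
PATTERN = re.compile(r"\[\w+\] Main main Exception occurred")


def starts_new_log(line: str) -> bool:
    """Assumption: the pattern is tested at the beginning of the line."""
    return PATTERN.match(line) is not None


for line in [
    "[ERROR] Main main Exception occurred",
    "[WARN] Main main Exception occurred",
    "java.lang.Exception: Something bad happened!",
]:
    print(f"{line!r}: {starts_new_log(line)}")
```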
You can mix samples and regex patterns to support multiple log formats:
logs_config:
  auto_multi_line_detection_custom_samples:
    - sample: "CORE | INFO | (pkg/logs/"
    - regex: "\\d{4}dog.\\s\\w+"
    - sample: "[ERR] Exception"
      label: no_aggregate

DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION_CUSTOM_SAMPLES='[
  {"sample": "CORE | INFO | (pkg/logs/"},
  {"regex": "\\d{4}dog.\\s\\w+"},
  {"sample": "[ERR] Exception", "label": "no_aggregate"}
]'
Note: Existing auto_multi_line_extra_patterns configurations are automatically supported when migrating from V1.
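The environment variable form of the mixed configuration is a JSON array, so regex backslashes must be doubled by hand. One way to avoid escaping mistakes is to generate the value from a script. The sketch below is only a convenience, not part of the Agent, and the helper name build_custom_samples_export is hypothetical.

```python
import json
import shlex

ENV_NAME = "DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION_CUSTOM_SAMPLES"

# Raw strings hold the regex exactly as the regex engine should see it.
samples = [
    {"sample": "CORE | INFO | (pkg/logs/"},
    {"regex": r"\d{4}dog.\s\w+"},
    {"sample": "[ERR] Exception", "label": "no_aggregate"},
]


def build_custom_samples_export(entries: list) -> str:
    """Return a shell export line; json.dumps doubles the backslashes."""
    value = json.dumps(entries)
    return f"export {ENV_NAME}={shlex.quote(value)}"


print(build_custom_samples_export(samples))
```

Here json.dumps handles the JSON escaping and shlex.quote handles the shell quoting, so the regex is written only once in its unescaped form.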
Advanced customization
Auto multi-line detection uses a labeled aggregation system to aggregate logs. The detection step assigns a label to each log, and the aggregation step aggregates logs based on those labels.
Labels
start_group
- Defines the beginning of a multi-line log
- Flushes any buffered multi-line log, if present
- Starts a new multi-line log
- Only one multi-line log can be buffered at a time
aggregate
- Is added to the existing multi-line log
- If no multi-line log exists, flushes immediately
- Default label when nothing else matches
no_aggregate
- Declares logs that are never part of aggregation
- Flushes the buffered multi-line log, if present
- Flushes the sample immediately
- Used for JSON logs
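The label rules above can be read as a small state machine with a single buffer. The following sketch works under assumptions: it takes lines that are already labeled, joins aggregated lines with a newline (the separator the Agent uses is not stated here), and ignores buffer size limits and truncation. The name aggregate_labeled is hypothetical.

```python
def aggregate_labeled(labeled_lines):
    """Apply the start_group / aggregate / no_aggregate rules to labeled lines.

    labeled_lines is an iterable of (label, line) pairs. Returns the list of
    flushed logs in order.
    """
    flushed = []
    buffer = []  # only one multi-line log is buffered at a time

    def flush_buffer():
        if buffer:
            flushed.append("\n".join(buffer))  # assumption: newline separator
            buffer.clear()

    for label, line in labeled_lines:
        if label == "start_group":
            flush_buffer()          # flush any buffered log
            buffer.append(line)     # start a new multi-line log
        elif label == "aggregate":
            if buffer:
                buffer.append(line)   # add to the existing multi-line log
            else:
                flushed.append(line)  # nothing buffered: flush immediately
        elif label == "no_aggregate":
            flush_buffer()          # flush buffered log, then this line alone
            flushed.append(line)
        else:
            raise ValueError(f"unknown label: {label}")

    flush_buffer()  # emit whatever is still buffered at end of input
    return flushed


example = [
    ("start_group", "SEVERE Main main Exception occurred"),
    ("aggregate", "java.lang.Exception: Something bad happened!"),
    ("aggregate", "    at Main.main(Main.java:29)"),
    ("no_aggregate", '{"msg": "ok"}'),
    ("aggregate", "orphan line"),
    ("start_group", "2024/05/16 19:46:15 next log"),
]
for log in aggregate_labeled(example):
    print(repr(log))
```

In this example the stack trace is emitted as one log, the JSON line and the orphan line are each emitted on their own, and the final datetime line starts a new group that is flushed at the end of input.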
Label configuration
You can provide custom labels to each regex or sample to change the aggregation behavior based on the label rules. This is useful if you want to explicitly include or exclude certain log formats in multi-line aggregation.
logs_config:
  auto_multi_line_detection_custom_samples:
    # Never aggregate these formats
    - sample: "some service we should not aggregate"
      label: no_aggregate
    - regex: \w*\s(data|dog)
      label: no_aggregate

DD_LOGS_CONFIG_AUTO_MULTI_LINE_DETECTION_CUSTOM_SAMPLES='[
  {"sample": "some service we should not aggregate", "label": "no_aggregate"},
  {"regex": "\\w*\\s(data|dog)", "label": "no_aggregate"}
]'
Monitoring and debugging
You can search for multiline logs or truncated logs by enabling the following settings:
logs_config:
  tag_multi_line_logs: true
  tag_truncated_logs: true
These settings add the following tags to your logs, allowing you to search for them in the Log Explorer:
multiline: Shows the aggregation source (for example, auto_multiline, multiline_regex)
truncated: Shows the truncation source (for example, single_line, multi_line)
Note: The Agent truncates logs that are too long to process. If a line is too long before multi-line aggregation, the Agent assigns it the single_line tag. If an incorrect pattern causes a log to overflow the aggregation buffer, the Agent applies the multi_line tag.
Configuration reference
| Setting | Type | Default | Description |
|---|---|---|---|
| logs_config.auto_multi_line_detection_custom_samples | Object | Empty | Custom samples/regex patterns |
| logs_config.auto_multi_line.enable_json_detection | Bool | True | Enable JSON detection & rejection |
| logs_config.auto_multi_line.enable_datetime_detection | Bool | True | Enable datetime detection |
| logs_config.auto_multi_line.timestamp_detector_match_threshold | Float | 0.5 | Timestamp matching threshold |
| logs_config.auto_multi_line.tokenizer_max_input_bytes | Int | 60 | Bytes to tokenize |