Pipelines


A processing Pipeline takes a filtered subset of incoming logs and applies a list of Processors to them sequentially.

Datadog automatically parses JSON-formatted logs. When your logs are not JSON-formatted, Datadog enables you to add value to your raw logs by sending them through a processing pipeline.

With pipelines, you parse and enrich your logs by chaining them sequentially through processors. This lets you extract meaningful information or attributes from semi-structured text to reuse them as facets.

Each incoming log is tested against every Pipeline filter. When it matches a Pipeline's filter, all of that Pipeline's Processors are applied sequentially before the log moves on to the next Pipeline.

For instance, a single processing Pipeline can transform a raw, unstructured log line into a structured log with parsed attributes that you can filter and facet on.
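
As an illustration only (this raw line and the attribute names are hypothetical examples, not a fixed Datadog schema), a raw log such as:

2017-11-08 14:02:31 INFO john connected from 192.168.1.1

could be parsed by a Pipeline into a structured event along these lines:

{
  "timestamp": "2017-11-08T14:02:31Z",
  "status": "info",
  "user.name": "john",
  "network.client.ip": "192.168.1.1",
  "message": "john connected from 192.168.1.1"
}

The exact attributes depend on the Processors you configure, for example a Grok Parser followed by remappers.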

Pipelines take logs from a wide variety of formats and translate them into a common format in Datadog.

For instance, a first Pipeline can be defined to extract the application log prefix, and each team is then free to define its own Pipeline to process the rest of the log message.

Pipeline filters

Filters let you limit what kinds of logs a Pipeline applies to.

The filter syntax is the same as that of the search bar.

Be aware that Pipeline filtering is applied before any of the Pipeline's Processors run, so you cannot filter on an attribute that is extracted by the Pipeline itself.
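
For example, a Pipeline with the (hypothetical) filter source:nginx service:web-store only processes logs coming from the nginx source for that service; logs that do not match this query are left untouched by this Pipeline and continue on to the next ones.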

The logstream shows which logs your Pipeline applies to.

Nested Pipelines

Nested Pipelines are Pipelines within a Pipeline. Use Nested Pipelines to split the processing into two steps: for example, apply a high-level filter first (such as by team), then a second level of filtering based on the integration, service, or any other tag or attribute.
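
For example (the names here are hypothetical), a parent Pipeline filtered on team:payments could contain one Nested Pipeline filtered on source:nginx and another filtered on service:payments-api, each with its own Processors.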

A Pipeline can contain Nested Pipelines and Processors, whereas a Nested Pipeline can only contain Processors.

It is possible to drag and drop a Pipeline into another Pipeline to transform it into a Nested Pipeline.

Special Pipelines

Reserved attribute Pipeline

Datadog has a list of reserved attributes, such as timestamp, status, host, service, and the log message; these attributes have a specific behavior within Datadog. If your JSON logs use different names for these attributes, use the reserved attribute Pipeline to remap your log attributes to the reserved ones.

For example, consider a service that generates the following log:

{
  "myhost": "host123",
  "myapp": "test-web-2",
  "logger_severity": "Error",
  "log": "cannot establish connection with /api/v1/test",
  "status_code": 500
}

In the reserved attribute Pipeline, you can change the default mapping so that, for instance, myhost is remapped to host, myapp to service, logger_severity to status, and log to the official log message.
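
With that mapping, the log above would then be interpreted along these lines (shown as an illustrative sketch; the exact result depends on your configuration):

{
  "host": "host123",
  "service": "test-web-2",
  "status": "Error",
  "message": "cannot establish connection with /api/v1/test",
  "status_code": 500
}

The remapped attributes then drive the standard host, service, status, and message behavior across Datadog.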

If you want to remap an attribute to one of the reserved attributes in a custom Pipeline, use the Log Status Remapper or the Log Date Remapper.

Integration Pipelines

Datadog’s integration processing Pipelines are available for certain sources when they are set up to collect logs. These Pipelines are read-only and parse your logs in ways appropriate for the particular source. To edit an integration Pipeline, clone it and then edit the clone.

Integration Pipeline Library

To see the full list of Integration Pipelines that Datadog offers, browse the Integration Pipeline Library. The Pipeline Library shows how Datadog processes different log formats by default.


To use an Integration Pipeline, Datadog recommends installing the integration by configuring the corresponding log source. Once Datadog receives the first log with this source, the installation is automatically triggered and the Integration Pipeline is added to the processing pipelines list. To configure the log source, refer to the corresponding Integration documentation.
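
For illustration, when submitting logs through the HTTP intake directly, the source is carried by the ddsource attribute of the log payload; the hostname, service, and message values below are made-up examples:

{
  "ddsource": "nginx",
  "hostname": "i-0123456789abcdef0",
  "service": "web-frontend",
  "message": "172.17.0.1 - - [08/Nov/2017:14:02:31 +0000] \"GET /healthz HTTP/1.1\" 200 15"
}

Once a log with ddsource set to nginx is received, the nginx Integration Pipeline is installed and starts processing matching logs.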

It’s also possible to copy an integration pipeline using the copy button.


Pipeline limitations

To make sure the Log Management solution functions in an optimal way, the following technical limits and rules apply to your log events, as well as to some product features. They are designed so that you should never reach them.

Limits applied to ingested log events

  • For optimal use of the platform, we recommend that the size of a log event not exceed 25 KB. When using the Datadog Agent, log events larger than 256 KB are split into several entries. When using the Datadog TCP or HTTP API directly, log events up to 1 MB are accepted.
  • Log events can be submitted up to 18h in the past and 2h in the future.
  • A log event, once converted to JSON format, should contain fewer than 256 attributes. Each attribute key should be less than 50 characters long, be nested in fewer than 10 successive levels, and its value should be less than 1024 characters if promoted as a facet.
  • A log event should not have more than 100 tags and each tag should not exceed 256 characters for a maximum of 10 million unique tags per day.

Log events that do not comply with these limits might be transformed or truncated by the system, or not indexed at all if they fall outside of the provided time range. However, Datadog always does its best to preserve as much of the provided user data as possible.

Limits applied to provided features

  • The maximum number of facets is 1000.
  • We recommend using at most 20 Processors per Pipeline.
  • We recommend using at most 10 parsing rules within a grok Processor. We reserve the right to disable underperforming parsing rules, processors, or pipelines that might impact Datadog’s service performance.

Contact support if you reach one of these limits, as Datadog might be able to provide you with more.

Further Reading