Parsing

Overview

If your logs are JSON-formatted, Datadog automatically parses them; for other formats, Datadog allows you to enrich your logs with the help of the Grok Parser.
The Grok syntax provides an easier way to parse logs than pure regular expressions.
The main use of the Grok Parser is to extract attributes from semi-structured text messages.

Grok comes with a lot of reusable patterns to parse integers, IP addresses, hostnames, etc.

Parsing rules can be written with the %{MATCHER:EXTRACT:FILTER} syntax:

  • Matcher: a rule (possibly a reference to another token rule) that describes what to expect (number, word, notSpace, etc.)

  • Extract (optional): an identifier representing the capture destination for the piece of text matched by the MATCHER.

  • Filter (optional): a post-processor of the match to transform it

Example for this classic unstructured log:

john connected on 11/08/2017

With the following parsing rule:

MyParsingRule %{word:user} connected on %{date("MM/dd/yyyy"):connect_date}

You would end up with this structured log:

[Screenshot: Parsing example 1]

Note: If you have multiple parsing rules in a single Grok parser, only one of them can match any given log. The first rule that matches, from top to bottom, is the one that does the parsing.
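
For instance, with the two rules below in a single Grok parser (the rule names and log formats are invented for this sketch), a log that matches the first rule is never processed by the second:

RuleWithId %{word:user} id:%{integer:user.id} connected
RuleSimple %{word:user} connected

Listing the more specific rule first ensures that logs containing an id are parsed by RuleWithId, while the remaining logs fall through to RuleSimple.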

Here is the list of all the matchers and filters natively implemented by Datadog:

Matchers:

| Pattern | Usage |
| --- | --- |
| date("pattern"[, "timezoneId"[, "localeId"]]) | Matches a date with the specified pattern and parses it to produce a Unix timestamp. See the date Matcher examples. |
| regex("pattern") | Matches a regex. Check the regex Matcher examples. |
| data | Matches any string, including spaces and newlines. Equivalent to .*. |
| notSpace | Matches any string until the next space. |
| boolean("truePattern", "falsePattern") | Matches and parses a boolean, optionally defining the true and false patterns (defaults to 'true' and 'false', ignoring case). |
| numberStr | Matches a decimal floating point number and parses it as a string. |
| number | Matches a decimal floating point number and parses it as a double precision number. |
| numberExtStr | Matches a floating point number (with scientific notation support) and parses it as a string. |
| numberExt | Matches a floating point number (with scientific notation support) and parses it as a double precision number. |
| integerStr | Matches a decimal integer number and parses it as a string. |
| integer | Matches a decimal integer number and parses it as an integer number. |
| integerExtStr | Matches an integer number (with scientific notation support) and parses it as a string. |
| integerExt | Matches an integer number (with scientific notation support) and parses it as an integer number. |
| word | Matches alphanumeric words. |
| doubleQuotedString | Matches a double-quoted string. |
| singleQuotedString | Matches a single-quoted string. |
| quotedString | Matches a double-quoted or single-quoted string. |
| uuid | Matches a UUID. |
| mac | Matches a MAC address. |
| ipv4 | Matches an IPv4 address. |
| ipv6 | Matches an IPv6 address. |
| ip | Matches an IP address (v4 or v6). |
| hostname | Matches a hostname. |
| ipOrHost | Matches a hostname or IP address. |
| port | Matches a port number. |
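
As an illustrative sketch combining several matchers in one rule (the log format and attribute names here are invented for this example):

Log:

web-01 10.1.2.3 443 OK

Rule:

MyMatcherRule %{hostname:host} %{ip:network.client.ip} %{port:network.client.port} %{word:status}

This would extract host as web-01, network.client.ip as the IP address, network.client.port as the port, and status as the word OK.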
Filters:

| Pattern | Usage |
| --- | --- |
| number | Parses a match as a double precision number. |
| integer | Parses a match as an integer number. |
| boolean | Parses 'true' and 'false' strings as booleans, ignoring case. |
| date("pattern"[, "timezoneId"[, "localeId"]]) | Parses a date with the specified pattern to produce a Unix timestamp. See the date Filter examples. |
| nullIf("value") | Returns null if the match is equal to the provided value. |
| json | Parses properly formatted JSON. |
| rubyhash | Parses a properly formatted Ruby hash (e.g. {name => "John", "job" => {"company" => "Big Company", "title" => "CTO"}}). |
| geoip | Parses an IP or a host and returns a JSON object that contains the continent, country, city, and location of the IP address. |
| useragent([decodeuricomponent:true/false]) | Parses a user agent and returns a JSON object that contains the device, OS, and browser represented by the agent. Check the User Agent processor. |
| querystring | Extracts all the key-value pairs in a matching URL query string (e.g. ?productId=superproduct&promotionCode=superpromo). |
| decodeuricomponent | Decodes URI components. |
| lowercase | Returns the lower-cased string. |
| uppercase | Returns the upper-cased string. |
| keyvalue([separatorStr[, characterWhiteList[, quotingStr]]]) | Extracts key-value patterns and returns a JSON object. See the key-value Filter examples. |
| scale(factor) | Multiplies the expected numerical value by the provided factor. |
| array([[openCloseStr, ]separator][, subRuleOrFilter]) | Parses a string sequence of tokens and returns it as an array. |
| url | Parses a URL and returns all the tokenized members (domain, query params, port, etc.) in a JSON object. More info on how to parse URLs. |
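
Filters occupy the third position of the %{MATCHER:EXTRACT:FILTER} syntax. Here is a minimal sketch using the scale and nullIf filters (the log format and attribute names are invented for illustration):

Log:

duration=1250 error=-

Rule:

MyFilterRule duration=%{integer:duration.seconds:scale(0.001)} error=%{notSpace:error.code:nullIf("-")}

duration.seconds would be stored as 1.25 (the matched value multiplied by 0.001), and error.code would be set to null because the match equals "-".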

Advanced Settings

At the bottom of your Grok processor tiles, there is an Advanced Settings section:

  • Use the Extract from field to apply your grok processor on a given attribute instead of the default message attribute.

  • Use the Helper Rules field to define tokens for your parsing rules. Helper rules help you factor out Grok patterns across your parsing rules, which is useful when several rules in the same Grok parser use the same tokens.

Example for this classic unstructured log:

john id:12345 connected on 11/08/2017 on server XYZ in production

You could use the following parsing rule:

MyParsingRule %{user} %{connection} %{server}

with the following helpers:

user %{word:user.name} id:%{integer:user.id}
connection connected on %{date("MM/dd/yyyy"):connect_date}
server on server %{notSpace:server.name} in %{notSpace:server.env}
[Screenshot: Helper rules]

Examples

Below are some examples demonstrating how to use parsers:

Key value

This is the key-value core filter: keyvalue([separatorStr[, characterWhiteList[, quotingStr]]]), where:

  • separatorStr: defines the separator between key and value. Default: =
  • characterWhiteList: defines additional non-escaped value characters. Default: \\w.\\-_@
  • quotingStr: defines quotes. The default behavior detects quotes (<>, "", etc.). When defined, the default behavior is replaced by allowing only the specified quoting characters. For example, with <>, only values wrapped in angle brackets are treated as quoted.

Use filters such as keyvalue() to map strings to attributes more easily:

log:

user=john connect_date=11/08/2017 id=123 action=click

Rule

rule %{data::keyvalue}
[Screenshot: Parsing example 2]

You don't need to specify the names of your parameters, as they are already contained in the log. If you add an extract attribute, my_attribute, in your rule pattern (for example, %{data:my_attribute:keyvalue}), you would have:

[Screenshot: Parsing example 2 bis]

If = is not the separator between your keys and values, add a parameter in your parsing rule with the desired separator.

log:

user: john connect_date: 11/08/2017 id: 123 action: click

Rule

rule %{data::keyvalue(": ")}
[Screenshot: Key value parser]

If logs contain special characters in an attribute value, such as / in a URL, add them to the whitelist in the parsing rule:

log:

url=https://app.datadoghq.com/event/stream user=john

Rule:

rule %{data::keyvalue("=","/:")}
[Screenshot: Key value whitelist]

Other examples:

| Raw string | Parsing rule | Result |
| --- | --- | --- |
| key=valueStr | %{data::keyvalue} | {"key": "valueStr"} |
| key=<valueStr> | %{data::keyvalue} | {"key": "valueStr"} |
| key:valueStr | %{data::keyvalue(":")} | {"key": "valueStr"} |
| key:"/valueStr" | %{data::keyvalue(":", "/")} | {"key": "/valueStr"} |
| key:={valueStr} | %{data::keyvalue(":=", "", "{}")} | {"key": "valueStr"} |
| key:=valueStr | %{data::keyvalue(":=", "")} | {"key": "valueStr"} |
| key1:=>val1,key2:=>val2 | %{data::keyvalue(":=>", ",")} | {"key1": "val1", "key2": "val2"} |

Parsing dates

The date matcher transforms your timestamp into the EPOCH format (in milliseconds).

| Raw string | Parsing rule | Result |
| --- | --- | --- |
| 14:20:15 | %{date("HH:mm:ss"):date} | {"date": 51615000} |
| 11/10/2014 | %{date("dd/MM/yyyy"):date} | {"date": 1412978400000} |
| Thu Jun 16 08:29:03 2016 | %{date("EEE MMM dd HH:mm:ss yyyy"):date} | {"date": 1466065743000} |
| Tue Nov 1 08:29:03 2016 | %{date("EEE MMM d HH:mm:ss yyyy"):date} | {"date": 1477988943000} |
| 06/Mar/2013:01:36:30 +0900 | %{date("dd/MMM/yyyy:HH:mm:ss Z"):date} | {"date": 1362501390000} |
| 2016-11-29T16:21:36.431+0000 | %{date("yyyy-MM-dd'T'HH:mm:ss.SSSZ"):date} | {"date": 1480436496431} |
| 2016-11-29T16:21:36.431+00:00 | %{date("yyyy-MM-dd'T'HH:mm:ss.SSSZZ"):date} | {"date": 1480436496431} |
| 06/Feb/2009:12:14:14.655 | %{date("dd/MMM/yyyy:HH:mm:ss.SSS"):date} | {"date": 1233922454655} |
| Thu Jun 16 08:29:03 2016 | %{date("EEE MMM dd HH:mm:ss yyyy","Europe/Paris"):date} | {"date": 1466058543000} |
| 2007-08-31 19:22:22.427 ADT | %{date("yyyy-MM-dd HH:mm:ss.SSS z"):date} | {"date": 1188598942427} |

Note: Parsing a date doesn't set its value as the log's official date. To do that, use the Log Date Remapper in a subsequent processor.

Conditional pattern

You might have logs with two possible formats which differ in only one attribute. These cases can be handled with a single rule, using conditionals with |.

Log:

john connected on 11/08/2017
12345 connected on 11/08/2017

Rule: Note that user.id is extracted as an integer and not a string, thanks to the integer matcher in the rule.

MyParsingRule (%{integer:user.id}|%{word:user.firstname}) connected on %{date("MM/dd/yyyy"):connect_date}

Results:

[Screenshots: Parsing example 4 and Parsing example 4 bis]

Optional attribute

Some logs contain values that only appear part of the time. In those cases, make attribute extraction optional with ()?; the attribute is extracted only when it is present in your log.

Log:

john 1234 connected on 11/08/2017 

Rule:

MyParsingRule %{word:user.firstname} (%{integer:user.id} )?connected on %{date("MM/dd/yyyy"):connect_date}

Note: you usually need to include the space in the optional section; otherwise, when the attribute is absent, the rule would require two consecutive spaces and would no longer match.

[Screenshots: Parsing example 5 and Parsing example 5 bis]

Regex

Use the regex matcher to match any substring of your log message based on literal regex rules.

Log:

john_1a2b3c4 connected on 11/08/2017

Rule: Here, the regex matchers extract the user's first name and ID:

MyParsingRule %{regex("[a-z]*"):user.firstname}_%{regex("[a-zA-Z0-9]*"):user.id} .*
[Screenshot: Parsing example 6]
