Graphing Primer using JSON

There are two ways to interact with the Graphing Editor: using the GUI (the default method) and writing JSON (the more complete method). This page covers using JSON. To learn more about the GUI editor, visit the main Graphing Primer Page.

Graphing with the JSON editor

Grammar

The graph definition language is well-formed JSON and is structured in four parts:

  1. Requests
  2. Events
  3. Visualization
  4. Y Axis

Here is how they fit together in a JSON dictionary:

{
  "requests": [
    {
      "q": "metric{scope}"
    }
  ],
  "events": [
    {
      "q": "search query"
    }
  ],
  "viz": "visualization type",
  "yaxis": {
    "yaxisoptionkey": "yaxisoptionvalue"
  }
}

In other words, at the highest level the JSON structure is a dictionary with two, three, or four entries:

  1. “requests” *
  2. “events”
  3. “viz” *
  4. “yaxis”

* only requests and viz are required.

Requests

The general format for a series is:

"requests": [
    {
      "q": "function(aggregation method:metric{scope} [by {group}])"
    }
]

The function and group are optional.

A Series can be combined with other Series via binary operators (+, -, /, *):

metric{scope} [by {group}] operator metric{scope} [by {group}]
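For example, a request dividing one metric by another, grouped by host (the metric names here are illustrative):

```json
"requests": [
  {
    "q": "system.net.bytes_sent{*} by {host} / system.net.bytes_rcvd{*} by {host}"
  }
]
```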

Functions

You can apply functions to the result of each query.

A few of these functions have been further explained in a series of examples. Visit this page for more detail: Examples

Function              Category       Description
abs()                 Arithmetic     absolute value
log2()                Arithmetic     base-2 logarithm
log10()               Arithmetic     base-10 logarithm
cumsum()              Arithmetic     cumulative sum over visible time window
integral()            Arithmetic     cumulative sum of ([time delta] x [value delta]) over all consecutive pairs of points in the visible time window
.fill()               Interpolation  choose how to interpolate missing values
hour_before()         Timeshift      metric values from one hour ago
day_before()          Timeshift      metric values from one day ago
week_before()         Timeshift      metric values from one week ago
month_before()        Timeshift      metric values from one month ago
per_second()          Rate           the rate at which the metric changes per second
per_minute()          Rate           per_second() * 60
per_hour()            Rate           per_second() * 3600
dt()                  Rate           time delta between points
diff()                Rate           value delta between points
derivative()          Rate           1st-order derivative; diff() / dt()
ewma_3()              Smoothing      exponentially weighted moving average with a span of 3
ewma_5()              Smoothing      EWMA with a span of 5
ewma_10()             Smoothing      EWMA with a span of 10
ewma_20()             Smoothing      EWMA with a span of 20
median_3()            Smoothing      rolling median with a span of 3
median_5()            Smoothing      rolling median with a span of 5
median_7()            Smoothing      rolling median with a span of 7
median_9()            Smoothing      rolling median with a span of 9
.rollup()             Rollup         override default time aggregation type and time period; see the “Rollup” section below for details
count_nonzero()       Count          count all the non-zero values
count_not_null()      Count          count all the non-null values
top()                 Rank           select the top series responsive to a given query, according to some ranking method; see the “Top functions” section below for more details
top_offset()          Rank           similar to top(), except with an additional offset parameter that controls where in the ordered sequence of series the graphing starts; for example, an offset of 2 starts graphing at the number 3 ranked series, according to the chosen ranking metric
robust_trend()        Regression     fit a robust regression trend line using Huber loss; see the “Robust regression” section below for more details
trend_line()          Regression     fit an ordinary least squares regression line through the metric values
piecewise_constant()  Regression     approximate the metric with a piecewise function composed of constant-valued segments
anomalies()           Algorithms     overlay a gray band showing the expected behavior of a series based on past behavior; see our guide to anomaly detection
outliers()            Algorithms     highlight outlier series; see our guide to outlier detection

.as_count() & .as_rate()

These functions are only intended for metrics submitted as rates or counters via statsd; for other metric types, including gauges submitted via statsd, they have no effect. For more details about how to use .as_count() and .as_rate(), please see our blog post.
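As a sketch (the metric name is illustrative), the function is appended to the end of the query:

```json
"requests": [
  {
    "q": "sum:page.views{*}.as_count()"
  }
]
```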

Rollup

.rollup() is recommended for expert users only. Appending this function to the end of a query allows you to control the number of raw points rolled up into a single point plotted on the graph. The function takes two parameters, method and time: .rollup(method,time)

The method can be sum/min/max/count/avg and time is in seconds. You can use either one individually, or both together like .rollup(sum,120). We impose a limit of 350 points per time range. For example, if you’re requesting .rollup(20) for a month-long window, we will return data at a rollup far greater than 20 seconds in order to prevent returning a gigantic number of points.
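For example, a request forcing a 2-minute sum rollup might look like this (the metric name is illustrative):

```json
"requests": [
  {
    "q": "sum:app.requests{*}.rollup(sum, 120)"
  }
]
```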

Top functions

The top() function takes four parameters:

  • a metric query string with some grouping, e.g. avg:system.cpu.idle{*} by {host}
  • the number of series to be displayed, as an integer.
  • one of 'max', 'min', 'last', 'l2norm', or 'area'. 'area' is the signed area under the curve being graphed, which can be negative. 'l2norm' uses the L2 Norm of the time series, which is always positive, to rank the series.
  • either 'desc' (rank the results in descending order) or 'asc' (ascending order).

The top() method also has convenience functions of the following form, all of which take a single series list as input:

[top, bottom][5, 10, 15, 20]_[mean, min, max, last, area, l2norm]()

For example, bottom10_min() retrieves the 10 lowest-valued series using the ‘min’ ranking method.
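As a sketch, a convenience function is used in a request like any other function (the metric follows the earlier example):

```json
"requests": [
  {
    "q": "bottom10_min(avg:system.cpu.idle{*} by {host})"
  }
]
```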

Robust regression

The most common type of linear regression – ordinary least squares (OLS) – can be heavily influenced by a small number of points with extreme values. Robust regression is an alternative method for fitting a regression line; it is not influenced as strongly by a small number of extreme values. As an example, see the following plot.

The original metric is shown as a solid blue line. The purple dashed line is an OLS regression line, and the yellow dashed line is a robust regression line. The one short-lived spike in the metric leads to the OLS regression line trending upward, but the robust regression line ignores the spike and does a better job fitting the overall trend in the metric.


Aggregation Method

In most cases, the number of data points available outnumbers the maximum number that can be shown on screen. To overcome this, the data is aggregated using one of four available methods: average, max, min, and sum.
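The aggregation method appears as the prefix before the colon in a query. As a sketch with an illustrative metric name, this request plots the maximum rather than the average:

```json
"requests": [
  {
    "q": "max:system.cpu.user{*}"
  }
]
```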

Metrics

The metric is the main focus of the graph. You can find the list of metrics available to you in the Metrics Summary. Click on any metric to see more detail about that metric, including the type of data collected, units, tags, hosts, and more.

Scope

A scope lets you filter a Series. It can be a host, a device on a host or any arbitrary tag you can think of that contains only alphanumeric characters plus colons and underscores ([a-zA-Z0-9:_]+).

Examples of scope (meaning in parentheses):

host:my_host                      (related to a given host)
host:my_host, device:my_device    (related to a given device on a given host)
source:my_source                  (related to a given source)
my_tag                            (related to a tagged group of hosts)
my:tag                            (same)
*                                 (wildcard for everything)

Groups

For any given metric, data may come from a number of hosts. Normally the data is aggregated across all of these hosts into a single value for each time slot. If you wish, you can split it out by any tag. To plot a separate series for each host, use {host} as your group.
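For example, to plot one series per host for a hypothetical CPU metric:

```json
"requests": [
  {
    "q": "avg:system.cpu.user{*} by {host}"
  }
]
```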

Arithmetic

You can apply simple arithmetic to a Series (+, -, * and /). In this example we graph 5-minute load and its double:

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "system.load.5{intake} * 2"
    },
    {
      "q": "system.load.5{intake}"
    }
  ]
}

You can also add, subtract, multiply and divide one Series by another. Beware that Datadog does not enforce unit consistency at this point, so you can divide apples by oranges.

{
    "viz": "timeseries",
    "requests": [
      {
        "q": "metric{apples} / metric{oranges}"
      }
    ]
}

Events

You can overlay any event from Datadog. The general format is:

"events": [
  {
    "q": "search query"
  }
]

For instance, to indicate that you want events for host X and tag Y:

"events": [
  {
    "q": "host:X tags:Y"
  }
]

or if you’re looking to display all errors:

"events": [
  {
    "q": "status:error"
  }
]

Visualization

Data can be visualized in a few different ways:

  1. Time Series
  2. Heatmap
  3. Distribution
  4. Toplist
  5. Change
  6. Hostmap

The Time Series can be further broken down to:

  1. as line charts
  2. as stacked areas
  3. as slice-n-stack areas
  4. as bar charts

Line Charts

The line chart representation is used automatically when you provide multiple entries in requests:

"requests": [
    {
      "q": "metric1{scope}"
    },
    {
      "q": "metric2{scope}"
    },
    {
      "q": "metric3{scope}"
    }
  ]

Stacked Series

In the case of related Time Series, you can easily draw them as stacked areas by using the following syntax:

"requests": [
    {
      "q": "metric1{scope}, metric2{scope}, metric3{scope}"
    }
]

Instead of using one request per series, you concatenate all the queries into a single request.

Slice-n-Stack

A useful visualization is to represent a metric shared across hosts and stack the results. For instance, when selecting a tag that applies to more than 1 host you will see that ingress and egress traffic is nicely stacked to give you the sum as well as the split per host. This is useful to spot wild swings in the distribution of network traffic.

Here’s how to do it for any metric:

"requests" [
  {
     "q": "system.net.bytes_rcvd{some_tag, device:eth0} by {host}"
  }
]

Note that in this case you can only have one query. You can also split by device, or by a combination of both:

"requests" [
  {
     "q": "system.net.bytes_rcvd{some_tag} by {host,device}"
  }
]

to get traffic for all the tagged hosts, split by host and network device.

Y-Axis Controls

The Datadog y-axis controls (currently just via the JSON editor) allow you to:

  • Clip y-axis to specific ranges
  • Filter series either by specifying a percentage or an absolute value
  • Change y-axis scale from linear to log, sqrt or power scale

There are four configuration settings:

  • min (optional): Specifies the minimum value to show on the y-axis. It takes a number, or “auto” for default behavior. Default value is “auto”.
  • max (optional): Specifies the maximum value to show on the y-axis. It takes a number, or “auto” for default behavior. Default value is “auto”.
  • scale (optional): Specifies the scale type. Possible values: “linear”, “log”, “sqrt”, or “pow##” (e.g. pow2, pow0.5; 2 is used if only “pow” is provided). Default scale is “linear”.
  • units (optional): Specifies whether to show the metric unit along the y-axis. Possible values: “true” or “false”. Default is “false”.

Examples:

"yaxis": {
    "min": "auto",
    "max": 200,
    "scale": "log"
}

"yaxis": {
    "min": 200,
    "scale": "sqrt"
}

"yaxis": {
    "min": 9000,
    "max": 10000
}

"yaxis": {
    "scale": "pow0.1"
}

"yaxis": {
    "scale": "pow3"
}

"yaxis": {
    "units": "true"
}

Filtering

Filter configuration allows you to automatically change the y-axis bounds based on a threshold. The threshold can be a percentage or an absolute value, and it can apply to either end of the graph (lower and upper).

For y-axis filtering, there are two ways to set up the configuration.

To begin, there is a simple configuration: specify an absolute value or a percentage, and all values above that value, or within the top ##%, will be cut off.

Examples:

"yaxis": {
    "filter": 30 // all values above 30 will not appear
}

"yaxis": {
    "filter": "5%" // the top 5% of that data will not appear
}

Advanced configuration works the same way as simple configuration, with the added flexibility of configuring the lower or the upper or both parts of the graph. For example, the following configuration will limit the graph to data points that are not in the bottom 10% nor in the top 30%.

"yaxis": {
    "filter": {
        "top": "30%",
        "bottom": "10%"
    }
}

The following will show all data except those with values higher than 15:

"yaxis": {
    "filter": {
        "above": 15
    }
}

The following will hide data points below 2:

"yaxis": {
    "filter": {
        "below": 2
    }
}

Here is a full JSON example:

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "system.cpu.idle{host:hostname}",
      "stacked": false
    }
  ],
  "events": [],
  "yaxis": {
    "scale": "log"
    "filter": {
         "top": "5%",
         "below": 15
     }
  },
}

Examples

Here is an example using the rate() function, which takes only a single metric as a parameter. Other functions, with the exception of top() and top_offset(), have identical syntax.

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "rate(sum:system.load.5{role:intake-backend2} by {host})",
      "stacked": false
    }
  ]
}

Here is an example using the top() function:

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "top(avg:system.cpu.iowait{*} by {host}, 5, 'max', 'desc')",
      "stacked": false
    }
  ]
}

This will show the graphs for the five series with the highest peak system.cpu.iowait values in the query window.

To look at the hosts with the 6th through 10th highest values (for example), use top_offset instead:

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "top_offset(avg:system.cpu.iowait{*} by {host}, 5, 'max', 'desc', 5)",
      "stacked": false
    }
  ]
}

Here is an example using the week_before() function:

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "sum:haproxy.count_per_status{status:available} - week_before(sum:haproxy.count_per_status{status:available})"
    }
  ]
}