Graphing Primer

There are two ways to interact with the graph editor: using the GUI (the default method) and writing JSON (the more complete method). This page covers using the GUI. To learn more about using JSON, visit the JSON Graphing Primer page.

Find the Graph Editor

On each graph you will find a pencil icon that opens the graph editor.

Graphing Overview

The graph editor has three tabs: Share, JSON, and Edit. Share allows you to embed the graph on any external web page. JSON is the more flexible editor, but it requires knowledge of the graph definition language. Edit is the default tab and lets you select the graphing options through a GUI. The newest features are sometimes only available on the JSON tab.

Graphing with the graphical editor interface

When you first open the graph editor window, you will be on the Edit tab. Here you can use the UI to choose most settings to tweak your graphs. Here is an example of what you might see. This example comes from the first graph in the standard Postgres Integration dashboard:

Figure: Graphing Edit tab

Configuring a graph in a dashboard is a multi-step process. The first two steps depend

1) Choose the Metric to graph

When you create a graph, you will probably have a metric in mind that you want to show. You can select that in the first dropdown in the Choose metrics and events section. If you aren’t sure exactly which metric to use, you might want to start with the Metrics Explorer. You can also look in the Metrics Summary.

The Metrics Explorer lets you experiment with different graph settings in a more ad-hoc way. The Metrics Summary lets you learn more about the type of a metric and set its default unit.

2) Select your visualization

Once you have a metric in mind to display in your graph, select your visualization.

Timeseries

The Timeseries visualization is great for showing one or more metrics over time. The time window depends on what is selected on the timeboard or in the graph on a screenboard. Timeseries can be displayed as lines, areas, and bars. Timeseries is available on both timeboards and screenboards.

Figure: Timeseries example

Heatmap

The Heatmap visualization is great for showing metrics aggregated across many tags, such as hosts. The more hosts that have a particular value, the darker that square will be. Heatmap is available on both timeboards and screenboards.

Figure: Heatmap example

Distribution

The Distribution visualization is another way of showing metrics aggregated across many tags, such as hosts. Unlike the Heatmap, Distribution’s x-axis is the quantity rather than time. Distribution is available on both timeboards and screenboards.

Figure: Distribution example

Toplist

The Toplist visualization is perfect when you want to see the list of hosts with the most or least of any metric value, such as the highest consumers of CPU or the hosts with the least disk space. Toplist is available on both timeboards and screenboards.

Figure: Toplist example

Change

The Change graph shows the change in a value over the time period chosen.

Figure: Change graph example

Hostmap

The Hostmap graphs any metric for any subset of hosts, using the same hostmap visualization available from the main Infrastructure Hostmap menu.

Figure: Hostmap example

3) Filter and Aggregate to show what you need

Filter

Now that you have the metric and a visualization in place, you can filter down the hosts to be graphed. To the right of the metric is a dropdown which by default says (everywhere). Click this and choose the tag(s) you want to filter by. To learn more about tags, refer to the Guide to Tagging.

Aggregation Method

Next to the filter dropdown is the aggregation method. This defaults to avg by but can be changed to max by, min by, or sum by. In most cases, the metric will have many values for each time interval, coming from many hosts or instances. The aggregation method chosen determines how the metrics will be aggregated into a single line. So if you are graphing a metric that is from 100 hosts, sum by will add up all of those values and display the sum.
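As a rough illustration of what the four aggregation methods do to a single time interval, here is a minimal Python sketch. The host names and values are made up, and the code only mirrors the description above; it is not how the product computes its graphs.

```python
from statistics import mean

# Hypothetical readings: one value per host for a single time interval.
values_by_host = {"host-a": 10.0, "host-b": 30.0, "host-c": 20.0}

# The four aggregation methods named in the editor, mapped to Python callables.
AGGREGATORS = {"avg": mean, "max": max, "min": min, "sum": sum}

def aggregate(values, method):
    """Collapse many per-host values into one value for the interval."""
    return AGGREGATORS[method](list(values))

for method in AGGREGATORS:
    print(method, aggregate(values_by_host.values(), method))
```

With these three sample hosts, sum yields 60.0 and avg yields 20.0; each method produces one point on a single line.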

Aggregation Groups

After the aggregation method you can determine what constitutes a line or grouping in a graph. If you choose host, then you will have a line (in the case of line graphs) for each host. If you choose role, then there is a line for every role, and each line is made up of metrics from all the hosts in that role, aggregated using the method you chose above.
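The effect of choosing a group can be sketched in a few lines of Python. The sample hosts, roles, and the helper name group_and_aggregate are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples for one time interval: (host, role, value).
samples = [
    ("web-1", "web", 40.0),
    ("web-2", "web", 60.0),
    ("db-1", "db", 10.0),
]

def group_and_aggregate(samples, key_index, aggregator):
    """Bucket samples by one field, then aggregate each bucket to one value."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key_index]].append(sample[2])
    return {key: aggregator(vals) for key, vals in groups.items()}

by_host = group_and_aggregate(samples, 0, mean)  # one line per host
by_role = group_and_aggregate(samples, 1, mean)  # one line per role
print(by_host)
print(by_role)
```

Grouping by host gives three lines here, while grouping by role gives two, with the web line averaging both web hosts.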

4) Rollup to aggregate over time

Regardless of the options chosen above, there will always be some aggregation of data due to the physical size constraints of the window holding the graph. If a metric is updated every second and you are looking at 4 hours of data, you will need 14,400 points to display everything. Each graph we display will have about 300 points shown at any given time.

In the example above, each point displayed on the screen represents 48 data points. In practice, metrics are collected by the agent every 15-20 seconds, so at a 20-second interval one day’s worth of data is 4,320 data points (5,760 at a 15-second interval). You might consider a rollup function that looks at 5 or 10 minutes’ worth of data if you would like more control over how your data is displayed on a graph that shows 1 day.
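The arithmetic behind these numbers can be checked with a short Python sketch. The figure of 300 plotted points is the approximate value quoted above, and the function names are illustrative only.

```python
def raw_points(window_seconds, interval_seconds):
    """Number of raw samples collected over the window."""
    return window_seconds // interval_seconds

def raw_points_per_plotted_point(window_seconds, interval_seconds, plotted_points=300):
    """How many raw samples each plotted point has to represent."""
    return raw_points(window_seconds, interval_seconds) / plotted_points

four_hours = 4 * 3600
one_day = 24 * 3600

print(raw_points(four_hours, 1))                    # 14400
print(raw_points_per_plotted_point(four_hours, 1))  # 48.0
print(raw_points(one_day, 20))                      # 4320
print(raw_points(one_day, 15))                      # 5760
print(raw_points(one_day, 300))                     # 288 buckets with a 5-minute rollup
```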

To use the rollup function, click the plus sign to the right of the aggregation group and choose rollup from the dropdown. Now choose how you want to aggregate the data and the interval in seconds.

To create a single line that represents the average available disk space across all machines, rolled up in 60-second buckets, you would use a query like this:

Figure: rollup example

When switching to the JSON view, the query will look like this:

"q": "avg:system.disk.free{*}.rollup(avg, 60)"

For more about using the JSON view, visit the JSON Graphing Primer page.
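To see how the GUI selections map onto the query string, here is a small Python sketch that assembles one. The build_query helper and its parameter names are hypothetical. The output format follows the two query examples that appear on this page, and the role:web scope in the last call is assumed to use the same tag syntax as the event filter described later.

```python
import json

def build_query(metric, aggregator="avg", scope="*", group_by=None, rollup=None):
    """Assemble a query string from the choices made in the editor.

    aggregator: avg, max, min, or sum (the aggregation method)
    scope:      "*" for everywhere, or a tag filter such as "role:web"
    group_by:   optional list of tags, one line per distinct value
    rollup:     optional (method, seconds) tuple
    """
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    if rollup:
        method, seconds = rollup
        query += f".rollup({method}, {seconds})"
    return query

disk_query = build_query("system.disk.free", rollup=("avg", 60))
print(json.dumps({"q": disk_query}))

cpu_query = build_query("system.cpu.idle", group_by=["host"])
print(cpu_query)

print(build_query("system.cpu.idle", scope="role:web", group_by=["host"]))
```

The first call reproduces the rollup query shown above, and the second reproduces the grouped query used in the Top functions section.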

5) Apply more advanced functions

Depending on your analysis needs, you may choose to apply other mathematical functions to the query. Examples include rates and derivatives, smoothing, and more. The available functions are listed below.

Function | Category | Description
abs() | Arithmetic | absolute value
log2() | Arithmetic | base-2 logarithm
log10() | Arithmetic | base-10 logarithm
cumsum() | Arithmetic | cumulative sum over visible time window
integral() | Arithmetic | cumulative sum of ([time delta] x [value delta]) over all consecutive pairs of points in the visible time window
.fill() | Interpolation | choose how to interpolate missing values
hour_before() | Timeshift | metric values from one hour ago
day_before() | Timeshift | metric values from one day ago
week_before() | Timeshift | metric values from one week ago
month_before() | Timeshift | metric values from one month ago
per_second() | Rate | the rate at which the metric changes per second
per_minute() | Rate | per_second() * 60
per_hour() | Rate | per_second() * 3600
dt() | Rate | time delta between points
diff() | Rate | value delta between points
derivative() | Rate | 1st order derivative; diff() / dt()
ewma_3() | Smoothing | exponentially weighted moving average with a span of 3
ewma_5() | Smoothing | EWMA with a span of 5
ewma_10() | Smoothing | EWMA with a span of 10
ewma_20() | Smoothing | EWMA with a span of 20
median_3() | Smoothing | rolling median with a span of 3
median_5() | Smoothing | rolling median with a span of 5
median_7() | Smoothing | rolling median with a span of 7
median_9() | Smoothing | rolling median with a span of 9
.rollup() | Rollup | override default time aggregation type and time period; see the “Rollup” section below for details
count_nonzero() | Count | count all the non-zero values
count_not_null() | Count | count all the non-null values
top() | Rank | select the top series responsive to a given query, according to some ranking method; see the “Top functions” section below for more details
top_offset() | Rank | similar to top(), except with an additional offset parameter, which controls where in the ordered sequence of series the graphing starts. For example, an offset of 2 would start graphing at the number 3 ranked series, according to the chosen ranking metric.
robust_trend() | Regression | fit a robust regression trend line using Huber loss; see the “Robust regression” section below for more details
trend_line() | Regression | fit an ordinary least squares regression line through the metric values
piecewise_constant() | Regression | approximate the metric with a piecewise function composed of constant-valued segments
anomalies() | Algorithms | overlay a gray band showing the expected behavior of a series based on past behavior; see our guide to anomaly detection
outliers() | Algorithms | highlight outlier series; see our guide to outlier detection
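To make a few of the table entries concrete, the Python sketch below reimplements dt(), diff(), derivative(), and cumsum() on a list of (timestamp, value) pairs, following only the one-line descriptions above. It is an illustration rather than the product’s implementation; in particular, attaching each delta to the later of the two timestamps is an assumption.

```python
def dt(points):
    """Time delta between consecutive points."""
    return [(t1, t1 - t0) for (t0, _), (t1, _) in zip(points, points[1:])]

def diff(points):
    """Value delta between consecutive points."""
    return [(t1, v1 - v0) for (_, v0), (t1, v1) in zip(points, points[1:])]

def derivative(points):
    """First-order derivative: diff() / dt() for each consecutive pair."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]

def cumsum(points):
    """Running total of the values over the visible window."""
    total = 0.0
    out = []
    for t, v in points:
        total += v
        out.append((t, total))
    return out

# Hypothetical series sampled every 10 seconds.
points = [(0, 10.0), (10, 30.0), (20, 60.0)]
print(dt(points))          # [(10, 10), (20, 10)]
print(diff(points))        # [(10, 20.0), (20, 30.0)]
print(derivative(points))  # [(10, 2.0), (20, 3.0)]
print(cumsum(points))      # [(0, 10.0), (10, 40.0), (20, 100.0)]
```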

.as_count() & .as_rate()

These functions are only intended for metrics submitted as rates or counters via statsd. They have no effect on other metric types. For more details on how to use .as_count() and .as_rate(), please see our blog post.

Rollup

.rollup() is recommended for expert users only. Appending this function to the end of a query allows you to control the number of raw points rolled up into a single point plotted on the graph. The function takes two parameters, method and time: .rollup(method,time)

The method can be sum/min/max/count/avg and time is in seconds. You can use either one individually, or both together like .rollup(sum,120). We impose a limit of 350 points per time range. For example, if you’re requesting .rollup(20) for a month-long window, we will return data at a rollup far greater than 20 seconds in order to prevent returning a gigantic number of points.
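The bucketing that .rollup(method,time) describes can be sketched in Python as follows. This is a simplified model, not the product’s implementation: aligning buckets to multiples of the interval is an assumption, and effective_interval only computes the smallest interval that stays within the 350-point limit, since the exact interval the service picks is not stated here.

```python
import math
from collections import defaultdict
from statistics import mean

# The five rollup methods, mapped to Python callables.
METHODS = {"sum": sum, "min": min, "max": max, "count": len, "avg": mean}

def rollup(points, method, seconds):
    """Group (timestamp, value) pairs into fixed-width buckets and aggregate each."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % seconds].append(value)  # bucket start (assumed alignment)
    fn = METHODS[method]
    return [(start, fn(vals)) for start, vals in sorted(buckets.items())]

def effective_interval(window_seconds, requested_seconds, max_points=350):
    """Smallest interval >= the requested one that keeps the window within max_points."""
    return max(requested_seconds, math.ceil(window_seconds / max_points))

# Hypothetical series sampled every 20 seconds.
points = [(0, 1.0), (20, 2.0), (40, 3.0), (60, 4.0), (80, 5.0), (100, 6.0)]
print(rollup(points, "sum", 60))  # [(0, 6.0), (60, 15.0)]
print(rollup(points, "avg", 60))  # [(0, 2.0), (60, 5.0)]

# Requesting 20-second buckets over a 30-day window cannot fit in 350 points.
print(effective_interval(30 * 86400, 20))  # 7406
```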

Top functions

The top() function takes the following parameters:

  • a metric query string with some grouping, e.g. avg:system.cpu.idle{*} by {host}
  • the number of series to be displayed, as an integer.
  • one of 'max', 'min', 'last', 'l2norm', or 'area'. 'area' is the signed area under the curve being graphed, which can be negative. 'l2norm' uses the L2 Norm of the time series, which is always positive, to rank the series.
  • either 'desc' (rank the results in descending order) or 'asc' (ascending order).

The top() method also has convenience functions of the following form, all of which take a single series list as input:

[top, bottom][5, 10, 15, 20]_[mean, min, max, last, area, l2norm]()

For example, bottom10_min() retrieves the 10 lowest-valued series using the ‘min’ metric.
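The ranking logic can be modelled in a few lines of Python. The sketch below is an illustration under stated assumptions, not the product’s code: series are represented as a dict of name to values, 'area' is left out because its exact computation is not specified here, and treating the bottom variants as an ascending sort is an inference from the example above.

```python
import math

# Ranking methods from the parameter list ('area' omitted in this sketch).
RANKERS = {
    "max": max,
    "min": min,
    "last": lambda values: values[-1],
    "l2norm": lambda values: math.sqrt(sum(v * v for v in values)),
}

def rank(series, rank_by, order):
    """Sort series names by the chosen ranking method."""
    ranker = RANKERS[rank_by]
    return sorted(series, key=lambda name: ranker(series[name]),
                  reverse=(order == "desc"))

def top(series, limit, rank_by="max", order="desc"):
    """Return the names of the first `limit` series in ranked order."""
    return rank(series, rank_by, order)[:limit]

def top_offset(series, limit, rank_by, order, offset):
    """Like top(), but skip the first `offset` ranked series."""
    return rank(series, rank_by, order)[offset:offset + limit]

# Hypothetical per-host series.
series = {"host-a": [1, 9, 2], "host-b": [5, 5, 5], "host-c": [0, 1, 8]}
print(top(series, 2, "max", "desc"))            # ['host-a', 'host-c']
print(top(series, 2, "min", "asc"))             # ['host-c', 'host-a']
print(top_offset(series, 1, "max", "desc", 2))  # ['host-b']
```

The last call starts at the series ranked number 3, matching the offset example in the function table.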

Robust regression

The most common type of linear regression – ordinary least squares (OLS) – can be heavily influenced by a small number of points with extreme values. Robust regression is an alternative method for fitting a regression line; it is not influenced as strongly by a small number of extreme values. As an example, see the following plot.

The original metric is shown as a solid blue line. The purple dashed line is an OLS regression line, and the yellow dashed line is a robust regression line. The one short-lived spike in the metric leads to the OLS regression line trending upward, but the robust regression line ignores the spike and does a better job fitting the overall trend in the metric.
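The difference between the two fits can be reproduced with a small Python sketch. It is not the implementation behind robust_trend(): it fits a Huber-loss line by iteratively reweighted least squares, and the tuning constant 1.345, the median-based scale estimate, and the synthetic data are all assumptions chosen for the illustration.

```python
from statistics import median

def weighted_fit(xs, ys, ws):
    """Closed-form weighted least squares for a line; returns (slope, intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def ols_fit(xs, ys):
    """Ordinary least squares is the special case where every weight is 1."""
    return weighted_fit(xs, ys, [1.0] * len(xs))

def huber_fit(xs, ys, delta=1.345, iterations=50):
    """Huber-loss line fit via iteratively reweighted least squares."""
    slope, intercept = ols_fit(xs, ys)
    for _ in range(iterations):
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Robust scale estimate from the median absolute residual.
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break  # most points are fitted exactly; nothing left to refine
        cutoff = delta * scale
        # Points beyond the cutoff are down-weighted instead of counted in full.
        weights = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in residuals]
        slope, intercept = weighted_fit(xs, ys, weights)
    return slope, intercept

# Synthetic metric: a steady trend of 0.5 per step with one short-lived spike.
xs = list(range(20))
ys = [0.5 * x + 10 for x in xs]
ys[18] += 100

ols_slope, _ = ols_fit(xs, ys)
robust_slope, _ = huber_fit(xs, ys)
print(round(ols_slope, 3), round(robust_slope, 3))
```

On this data the single spike pulls the OLS slope well above the underlying 0.5, while the reweighted fit stays close to it.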

6) Set Y-axis scale

By default, the Y-axis for your graph is set to linear with the minimum and maximum automatically set based on the values in the data and including zero. To make changes to the Y-axis, click the button Show Y-Axis Controls. Now you can change the scale from linear to log, pow, or sqrt. Next you can choose the minimum or maximum, and select whether zero should always be shown or not.

7) Overlay events for additional context

You can repeat all the steps above to add more metrics to your graph for context. You can also overlay events from related systems, such as GitHub commits, Jenkins deploys, or Docker creation events. Click the Overlay Events button and enter a query to find and display your events. To show anything from a source such as GitHub, use sources:github. For all the events with the tag role:web, use tag:role:web.

8) Create a title

If you don’t enter a title, we will automatically generate one based on the selections you have made. However, a title that describes the purpose of the graph is usually more useful to the users of the dashboard. Linking the technical purpose to the business benefits adds even more value.

9) Save

The final step is to click Save. You can always return to the editor and tweak the graph further depending on your needs.