Create a Dashboard to track and correlate APM metrics

4 minutes to complete

Datadog APM lets you build dashboards around your business priorities and the metrics that matter most to you. On these dashboards you can add widgets that track traditional infrastructure metrics, logs, and custom metrics such as host memory usage alongside critical APM metrics (throughput, latency, and error rate) so that you can correlate them. You can also track the latency experienced by your top customers or on your largest transactions, and monitor the throughput of your main web server ahead of major events such as Black Friday.

This guide walks you through adding trace metrics to a dashboard, correlating them with infrastructure metrics, and exporting an App Analytics query. It covers three ways to add widgets to a dashboard:

  • Copying an existing APM graph (steps 1 to 3)
  • Creating it manually (steps 4 and 5)
  • Exporting an App Analytics query (step 7)

  1. Open the Service List page and choose the web-store service.

  2. Find the Total Requests graph, click the export button in its top-right corner, and choose Export to Dashboard. Then click New Timeboard.

  3. Click on View Dashboard in the success message.

    The Hit/error count on service graph for the web-store service is now available in the new dashboard. It shows the entire throughput of this service as well as its total number of errors.

    Note: Click the pencil icon to edit this graph and see exactly which metrics it uses.

  4. Click the Add graph placeholder tile on the dashboard, then drag a Timeseries widget onto it.

    This is the dashboard widget edit screen. From here you can create any type of visualization using any of the metrics available to you. See the Timeseries widget documentation to learn more.

  5. Click the system.cpu.user box and choose the metric and parameters relevant to you. This example uses:

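    | Parameter | Value                      | Description                                                                                              |
    |-----------|----------------------------|----------------------------------------------------------------------------------------------------------|
    | metric    | trace.rack.requests.errors | The total set of erroneous requests handled by Ruby Rack.                                                |
    | from      | service:web-store          | The main service in this example stack. It is a Ruby service, and all the data in the chart comes from it. |
    | sum by    | http.status_code           | Breaks the chart down by HTTP status code.                                                               |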

    This breakdown is just one of many you can choose. Note that any metric whose name starts with trace. contains APM information. See the APM metric documentation to learn more.
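Each widget configured this way is backed by a metric query string. As a minimal sketch, assuming Datadog's metric query syntax of the form `aggregation:metric{scope} by {tag}`, the hypothetical helper `build_metric_query` below assembles the query for this widget from the three parameters in the table. The second helper, `is_trace_metric` (also hypothetical), encodes the rule above that metrics prefixed with `trace.` carry APM data.

```python
def build_metric_query(aggregation, metric, scope, group_by=None):
    """Return a Datadog-style metric query string (hypothetical helper).

    Assumes the `aggregation:metric{scope} by {tag1,tag2}` query syntax.
    """
    # Doubled braces in the f-string produce literal "{" and "}"
    query = f"{aggregation}:{metric}{{{scope}}}"
    if group_by:
        # The "sum by" parameter becomes a `by {...}` clause
        query += " by {" + ",".join(group_by) + "}"
    return query


def is_trace_metric(metric):
    """Per this guide, metric names starting with "trace." contain APM data."""
    return metric.startswith("trace.")


errors_by_status = build_metric_query(
    "sum", "trace.rack.requests.errors", "service:web-store", ["http.status_code"]
)
print(errors_by_status)
print(is_trace_metric("trace.rack.requests.errors"), is_trace_metric("system.cpu.user"))
```

Keeping the scope and grouping as separate arguments makes it easy to reuse the same service filter across several widgets.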

  6. Drag another Timeseries widget onto the placeholder tile.

    In this example, two different types of metrics are added to one graph: a trace.* metric and a runtime.* metric. Combined, these metrics let you correlate request information with code runtime performance. Specifically, the latency of the service is displayed next to its thread count, since latency spikes might be associated with an increase in the thread count:

    1. First, add the trace.rack.request.duration.by.service.99p metric to the widget:

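      | Parameter | Value                                      | Description                                                                                              |
      |-----------|--------------------------------------------|----------------------------------------------------------------------------------------------------------|
      | metric    | trace.rack.request.duration.by.service.99p | The 99th-percentile latency of requests in the service.                                                  |
      | from      | service:web-store                          | The main service in this example stack. It is a Ruby service, and all the data in the chart comes from it. |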
    2. Then click Graph additional: Metrics to add a second metric to the chart:

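      | Parameter | Value                     | Description                                                                                              |
      |-----------|---------------------------|----------------------------------------------------------------------------------------------------------|
      | metric    | runtime.ruby.thread_count | Thread count taken from the Ruby runtime metrics.                                                        |
      | from      | service:web-store         | The main service in this example stack. It is a Ruby service, and all the data in the chart comes from it. |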

    This setup shows whether a spike in latency coincides with a spike in the Ruby thread count, which can point to the cause of the latency and speed up resolution.
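The dashboard lets you compare the two lines by eye. To make "associated with" concrete, the sketch below computes the Pearson correlation coefficient of the two series. The sample values are hypothetical, assumed to be taken at the same timestamps from the p99 latency and thread count metrics; this is an illustration of the reasoning, not something the dashboard computes for you.

```python
from math import sqrt

# Hypothetical samples, aligned by timestamp, from the two series on the widget
p99_latency_ms = [110, 115, 112, 240, 260, 118]
thread_count = [16, 16, 17, 31, 33, 17]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(p99_latency_ms, thread_count)
# A value near 1 means the spikes line up; it does not by itself prove causation
print(f"correlation: {r:.2f}")
```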

  7. Go to App Analytics.

    This example queries latency across the example application, breaks it down by merchant on the platform, and shows the 10 merchants with the highest latency. From the App Analytics screen, export the graph to the dashboard and view it there.
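App Analytics performs this grouping and ranking for you. Purely to illustrate the computation behind a "top 10 merchants by latency" view, the sketch below groups hypothetical span durations by a hypothetical `merchant` tag, averages them, and keeps the slowest `n`. Using the mean as the latency measure is an assumption; App Analytics may be configured with other measures such as percentiles.

```python
import heapq
from collections import defaultdict

# Hypothetical span records: (merchant tag value, request duration in ms)
spans = [
    ("merchant-a", 120.0), ("merchant-b", 340.0), ("merchant-a", 180.0),
    ("merchant-c", 95.0), ("merchant-b", 410.0), ("merchant-d", 60.0),
]


def top_merchants_by_latency(spans, n=10):
    """Group durations by merchant and return the n slowest by mean latency."""
    durations = defaultdict(list)
    for merchant, duration_ms in spans:
        durations[merchant].append(duration_ms)
    means = {m: sum(d) / len(d) for m, d in durations.items()}
    # nlargest returns (merchant, mean) pairs ordered from slowest to fastest
    return heapq.nlargest(n, means.items(), key=lambda item: item[1])


for merchant, mean_ms in top_merchants_by_latency(spans):
    print(f"{merchant}: {mean_ms:.1f} ms")
```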

  8. Return to your dashboard.

    Your dashboard now contains multiple widgets that provide deep observability into the example application from both a technical and a business perspective. This is only the start: you can add infrastructure metrics, use other visualization types, and add calculations and projections.

    With the dashboard you can also explore related events.

  9. Click the Search Events or Logs button and search for a relevant event stream. Note: this example uses Ansible; your event stream might be different.

    Here, alongside your dashboard, you can see recent events that happened in Datadog or in external services such as Ansible or Chef, for example deployments, task completions, or monitor alerts. You can then correlate these events with the metrics on the dashboard.
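On the dashboard, this correlation is visual: events appear as markers on the graphs. As a minimal sketch of the same idea, the code below flags events that occurred shortly before a latency spike. The event titles, timestamps, and the 10-minute window are all hypothetical; the window is an arbitrary choice, not a Datadog default.

```python
from datetime import datetime, timedelta

# Hypothetical events from the event stream and times at which latency spiked
events = [
    ("ansible: deploy web-store v42", datetime(2024, 1, 1, 12, 0)),
    ("monitor: disk usage warning", datetime(2024, 1, 1, 9, 30)),
]
spikes = [datetime(2024, 1, 1, 12, 4)]


def events_before_spikes(events, spikes, window=timedelta(minutes=10)):
    """Return titles of events that occurred within `window` before any spike."""
    matches = []
    for title, ts in events:
        if any(timedelta(0) <= spike - ts <= window for spike in spikes):
            matches.append(title)
    return matches


print(events_before_spikes(events, spikes))
```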

    Finally, make use of template variables. A template variable is a set of values that dynamically controls the widgets on a dashboard, so any user can filter them without editing the widgets themselves.

  10. Click on Add Template Variables in the control panel. Click Add Variable +, name the template variable and choose the tag that the variable will control.

    In this example, a Region template variable is added to see how the dashboard behaves across us-east1 and europe-west-4, our two primary areas of operation.

    You can now add this template variable to each of the graphs.

    When you change the value in the control panel, every widget on the dashboard updates.
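Conceptually, a widget that uses the template variable references it as `$region` inside its query scope, and the dashboard substitutes the selected value. The sketch below imitates that substitution. The variable object's `name`, `prefix`, and `default` fields are assumed to follow Datadog's v1 Dashboards API template variable shape, the `region` tag name is hypothetical, and `resolve_template_variable` is a hypothetical helper, not a Datadog function.

```python
# Hypothetical "region" template variable filtering on a "region" tag
template_variable = {"name": "region", "prefix": "region", "default": "*"}

# A widget query that references the variable as "$region" in its scope
query = "avg:trace.rack.request.duration.by.service.99p{service:web-store,$region}"


def resolve_template_variable(query, variable, value):
    """Replace $<name> in a query with <prefix>:<value> (illustrative only)."""
    return query.replace("$" + variable["name"], f"{variable['prefix']}:{value}")


for region in ("us-east1", "europe-west-4"):
    print(resolve_template_variable(query, template_variable, region))
```

Because every widget shares the same placeholder, changing one value in the control panel re-scopes all of them at once.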

    Be sure to explore all the metrics available to you and take full advantage of the three pillars of observability in Datadog: metrics, traces, and logs. With them, you can turn this basic dashboard into a one-stop shop for monitoring and observability in your organization.
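Everything built through the UI in the steps above can also be described as data. The sketch below assembles a dashboard definition containing the two manual widgets and the region template variable, assuming the JSON schema of Datadog's v1 Dashboards API (`title`, `layout_type`, `widgets` with a `definition` each, and `template_variables`). The title is hypothetical, the `avg` aggregations are assumptions, and the field names should be checked against the API reference before use.

```python
import json

SCOPE = "service:web-store,$region"  # service filter plus the template variable

# Minimal dashboard definition, assuming the v1 Dashboards API schema
dashboard = {
    "title": "web-store APM overview",  # hypothetical title
    "layout_type": "ordered",
    "template_variables": [{"name": "region", "prefix": "region", "default": "*"}],
    "widgets": [
        {"definition": {"type": "timeseries", "requests": [
            {"q": f"sum:trace.rack.requests.errors{{{SCOPE}}} by {{http.status_code}}"},
        ]}},
        {"definition": {"type": "timeseries", "requests": [
            {"q": f"avg:trace.rack.request.duration.by.service.99p{{{SCOPE}}}"},
            {"q": f"avg:runtime.ruby.thread_count{{{SCOPE}}}"},
        ]}},
    ],
}

payload = json.dumps(dashboard, indent=2)
print(payload)
```

Sending the payload (for example with an HTTP POST to the `/api/v1/dashboard` endpoint, authenticated with `DD-API-KEY` and `DD-APPLICATION-KEY` headers) is left out so the sketch runs offline.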
