Trace Metrics

Overview

Tracing application metrics are collected after you enable trace collection and instrument your application.

These metrics capture request counts, error counts, and latency measures. They are calculated based on 100% of the application’s traffic, regardless of any trace ingestion sampling configuration. Ensure that you have full visibility into your application’s traffic by using these metrics to spot potential errors on a service or a resource, and by creating dashboards, monitors, and SLOs.

Note: If your applications and services are instrumented with OpenTelemetry libraries and you set up sampling at the SDK level and/or at the collector level, APM metrics are calculated based on the sampled set of data.

Trace metrics are generated for service entry spans and certain operations depending on integration language. For example, the Django integration produces trace metrics from spans that represent various operations (1 root span for the Django request, 1 for each middleware, and 1 for the view).

The trace metrics namespace is formatted as:

  • trace.<SPAN_NAME>.<METRIC_SUFFIX>

With the following definitions:

<SPAN_NAME>
The name of the operation or span.name (examples: redis.command, pylons.request, rails.request, mysql.query).
<METRIC_SUFFIX>
The name of the metric (examples: hits, errors, apdex, duration). See the Metric suffix section below.
<TAGS>
The tags available on trace metrics. Possible tags are: env, service, version, resource, http.status_code, http.status_class, and Datadog Agent tags (including the host and second primary tag). Note: Other tags set on spans are not available as tags on trace metrics.
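
The naming scheme above can be sketched in Python. This is a minimal illustration only: the helper name `trace_metric_name` is hypothetical and is not part of any Datadog library. It assembles the string from the two pieces defined above.

```python
def trace_metric_name(span_name: str, metric_suffix: str = "") -> str:
    """Build trace.<SPAN_NAME>.<METRIC_SUFFIX> (hypothetical helper).

    An empty suffix yields trace.<SPAN_NAME>, the name used by the
    latency distribution metric described later on this page.
    """
    if not span_name:
        raise ValueError("span_name must not be empty")
    if not metric_suffix:
        return f"trace.{span_name}"
    return f"trace.{span_name}.{metric_suffix}"


# Span names and suffixes taken from the examples on this page.
print(trace_metric_name("rails.request", "hits"))
print(trace_metric_name("mysql.query", "errors"))
print(trace_metric_name("redis.command"))
```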

Metric suffix

Hits

trace.<SPAN_NAME>.hits
Prerequisite: This metric exists for any APM service.
Description: Represents the count of spans created with a specific name (for example, redis.command, pylons.request, rails.request, or mysql.query).
Metric type: COUNT.
Tags: env, service, version, resource, resource_name, http.status_code, all host tags from the Datadog Host Agent, and the second primary tag.
trace.<SPAN_NAME>.hits.by_http_status
Prerequisite: This metric exists for HTTP/web APM services if HTTP metadata exists.
Description: Represents the count of hits for a given span, broken down by HTTP status code.
Metric type: COUNT.
Tags: env, service, version, resource, resource_name, http.status_class, http.status_code, all host tags from the Datadog Host Agent, and the second primary tag.

Latency distribution

trace.<SPAN_NAME>
Prerequisite: This metric exists for any APM service.
Description: Represents the latency distribution for all services, resources, and versions across different environments and second primary tags.
Metric type: DISTRIBUTION.
Tags: env, service, resource, resource_name, version, synthetics, and the second primary tag.

Errors

trace.<SPAN_NAME>.errors
Prerequisite: This metric exists for any APM service.
Description: Represents the count of errors for a given span.
Metric type: COUNT.
Tags: env, service, version, resource, resource_name, http.status_code, all host tags from the Datadog Host Agent, and the second primary tag.
trace.<SPAN_NAME>.errors.by_http_status
Prerequisite: This metric exists for any APM service.
Description: Represents the count of errors for a given span, broken down by HTTP status code.
Metric type: COUNT.
Tags: env, service, version, resource, http.status_class, http.status_code, all host tags from the Datadog Host Agent, and the second primary tag.
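
The hits and errors counts are often combined into an error rate. The following is a minimal sketch of that arithmetic, assuming you have already retrieved both counts for the same span name, tag filter, and time window. The function name `error_rate` is hypothetical and the sample numbers are invented for illustration.

```python
def error_rate(errors: float, hits: float) -> float:
    """Fraction of spans that ended in error (hypothetical helper).

    Returns 0.0 when there was no traffic, to avoid dividing by zero.
    """
    if hits <= 0:
        return 0.0
    return errors / hits


# Illustrative values: 12 errors out of 480 hits in the same window.
rate = error_rate(errors=12, hits=480)
print(f"error rate: {rate:.1%}")
```

Both counts must share the same filters. Dividing errors for one resource by hits for the whole service would understate the rate.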

Apdex

trace.<SPAN_NAME>.apdex
Prerequisite: This metric exists for any HTTP or web-based APM service.
Description: Measures the Apdex score for each web service.
Metric type: GAUGE.
Tags: env, service, resource / resource_name, version, synthetics, and the second primary tag.
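
For reference, the standard Apdex definition can be sketched as follows. With a threshold T, requests that finish within T count as satisfied, requests that finish within 4T count as tolerating, and the score is (satisfied + tolerating / 2) / total. This sketch illustrates the general Apdex formula only. It does not claim to reproduce how the trace.<SPAN_NAME>.apdex gauge is computed internally, and the function name and sample latencies are hypothetical.

```python
from typing import Optional, Sequence


def apdex(latencies_s: Sequence[float], threshold_s: float) -> Optional[float]:
    """Standard Apdex score for a set of request latencies in seconds.

    Returns None when there are no requests to score.
    """
    if not latencies_s:
        return None
    # Satisfied: at or below the threshold T.
    satisfied = sum(1 for t in latencies_s if t <= threshold_s)
    # Tolerating: above T but at or below 4T. Anything slower is frustrated.
    tolerating = sum(1 for t in latencies_s if threshold_s < t <= 4 * threshold_s)
    return (satisfied + tolerating / 2) / len(latencies_s)


# Illustrative latencies with T = 0.5 s: 2 satisfied, 2 tolerating, 1 frustrated.
print(apdex([0.1, 0.3, 0.6, 1.5, 2.5], threshold_s=0.5))
```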

Duration

trace.<SPAN_NAME>.duration
Prerequisite: This metric exists for any APM service.
Description: Measures the total time for a collection of spans within a time interval, including child spans seen in the collecting service. For most use cases, Datadog recommends using the latency distribution metric to calculate average latency or percentiles. To calculate the average latency with host tag filters, you can use this metric with the following formula:
sum:trace.<SPAN_NAME>.duration{<FILTER>}.rollup(sum).fill(zero) / sum:trace.<SPAN_NAME>.hits{<FILTER>}
This metric does not support percentile aggregations. Read the Latency distribution section for more information.
Metric type: GAUGE.
Tags: env, service, resource, http.status_code, all host tags from the Datadog Host Agent, and the second primary tag.
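
The arithmetic behind the average latency formula above can be sketched in Python. This is an illustration under stated assumptions, not the query engine's implementation: it assumes the duration and hits series have already been summed per time bucket, and it treats a missing duration bucket as zero to mirror `.fill(zero)`. The function name and sample data are hypothetical.

```python
from typing import Dict


def average_latency(
    duration_by_bucket: Dict[int, float],
    hits_by_bucket: Dict[int, float],
) -> Dict[int, float]:
    """Per-bucket average latency: total duration divided by hit count."""
    averages = {}
    for bucket, hits in hits_by_bucket.items():
        if hits <= 0:
            continue  # no traffic in this bucket, so no average to report
        # A missing duration point is treated as zero, like .fill(zero).
        total_duration = duration_by_bucket.get(bucket, 0.0)
        averages[bucket] = total_duration / hits
    return averages


# Illustrative data keyed by bucket start time in seconds.
duration = {0: 12.0, 60: 9.0}    # total seconds spent per bucket
hits = {0: 40, 60: 30, 120: 10}  # span count per bucket
for bucket, avg in average_latency(duration, hits).items():
    print(bucket, round(avg, 3))
```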

Duration by

This method of using trace metrics is outdated. Instead, tracing distribution metrics using DDSketch is recommended.
trace.<SPAN_NAME>.duration.by_http_status
Prerequisite: This metric exists for HTTP/web APM services if HTTP metadata exists.
Description: Measures the total time for a collection of spans for each HTTP status. Specifically, it is the relative share of time spent by all spans over an interval for a given HTTP status, including time spent waiting on child processes.
Metric type: GAUGE.
Tags: env, service, resource, http.status_class, http.status_code, all host tags from the Datadog Host Agent, and the second primary tag.

Further Reading