---
isPrivate: true
title: APM Investigator
description: >-
  Use APM Investigator to analyze performance issues and investigate latency
  patterns across your distributed services.
---

# APM Investigator

{% callout %}
##### Request access to the Preview!

APM Investigator is in Preview. To request access, fill out this form.

[Request Access](https://www.datadoghq.com/product-preview/apm-investigator/)
{% /callout %}

## Overview{% #overview %}

The APM Investigator helps you diagnose and resolve application latency issues through a guided, step-by-step workflow. It consolidates analysis tools into a single interface so you can identify root causes and take action.

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/apm_investigator.05285315cf43251384048612618eb3cc.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/apm_investigator.05285315cf43251384048612618eb3cc.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="APM Investigator UI" /%}

## Key capabilities{% #key-capabilities %}

The APM Investigator helps you:

- **Investigate slow request clusters**: Select problematic requests directly from the latency scatter plot.
- **Identify the source of latency**: Determine whether latency originates from your service, a downstream dependency, databases, or third-party APIs.
- **Narrow the scope**: Isolate issues to specific data centers, clusters, or user segments with [Tag Analysis](https://docs.datadoghq.com/tracing/trace_explorer/tag_analysis.md).
- **Find root causes**: Detect faulty deployments, database slowness, third-party service failures, infrastructure problems, and service-level issues.

## Starting an investigation{% #starting-an-investigation %}

Launch an investigation from an APM service page or a resource page.

1. Navigate to a service showing latency issues.
1. Find the **Latency** graph showing the anomaly.
1. Hover over the graph and click **Investigate**. This opens the investigation side panel.

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/apm_investigator_entrypoint.e65ba3880cd05eb91830b1f233d04085.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/apm_investigator_entrypoint.e65ba3880cd05eb91830b1f233d04085.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="APM Investigator entrypoint" /%}

## Investigation workflow{% #investigation-workflow %}

### Define context: Select slow and normal spans{% #define-context-select-slow-and-normal-spans %}

To trigger the latency analysis, select two zones on the point plot:

- **Slow**: Problematic, slow spans
- **Normal**: Baseline, healthy spans

{% alert level="info" %}
Latency anomalies detected by Watchdog are pre-selected.
{% /alert %}

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/latency_selection.2965ab35cda77e3bf7e67dad16daba91.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/latency_selection.2965ab35cda77e3bf7e67dad16daba91.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="Selection of slow spans on the point plot" /%}

This comparison between the slow and normal spans drives all subsequent analysis.
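
Conceptually, each selection is a rectangle on the duration-over-time plot: a time window combined with a latency range, and every span that falls inside a rectangle joins that cohort. The following Python sketch illustrates the idea using hypothetical `Span` and `Zone` types, not any Datadog API:

```python
from dataclasses import dataclass

@dataclass
class Span:
    service: str
    start: float        # span start, Unix timestamp in seconds
    duration_ms: float  # span duration in milliseconds

@dataclass
class Zone:
    """A rectangular selection on the duration-over-time point plot."""
    t_start: float
    t_end: float
    min_duration_ms: float
    max_duration_ms: float

    def contains(self, span: Span) -> bool:
        return (self.t_start <= span.start <= self.t_end
                and self.min_duration_ms <= span.duration_ms <= self.max_duration_ms)

def partition(spans: list[Span], slow: Zone, normal: Zone) -> tuple[list[Span], list[Span]]:
    """Split spans into the slow and normal cohorts that drive the comparison."""
    return ([s for s in spans if slow.contains(s)],
            [s for s in spans if normal.contains(s)])
```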

### Step 1: Identify the latency bottleneck{% #step-1-identify-the-latency-bottleneck %}

The investigator identifies whether latency originates from your service or its downstream dependencies (services, databases, third-party APIs).

**Analysis approach**: The investigator compares trace data from your selected slow and normal periods. To find the service responsible for the latency increase, it compares:

- **Execution time**: Each service's self-time, that is, the time spent on its own processing, excluding waits on downstream dependencies, across the two datasets. The service with the largest absolute latency increase is the primary focus.
- **Call patterns between services**: Changes in the number of requests between services. For example, if service Y significantly increases its calls to downstream service X, the investigator might identify Y as the bottleneck.

Based on this analysis, the investigator recommends a service as the likely latency bottleneck. Expand the latency bottleneck section to see details of the comparison between slow and normal traces. A table surfaces the changes in self-time and in the number of inbound requests per service.
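
To make the self-time comparison concrete, here is an illustrative Python sketch (a simplification, not Datadog's implementation) that computes per-service self-time from a trace's parent-child structure and ranks services by their absolute increase between the two cohorts:

```python
from collections import defaultdict

def self_times(spans):
    """Aggregate self-time per service.

    Self-time = a span's duration minus the time spent in its direct children,
    assuming children run sequentially (no overlap), which is a simplification.
    `spans` is a list of dicts with hypothetical fields: span_id, parent_id,
    service, and duration_ms.
    """
    child_time = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_time[span["parent_id"]] += span["duration_ms"]
    totals = defaultdict(float)
    for span in spans:
        totals[span["service"]] += span["duration_ms"] - child_time[span["span_id"]]
    return totals

def rank_bottlenecks(slow_spans, normal_spans):
    """Rank services by absolute self-time increase between the two cohorts.

    Services that appear only in the normal cohort are ignored for simplicity.
    """
    slow, normal = self_times(slow_spans), self_times(normal_spans)
    deltas = {svc: slow[svc] - normal.get(svc, 0.0) for svc in slow}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```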

The following example shows two side-by-side flame graphs that compare slow traces against healthy traces in more detail. Use the arrows to cycle through example traces and click **View** to open the trace in a full-page view.

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/latency_bottleneck.38483b904fd623172d0669bf9a3a71e1.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/latency_bottleneck.38483b904fd623172d0669bf9a3a71e1.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="Latency bottleneck section" /%}

To investigate recent changes to a service, hover over its row and click the `+` icon that appears to add the service as context for your investigation.

### Step 2: Correlate to recent changes{% #step-2-correlate-to-recent-changes %}

The investigator then helps you determine whether recent deployments of the service under investigation, or of the latency bottleneck service, caused the latency increase.

The **Recent changes** section surfaces:

- Deployments that occurred around the time of the latency spike, displayed on a [change tracking](https://docs.datadoghq.com/change_tracking.md) widget
- A latency graph broken down by version

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/recent_changes.6427209ff1d857a804c7ec016e292e9a.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/recent_changes.6427209ff1d857a804c7ec016e292e9a.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="Recent changes" /%}

**Analysis approach**: The APM Investigator analyzes this data in the background and flags this section as relevant to examine if a deployment occurred around the time of the latency increase you are investigating.
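
Conceptually, this relevance check is a time-window overlap test. The following Python sketch, using hypothetical inputs rather than Datadog's implementation, flags deployments that occurred within a margin of the latency spike:

```python
from datetime import datetime, timedelta

def relevant_deployments(deploy_times: list[datetime],
                         spike_start: datetime,
                         spike_end: datetime,
                         margin: timedelta = timedelta(minutes=30)) -> list[datetime]:
    """Return deployments close enough to the spike window to be worth examining."""
    window_start = spike_start - margin
    window_end = spike_end + margin
    return [t for t in deploy_times if window_start <= t <= window_end]
```

If the check returns any deployments, the section is worth examining; the 30-minute margin here is an arbitrary value chosen for illustration.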

### Step 3: Find common patterns with Tag Analysis{% #step-3-find-common-patterns-with-tag-analysis %}

The investigator also uses [Tag Analysis](https://docs.datadoghq.com/tracing/trace_explorer/tag_analysis.md) to help you discover shared attributes that distinguish slow traces from healthy traces. Tag Analysis highlights dimensions with significant distribution differences between slow and normal datasets.

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/tag_analysis.bb70bf06342cc743cad0dfe94feffca9.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/tag_analysis.bb70bf06342cc743cad0dfe94feffca9.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="Common patterns in slow traces" /%}

The section surfaces:

- Tag distributions comparing the slow and normal datasets across all span dimensions.
- Highlights of the most discriminating dimensions that might help you understand the latency issue, such as `org_id`, `kubernetes_cluster`, or `datacenter.name`.

The APM Investigator only surfaces this section when dimensions show significant differentiation that is worth examining.
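
As a rough illustration of the underlying idea (not Datadog's actual algorithm), each tag can be scored by how much its value distribution shifts between the two cohorts, for example using total variation distance, keeping only tags above a threshold:

```python
from collections import Counter

def tag_shift(slow_values: list[str], normal_values: list[str]) -> float:
    """Total variation distance between a tag's value distributions in the two cohorts."""
    slow_freq, normal_freq = Counter(slow_values), Counter(normal_values)
    n_slow, n_normal = len(slow_values), len(normal_values)
    values = set(slow_freq) | set(normal_freq)
    return 0.5 * sum(abs(slow_freq[v] / n_slow - normal_freq[v] / n_normal)
                     for v in values)

def discriminating_tags(slow: dict, normal: dict, threshold: float = 0.3):
    """`slow` and `normal` map each tag name to the list of values observed in
    that cohort; return tags whose shift exceeds the threshold, most shifted first."""
    scores = {tag: tag_shift(slow[tag], normal[tag])
              for tag in slow if tag in normal and slow[tag] and normal[tag]}
    return sorted(((tag, score) for tag, score in scores.items() if score >= threshold),
                  key=lambda kv: kv[1], reverse=True)
```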

### End user impact{% #end-user-impact %}

Above the point plot, you can see a preview of how many end users, accounts, and application pages (for example, `/checkout`) are affected by the problem. This information is available if you have enabled the connection between [RUM and traces](https://docs.datadoghq.com/tracing/other_telemetry/rum.md).

{% image
   source="https://docs.dd-static.net/images/tracing/guide/apm_investigator/end_user_impact.a390ebdbaa7a5ff12a074f21cecfac9d.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/tracing/guide/apm_investigator/end_user_impact.a390ebdbaa7a5ff12a074f21cecfac9d.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="End user impact" /%}

### Root cause{% #root-cause %}

The investigator consolidates findings from all analysis steps (latency bottleneck, recent changes, and tag analysis) to generate a root cause hypothesis. For example, "a deployment of this downstream service introduced the latency increase".

The APM Investigator helps reduce **Mean Time to Resolution (MTTR)** by accelerating issue diagnosis and response through automated trace and change data analysis.

## Further reading{% #further-reading %}

- [Datadog APM](https://docs.datadoghq.com/tracing.md)
