---
title: LLM Observability MCP Tools
description: >-
  Connect AI agents to your LLM Observability traces and experiments using the
  Datadog MCP Server.
breadcrumbs: Docs > LLM Observability > LLM Observability MCP Tools
---

# LLM Observability MCP Tools

{% callout %}
# Important note for users on the following Datadog site: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

The [Datadog MCP Server](https://docs.datadoghq.com/bits_ai/mcp_server/setup/) enables AI agents to access your [LLM Observability](https://docs.datadoghq.com/llm_observability/) data through the Model Context Protocol (MCP). The `llmobs` toolset provides tools for searching and analyzing traces, inspecting span details and content, and evaluating experiment results directly from AI-powered clients like Cursor, Claude Code, or OpenAI Codex.

## Agent skills{% #agent-skills %}

The [datadog-labs/agent-skills](https://github.com/datadog-labs/agent-skills) repository provides a set of agent skills built on these MCP tools. They help automate some of the manual work in the use cases below.

## Use cases{% #use-cases %}

The LLM Observability MCP tools enable AI-assisted workflows for:

- **Debugging agent execution**: Search for traces by ML app, error status, or custom tags, then examine span hierarchies and content to identify failures.
- **Analyzing trace structure**: Visualize the full span tree of a trace to understand how agents, LLMs, tools, and retrievals interact.
- **Investigating agent loops**: Review an agent's step-by-step execution loop to understand decision-making and tool invocation patterns.
- **Evaluating experiments**: Get summary statistics for experiment metrics, compare results across dimension segments, and inspect individual events.
- **Discovering experiment patterns**: Filter and sort experiment events by metric performance to find the best and worst-performing cases.

## Available tools{% #available-tools %}

The `llmobs` toolset includes the following tools:

### Trace and span tools{% #trace-and-span-tools %}

{% dl %}

{% dt %}
`search_llmobs_spans`
{% /dt %}

{% dd %}
Search for spans matching filters or a raw query.
{% /dd %}

{% dt %}
`get_llmobs_trace`
{% /dt %}

{% dd %}
Get the full structure of a trace as a span hierarchy tree, including span counts by kind, error indicators, and total duration.
{% /dd %}

{% dt %}
`get_llmobs_span_details`
{% /dt %}

{% dd %}
Get detailed metadata for one or more spans, including timing, error info, LLM details (model, token counts), metrics, and evaluations.
{% /dd %}

{% dt %}
`get_llmobs_span_content`
{% /dt %}

{% dd %}
Retrieve the actual content of a span field (input, output, messages, documents, or metadata) with optional JSONPath extraction.
{% /dd %}

{% dt %}
`find_llmobs_error_spans`
{% /dt %}

{% dd %}
Find all error spans in a trace with propagation context, grouped by span kind with error messages and stack traces.
{% /dd %}

{% dt %}
`expand_llmobs_spans`
{% /dt %}

{% dd %}
Load children of specific spans for progressive tree exploration when `get_llmobs_trace` returns collapsed nodes.
{% /dd %}

{% dt %}
`get_llmobs_agent_loop`
{% /dt %}

{% dd %}
Get a chronological view of an agent's execution loop, showing each step (LLM calls, tool invocations, decisions) in order.
{% /dd %}

{% /dl %}
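
Under the hood, MCP clients invoke these tools with JSON-RPC `tools/call` requests. The sketch below builds one such payload for `search_llmobs_spans`; the argument names (`ml_app`, `status`) are illustrative assumptions, since the authoritative schema comes from the server's `tools/list` response.

```python
import json

# Hypothetical argument names -- the authoritative schema comes from the
# server's tools/list response, so treat these fields as illustrative.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_llmobs_spans",
        "arguments": {"ml_app": "customer-support-bot", "status": "error"},
    },
}

print(json.dumps(payload, indent=2))
```

Most MCP clients construct and send this request for you; it is shown here only to make the transport concrete.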

### Experiment tools{% #experiment-tools %}

{% dl %}

{% dt %}
`get_llmobs_experiment_summary`
{% /dt %}

{% dd %}
Get a high-level experiment summary with pre-computed statistics for all evaluation metrics. Start here before using other experiment tools.
{% /dd %}

{% dt %}
`list_llmobs_experiment_events`
{% /dt %}

{% dd %}
List experiment events with filtering by dimension or metric and sorting by metric value.
{% /dd %}

{% dt %}
`get_llmobs_experiment_event`
{% /dt %}

{% dd %}
Get full details for a single experiment event, including input, output, expected output, all metrics, and dimensions.
{% /dd %}

{% dt %}
`get_llmobs_experiment_metric_values`
{% /dt %}

{% dd %}
Get statistical analysis for a specific evaluation metric, optionally segmented by a dimension for comparison.
{% /dd %}

{% dt %}
`get_llmobs_experiment_dimension_values`
{% /dt %}

{% dd %}
Get unique values for a dimension with counts, useful for discovering valid filter and segment values.
{% /dd %}

{% /dl %}

## Recommended workflows{% #recommended-workflows %}

### Trace analysis{% #trace-analysis %}

1. **Search**: Use `search_llmobs_spans` to find traces by ML app, status, span kind, or custom tags.
1. **Visualize**: Use `get_llmobs_trace` to see the full span hierarchy tree.
1. **Inspect**: Use `get_llmobs_span_details` to get metadata, timing, and evaluations for specific spans.
1. **Read content**: Use `get_llmobs_span_content` to retrieve the actual I/O, messages, or documents.
1. **Debug errors**: Use `find_llmobs_error_spans` to locate all errors in a trace with propagation context.
1. **Expand**: Use `expand_llmobs_spans` to load children of collapsed spans for deeper exploration.
1. **Agent review**: Use `get_llmobs_agent_loop` to see the step-by-step execution flow of an agent span.
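
The first steps of this workflow can be sketched as pseudocode against a generic MCP client. `call_tool` is a hypothetical stand-in for whatever tool-invocation helper your client exposes, and the argument names are illustrative, not taken from the actual tool schemas.

```python
# Sketch of the trace-analysis workflow: search for error spans, then pull
# the trace tree and error details for each hit. `call_tool` and the
# argument names are hypothetical stand-ins for your MCP client's API.
def analyze_traces(call_tool, ml_app):
    spans = call_tool("search_llmobs_spans", {"ml_app": ml_app, "status": "error"})
    report = []
    for span in spans:
        tree = call_tool("get_llmobs_trace", {"trace_id": span["trace_id"]})
        errors = call_tool("find_llmobs_error_spans", {"trace_id": span["trace_id"]})
        report.append({"trace": tree, "errors": errors})
    return report
```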

### Experiment analysis{% #experiment-analysis %}

1. **Summarize**: Use `get_llmobs_experiment_summary` to get overall statistics and discover available metrics and dimensions.
1. **Browse events**: Use `list_llmobs_experiment_events` to find events of interest, filtering by dimension or sorting by metric.
1. **Inspect events**: Use `get_llmobs_experiment_event` to view full details for a specific event.
1. **Analyze metrics**: Use `get_llmobs_experiment_metric_values` to get percentile distributions, true/false rates, or compare across dimension segments.
1. **Discover dimensions**: Use `get_llmobs_experiment_dimension_values` to find valid filter and segment values.
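
To make the metric-analysis step concrete, here is a local equivalent of segmenting a metric by a dimension, as `get_llmobs_experiment_metric_values` does server-side. The event shape (`dimensions` and `metrics` keys) is an assumption for illustration, not the actual API response format.

```python
from collections import defaultdict

# Assumed event shape: {"dimensions": {...}, "metrics": {...}}.
# Groups a metric's values by a dimension and returns per-segment
# summary statistics, mirroring segmented metric analysis.
def segment_metric(events, metric, dimension):
    buckets = defaultdict(list)
    for event in events:
        value = event["metrics"].get(metric)
        if value is not None:
            buckets[event["dimensions"].get(dimension)].append(value)
    return {
        segment: {"count": len(vals), "mean": sum(vals) / len(vals)}
        for segment, vals in buckets.items()
    }
```

Comparing the per-segment means is what surfaces underperforming dimension values, such as a prompt version with a lower average score.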

## Example prompts{% #example-prompts %}

After you are connected, try prompts like:

- Review error traces for my `customer-support-bot` app over the past week. Summarize the most common failure patterns, how often they occur, and recommend which ones to fix first.
- Find traces where my agent's responses were flagged by evaluations as low quality. Look at the inputs and outputs, then suggest specific changes to my system prompt to improve response quality.
- Look at recent agent traces for my app and find cases where the agent looped more than necessary. Analyze the decision-making at each step and suggest how to improve my tool descriptions to reduce unnecessary tool calls.
- A user reported a bad response. Here's the trace ID: `trace-123`. Walk me through exactly what happened: what the user asked, what the agent did at each step, and where things went wrong. Suggest a code fix.
- Analyze experiment `exp-456` and generate a markdown table of the worst-performing dimensions broken down by evaluation scores. Include any other relevant columns that help me understand where and why performance is degrading.
- Compare experiment `exp-123` (baseline) against experiment `exp-456`. Summarize what improved, what regressed, and by how much. Give me a recommendation on whether the changes are worth shipping.
- Summarize experiment `exp-456` and identify the top 5 lowest-scoring events. For each, show the input, output, and which evaluations failed.

## Combine with other Datadog tools{% #combine-with-other-datadog-tools %}

The `core` toolset included in the setup URL gives your AI agent access to additional Datadog tools that pair naturally with LLM Observability analysis.

### Export analysis to Datadog Notebooks{% #export-analysis-to-datadog-notebooks %}

The `core` toolset includes `create_datadog_notebook` and `edit_datadog_notebook`, which let your AI agent create [Datadog Notebooks](https://docs.datadoghq.com/notebooks/) directly from analysis results. You can export findings from agent chats into a collaborative, shareable notebook that lives in Datadog alongside your traces and experiments.

Try prompts like:

- Analyze experiment `exp-456`, identify the worst-performing dimensions, and export a summary report to a Datadog Notebook with a breakdown by evaluation scores.
- Review error traces for my `customer-support-bot` over the past week and create a Datadog Notebook with the findings, including common failure patterns and recommended fixes.

For custom visualizations that go beyond standard Datadog widgets, like comparison charts or quadrant plots, Notebooks also render [Mermaid diagrams](https://docs.datadoghq.com/notebooks/guide/build_diagrams_with_mermaidjs/) natively. Try prompts like:

- Analyze experiment `exp-456`, compare the `accuracy` scores across each prompt version, and export the results to a Datadog Notebook that includes a Mermaid bar chart of the average score for each version.
- Analyze experiment `exp-456` and export a Datadog Notebook that plots each prompt version on a Mermaid quadrant chart with `relevance` on one axis and `accuracy` on the other. Identify which versions are underperforming on both dimensions.

## Setup{% #setup %}

To use the LLM Observability tools, connect to the Datadog MCP Server with the `llmobs` toolset enabled.

{% alert level="info" %}
For full setup instructions, including Cursor and VS Code extension configuration, see [Set Up the Datadog MCP Server](https://docs.datadoghq.com/bits_ai/mcp_server/setup/).
{% /alert %}

Add the `toolsets=llmobs,core` query parameter to the MCP Server endpoint for your [Datadog site](https://docs.datadoghq.com/getting_started/site/):

```text
https://mcp.<DD_SITE>/api/unstable/mcp-server/mcp?toolsets=llmobs,core
```

For example:

- **US1**: `https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs,core`
- **EU1**: `https://mcp.datadoghq.eu/api/unstable/mcp-server/mcp?toolsets=llmobs,core`

{% tab title="Remote authentication" %}
This method uses the MCP specification's [Streamable HTTP](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http) transport.

**Claude Code** (command line):

```bash
claude mcp add --transport http datadog-mcp "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs,core"
```

**Codex CLI** (`~/.codex/config.toml`):

```toml
[mcp_servers.datadog]
url = "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs,core"
```

After adding the configuration, run `codex mcp login datadog` to complete the OAuth flow.

**Gemini CLI, Kiro CLI, or other MCP-compatible clients**:

```json
{
  "mcpServers": {
    "datadog": {
      "type": "http",
      "url": "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs,core"
    }
  }
}
```

{% /tab %}

{% tab title="Local binary authentication" %}
This method uses the MCP specification's [stdio](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#stdio) transport. Use it if remote authentication is not available to you.

1. Install the Datadog MCP Server binary:

   ```bash
   curl -sSL https://coterm.datadoghq.com/mcp-cli/install.sh | bash
   ```

   This installs the binary to `~/.local/bin/datadog_mcp_cli`.

1. Run `datadog_mcp_cli login` to complete the OAuth login flow.

1. Configure your AI client. For Claude Code, add the following code to `~/.claude.json`, making sure to replace `<USERNAME>` in the command path:

   ```json
   {
     "mcpServers": {
       "datadog": {
         "type": "stdio",
         "command": "/Users/<USERNAME>/.local/bin/datadog_mcp_cli",
         "args": [],
         "env": {}
       }
     }
   }
   ```

   Or run:

   ```bash
   claude mcp add datadog --scope user -- ~/.local/bin/datadog_mcp_cli
   ```

{% /tab %}

### Authentication{% #authentication %}

The MCP Server uses OAuth 2.0 for authentication. If you cannot go through the OAuth flow, provide a Datadog [API key and application key](https://docs.datadoghq.com/account_management/api-app-keys/) as `DD_API_KEY` and `DD_APPLICATION_KEY` HTTP headers:

```json
{
  "mcpServers": {
    "datadog": {
      "type": "http",
      "url": "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs,core",
      "headers": {
          "DD_API_KEY": "<YOUR_API_KEY>",
          "DD_APPLICATION_KEY": "<YOUR_APPLICATION_KEY>"
      }
    }
  }
}
```

For security, use a scoped API key and application key from a [service account](https://docs.datadoghq.com/account_management/org_settings/service_accounts/) that has only the required permissions.

## Further reading{% #further-reading %}

- [Datadog MCP Server](https://docs.datadoghq.com/bits_ai/mcp_server)
- [Set up and use LLM Observability Experiments](https://docs.datadoghq.com/llm_observability/experiments)
- [Monitor your application with LLM Observability](https://docs.datadoghq.com/llm_observability/monitoring)
