LLM Observability

Overview

With LLM Observability, you can monitor, troubleshoot, and evaluate your LLM-powered applications, such as chatbots. You can investigate the root cause of issues, monitor operational performance, and evaluate the quality, privacy, and safety of your LLM applications.

Each request fulfilled by your application is represented as a trace on the LLM Observability page in Datadog.

A list of prompt-response pair traces on the LLM Observability page

A trace can represent:

An individual LLM inference, including tokens, error information, and latency
A predetermined LLM workflow, which is a grouping of LLM calls and their contextual operations, such as tool calls or preprocessing steps
A dynamic LLM workflow executed by an LLM agent

Each trace contains spans representing each choice made by an agent or each step of a given workflow. A given trace can also include input and output, latency, privacy issues, errors, and more. For more information, see Terms and Concepts.

Troubleshoot with end-to-end tracing

View every step of your LLM application chains and calls to pinpoint problematic requests and identify the root cause of errors.

Errors that occurred in a trace on the Errors tab in a trace side panel

Monitor operational metrics and optimize cost

Monitor the cost, latency, performance, and usage trends for all your LLM applications with out-of-the-box dashboards.

The out-of-the-box LLM Observability Operational Insights dashboard in Datadog

Evaluate the quality and effectiveness of your LLM applications

Identify problematic clusters and monitor the quality of responses over time with topical clustering and checks like sentiment, failure to answer, and so on.

The box packing layout displays clusters of traces represented by colored circles, and includes a panel listing clusters with topics, trace counts, and failure rates.

Safeguard sensitive data and identify malicious users

Automatically scan and redact any sensitive data in your AI applications and identify prompt injections, among other evaluations.

An example of a prompt-injection attempt detected by LLM Observability

See anomalies highlighted as insights

LLM Observability Insights provides a monitoring experience that helps users identify anomalies in their operational metrics—such as duration and error rate—and their out-of-the-box (OOTB) evaluations.

Outlier detection is performed across key dimensions:

Span name
Workflow type
Cluster input/output topics

These outliers are analyzed over the past week and automatically surfaced in the corresponding time window selected by the user. This enables teams to proactively detect regressions, performance drifts, or unexpected behavior in their LLM applications.

An 'Insights' banner across the top of the LLM Observability Monitor page. The banner displays 10 insights and has a View Insights button that leads to a side panel with further details.

Use integrations with LLM Observability

The LLM Observability SDK for Python integrates with frameworks such as OpenAI, LangChain, AWS Bedrock, and Anthropic. It automatically traces and annotate LLM calls, capturing latency, errors, and token usage metrics—without code changes.

Datadog offers a variety of artificial intelligence (AI) and machine learning (ML) capabilities. The AI/ML integrations on the Integrations page and the Datadog Marketplace are platform-wide Datadog functionalities.

For example, APM offers a native integration with OpenAI for monitoring your OpenAI usage, while Infrastructure Monitoring offers an integration with NVIDIA DCGM Exporter for monitoring compute-intensive AI workloads. These integrations are different from the LLM Observability offering.

For more information, see the Auto Instrumentation documentation.