Monitoring

Overview

Explore and analyze your LLM applications in production with tools for querying, visualizing, correlating, and investigating data across traces, clusters, and other resources.

Monitor performance, debug issues, evaluate quality, and secure your LLM-powered systems with unified visibility across traces, metrics, and online evaluations.

Real-time performance monitoring

Monitor your LLM application’s operational health with built-in metrics and dashboards:

LLM Observability Operational Insights dashboard, showing various metrics and visualizations. Includes an Overview section with total number of traces and spans, success and error rates, etc.; and an LLM Calls section with a donut chart of model usage, average input and output tokens per call, etc.

Request volume and latency: Track requests per second, response times, and performance bottlenecks across different models, operations, and endpoints.
Error tracking: Monitor HTTP errors, model timeouts, and failed requests with detailed error context.
Token consumption: Track prompt tokens, cached tokens, completion tokens, and total usage to optimize costs.
Model usage analytics: Monitor which models are being called, their frequency, and performance characteristics.

The out-of-the-box LLM Observability Operational Insights dashboard provides consolidated views of trace-level and span-level metrics, error rates, latency breakdowns, token consumption trends, and triggered monitors.
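To make the metrics above concrete, the following minimal sketch aggregates error rate, average token counts, and model usage from a list of call records. It is an illustration only: the `LLMCall` record, its field names, and the `summarize` helper are hypothetical and do not come from the Datadog SDK or reflect how the dashboard computes its values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LLMCall:
    # Hypothetical record of one LLM call; field names are illustrative only.
    model: str
    input_tokens: int
    output_tokens: int
    duration_ms: float
    error: bool = False


def summarize(calls):
    """Aggregate the kinds of figures an operational dashboard displays."""
    if not calls:
        return {"calls": 0}
    total = len(calls)
    errors = sum(1 for c in calls if c.error)
    return {
        "calls": total,
        "error_rate": errors / total,
        "avg_input_tokens": sum(c.input_tokens for c in calls) / total,
        "avg_output_tokens": sum(c.output_tokens for c in calls) / total,
        "total_tokens": sum(c.input_tokens + c.output_tokens for c in calls),
        "model_usage": dict(Counter(c.model for c in calls)),
    }


# Made-up sample data, for illustration only.
calls = [
    LLMCall("model-a", 120, 40, 850.0),
    LLMCall("model-a", 200, 60, 1200.0),
    LLMCall("model-b", 80, 20, 400.0, error=True),
    LLMCall("model-a", 100, 30, 700.0),
]
summary = summarize(calls)
```

With this sample data, one of the four calls failed, so `summary["error_rate"]` is 0.25, and `summary["model_usage"]` counts three calls to `model-a` and one to `model-b`.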

Production debugging and troubleshooting

Debug complex LLM workflows with detailed execution visibility:

Detail view of a trace in LLM Observability, featuring a flame graph that visually represents each service call. 'OpenAI.createResponse' is selected, and a detailed span view is displayed — including input messages and output messages.

End-to-end trace analysis: Visualize complete request flows from user input through model calls, tool calls, and response generation.
Span-level debugging: Examine individual operations within chains, including preprocessing steps, model calls, and post-processing logic.
Root cause identification: Pinpoint failure points in multi-step chains, workflows, or agentic operations with detailed error context and timing information.
Performance bottleneck identification: Find slow operations and optimize based on latency breakdowns across workflow components.
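As a rough illustration of the reasoning behind bottleneck identification, the sketch below walks a span tree and reports the span with the largest self time (its duration minus the durations of its direct children). The `Span` class and both helpers are hypothetical, not part of any Datadog API, and the calculation assumes child spans run sequentially without overlapping.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical span model; a real trace carries many more attributes.
    name: str
    duration_ms: float
    children: list = field(default_factory=list)


def self_time(span):
    """Time spent in the span itself, excluding its direct children.

    Assumes children run one after another with no overlap.
    """
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def slowest_by_self_time(root):
    """Return (span name, self time in ms) for the span with the largest self time."""
    best = (root.name, self_time(root))
    stack = list(root.children)
    while stack:
        span = stack.pop()
        t = self_time(span)
        if t > best[1]:
            best = (span.name, t)
        stack.extend(span.children)
    return best


# Made-up trace: a request that preprocesses input, runs an agent loop, then post-processes.
trace = Span("handle_request", 2000.0, [
    Span("preprocess", 150.0),
    Span("agent_loop", 1700.0, [
        Span("llm_call", 1200.0),
        Span("tool_call", 400.0),
    ]),
    Span("postprocess", 100.0),
])
bottleneck = slowest_by_self_time(trace)
```

Here `agent_loop` has the longest total duration of any child, but most of that time belongs to its children, so `bottleneck` points at `llm_call`. Ranking by self time rather than total duration is what separates a slow operation from a parent that merely contains one.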

Quality and safety evaluations

Detail view of a span in LLM Observability, Evaluations tab. Displays a Hallucination evaluation with 'Confirmed Contradiction', the flagged output, context quote, and an explanation of why this was flagged.

Ensure your LLM agents or applications meet quality standards with online evaluations. For comprehensive information about Datadog-hosted and managed evaluations, ingesting custom evaluations, and safety monitoring capabilities, see the Evaluations documentation.

Query your LLM application’s traces and spans

LLM Observability > Traces view, where the user has entered the query `ml_app:shopist-chat-v2 'purchase' -'discount' @trace.total_tokens:>=20` and various traces are displayed.

Learn how to use Datadog’s LLM Observability query interface to search, filter, and analyze traces and spans generated by your LLM applications. The Querying documentation covers how to:

Use the search bar to filter traces and spans by attributes such as model, user, or error status.
Apply advanced filters to focus on specific LLM operations or timeframes.
Visualize and inspect trace details to troubleshoot and optimize your LLM workflows.

This enables you to quickly identify issues, monitor performance, and gain insights into your LLM application’s behavior in production.
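If you build such queries programmatically, for example to generate links or saved searches, a small helper can keep them consistent. The sketch below reproduces the query from the screenshot description above. The `build_query` function and its parameters are hypothetical, and reading quoted terms as text matches and a leading `-` as exclusion is an assumption inferred from that one example; see the Querying documentation for the authoritative syntax.

```python
def build_query(ml_app, include=(), exclude=(), min_total_tokens=None):
    """Compose a search string in the style of the example query.

    Assumptions (inferred from the example, not from a syntax reference):
    quoted terms are free-text matches, a leading '-' excludes a term,
    and '@trace.total_tokens:>=N' filters on a numeric trace attribute.
    """
    parts = [f"ml_app:{ml_app}"]
    parts += [f"'{term}'" for term in include]
    parts += [f"-'{term}'" for term in exclude]
    if min_total_tokens is not None:
        parts.append(f"@trace.total_tokens:>={min_total_tokens}")
    return " ".join(parts)


query = build_query(
    "shopist-chat-v2",
    include=["purchase"],
    exclude=["discount"],
    min_total_tokens=20,
)
print(query)
```

The helper does not escape quotes or special characters inside terms, so it is only suitable for simple, trusted inputs.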

Correlate APM and LLM Observability

A trace in Datadog APM. The Overview tab displays a section titled LLM Observability, with a link to view the span in LLM Observability, as well as input and output text.

For applications instrumented with Datadog APM, you can correlate APM and LLM Observability through the SDK. Correlating APM with LLM Observability provides full end-to-end visibility and thorough analysis, from app issues to LLM-specific root causes.

Cluster Map

The Cluster Map provides a visual overview of how your LLM application’s requests are grouped and related. It helps you identify patterns, clusters of similar activity, and outliers in your LLM traces, making it easier to investigate issues and optimize performance.

Monitor your agentic systems

Learn how to monitor agentic LLM applications, which use multiple tools or chains of reasoning, with Datadog’s Agent Monitoring. This feature helps you track agent actions, tool usage, and reasoning steps, providing visibility into complex LLM workflows and enabling you to troubleshoot and optimize agentic systems effectively. See the Agent Monitoring documentation for details.