LLM Observability

LLM Observability is in public beta.
By using LLM Observability, you acknowledge that Datadog is authorized to share your Company's data with OpenAI Global, LLC for the purpose of providing and improving LLM Observability.

LLM Observability is not available in the US1-FED site.

[Image: The LLM Observability overview page, listing all prompt-response pair traces]

With LLM Observability, you can monitor, troubleshoot, and evaluate your LLM-powered applications, such as chatbots. You can investigate the root cause of issues, monitor operational performance, and evaluate the quality, privacy, and safety of your LLM applications.

Each request fulfilled by your application is represented as a trace on the LLM Observability traces page in Datadog.

A trace can represent:

  • An individual LLM inference, including token usage, errors, and latency
  • A predetermined LLM workflow: a grouping of LLM calls and their contextual operations, such as tool calls or preprocessing steps
  • A dynamic LLM workflow executed by an LLM agent

Each trace contains spans representing each choice made by an agent or each step of a given workflow. A trace can also include inputs and outputs, execution duration, privacy issues, errors, and more.
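Here is a minimal Python sketch that makes the trace-and-span structure above concrete. The `Span` and `Trace` classes, their fields, and the kind labels (`workflow`, `tool`, `llm`) are hypothetical. They are not the schema used by Datadog or the LLM Observability SDK. The sketch only shows how one request maps to a tree of spans whose durations and errors can be inspected.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    """One step of a request (hypothetical model, not Datadog's span schema)."""
    name: str
    kind: str  # illustrative labels such as "workflow", "agent", "llm", "tool"
    duration_ms: float
    input_text: str = ""
    output_text: str = ""
    error: Optional[str] = None
    children: list[Span] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[tuple[int, Span]]:
        """Yield (depth, span) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


@dataclass
class Trace:
    """A single request handled by the application, rooted at one span."""
    root: Span

    def errors(self) -> list[Span]:
        """Return every span in the trace that recorded an error."""
        return [span for _, span in self.root.walk() if span.error is not None]

    def render(self) -> str:
        """Indented text view of the span tree, one span per line."""
        lines = []
        for depth, span in self.root.walk():
            status = f" ERROR: {span.error}" if span.error else ""
            lines.append(
                f"{'  ' * depth}{span.kind}:{span.name} ({span.duration_ms:.0f} ms){status}"
            )
        return "\n".join(lines)


if __name__ == "__main__":
    # A predetermined workflow: one tool call followed by one LLM call that failed.
    trace = Trace(
        root=Span("answer_question", "workflow", 1250, input_text="What is my balance?", children=[
            Span("retrieve_account", "tool", 200, output_text="balance=42"),
            Span("generate_reply", "llm", 1000, error="rate limit exceeded"),
        ])
    )
    print(trace.render())
    print([span.name for span in trace.errors()])
```

Walking the tree from the root is what lets a tool like this point at the failing step (here, `generate_reply`) rather than only reporting that the request as a whole failed.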

You can instrument your application with the LLM Observability SDK for Python, or by calling the LLM Observability API.

Getting started

To get started with LLM Observability, you can build a simple example with the Quickstart, or follow the guide for instrumenting your LLM application.

Explore LLM Observability

Troubleshoot with end-to-end tracing

View every step of your LLM application chains and calls to pinpoint problematic requests and identify the root cause of errors.

[Image: An LLM Observability trace displaying each span of a request]

Monitor operational metrics and optimize cost

Monitor the throughput, latency, and token usage trends for all your LLM applications.
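Throughput, latency, and token usage are aggregates over individual LLM calls. The following Python sketch is a rough illustration only. It computes calls per minute, a nearest-rank p95 latency, and total tokens from a list of call records. The `LLMCall` record, `percentile`, and `summarize` are hypothetical and do not reflect how Datadog computes or stores these metrics.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class LLMCall:
    """Hypothetical record of one LLM call; field names are illustrative."""
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def summarize(calls: list[LLMCall], window_s: float) -> dict[str, float]:
    """Aggregate throughput, tail latency, and token usage over a time window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not calls:
        return {"calls_per_min": 0.0, "p95_latency_ms": 0.0, "total_tokens": 0}
    return {
        "calls_per_min": len(calls) * 60.0 / window_s,
        "p95_latency_ms": percentile([c.latency_ms for c in calls], 95),
        "total_tokens": sum(c.prompt_tokens + c.completion_tokens for c in calls),
    }


if __name__ == "__main__":
    window = [LLMCall(100, 10, 5), LLMCall(200, 20, 10), LLMCall(300, 30, 15), LLMCall(400, 40, 20)]
    print(summarize(window, window_s=60))
```

Tracking tail latency (p95) rather than only the mean surfaces slow outliers that an average would hide. Summing prompt and completion tokens gives the quantity most LLM providers bill on.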

[Image: The out-of-the-box LLM Observability dashboard]

Evaluate the quality and effectiveness of your LLM applications

Identify problematic clusters and monitor the quality of responses over time with topical clustering and checks such as sentiment and failure to answer.

[Image: The clusters page in LLM Observability]

Safeguard sensitive data and identify malicious users

Automatically scan and redact any sensitive data in your AI applications and identify prompt injections.
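The following Python sketch is a simplified illustration of scan-and-redact. It replaces matches of two regular expressions, one for email addresses and one for strings shaped like US Social Security numbers, with labeled placeholders. The patterns, labels, and `redact` function are assumptions for this example. This is not Datadog's scanning implementation, and regexes this simple will miss many kinds of sensitive data.

```python
from __future__ import annotations

import re

# Illustrative patterns only (assumption); production scanners use far broader rule sets.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with [REDACTED:<LABEL>] and report which labels matched."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            found.append(label)
    return text, found


if __name__ == "__main__":
    prompt = "Contact jane.doe@example.com, SSN 123-45-6789."
    print(redact(prompt))
```

Returning the matched labels alongside the redacted text lets a caller both store a safe version of the prompt and flag the request for review.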

[Image: An example of a prompt-injection attempt]