---
title: LLM Observability Metrics
description: Learn about useful metrics you can generate from LLM Observability data.
breadcrumbs: Docs > LLM Observability > Monitoring > LLM Observability Metrics
---

# LLM Observability Metrics

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md). ({% placeholder "user-datadog-site-name" /%}).
{% /alert %}

{% /callout %}

After you instrument your application with LLM Observability, you can use LLM Observability metrics in dashboards and monitors. These metrics capture span counts, error counts, token usage, and latency for your LLM applications, and are calculated from 100% of the application's traffic.

{% alert level="info" %}
The `ml_obs.*` entries on this page are [Datadog Metrics](https://docs.datadoghq.com/metrics.md): numerical values that describe an aspect of your LLM application over time, derived from your LLM spans (counts, distributions of cost, tokens, latency, errors). They are 100%-sampled, follow standard [Datadog metric retention](https://docs.datadoghq.com/developers/guide/data-collection-resolution-retention.md) (15 months at full granularity), and are queryable from dashboards, monitors, and notebooks like any other Datadog metric. They are distinct from two other kinds of data in LLM Observability:
- **Per-span operational data** (cost, tokens, latency, errors on each individual trace or span): the raw values these metrics roll up from. These values are stored with spans, follow [LLM Observability trace retention](https://docs.datadoghq.com/llm_observability/setup.md#data-retention), and are queried from the Traces explorer rather than as metrics.
- **[Evaluation scores](https://docs.datadoghq.com/llm_observability/evaluations.md)** (also called "evals"): quality and safety judgments (for example, hallucination, faithfulness, custom LLM-as-a-judge) attached to individual spans or experiment rows. These are not derived from operational telemetry, and follow LLM Observability trace and experiment retention rather than Datadog metric retention.

{% /alert %}

{% alert level="info" %}
Only the tags listed in the tables on this page are available on LLM Observability metrics. Other tags set on spans are not available as metric tags.
{% /alert %}

### Span metrics{% #span-metrics %}

| Metric Name            | Description                                | Metric Type  | Tags                                                                                        |
| ---------------------- | ------------------------------------------ | ------------ | ------------------------------------------------------------------------------------------- |
| `ml_obs.span`          | Total number of spans with a span kind     | Count        | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `span_kind`, `version` |
| `ml_obs.span.duration` | Total duration of spans in seconds         | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `span_kind`, `version` |
| `ml_obs.span.error`    | Number of errors that occurred in the span | Count        | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `span_kind`, `version` |
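
One common use of the span metrics above is an error-rate graph: the ratio of `ml_obs.span.error` to `ml_obs.span` over the same scope. The following Python sketch only assembles a query string in the usual Datadog metric query form (`aggregator:metric{tags}.as_count()`). The helper names `scope` and `span_error_rate_query` and the example tag values are hypothetical, and the code does not call any Datadog API.

```python
# Tags listed for the span metrics in the table above.
SPAN_METRIC_TAGS = {
    "env", "error", "ml_app", "model_name",
    "model_provider", "service", "span_kind", "version",
}


def scope(tags: dict) -> str:
    """Render a tag filter such as {env:prod,ml_app:chatbot}; {*} means no filter."""
    if not tags:
        return "{*}"
    return "{" + ",".join(f"{key}:{value}" for key, value in sorted(tags.items())) + "}"


def span_error_rate_query(tags: dict) -> str:
    """Build an error-rate query: span errors divided by total spans (hypothetical helper)."""
    unknown = set(tags) - SPAN_METRIC_TAGS
    if unknown:
        # Other span tags are not available on LLM Observability metrics.
        raise ValueError(f"tags not available on span metrics: {sorted(unknown)}")
    tag_scope = scope(tags)
    return (
        f"sum:ml_obs.span.error{tag_scope}.as_count()"
        f" / sum:ml_obs.span{tag_scope}.as_count()"
    )


if __name__ == "__main__":
    # "chatbot" and "prod" are placeholder values for illustration.
    print(span_error_rate_query({"ml_app": "chatbot", "env": "prod"}))
```

Rejecting unknown tags early is a design choice in this sketch: a filter on a tag that the metric does not carry would otherwise produce a query that matches nothing.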

### LLM token metrics{% #llm-token-metrics %}

| Metric Name                                | Description                                                                       | Metric Type  | Tags                                                                                                                           |
| ------------------------------------------ | --------------------------------------------------------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| `ml_obs.span.llm.input.tokens`             | Number of tokens in the input sent to the LLM                                     | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.output.tokens`            | Number of tokens in the output                                                    | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.output.reasoning.tokens`  | Number of reasoning tokens in the output                                          | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.prompt.tokens`            | Number of tokens used in the prompt                                               | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.completion.tokens`        | Tokens generated as a completion during the span                                  | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.total.tokens`             | Total tokens consumed during the span (input + output + prompt)                   | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.cache_write.tokens` | Number of input tokens written to the prompt cache in an LLM span                 | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.cache_read.tokens`  | Number of input tokens served from the prompt cache in an LLM span                | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.non_cached.tokens`  | Number of input tokens that did not interact with the prompt cache in an LLM span | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.characters`         | Number of characters in the input sent to the LLM                                 | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.output.characters`        | Number of characters in the output                                                | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |
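
The three cache-related token metrics above can be combined to estimate how much of your input was served from the prompt cache. The Python sketch below works on plain numbers that you have already retrieved (for example, sums over a time window). It assumes that `cache_read`, `cache_write`, and `non_cached` tokens together make up the input considered; this page does not state that relationship, so treat it as an assumption. The function name `cache_read_share` is hypothetical.

```python
def cache_read_share(cache_read: float, cache_write: float, non_cached: float) -> float:
    """Fraction of input tokens served from the prompt cache.

    Assumption: the three categories partition the input tokens.
    """
    total = cache_read + cache_write + non_cached
    if total == 0:
        # No input tokens recorded, so report no cache usage.
        return 0.0
    return cache_read / total


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    share = cache_read_share(cache_read=600, cache_write=100, non_cached=300)
    print(f"{share:.0%} of input tokens were read from the cache")
```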

### Embedding metrics{% #embedding-metrics %}

| Metric Name                          | Description                                             | Metric Type  | Tags                                                                                                                           |
| ------------------------------------ | ------------------------------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| `ml_obs.span.embedding.input.tokens` | Number of input tokens used for generating an embedding | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `matched_model_name`, `matched_model_provider` |

### LLM cost metrics{% #llm-cost-metrics %}

{% alert level="info" %}
The unit for estimated cost metrics for LLM Observability is **nanodollars**.
{% /alert %}

| Metric Name                              | Description                                      | Metric Type  | Tags                                                                                                                                     |
| ---------------------------------------- | ------------------------------------------------ | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `ml_obs.span.llm.input.cost`             | Estimated input cost in an LLM span              | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.embedding.input.cost`       | Estimated input cost in an embedding span        | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.output.reasoning.cost`  | Estimated reasoning output cost in an LLM span   | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.output.cost`            | Estimated output cost in an LLM span             | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.total.cost`             | Estimated total cost in an LLM or embedding span | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.cache_write.cost` | Estimated cache write input cost in an LLM span  | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.cache_read.cost`  | Estimated cache read input cost in an LLM span   | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
| `ml_obs.span.llm.input.non_cached.cost`  | Estimated non-cached input cost in an LLM span   | Distribution | `env`, `error`, `ml_app`, `model_name`, `model_provider`, `service`, `version`, `source`, `matched_model_name`, `matched_model_provider` |
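
Because the cost metrics above are reported in nanodollars, values need to be divided by 10^9 before they read as dollar amounts. The following Python sketch shows that conversion. The function name `nanodollars_to_dollars` is hypothetical, and `Decimal` is used here only to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# One dollar is 1,000,000,000 nanodollars.
NANODOLLARS_PER_DOLLAR = Decimal(10) ** 9


def nanodollars_to_dollars(nanodollars) -> Decimal:
    """Convert an estimated cost value from nanodollars to dollars."""
    return Decimal(str(nanodollars)) / NANODOLLARS_PER_DOLLAR


if __name__ == "__main__":
    # Illustrative value only: 2,500,000 nanodollars is a quarter of a cent.
    print(nanodollars_to_dollars(2_500_000))
```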

### Trace metrics{% #trace-metrics %}

| Metric Name             | Description                                     | Metric Type  | Tags                                                        |
| ----------------------- | ----------------------------------------------- | ------------ | ----------------------------------------------------------- |
| `ml_obs.trace`          | Number of traces                                | Count        | `env`, `error`, `ml_app`, `service`, `span_kind`, `version` |
| `ml_obs.trace.duration` | Total duration of all traces across all spans   | Distribution | `env`, `error`, `ml_app`, `service`, `span_kind`, `version` |
| `ml_obs.trace.error`    | Number of errors that occurred during the trace | Count        | `env`, `error`, `ml_app`, `service`, `span_kind`, `version` |

### Estimated usage metrics{% #estimated-usage-metrics %}

| Metric Name                               | Description                           | Metric Type  | Tags                                                                        |
| ----------------------------------------- | ------------------------------------- | ------------ | --------------------------------------------------------------------------- |
| `ml_obs.estimated_usage.llm.input.tokens` | Estimated number of input tokens used | Distribution | `evaluation_name`, `ml_app`, `model_name`, `model_provider`, `model_server` |

### Deprecated metrics{% #deprecated-metrics %}

{% alert level="warning" %}
The following metrics are deprecated and are maintained only for backward compatibility. Datadog strongly recommends using the non-deprecated token metrics for all token usage measurements.
{% /alert %}

| Metric Name                                | Description                                  | Metric Type  | Tags                                                                        |
| ------------------------------------------ | -------------------------------------------- | ------------ | --------------------------------------------------------------------------- |
| `ml_obs.estimated_usage.llm.output.tokens` | Estimated number of output tokens generated  | Distribution | `evaluation_name`, `ml_app`, `model_name`, `model_provider`, `model_server` |
| `ml_obs.estimated_usage.llm.total.tokens`  | Total estimated tokens (input + output) used | Distribution | `evaluation_name`, `ml_app`, `model_name`, `model_provider`, `model_server` |

## Next steps{% #next-steps %}

- [Create a dashboard to track and correlate LLM Observability metrics](https://docs.datadoghq.com/dashboards.md)
- [Create a monitor for alerts and notifications](https://docs.datadoghq.com/monitors/create.md)

## Further reading{% #further-reading %}

- [Learn more about LLM Observability](https://docs.datadoghq.com/llm_observability.md)
- [Create and manage monitors to notify your teams when it matters](https://docs.datadoghq.com/monitors.md)
- [Track, compare, and optimize your LLM prompts with Datadog LLM Observability](https://www.datadoghq.com/blog/llm-prompt-tracking)
