This product is not supported for your selected
Datadog site. (
).
Agent Observability automatically calculates an estimated cost for each LLM request, using providers’ public pricing models and token counts annotated on LLM/embedding spans.
By aggregating this information across traces and applications, you can gain insights into the user patterns of your LLM models and their impact on overall spending.
Use cases:
- See and understand where LLM spend is coming from, at the model, request, and application levels
- Track changes in token usage and cost over time to proactively guard against higher bills in the future
- Correlate LLM cost with overall application performance, model versions, model providers, and prompt details in a single view
Setting up cost monitoring
Datadog provides two ways to monitor your LLM costs:
Automatic
If your LLM requests involve any of the listed supported providers, Datadog automatically calculates the cost of each request based on the following:
- Token counts attached to the LLM/embedding span, provided by either auto-instrumentation or manual user annotation
- The model provider’s public pricing rates
Manual
To manually supply cost information, follow the instrumentation steps described in the SDK Reference or API. When setting your costs up manually (e.g. setting total_cost), the unit must be in dollar units; however, the unit will be stored as nanodollars.
If you provide partial cost information, Datadog tries to estimate missing information. For example, if you supply a total cost but not input/output cost values, Datadog uses provider pricing and token values annotated on your span to compute the input/output cost values. This can cause a mismatch between your manually provided total cost and the sum of Datadog’s computed input/output costs. Datadog always displays your provided total cost as-is, even if these values differ.
How token counts are calculated
The following relationships apply to token usage fields:
When all sub-components of a token field are provided, Datadog automatically calculates the parent token field, so it does not need to be sent separately.
For example:
input_tokens can be inferred from non_cached_input_tokens, cache_read_input_tokens, and cache_write_input_tokenstotal_tokens can be inferred from input_tokens and output_tokens
Datadog recommends providing the full cache token breakdown whenever cache usage information is available. Datadog automatically applies provider-specific cache read and cache write pricing rates when calculating cached input costs.
If only input_tokens is provided, Datadog treats all input tokens as non-cached input tokens. Providing only a partial cache breakdown may result in inaccurate token counts or cost discrepancies.
Supported providers
Datadog automatically calculates the cost of LLM requests made to the following supported providers using the publicly available pricing information from their official documentation.
Datadog only supports monitoring costs for text-based models.
Datadog supports estimated costs for 800+ models, from OpenAI, Hugging Face, Gemini, Anthropic to models served by OpenRouter.
Metrics
You can find cost metrics in Agent Observability Metrics. The unit for Agent Observability estimated cost metrics is nanodollars.
The cost metrics include a source tag to indicate where the value originated:
source:auto — automatically calculatedsource:user — manually provided
Datadog’s LLM cost and token metrics ship with OOTB tags such as model_name, model_provider, and ml_app. To break down LLM spend by attributes specific to your application — for example team, customer tier or feature, you can promote a subset of your span tags as custom tags on the generated cost and token metrics.
Custom tags apply only to LLM cost (ml_obs.span.* cost) and tokens (ml_obs.span.* tokens) metrics. Other span metrics (such as ml_obs.span.duration) are not affected.
How it works
- Tag your LLM spans with the attributes you want to break spend down by, using the SDK’s
tags parameter (or auto-instrumentation) - In the same annotation, mark which of those tag keys should propagate to cost metrics by passing the keys to the
cost_tags (Python) or costTags (Node.js) parameter. See the SDK Reference for the full instrumentation syntax and timing rules - Once propagated, those tag keys appear as tags on the LLM cost and token metrics. You can use them anywhere in Datadog that consumes these metrics — dashboards, monitors, notebooks, and trace search
Only configure tags with bounded values (such as team, customer_tier, or feature). Tags with unbounded or high-cardinality values (such as user IDs or request IDs) are not fully supported and may be truncated or omitted from the resulting metrics.
Use case: Filter and group spend by an application attribute
Use custom tags to break down spend by attributes that aren’t part of the OOTB metric tags. Build a dashboard widget of ml_obs.span.llm.total.cost grouped by team to see which teams’ LLM usage is driving consumption over time, and configure a metric monitor on the same query to alert independently for each team (or customer tier, feature, environment) when usage crosses a threshold.
Use case: Track prompt caching effectiveness
Many LLM providers cache reused prompt prefixes to reduce repeated processing. Tagging spans with a prompt identifiers such as “prompt_version” or “prompt_template_id,” which are then broken down by theese tags, allows you to:
- Compare prompt variants by how effectively they reuse provider caching. For example, tracking the cache hit rate using the formula
ml_obs.span.llm.input.cache_read.tokens / ml_obs.span.llm.input.cache_write.tokens - Identify prompts that populate the cache but rarely reuse it, which is usually a sign of unstable prefixes or aggressive invalidation
- Spot prompts that miss caching entirely because the prefix is too short to qualify or changes with every call.
- Alert on caching regressions in token metrics before they result in increased costs
View costs in Agent Observability
View your app in Agent Observability and select Cost on the left. The Cost view features:
- A high-level overview of your LLM usage over time including Total Cost, Cost Change, Total Tokens, and Token Change
- Breakdown by Token Type: A breakdown of token usage, along with associated costs
- Breakdown by Provider/Model or Prompt ID/Version: Cost and token usage broken down by LLM provider and model, or by individual prompts or prompt versions (powered by Prompt Tracking)
- Most Expensive LLM Calls: A list of your most expensive requests
Cost data is also available within your application’s traces and spans, allowing you to understand cost at both the request (trace) and operation level (span).
Click any trace or span to open a detailed side-panel view that includes cost metrics for the full trace and for each individual LLM call.
At the top of the trace view, the banner shows aggregated cost information for the full trace, including estimated cost and total tokens. Hovering over these values reveals a breakdown of input and output token/costs.
Selecting an individual LLM span shows similar cost metrics specific to that LLM request.
To query cost-related data in Traces page, use the left side Cost facets.
Alternatively, query the following span attributes directly:
@metrics.input_tokens / @metrics.estimated_input_cost@metrics.output_tokens / @metrics.estimated_output_cost@metrics.total_tokens / @metrics.estimated_total_cost@metrics.non_cached_input_tokens / @metrics.estimated_non_cached_input_cost@metrics.cache_read_input_tokens / @metrics.estimated_cache_read_input_cost@metrics.cache_write_input_tokens / @metrics.estimated_cache_write_input_cost@metrics.reasoning_output_tokens / @metrics.estimated_reasoning_output_cost
Troubleshoot LLM cost monitoring
If LLM cost values are missing from a trace or appear inaccurate, see the following sections for common causes and fixes.
Trace shows “PARTIAL COST” or “COST UNAVAILABLE”
Each LLM span in a trace is priced independently. The warning appears when one or more LLM spans in the trace are missing a cost estimate:
- PARTIAL COST: Datadog computed costs for some LLM spans in the trace, but not the rest. The Estimated Cost value reflects the sum of only the spans for which cost was computed.
- COST UNAVAILABLE: Datadog could not compute cost for any LLM span in the trace.
Hover over the warning to see the number of affected spans, the reason cost is missing, and the names of the affected spans. The reasons fall into one of the following categories.
Unsupported model provider
The LLM model provider on the span is not in the list of supported providers, so Datadog has no public pricing rates to apply.
How to fix:
Unsupported model name
The LLM provider is supported, but the model_name on the span does not match any entry in the provider’s pricing catalog. This usually happens when the model name is misspelled, abbreviated, or uses an internal alias.
How to fix:
- Update the span’s
model_name to match the official name in the provider’s pricing catalog. - For privately deployed or customized variants of a supported model, manually supply cost values on the span. See the SDK Reference or API.
Missing token counts
The span has no input_tokens or output_tokens annotation, so Datadog has no token values for cost calculation. (For embedding spans, only input_tokens is required.)
How to fix:
- For auto-instrumentation, confirm your provider and SDK version are supported.
- For manual instrumentation, annotate input and output token counts on the span. See the SDK Reference or API.
Inaccurate cost when prompt caching is enabled
If a span includes only aggregate input_tokens and output_tokens, but is missing cache_read_input_tokens, cache_write_input_tokens, or non_cached_input_tokens, Datadog applies the standard input rate to the entire input_tokens.
For providers that support prompt caching, cache reads, cache writes, and non-cached inputs are billed at different rates. Without this breakdown, the input cost calculation may be inaccurate.
How to fix:
- For auto-instrumentation, verify that your provider and SDK version emit cache-specific token counts.
- For manual instrumentation, annotate
cache_read_input_tokens, cache_write_input_tokens, and non_cached_input_tokens in addition to input_tokens. See the SDK Reference or API.