Overview
The Agent Observability HTTP API provides an interface for developers to send LLM-related traces and spans to Datadog. If your application is written in Python, Node.js, or Java, you can use the Agent Observability SDKs.
The API accepts spans with timestamps no more than 24 hours old, allowing limited backfill of delayed data.
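As a rough illustration of the 24-hour limit, a client could check a span's age before submitting it. This is a minimal sketch that assumes the limit is measured against the span's start_ns value (nanoseconds since the Unix epoch); the helper name is hypothetical and not part of any SDK.

```python
import time

# 24 hours expressed in nanoseconds, the unit used by start_ns.
MAX_SPAN_AGE_NS = 24 * 60 * 60 * 1_000_000_000


def is_within_backfill_window(start_ns, now_ns=None):
    """Return True if a span that started at start_ns is at most 24 hours old.

    Assumption: the API compares the span start time against the current time.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns - start_ns <= MAX_SPAN_AGE_NS
```

Spans that fail this check can be dropped or logged locally instead of being sent.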
Spans API
Use this endpoint to send spans to Datadog. For details on the available kinds of spans, see Span Kinds.
{
  "data": {
    "type": "span",
    "attributes": {
      "ml_app": "weather-bot",
      "session_id": "1",
      "tags": [
        "service:weather-bot",
        "env:staging",
        "user_handle:example-user@example.com",
        "user_id:1234"
      ],
      "spans": [
        {
          "parent_id": "undefined",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<AGENT_SPAN_ID>",
          "name": "health_coach_agent",
          "meta": {
            "kind": "agent",
            "input": {"value": "What is the weather like today and do i wear a jacket?"},
            "output": {"value": "It's very hot and sunny, there is no need for a jacket"}
          },
          "start_ns": 1713889389104152000,
          "duration": 10000000000
        },
        {
          "parent_id": "<AGENT_SPAN_ID>",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<WORKFLOW_SPAN_ID>",
          "name": "qa_workflow",
          "meta": {
            "kind": "workflow",
            "input": {"value": "What is the weather like today and do i wear a jacket?"},
            "output": {"value": "It's very hot and sunny, there is no need for a jacket"}
          },
          "start_ns": 1713889389104152000,
          "duration": 5000000000
        },
        {
          "parent_id": "<WORKFLOW_SPAN_ID>",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<LLM_SPAN_ID>",
          "name": "generate_response",
          "meta": {
            "kind": "llm",
            "input": {
              "messages": [
                {"role": "system", "content": "Your role is to ..."},
                {"role": "user", "content": "What is the weather like today and do i wear a jacket?"}
              ]
            },
            "output": {
              "messages": [
                {"content": "It's very hot and sunny, there is no need for a jacket", "role": "assistant"}
              ]
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 2000000000
        }
      ]
    }
  }
}
Response
If the request is successful, the API responds with an HTTP 202 status code and an empty body.
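The request body above can be assembled programmatically. The following is a minimal sketch that only builds the JSON payload and does not send it. The helper names (new_id, make_span, make_payload) and the random-ID scheme are assumptions made for illustration, not part of the API.

```python
import json
import uuid


def new_id():
    # Hypothetical ID scheme: a random 64-bit integer rendered as a decimal string.
    return str(uuid.uuid4().int >> 64)


def make_span(name, kind, trace_id, parent_id, input_value, output_value,
              start_ns, duration_ns):
    """Build one span dictionary using the field names from the example request."""
    return {
        "parent_id": parent_id,  # "undefined" for a root span
        "trace_id": trace_id,
        "span_id": new_id(),
        "name": name,
        "meta": {
            "kind": kind,
            "input": {"value": input_value},
            "output": {"value": output_value},
        },
        "start_ns": start_ns,
        "duration": duration_ns,
    }


def make_payload(ml_app, spans, tags=None, session_id=None):
    """Wrap a list of spans in the top-level request envelope."""
    attributes = {"ml_app": ml_app, "spans": spans}
    if tags:
        attributes["tags"] = tags
    if session_id is not None:
        attributes["session_id"] = session_id
    return {"data": {"type": "span", "attributes": attributes}}


trace_id = new_id()
start = 1713889389104152000
agent = make_span("health_coach_agent", "agent", trace_id, "undefined",
                  "Do I need a jacket?", "No jacket needed.", start, 10_000_000_000)
workflow = make_span("qa_workflow", "workflow", trace_id, agent["span_id"],
                     "Do I need a jacket?", "No jacket needed.", start, 5_000_000_000)
body = json.dumps(make_payload("weather-bot", [agent, workflow],
                               tags=["env:staging"], session_id="1"))
```

Setting each child's parent_id from its parent's span_id keeps the hierarchy consistent, which is easy to get wrong when IDs are written by hand.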
API standards
Error
| Field | Type | Description |
| --- | --- | --- |
| message | string | The error message. |
| stack | string | The stack trace. |
| type | string | The error type. |
IO
| Field | Type | Description |
| --- | --- | --- |
| value | string | Input or output value. If not set, this value is inferred from messages or documents. |
Unique identifier matching the corresponding tool call.
type
string
The type of tool result.
ToolDefinition
| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the tool. |
| description | string | A description of what the tool does. |
| schema | Dict[key (string), value] | The schema defining the tool's parameters. |
SpanField
| Field | Type | Description |
| --- | --- | --- |
| kind | string | The kind of span field. |
Prompt
Agent Observability registers new versions of templates when the template or chat_template value is updated. If the input is expected to change between invocations, extract the dynamic parts into a variable.
| Field | Type | Description |
| --- | --- | --- |
| id | string | Logical identifier for this prompt template. Should be unique per ml_app. |
| name | string | Human-readable name for the prompt. |
| version | string | Version tag for the prompt (for example, "1.0.0"). If not provided, Agent Observability automatically generates a version by computing a hash of the template content. |
| template | string | Single string template form. Use placeholder syntax (like {{variable_name}}) to embed variables. This should not be set with chat_template. |
| chat_template | | Multi-message template form. Use placeholder syntax (like {{variable_name}}) to embed variables in message content. This should not be set with template. |
| variables | Dict[key (string), string] | Variables used to render the template. Keys correspond to placeholder names in the template. |
| query_variable_keys | [string] | Variable keys that contain the user query. Used for hallucination detection. |
| context_variable_keys | [string] | Variable keys that contain ground-truth or context content. Used for hallucination detection. |
| tags | Dict[key (string), string] | Tags to attach to the prompt run. |
{"id":"translation-prompt","chat_template":[{"role":"system","content":"You are a translation service. You translate to {{language}}."},{"role":"user","content":"{{user_input}}"}],"variables":{"language":"french","user_input":"<USER_INPUT_TEXT>"}}
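To show how the variables map onto the {{variable_name}} placeholders, the sketch below renders a prompt locally. It illustrates the placeholder convention only and does not describe how Agent Observability processes templates; fill and render_prompt are hypothetical helper names.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill(text, variables):
    """Replace each {{name}} in text with variables[name]; KeyError if missing."""
    return PLACEHOLDER.sub(lambda match: variables[match.group(1)], text)


def render_prompt(prompt):
    """Render either the single-string or the multi-message template form."""
    if "template" in prompt and "chat_template" in prompt:
        raise ValueError("set template or chat_template, not both")
    variables = prompt.get("variables", {})
    if "template" in prompt:
        return fill(prompt["template"], variables)
    return [
        {"role": message["role"], "content": fill(message["content"], variables)}
        for message in prompt["chat_template"]
    ]


prompt = {
    "id": "translation-prompt",
    "chat_template": [
        {"role": "system",
         "content": "You are a translation service. You translate to {{language}}."},
        {"role": "user", "content": "{{user_input}}"},
    ],
    "variables": {"language": "french", "user_input": "Good morning"},
}
messages = render_prompt(prompt)
```

Because only the variables change between invocations, the template text stays stable and does not produce a new template version on every call.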
Meta
| Field | Type | Description |
| --- | --- | --- |
| kind [required] | string | The span kind: "agent", "workflow", "llm", "tool", "task", "embedding", or "retrieval". |
A dictionary of metrics to collect for the span. The keys are metric names (strings) and values are metric values (float64 pointers). Common metrics include:
input_tokens - The number of input tokens (LLM spans)
output_tokens - The number of output tokens (LLM spans)
total_tokens - The total number of tokens (LLM spans)
non_cached_input_tokens - The number of non-cached input tokens (LLM spans)
cache_read_input_tokens - The number of cache read input tokens (LLM spans)
cache_write_input_tokens - The number of cache write input tokens (LLM spans)
reasoning_output_tokens - The number of reasoning tokens (LLM spans)
time_to_first_token - Time in seconds for first output token (streaming LLM, root spans)
time_per_output_token - Time in seconds per output token (streaming LLM, root spans)
input_cost - Input cost in dollars (LLM and embedding spans)
output_cost - Output cost in dollars (LLM spans)
total_cost - Total cost in dollars (LLM spans)
non_cached_input_cost - Non-cached input cost in dollars (LLM spans)
cache_read_input_cost - Cache read input cost in dollars (LLM spans)
cache_write_input_cost - Cache write input cost in dollars (LLM spans)
reasoning_output_cost - Reasoning output cost in dollars (LLM spans)
Type: Dict[key (string), float64]
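A metrics dictionary for an LLM span could be built as follows. This is a sketch under one stated assumption: that the totals are the sum of the input and output values. The source does not say whether totals must be supplied or how they are derived, and build_llm_metrics is a hypothetical helper.

```python
def build_llm_metrics(input_tokens, output_tokens, input_cost=None, output_cost=None):
    """Return a metrics dictionary with string keys and float values."""
    metrics = {
        "input_tokens": float(input_tokens),
        "output_tokens": float(output_tokens),
        # Assumption: total tokens is the sum of input and output tokens.
        "total_tokens": float(input_tokens + output_tokens),
    }
    if input_cost is not None and output_cost is not None:
        # Costs are expressed in dollars.
        metrics["input_cost"] = float(input_cost)
        metrics["output_cost"] = float(output_cost)
        metrics["total_cost"] = float(input_cost) + float(output_cost)
    return metrics
```

Converting every value to float keeps the dictionary consistent with the Dict[key (string), float64] type.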
Span
| Field | Type | Description |
| --- | --- | --- |
| name [required] | string | The name of the span. |
| span_id [required] | string | An ID unique to the span. |
| trace_id [required] | string | A unique ID shared by all spans in the same trace. |
| parent_id [required] | string | ID of the span's direct parent. If the span is a root span, the parent_id must be the string "undefined", as in the Spans API request example. |
The session the list of spans belongs to. Can be overridden or set on individual spans as well.
Tag
Tags should be formatted as a list of strings (for example, ["user_handle:dog@gmail.com", "app_version:1.0.0"]). They are meant to store contextual information surrounding the span.
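If an application keeps its tags as key-value pairs, they can be flattened into this list-of-strings form. This is a minimal sketch, and format_tags is a hypothetical helper name.

```python
def format_tags(tags):
    """Turn a dict of tag keys and values into a list of "key:value" strings."""
    return [f"{key}:{value}" for key, value in tags.items()]


tags = format_tags({"user_handle": "dog@gmail.com", "app_version": "1.0.0"})
```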
Your application name (the value of DD_LLMOBS_ML_APP) must be a lowercase Unicode string. It may contain the characters listed below:
Alphanumerics
Underscores
Minuses
Colons
Periods
Slashes
The name can be up to 193 characters long and may not contain contiguous or trailing underscores.
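The naming rules above can be checked before sending data. The sketch below is one possible reading of those rules, using Python's str.isalnum for Unicode alphanumerics and the hyphen-minus character for "minuses". validate_ml_app is a hypothetical helper, not an official validator.

```python
ALLOWED_PUNCTUATION = set("_-:./")
MAX_LENGTH = 193


def validate_ml_app(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if name != name.lower():
        problems.append("contains uppercase characters")
    bad = sorted({ch for ch in name
                  if not (ch.isalnum() or ch in ALLOWED_PUNCTUATION)})
    if bad:
        problems.append(f"contains disallowed characters: {bad}")
    if "__" in name:
        problems.append("contains contiguous underscores")
    if name.endswith("_"):
        problems.append("ends with an underscore")
    return problems
```

Returning every violation at once, rather than stopping at the first, makes it easier to correct a name in a single pass.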
Evaluations API
For comprehensive examples and guidance on building custom evaluators, see the Evaluation Developer Guide.
Use this endpoint to send evaluations to Datadog at the span, trace, or session level.
Endpoint
https://api./api/intake/llm-obs/v2/eval-metric
Method
POST
Use the eval_scope field to set the granularity of the evaluation:
span (default): The evaluation is associated with a specific span. Use join_on to identify the target span with a tag key-value pair or a span ID and trace ID combination.
trace: The evaluation is associated with an entire trace. Use join_on to identify the root span of the trace.
session: The evaluation is associated with a session. Provide session_id instead of join_on.
{"data":{"type":"evaluation_metric","attributes":{"metrics":[{"eval_scope":"span","join_on":{"span":{"span_id":"20245611112024561111","trace_id":"13932955089405749200"}},"ml_app":"weather-bot","timestamp_ms":1609459200,"metric_type":"categorical","label":"Sentiment","categorical_value":"Positive"},{"eval_scope":"trace","join_on":{"span":{"span_id":"20245611112024561111","trace_id":"13932955089405749200"}},"ml_app":"weather-bot","timestamp_ms":1609479200,"metric_type":"score","label":"Accuracy","score_value":3,"assessment":"fail","reasoning":"The response provided incorrect information about the weather forecast."},{"eval_scope":"session","session_id":"abc123def456","ml_app":"weather-bot","timestamp_ms":1609479200,"metric_type":"boolean","label":"Topic Relevancy","boolean_value":true},{"eval_scope":"span","join_on":{"tag":{"key":"msg_id","value":"1123132"}},"ml_app":"weather-bot","timestamp_ms":1609479200,"metric_type":"json","label":"Custom Evaluation","json_value":{"verdict":"pass","confidence":0.95,"is_valid":true,"metrics":{"accuracy":0.92,"precision":0.88},"passed_checks":["coherence","relevance","factuality"]}}]}}}
{"data":{"type":"evaluation_metric","id":"456f4567-e89b-12d3-a456-426655440000","attributes":{"metrics":[{"id":"d4f36434-f0cd-47fc-884d-6996cee26da4","eval_scope":"span","join_on":{"span":{"span_id":"20245611112024561111","trace_id":"13932955089405749200"}},"ml_app":"weather-bot","timestamp_ms":1609459200,"metric_type":"categorical","label":"Sentiment","categorical_value":"Positive"},{"id":"cdfc4fc7-e2f6-4149-9c35-edc4bbf7b525","eval_scope":"trace","join_on":{"span":{"span_id":"20245611112024561111","trace_id":"13932955089405749200"}},"ml_app":"weather-bot","timestamp_ms":1609479200,"metric_type":"score","label":"Accuracy","score_value":3,"assessment":"fail","reasoning":"The response provided incorrect information about the weather forecast."},{"id":"haz3fc7-g3p2-1s37-8m12-ndk4hbf7a522","eval_scope":"session","session_id":"abc123def456","ml_app":"weather-bot","timestamp_ms":1609479200,"metric_type":"boolean","label":"Topic Relevancy","boolean_value":true},{"id":"abc1234-h4i5-6j78-9k01-lmn2opq3rst4","eval_scope":"span","join_on":{"tag":{"key":"msg_id","value":"1123132"}},"ml_app":"weather-bot","timestamp_ms":1609479200,"metric_type":"json","label":"Custom Evaluation","json_value":{"verdict":"pass","confidence":0.95,"is_valid":true,"metrics":{"accuracy":0.92,"precision":0.88},"passed_checks":["coherence","relevance","factuality"]}}]}}}
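The scope rules can be encoded in a small builder so that each metric carries join_on or session_id as appropriate. This is a sketch based on the field names in the examples above. build_eval_metric, join_on_span, and join_on_tag are hypothetical helpers, and the code only constructs the metric dictionary without sending it.

```python
import time

# Maps each metric_type seen in the examples to the field that holds its value.
VALUE_FIELDS = {
    "categorical": "categorical_value",
    "score": "score_value",
    "boolean": "boolean_value",
    "json": "json_value",
}


def join_on_span(span_id, trace_id):
    return {"span": {"span_id": span_id, "trace_id": trace_id}}


def join_on_tag(key, value):
    return {"tag": {"key": key, "value": value}}


def build_eval_metric(label, metric_type, value, ml_app, eval_scope="span",
                      join_on=None, session_id=None, timestamp_ms=None):
    """Build one entry for the "metrics" list of an evaluation request."""
    if metric_type not in VALUE_FIELDS:
        raise ValueError(f"unknown metric_type: {metric_type}")
    metric = {
        "eval_scope": eval_scope,
        "ml_app": ml_app,
        "timestamp_ms": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "metric_type": metric_type,
        "label": label,
        VALUE_FIELDS[metric_type]: value,
    }
    if eval_scope == "session":
        if session_id is None:
            raise ValueError("session scope requires session_id")
        metric["session_id"] = session_id
    elif eval_scope in ("span", "trace"):
        if join_on is None:
            raise ValueError(f"{eval_scope} scope requires join_on")
        metric["join_on"] = join_on
    else:
        raise ValueError(f"unknown eval_scope: {eval_scope}")
    return metric


def build_eval_payload(metrics):
    return {"data": {"type": "evaluation_metric", "attributes": {"metrics": metrics}}}
```

For example, build_eval_metric("Topic Relevancy", "boolean", True, "weather-bot", eval_scope="session", session_id="abc123def456") produces the session-level entry shown in the request example, apart from the timestamp.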
Uniquely identifies the span associated with this evaluation using a tag key-value pair.
SpanContext
| Field | Type | Description |
| --- | --- | --- |
| span_id [required] | string | The span ID of the span that this evaluation is associated with. Must be a decimal string (for example, "20245611112024561111"). If your instrumentation produces hexadecimal span IDs (such as OpenTelemetry), convert them to decimal before submitting. |
| trace_id [required] | string | The trace ID of the span that this evaluation is associated with. Must be a decimal string (for example, "13932955089405749200") or a 32-character lowercase hexadecimal string for 128-bit trace IDs. |
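The hexadecimal-to-decimal conversion for span IDs, and a format check for trace IDs, can be sketched as follows. The function names are hypothetical, and the trace ID check only encodes the two formats described above.

```python
import re


def hex_span_id_to_decimal(hex_id):
    """Convert a hexadecimal span ID (for example, from OpenTelemetry) to a decimal string."""
    return str(int(hex_id, 16))


def is_valid_trace_id(trace_id):
    """Accept a decimal string or a 32-character lowercase hexadecimal string."""
    return bool(re.fullmatch(r"[0-9]+", trace_id)
                or re.fullmatch(r"[0-9a-f]{32}", trace_id))


decimal_id = hex_span_id_to_decimal("ffffffffffffffff")  # "18446744073709551615"
```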
TagContext
| Field | Type | Description |
| --- | --- | --- |
| key [required] | string | The tag key name. This must be the same key used when setting the tag on the span. |
| value [required] | string | The tag value. This value must match exactly one span with the specified tag key/value pair. |
EvalMetricsRequestData
| Field | Type | Description |
| --- | --- | --- |
| type [required] | string | Identifier for the request. Set to evaluation_metric. |