Overview

The Agent Observability HTTP API provides an interface for developers to send LLM-related traces and spans to Datadog. If your application is written in Python, Node.js, or Java, you can use the Agent Observability SDKs.

The API accepts spans with timestamps no more than 24 hours old, allowing limited backfill of delayed data.
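As an illustration of the 24-hour limit, the following minimal sketch checks a span's start time before submission. It assumes the limit is measured against the span's `start_ns` value; the helper name `is_within_backfill_window` is hypothetical and not part of any Datadog SDK.

```python
import time

# 24 hours expressed in nanoseconds, matching the unit of start_ns.
MAX_SPAN_AGE_NS = 24 * 60 * 60 * 1_000_000_000


def is_within_backfill_window(start_ns, now_ns=None):
    """Return True if start_ns is no more than 24 hours before now_ns.

    Assumption: the API measures span age from start_ns.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns - start_ns <= MAX_SPAN_AGE_NS
```

A client could run this check before queuing delayed spans and drop or log the ones that are too old.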

Spans API

Use this endpoint to send spans to Datadog. For details on the available kinds of spans, see Span Kinds.

Endpoint
https://api./api/intake/llm-obs/v1/trace/spans
Method
POST

Request

Headers (required)

  • DD-API-KEY=<YOUR_DATADOG_API_KEY>
  • Content-Type="application/json"

Body data (required)

| Field | Type | Description |
| --- | --- | --- |
| data [required] | SpansRequestData | Entry point into the request body. |
{
  "data": {
    "type": "span",
    "attributes": {
      "ml_app": "weather-bot",
      "session_id": "1",
      "feedback_join_key": "weather-request-123",
      "tags": [
        "service:weather-bot",
        "env:staging",
        "user_handle:example-user@example.com",
        "user_id:1234"
      ],
      "spans": [
        {
          "parent_id": "undefined",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<AGENT_SPAN_ID>",
          "name": "health_coach_agent",
          "meta": {
            "kind": "agent",
            "input": {
              "value": "What is the weather like today and do i wear a jacket?"
            },
            "output": {
              "value": "It's very hot and sunny, there is no need for a jacket"
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 10000000000
        },
        {
          "parent_id": "<AGENT_SPAN_ID>",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<WORKFLOW_SPAN_ID>",
          "name": "qa_workflow",
          "meta": {
            "kind": "workflow",
            "input": {
              "value": "What is the weather like today and do i wear a jacket?"
            },
            "output": {
              "value":  "It's very hot and sunny, there is no need for a jacket"
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 5000000000
        },
        {
          "parent_id": "<WORKFLOW_SPAN_ID>",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<LLM_SPAN_ID>",
          "name": "generate_response",
          "meta": {
            "kind": "llm",
            "input": {
              "messages": [
                {
                  "role": "system",
                  "content": "Your role is to ..."
                },
                {
                  "role": "user",
                  "content": "What is the weather like today and do i wear a jacket?"
                }
              ]
            },
            "output": {
              "messages": [
                {
                  "content": "It's very hot and sunny, there is no need for a jacket",
                  "role": "assistant"
                }
              ]
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 2000000000
        }
      ]
    }
  }
}

Response

If the request is successful, the API responds with an HTTP 202 status code and an empty body.
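The request described above could be sent with Python's standard library, as in the following sketch. The function names are hypothetical, and the endpoint URL is passed in by the caller rather than assembled here, because it depends on your Datadog site.

```python
import json
import urllib.request


def build_spans_request(intake_url, api_key, payload):
    """Build a POST request carrying the two required headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        intake_url,
        data=body,
        headers={"DD-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_spans(intake_url, api_key, payload, timeout=10):
    """Send the payload and return True if the API answered with 202.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_spans_request(intake_url, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 202
```

Splitting request construction from sending makes it possible to inspect the headers and body without making a network call.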

API standards

Error

| Field | Type | Description |
| --- | --- | --- |
| message | string | The error message. |
| stack | string | The stack trace. |
| type | string | The error type. |

IO

| Field | Type | Description |
| --- | --- | --- |
| value | string | Input or output value. If not set, this value is inferred from messages or documents. |
| messages | [Message] | List of messages. Use only for LLM spans. |
| documents | [Document] | List of documents. Use only as output for retrieval spans. |
| prompt | Prompt | Structured prompt metadata that includes the template and variables used for the LLM input. This should only be used for input IO on LLM spans. |
| embedding | [float] | List of embedding values. |
| parameters | Dict[key (string), value] | Additional parameters for the input or output. |

Note: When only input.messages is set for an LLM span, Datadog infers input.value from input.messages and uses the following inference logic:

  1. If a message with role=user exists, the content of the last message is used as input.value.
  2. If a user role message is not present, input.value is inferred by concatenating the content fields of all messages, regardless of their roles.
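The two rules above can be sketched as follows. This is only an illustration of the documented behavior, not Datadog's implementation: it reads rule 1 literally (the last message in the list), and the `separator` parameter and the empty-list result are assumptions, since the page does not specify them.

```python
def infer_input_value(messages, separator=""):
    """Approximate how input.value is inferred from input.messages.

    Assumptions: rule 1 uses the final message in the list, and
    concatenation joins contents with `separator` (empty by default).
    """
    if not messages:
        return ""
    # Rule 1: a user message exists, so use the last message's content.
    if any(message.get("role") == "user" for message in messages):
        return messages[-1].get("content", "")
    # Rule 2: no user message, so concatenate every content field.
    return separator.join(message.get("content", "") for message in messages)
```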

Message

| Field | Type | Description |
| --- | --- | --- |
| content [required] | string | The body of the message. |
| role | string | The role of the entity. |
| tool_calls | [ToolCall] | List of tool calls made in this message. |
| tool_results | [ToolResult] | List of tool execution results in this message. |

Document

| Field | Type | Description |
| --- | --- | --- |
| text | string | The text of the document. |
| name | string | The name of the document. |
| score | float | The score associated with this document. |
| id | string | The id of this document. |
| ranking | integer | The ranking of this document. |
| metadata | Dict[key (string), value] | Additional metadata for this document. |

ToolCall

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the tool being called. |
| arguments | Dict[key (string), value] | The arguments passed to the tool. |
| tool_id | string | Unique identifier for this tool call. |
| type | string | The type of tool call. |

ToolResult

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the tool that was called. |
| result | string | The result returned by the tool. |
| tool_id | string | Unique identifier matching the corresponding tool call. |
| type | string | The type of tool result. |

ToolDefinition

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the tool. |
| description | string | A description of what the tool does. |
| schema | Dict[key (string), value] | The schema defining the tool’s parameters. |

SpanField

| Field | Type | Description |
| --- | --- | --- |
| kind | string | The kind of span field. |

Prompt

Agent Observability registers new versions of templates when the template or chat_template value is updated. If the input is expected to change between invocations, extract the dynamic parts into a variable.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Logical identifier for this prompt template. Should be unique per ml_app. |
| name | string | Human-readable name for the prompt. |
| version | string | Version tag for the prompt (for example, “1.0.0”). If not provided, Agent Observability automatically generates a version by computing a hash of the template content. |
| template | string | Single string template form. Use placeholder syntax (like {{variable_name}}) to embed variables. This should not be set with chat_template. |
| chat_template | [Message] | Multi-message template form. Use placeholder syntax (like {{variable_name}}) to embed variables in message content. This should not be set with template. |
| variables | Dict[key (string), string] | Variables used to render the template. Keys correspond to placeholder names in the template. |
| query_variable_keys | [string] | Variable keys that contain the user query. Used for hallucination detection. |
| context_variable_keys | [string] | Variable keys that contain ground-truth or context content. Used for hallucination detection. |
| tags | Dict[key (string), string] | Tags to attach to the prompt run. |

{
  "id": "translation-prompt",
  "chat_template": [
    {
      "role": "system",
      "content": "You are a translation service. You translate to {{language}}."
    }, {
      "role": "user",
      "content": "{{user_input}}"
    }
  ],
  "variables": {
    "language": "french",
    "user_input": "<USER_INPUT_TEXT>"
  }
}
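
To see how the `variables` map relates to the `{{variable_name}}` placeholders in a `chat_template`, the following sketch renders a template on the client side. It is not Datadog's rendering logic: the function name is hypothetical, and tolerating whitespace inside the braces and leaving unknown placeholders untouched are both assumptions.

```python
import re

# Matches {{name}}; allowing spaces inside the braces is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_chat_template(chat_template, variables):
    """Return a copy of chat_template with placeholders filled in."""

    def substitute(text):
        # Unknown placeholders are left as-is rather than raising.
        return PLACEHOLDER.sub(
            lambda match: variables.get(match.group(1), match.group(0)), text
        )

    return [
        {**message, "content": substitute(message["content"])}
        for message in chat_template
    ]
```

Keeping the dynamic text in `variables` and out of the template means the template string stays stable between invocations.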

Meta

| Field | Type | Description |
| --- | --- | --- |
| kind [required] | string | The span kind: "agent", "workflow", "llm", "tool", "task", "embedding", or "retrieval". |
| error | Error | Error information on the span. |
| input | IO | The span’s input information. |
| output | IO | The span’s output information. |
| metadata | Dict[key (string), value] where the value is a float, bool, or string | Data about the span that is not input or output related. For example, you can pass temperature and max_tokens for LLM spans. |
| model_name | string | The name of the model used for LLM spans. |
| model_provider | string | The provider of the model used for LLM spans. |
| model_version | string | The version of the model used for LLM spans. |
| embedding_for_prompt_idx | integer | The prompt index for which embeddings were computed. |
| span | SpanField | Span field information. |
| tool_definitions | [ToolDefinition] | List of available tool definitions. |
| expected_output | IO | The expected output information. |
| intent | string | The intent of the span. |

Metrics

A dictionary of metrics to collect for the span. The keys are metric names (strings) and values are metric values (float64 pointers). Common metrics include:

  • input_tokens - The number of input tokens (LLM spans)
  • output_tokens - The number of output tokens (LLM spans)
  • total_tokens - The total number of tokens (LLM spans)
  • non_cached_input_tokens - The number of non-cached input tokens (LLM spans)
  • cache_read_input_tokens - The number of cache read input tokens (LLM spans)
  • cache_write_input_tokens - The number of cache write input tokens (LLM spans)
  • reasoning_output_tokens - The number of reasoning tokens (LLM spans)
  • time_to_first_token - Time in seconds for first output token (streaming LLM, root spans)
  • time_per_output_token - Time in seconds per output token (streaming LLM, root spans)
  • input_cost - Input cost in dollars (LLM and embedding spans)
  • output_cost - Output cost in dollars (LLM spans)
  • total_cost - Total cost in dollars (LLM spans)
  • non_cached_input_cost - Non-cached input cost in dollars (LLM spans)
  • cache_read_input_cost - Cache read input cost in dollars (LLM spans)
  • cache_write_input_cost - Cache write input cost in dollars (LLM spans)
  • reasoning_output_cost - Reasoning output cost in dollars (LLM spans)

Type: Dict[key (string), float64]
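
As a small illustration of the metrics dictionary, the following sketch builds the three basic token metrics for an LLM span. The helper name is hypothetical, and deriving `total_tokens` as the sum of input and output tokens is an assumption; use the total reported by your model provider if it differs.

```python
def build_token_metrics(input_tokens, output_tokens):
    """Return a metrics dict with float values for an LLM span.

    Assumption: total_tokens is input_tokens plus output_tokens.
    """
    return {
        "input_tokens": float(input_tokens),
        "output_tokens": float(output_tokens),
        "total_tokens": float(input_tokens + output_tokens),
    }
```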

Span

| Field | Type | Description |
| --- | --- | --- |
| name [required] | string | The name of the span. |
| span_id [required] | string | An ID unique to the span. |
| trace_id [required] | string | A unique ID shared by all spans in the same trace. |
| parent_id [required] | string | ID of the span’s direct parent. If the span is a root span, the parent_id must be undefined. |
| start_ns [required] | uint64 | The span’s start time in nanoseconds. |
| duration [required] | float64 | The span’s duration in nanoseconds. |
| meta [required] | Meta | The core content relative to the span. |
| status | string | Error status ("ok" or "error"). Defaults to "ok". |
| apm_trace_id | string | The ID of the associated APM trace. Defaults to match the trace_id field. |
| metrics | Dict[key (string), float64] | Datadog metrics to collect. See Metrics for common metric names. |
| session_id | string | The span’s session_id. Overrides the top-level session_id field. |
| feedback_join_key | string | A customer-defined key used to connect feedback to this span. Overrides the top-level feedback_join_key field. For details, see End-User Feedback. |
| tags | [Tag] | A list of tags to apply to this particular span. |
| service | string | The service name. |
| ml_app | string | The LLM application name for this span. Overrides the top-level ml_app field. |
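
The required span fields can be assembled as in the following sketch. The helper name `build_span` is hypothetical; it follows the request example on this page by writing the string "undefined" as the `parent_id` of a root span, and it derives `duration` from start and end times that the caller supplies.

```python
# Root spans use the literal string "undefined", as in the request example.
ROOT_PARENT_ID = "undefined"


def build_span(name, kind, trace_id, span_id, start_ns, end_ns, parent_id=None):
    """Return a dict containing only the required Span fields."""
    return {
        "name": name,
        "span_id": span_id,
        "trace_id": trace_id,
        "parent_id": parent_id if parent_id is not None else ROOT_PARENT_ID,
        "start_ns": start_ns,
        # Both values are in nanoseconds, so the difference is too.
        "duration": end_ns - start_ns,
        "meta": {"kind": kind},
    }
```

In practice, `start_ns` and `end_ns` could come from `time.time_ns()` calls placed around the operation being traced.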

SpansRequestData

| Field | Type | Description |
| --- | --- | --- |
| type [required] | string | Identifier for the request. Set to span. |
| attributes [required] | SpansPayload | The body of the request. |

SpansPayload

| Field | Type | Description |
| --- | --- | --- |
| ml_app [required] | string | The name of your LLM application. See Application naming guidelines. |
| spans [required] | [Span] | A list of spans. |
| tags | [Tag] | A list of top-level tags to apply to each span. |
| session_id | string | The session the list of spans belongs to. Can be overridden or set on individual spans as well. |
| feedback_join_key | string | A customer-defined key used to connect feedback to the spans in the payload. Can be overridden or set on individual spans as well. For details, see End-User Feedback. |

Tag

Tags should be formatted as a list of strings (for example, ["user_handle:dog@gmail.com", "app_version:1.0.0"]). They are meant to store contextual information surrounding the span.

For more information about tags, see Getting Started with Tags.
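
If your application keeps tags as key-value pairs, a conversion to the list-of-strings format might look like the following sketch. The helper name `to_tag_list` is hypothetical.

```python
def to_tag_list(tags):
    """Convert a dict of tags into the "key:value" string list format."""
    return [f"{key}:{value}" for key, value in tags.items()]
```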

Application naming guidelines

Your application name (the value of DD_LLMOBS_ML_APP) must be a lowercase Unicode string. It may contain the characters listed below:

  • Alphanumerics
  • Underscores
  • Minuses
  • Colons
  • Periods
  • Slashes

The name can be up to 193 characters long and may not contain contiguous or trailing underscores.
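
The naming rules above can be checked on the client before sending a request, as in the following sketch. The function name is hypothetical, rejecting an empty name is an assumption, and the API remains the authority on what it accepts.

```python
# Punctuation permitted alongside alphanumeric characters.
ALLOWED_PUNCTUATION = set("_-:./")
MAX_NAME_LENGTH = 193


def is_valid_ml_app_name(name):
    """Return True if name satisfies the documented naming guidelines."""
    # Assumption: an empty name is invalid.
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    # The name must be lowercase.
    if name != name.lower():
        return False
    # No contiguous or trailing underscores.
    if "__" in name or name.endswith("_"):
        return False
    # str.isalnum() also accepts non-ASCII Unicode letters and digits.
    return all(ch.isalnum() or ch in ALLOWED_PUNCTUATION for ch in name)
```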

Evaluations API

For comprehensive examples and guidance on building custom evaluators, see the Evaluation Developer Guide.

Use this endpoint to send evaluations and end-user feedback to Datadog. Evaluations can be associated with spans, traces, or sessions. End-user feedback can be associated with spans, traces, sessions, or a customer-defined feedback join key.

Endpoint
https://api./api/intake/llm-obs/v2/eval-metric
Method
POST

Use the eval_scope field to set the granularity of an evaluation:

  • span (default): The evaluation is associated with a specific span. Use join_on to identify the target span with a tag key-value pair or a span ID and trace ID combination.
  • trace: The evaluation is associated with an entire trace. Use join_on to identify the root span of the trace.
  • session: The evaluation is associated with a session. Provide session_id instead of join_on.
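
The three scopes differ only in how the target is identified, which the following sketch makes explicit. The helper name `evaluation_target` and its parameters are hypothetical; the returned fields are meant to be merged into an evaluation alongside `label`, `metric_type`, and the other required fields.

```python
def evaluation_target(eval_scope="span", span_id=None, trace_id=None,
                      tag=None, session_id=None):
    """Return the eval_scope and target fields for an evaluation.

    `tag` is a (key, value) pair; span_id and trace_id are used together.
    """
    if eval_scope == "session":
        # Session evaluations use session_id instead of join_on.
        if session_id is None:
            raise ValueError("session scope requires session_id")
        return {"eval_scope": "session", "session_id": session_id}
    if eval_scope not in ("span", "trace"):
        raise ValueError(f"unsupported eval_scope: {eval_scope!r}")
    if tag is not None:
        key, value = tag
        join_on = {"tag": {"key": key, "value": value}}
    elif span_id is not None and trace_id is not None:
        join_on = {"span": {"span_id": span_id, "trace_id": trace_id}}
    else:
        raise ValueError("span and trace scopes require a tag or a span_id and trace_id")
    return {"eval_scope": eval_scope, "join_on": join_on}
```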

To submit feedback, set event_kind to feedback. Feedback events must include submitter.id, omit join_on, and provide exactly one target field: span_id, trace_id, session_id, or feedback_join_key. If eval_scope is omitted, Datadog infers it from the target field.

Use feedback_join_key when feedback applies to an external entity, such as an incident ID, report ID, task ID, or release check ID, instead of a single span, trace, or session. To make the feedback appear with related telemetry, set the same feedback_join_key on the related spans when you submit them with the Spans API or by adding a feedback_join_key:incident-1234 tag through Enriching spans.

To create dashboard widgets from feedback, create the widget as you would for an evaluation and add the filter @event_kind:feedback.

Support for filtering spans, traces, or sessions by feedback is not available. For example, you cannot yet filter traces to only traces with thumbs-down feedback. Use dashboards scoped to @event_kind:feedback instead.

Request

Headers (required)

  • DD-API-KEY=<YOUR_DATADOG_API_KEY>
  • Content-Type="application/json"

Body data (required)

| Field | Type | Description |
| --- | --- | --- |
| data [required] | EvalMetricsRequestData | Entry point into the request body. |
{
  "data": {
    "type": "evaluation_metric",
    "attributes": {
      "metrics": [
        {
          "eval_scope": "span",
          "join_on": {
            "span": {
              "span_id": "20245611112024561111",
              "trace_id": "13932955089405749200"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1609459200,
          "metric_type": "categorical",
          "label": "Sentiment",
          "categorical_value": "Positive"
        },
        {
          "eval_scope": "trace",
          "join_on": {
            "span": {
              "span_id": "20245611112024561111",
              "trace_id": "13932955089405749200"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1609479200,
          "metric_type": "score",
          "label": "Accuracy",
          "score_value": 3,
          "assessment": "fail",
          "reasoning": "The response provided incorrect information about the weather forecast."
        },
        {
          "eval_scope": "session",
          "session_id": "abc123def456",
          "ml_app": "weather-bot",
          "timestamp_ms": 1609479200,
          "metric_type": "boolean",
          "label": "Topic Relevancy",
          "boolean_value": true
        },
        {
          "eval_scope": "span",
          "join_on": {
            "tag": {
              "key": "msg_id",
              "value": "1123132"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1609479200,
          "metric_type": "json",
          "label": "Custom Evaluation",
          "json_value": {
            "verdict": "pass",
            "confidence": 0.95,
            "is_valid": true,
            "metrics": {
              "accuracy": 0.92,
              "precision": 0.88
            },
            "passed_checks": ["coherence", "relevance", "factuality"]
          }
        },
        {
          "event_kind": "feedback",
          "feedback_join_key": "weather-request-123",
          "ml_app": "weather-bot",
          "timestamp_ms": 1765990800016,
          "metric_type": "text",
          "label": "user_comment",
          "text_value": "The response did not answer whether I needed a jacket.",
          "assessment": "fail",
          "submitter": {
            "id": "user-123",
            "type": "user"
          }
        }
      ]
    }
  }
}

Response

| Field | Type | Description | Guaranteed |
| --- | --- | --- | --- |
| ID | string | Response UUID generated upon submission. | Yes |
| metrics | [EvalMetric] | A list of evaluations or feedback events. | Yes |
{
  "data": {
    "type": "evaluation_metric",
    "id": "456f4567-e89b-12d3-a456-426655440000",
    "attributes": {
      "metrics": [
        {
          "id": "d4f36434-f0cd-47fc-884d-6996cee26da4",
          "eval_scope": "span",
          "join_on": {
            "span": {
              "span_id": "20245611112024561111",
              "trace_id": "13932955089405749200"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1609459200,
          "metric_type": "categorical",
          "label": "Sentiment",
          "categorical_value": "Positive"
        },
        {
          "id": "cdfc4fc7-e2f6-4149-9c35-edc4bbf7b525",
          "eval_scope": "trace",
          "join_on": {
            "span": {
              "span_id": "20245611112024561111",
              "trace_id": "13932955089405749200"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1609479200,
          "metric_type": "score",
          "label": "Accuracy",
          "score_value": 3,
          "assessment": "fail",
          "reasoning": "The response provided incorrect information about the weather forecast."
        },
        {
          "id": "haz3fc7-g3p2-1s37-8m12-ndk4hbf7a522",
          "eval_scope": "session",
          "session_id": "abc123def456",
          "ml_app": "weather-bot",
          "timestamp_ms": 1609479200,
          "metric_type": "boolean",
          "label": "Topic Relevancy",
          "boolean_value": true
        },
        {
          "id": "abc1234-h4i5-6j78-9k01-lmn2opq3rst4",
          "eval_scope": "span",
          "join_on": {
            "tag": {
              "key": "msg_id",
              "value": "1123132"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1609479200,
          "metric_type": "json",
          "label": "Custom Evaluation",
          "json_value": {
            "verdict": "pass",
            "confidence": 0.95,
            "is_valid": true,
            "metrics": {
              "accuracy": 0.92,
              "precision": 0.88
            },
            "passed_checks": ["coherence", "relevance", "factuality"]
          }
        },
        {
          "id": "fedbk34-h4i5-6j78-9k01-lmn2opq3rst4",
          "event_kind": "feedback",
          "eval_scope": "external",
          "feedback_join_key": "weather-request-123",
          "ml_app": "weather-bot",
          "timestamp_ms": 1765990800016,
          "metric_type": "text",
          "label": "user_comment",
          "text_value": "The response did not answer whether I needed a jacket.",
          "assessment": "fail",
          "submitter": {
            "id": "user-123",
            "type": "user"
          }
        }
      ]
    }
  }
}

API standards

Attributes

| Field | Type | Description |
| --- | --- | --- |
| metrics [required] | [EvalMetric] | A list of evaluations or feedback events. |
| tags | [Tag] | A list of tags to apply to all the evaluations or feedback events in the payload. |

EvalMetric

| Field | Type | Description |
| --- | --- | --- |
| ID | string | Evaluation metric UUID (generated upon submission). |
| event_kind | string | The kind of event. Accepted values are "evaluation" and "feedback". Defaults to "evaluation" when omitted. |
| eval_scope | string | The granularity of the event: "span" (default for evaluations), "trace", "session", or "external" for feedback targeted by feedback_join_key. For feedback, this can be omitted and inferred from the target field. |
| join_on [required for span and trace scope evaluations] | [JoinOn] | How an evaluation is joined to a span or trace. Required for evaluations when eval_scope is "span" or "trace". Must be absent for feedback and for session evaluations. |
| span_id | string | For feedback, the ID of the span the feedback is associated with. Use this as one of the feedback target fields. |
| trace_id | string | For feedback, the ID of the trace the feedback is associated with. Use this as one of the feedback target fields. |
| session_id [required for session scope evaluations] | string | The session ID the event is associated with. Required for evaluations when eval_scope is "session". For feedback, use this as one of the feedback target fields. Must be absent when non-feedback eval_scope is "span" or "trace". |
| feedback_join_key | string | For feedback, a customer-defined key for feedback that applies to an external entity instead of a single span, trace, or session. Must be absent for evaluations. |
| submitter [required for feedback] | Submitter | The user, agent, or other entity that submitted the feedback. |
| timestamp_ms [required] | int64 | A UTC UNIX timestamp in milliseconds representing the time the request was sent. |
| ml_app [required] | string | The name of your LLM application. See Application naming guidelines. |
| metric_type [required] | string | The value type: "categorical", "score", "boolean", "json", or "text". The "text" type is supported only for feedback events. |
| label [required] | string | The unique name or label for the provided evaluation or feedback. |
| categorical_value [required if the metric_type is “categorical”] | string | A string representing the category value. |
| score_value [required if the metric_type is “score”] | number | A score value. |
| boolean_value [required if the metric_type is “boolean”] | boolean | A boolean value. |
| json_value [required if the metric_type is “json”] | Dict[key (string), value] | A JSON object value. |
| text_value [required if the metric_type is “text”] | string | A text value. This is supported only for feedback events and is useful for free-text feedback. |
| assessment | string | An assessment of this evaluation. Accepted values are pass and fail. |
| reasoning | string | A text explanation of the evaluation result. |
| tags | [Tag] | A list of tags to apply to this particular evaluation metric. |

For feedback events, provide exactly one of span_id, trace_id, session_id, or feedback_join_key. If you provide eval_scope, it must match the target field: span_id maps to "span", trace_id maps to "trace", session_id maps to "session", and feedback_join_key maps to "external".
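
The feedback rules above lend themselves to a client-side check before submission, as in the following sketch. The function name and error messages are hypothetical; the mapping from target field to scope is taken from the paragraph above.

```python
# Target field to eval_scope mapping, as stated for feedback events.
TARGET_TO_SCOPE = {
    "span_id": "span",
    "trace_id": "trace",
    "session_id": "session",
    "feedback_join_key": "external",
}


def infer_feedback_scope(event):
    """Validate a feedback event and return the eval_scope it implies."""
    if "join_on" in event:
        raise ValueError("feedback events must omit join_on")
    if not event.get("submitter", {}).get("id"):
        raise ValueError("feedback events must include submitter.id")
    targets = [field for field in TARGET_TO_SCOPE if event.get(field) is not None]
    if len(targets) != 1:
        raise ValueError("provide exactly one of: " + ", ".join(TARGET_TO_SCOPE))
    scope = TARGET_TO_SCOPE[targets[0]]
    # An explicit eval_scope must agree with the target field.
    if event.get("eval_scope", scope) != scope:
        raise ValueError(f"eval_scope must be {scope!r} for {targets[0]}")
    return scope
```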

Submitter

| Field | Type | Description |
| --- | --- | --- |
| id [required] | string | Identifier for the user, agent, or other entity that submitted the feedback. |
| type | string | Category for the submitter. Recommended values are user and agent. |

JoinOn

| Field | Type | Description |
| --- | --- | --- |
| span | [SpanContext] | Uniquely identifies the span associated with this evaluation using span ID & trace ID. |
| tag | [TagContext] | Uniquely identifies the span associated with this evaluation using a tag key-value pair. |

SpanContext

| Field | Type | Description |
| --- | --- | --- |
| span_id [required] | string | The span ID of the span that this evaluation is associated with. Must be a decimal string (for example, "20245611112024561111"). If your instrumentation produces hexadecimal span IDs (such as OpenTelemetry), convert them to decimal before submitting. |
| trace_id [required] | string | The trace ID of the span that this evaluation is associated with. Must be a decimal string (for example, "13932955089405749200") or a 32-character lowercase hexadecimal string for 128-bit trace IDs. |
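
Converting a hexadecimal span ID to the decimal string form is a one-line operation in Python, as the following sketch shows. The helper name `hex_id_to_decimal` is hypothetical.

```python
def hex_id_to_decimal(hex_id):
    """Convert a hexadecimal ID string to its decimal string form."""
    # int() with base 16 accepts upper and lower case, with or without "0x".
    return str(int(hex_id, 16))
```

For example, `hex_id_to_decimal("ffffffffffffffff")` returns `"18446744073709551615"`, the largest 64-bit value.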

TagContext

| Field | Type | Description |
| --- | --- | --- |
| key [required] | string | The tag key name. This must be the same key used when setting the tag on the span. |
| value [required] | string | The tag value. This value must match exactly one span with the specified tag key/value pair. |

EvalMetricsRequestData

| Field | Type | Description |
| --- | --- | --- |
| type [required] | string | Identifier for the request. Set to evaluation_metric. |
| attributes [required] | [Attributes] | The body of the request. |
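
Putting the request structure together, the following sketch wraps a list of evaluations or feedback events in the `data` envelope and shows one way to produce a millisecond timestamp for `timestamp_ms`. Both helper names are hypothetical.

```python
import time


def now_ms():
    """Return the current UTC UNIX time in milliseconds."""
    return time.time_ns() // 1_000_000


def build_eval_request(metrics, tags=None):
    """Wrap EvalMetric dicts in the EvalMetricsRequestData envelope."""
    attributes = {"metrics": list(metrics)}
    if tags:
        # Optional tags applied to every event in the payload.
        attributes["tags"] = list(tags)
    return {"data": {"type": "evaluation_metric", "attributes": attributes}}
```

Note that a current millisecond timestamp has 13 digits, as in the feedback entry of the request example.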
