LLM Observability API

LLM Observability is not available in the US1-FED site.

LLM Observability is in public beta, and this API is subject to change. If changes occur, Datadog will provide release notes with any applicable upgrade instructions.

Overview

The LLM Observability API provides an interface for developers to send LLM-related traces and spans to Datadog. If your application is written in Python, you can use the LLM Observability SDK for Python.

Spans API

Use this endpoint to send spans to Datadog. For details on the available kinds of spans, see Span Kinds.

Endpoint
https://api./api/unstable/llm-obs/v1/trace/spans
Method
POST

Request

Headers (required)

  • DD-API-KEY=<YOUR_DATADOG_API_KEY>
  • Content-Type="application/json"
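The following is a minimal Python sketch, using only the standard library, that assembles a Spans API request with the two required headers. The site portion of the endpoint host is elided above. For that reason the hypothetical `build_spans_request` function takes it as a `site` parameter (for example `datadoghq.com`). Treat that value as an assumption about your account's site.

```python
import json
import urllib.request


def build_spans_request(api_key: str, site: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Spans API."""
    # `site` is an assumption: the Datadog site domain your account uses.
    url = f"https://api.{site}/api/unstable/llm-obs/v1/trace/spans"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "DD-API-KEY": api_key,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. urllib normalizes the capitalization of stored header names (for example to `Dd-api-key`). HTTP header names are case-insensitive, so this does not change their meaning.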

Body data (required)

Field | Type | Description
data [required] | SpansRequestData | Entry point into the request body.
{
  "data": {
    "type": "span",
    "attributes": {
      "ml_app": "weather-bot",
      "session_id": "1",
      "tags": [
        "service:weather-bot",
        "env:staging",
        "user_handle:example-user@example.com",
        "user_id:1234"
      ],
      "spans": [
        {
          "parent_id": "undefined",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<AGENT_SPAN_ID>",
          "name": "health_coach_agent",
          "meta": {
            "kind": "agent",
            "input": {
              "value": "What is the weather like today and do i wear a jacket?"
            },
            "output": {
              "value": "It's very hot and sunny, there is no need for a jacket"
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 10000000000
        },
        {
          "parent_id": "<AGENT_SPAN_ID>",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<WORKFLOW_SPAN_ID>",
          "name": "qa_workflow",
          "meta": {
            "kind": "workflow",
            "input": {
              "value": "What is the weather like today and do i wear a jacket?"
            },
            "output": {
              "value": "It's very hot and sunny, there is no need for a jacket"
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 5000000000
        },
        {
          "parent_id": "<WORKFLOW_SPAN_ID>",
          "trace_id": "<TEST_TRACE_ID>",
          "span_id": "<LLM_SPAN_ID>",
          "name": "generate_response",
          "meta": {
            "kind": "llm",
            "input": {
              "messages": [
                {
                  "role": "system",
                  "content": "Your role is to ..."
                },
                {
                  "role": "user",
                  "content": "What is the weather like today and do i wear a jacket?"
                }
              ]
            },
            "output": {
              "messages": [
                {
                  "content": "It's very hot and sunny, there is no need for a jacket",
                  "role": "assistant"
                }
              ]
            }
          },
          "start_ns": 1713889389104152000,
          "duration": 2000000000
        }
      ]
    }
  }
}

Response

If the request is successful, the API responds with an HTTP 202 (Accepted) status code and an empty body.
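Below is a sketch of checking for that status with the standard library. The `opener` parameter is a hypothetical injection point that lets the check run without network access. It defaults to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for 4xx and 5xx responses rather than returning them.

```python
import urllib.request
from typing import Callable


def send_spans(request: urllib.request.Request,
               opener: Callable = urllib.request.urlopen) -> bool:
    """Send a prepared Spans API request and report whether it was accepted."""
    # `opener` is a hypothetical injection point; urlopen is the real default.
    with opener(request) as response:
        # Accepted span submissions return 202 with an empty body.
        return response.status == 202
```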

API standards

Error

Field | Type | Description
message | string | The error message.
stack | string | The stack trace.
type | string | The error type.
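One way to fill the Error object from a caught Python exception is sketched below. Two choices here are assumptions about what is useful to report, not documented requirements: the exception class name is used as `type`, and the formatted traceback is used as `stack`. The helper name `error_from_exception` is hypothetical.

```python
import traceback


def error_from_exception(exc: BaseException) -> dict:
    """Map a Python exception onto the Error object's fields."""
    return {
        "message": str(exc),
        # Assumption: the exception class name serves as the error type.
        "type": type(exc).__name__,
        # format_exception returns a list of lines; join them into one string.
        "stack": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
    }
```

Place the result under the span's `meta` as `error`, and set the span's `status` to `"error"`.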

IO

Field | Type | Description
value | string | Input or output value. This should be used for all spans except for LLM spans.
messages | [Message] | List of messages. This should only be used for LLM spans.

Message

Field | Type | Description
content [required] | string | The body of the message.
role | string | The role of the message's author (for example, "system", "user", or "assistant").
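The IO and Message rules above can be enforced with a small helper. This sketch assumes spans are built as plain Python dictionaries. `make_io` is a hypothetical name, and it checks only the rules stated in these two tables: `messages` for LLM spans, `value` otherwise, and `content` on every message.

```python
from typing import Optional


def make_io(kind: str, value: Optional[str] = None,
            messages: Optional[list] = None) -> dict:
    """Build an IO object: `messages` for LLM spans, `value` for all others."""
    if kind == "llm":
        if messages is None:
            raise ValueError("LLM spans should use `messages`")
        # Each Message requires `content`; `role` is optional.
        for message in messages:
            if "content" not in message:
                raise ValueError("every message needs `content`")
        return {"messages": messages}
    if value is None:
        raise ValueError("non-LLM spans should use `value`")
    return {"value": value}
```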

Meta

Field | Type | Description
kind [required] | string | The span kind: "agent", "workflow", "llm", "tool", "task", "embedding", or "retrieval".
error | Error | Error information on the span.
input | IO | The span’s input information.
output | IO | The span’s output information.
metadata | Dict[key (string), value], where the value is a float, bool, or string | Data about the span that is not input or output related. Use the following metadata keys for LLM spans: temperature, max_tokens, model_name, and model_provider.
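Below is a sketch of assembling a Meta object as a Python dictionary while rejecting span kinds outside the list above. Accepting `int` metadata values (such as `max_tokens`) is an assumption, since JSON encodes integers and floats as the same number type. `make_meta` and `SPAN_KINDS` are hypothetical names.

```python
from typing import Optional

SPAN_KINDS = {"agent", "workflow", "llm", "tool", "task", "embedding", "retrieval"}


def make_meta(kind: str, input_io: Optional[dict] = None,
              output_io: Optional[dict] = None,
              metadata: Optional[dict] = None,
              error: Optional[dict] = None) -> dict:
    """Assemble a Meta object, rejecting unknown kinds and unsupported metadata values."""
    if kind not in SPAN_KINDS:
        raise ValueError(f"unknown span kind: {kind!r}")
    meta = {"kind": kind}
    if input_io is not None:
        meta["input"] = input_io
    if output_io is not None:
        meta["output"] = output_io
    if error is not None:
        meta["error"] = error
    if metadata:
        for key, value in metadata.items():
            # The table lists float, bool, or string; int is also accepted here
            # (assumption) because JSON encodes it as a number.
            if not isinstance(value, (float, int, bool, str)):
                raise TypeError(f"unsupported metadata value for {key!r}")
        meta["metadata"] = dict(metadata)
    return meta
```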

Metrics

Field | Type | Description
prompt_tokens | float64 | The number of prompt tokens. Only valid for LLM spans.
completion_tokens | float64 | The number of completion tokens. Only valid for LLM spans.
total_tokens | float64 | The total number of tokens associated with the span. Only valid for LLM spans.
time_to_first_token | float64 | The time in seconds it takes for the first output token to be returned in streaming-based LLM applications. Set for root spans.
time_per_output_token | float64 | The time in seconds it takes for each output token to be returned in streaming-based LLM applications. Set for root spans.
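Token counts and streaming latency could be derived as sketched below. Two assumptions are labeled in the code:

- `total_tokens` is computed as prompt tokens plus completion tokens.
- Both timestamps passed to `time_to_first_token` come from the same nanosecond clock (for example `time.time_ns()`).

The helper names are hypothetical. Per the table, the token counts belong on LLM spans and the timing metric on root spans.

```python
def token_metrics(prompt_tokens: int, completion_tokens: int) -> dict:
    """Token counts for an LLM span, sent as float64 values."""
    return {
        "prompt_tokens": float(prompt_tokens),
        "completion_tokens": float(completion_tokens),
        # Assumption: the span's total is prompt plus completion tokens.
        "total_tokens": float(prompt_tokens + completion_tokens),
    }


def time_to_first_token(start_ns: int, first_token_ns: int) -> float:
    """Seconds between the request start and the first streamed token."""
    # Assumption: both values come from the same nanosecond clock.
    # Span timestamps are in nanoseconds; this metric is in seconds.
    return (first_token_ns - start_ns) / 1e9
```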

Span

Field | Type | Description
name [required] | string | The name of the span.
span_id [required] | string | An ID unique to the span.
trace_id [required] | string | A unique ID shared by all spans in the same trace.
parent_id [required] | string | ID of the span’s direct parent. If the span is a root span, set parent_id to the string "undefined".
start_ns [required] | uint64 | The span’s start time in nanoseconds.
duration [required] | float64 | The span’s duration in nanoseconds.
meta [required] | Meta | The core content of the span.
status | string | Error status ("ok" or "error"). Defaults to "ok".
metrics | Metrics | Datadog metrics to collect.
session_id | string | The span’s session_id. Overrides the top-level session_id field.
tags | [Tag] | A list of tags to apply to this particular span.
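The following sketch builds Span dictionaries with the required fields. Following the request example, a root span receives the string `"undefined"` as its `parent_id`. The ID format is an assumption: a random 64-bit integer rendered as a decimal string with `secrets.randbits`. The table only requires IDs to be strings that are unique per span and shared per trace. `new_id` and `make_span` are hypothetical names.

```python
import secrets
import time
from typing import Optional


def new_id() -> str:
    # Hypothetical ID scheme: a random unsigned 64-bit integer as a decimal string.
    return str(secrets.randbits(64))


def make_span(name: str, trace_id: str, meta: dict, start_ns: int,
              duration_ns: int, parent_id: Optional[str] = None,
              status: str = "ok", tags: Optional[list] = None) -> dict:
    """Assemble a Span object with the required fields filled in."""
    span = {
        "name": name,
        "span_id": new_id(),
        "trace_id": trace_id,
        # Root spans carry the string "undefined", as in the request example.
        "parent_id": parent_id if parent_id is not None else "undefined",
        "start_ns": start_ns,
        "duration": float(duration_ns),  # nanoseconds, sent as float64
        "meta": meta,
        "status": status,
    }
    if tags:
        span["tags"] = list(tags)
    return span


# Usage: a root workflow span with one child LLM span in the same trace.
trace_id = new_id()
start = time.time_ns()
root = make_span("qa_workflow", trace_id, {"kind": "workflow"}, start, 5_000_000_000)
child = make_span("generate_response", trace_id, {"kind": "llm"}, start,
                  2_000_000_000, parent_id=root["span_id"])
```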

SpansRequestData

Field | Type | Description
type [required] | string | Identifier for the request. Set to span.
attributes [required] | SpansPayload | The body of the request.

SpansPayload

Field | Type | Description
ml_app [required] | string | The name of your LLM application. See Application naming guidelines.
spans [required] | [Span] | A list of spans.
tags | [Tag] | A list of top-level tags to apply to each span.
session_id | string | The session the list of spans belongs to. Can be overridden or set on individual spans as well.
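Below is a sketch of wrapping a list of spans in the SpansRequestData and SpansPayload envelope, mirroring the structure of the request example. `make_spans_request_body` is a hypothetical helper name.

```python
from typing import Optional


def make_spans_request_body(ml_app: str, spans: list,
                            tags: Optional[list] = None,
                            session_id: Optional[str] = None) -> dict:
    """Wrap spans in the SpansRequestData / SpansPayload envelope."""
    attributes = {"ml_app": ml_app, "spans": list(spans)}
    if tags:
        # Top-level tags are applied to every span in the payload.
        attributes["tags"] = list(tags)
    if session_id is not None:
        # A span's own session_id overrides this value.
        attributes["session_id"] = session_id
    return {"data": {"type": "span", "attributes": attributes}}
```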

Tag

Tags should be formatted as a list of strings (for example, ["user_handle:dog@gmail.com", "app_version:1.0.0"]). They are meant to store contextual information surrounding the span.

For more information about tags, see Getting Started with Tags.
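Tags can be produced from key/value pairs and read back into them, as sketched below. Splitting at the first colon lets values such as URLs contain colons themselves. That splitting rule is an assumption about how you choose to interpret your own tags. `format_tags` and `parse_tag` are hypothetical names.

```python
def format_tags(pairs: dict) -> list:
    """Turn {"key": "value"} pairs into "key:value" tag strings."""
    return [f"{key}:{value}" for key, value in pairs.items()]


def parse_tag(tag: str) -> tuple:
    """Split a tag at its first colon (assumption), so values may contain colons."""
    key, _, value = tag.partition(":")
    return key, value
```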

Application naming guidelines

Your application name (the value of ml_app) must start with a letter. It may contain the characters listed below:

  • Alphanumerics
  • Underscores
  • Minus signs (hyphens)
  • Colons
  • Periods
  • Slashes

The name can be up to 200 characters long and can contain Unicode letters, which cover most character sets, including those of languages such as Japanese.
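A rough Python check of these naming rules follows. The pattern is an approximation, labeled as such in the code. Python's Unicode-aware `\w` stands in for "alphanumerics and underscores", and `[^\W\d_]` stands in for "starts with a letter". The service's exact character rules may differ at the edges. `is_valid_ml_app` is a hypothetical name.

```python
import re

# Approximation (assumption): \w covers Unicode alphanumerics plus underscore;
# [^\W\d_] matches one Unicode letter for the required first character.
# The trailing class also allows minus, colon, period, and slash.
_ML_APP_PATTERN = re.compile(r"[^\W\d_][\w\-:./]{0,199}")


def is_valid_ml_app(name: str) -> bool:
    """Check an ml_app value against the naming guidelines (max 200 characters)."""
    return _ML_APP_PATTERN.fullmatch(name) is not None
```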

Evaluations API

Use this endpoint to send evaluations associated with a given span to Datadog.

Endpoint
https://api./api/unstable/llm-obs/v1/eval-metric
Method
POST

Evaluations require a span_id and trace_id.

  • If you are not using the LLM Observability SDK, send the span_id and trace_id that you used to create your target span.
  • If you are using the LLM Observability SDK, obtain the span_id and trace_id by finding your target span and accessing its root_span.span_id and root_span.trace_id attributes.

Request

Headers (required)

  • DD-API-KEY=<YOUR_DATADOG_API_KEY>
  • Content-Type="application/json"
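The following is a minimal standard-library sketch that assembles an Evaluations API request with the required headers. The `site` parameter stands in for the elided site portion of the endpoint host (for example `datadoghq.com`) and is an assumption about your account. `build_eval_request` is a hypothetical name.

```python
import json
import urllib.request


def build_eval_request(api_key: str, site: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Evaluations API."""
    # `site` is an assumption: the Datadog site domain your account uses.
    url = f"https://api.{site}/api/unstable/llm-obs/v1/eval-metric"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "DD-API-KEY": api_key,
            "Content-Type": "application/json",
        },
    )
```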

Body data (required)

Field | Type | Description
data [required] | EvalMetricsRequestData | Entry point into the request body.
{
  "data": {
    "type": "evaluation_metric",
    "attributes": {
      "metrics": [
        {
          "span_id": "61399242116139924211",
          "trace_id": "13932955089405749200",
          "timestamp": 1609459200,
          "metric_type": "categorical",
          "label": "Sentiment",
          "categorical_value": "Positive"
        },
        {
          "span_id": "20245611112024561111",
          "trace_id": "13932955089405749200",
          "metric_type": "score",
          "label": "Accuracy",
          "score_value": 3
        }
      ]
    }
  }
}

Response

Field | Type | Description | Guaranteed
id | string | Response UUID generated upon submission. | Yes
metrics | [EvalMetric] | A list of evaluations. | Yes
{
  "data": {
    "type": "evaluation_metric",
    "id": "456f4567-e89b-12d3-a456-426655440000",
    "attributes": {
      "metrics": [
        {
          "id": "d4f36434-f0cd-47fc-884d-6996cee26da4",
          "span_id": "61399242116139924211",
          "trace_id": "13932955089405749200",
          "timestamp": 1609459200,
          "metric_type": "categorical",
          "label": "Sentiment",
          "categorical_value": "Positive"
        },
        {
          "id": "cdfc4fc7-e2f6-4149-9c35-edc4bbf7b525",
          "span_id": "20245611112024561111",
          "trace_id": "13932955089405749200",
          "metric_type": "score",
          "label": "Accuracy",
          "score_value": 3
        }
      ]
    }
  }
}
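The response body can be indexed as sketched below to record which evaluation UUID Datadog assigned to each submitted metric. Keying on `(span_id, label)` is an assumption that a given label is submitted at most once per span. `evaluation_ids` is a hypothetical name.

```python
import json


def evaluation_ids(response_body: str) -> dict:
    """Map (span_id, label) to the evaluation UUID assigned on submission."""
    metrics = json.loads(response_body)["data"]["attributes"]["metrics"]
    # Assumption: each label appears at most once per span.
    return {(m["span_id"], m["label"]): m["id"] for m in metrics}
```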

API standards

Attribute

Field | Type | Description
metrics [required] | [EvalMetric] | A list of evaluations, each associated with a span.

EvalMetric

Field | Type | Description
id | string | Evaluation metric UUID (generated upon submission).
span_id [required] | string | The ID of the span that this evaluation is associated with.
trace_id [required] | string | The ID of the trace that this evaluation is associated with.
timestamp | int64 | A UTC UNIX timestamp representing the time the request was sent.
metric_type [required] | string | The type of evaluation: "categorical" or "score".
label [required] | string | The unique name or label for the provided evaluation.
categorical_value [required if the metric_type is “categorical”] | string | A string representing the category that the evaluation belongs to.
score_value [required if the metric_type is “score”] | number | A score value of the evaluation.
flagged | boolean | Flag content as inappropriate or incorrect.
annotation | string | A generic string note about the provided evaluation.
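Categorical evaluations need a `categorical_value`, and score evaluations need a `score_value`. Both requirements can be enforced when each metric is built. Below is a sketch with the hypothetical helper name `make_eval_metric`. It assumes metrics are built as plain Python dictionaries.

```python
from typing import Optional, Union


def make_eval_metric(span_id: str, trace_id: str, label: str,
                     metric_type: str, value: Union[str, float],
                     timestamp: Optional[int] = None,
                     flagged: Optional[bool] = None,
                     annotation: Optional[str] = None) -> dict:
    """Build an EvalMetric, storing `value` in the field that matches metric_type."""
    metric = {"span_id": span_id, "trace_id": trace_id,
              "metric_type": metric_type, "label": label}
    if metric_type == "categorical":
        if not isinstance(value, str):
            raise TypeError("categorical evaluations need a string value")
        metric["categorical_value"] = value
    elif metric_type == "score":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("score evaluations need a numeric value")
        metric["score_value"] = value
    else:
        raise ValueError(f"unknown metric_type: {metric_type!r}")
    # Optional fields are only included when provided.
    if timestamp is not None:
        metric["timestamp"] = timestamp
    if flagged is not None:
        metric["flagged"] = flagged
    if annotation is not None:
        metric["annotation"] = annotation
    return metric
```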

EvalMetricsRequestData

Field | Type | Description
type [required] | string | Identifier for the request. Set to evaluation_metric.
attributes [required] | Attribute | The body of the request.
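Finally, below is a sketch of wrapping EvalMetric objects in this envelope, matching the request example, where `attributes` holds an object with a `metrics` list. The required-field check covers only the unconditionally required EvalMetric fields. `make_eval_request_body` is a hypothetical name.

```python
REQUIRED_EVAL_FIELDS = ("span_id", "trace_id", "metric_type", "label")


def make_eval_request_body(metrics: list) -> dict:
    """Wrap EvalMetric objects in the EvalMetricsRequestData envelope."""
    for metric in metrics:
        # Only the unconditionally required fields are checked here.
        missing = [field for field in REQUIRED_EVAL_FIELDS if field not in metric]
        if missing:
            raise ValueError(f"metric is missing required fields: {missing}")
    return {"data": {"type": "evaluation_metric",
                     "attributes": {"metrics": list(metrics)}}}
```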