LLM Observability SDK Reference

Overview

Datadog’s LLM Observability SDKs offer both automatic instrumentation and manual instrumentation APIs that give you observability and insights into your LLM applications.

Setup

Requirements

A Datadog API key.

The latest ddtrace package is installed (Python 3.7+ required):
```
pip install ddtrace
```

The latest dd-trace package is installed (Node.js 16+ required):
```
npm install dd-trace
```

You have downloaded the latest dd-trace-java JAR. The LLM Observability SDK is supported in dd-trace-java v1.51.0+ (Java 8+ required).

Command-line setup

Enable LLM Observability by running your application using the ddtrace-run command and specifying the required environment variables.

Note: ddtrace-run automatically turns on all LLM Observability integrations.

DD_SITE=<YOUR_DATADOG_SITE> DD_API_KEY=<YOUR_API_KEY> DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=<YOUR_ML_APP_NAME> ddtrace-run <YOUR_APP_STARTUP_COMMAND>

Environment variables for command-line setup

DD_SITE: required - string
Destination Datadog site for LLM data submission.
DD_LLMOBS_ENABLED: required - integer or string
Toggle to enable submitting data to LLM Observability. Should be set to 1 or true.
DD_LLMOBS_ML_APP: optional - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given root span, see Tracing multiple applications. If not provided, this defaults to the value of DD_SERVICE, or the value of a propagated DD_LLMOBS_ML_APP from an upstream service.
Note: Before version ddtrace==3.14.0, this is a required field.
DD_LLMOBS_AGENTLESS_ENABLED: optional - integer or string - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to 1 or true.
DD_API_KEY: optional - string
Your Datadog API key. Only required if you are not using the Datadog Agent.
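
The rules in this list can be checked before launching the application. The following is a minimal sketch, not part of the SDK: the helper name `check_llmobs_env` is hypothetical, and it assumes that agentless mode is signalled only by `DD_LLMOBS_AGENTLESS_ENABLED` and that only the literal values `1` and `true` count as enabled.

```python
from typing import List, Mapping

# Values the reference above lists as valid for the enable toggles.
ENABLED_VALUES = {"1", "true"}


def check_llmobs_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the variables look consistent."""
    problems = []
    if not env.get("DD_SITE"):
        problems.append("DD_SITE is required")
    if env.get("DD_LLMOBS_ENABLED") not in ENABLED_VALUES:
        problems.append("DD_LLMOBS_ENABLED should be set to 1 or true")
    # Assumption: the API key is only needed when agentless mode is switched on.
    agentless = env.get("DD_LLMOBS_AGENTLESS_ENABLED") in ENABLED_VALUES
    if agentless and not env.get("DD_API_KEY"):
        problems.append("DD_API_KEY is required when not using the Datadog Agent")
    return problems


if __name__ == "__main__":
    import os

    for problem in check_llmobs_env(os.environ):
        print(problem)
```

Running the script in the same shell that starts your application prints one line per missing or inconsistent variable, and nothing when the configuration looks complete.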

Enable LLM Observability by running your application with NODE_OPTIONS="--import dd-trace/initialize.mjs" and specifying the required environment variables.

Note: dd-trace/initialize.mjs automatically turns on all APM integrations.

DD_SITE=<YOUR_DATADOG_SITE> DD_API_KEY=<YOUR_API_KEY> DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=<YOUR_ML_APP_NAME> NODE_OPTIONS="--import dd-trace/initialize.mjs" node <YOUR_APP_ENTRYPOINT>

Environment variables for command-line setup

DD_SITE: required - string
The Datadog site to which your LLM data is submitted.
DD_LLMOBS_ENABLED: required - integer or string
Toggle to enable submitting data to LLM Observability. Should be set to 1 or true.
DD_LLMOBS_ML_APP: optional - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given root span, see Tracing multiple applications. If not provided, this defaults to the value of DD_SERVICE, or the value of a propagated DD_LLMOBS_ML_APP from an upstream service.
Note: Before version dd-trace@5.66.0, this is a required field.
DD_LLMOBS_AGENTLESS_ENABLED: optional - integer or string - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to 1 or true.
DD_API_KEY: optional - string
Your Datadog API key. Only required if you are not using the Datadog Agent.

Enable LLM Observability by running your application with dd-trace-java and specifying the required parameters as environment variables or system properties.

DD_SITE=<YOUR_DATADOG_SITE> DD_API_KEY=<YOUR_API_KEY> \
java -javaagent:path/to/your/dd-trace-java-jar/dd-java-agent-SNAPSHOT.jar \
-Ddd.service=my-app -Ddd.llmobs.enabled=true -Ddd.llmobs.ml.app=my-ml-app -jar path/to/your/app.jar

Environment variables and system properties

You can supply the following parameters as environment variables (for example, DD_LLMOBS_ENABLED) or as Java system properties (for example, dd.llmobs.enabled).

DD_SITE or dd.site: required - string
Destination Datadog site for LLM data submission.
DD_LLMOBS_ENABLED or dd.llmobs.enabled: required - integer or string
Toggle to enable submitting data to LLM Observability. Should be set to 1 or true.
DD_LLMOBS_ML_APP or dd.llmobs.ml.app: optional - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given root span, see Tracing multiple applications. If not provided, this defaults to the value of DD_SERVICE, or the value of a propagated DD_LLMOBS_ML_APP from an upstream service.
Note: Before version 1.54.0 of dd-trace-java, this is a required field.
DD_LLMOBS_AGENTLESS_ENABLED or dd.llmobs.agentless.enabled: optional - integer or string - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to 1 or true.
DD_API_KEY or dd.api.key: optional - string
Your Datadog API key. Only required if you are not using the Datadog Agent.

In-code setup

Instead of using command-line setup, you can also enable LLM Observability programmatically.

Use the LLMObs.enable() function to enable LLM Observability.

Do not use this setup method with the ddtrace-run command.

from ddtrace.llmobs import LLMObs
LLMObs.enable(
  ml_app="<YOUR_ML_APP_NAME>",
  api_key="<YOUR_DATADOG_API_KEY>",
  site="<YOUR_DATADOG_SITE>",
  agentless_enabled=True,
)

Parameters

ml_app: optional - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given trace, see Tracing multiple applications. If not provided, this defaults to the value of DD_LLMOBS_ML_APP.
integrations_enabled: optional - boolean - default: true
A flag to enable automatically tracing LLM calls for Datadog’s supported LLM integrations. If not provided, all supported LLM integrations are enabled by default. To avoid using the LLM integrations, set this value to false.
agentless_enabled: optional - boolean - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to True. This configures the ddtrace library to not send any data that requires the Datadog Agent. If not provided, this defaults to the value of DD_LLMOBS_AGENTLESS_ENABLED.
site: optional - string
The Datadog site to which your LLM data is submitted. If not provided, this defaults to the value of DD_SITE.
api_key: optional - string
Your Datadog API key. Only required if you are not using the Datadog Agent. If not provided, this defaults to the value of DD_API_KEY.
env: optional - string
The name of your application’s environment (examples: prod, pre-prod, staging). If not provided, this defaults to the value of DD_ENV.
service: optional - string
The name of the service used for your application. If not provided, this defaults to the value of DD_SERVICE.
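
Most of these parameters follow the same pattern: an explicitly passed value wins, and otherwise the matching environment variable is used. The sketch below illustrates that precedence only. It does not reproduce the SDK's internals, and `FALLBACK_ENV_VARS` and `effective_settings` are hypothetical names.

```python
from typing import Dict, Mapping, Optional

# Parameter name -> environment variable it falls back to, as listed above.
FALLBACK_ENV_VARS = {
    "ml_app": "DD_LLMOBS_ML_APP",
    "site": "DD_SITE",
    "api_key": "DD_API_KEY",
    "env": "DD_ENV",
    "service": "DD_SERVICE",
}


def effective_settings(
    explicit: Mapping[str, Optional[str]], environ: Mapping[str, str]
) -> Dict[str, Optional[str]]:
    """Resolve each setting: explicit argument first, environment variable second."""
    settings = {}
    for name, env_var in FALLBACK_ENV_VARS.items():
        value = explicit.get(name)
        if value is None:
            # Not passed in code, so fall back to the environment (may be None).
            value = environ.get(env_var)
        settings[name] = value
    return settings


resolved = effective_settings(
    {"ml_app": "chatbot"},
    {"DD_LLMOBS_ML_APP": "from-env", "DD_SITE": "datadoghq.com"},
)
print(resolved["ml_app"], resolved["site"])
```

Here `ml_app` keeps the value passed in code, while `site` is taken from the environment because no explicit value was given.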

Do not use this setup method with the dd-trace/initialize.mjs command.

Use the init() function to enable LLM Observability.

const tracer = require('dd-trace').init({
  llmobs: {
    mlApp: "<YOUR_ML_APP_NAME>",
    agentlessEnabled: true,
  },
  site: "<YOUR_DATADOG_SITE>",
  env: "<YOUR_ENV>",
});

const llmobs = tracer.llmobs;

Options for llmobs configuration

mlApp: optional - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given trace, see Tracing multiple applications. If not provided, this defaults to the value of DD_LLMOBS_ML_APP.
agentlessEnabled: optional - boolean - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to true. This configures the dd-trace library to not send any data that requires the Datadog Agent. If not provided, this defaults to the value of DD_LLMOBS_AGENTLESS_ENABLED.

Options for general tracer configuration:

site: optional - string
The Datadog site to which your LLM data is submitted. If not provided, this defaults to the value of DD_SITE.
env: optional - string
The name of your application’s environment (examples: prod, pre-prod, staging). If not provided, this defaults to the value of DD_ENV.
service: optional - string
The name of the service used for your application. If not provided, this defaults to the value of DD_SERVICE.

Environment variables

Set the following values as environment variables. They cannot be configured programmatically.

DD_API_KEY: optional - string
Your Datadog API key. Only required if you are not using the Datadog Agent.

AWS Lambda Setup

To instrument an existing AWS Lambda function with LLM Observability, you can use the Datadog Extension and respective language layers.

Open a CloudShell session in the AWS console.
Install the Datadog CLI client:

npm install -g @datadog/datadog-ci

Set the Datadog API key and site:

export DD_API_KEY=<YOUR_DATADOG_API_KEY>
export DD_SITE=<YOUR_DATADOG_SITE>

If you already have or prefer to use a secret in Secrets Manager, you can set the API key by using the secret ARN:

export DATADOG_API_KEY_SECRET_ARN=<DATADOG_API_KEY_SECRET_ARN>

Instrument your Lambda function with LLM Observability (this requires at least version 77 of the Datadog Extension layer):

datadog-ci lambda instrument -f <YOUR_LAMBDA_FUNCTION_NAME> -r <AWS_REGION> -v 123 -e 93 --llmobs <YOUR_LLMOBS_ML_APP>

datadog-ci lambda instrument -f <YOUR_LAMBDA_FUNCTION_NAME> -r <AWS_REGION> -v 135 -e 93 --llmobs <YOUR_LLMOBS_ML_APP>

datadog-ci lambda instrument -f <YOUR_LAMBDA_FUNCTION_NAME> -r <AWS_REGION> -v 25 -e 93 --llmobs <YOUR_LLMOBS_ML_APP>

Invoke your Lambda function and verify that LLM Observability traces are visible in the Datadog UI.

Manually flush LLM Observability traces by using the flush method before the Lambda function returns.

from ddtrace.llmobs import LLMObs
def handler(event, context):
  # function body
  LLMObs.flush()
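
If the function body can raise, a flush placed on the last line is skipped. One way to guard against that is a `try`/`finally` wrapper, sketched below. The `with_flush` helper is hypothetical and takes the flush callable as a parameter so the example runs without the SDK installed; in a real function you would pass `LLMObs.flush`.

```python
from typing import Any, Callable


def with_flush(
    body: Callable[[dict, Any], Any], flush: Callable[[], None]
) -> Callable[[dict, Any], Any]:
    """Wrap a Lambda-style handler so flush() runs before it returns or raises."""

    def handler(event, context):
        try:
            return body(event, context)
        finally:
            # Runs on both the success path and the error path.
            flush()

    return handler


# Stand-in flush that records each call instead of sending data.
flush_calls = []
handler = with_flush(
    lambda event, context: {"echo": event["message"]},
    lambda: flush_calls.append("flushed"),
)
print(handler({"message": "hi"}, None), flush_calls)
```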

import tracer from 'dd-trace';
const llmobs = tracer.llmobs;

export const handler = async (event) => {
  // your function body
  llmobs.flush();
};

After installing the SDK and running your application, you should see data in LLM Observability from auto-instrumentation. Use manual instrumentation to capture custom-built frameworks or operations from libraries that are not yet supported.

Manual instrumentation

To capture an LLM operation, use a function decorator to instrument workflows:

from ddtrace.llmobs.decorators import workflow

@workflow
def handle_user_request():
    ...

Alternatively, use a context-manager-based approach to capture fine-grained operations:

from ddtrace.llmobs import LLMObs

with LLMObs.llm(model_name="gpt-4o"):
    call_llm()
    LLMObs.annotate(
        metrics={
            "input_tokens": ...,
            "output_tokens": ...,
        },
    )
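
To see why a context manager suits fine-grained operations, it can help to look at the general pattern in isolation. The following is a stand-in written with the standard library only, not the SDK's implementation: `sketch_span` and the dictionary layout are hypothetical. It shows a span that is opened on entry, annotated inside the block, and always finished on exit, even when the body raises.

```python
import time
from contextlib import contextmanager


@contextmanager
def sketch_span(kind, finished):
    """Stand-in span: opened on entry and always finished on exit."""
    span = {"kind": kind, "start": time.monotonic(), "metrics": {}, "error": None}
    try:
        yield span
    except Exception as exc:
        # Record the failure, then let the exception propagate to the caller.
        span["error"] = type(exc).__name__
        raise
    finally:
        span["duration"] = time.monotonic() - span["start"]
        finished.append(span)


finished_spans = []
with sketch_span("llm", finished_spans) as span:
    # Annotate while the span is still active, as in the example above.
    span["metrics"].update({"input_tokens": 12, "output_tokens": 34})

print(finished_spans[0]["kind"], finished_spans[0]["metrics"])
```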

For a list of available span kinds, see the Span Kinds documentation. For more granular tracing of operations within functions, see Tracing spans using inline methods.

To trace a span, use llmobs.wrap(options, function) as a function wrapper for the function you’d like to trace. For a list of available span kinds, see the Span Kinds documentation. For more granular tracing of operations within functions, see Tracing spans using inline methods.

Span Kinds

Span kinds are required, and are specified on the options object passed to the llmobs tracing functions (trace, wrap, and decorate). See the Span Kinds documentation for a list of supported span kinds.

Note: Spans with an invalid span kind are not submitted to LLM Observability.

Automatic function argument/output/name capturing

llmobs.wrap (along with llmobs.decorate for TypeScript) tries to automatically capture inputs, outputs, and the name of the function being traced. If you need to manually annotate a span, see Enriching spans. Inputs and outputs you annotate will override the automatic capturing. Additionally, to override the function name, pass the name property on the options object to the llmobs.wrap function:

function processMessage () {
  ... // user application logic
  return
}
processMessage = llmobs.wrap({ kind: 'workflow', name: 'differentFunctionName' }, processMessage)

Conditions for finishing a span for a wrapped function

llmobs.wrap extends the underlying behavior of tracer.wrap. The underlying span created when the function is called is finished under the following conditions:

If the function returns a Promise, then the span finishes when the promise is resolved or rejected.
If the function takes a callback as its last parameter, then the span finishes when that callback is called.
If the function doesn’t accept a callback and doesn’t return a Promise, then the span finishes at the end of the function execution.

The following example demonstrates the second condition, where the last argument is a callback:

Example

const express = require('express')
const app = express()

function myAgentMiddleware (req, res, next) {
  const err = ... // user application logic
  // the span for this function is finished when `next` is called
  next(err)
}
myAgentMiddleware = llmobs.wrap({ kind: 'agent' }, myAgentMiddleware)

app.use(myAgentMiddleware)

If the application does not use the callback function, it is recommended to use an inline traced block instead. See Tracing spans using inline methods for more information.

const express = require('express')
const app = express()

function myAgentMiddleware (req, res) {
  // the `next` callback is not being used here
  return llmobs.trace({ kind: 'agent', name: 'myAgentMiddleware' }, () => {
    return res.status(200).send('Hello World!')
  })
}

app.use(myAgentMiddleware)

Starting a span

There are multiple methods to start a span, based on the kind of span that you are starting. See the Span Kinds documentation for a list of supported span kinds.

All spans are started as an object instance of LLMObsSpan. Each span has methods that you can use to interact with the span and record data.

Finishing a span

Spans must be finished for the trace to be submitted and visible in the Datadog app.

To finish a span, call finish() on a span object instance. If possible, wrap the span in a try/finally block to ensure the span is submitted even if an exception occurs.

Example

    LLMObsSpan workflowSpan = LLMObs.startWorkflowSpan("my-workflow-span-name", "ml-app-override", "session-141");
    try {
        // user logic
        // interact with started span
    } finally {
        // the span is declared outside the try block so it is in scope here
        workflowSpan.finish();
    }

LLM calls

If you are using any LLM providers or frameworks that are supported by Datadog's LLM integrations, you do not need to manually start an LLM span to trace these operations.

To trace an LLM call, use the function decorator ddtrace.llmobs.decorators.llm().

Arguments

model_name: required - string
The name of the invoked LLM.
name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
model_provider: optional - string - default: "custom"
The name of the model provider.
Note: To display the estimated cost in US dollars, set model_provider to one of the following values: openai, azure_openai, or anthropic.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import llm

@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ... # user application logic to invoke LLM
    return completion
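
The decorators in this reference are used both bare (`@workflow`) and with arguments (`@llm(model_name=...)`), and `name` falls back to the function name. The sketch below shows how a decorator can support both forms and apply those defaults. It is a generic Python illustration, not the ddtrace implementation: `traced` and its `calls` attribute are hypothetical.

```python
import functools


def traced(func=None, *, kind="workflow", name=None, model_provider="custom"):
    """Hypothetical decorator usable as @traced or @traced(name=...)."""

    def decorate(f):
        # If no name is given, default to the name of the traced function.
        span_name = name if name is not None else f.__name__

        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            wrapper.calls.append(
                {"kind": kind, "name": span_name, "model_provider": model_provider}
            )
            return f(*args, **kwargs)

        wrapper.calls = []
        return wrapper

    # Bare use (@traced) passes the function directly; otherwise return the decorator.
    return decorate(func) if func is not None else decorate


@traced
def process_message():
    return "done"


@traced(kind="llm", name="invoke_llm", model_provider="anthropic")
def llm_call():
    return "completion"


process_message()
llm_call()
print(process_message.calls[0]["name"], llm_call.calls[0]["name"])
```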

To trace an LLM call, specify the span kind as llm, and optionally specify the following arguments on the options object.

Arguments

modelName: optional - string - default: "custom"
The name of the invoked LLM.
name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
modelProvider: optional - string - default: "custom"
The name of the model provider.
Note: To display the estimated cost in US dollars, set modelProvider to one of the following values: openai, azure_openai, or anthropic.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

function llmCall () {
  const completion = ... // user application logic to invoke LLM
  return completion
}
llmCall = llmobs.wrap({ kind: 'llm', name: 'invokeLLM', modelName: 'claude', modelProvider: 'anthropic' }, llmCall)

To trace an LLM call, import and call the following method with the arguments listed below:

import datadog.trace.api.llmobs.LLMObs;
LLMObs.startLLMSpan(spanName, modelName, modelProvider, mlApp, sessionId);

Arguments

spanName: optional - String
The name of the operation. If not provided, spanName defaults to the span kind.
modelName: optional - String - default: "custom"
The name of the invoked LLM.
modelProvider: optional - String - default: "custom"
The name of the model provider.
Note: To display the estimated cost in US dollars, set modelProvider to one of the following values: openai, azure_openai, or anthropic.
mlApp: optional - String
The name of the ML application that the operation belongs to. Supplying a non-null value overrides the ML app name supplied at the start of the application. See Tracing multiple applications for more information.
sessionId: optional - String
The ID of the underlying user session. See Tracking user sessions for more information.

Example

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;

public class MyJavaClass {
  public String invokeModel() {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    String inference = ... // user application logic to invoke LLM
    llmSpan.annotateIO(...); // record the input and output
    llmSpan.finish();
    return inference;
  }
}

Workflows

To trace a workflow span, use the function decorator ddtrace.llmobs.decorators.workflow().

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import workflow

@workflow
def process_message():
    ... # user application logic
    return

To trace a workflow span, specify the span kind as workflow, and optionally specify arguments on the options object.

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

function processMessage () {
  ... // user application logic
  return
}
processMessage = llmobs.wrap({ kind: 'workflow' }, processMessage)

To trace a workflow span, import and call the following method with the arguments listed below:

import datadog.trace.api.llmobs.LLMObs;
LLMObs.startWorkflowSpan(spanName, mlApp, sessionId);

Arguments

spanName: optional - String
The name of the operation. If not provided, spanName defaults to the span kind.
mlApp: optional - String
The name of the ML application that the operation belongs to. Supplying a non-null value overrides the ML app name supplied at the start of the application. See Tracing multiple applications for more information.
sessionId: optional - String
The ID of the underlying user session. See Tracking user sessions for more information.

Example

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;

public class MyJavaClass {
  public String executeWorkflow() {
    LLMObsSpan workflowSpan = LLMObs.startWorkflowSpan("my-workflow-span-name", null, "session-141");
    String workflowResult = workflowFn(); // user application logic
    workflowSpan.annotateIO(...); // record the input and output
    workflowSpan.finish();
    return workflowResult;
  }
}

Agents

To trace an agent execution, use the function decorator ddtrace.llmobs.decorators.agent().

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import agent

@agent
def react_agent():
    ... # user application logic
    return

To trace an agent execution, specify the span kind as agent, and optionally specify arguments on the options object.

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

function reactAgent () {
  ... // user application logic
  return
}
reactAgent = llmobs.wrap({ kind: 'agent' }, reactAgent)

To trace an agent execution, import and call the following method with the arguments listed below:

import datadog.trace.api.llmobs.LLMObs;
LLMObs.startAgentSpan(spanName, mlApp, sessionId);

Arguments

spanName: optional - String
The name of the operation. If not provided, spanName defaults to the span kind.
mlApp: optional - String
The name of the ML application that the operation belongs to. Supplying a non-null value overrides the ML app name supplied at the start of the application. See Tracing multiple applications for more information.
sessionId: optional - String
The ID of the underlying user session. See Tracking user sessions for more information.

Tool calls

To trace a tool call, use the function decorator ddtrace.llmobs.decorators.tool().

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import tool

@tool
def call_weather_api():
    ... # user application logic
    return

To trace a tool call, specify the span kind as tool, and optionally specify arguments on the options object.

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

function callWeatherApi () {
  ... // user application logic
  return
}
callWeatherApi = llmobs.wrap({ kind: 'tool' }, callWeatherApi)

To trace a tool call, import and call the following method with the arguments listed below:

import datadog.trace.api.llmobs.LLMObs;
LLMObs.startToolSpan(spanName, mlApp, sessionId);

Arguments

spanName: optional - String
The name of the operation. If not provided, spanName defaults to the span kind.
mlApp: optional - String
The name of the ML application that the operation belongs to. Supplying a non-null value overrides the ML app name supplied at the start of the application. See Tracing multiple applications for more information.
sessionId: optional - String
The ID of the underlying user session. See Tracking user sessions for more information.

Tasks

To trace a task span, use the function decorator ddtrace.llmobs.decorators.task().

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import task

@task
def sanitize_input():
    ... # user application logic
    return

To trace a task span, specify the span kind as task, and optionally specify arguments on the options object.

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

function sanitizeInput () {
  ... // user application logic
  return
}
sanitizeInput = llmobs.wrap({ kind: 'task' }, sanitizeInput)

To trace a task span, import and call the following method with the arguments listed below:

import datadog.trace.api.llmobs.LLMObs;
LLMObs.startTaskSpan(spanName, mlApp, sessionId);

Arguments

spanName: optional - String
The name of the operation. If not provided, spanName defaults to the span kind.
mlApp: optional - String
The name of the ML application that the operation belongs to. Supplying a non-null value overrides the ML app name supplied at the start of the application. See Tracing multiple applications for more information.
sessionId: optional - String
The ID of the underlying user session. See Tracking user sessions for more information.

Embeddings

To trace an embedding operation, use the function decorator ddtrace.llmobs.decorators.embedding().

Note: Annotating an embedding span’s input requires different formatting than other span types. See Enriching spans for more details on how to specify embedding inputs.

Arguments

model_name: required - string
The name of the invoked LLM.
name: optional - string
The name of the operation. If not provided, name is set to the name of the traced function.
model_provider: optional - string - default: "custom"
The name of the model provider.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import embedding

@embedding(model_name="text-embedding-3", model_provider="openai")
def perform_embedding():
    ... # user application logic
    return

To trace an embedding operation, specify the span kind as embedding, and optionally specify arguments on the options object.

Note: Annotating an embedding span’s input requires different formatting than other span types. See Enriching spans for more details on how to specify embedding inputs.

Arguments

modelName: optional - string - default: "custom"
The name of the invoked LLM.
name: optional - string
The name of the operation. If not provided, name is set to the name of the traced function.
modelProvider: optional - string - default: "custom"
The name of the model provider.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

function performEmbedding () {
  ... // user application logic
  return
}
performEmbedding = llmobs.wrap({ kind: 'embedding', modelName: 'text-embedding-3', modelProvider: 'openai' }, performEmbedding)

Retrievals

To trace a retrieval span, use the function decorator ddtrace.llmobs.decorators.retrieval().

Note: Annotating a retrieval span’s output requires different formatting than other span types. See Enriching spans for more details on how to specify retrieval outputs.

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import retrieval

@retrieval
def get_relevant_docs(question):
    context_documents = ... # user application logic
    LLMObs.annotate(
        input_data=question,
        output_data = [
            {"id": doc.id, "score": doc.score, "text": doc.text, "name": doc.name} for doc in context_documents
        ]
    )
    return

To trace a retrieval span, specify the span kind as retrieval, and optionally specify the following arguments on the options object.

Note: Annotating a retrieval span’s output requires different formatting than other span types. See Enriching spans for more details on how to specify retrieval outputs.

Arguments

name: optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
sessionId: optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
mlApp: optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

The following also includes an example of annotating a span. See Enriching spans for more information.

function getRelevantDocs (question) {
  const contextDocuments = ... // user application logic
  llmobs.annotate({
    inputData: question,
    outputData: contextDocuments.map(doc => ({
      id: doc.id,
      score: doc.score,
      text: doc.text,
      name: doc.name
    }))
  })
  return
}
getRelevantDocs = llmobs.wrap({ kind: 'retrieval' }, getRelevantDocs)

Nesting spans

Starting a new span before the current span is finished automatically traces a parent-child relationship between the two spans. The parent span represents the larger operation, while the child span represents a smaller nested sub-operation within it.

from ddtrace.llmobs.decorators import task, workflow

@workflow
def extract_data(document):
    preprocess_document(document)
    ... # performs data extraction on the document
    return

@task
def preprocess_document(document):
    ... # preprocesses a document for data extraction
    return

function preprocessDocument (document) {
  ... // preprocesses a document for data extraction
  return
}
preprocessDocument = llmobs.wrap({ kind: 'task' }, preprocessDocument)

function extractData (document) {
  preprocessDocument(document)
  ... // performs data extraction on the document
  return
}
extractData = llmobs.wrap({ kind: 'workflow' }, extractData)

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;

public class MyJavaClass {
  public void preprocessDocument(String document) {
    LLMObsSpan taskSpan = LLMObs.startTaskSpan("preprocessDocument", null, "session-141");
    ... // preprocess document for data extraction
    taskSpan.annotateIO(...); // record the input and output
    taskSpan.finish();
  }

  public void extractData(String document) {
    LLMObsSpan workflowSpan = LLMObs.startWorkflowSpan("extractData", null, "session-141");
    preprocessDocument(document);
    ... // perform data extraction on the document
    workflowSpan.annotateIO(...); // record the input and output
    workflowSpan.finish();
  }
}

Enriching spans

The SDK provides the method LLMObs.annotate() to enrich spans with inputs, outputs, and metadata.

The LLMObs.annotate() method accepts the following arguments:

Arguments

span: optional - Span - default: the current active span
The span to annotate. If span is not provided (as when using function decorators), the SDK annotates the current active span.
input_data: optional - JSON serializable type or list of dictionaries
Either a JSON serializable type (for non-LLM spans) or a list of dictionaries with this format: {"content": "...", "role": "...", "tool_calls": ..., "tool_results": ...}, where "tool_calls" are an optional list of tool call dictionaries with required keys: "name", "arguments", and optional keys: "tool_id", "type", and "tool_results" are an optional list of tool result dictionaries with required key: "result", and optional keys: "name", "tool_id", "type" for function calling scenarios. Note: Embedding spans are a special case and require a string or a dictionary (or a list of dictionaries) with this format: {"text": "..."}.
output_data: optional - JSON serializable type or list of dictionaries
Either a JSON serializable type (for non-LLM spans) or a list of dictionaries with this format: {"content": "...", "role": "...", "tool_calls": ...}, where "tool_calls" are an optional list of tool call dictionaries with required keys: "name", "arguments", and optional keys: "tool_id", "type" for function calling scenarios. Note: Retrieval spans are a special case and require a string or a dictionary (or a list of dictionaries) with this format: {"text": "...", "name": "...", "score": float, "id": "..."}.
tool_definitions: optional - list of dictionaries
List of tool definition dictionaries for function calling scenarios. Each tool definition should have a required "name": "..." key and optional "description": "..." and "schema": {...} keys.
metadata: optional - dictionary
A dictionary of JSON serializable key-value pairs that users can add as metadata information relevant to the input or output operation described by the span (model_temperature, max_tokens, top_k, etc.).
metrics: optional - dictionary
A dictionary of JSON serializable keys and numeric values that users can add as metrics relevant to the operation described by the span (input_tokens, output_tokens, total_tokens, time_to_first_token, etc.). time_to_first_token is measured in seconds, like the duration metric that is emitted by default.
tags: optional - dictionary
A dictionary of JSON serializable key-value pairs that users can add as tags on the span. Example keys: session, env, system, and version. For more information about tags, see Getting Started with Tags.
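
The message format described for input_data and output_data can be easier to get right with small builder functions. The following is a sketch under stated assumptions: `make_tool_call` and `make_message` are hypothetical helpers, not SDK functions, and they only enforce the required keys listed above.

```python
from typing import Any, Dict, List, Optional


def make_tool_call(name: str, arguments: Dict[str, Any],
                   tool_id: Optional[str] = None,
                   call_type: Optional[str] = None) -> Dict[str, Any]:
    """Build a tool call dictionary with the required and optional keys."""
    call: Dict[str, Any] = {"name": name, "arguments": arguments}
    if tool_id is not None:
        call["tool_id"] = tool_id
    if call_type is not None:
        call["type"] = call_type
    return call


def make_message(role: str, content: str,
                 tool_calls: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Build one LLM message, checking that each tool call has its required keys."""
    message: Dict[str, Any] = {"role": role, "content": content}
    if tool_calls:
        for call in tool_calls:
            missing = {"name", "arguments"} - set(call)
            if missing:
                raise ValueError(f"tool call is missing required keys: {sorted(missing)}")
        message["tool_calls"] = list(tool_calls)
    return message


messages = [
    make_message("user", "What is the weather in Paris?"),
    make_message(
        "assistant",
        "",
        tool_calls=[make_tool_call("get_weather", {"city": "Paris"}, tool_id="call_1")],
    ),
]
```

A list built this way can be passed as input_data or output_data when annotating an LLM span.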

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import embedding, llm, retrieval, workflow

@llm(model_name="model_name", model_provider="model_provider")
def llm_call(prompt):
    resp = ... # llm call here
    LLMObs.annotate(
        span=None,
        input_data=[{"role": "user", "content": "Hello world!"}],
        output_data=[{"role": "assistant", "content": "How can I help?"}],
        metadata={"temperature": 0, "max_tokens": 200},
        metrics={"input_tokens": 4, "output_tokens": 6, "total_tokens": 10},
        tags={"host": "host_name"},
    )
    return resp

@workflow
def extract_data(document):
    resp = llm_call(document)
    LLMObs.annotate(
        input_data=document,
        output_data=resp,
        tags={"host": "host_name"},
    )
    return resp

@embedding(model_name="text-embedding-3", model_provider="openai")
def perform_embedding():
    ... # user application logic
    LLMObs.annotate(
        span=None,
        input_data={"text": "Hello world!"},
        output_data=[0.0023064255, -0.009327292, ...],
        metrics={"input_tokens": 4},
        tags={"host": "host_name"},
    )
    return

@retrieval(name="get_relevant_docs")
def similarity_search():
    ... # user application logic
    LLMObs.annotate(
        span=None,
        input_data="Hello world!",
        output_data=[{"text": "Hello world is ...", "name": "Hello, World! program", "id": "document_id", "score": 0.9893}],
        tags={"host": "host_name"},
    )
    return
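
The metrics argument accepts time_to_first_token in seconds. As a rough, SDK-independent sketch of how such a value might be measured for a streamed response, the hypothetical helper `consume_stream` below times the arrival of the first chunk. The helper name and its injectable `clock` parameter are assumptions made for illustration; only the metric key comes from this reference.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def consume_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> Tuple[str, Dict[str, float]]:
    """Join streamed chunks and record the seconds elapsed before the first one."""
    start = clock()
    first_chunk_delay = None
    parts = []
    for chunk in chunks:
        if first_chunk_delay is None:
            # Only the first chunk determines time_to_first_token.
            first_chunk_delay = clock() - start
        parts.append(chunk)
    metrics: Dict[str, float] = {}
    if first_chunk_delay is not None:
        metrics["time_to_first_token"] = first_chunk_delay
    return "".join(parts), metrics
```

The returned dictionary can then be merged with token counts and passed as the metrics argument of LLMObs.annotate().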

The SDK provides the method llmobs.annotate() to annotate spans with inputs, outputs, and metadata.

The llmobs.annotate() method accepts the following arguments:

Arguments

span: optional - Span - default: the current active span
The span to annotate. If span is not provided (as when using function wrappers), the SDK annotates the current active span.
annotationOptions: required - object
An object of different types of data to annotate the span with.

The annotationOptions object can contain the following:

inputData: optional - JSON serializable type or list of objects
Either a JSON serializable type (for non-LLM spans) or a list of objects with this format: {role: "...", content: "..."} (for LLM spans). Note: Embedding spans are a special case and require a string or an object (or a list of objects) with this format: {text: "..."}.
outputData: optional - JSON serializable type or list of objects
Either a JSON serializable type (for non-LLM spans) or a list of objects with this format: {role: "...", content: "..."} (for LLM spans). Note: Retrieval spans are a special case and require a string or an object (or a list of objects) with this format: {text: "...", name: "...", score: number, id: "..."}.
metadata: optional - object
An object of JSON serializable key-value pairs that users can add as metadata information relevant to the input or output operation described by the span (model_temperature, max_tokens, top_k, etc.).
metrics: optional - object
An object of JSON serializable keys and numeric values that users can add as metrics relevant to the operation described by the span (input_tokens, output_tokens, total_tokens, etc.).
tags: optional - object
An object of JSON serializable key-value pairs that users can add as tags regarding the span’s context (session, environment, system, versioning, etc.). For more information about tags, see Getting Started with Tags.

Example

function llmCall (prompt) {
  const completion = ... // user application logic to invoke LLM
  llmobs.annotate({
    inputData: [{ role: "user", content: "Hello world!" }],
    outputData: [{ role: "assistant", content: "How can I help?" }],
    metadata: { temperature: 0, max_tokens: 200 },
    metrics: { input_tokens: 4, output_tokens: 6, total_tokens: 10 },
    tags: { host: "host_name" }
  })
  return completion
}
llmCall = llmobs.wrap({ kind: 'llm', modelName: 'modelName', modelProvider: 'modelProvider' }, llmCall)

function extractData (document) {
  const resp = llmCall(document)
  llmobs.annotate({
    inputData: document,
    outputData: resp,
    tags: { host: "host_name" }
  })
  return resp
}
extractData = llmobs.wrap({ kind: 'workflow' }, extractData)

function performEmbedding () {
  ... // user application logic
  llmobs.annotate(
    undefined, { // this can be set to undefined or left out entirely
      inputData: { text: "Hello world!" },
      outputData: [0.0023064255, -0.009327292, ...],
      metrics: { input_tokens: 4 },
      tags: { host: "host_name" }
    }
  )
}
performEmbedding = llmobs.wrap({ kind: 'embedding', modelName: 'text-embedding-3', modelProvider: 'openai' }, performEmbedding)

function similaritySearch () {
  ... // user application logic
  llmobs.annotate(undefined, {
    inputData: "Hello world!",
    outputData: [{ text: "Hello world is ...", name: "Hello, World! program", id: "document_id", score: 0.9893 }],
    tags: { host: "host_name" }
  })
  return
}
similaritySearch = llmobs.wrap({ kind: 'retrieval', name: 'getRelevantDocs' }, similaritySearch)

The SDK provides several methods to annotate spans with inputs, outputs, metrics, and metadata.

Annotating inputs and outputs

Use the annotateIO() member method of the LLMObsSpan interface to add structured input and output data to an LLMObsSpan. This includes optional arguments and LLM message objects.

Arguments

If an argument is null or empty, nothing happens. For example, if inputData is a non-empty string while outputData is null, then only inputData is recorded.

inputData: optional - String or List<LLMObs.LLMMessage>
Either a string (for non-LLM spans) or a list of LLMObs.LLMMessages for LLM spans.
outputData: optional - String or List<LLMObs.LLMMessage>
Either a string (for non-LLM spans) or a list of LLMObs.LLMMessages for LLM spans.

LLM Messages

LLM spans must be annotated with LLM Messages using the LLMObs.LLMMessage object.

The LLMObs.LLMMessage object can be instantiated by calling LLMObs.LLMMessage.from() with the following arguments:

role: required - String
A string describing the role of the author of the message.
content: required - String
A string containing the content of the message.

Example

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;
import java.util.Arrays;

public class MyJavaClass {
  public String invokeChat(String userInput) {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    String systemMessage = "You are a helpful assistant";
    Response chatResponse = ... // user application logic to invoke LLM
    llmSpan.annotateIO(
      Arrays.asList(
        LLMObs.LLMMessage.from("user", userInput),
        LLMObs.LLMMessage.from("system", systemMessage)
      ),
      Arrays.asList(
        LLMObs.LLMMessage.from(chatResponse.role, chatResponse.content)
      )
    );
    llmSpan.finish();
    return chatResponse.content;
  }
}

Adding metrics

Bulk add metrics

The setMetrics() member method of the LLMObsSpan interface accepts the following arguments to attach multiple metrics in bulk:

Arguments

metrics: required - Map<String, Number>
A map of JSON-serializable keys and numeric values that users can add to record metrics relevant to the operation described by the span (for example, input_tokens, output_tokens, or total_tokens).

Add a single metric

The setMetric() member method of the LLMObsSpan interface accepts the following arguments to attach a single metric:

Arguments

key: required - CharSequence
The name of the metric.
value: required - int, long, or double
The value of the metric.

Examples

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;
import java.util.Map;

public class MyJavaClass {
  public String invokeChat(String userInput) {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    String chatResponse = ... // user application logic to invoke LLM
    llmSpan.setMetrics(Map.of(
      "input_tokens", 617,
      "output_tokens", 338,
      "time_per_output_token", 0.1773
    ));
    llmSpan.setMetric("total_tokens", 955);
    llmSpan.setMetric("time_to_first_token", 0.23);
    llmSpan.finish();
    return chatResponse;
  }
}

Adding tags

For more information about tags, see Getting Started with Tags.

Bulk add tags

The setTags() member method of the LLMObsSpan interface accepts the following arguments to attach multiple tags in bulk:

Arguments

tags: required - Map<String, Object>
A map of JSON-serializable key-value pairs that users can add as tags to describe the span’s context (for example, session, environment, system, or version).

Add a single tag

The setTag() member method of the LLMObsSpan interface accepts the following arguments to attach a single tag:

Arguments

key: required - String
The key of the tag.
value: required - int, long, double, boolean, or String
The value of the tag.

Examples

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;
import java.util.Map;

public class MyJavaClass {
  public String invokeChat(String userInput) {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    String chatResponse = ... // user application logic to invoke LLM
    llmSpan.setTags(Map.of(
      "chat_source", "web",
      "users_in_chat", 3
    ));
    llmSpan.setTag("is_premium_user", true);
    llmSpan.finish();
    return chatResponse;
  }
}

Annotating errors

Adding a Throwable (recommended)

The addThrowable() member method of the LLMObsSpan interface accepts the following argument to attach a throwable with a stack trace:

Arguments

throwable: required - Throwable
The throwable/exception that occurred.

Adding an error message

The setErrorMessage() member method of the LLMObsSpan interface accepts the following argument to attach an error string:

Arguments

errorMessage: required - String
The message of the error.

Setting an error flag

The setError() member method of the LLMObsSpan interface accepts the following argument to indicate an error with the operation:

Arguments

error: required - boolean
true if the span errored.

Examples

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;

public class MyJavaClass {
  public String invokeChat(String userInput) {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    String chatResponse = "N/A";
    try {
      chatResponse = ... // user application logic to invoke LLM
    } catch (Exception e) {
      llmSpan.addThrowable(e);
      throw new RuntimeException(e);
    } finally {
      llmSpan.finish();
    }
    return chatResponse;
  }
}

Annotating metadata

The setMetadata() member method of the LLMObsSpan interface accepts the following arguments:

metadata: required - Map<String, Object>
A map of JSON-serializable key-value pairs that contains metadata relevant to the input or output operation described by the span.

Example

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;
import java.util.Map;

public class MyJavaClass {
  public String invokeChat(String userInput) {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    llmSpan.setMetadata(
      Map.of(
        "temperature", 0.5,
        "is_premium_member", true,
        "class", "e1"
      )
    );
    String chatResponse = ... // user application logic to invoke LLM
    llmSpan.finish();
    return chatResponse;
  }
}

Annotating auto-instrumented spans

The SDK’s LLMObs.annotation_context() method returns a context manager that can be used to modify all auto-instrumented spans started while the annotation context is active.

The LLMObs.annotation_context() method accepts the following arguments:

Arguments

name: optional - str
Name that overrides the span name for any auto-instrumented spans that are started within the annotation context.
prompt: optional - dictionary
A dictionary that represents the prompt used for an LLM call. See the Prompt object documentation for the complete schema and supported keys. You can also import the Prompt object from ddtrace.llmobs.utils and pass it in as the prompt argument. Note: This argument only applies to LLM spans.
tags: optional - dictionary
A dictionary of JSON serializable key-value pairs that users can add as tags on the span. Example keys: session, env, system, and version. For more information about tags, see Getting Started with Tags.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow
from ddtrace.llmobs.utils import Prompt

@workflow
def rag_workflow(user_question):
    context_str = " ".join(retrieve_documents(user_question))

    with LLMObs.annotation_context(
        prompt = Prompt(
            id="chatbot_prompt",
            version="1.0.0",
            template="Please answer the question using the provided context: {{question}}\n\nContext:\n{{context}}",
            variables={
                "question": user_question,
                "context": context_str,
            }
        ),
        tags = {
            "retrieval_strategy": "semantic_similarity"
        },
        name = "augmented_generation"
    ):
        completion = openai_client.chat.completions.create(...)
    return completion.choices[0].message.content

The SDK’s llmobs.annotationContext() accepts a callback function that can be used to modify all auto-instrumented spans started while inside the scope of the callback function.

The llmobs.annotationContext() method accepts the following options on the first argument:

Options

name: optional - string
Name that overrides the span name for any auto-instrumented spans that are started within the annotation context.
tags: optional - object
An object of JSON serializable key-value pairs that users can add as tags on the span. Example keys: session, env, system, and version. For more information about tags, see Getting Started with Tags.

Example

const { llmobs } = require('dd-trace');

async function ragWorkflow(userQuestion) {
    const contextStr = retrieveDocuments(userQuestion).join(" ");

    const completion = await llmobs.annotationContext({
      tags: {
        retrieval_strategy: "semantic_similarity"
      },
      name: "augmented_generation"
    }, async () => {
      const completion = await openai_client.chat.completions.create(...);
      return completion.choices[0].message.content;
    });
    return completion;
}

Prompt tracking

Attach structured prompt metadata to the LLM span so you can reproduce results, audit changes, and compare prompt performance across versions. When using templates, LLM Observability also provides version tracking based on template content changes.

Use LLMObs.annotation_context(prompt=...) to attach prompt metadata before the LLM call. For more details on span annotation, see Enriching spans.

Arguments

prompt: required - dictionary
A typed dictionary that follows the Prompt schema below.

Prompt structure

Supported keys:

id (str): Logical identifier for this prompt. Should be unique per ml_app. Defaults to {ml_app}-unnamed_prompt
version (str): Version tag for the prompt (for example, “1.0.0”). See version tracking for more details.
variables (Dict[str, str]): Variables used to populate the template placeholders.
template (str): Template string with placeholders (for example, "Translate {{text}} to {{lang}}").
chat_template (List[Message]): Multi-message template form. Provide a list of { "role": "<role>", "content": "<template string with placeholders>" } objects.
tags (Dict[str, str]): Tags to attach to the prompt run.
rag_context_variables (List[str]): Variable keys that contain ground-truth/context content. Used for hallucination detection.
rag_query_variables (List[str]): Variable keys that contain the user query. Used for hallucination detection.
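
To show how template and variables relate, the following sketch fills {{placeholder}} markers from a variables dictionary, so the text sent to the model and the prompt metadata come from the same source. `render_template` is a hypothetical helper written for this illustration, not an SDK function, and the SDK may treat placeholders differently.

```python
import re
from typing import Dict

# Matches placeholders such as {{text}} or {{ lang }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, variables: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching value from variables."""
    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"no value provided for placeholder {key!r}")
        return variables[key]

    return _PLACEHOLDER.sub(substitute, template)


prompt = {
    "id": "translation-template",
    "version": "1.0.0",
    "template": "Translate {{text}} to {{lang}}",
    "variables": {"text": "Good morning", "lang": "fr"},
}
rendered = render_template(prompt["template"], prompt["variables"])
```

Keeping the template static and rendering it from variables means the same `prompt` dictionary can be passed as the prompt argument while `rendered` is sent to the model.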

Example: single-template prompt

from ddtrace.llmobs import LLMObs

def answer_question(text):
    # Attach prompt metadata to the upcoming LLM span using LLMObs.annotation_context()
    with LLMObs.annotation_context(prompt={
        "id": "translation-template",
        "version": "1.0.0",
        "chat_template": [{"role": "user", "content": "Translate to {{lang}}: {{text}}"}],
        "variables": {"lang": "fr", "text": text},
        "tags": {"team": "nlp"}
    }):
        # Example provider call (replace with your client)
        completion = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Translate to fr: {text}"}]
        )
    return completion

Example: LangChain prompt templates

When you use LangChain’s prompt templating with auto-instrumentation, assign templates to variables with meaningful names. Auto-instrumentation uses these names to identify prompts.

from langchain_core.prompts import PromptTemplate

# "translation_template" will be used to identify the template in Datadog
translation_template = PromptTemplate.from_template("Translate {text} to {language}")
chain = translation_template | llm

Use llmobs.annotationContext({ prompt: ... }, () => { ... }) to attach prompt metadata before the LLM call. For more details on span annotation, see Enriching spans.

Arguments

Options

prompt: required - object
An object that follows the Prompt schema below.

Prompt structure

Supported properties:

id (string): Logical identifier for this prompt. Should be unique per ml_app. Defaults to {ml_app}-unnamed_prompt
version (string): Version tag for the prompt (for example, “1.0.0”). See version tracking for more details.
variables (Record<string, string>): Variables used to populate the template placeholders.
template (string | Message[]): Template string with placeholders (for example, "Translate {{text}} to {{lang}}"). Alternatively, a list of { "role": "<role>", "content": "<template string with placeholders>" } objects.
tags (Record<string, string>): Tags to attach to the prompt run.
contextVariables (string[]): Variable keys that contain ground-truth/context content. Used for hallucination detection.
queryVariables (string[]): Variable keys that contain the user query. Used for hallucination detection.

Example: single-template prompt

const { llmobs } = require('dd-trace');

function answerQuestion(text) {
    // Attach prompt metadata to the upcoming LLM span using llmobs.annotationContext()
    return llmobs.annotationContext({
      prompt: {
        id: "translation-template",
        version: "1.0.0",
        template: [{"role": "user", "content": "Translate to {{lang}}: {{text}}"}],
        variables: {"lang": "fr", "text": text},
        tags: {"team": "nlp"}
      }
    }, () => {
      // Example provider call (replace with your client)
      return openaiClient.chat.completions.create({
          model: "gpt-4o",
          messages: [{"role": "user", "content": `Translate to fr: ${text}`}]
        });
    });
}

Notes

Annotating a prompt is only available on LLM spans.
Place the annotation immediately before the provider call so it applies to the correct LLM span.
Use a unique prompt id to distinguish different prompts within your application.
Keep templates static by using placeholder syntax (like {{variable_name}}) and define dynamic content in the variables section.
For multiple auto-instrumented LLM calls within a block, use an annotation context to apply the same prompt metadata across calls. See Annotating auto-instrumented spans.

Version tracking

LLM Observability provides automatic versioning for your prompts when no explicit version is specified. When you provide a template or chat_template in your prompt metadata without a version tag, the system automatically generates a version by computing a hash of the template content. If you do provide a version tag, LLM Observability uses your specified version label instead of auto-generating one.

The versioning system works as follows:

Auto versioning: When no version tag is provided, LLM Observability computes a hash of the template or chat_template content to automatically generate a numerical version identifier
Manual versioning: When a version tag is provided, LLM Observability uses your specified version label exactly as provided
Version history: Both auto-generated and manual versions are maintained in the version history to track prompt evolution over time

This gives you the flexibility to either rely on automatic version management based on template content changes, or maintain full control over versioning with your own version labels.
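
The rule above (use the explicit version if present, otherwise derive one from the template content) can be sketched as follows. This is an illustration only: `resolve_version` is a hypothetical function, and the choice of SHA-256, the JSON canonicalization, and the 12-character hexadecimal identifier are assumptions. The hash and identifier format that LLM Observability actually uses are not specified here.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def resolve_version(prompt: Dict[str, Any]) -> Optional[str]:
    """Return the manual version if set, else a content hash of the template."""
    if prompt.get("version"):
        # Manual versioning: the label is used exactly as provided.
        return prompt["version"]
    content = prompt.get("template") or prompt.get("chat_template")
    if content is None:
        return None
    # Serialize deterministically so identical templates hash identically.
    canonical = json.dumps(content, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Under this sketch, editing a template by even one character yields a different automatic version, while changing only the variables leaves the version unchanged.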

Cost monitoring

Attach token metrics (for automatic cost tracking) or cost metrics (for manual cost tracking) to your LLM/embedding spans. Token metrics allow Datadog to calculate costs using provider pricing, while cost metrics let you supply your own pricing when using custom or unsupported models. For more details, see Costs.

If you’re using automatic instrumentation, token and cost metrics appear on your spans automatically. If you’re instrumenting manually, follow the guidance below.

Use case: Using a common model provider

Datadog supports common model providers such as OpenAI, Azure OpenAI, Anthropic, and Google Gemini. When using these providers, you only need to annotate your LLM request with model_name, model_provider, and token usage. Datadog automatically calculates the estimated cost based on the provider’s pricing.

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="gpt-5.1", model_provider="openai")
def llm_call(prompt):
    resp = ... # llm call here
    # Annotate token metrics
    LLMObs.annotate(
        metrics={
          "input_tokens": 50,
          "output_tokens": 120,
          "total_tokens": 170,
          "non_cached_input_tokens": 13,  # optional
          "cache_read_input_tokens": 22,  # optional
          "cache_write_input_tokens": 15, # optional
        },
    )
    return resp
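
Rather than hard-coding token counts, you can derive them from the usage data your provider returns. The sketch below uses a hypothetical helper, `build_token_metrics`, and assumes (as in the example above, where 13 + 22 + 15 = 50) that the cached and non-cached breakdown sums to input_tokens.

```python
from typing import Dict


def build_token_metrics(output_tokens: int,
                        non_cached_input_tokens: int,
                        cache_read_input_tokens: int = 0,
                        cache_write_input_tokens: int = 0) -> Dict[str, int]:
    """Assemble a metrics dictionary whose totals are consistent with its parts."""
    # Assumption: the three input categories together make up input_tokens.
    input_tokens = (
        non_cached_input_tokens + cache_read_input_tokens + cache_write_input_tokens
    )
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "non_cached_input_tokens": non_cached_input_tokens,
        "cache_read_input_tokens": cache_read_input_tokens,
        "cache_write_input_tokens": cache_write_input_tokens,
    }
```

The result can be passed directly as the metrics argument of LLMObs.annotate().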

Use case: Using a custom model

For custom or unsupported models, you must annotate the span manually with the cost data.

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="custom_model", model_provider="model_provider")
def llm_call(prompt):
    resp = ... # llm call here
    # Annotate cost metrics
    LLMObs.annotate(
        metrics={
          "input_cost": 3,
          "output_cost": 7,
          "total_cost": 10,
          "non_cached_input_cost": 1,    # optional
          "cache_read_input_cost": 0.6,  # optional
          "cache_write_input_cost": 1.4, # optional
        },
    )
    return resp
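
For a custom model, the cost values usually come from your own price list. The following sketch computes them from token counts; `build_cost_metrics` and the per-million-token price parameters are hypothetical, and the currency unit is whatever your own cost metrics use, since this section does not specify one.

```python
from typing import Dict


def build_cost_metrics(input_tokens: int,
                       output_tokens: int,
                       input_price_per_million: float,
                       output_price_per_million: float) -> Dict[str, float]:
    """Compute input, output, and total cost from token counts and unit prices."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }
```

The returned dictionary uses the same metric keys as the example above and can be passed as the metrics argument of LLMObs.annotate().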

Evaluations

The LLM Observability SDK provides methods to export and submit your evaluations to Datadog.

For building reusable, class-based evaluators (BaseEvaluator, BaseSummaryEvaluator) with rich result metadata, see the Evaluation Developer Guide.

Evaluations must be joined to a single span. You can identify the target span using either of these two methods:

Tag-based joining - Join an evaluation using a unique key-value tag pair that is set on a single span. The evaluation will fail to join if the tag key-value pair matches multiple spans or no spans.
Direct span reference - Join an evaluation using the span’s unique trace ID and span ID combination.
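
The uniqueness requirement for tag-based joining can be pictured with a small sketch. It models spans as plain dictionaries and uses a hypothetical function, `find_span_by_tag`; the actual join happens on the Datadog side and is not implemented this way in the SDK.

```python
from typing import Any, Dict, List


def find_span_by_tag(spans: List[Dict[str, Any]], tag_key: str, tag_value: str) -> Dict[str, Any]:
    """Return the single span carrying tag_key=tag_value, or fail if it is not unique."""
    matches = [s for s in spans if s.get("tags", {}).get(tag_key) == tag_value]
    if len(matches) != 1:
        # Zero matches and multiple matches both make the join ambiguous.
        raise LookupError(
            f"expected exactly one span with {tag_key}={tag_value!r}, found {len(matches)}"
        )
    return matches[0]
```

This is why a per-request value such as a message ID is a better join tag than a shared value such as an environment name.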

Exporting a span

LLMObs.export_span() can be used to extract the span context from a span. This method is helpful for associating your evaluation with the corresponding span.

Arguments

The LLMObs.export_span() method accepts the following argument:

span: optional - Span
The span to extract the span context (span and trace IDs) from. If not provided (as when using function decorators), the SDK exports the current active span.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ... # user application logic to invoke LLM
    span_context = LLMObs.export_span(span=None)
    return completion

llmobs.exportSpan() can be used to extract the span context from a span. You’ll need to use this method to associate your evaluation with the corresponding span.

Arguments

The llmobs.exportSpan() method accepts the following argument:

span: optional - Span
The span to extract the span context (span and trace IDs) from. If not provided (as when using function wrappers), the SDK exports the current active span.

Example

function llmCall () {
  const completion = ... // user application logic to invoke LLM
  const spanContext = llmobs.exportSpan()
  return completion
}
llmCall = llmobs.wrap({ kind: 'llm', name: 'invokeLLM', modelName: 'claude', modelProvider: 'anthropic' }, llmCall)

Submitting evaluations

LLMObs.submit_evaluation() can be used to submit your custom evaluation associated with a given span.

LLMObs.submit_evaluation_for is deprecated and will be removed in the next major version of ddtrace (4.0). To migrate, rename your LLMObs.submit_evaluation_for calls to LLMObs.submit_evaluation.

Note: Custom evaluations are evaluators that you implement and host yourself. These differ from out-of-the-box evaluations, which are automatically computed by Datadog using built-in evaluators. To configure out-of-the-box evaluations for your application, use the LLM Observability > Settings > Evaluations page in Datadog.

The LLMObs.submit_evaluation() method accepts the following arguments:

Arguments

label: required - string
The name of the evaluation.
metric_type: required - string
The type of the evaluation. Must be categorical, score, boolean or json.
value: required - string, numeric type, boolean, or dict
The value of the evaluation. Must be a string (metric_type==categorical), integer/float (metric_type==score), boolean (metric_type==boolean), or dict (metric_type==json).
span: optional - dictionary
A dictionary that uniquely identifies the span associated with this evaluation. Must contain span_id (string) and trace_id (string). Use LLMObs.export_span() to generate this dictionary.
span_with_tag_value: optional - dictionary
A dictionary that uniquely identifies the span associated with this evaluation. Must contain tag_key (string) and tag_value (string).
Note: Exactly one of span or span_with_tag_value is required. Supplying both, or neither, raises a ValueError.
ml_app: required - string
The name of the ML application.
timestamp_ms: optional - integer
The unix timestamp in milliseconds when the evaluation metric result was generated. If not provided, this defaults to the current time.
tags: optional - dictionary
A dictionary of string key-value pairs that users can add as tags regarding the evaluation. For more information about tags, see Getting Started with Tags.
assessment: optional - string
An assessment of this evaluation. Accepted values are pass and fail.
reasoning: optional - string
A text explanation of the evaluation result.
metadata: optional - dictionary
A dictionary containing arbitrary structured metadata associated with the evaluation result.
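
The constraints above (a value type that matches metric_type, and exactly one of span or span_with_tag_value) can be checked before submission. The sketch below is a hypothetical pre-flight check named `validate_evaluation`; it is not part of the SDK, and the SDK's own validation may be stricter or report errors differently.

```python
from typing import Any, Dict, Optional

_VALUE_CHECKS = {
    "categorical": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly for scores.
    "score": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "json": lambda v: isinstance(v, dict),
}


def validate_evaluation(metric_type: str, value: Any,
                        span: Optional[Dict[str, str]] = None,
                        span_with_tag_value: Optional[Dict[str, str]] = None) -> None:
    """Raise if the arguments break the documented constraints; return None otherwise."""
    if (span is None) == (span_with_tag_value is None):
        raise ValueError("provide exactly one of span or span_with_tag_value")
    if span is not None and not {"span_id", "trace_id"} <= set(span):
        raise ValueError("span must contain span_id and trace_id")
    if span_with_tag_value is not None and not {"tag_key", "tag_value"} <= set(span_with_tag_value):
        raise ValueError("span_with_tag_value must contain tag_key and tag_value")
    if metric_type not in _VALUE_CHECKS:
        raise ValueError(f"unsupported metric_type: {metric_type!r}")
    if not _VALUE_CHECKS[metric_type](value):
        raise TypeError(f"value {value!r} does not match metric_type {metric_type!r}")
```

Calling this check before LLMObs.submit_evaluation() surfaces mismatches, such as a boolean passed for a score evaluation, closer to the code that produced them.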

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ... # user application logic to invoke LLM

    # joining an evaluation to a span via a tag key-value pair
    msg_id = get_msg_id()
    LLMObs.annotate(
        tags = {'msg_id': msg_id}
    )

    LLMObs.submit_evaluation(
        span_with_tag_value = {
            "tag_key": "msg_id",
            "tag_value": msg_id
        },
        ml_app = "chatbot",
        label="harmfulness",
        metric_type="score",
        value=10,
        tags={"evaluation_provider": "ragas"},
        assessment="fail",
        reasoning="Malicious intent was detected in the user instructions.",
        metadata={"details": ["jailbreak", "SQL injection"]}
    )

    # joining an evaluation to a span via span ID and trace ID
    span_context = LLMObs.export_span(span=None)
    LLMObs.submit_evaluation(
        span = span_context,
        ml_app = "chatbot",
        label="harmfulness",
        metric_type="score",
        value=10,
        tags={"evaluation_provider": "ragas"},
        assessment="fail",
        reasoning="Malicious intent was detected in the user instructions.",
        metadata={"details": ["jailbreak", "SQL injection"]}
    )
    return completion

llmobs.submitEvaluation() can be used to submit your custom evaluation associated with a given span.

The llmobs.submitEvaluation() method accepts the following arguments:

Arguments

span_context: required - dictionary
The span context to associate the evaluation with. This should be the output of llmobs.exportSpan().
evaluationOptions: required - object
An object of the evaluation data.

The evaluationOptions object can contain the following:

label: required - string
The name of the evaluation.
metricType: required - string
The type of the evaluation. Must be one of “categorical”, “score”, “boolean” or “json”.
value: required - string, number, boolean, or object
The value of the evaluation. Must be a string (for categorical metricType), number (for score metricType), boolean (for boolean metricType), or a JSON object (for json metricType).
tags: optional - dictionary
A dictionary of string key-value pairs that users can add as tags regarding the evaluation. For more information about tags, see Getting Started with Tags.
assessment: optional - string
An assessment of this evaluation. Accepted values are pass and fail.
reasoning: optional - string
A text explanation of the evaluation result.
metadata: optional - dictionary
A JSON object containing arbitrary structured metadata associated with the evaluation result.

Example

function llmCall () {
  const completion = ... // user application logic to invoke LLM
  const spanContext = llmobs.exportSpan()
  llmobs.submitEvaluation(spanContext, {
    label: "harmfulness",
    metricType: "score",
    value: 10,
    tags: { evaluationProvider: "ragas" }
  })
  return completion
}
llmCall = llmobs.wrap({ kind: 'llm', name: 'invokeLLM', modelName: 'claude', modelProvider: 'anthropic' }, llmCall)

Use LLMObs.SubmitEvaluation() to submit your custom evaluation associated with a given span.

The LLMObs.SubmitEvaluation() method accepts the following arguments:

Arguments

llmObsSpan: required - LLMObsSpan
The span to associate the evaluation with.
label: required - String
The name of the evaluation.
categoricalValue or scoreValue: required - String or double
The value of the evaluation. Must be a string (for categorical evaluations) or a double (for score evaluations).
tags: optional - Map<String, Object>
A dictionary of string key-value pairs used to tag the evaluation. For more information about tags, see Getting Started with Tags.

Example

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;
import java.util.Map;

public class MyJavaClass {
  public String invokeChat(String userInput) {
    LLMObsSpan llmSpan = LLMObs.startLLMSpan("my-llm-span-name", "my-llm-model", "my-company", "maybe-ml-app-override", "session-141");
    String chatResponse = "N/A";
    try {
      chatResponse = ... // user application logic to invoke LLM
    } catch (Exception e) {
      llmSpan.addThrowable(e);
      throw new RuntimeException(e);
    } finally {
      llmSpan.finish();

      // submit evaluations
      LLMObs.SubmitEvaluation(llmSpan, "toxicity", "toxic", Map.of("language", "english"));
      LLMObs.SubmitEvaluation(llmSpan, "f1-similarity", 0.02, Map.of("provider", "f1-calculator"));
    }
    return chatResponse;
  }
}

Span processing

To modify input and output data on spans, you can configure a processor function. The processor function has access to span tags to enable conditional input/output modification. Processor functions can either return the modified span to emit it, or return None/null to prevent the span from being emitted entirely. This is useful for filtering out spans that contain sensitive data or meet certain criteria.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs import LLMObsSpan

def redact_processor(span: LLMObsSpan) -> LLMObsSpan:
    if span.get_tag("no_output") == "true":
        for message in span.output:
            message["content"] = ""
    return span


# If using LLMObs.enable()
LLMObs.enable(
  ...
  span_processor=redact_processor,
)
# else when using `ddtrace-run`
LLMObs.register_processor(redact_processor)

with LLMObs.llm("invoke_llm_with_no_output"):
    LLMObs.annotate(tags={"no_output": "true"})

Example: conditional modification with auto-instrumentation

When using auto-instrumentation, the span is not always contextually accessible. To conditionally modify the inputs and outputs on auto-instrumented spans, use LLMObs.annotation_context() in addition to a span processor.

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs import LLMObsSpan

def redact_processor(span: LLMObsSpan) -> LLMObsSpan:
    if span.get_tag("no_input") == "true":
        for message in span.input:
            message["content"] = ""
    return span

LLMObs.register_processor(redact_processor)


def call_openai():
    with LLMObs.annotation_context(tags={"no_input": "true"}):
        # make call to openai
        ...

Example: preventing spans from being emitted

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs import LLMObsSpan
from typing import Optional

def filter_processor(span: LLMObsSpan) -> Optional[LLMObsSpan]:
    # Skip spans that are marked as internal or contain sensitive data
    if span.get_tag("internal") == "true" or span.get_tag("sensitive") == "true":
        return None  # This span will not be emitted

    # Process and return the span normally
    return span

LLMObs.register_processor(filter_processor)

# This span will be filtered out and not sent to Datadog
with LLMObs.workflow("internal_workflow"):
    LLMObs.annotate(tags={"internal": "true"})
    # ... workflow logic
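
Because a processor is a plain function, its behavior can be checked locally without sending data to Datadog. The sketch below uses FakeSpan, a hypothetical stand-in that mimics only the get_tag(), input, and output members used in the examples above; it is not the real LLMObsSpan class. apply_processor is likewise an illustrative helper that mirrors the emit-or-drop contract, not an SDK function.

```python
from typing import Dict, List, Optional

class FakeSpan:
    """Hypothetical stand-in for LLMObsSpan, for local testing only."""
    def __init__(self, tags: Dict[str, str], output: Optional[List[dict]] = None):
        self._tags = tags
        self.input: List[dict] = []
        self.output: List[dict] = output or []

    def get_tag(self, key: str) -> Optional[str]:
        return self._tags.get(key)

def filter_then_redact(span):
    # Drop internal spans entirely
    if span.get_tag("internal") == "true":
        return None
    # Blank out output content when requested
    if span.get_tag("no_output") == "true":
        for message in span.output:
            message["content"] = ""
    return span

def apply_processor(processor, spans):
    """Keep only the spans the processor returns (None means the span is dropped)."""
    kept = []
    for span in spans:
        result = processor(span)
        if result is not None:
            kept.append(result)
    return kept

spans = [
    FakeSpan({"internal": "true"}),
    FakeSpan({"no_output": "true"}, output=[{"role": "assistant", "content": "secret"}]),
]
emitted = apply_processor(filter_then_redact, spans)
```

Here only the second span survives, and its output content is blanked while the other message fields are left intact.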

Example

const tracer = require('dd-trace').init({
  llmobs: {
    mlApp: "<YOUR_ML_APP_NAME>"
  }
})

const llmobs = tracer.llmobs

function redactProcessor(span) {
  if (span.getTag("no_output") === "true") {
    for (const message of span.output) {
      message.content = ""
    }
  }
  return span
}

llmobs.registerProcessor(redactProcessor)

Example: conditional modification with auto-instrumentation

When using auto-instrumentation, the span is not always contextually accessible. To conditionally modify the inputs and outputs on auto-instrumented spans, use llmobs.annotationContext() in addition to a span processor.

const { llmobs } = require('dd-trace');

function redactProcessor(span) {
  if (span.getTag("no_input") === "true") {
    for (const message of span.input) {
      message.content = "";
    }
  }

  return span;
}

llmobs.registerProcessor(redactProcessor);

async function callOpenai() {
  await llmobs.annotationContext({ tags: { no_input: "true" } }, async () => {
    // make call to openai
  });
}

Example: preventing spans from being emitted

const tracer = require('dd-trace').init({
  llmobs: {
    mlApp: "<YOUR_ML_APP_NAME>"
  }
})

const llmobs = tracer.llmobs

function filterProcessor(span) {
  // Skip spans that are marked as internal or contain sensitive data
  if (span.getTag("internal") === "true" || span.getTag("sensitive") === "true") {
    return null  // This span will not be emitted
  }

  // Process and return the span normally
  return span
}

llmobs.registerProcessor(filterProcessor)

// This span will be filtered out and not sent to Datadog
function internalWorkflow() {
  return llmobs.trace({ kind: 'workflow', name: 'internalWorkflow' }, (span) => {
    llmobs.annotate({ tags: { internal: "true" } })
    // ... workflow logic
  })
}

Tracking user sessions

Session tracking allows you to associate multiple interactions with a given user.

When starting a root span for a new trace or span in a new process, specify the session_id argument with the string ID of the underlying user session, which is submitted as a tag on the span. Optionally, you can also specify the user_handle, user_name, and user_id tags.

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

@workflow(session_id="<SESSION_ID>")
def process_user_message():
    LLMObs.annotate(
        ...
        tags = {"user_handle": "poodle@dog.com", "user_id": "1234", "user_name": "poodle"}
    )
    return

Session tracking tags

Tag	Description
`session_id`	The ID representing a single user session, for example, a chat session.
`user_handle`	The handle for the user of the chat session.
`user_name`	The name for the user of the chat session.
`user_id`	The ID for the user of the chat session.
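
When some user attributes are unknown at runtime, it may be cleaner to build the tag dictionary programmatically. A minimal sketch, assuming a hypothetical helper named build_user_tags (not part of ddtrace) whose result would be passed as the tags argument of LLMObs.annotate():

```python
from typing import Dict, Optional

# Hypothetical helper: builds the optional user tags, skipping missing values.
def build_user_tags(user_id: Optional[str] = None,
                    user_name: Optional[str] = None,
                    user_handle: Optional[str] = None) -> Dict[str, str]:
    candidates = {"user_id": user_id, "user_name": user_name, "user_handle": user_handle}
    return {key: str(value) for key, value in candidates.items() if value}

tags = build_user_tags(user_id="1234", user_handle="poodle@dog.com")
# `tags` could then be passed as LLMObs.annotate(tags=tags)
```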

When starting a root span for a new trace or span in a new process, specify the sessionId argument with the string ID of the underlying user session:

function processMessage() {
    ... // user application logic
    return
}
processMessage = llmobs.wrap({ kind: 'workflow', sessionId: "<SESSION_ID>" }, processMessage)

When starting a root span for a new trace or span in a new process, specify the sessionId argument with the string ID of the underlying user session:

import datadog.trace.api.llmobs.LLMObs;
import datadog.trace.api.llmobs.LLMObsSpan;

public class MyJavaClass {
  public String processChat(int userID) {
    LLMObsSpan workflowSpan = LLMObs.startWorkflowSpan("incoming-chat", null, "session-" + System.currentTimeMillis() + "-" + userID);
    String chatResponse = answerChat(); // user application logic
    workflowSpan.annotateIO(...); // record the input and output
    workflowSpan.finish();
    return chatResponse;
  }
}

Distributed tracing

The SDK supports tracing across distributed services or hosts. Distributed tracing works by propagating span information across web requests.

The ddtrace library provides out-of-the-box integrations that support distributed tracing for popular web frameworks and HTTP libraries. If your application makes requests using these supported libraries, you can enable distributed tracing by running:

from ddtrace import patch
patch(<INTEGRATION_NAME>=True)

If your application does not use any of these supported libraries, you can enable distributed tracing by manually propagating span information to and from HTTP headers. The SDK provides the helper methods LLMObs.inject_distributed_headers() and LLMObs.activate_distributed_headers() to inject and activate tracing contexts in request headers.

Injecting distributed headers

The LLMObs.inject_distributed_headers() method takes a span and injects its context into the HTTP headers to be included in the request. This method accepts the following arguments:

request_headers: required - dictionary
The HTTP headers to extend with tracing context attributes.
span: optional - Span - default: The current active span.
The span whose context is injected into the provided request headers. If not provided, this defaults to the current active span, including spans started by function decorators.

Activating distributed headers

The LLMObs.activate_distributed_headers() method takes HTTP headers and extracts tracing context attributes to activate in the new service.

Note: You must call LLMObs.activate_distributed_headers() before starting any spans in your downstream service. Spans started prior (including function decorator spans) do not get captured in the distributed trace.

This method accepts the following argument:

request_headers: required - dictionary
The HTTP headers from which to extract tracing context attributes.

Example

client.py

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

@workflow
def client_send_request():
    request_headers = {}
    request_headers = LLMObs.inject_distributed_headers(request_headers)
    send_request("<method>", request_headers)  # arbitrary HTTP call

server.py

from ddtrace.llmobs import LLMObs

def server_process_request(request):
    LLMObs.activate_distributed_headers(request.headers)
    with LLMObs.task(name="process_request") as span:
        pass  # arbitrary server work
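
For the manual case, the injected dictionary only needs to reach the outgoing request. The following sketch uses urllib from the standard library. build_traced_request and fake_inject are hypothetical names, the URL is a placeholder, and the x-demo-trace header is made up for illustration; it is not a real Datadog header. The injector is passed in as a parameter so that the snippet runs without ddtrace installed. In an application, LLMObs.inject_distributed_headers would be passed instead.

```python
import urllib.request
from typing import Callable, Dict

def build_traced_request(
    url: str,
    inject: Callable[[Dict[str, str]], Dict[str, str]],
) -> urllib.request.Request:
    # Start from the application's own headers, then let the injector extend them
    headers = inject({"Content-Type": "application/json"})
    return urllib.request.Request(url, headers=headers, method="POST")

def fake_inject(headers: Dict[str, str]) -> Dict[str, str]:
    # Stand-in for LLMObs.inject_distributed_headers (assumption for this sketch)
    headers = dict(headers)
    headers["x-demo-trace"] = "abc123"
    return headers

# Building the request object does not send anything over the network
request = build_traced_request("http://localhost:8080/chat", fake_inject)
```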

The dd-trace library provides out-of-the-box integrations that support distributed tracing for popular web frameworks. Requiring the tracer automatically enables these integrations, but you can optionally disable them with:

const tracer = require('dd-trace').init({
  llmobs: { ... },
})
tracer.use('http', false) // disable the http integration

Advanced tracing

Tracing spans using inline methods

For each span kind, the ddtrace.llmobs.LLMObs class provides a corresponding inline method to automatically trace the operation a given code block entails. These methods have the same argument signature as their function decorator counterparts, with the addition that name defaults to the span kind (llm, workflow, etc.) if not provided. These methods can be used as context managers to automatically finish the span after the enclosed code block is completed.

Example

from ddtrace.llmobs import LLMObs

def process_message():
    with LLMObs.workflow(name="process_message", session_id="<SESSION_ID>", ml_app="<ML_APP>") as workflow_span:
        ... # user application logic
    return

Persisting a span across contexts

To manually start and stop a span across different contexts or scopes:

Start a span manually using the same methods (for example, the LLMObs.workflow method for a workflow span), but as a plain function call rather than as a context manager.
Pass the span object as an argument to other functions.
Stop the span manually with the span.finish() method. Note: The span must be finished manually; otherwise, it is not submitted.

Example

from ddtrace.llmobs import LLMObs

def process_message():
    workflow_span = LLMObs.workflow(name="process_message")
    ... # user application logic
    separate_task(workflow_span)
    return

def separate_task(workflow_span):
    ... # user application logic
    workflow_span.finish()
    return
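
Because an unfinished span is never submitted, it can help to guard the manual pattern with try/finally so that the span is finished even when the application logic raises. A minimal sketch: run_with_span and StubSpan are hypothetical names used for illustration, and the span factory is passed in so that the snippet runs without ddtrace. In practice, the factory could be lambda: LLMObs.workflow(name="process_message").

```python
class StubSpan:
    """Hypothetical stand-in exposing only finish(), for local testing."""
    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True

def run_with_span(start_span, work):
    """Start a span, run `work(span)`, and always finish the span afterwards."""
    span = start_span()
    try:
        return work(span)
    finally:
        span.finish()

stub = StubSpan()

def failing_work(span):
    raise ValueError("boom")

try:
    run_with_span(lambda: stub, failing_work)
except ValueError:
    pass  # the error still propagates to the caller, but the span was finished first
```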

Force flushing in serverless environments

LLMObs.flush() is a blocking function that submits all buffered LLM Observability data to the Datadog backend. This can be useful in serverless environments to prevent an application from exiting until all LLM Observability traces are submitted.
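
One way to apply this in a serverless handler is to flush in a finally block, so that buffered data is submitted even when the handler raises. This is a sketch under assumptions: flush_on_exit is a hypothetical decorator, the handler signature is illustrative, and the flush callable is injected so that the snippet runs standalone. With ddtrace, the callable would be LLMObs.flush.

```python
import functools

def flush_on_exit(flush):
    """Hypothetical decorator: calls `flush()` after the handler returns or raises."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            finally:
                flush()
        return wrapper
    return decorator

flush_calls = []

@flush_on_exit(lambda: flush_calls.append("flushed"))  # would be LLMObs.flush in an application
def handler(event, context):
    return {"statusCode": 200}

response = handler({}, None)
```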

Tracing multiple applications

The SDK supports tracing multiple LLM applications from the same service.

You can set the DD_LLMOBS_ML_APP environment variable to the name of your LLM application. All generated spans are grouped under this name by default.

To override this configuration and use a different LLM application name for a given root span, pass the ml_app argument with the string name of the underlying LLM application when starting a root span for a new trace or a span in a new process.

from ddtrace.llmobs.decorators import workflow

@workflow(name="process_message", ml_app="<NON_DEFAULT_ML_APP_NAME>")
def process_message():
    ... # user application logic
    return

Tracing spans using inline methods

The llmobs SDK provides a corresponding inline method, llmobs.trace(), to automatically trace the operation a given code block entails. This method has the same argument signature as its function wrapper counterpart, except that name is required because it cannot be inferred from an anonymous callback. The method finishes the span under the following conditions:

If the function returns a Promise, then the span finishes when the promise is resolved or rejected.
If the function takes a callback as its last parameter, then the span finishes when that callback is called.
If the function doesn’t accept a callback and doesn’t return a Promise, then the span finishes at the end of the function execution.

Example without a callback

function processMessage () {
  return llmobs.trace({ kind: 'workflow', name: 'processMessage', sessionId: '<SESSION_ID>', mlApp: '<ML_APP>' }, workflowSpan => {
    ... // user application logic
    return
  })
}

Example with a callback

function processMessage () {
  return llmobs.trace({ kind: 'workflow', name: 'processMessage', sessionId: '<SESSION_ID>', mlApp: '<ML_APP>' }, (workflowSpan, cb) => {
    ... // user application logic
    let maybeError = ...
    cb(maybeError) // the span will finish here, and tag the error if it is not null or undefined
    return
  })
}

The return type of this function matches the return type of the traced function:

function processMessage () {
  const result = llmobs.trace({ kind: 'workflow', name: 'processMessage', sessionId: '<SESSION_ID>', mlApp: '<ML_APP>' }, workflowSpan => {
    ... // user application logic
    return 'hello world'
  })

  console.log(result) // 'hello world'
  return result
}

Function decorators in TypeScript

The Node.js LLM Observability SDK offers an llmobs.decorate function, which serves as a function decorator for TypeScript applications. This function's tracing behavior is the same as that of llmobs.wrap.

Example

// index.ts
import tracer from 'dd-trace';
tracer.init({
  llmobs: {
    mlApp: "<YOUR_ML_APP_NAME>",
  },
});

const { llmobs } = tracer;

class MyAgent {
  @llmobs.decorate({ kind: 'agent' })
  async runChain () {
    ... // user application logic
    return
  }
}

Force flushing in serverless environments

llmobs.flush() is a blocking function that submits all buffered LLM Observability data to the Datadog backend. This can be useful in serverless environments to prevent an application from exiting until all LLM Observability traces are submitted.

Tracing multiple applications

The SDK supports tracing multiple LLM applications from the same service.

You can set the DD_LLMOBS_ML_APP environment variable to the name of your LLM application. All generated spans are grouped under this name by default.

To override this configuration and use a different LLM application name for a given root span, pass the mlApp argument with the string name of the underlying LLM application when starting a root span for a new trace or a span in a new process.

function processMessage () {
  ... // user application logic
  return
}
processMessage = llmobs.wrap({ kind: 'workflow', name: 'processMessage', mlApp: '<NON_DEFAULT_ML_APP_NAME>' }, processMessage)

Application naming guidelines

Your application name (the value of DD_LLMOBS_ML_APP) must follow these guidelines:

Must be a lowercase Unicode string
Can be up to 193 characters long
Cannot contain contiguous or trailing underscores
Can contain the following characters:
- Alphanumerics
- Underscores
- Minuses
- Colons
- Periods
- Slashes
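
These rules can be checked before deployment. The following is a minimal sketch of a validator, not the SDK's own validation logic: is_valid_ml_app_name is a hypothetical helper, it interprets "alphanumerics" using Python's str.isalnum() (which accepts Unicode letters and digits), and it treats an empty name as invalid, which is an assumption of this sketch.

```python
MAX_ML_APP_LENGTH = 193
ALLOWED_PUNCTUATION = set("_-:./")

# Hypothetical helper: applies the naming guidelines listed above.
def is_valid_ml_app_name(name: str) -> bool:
    # Length limit (empty names are rejected as an assumption of this sketch)
    if not name or len(name) > MAX_ML_APP_LENGTH:
        return False
    # Must be lowercase
    if name != name.lower():
        return False
    # No contiguous or trailing underscores
    if "__" in name or name.endswith("_"):
        return False
    # Only alphanumerics, underscores, minuses, colons, periods, and slashes
    return all(ch.isalnum() or ch in ALLOWED_PUNCTUATION for ch in name)
```

For instance, a name such as chatbot-v2:prod passes these checks, while Chatbot, chat__bot, and chatbot_ do not.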