LLM Observability Python SDK Reference

LLM Observability is not available in the US1-FED site.

LLM Observability is in public beta.

Overview

The LLM Observability SDK for Python enhances the observability of your Python-based LLM applications. The SDK supports Python versions 3.7 and newer. For information about LLM Observability’s integration support, see Auto Instrumentation.

You can install and configure tracing of various operations such as workflows, tasks, and API calls with function decorators or context managers. You can also annotate these traces with metadata for deeper insights into the performance and behavior of your applications, supporting multiple LLM services or models from the same environment.

For usage examples you can run from a Jupyter notebook, see the LLM Observability Jupyter Notebooks repository.

Setup

Prerequisites

  1. The latest ddtrace package must be installed:
pip install ddtrace
  2. LLM Observability requires a Datadog API key (see the instructions for creating an API key).

Command-line setup

Enable LLM Observability by running your application using the ddtrace-run command and specifying the required environment variables.

Note: ddtrace-run automatically turns on all LLM Observability integrations.

DD_SITE=<YOUR_DATADOG_SITE> DD_API_KEY=<YOUR_API_KEY> DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=<YOUR_ML_APP_NAME> ddtrace-run <YOUR_APP_STARTUP_COMMAND>
DD_API_KEY
required - string
Your Datadog API key.
DD_SITE
required - string
The Datadog site to which your LLM data is submitted.
DD_LLMOBS_ENABLED
required - integer or string
Toggle to enable submitting data to LLM Observability. Should be set to 1 or true.
DD_LLMOBS_ML_APP
required - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given root span, see Tracing multiple applications.
DD_LLMOBS_AGENTLESS_ENABLED
optional - integer or string - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to 1 or true.
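
As a pre-launch sanity check, the required variables listed above can be verified before invoking ddtrace-run. The following is a minimal sketch and not part of the SDK: the helper name missing_llmobs_vars is hypothetical, and comparing the DD_LLMOBS_ENABLED value case-insensitively is an assumption.

```python
import os

# Variables the list above marks as required.
REQUIRED_VARS = ("DD_API_KEY", "DD_SITE", "DD_LLMOBS_ENABLED", "DD_LLMOBS_ML_APP")


def missing_llmobs_vars(environ=None):
    """Return the names of required variables that are unset, empty, or invalid.

    Hypothetical helper for illustration; it is not provided by ddtrace.
    """
    env = os.environ if environ is None else environ
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    # The reference says DD_LLMOBS_ENABLED should be 1 or true; accepting any
    # letter case here is an assumption of this sketch.
    enabled = env.get("DD_LLMOBS_ENABLED", "")
    if enabled and enabled.strip().lower() not in ("1", "true"):
        problems.append("DD_LLMOBS_ENABLED")
    return problems
```

Calling missing_llmobs_vars() with no argument inspects the current process environment; an empty list means all required variables are present.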

In-code setup

Enable LLM Observability programmatically through the LLMObs.enable() function instead of running with the ddtrace-run command. Note: Do not use this setup method with the ddtrace-run command.

from ddtrace.llmobs import LLMObs
LLMObs.enable(
  ml_app="<YOUR_ML_APP_NAME>",
  api_key="<YOUR_DATADOG_API_KEY>",
  site="<YOUR_DATADOG_SITE>",
  agentless_enabled=True,
  integrations_enabled=True,
)
ml_app
optional - string
The name of your LLM application, service, or project, under which all traces and spans are grouped. This helps distinguish between different applications or experiments. See Application naming guidelines for allowed characters and other constraints. To override this value for a given trace, see Tracing multiple applications. If not provided, this defaults to the value of DD_LLMOBS_ML_APP.
integrations_enabled
optional - boolean - default: true
A flag to enable automatically tracing LLM calls for Datadog’s supported LLM integrations. If not provided, all supported LLM integrations are enabled by default. To avoid using the LLM integrations, set this value to false.
agentless_enabled
optional - boolean - default: false
Only required if you are not using the Datadog Agent, in which case this should be set to True. This configures the ddtrace library to not send any data that requires the Datadog Agent. If not provided, this defaults to the value of DD_LLMOBS_AGENTLESS_ENABLED.
site
optional - string
The Datadog site to which your LLM data is submitted. If not provided, this defaults to the value of DD_SITE.
api_key
optional - string
Your Datadog API key. If not provided, this defaults to the value of DD_API_KEY.
env
optional - string
The name of your application’s environment (examples: prod, pre-prod, staging). If not provided, this defaults to the value of DD_ENV.
service
optional - string
The name of the service used for your application. If not provided, this defaults to the value of DD_SERVICE.

Application naming guidelines

Your application name (the value of DD_LLMOBS_ML_APP) must be a lowercase Unicode string. It may contain the characters listed below:

  • Alphanumerics
  • Underscores
  • Minuses
  • Colons
  • Periods
  • Slashes

The name can be up to 193 characters long and may not contain contiguous or trailing underscores.
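
The naming rules above can be checked locally before an application name is used. This is a minimal sketch under stated assumptions, not the SDK's own validation: the function name is_valid_ml_app is hypothetical, "alphanumerics" is interpreted as Unicode letters and digits (what Python's \w matches, minus the underscore handled separately), and rejecting an empty name is an assumption.

```python
import re

MAX_LENGTH = 193  # maximum length stated in the guidelines

# \w matches Unicode letters, digits, and the underscore; the remaining
# characters are the minus, colon, period, and slash from the list above.
_ALLOWED = re.compile(r"[\w\-:./]+")


def is_valid_ml_app(name: str) -> bool:
    """Return True if name satisfies the guidelines (hypothetical helper)."""
    if not name or len(name) > MAX_LENGTH:
        return False  # rejecting the empty string is an assumption
    if name != name.lower():
        return False  # must be lowercase
    if _ALLOWED.fullmatch(name) is None:
        return False  # contains a character outside the allowed set
    if "__" in name or name.endswith("_"):
        return False  # no contiguous or trailing underscores
    return True
```

For example, is_valid_ml_app("chat-bot:v1.2/prod") passes, while "ChatBot", "my__app", and "app_" are rejected.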

Tracing spans

To trace a span, use ddtrace.llmobs.decorators.<SPAN_KIND>() as a function decorator (for example, llmobs.decorators.task() for a task span) for the function you’d like to trace. For a list of available span kinds, see the Span Kinds documentation. For more granular tracing of operations within functions, see Tracing spans using inline methods.

LLM span

Note: If you are using any LLM providers or frameworks that are supported by Datadog’s LLM integrations, you do not need to manually start an LLM span to trace these operations.

To trace an LLM span, use the function decorator ddtrace.llmobs.decorators.llm().

Arguments

model_name
required - string
The name of the invoked LLM.
name
optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
model_provider
optional - string - default: "custom"
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import llm

@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ... # user application logic to invoke LLM
    return completion

Workflow span

To trace a workflow span, use the function decorator ddtrace.llmobs.decorators.workflow().

Arguments

name
optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import workflow

@workflow
def process_message():
    ... # user application logic
    return 

Agent span

To trace an agent span, use the function decorator ddtrace.llmobs.decorators.agent().

Arguments

name
optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import agent

@agent(name="react_agent")
def run_agent():
    ... # user application logic
    return 

Tool span

To trace a tool span, use the function decorator ddtrace.llmobs.decorators.tool().

Arguments

name
optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import tool

@tool(name="get_current_weather")
def call_weather_api():
    ... # user application logic
    return 

Task span

To trace a task span, use the function decorator ddtrace.llmobs.decorators.task().

Arguments

name
optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import task

@task
def sanitize_input():
    ... # user application logic
    return 

Embedding span

To trace an embedding span, use the function decorator ddtrace.llmobs.decorators.embedding().

Note: Annotating an embedding span’s input requires different formatting than other span types. See Annotating a span for more details on how to specify embedding inputs.

Arguments

model_name
required - string
The name of the invoked LLM.
name
optional - string
The name of the operation. If not provided, name is set to the name of the traced function.
model_provider
optional - string - default: "custom"
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import embedding

@embedding(model_name="text-embedding-3", model_provider="openai")
def perform_embedding():
    ... # user application logic
    return 

Retrieval span

To trace a retrieval span, use the function decorator ddtrace.llmobs.decorators.retrieval().

Note: Annotating a retrieval span’s output requires different formatting than other span types. See Annotating a span for more details on how to specify retrieval outputs.

Arguments

name
optional - string
The name of the operation. If not provided, name defaults to the name of the traced function.
session_id
optional - string
The ID of the underlying user session. See Tracking user sessions for more information.
ml_app
optional - string
The name of the ML application that the operation belongs to. See Tracing multiple applications for more information.

Example

from ddtrace.llmobs.decorators import retrieval

@retrieval(name="get_relevant_docs")
def similarity_search():
    ... # user application logic
    return 

Tracking user sessions

Session tracking allows you to associate multiple interactions with a given user. When starting a root span for a new trace or span in a new process, specify the session_id argument with the string ID of the underlying user session:

from ddtrace.llmobs.decorators import workflow

@workflow(session_id="<SESSION_ID>")
def process_message():
    ... # user application logic
    return 

Annotating a span

The SDK provides the method LLMObs.annotate() to annotate spans with inputs, outputs, and metadata.

Arguments

The LLMObs.annotate() method accepts the following arguments:

span
optional - Span - default: the current active span
The span to annotate. If span is not provided (as when using function decorators), the SDK annotates the current active span.
input_data
optional - JSON serializable type or list of dictionaries
Either a JSON serializable type (for non-LLM spans) or a list of dictionaries with this format: {"role": "...", "content": "..."} (for LLM spans). Note: Embedding spans are a special case and require a string or a dictionary (or a list of dictionaries) with this format: {"text": "..."}.
output_data
optional - JSON serializable type or list of dictionaries
Either a JSON serializable type (for non-LLM spans) or a list of dictionaries with this format: {"role": "...", "content": "..."} (for LLM spans). Note: Retrieval spans are a special case and require a string or a dictionary (or a list of dictionaries) with this format: {"text": "...", "name": "...", "score": float, "id": "..."}.
metadata
optional - dictionary
A dictionary of JSON serializable key-value pairs that users can add as metadata information relevant to the input or output operation described by the span (model_temperature, max_tokens, top_k, and so on).
metrics
optional - dictionary
A dictionary of JSON serializable keys and numeric values that users can add as metrics relevant to the operation described by the span (input_tokens, output_tokens, total_tokens, and so on).
tags
optional - dictionary
A dictionary of JSON serializable key-value pairs that users can add as tags regarding the span’s context (session, environment, system, versioning, and so on). For more information about tags, see Getting Started with Tags.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import embedding, llm, retrieval, workflow

@llm(model_name="model_name", model_provider="model_provider")
def llm_call(prompt):
    resp = ... # llm call here
    LLMObs.annotate(
        span=None,
        input_data=[{"role": "user", "content": "Hello world!"}],
        output_data=[{"role": "assistant", "content": "How can I help?"}],
        metadata={"temperature": 0, "max_tokens": 200},
        metrics={"input_tokens": 4, "output_tokens": 6, "total_tokens": 10},
        tags={"host": "host_name"},
    )
    return resp

@workflow
def process_message(prompt):
    resp = llm_call(prompt)
    LLMObs.annotate(
        span=None,
        input_data="prompt",
        output_data="output",
        tags={"host": "host_name"},
    )
    return resp

@embedding(model_name="text-embedding-3", model_provider="openai")
def perform_embedding():
    ... # user application logic
    LLMObs.annotate(
        span=None,
        input_data={"text": "Hello world!"},
        output_data=[0.0023064255, -0.009327292, ...],
        metrics={"input_tokens": 4},
        tags={"host": "host_name"},
    )
    return

@retrieval(name="get_relevant_docs")
def similarity_search():
    ... # user application logic
    LLMObs.annotate(
        span=None,
        input_data="Hello world!",
        output_data=[{"text": "Hello world is ...", "name": "Hello, World! program", "id": "document_id", "score": 0.9893}],
        tags={"host": "host_name"},
    )
    return
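
The input and output formats described above can be assembled by small helpers so that each call site does not rebuild the dictionaries by hand. This is a sketch only: the helper names llm_annotation and retrieval_output are hypothetical and not part of the SDK, and the single user/assistant message pair is a simplifying assumption.

```python
def llm_annotation(prompt, reply, input_tokens, output_tokens):
    """Build keyword arguments for annotating an LLM span (hypothetical helper)."""
    return {
        # LLM spans take lists of {"role": ..., "content": ...} dictionaries.
        "input_data": [{"role": "user", "content": prompt}],
        "output_data": [{"role": "assistant", "content": reply}],
        "metrics": {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        },
    }


def retrieval_output(documents):
    """Convert (text, name, score, id) tuples into retrieval span output_data."""
    return [
        {"text": text, "name": name, "score": float(score), "id": doc_id}
        for text, name, score, doc_id in documents
    ]
```

Inside a function decorated with @llm, the first helper's result could be passed as LLMObs.annotate(**llm_annotation(prompt, reply, 4, 6)); inside a retrieval span, the second helper's result could be passed as the output_data argument.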

Evaluations

The LLM Observability SDK provides the methods LLMObs.export_span() and LLMObs.submit_evaluation() to help your traced LLM application submit evaluations to LLM Observability.

Exporting a span

LLMObs.export_span() can be used to extract the span context from a span. You’ll need to use this method to associate your evaluation with the corresponding span.

Arguments

The LLMObs.export_span() method accepts the following argument:

span
optional - Span
The span to extract the span context (span and trace IDs) from. If not provided (as when using function decorators), the SDK exports the current active span.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ... # user application logic to invoke LLM
    span_context = LLMObs.export_span(span=None)
    return completion

Submit evaluations

LLMObs.submit_evaluation() can be used to submit your custom evaluation associated with a given span.

Arguments

The LLMObs.submit_evaluation() method accepts the following arguments:

span_context
required - dictionary
The span context to associate the evaluation with. This should be the output of LLMObs.export_span().
label
required - string
The name of the evaluation.
metric_type
required - string
The type of the evaluation. Must be one of “categorical” or “score”.
value
required - string or numeric type
The value of the evaluation. Must be a string (for categorical metric_type) or integer/float (for score metric_type).
tags
optional - dictionary
A dictionary of string key-value pairs that users can add as tags regarding the evaluation. For more information about tags, see Getting Started with Tags.

Example

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ... # user application logic to invoke LLM
    span_context = LLMObs.export_span(span=None)
    LLMObs.submit_evaluation(
        span_context,
        label="harmfulness",
        metric_type="score",
        value=10,
        tags={"evaluation_provider": "ragas"},
    )
    return completion
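
Because value must match metric_type, it can help to check the pair before submitting. The following is a minimal sketch, not SDK behavior: the function name check_evaluation is hypothetical, and rejecting booleans as scores is an assumption (in Python, bool is a subclass of int).

```python
def check_evaluation(metric_type, value):
    """Raise if value does not fit metric_type; return value otherwise.

    Hypothetical pre-submission check, not part of ddtrace.
    """
    if metric_type == "categorical":
        if not isinstance(value, str):
            raise TypeError("categorical evaluations require a string value")
    elif metric_type == "score":
        # Excluding bool is an assumption of this sketch.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("score evaluations require an integer or float value")
    else:
        raise ValueError('metric_type must be "categorical" or "score"')
    return value
```

A call such as check_evaluation("score", 10) returns the value unchanged, so it can wrap the value argument passed to LLMObs.submit_evaluation().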

Advanced tracing

Tracing spans using inline methods

For each span kind, the ddtrace.llmobs.LLMObs class provides a corresponding inline method to automatically trace the operation a given code block entails. These methods have the same argument signature as their function decorator counterparts, with the addition that name defaults to the span kind (llm, workflow, and so on) if not provided. These methods can be used as context managers to automatically finish the span once the enclosed code block is completed.

Example

from ddtrace.llmobs import LLMObs

def process_message():
    with LLMObs.workflow(name="process_message", session_id="<SESSION_ID>", ml_app="<ML_APP>") as workflow_span:
        ... # user application logic
    return

Persisting a span across contexts

To manually start and stop a span across different contexts or scopes:

  1. Start a span manually using the same methods (for example, the LLMObs.workflow method for a workflow span), but as a plain function call rather than as a context manager.
  2. Pass the span object as an argument to other functions.
  3. Stop the span manually with the span.finish() method. Note: the span must be manually finished, otherwise it will not be submitted.

Example

from ddtrace.llmobs import LLMObs

def process_message():
    workflow_span = LLMObs.workflow(name="process_message")
    ... # user application logic
    separate_task(workflow_span)
    return

def separate_task(workflow_span):
    ... # user application logic
    workflow_span.finish()
    return
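
Since an unfinished span is never submitted, one way to guard against an exception skipping the finish() call is to wrap the final stretch of work in a small context manager. This is a sketch under assumptions: the name finishing is hypothetical and not part of the SDK, and it relies only on the span object exposing the finish() method described above.

```python
from contextlib import contextmanager


@contextmanager
def finishing(span):
    """Yield span and call span.finish() on exit, even if the block raises.

    Hypothetical helper; works with any object that has a finish() method.
    """
    try:
        yield span
    finally:
        span.finish()
```

In the example above, separate_task could run its logic inside "with finishing(workflow_span):" instead of calling workflow_span.finish() on its last line.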

Flushing in serverless environments

LLMObs.flush() is a blocking function that submits all buffered LLM Observability data to the Datadog backend. This can be useful in serverless environments to prevent an application from exiting until all LLM Observability traces are submitted.
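
One way to apply this in a serverless handler is to flush in a finally clause so that buffered data is submitted whether the handler returns or raises. The decorator below is a minimal sketch: the name flush_after is hypothetical and not part of the SDK, and the flush callable is passed in explicitly so the pattern stays independent of any particular runtime.

```python
import functools


def flush_after(flush):
    """Return a decorator that calls flush() after the wrapped handler exits.

    Hypothetical helper; flush is any zero-argument callable.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            finally:
                # Runs on both normal return and exception.
                flush()
        return wrapper
    return decorator
```

With the SDK imported, a handler could be decorated with @flush_after(LLMObs.flush). Because LLMObs.flush() blocks, the handler's total duration includes the time spent submitting data.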

Tracing multiple applications

The SDK supports tracking multiple LLM applications from the same service.

You can set the environment variable DD_LLMOBS_ML_APP to the name of your LLM application. By default, all generated spans are grouped under this name.

To override this configuration and use a different LLM application name for a given root span, pass the ml_app argument with the string name of the underlying LLM application when starting a root span for a new trace or a span in a new process.

from ddtrace.llmobs.decorators import workflow

@workflow(name="process_message", ml_app="<NON_DEFAULT_ML_APP_NAME>")
def process_message():
    ... # user application logic
    return

Distributed tracing

The SDK supports tracing across distributed services or hosts. Distributed tracing works by propagating span information across web requests.

The ddtrace library provides out-of-the-box integrations that support distributed tracing for popular web frameworks and HTTP libraries. If your application makes requests using these supported libraries, you can enable distributed tracing by running:

from ddtrace import patch
patch(<INTEGRATION_NAME>=True)

If your application does not use any of these supported libraries, you can enable distributed tracing by manually propagating span information to and from HTTP headers. The SDK provides the helper methods LLMObs.inject_distributed_headers() and LLMObs.activate_distributed_headers() to inject and activate tracing contexts in request headers.

Injecting distributed headers

The LLMObs.inject_distributed_headers() method takes a span and injects its context into the HTTP headers to be included in the request. This method accepts the following arguments:

request_headers
required - dictionary
The HTTP headers to extend with tracing context attributes.
span
optional - Span - default: the current active span
The span whose context is injected into the provided request headers. If not provided (as when using function decorators), this defaults to the current active span.

Activating distributed headers

The LLMObs.activate_distributed_headers() method takes HTTP headers and extracts tracing context attributes to activate in the new service.

Note: You must call LLMObs.activate_distributed_headers() before starting any spans in your downstream service. Spans started prior (including function decorator spans) do not get captured in the distributed trace.

This method accepts the following argument:

request_headers
required - dictionary
The HTTP headers from which to extract tracing context attributes.

Example

client.py

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

@workflow
def client_send_request():
    request_headers = {}
    request_headers = LLMObs.inject_distributed_headers(request_headers)
    send_request("<method>", request_headers)  # arbitrary HTTP call

server.py

from ddtrace.llmobs import LLMObs

def server_process_request(request):
    LLMObs.activate_distributed_headers(request.headers)
    with LLMObs.task(name="process_request") as span:
        pass  # arbitrary server work