The LLM Observability SDK for Python enhances the observability of your Python-based LLM applications. The SDK supports Python versions 3.7 and newer. For information about LLM Observability’s integration support, see Auto Instrumentation.
You can install and configure tracing of various operations such as workflows, tasks, and API calls with function decorators or context managers. You can also annotate these traces with metadata for deeper insights into the performance and behavior of your applications, supporting multiple LLM services or models from the same environment.
For usage examples you can run from a Jupyter notebook, see the LLM Observability Jupyter Notebooks repository.
The ddtrace package must be installed:
pip install ddtrace
Enable LLM Observability by running your application using the ddtrace-run command and specifying the required environment variables.
Note: ddtrace-run automatically turns on all LLM Observability integrations.
DD_SITE=<YOUR_DATADOG_SITE> DD_API_KEY=<YOUR_API_KEY> DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=<YOUR_ML_APP_NAME> ddtrace-run <YOUR_APP_STARTUP_COMMAND>
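- DD_SITE: your Datadog site.
- DD_LLMOBS_ENABLED: set to 1 or true to enable LLM Observability.
- DD_LLMOBS_ML_APP: the name of your LLM application.
- DD_LLMOBS_AGENTLESS_ENABLED (default: false): set to 1 or true if you are not using the Datadog Agent.
- DD_API_KEY: your Datadog API key.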
Enable LLM Observability programmatically through the LLMObs.enable() function instead of running with the ddtrace-run command. Note: Do not use this setup method with the ddtrace-run command.
from ddtrace.llmobs import LLMObs
LLMObs.enable(
    ml_app="<YOUR_ML_APP_NAME>",
    api_key="<YOUR_DATADOG_API_KEY>",
    site="<YOUR_DATADOG_SITE>",
    agentless_enabled=True,
)
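The LLMObs.enable() function accepts the following arguments:
- ml_app: the name of your LLM application. If not provided, this defaults to the value of DD_LLMOBS_ML_APP.
- integrations_enabled (default: true): set to false to turn off the LLM Observability integrations.
- agentless_enabled (default: false): set to True if you are not using the Datadog Agent. This configures the ddtrace library to not send any data that requires the Datadog Agent. If not provided, this defaults to the value of DD_LLMOBS_AGENTLESS_ENABLED.
- site: your Datadog site. If not provided, this defaults to the value of DD_SITE.
- api_key: your Datadog API key. If not provided, this defaults to the value of DD_API_KEY.
- env: the name of your application's environment (for example, prod, pre-prod, staging). If not provided, this defaults to the value of DD_ENV.
- service: the name of the service used for your application. If not provided, this defaults to the value of DD_SERVICE.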
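The "if not provided, this defaults to the value of ..." rule above can be pictured with a small stand-alone sketch. This is not ddtrace code: resolve_setting and resolve_flag are hypothetical helpers, and treating the flag value case-insensitively is an assumption of the sketch rather than documented behavior.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true"}  # the values this page lists for turning a flag on


def resolve_setting(explicit: Optional[str], env_name: str,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the explicit argument, else fall back to the environment variable."""
    return explicit if explicit is not None else env.get(env_name)


def resolve_flag(explicit: Optional[bool], env_name: str,
                 env: Mapping[str, str] = os.environ) -> bool:
    """Same fallback for booleans; an unset variable counts as False."""
    if explicit is not None:
        return explicit
    # Assumption: compare case-insensitively after trimming whitespace.
    return env.get(env_name, "").strip().lower() in TRUTHY


# Example environment (illustrative values only).
env = {"DD_SITE": "datadoghq.com", "DD_LLMOBS_AGENTLESS_ENABLED": "true"}
print(resolve_setting(None, "DD_SITE", env))                   # datadoghq.com
print(resolve_setting("datadoghq.eu", "DD_SITE", env))         # datadoghq.eu
print(resolve_flag(None, "DD_LLMOBS_AGENTLESS_ENABLED", env))  # True
```

In practice you would pass the arguments straight to LLMObs.enable(); the sketch only illustrates which value wins when both an argument and an environment variable are present.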
Enable LLM Observability by specifying the required environment variables in your command line setup and following the setup instructions for the Datadog-Python and Datadog-Extension AWS Lambda layers.
Note: Using the Datadog-Python and Datadog-Extension layers automatically turns on all LLM Observability integrations, and force flushes spans at the end of the Lambda function.
Your application name (the value of DD_LLMOBS_ML_APP) must be a lowercase Unicode string. It may contain the characters listed below:
The name can be up to 193 characters long and may not contain contiguous or trailing underscores.
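As a rough pre-flight check, the rules stated here (lowercase, at most 193 characters, no contiguous or trailing underscores) can be tested before the name is handed to the SDK. The helper below is hypothetical and not part of ddtrace; it deliberately does not check the allowed character set, since that list is not reproduced on this page.

```python
from typing import List

MAX_ML_APP_LENGTH = 193  # length limit stated above


def ml_app_name_problems(name: str) -> List[str]:
    """Return the stated naming rules that `name` violates (empty list if none)."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if len(name) > MAX_ML_APP_LENGTH:
        problems.append("must be at most 193 characters")
    if "__" in name:
        problems.append("must not contain contiguous underscores")
    if name.endswith("_"):
        problems.append("must not end with an underscore")
    return problems


print(ml_app_name_problems("support-chatbot"))  # []
print(ml_app_name_problems("Support__Bot_"))    # three problems reported
```

Returning a list of problems rather than raising keeps the helper usable in configuration linting, where reporting every violation at once is usually more helpful.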
To trace a span, use ddtrace.llmobs.decorators.<SPAN_KIND>() as a function decorator (for example, llmobs.decorators.task() for a task span) for the function you’d like to trace. For a list of available span kinds, see the Span Kinds documentation. For more granular tracing of operations within functions, see Tracing spans using inline methods.
Note: If you are using any LLM providers or frameworks that are supported by Datadog’s LLM integrations, you do not need to manually start an LLM span to trace these operations.
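To trace an LLM span, use the function decorator ddtrace.llmobs.decorators.llm().
The decorator accepts the following arguments:
- model_name: the name of the invoked LLM.
- name: the name of the operation. If not provided, name defaults to the name of the traced function.
- model_provider: the name of the model provider. Defaults to "custom".
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs.decorators import llm
@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ...  # user application logic to invoke LLM
    return completion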
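To trace a workflow span, use the function decorator ddtrace.llmobs.decorators.workflow().
The decorator accepts the following arguments:
- name: the name of the operation. If not provided, name defaults to the name of the traced function.
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs.decorators import workflow
@workflow
def process_message():
    ...  # user application logic
    return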
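To trace an agent span, use the function decorator ddtrace.llmobs.decorators.agent().
The decorator accepts the following arguments:
- name: the name of the operation. If not provided, name defaults to the name of the traced function.
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs.decorators import agent
@agent
def react_agent():
    ...  # user application logic
    return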
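To trace a tool span, use the function decorator ddtrace.llmobs.decorators.tool().
The decorator accepts the following arguments:
- name: the name of the operation. If not provided, name defaults to the name of the traced function.
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs.decorators import tool
@tool
def call_weather_api():
    ...  # user application logic
    return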
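To trace a task span, use the function decorator ddtrace.llmobs.decorators.task().
The decorator accepts the following arguments:
- name: the name of the operation. If not provided, name defaults to the name of the traced function.
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs.decorators import task
@task
def sanitize_input():
    ...  # user application logic
    return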
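To trace an embedding span, use the function decorator ddtrace.llmobs.decorators.embedding().
Note: Annotating an embedding span’s input requires different formatting than other span types. See Annotating a span for more details on how to specify embedding inputs.
The decorator accepts the following arguments:
- model_name: the name of the invoked embedding model.
- name: the name of the operation. If not provided, name is set to the name of the traced function.
- model_provider: the name of the model provider. Defaults to "custom".
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs.decorators import embedding
@embedding(model_name="text-embedding-3", model_provider="openai")
def perform_embedding():
    ...  # user application logic
    return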
To trace a retrieval span, use the function decorator ddtrace.llmobs.decorators.retrieval().
Note: Annotating a retrieval span’s output requires different formatting than other span types. See Annotating a span for more details on how to specify retrieval outputs.
The decorator accepts the following arguments:
- name: the name of the operation. If not provided, name defaults to the name of the traced function.
- session_id: the ID of the underlying user session.
- ml_app: the name of the LLM application that the operation belongs to.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import retrieval
@retrieval
def get_relevant_docs(question):
    context_documents = ...  # user application logic
    LLMObs.annotate(
        input_data=question,
        output_data=[
            {"id": doc.id, "score": doc.score, "text": doc.text, "name": doc.name}
            for doc in context_documents
        ],
    )
    return
Session tracking allows you to associate multiple interactions with a given user. When starting a root span for a new trace or span in a new process, specify the session_id argument with the string ID of the underlying user session, which is submitted as a tag on the span. Optionally, you can also specify the user_handle, user_name, and user_id tags.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow
@workflow(session_id="<SESSION_ID>")
def process_user_message():
    LLMObs.annotate(
        # ... other annotation arguments
        tags={"user_handle": "poodle@dog.com", "user_id": "1234", "user_name": "poodle"},
    )
    return
Tag | Description |
---|---|
session_id | The ID representing a single user session, for example, a chat session. |
user_handle | The handle for the user of the chat session. |
user_name | The name for the user of the chat session. |
user_id | The ID for the user of the chat session. |
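Because the user tags are optional, it can be convenient to build the tags dictionary from whatever user information is available. A minimal sketch, assuming a hypothetical helper named user_tags that is not part of the SDK:

```python
from typing import Dict, Optional


def user_tags(user_handle: Optional[str] = None,
              user_name: Optional[str] = None,
              user_id: Optional[str] = None) -> Dict[str, str]:
    """Build the optional user tags, skipping any value that is unknown."""
    candidates = {
        "user_handle": user_handle,
        "user_name": user_name,
        "user_id": user_id,
    }
    return {key: value for key, value in candidates.items() if value is not None}


print(user_tags(user_handle="poodle@dog.com", user_id="1234"))
# {'user_handle': 'poodle@dog.com', 'user_id': '1234'}
```

The resulting dictionary can then be passed as the tags argument of LLMObs.annotate() inside a span that was started with a session_id.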
The SDK provides the method LLMObs.annotate() to annotate spans with inputs, outputs, and metadata.
The LLMObs.annotate() method accepts the following arguments:
- span: the span to annotate. If span is not provided (as when using function decorators), the SDK annotates the current active span.
- input_data: either a JSON serializable type (for non-LLM spans) or a list of dictionaries with this format: {"role": "...", "content": "..."} (for LLM spans). Note: Embedding spans are a special case and require a string or a dictionary (or a list of dictionaries) with this format: {"text": "..."}.
- output_data: either a JSON serializable type (for non-LLM spans) or a list of dictionaries with this format: {"role": "...", "content": "..."} (for LLM spans). Note: Retrieval spans are a special case and require a string or a dictionary (or a list of dictionaries) with this format: {"text": "...", "name": "...", "score": float, "id": "..."}.
- metadata: a dictionary of key-value pairs describing the operation (model_temperature, max_tokens, top_k, etc.).
- metrics: a dictionary of numeric metrics for the operation (input_tokens, output_tokens, total_tokens, time_to_first_token, etc.). The unit for time_to_first_token is in seconds, similar to the duration metric which is emitted by default.
- tags: a dictionary of key-value pairs to add as tags on the span, such as session, env, system, and version. For more information about tags, see Getting Started with Tags.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import embedding, llm, retrieval, workflow
@llm(model_name="model_name", model_provider="model_provider")
def llm_call(prompt):
    resp = ...  # llm call here
    LLMObs.annotate(
        span=None,
        input_data=[{"role": "user", "content": "Hello world!"}],
        output_data=[{"role": "assistant", "content": "How can I help?"}],
        metadata={"temperature": 0, "max_tokens": 200},
        metrics={"input_tokens": 4, "output_tokens": 6, "total_tokens": 10},
        tags={"host": "host_name"},
    )
    return resp
@workflow
def extract_data(document):
    resp = llm_call(document)
    LLMObs.annotate(
        input_data=document,
        output_data=resp,
        tags={"host": "host_name"},
    )
    return resp
@embedding(model_name="text-embedding-3", model_provider="openai")
def perform_embedding():
    ...  # user application logic
    LLMObs.annotate(
        span=None,
        input_data={"text": "Hello world!"},
        output_data=[0.0023064255, -0.009327292, ...],
        metrics={"input_tokens": 4},
        tags={"host": "host_name"},
    )
    return
@retrieval(name="get_relevant_docs")
def similarity_search():
    ...  # user application logic
    LLMObs.annotate(
        span=None,
        input_data="Hello world!",
        output_data=[{"text": "Hello world is ...", "name": "Hello, World! program", "id": "document_id", "score": 0.9893}],
        tags={"host": "host_name"},
    )
    return
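The three input and output shapes described above (chat messages for LLM spans, {"text": ...} entries for embedding inputs, and scored documents for retrieval outputs) are easy to mix up. The following sketch builds each shape from plain Python values; the helper names are hypothetical and the functions only prepare data, they do not call the SDK.

```python
from typing import Any, Dict, Iterable, List, Tuple


def to_llm_messages(turns: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """(role, content) pairs -> message dictionaries for LLM span input/output."""
    return [{"role": role, "content": content} for role, content in turns]


def to_embedding_input(texts: Iterable[str]) -> List[Dict[str, str]]:
    """Plain strings -> {"text": ...} dictionaries for embedding span input."""
    return [{"text": text} for text in texts]


def to_retrieval_output(
    hits: Iterable[Tuple[str, str, str, float]]
) -> List[Dict[str, Any]]:
    """(id, name, text, score) tuples -> documents for retrieval span output."""
    return [
        {"text": text, "name": name, "score": float(score), "id": doc_id}
        for doc_id, name, text, score in hits
    ]


print(to_llm_messages([("user", "Hello world!")]))
print(to_embedding_input(["Hello world!"]))
print(to_retrieval_output([("document_id", "Hello, World! program", "Hello world is ...", 0.9893)]))
```

Each return value can be passed as input_data or output_data to LLMObs.annotate() for the matching span kind.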
The SDK’s LLMObs.annotation_context() method returns a context manager that can be used to modify all auto-instrumented spans started while the annotation context is active.
The LLMObs.annotation_context() method accepts the following arguments:
- name: a name that overrides the span name for any auto-instrumented spans started within the annotation context.
- prompt: a dictionary that represents the prompt used for an LLM call, in the following format: {"template": "...", "id": "...", "version": "...", "variables": {"variable_1": "...", ...}}. You can also import the Prompt object from ddtrace.llmobs.utils and pass it in as the prompt argument. Note: This argument only applies to LLM spans.
- tags: a dictionary of key-value pairs to add as tags on the span, such as session, env, system, and version. For more information about tags, see Getting Started with Tags.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow
from ddtrace.llmobs.utils import Prompt
# retrieve_documents and openai_client are defined elsewhere in the application
@workflow
def rag_workflow(user_question):
    context_str = " ".join(retrieve_documents(user_question))
    with LLMObs.annotation_context(
        prompt=Prompt(
            variables={
                "question": user_question,
                "context": context_str,
            },
            template="Please answer the...",
        ),
        tags={
            "retrieval_strategy": "semantic_similarity",
        },
        name="augmented_generation",
    ):
        completion = openai_client.chat.completions.create(...)
    return completion.choices[0].message.content
The LLM Observability SDK provides the methods LLMObs.export_span() and LLMObs.submit_evaluation() to help your traced LLM application submit evaluations to LLM Observability.
LLMObs.export_span() can be used to extract the span context from a span. You’ll need to use this method to associate your evaluation with the corresponding span.
The LLMObs.export_span() method accepts the following argument:
- span: the span whose context is extracted.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm
@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ...  # user application logic to invoke LLM
    span_context = LLMObs.export_span(span=None)
    return completion
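LLMObs.submit_evaluation() can be used to submit your custom evaluation associated with a given span.
The LLMObs.submit_evaluation() method accepts the following arguments:
- span_context: the span context to associate the evaluation with, as returned by LLMObs.export_span().
- label: the label for the evaluation.
- metric_type: the type of the evaluation. Must be categorical or score.
- value: the value of the evaluation. Must be a string (metric_type==categorical) or integer/float (metric_type==score).
- tags: a dictionary of key-value pairs to add as tags on the evaluation.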
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm
@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ...  # user application logic to invoke LLM
    span_context = LLMObs.export_span(span=None)
    LLMObs.submit_evaluation(
        span_context,
        label="harmfulness",
        metric_type="score",
        value=10,
        tags={"evaluation_provider": "ragas"},
    )
    return completion
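Since the expected type of value depends on metric_type, a small guard before calling LLMObs.submit_evaluation() may catch mismatches early. The function below is a hypothetical stand-alone check, not an SDK call; rejecting booleans for score evaluations is a choice made by this sketch.

```python
def is_valid_evaluation(metric_type: str, value: object) -> bool:
    """Check that `value` has the type required by `metric_type`."""
    if metric_type == "categorical":
        return isinstance(value, str)
    if metric_type == "score":
        # bool is a subclass of int in Python; this sketch chooses to reject it.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False  # unknown metric type


print(is_valid_evaluation("score", 10))               # True
print(is_valid_evaluation("categorical", "harmful"))  # True
print(is_valid_evaluation("score", "10"))             # False
```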
For each span kind, the ddtrace.llmobs.LLMObs class provides a corresponding inline method to automatically trace the operation a given code block entails. These methods have the same argument signature as their function decorator counterparts, with the addition that name defaults to the span kind (llm, workflow, etc.) if not provided. These methods can be used as context managers to automatically finish the span after the enclosed code block is completed.
from ddtrace.llmobs import LLMObs
def process_message():
    with LLMObs.workflow(name="process_message", session_id="<SESSION_ID>", ml_app="<ML_APP>") as workflow_span:
        ...  # user application logic
    return
To manually start and stop a span across different contexts or scopes:
1. Start the span manually using the same methods (for example, the LLMObs.workflow method for a workflow span), but as a plain function call rather than as a context manager.
2. Stop the span manually using the span.finish() method. Note: the span must be manually finished, otherwise it is not submitted.
from ddtrace.llmobs import LLMObs
def process_message():
    workflow_span = LLMObs.workflow(name="process_message")
    ...  # user application logic
    separate_task(workflow_span)
    return
def separate_task(workflow_span):
    ...  # user application logic
    workflow_span.finish()
    return
LLMObs.flush() is a blocking function that submits all buffered LLM Observability data to the Datadog backend. This can be useful in serverless environments to prevent an application from exiting until all LLM Observability traces are submitted.
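One way to make sure a flush always happens before a short-lived process exits is to wrap the work in try/finally. The sketch below assumes a hypothetical wrapper named run_then_flush and uses a stand-in callable in place of LLMObs.flush so that it runs without ddtrace installed.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_then_flush(work: Callable[[], T], flush: Callable[[], None]) -> T:
    """Run `work`, then call `flush` even if `work` raised an exception."""
    try:
        return work()
    finally:
        flush()


# Stand-in for LLMObs.flush; in a real application pass LLMObs.flush instead.
calls: List[str] = []
result = run_then_flush(lambda: "done", lambda: calls.append("flushed"))
print(result, calls)  # done ['flushed']
```

Passing the flush function as an argument keeps the wrapper testable; with the SDK enabled, the call would look like run_then_flush(handler, LLMObs.flush).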
The SDK supports tracing multiple LLM applications from the same service.
Set the DD_LLMOBS_ML_APP environment variable to the name of your LLM application. By default, all generated spans are grouped under this name.
To override this configuration and use a different LLM application name for a given root span, pass the ml_app argument with the string name of the underlying LLM application when starting a root span for a new trace or a span in a new process.
from ddtrace.llmobs.decorators import workflow
@workflow(name="process_message", ml_app="<NON_DEFAULT_ML_APP_NAME>")
def process_message():
    ...  # user application logic
    return
The SDK supports tracing across distributed services or hosts. Distributed tracing works by propagating span information across web requests.
The ddtrace library provides some out-of-the-box integrations that support distributed tracing for popular web frameworks and HTTP libraries. If your application makes requests using these supported libraries, you can enable distributed tracing by running:
from ddtrace import patch
patch(<INTEGRATION_NAME>=True)
If your application does not use any of these supported libraries, you can enable distributed tracing by manually propagating span information to and from HTTP headers. The SDK provides the helper methods LLMObs.inject_distributed_headers() and LLMObs.activate_distributed_headers() to inject and activate tracing contexts in request headers.
The LLMObs.inject_distributed_headers() method takes a span and injects its context into the HTTP headers to be included in the request. This method accepts the following arguments:
- request_headers: the HTTP headers to extend with tracing context attributes.
- span: the span whose context is injected into the request headers. Defaults to the current active span.
The LLMObs.activate_distributed_headers() method takes HTTP headers and extracts tracing context attributes to activate in the new service.
Note: You must call LLMObs.activate_distributed_headers() before starting any spans in your downstream service. Spans started prior (including function decorator spans) do not get captured in the distributed trace.
This method accepts the following argument:
- request_headers: the HTTP headers from which to extract tracing context attributes.
client.py
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow
@workflow
def client_send_request():
    request_headers = {}
    request_headers = LLMObs.inject_distributed_headers(request_headers)
    send_request("<method>", request_headers)  # arbitrary HTTP call
server.py
from ddtrace.llmobs import LLMObs
def server_process_request(request):
    LLMObs.activate_distributed_headers(request.headers)
    with LLMObs.task(name="process_request") as span:
        pass  # arbitrary server work
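To illustrate the general idea of propagating span information through HTTP headers, here is a toy sketch that is independent of ddtrace. The header names and the inject and extract functions are hypothetical and do not reflect the format the SDK actually writes; the sketch only shows the client copying identifiers into outgoing headers and the server recovering them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical header names for this sketch only; ddtrace uses its own format.
TRACE_HEADER = "x-example-trace-id"
PARENT_HEADER = "x-example-parent-id"


def inject(headers: Dict[str, str], trace_id: str, span_id: str) -> Dict[str, str]:
    """Client side: copy the active span's identifiers into outgoing headers."""
    updated = dict(headers)  # leave the caller's dictionary untouched
    updated[TRACE_HEADER] = trace_id
    updated[PARENT_HEADER] = span_id
    return updated


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Server side: recover (trace_id, parent_span_id), or None if absent."""
    trace_id = headers.get(TRACE_HEADER)
    parent_id = headers.get(PARENT_HEADER)
    if trace_id is None or parent_id is None:
        return None
    return trace_id, parent_id


outgoing = inject({"accept": "application/json"}, trace_id="abc", span_id="123")
print(extract(outgoing))  # ('abc', '123')
print(extract({}))        # None
```

With the SDK, LLMObs.inject_distributed_headers() and LLMObs.activate_distributed_headers() play the roles of inject and extract, and the downstream spans are attached to the same trace.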