Evaluations measure the quality of your LLM application’s responses.
While Agent Observability provides a few out-of-the-box evaluations for your traces, you can also submit your own evaluations in two ways: with Datadog’s SDK or with the Agent Observability API.
Use the following naming convention for evaluation labels:
Evaluation labels must start with a letter.
Evaluation labels must contain only ASCII alphanumeric characters or underscores.
Other characters, including spaces, are converted to underscores.
Unicode is not supported.
Evaluation labels must not exceed 200 characters. Labels under 100 characters are preferred because they display better in the UI.
Evaluation labels must be unique for a given LLM application (ml_app) and organization.
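You can check the label rules above on the client before you submit anything. The sketch below is a minimal example. normalize_label and check_label are hypothetical helpers, not part of the Datadog SDK. The sketch assumes each disallowed character, including every non-ASCII character, becomes one underscore, and Datadog's server-side normalization may differ in details. Uniqueness per ml_app and organization depends on data held by Datadog, so it cannot be checked locally.

```python
import re

# Hypothetical helpers that mirror the documented label rules client-side.
# Assumption: each disallowed character maps to exactly one underscore.
MAX_LABEL_LENGTH = 200
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]")


def normalize_label(raw: str) -> str:
    """Replace anything that is not an ASCII letter, digit, or underscore
    (spaces, punctuation, non-ASCII characters) with an underscore."""
    return _DISALLOWED.sub("_", raw)


def check_label(raw: str) -> str:
    """Normalize a label, then raise ValueError if it still breaks a rule."""
    label = normalize_label(raw)
    # After normalization only ASCII characters remain, so isalpha()
    # is true only for A-Z and a-z.
    if not label or not label[0].isalpha():
        raise ValueError(f"label must start with a letter: {raw!r}")
    if len(label) > MAX_LABEL_LENGTH:
        raise ValueError(f"label exceeds {MAX_LABEL_LENGTH} characters")
    return label
```

For example, check_label("answer relevance") returns "answer_relevance", while check_label("1st_eval") raises ValueError because the label starts with a digit.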
For feedback submitted by your users, such as thumbs-up or thumbs-down ratings, accepted changes, free-text comments, and other signals, see End-User Feedback.
Submitting external evaluations with the SDK
The Agent Observability SDK provides the methods LLMObs.submit_evaluation() and LLMObs.export_span() to help your traced LLM application submit external evaluations to Agent Observability. See the Python or Node.js SDK documentation for more details.
For building reusable, class-based evaluators with rich result metadata, see the Evaluation Developer Guide.
Example
from typing import Any

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm


def my_harmfulness_eval(input: Any) -> float:
    score = ...  # custom harmfulness evaluation logic
    return score


@llm(model_name="claude", name="invoke_llm", model_provider="anthropic")
def llm_call():
    completion = ...  # user application logic to invoke LLM

    # joining an evaluation to a span via span ID and trace ID;
    # span=None exports the currently active LLM Observability span
    span_context = LLMObs.export_span(span=None)
    LLMObs.submit_evaluation(
        span=span_context,
        ml_app="chatbot",
        label="harmfulness",
        metric_type="score",  # can be score or categorical
        value=my_harmfulness_eval(completion),
        tags={"type": "custom"},
        timestamp_ms=1765990800016,  # optional, unix timestamp in milliseconds
        assessment="pass",  # optional, "pass" or "fail"
        reasoning="it makes sense",  # optional, judge llm reasoning
    )
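The comments in the SDK example above list the allowed values for metric_type and assessment, and they describe timestamp_ms as a Unix timestamp in milliseconds. The sketch below shows how you might check those arguments before you call LLMObs.submit_evaluation(). build_evaluation_kwargs is a hypothetical helper and is not part of ddtrace. The rule that score values are numeric and categorical values are strings is an assumption, so confirm it against the SDK documentation.

```python
import time
from typing import Optional, Union

# Hypothetical helper: validates the arguments described in the SDK example
# and returns keyword arguments for LLMObs.submit_evaluation().
VALID_METRIC_TYPES = {"score", "categorical"}
VALID_ASSESSMENTS = {"pass", "fail"}


def build_evaluation_kwargs(
    ml_app: str,
    label: str,
    metric_type: str,
    value: Union[float, str],
    tags: Optional[dict] = None,
    timestamp_ms: Optional[int] = None,
    assessment: Optional[str] = None,
    reasoning: Optional[str] = None,
) -> dict:
    if metric_type not in VALID_METRIC_TYPES:
        raise ValueError(f"metric_type must be one of {sorted(VALID_METRIC_TYPES)}")
    # Assumption: score metrics carry numbers, categorical metrics carry strings.
    if metric_type == "score" and not isinstance(value, (int, float)):
        raise TypeError("score metrics need a numeric value")
    if metric_type == "categorical" and not isinstance(value, str):
        raise TypeError("categorical metrics need a string value")
    if assessment is not None and assessment not in VALID_ASSESSMENTS:
        raise ValueError('assessment must be "pass" or "fail"')
    kwargs = {
        "ml_app": ml_app,
        "label": label,
        "metric_type": metric_type,
        "value": value,
        "tags": tags or {},
        # Unix timestamp in milliseconds; falls back to the current time.
        "timestamp_ms": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
    }
    # Include the optional judge fields only when they were provided.
    if assessment is not None:
        kwargs["assessment"] = assessment
    if reasoning is not None:
        kwargs["reasoning"] = reasoning
    return kwargs
```

Inside the traced function, you would pass the result as LLMObs.submit_evaluation(span=span_context, **build_evaluation_kwargs("chatbot", "harmfulness", "score", score)).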
Submitting external evaluations with the API
You can use the evaluations API provided by Agent Observability to send evaluations associated with spans, traces, or sessions to Datadog. See the Evaluations API for more details on the API specifications. For building reusable evaluators, see the Evaluation Developer Guide.
To submit evaluations for OpenTelemetry spans directly to the Evaluations API, you must include the source:otel tag in the evaluation. Additionally, span_id and trace_id values must be provided as decimal strings. If your OpenTelemetry instrumentation produces hexadecimal IDs, convert them to decimal before submitting. For example, in Python: str(int(hex_span_id, 16)).
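The sketch below applies the decimal conversion and the source:otel requirement described above to a single span. otel_hex_to_decimal and otel_join_fields are hypothetical helpers. The join_on.span object holding trace_id and span_id is an assumption about the request shape, so verify it in the Evaluations API reference.

```python
from typing import Dict, List, Optional

OTEL_SOURCE_TAG = "source:otel"


def otel_hex_to_decimal(hex_id: str) -> str:
    """Convert a hexadecimal OpenTelemetry ID to a decimal string."""
    value = int(hex_id, 16)  # raises ValueError on non-hex input
    if value == 0:
        # OpenTelemetry treats all-zero trace and span IDs as invalid.
        raise ValueError("all-zero IDs are not valid OpenTelemetry IDs")
    return str(value)


def otel_join_fields(
    hex_trace_id: str, hex_span_id: str, extra_tags: Optional[List[str]] = None
) -> Dict[str, object]:
    """Return join_on and tags fields for an evaluation on one OTel span."""
    return {
        # Assumed shape: join_on.span with decimal trace_id and span_id.
        "join_on": {
            "span": {
                "trace_id": otel_hex_to_decimal(hex_trace_id),
                "span_id": otel_hex_to_decimal(hex_span_id),
            }
        },
        # The source:otel tag is required for OpenTelemetry spans.
        "tags": [OTEL_SOURCE_TAG, *(extra_tags or [])],
    }
```

Merge these fields into each metric object in the request body. If you read the IDs from the OpenTelemetry Python SDK, SpanContext.trace_id and SpanContext.span_id are already integers, so str() on its own produces the decimal form.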
Example
{
  "data": {
    "type": "evaluation_metric",
    "id": "456f4567-e89b-12d3-a456-426655440000",
    "attributes": {
      "metrics": [
        {
          "id": "cdfc4fc7-e2f6-4149-9c35-edc4bbf7b525",
          "join_on": {
            "tag": {
              "key": "msg_id",
              "value": "1123132"
            }
          },
          "ml_app": "weather-bot",
          "timestamp_ms": 1765990800016,
          "metric_type": "score",
          "label": "Accuracy",
          "score_value": 3,
          "tags": ["source:otel"],
          "assessment": "pass",
          "reasoning": "it makes sense"
        }
      ]
    }
  }
}
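The sketch below builds the example request body shown above in code and prepares a POST request using only the standard library. build_eval_payload and build_request are hypothetical helpers, and the sketch makes a few assumptions. The endpoint URL is left as a parameter, so take it from the Evaluations API reference for your Datadog site. Authentication uses the DD-API-KEY header that other Datadog APIs use. Any freshly generated UUID is accepted for the id fields. Only score metrics are handled here.

```python
import json
import urllib.request
import uuid
from typing import List, Optional


def build_eval_payload(
    ml_app: str,
    label: str,
    score_value: float,
    join_on: dict,
    timestamp_ms: int,
    tags: Optional[List[str]] = None,
    assessment: Optional[str] = None,
    reasoning: Optional[str] = None,
) -> dict:
    """Build a request body with the same structure as the example payload."""
    metric = {
        "id": str(uuid.uuid4()),  # assumption: any unique UUID is accepted
        "join_on": join_on,
        "ml_app": ml_app,
        "timestamp_ms": timestamp_ms,
        "metric_type": "score",
        "label": label,
        "score_value": score_value,
        "tags": tags or [],
    }
    # Optional judge fields are included only when provided.
    if assessment is not None:
        metric["assessment"] = assessment
    if reasoning is not None:
        metric["reasoning"] = reasoning
    return {
        "data": {
            "type": "evaluation_metric",
            "id": str(uuid.uuid4()),
            "attributes": {"metrics": [metric]},
        }
    }


def build_request(url: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the given endpoint URL."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen() and read the API key from your environment rather than hard-coding it. Building the request separately from sending it lets you inspect or log the payload first.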