Session-level evaluations help ensure your LLM-powered applications successfully achieve intended user outcomes across entire interactions. These managed evaluations analyze multi-turn sessions to assess higher-level goals and behaviors that extend beyond individual spans, giving insight into overall effectiveness and user satisfaction.
Goal Completeness
An agent can call tools correctly but still fail to achieve the user’s intended goal. This evaluation checks whether your LLM chatbot can successfully carry out a full session by effectively meeting the user’s needs from start to finish. This completeness measure serves as a proxy for gauging user satisfaction over the course of a multi-turn interaction and is especially valuable for LLM chatbot applications.
| Evaluation Stage | Evaluation Method | Evaluation Definition |
|---|---|---|
| Evaluated on LLM spans | Evaluated using LLM | Checks whether the agent resolved the user's intent by analyzing full session spans. Runs only on sessions marked as completed. |
How to Use
Goal Completeness is only available for OpenAI and Azure OpenAI.
To enable Goal Completeness evaluation, you need to instrument your application to track sessions and their completion status. This evaluation works by analyzing complete sessions to determine if all user intentions were successfully addressed.
The evaluation requires sending a span with a specific tag when the session ends. This signal allows the evaluation to identify session boundaries and trigger the completeness assessment:
For optimal evaluation accuracy and cost control, send a tag when the session finishes and configure the evaluation to run only on sessions with that tag. The evaluation returns a detailed breakdown including resolved intentions, unresolved intentions, and the reasoning behind the assessment. A session is considered incomplete if more than 50% of identified intentions remain unresolved.
```python
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

# Call this function whenever your session has ended
@llm(model_name="model_name", model_provider="model_provider")
def send_session_ended_span(input_data, output_data) -> None:
    """Send a span to indicate the chat session has ended."""
    LLMObs.annotate(
        input_data=input_data,
        output_data=output_data,
        tags={"session_status": "completed"},
    )
```
Replace `session_status` and `completed` with your preferred tag key and value.
The span should contain meaningful input_data and output_data that represent the final state of the session. This helps the evaluation understand the session’s context and outcomes when assessing completeness.
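One way to provide that final state is to summarize the session transcript into its opening user request and closing assistant reply when the session ends. The helper below is a hypothetical sketch, not part of the ddtrace API; the transcript shape (`role`/`content` messages) is an assumption:

```python
def build_session_end_payload(transcript: list[dict]) -> tuple[str, str]:
    """Summarize a session transcript into input/output for the session-end span.

    transcript: list of {"role": ..., "content": ...} messages, oldest first.
    Returns (first user message, last assistant reply).
    """
    user_msgs = [m["content"] for m in transcript if m["role"] == "user"]
    assistant_msgs = [m["content"] for m in transcript if m["role"] == "assistant"]
    input_data = user_msgs[0] if user_msgs else ""
    output_data = assistant_msgs[-1] if assistant_msgs else ""
    return input_data, output_data
```

The returned pair can then be passed as `input_data` and `output_data` to your session-end span function.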
Goal completeness configuration
After instrumenting your application to send session-end spans, configure the evaluation to run only on sessions with your specific tag. This targeted approach ensures the evaluation analyzes complete sessions rather than partial interactions.
Go to the Goal Completeness settings
Configure the evaluation data:
Select spans as the data type, since Goal Completeness runs on LLM spans, which contain the full session history.
Choose the tag name associated with the span that corresponds to your session-end function (for example, send_session_ended_span).
In the tags section, specify the tag you configured in your instrumentation (for example, session_status:completed).
This configuration ensures evaluations run only on complete sessions, providing accurate assessments of user intention resolution.
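The resulting verdict follows the 50% rule described earlier: a session is incomplete when more than half of the identified intentions remain unresolved. A minimal sketch of that decision rule (illustrative only; the managed evaluation's internal logic is not exposed):

```python
def is_session_incomplete(resolved: list[str], unresolved: list[str]) -> bool:
    """Return True if more than 50% of identified intentions remain unresolved."""
    total = len(resolved) + len(unresolved)
    if total == 0:
        return False  # no intentions identified, so nothing is unresolved
    return len(unresolved) / total > 0.5

# Example: 1 resolved, 2 unresolved -> 2/3 unresolved -> incomplete
```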
Troubleshooting
If evaluations are skipped, check that you are tagging session-end spans correctly.
Ensure your agent is configured to signal the end of a user request cycle.