Evaluations

Overview

LLM Observability offers several ways to run evaluations. You can configure them by navigating to AI Observability > Evaluations.

Custom LLM-as-a-judge evaluations

Custom LLM-as-a-judge evaluations allow you to define your own evaluation logic using natural language prompts. You can create custom evaluations to assess subjective or objective criteria (like tone, helpfulness, or factuality) and run them at scale across your traces and spans.

Managed evaluations

Datadog builds and maintains managed evaluations that cover common use cases. You can enable and configure them within the LLM Observability application.

Submit external evaluations

You can also submit external evaluations using Datadog’s API. This mechanism is useful if you have your own evaluation system but want to centralize that information in Datadog.
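
For example, here is a minimal sketch of submitting an external evaluation with the ddtrace Python SDK, using LLMObs.export_span to capture the span context and LLMObs.submit_evaluation to attach the result. The application name, model details, evaluation label, score, and tags are illustrative placeholders.

```python
# Minimal sketch using the ddtrace Python SDK. The ml_app name, model details,
# evaluation label, score value, and tags are illustrative placeholders.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

LLMObs.enable(ml_app="my-chatbot")  # placeholder application name


@llm(model_name="gpt-4o", model_provider="openai")
def answer(question: str) -> str:
    # Call your model here; a hard-coded response stands in for it.
    response = "Paris is the capital of France."
    LLMObs.annotate(input_data=question, output_data=response)

    # Export the active span's context so the evaluation joins to this span.
    span_context = LLMObs.export_span(span=None)

    # Attach a result produced by your own evaluation system.
    LLMObs.submit_evaluation(
        span_context=span_context,
        label="factuality",       # evaluation name shown in Datadog
        metric_type="score",      # "score" or "categorical"
        value=0.95,
        tags={"evaluation_provider": "in_house_eval"},  # placeholder tag
    )
    return response


answer("What is the capital of France?")
```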

Evaluation integrations

Datadog also supports integrations with third-party evaluation frameworks such as Ragas and NeMo.

Sensitive Data Scanner integration

In addition to evaluating the input and output of LLM requests, agents, workflows, or the application, LLM Observability integrates with Sensitive Data Scanner, which helps prevent data leakage by identifying and redacting any sensitive information.

Security

AI Guard provides real-time security guardrails for your AI apps and agents, securing them against prompt injection, jailbreaking, tool misuse, and sensitive data exfiltration attacks. AI Guard is in Preview.

Permissions

LLM Observability Write permissions are necessary to configure evaluations.

Retrieving spans

LLM Observability offers an Export API that you can use to retrieve spans for running external evaluations. This removes the need to track evaluation-relevant data at execution time.
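
A rough sketch of calling the API from Python is shown below. The endpoint path and query fields are placeholders rather than the real Export API route; check the Export API reference for the exact request format for your Datadog site. The standard DD-API-KEY and DD-APPLICATION-KEY headers are assumed.

```python
# Rough sketch only: the endpoint path and request body are placeholders, not the
# actual Export API contract. Standard Datadog API/application key headers assumed.
import os

import requests

DD_SITE = os.environ.get("DD_SITE", "datadoghq.com")
URL = f"https://api.{DD_SITE}/api/<llm-obs-export-path>"  # placeholder path

headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    "Content-Type": "application/json",
}

# Hypothetical query: pull the last hour of spans for one ML app so an
# external evaluation job can score them offline.
payload = {
    "filter": {
        "query": "@ml_app:my-chatbot",
        "from": "now-1h",
        "to": "now",
    },
}

response = requests.post(URL, headers=headers, json=payload)
response.raise_for_status()
for span in response.json().get("data", []):
    print(span.get("id"))
```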