---
title: Evaluations
description: Learn how to configure Evaluations for your LLM application.
breadcrumbs: Docs > Agent Observability > Evaluations
---

# Evaluations

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md) ({% placeholder "user-datadog-site-name" /%}).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

Agent Observability supports several ways to evaluate your LLM application. To configure evaluations, navigate to [AI Observability > Evaluations](https://app.datadoghq.com/llm/evaluations).

### Custom LLM-as-a-judge evaluations{% #custom-llm-as-a-judge-evaluations %}

[Custom LLM-as-a-judge evaluations](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations.md) allow you to define your own evaluation logic using natural language prompts. You can create custom evaluations to assess subjective or objective criteria (like tone, helpfulness, or factuality) and run them at scale across your traces and spans.

### Managed evaluations{% #managed-evaluations %}

Datadog builds and maintains [managed evaluations](https://docs.datadoghq.com/llm_observability/evaluations/managed_evaluations.md) for common use cases. You can enable and configure them in the Agent Observability application.

### Submit end-user feedback{% #submit-end-user-feedback %}

[End-user feedback](https://docs.datadoghq.com/llm_observability/evaluations/end_user_feedback.md) lets you submit thumbs-up or thumbs-down ratings, accepted changes, free-text comments, and other user or agent feedback to Datadog. Feedback can be connected to spans, traces, sessions, or customer-defined entities with a feedback join key.

### Submit external evaluations{% #submit-external-evaluations %}

You can also submit [external evaluations](https://docs.datadoghq.com/llm_observability/evaluations/external_evaluations.md) using Datadog's API. Use this approach when you have your own evaluation system but want to centralize evaluation results within Datadog.
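As a rough illustration of what "your own evaluation system" produces, the sketch below scores one span's output with a toy keyword-coverage check and packages the result as a labeled metric tied to that span. It is a minimal sketch under stated assumptions: the `EvaluationResult` fields (`span_id`, `trace_id`, `label`, `metric_type`, `value`) and the helpers `keyword_coverage` and `build_evaluation` are hypothetical, and the sketch does not send anything to Datadog. Check the external evaluations documentation for the actual request schema and endpoint.

```python
from dataclasses import asdict, dataclass
from typing import List, Union


# Hypothetical record for one evaluation result. The real field names and
# request format are defined in Datadog's external evaluations documentation.
@dataclass
class EvaluationResult:
    span_id: str
    trace_id: str
    label: str
    metric_type: str  # assumption: "score" for numeric results, "categorical" for labels
    value: Union[float, str]


def keyword_coverage(output: str, required: List[str]) -> float:
    """Toy evaluator: fraction of required keywords found in the output."""
    if not required:
        return 1.0
    text = output.lower()
    hits = sum(1 for word in required if word.lower() in text)
    return hits / len(required)


def build_evaluation(span_id: str, trace_id: str, output: str, required: List[str]) -> dict:
    """Score one span's output and return the result as a plain dict to serialize later."""
    score = keyword_coverage(output, required)
    result = EvaluationResult(
        span_id=span_id,
        trace_id=trace_id,
        label="keyword_coverage",
        metric_type="score",
        value=round(score, 3),
    )
    return asdict(result)


if __name__ == "__main__":
    record = build_evaluation(
        span_id="123",
        trace_id="456",
        output="Refunds are processed within 5 business days.",
        required=["refund", "business days", "receipt"],
    )
    print(record)
```

Keeping the scoring logic separate from the record-building step lets you swap in a different evaluator (for example, an LLM-as-a-judge call) without changing how results are packaged for submission.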

### Building custom evaluators{% #building-custom-evaluators %}

For developers building custom evaluators, see the [Evaluation Developer Guide](https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide.md).

### Evaluation integrations{% #evaluation-integrations %}

Datadog also supports integrations with some third-party evaluation frameworks, such as [NeMo](https://docs.datadoghq.com/llm_observability/evaluations/submit_nemo_evaluations.md).

### Annotation Queues{% #annotation-queues %}

[Annotation Queues](https://docs.datadoghq.com/llm_observability/evaluations/annotation_queues.md) provide a structured workflow for systematic human review of LLM traces.

### Sensitive Data Scanner integration{% #sensitive-data-scanner-integration %}

In addition to evaluating the inputs and outputs of LLM requests, agents, workflows, and the application itself, Agent Observability integrates with [Sensitive Data Scanner](https://docs.datadoghq.com/security/sensitive_data_scanner.md), which helps prevent data leakage by identifying and redacting sensitive information. For a list of the out-of-the-box rules included with Sensitive Data Scanner, see [Library Rules](https://docs.datadoghq.com/security/sensitive_data_scanner/scanning_rules/library_rules.md).

### Security{% #security %}

{% callout %}
##### Get real-time security guardrails for your AI apps and agents

AI Guard helps secure your AI apps and agents in real time against prompt injection, jailbreaking, tool misuse, and sensitive data exfiltration attacks. Try it today!

[JOIN THE PREVIEW](https://www.datadoghq.com/product-preview/ai-security/)
{% /callout %}

### Permissions{% #permissions %}

You need the [`Agent Observability Write` permission](https://docs.datadoghq.com/account_management/rbac/permissions.md#llm-observability) to configure evaluations.

### Retrieving spans{% #retrieving-spans %}

Agent Observability offers an [Export API](https://docs.datadoghq.com/llm_observability/evaluations/export_api.md) that you can use to retrieve spans for running external evaluations. Because you can fetch spans after the fact, you don't need to record evaluation-relevant data separately at execution time.
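The workflow described above (retrieve spans, then evaluate them offline) can be sketched as a loop that runs an evaluator over already-exported spans. This is a minimal sketch that assumes the spans have already been fetched with the Export API and decoded into Python dicts; the keys `span_id`, `trace_id`, and `output`, along with the helpers `is_refusal` and `evaluate_spans`, are hypothetical placeholders. Map them to the fields the Export API actually returns.

```python
from typing import Callable, Iterable, Iterator


def is_refusal(output: str) -> str:
    """Toy categorical evaluator: flag outputs that look like refusals."""
    markers = ("i can't help", "i cannot help", "i'm unable to")
    text = output.lower()
    return "refusal" if any(marker in text for marker in markers) else "answered"


def evaluate_spans(
    spans: Iterable[dict],
    evaluator: Callable[[str], str],
    label: str,
) -> Iterator[dict]:
    """Run an evaluator over previously exported spans.

    The keys "span_id", "trace_id", and "output" are hypothetical; adjust
    them to match the Export API response.
    """
    for span in spans:
        output = span.get("output")
        if not output:
            # Skip spans with no recorded output instead of scoring an empty string.
            continue
        yield {
            "span_id": span["span_id"],
            "trace_id": span["trace_id"],
            "label": label,
            "value": evaluator(output),
        }


if __name__ == "__main__":
    # Stand-in for spans retrieved from the Export API.
    exported = [
        {"span_id": "a1", "trace_id": "t1", "output": "Your order ships tomorrow."},
        {"span_id": "a2", "trace_id": "t1", "output": "I'm unable to share that."},
        {"span_id": "a3", "trace_id": "t2", "output": ""},
    ]
    for result in evaluate_spans(exported, is_refusal, label="refusal_check"):
        print(result)
```

Each yielded record keeps the span and trace identifiers, so the results can later be submitted as external evaluations and joined back to the original spans.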

## Further Reading{% #further-reading %}

- [Track, compare, and optimize your LLM prompts with Datadog LLM Observability](https://www.datadoghq.com/blog/llm-prompt-tracking)
