---
title: Evaluation compatibility
description: Learn about the compatibility requirements for evaluations.
---

# Evaluation compatibility

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}

## Evaluation compatibility{% #evaluation-compatibility %}

The supported third-party LLM providers are OpenAI, Azure OpenAI, Anthropic, Amazon Bedrock, Vertex AI, and AI Gateway.

### Managed evaluations{% #managed-evaluations %}

Managed evaluations are supported for the following configurations.

| Evaluation                                                                                                          | DD-trace version | LLM Provider | Applicable span |
| ------------------------------------------------------------------------------------------------------------------- | ---------------- | ------------ | --------------- |
| [Language Mismatch](https://docs.datadoghq.com/llm_observability/evaluations/managed_evaluations#language-mismatch) | Fully supported  | Self-hosted  | All span kinds  |

### Custom LLM-as-a-judge evaluations{% #custom-llm-as-a-judge-evaluations %}

Custom LLM-as-a-judge evaluations are supported for the following configurations.

| Evaluation                                                                                                                             | DD-trace version | LLM Provider                  | Applicable span |
| -------------------------------------------------------------------------------------------------------------------------------------- | ---------------- | ----------------------------- | --------------- |
| [Boolean](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations#define-the-evaluation-output)     | Fully supported  | All third-party LLM providers | All span kinds  |
| [Score](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations#define-the-evaluation-output)       | Fully supported  | All third-party LLM providers | All span kinds  |
| [Categorical](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations#define-the-evaluation-output) | Fully supported  | All third-party LLM providers | All span kinds  |
| [JSON](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations#define-the-evaluation-output)        | Fully supported  | All third-party LLM providers | All span kinds  |

#### Template LLM-as-a-judge evaluations{% #template-llm-as-a-judge-evaluations %}

Existing templates for custom LLM-as-a-judge evaluations are supported for the following configurations.

| Evaluation                                                                                                                                                             | DD-trace version | LLM Provider                  | Applicable span |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------- | ----------------------------- | --------------- |
| [Failure to Answer](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#failure-to-answer)                 | Fully supported  | All third-party LLM providers | All span kinds  |
| [Hallucination](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#hallucination)                         | Fully supported  | All third-party LLM providers | LLM only        |
| [Sentiment](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#sentiment)                                 | Fully supported  | All third-party LLM providers | All span kinds  |
| [Toxicity](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#toxicity)                                   | Fully supported  | All third-party LLM providers | All span kinds  |
| [Prompt Injection](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#prompt-injection)                   | Fully supported  | All third-party LLM providers | All span kinds  |
| [Topic Relevancy](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#topic-relevancy)                     | Fully supported  | All third-party LLM providers | All span kinds  |
| [Tool Selection](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#tool-selection)                       | Fully supported  | All third-party LLM providers | LLM only        |
| [Tool Argument Correctness](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#tool-argument-correctness) | Fully supported  | All third-party LLM providers | LLM only        |
| [Goal Completeness](https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/template_evaluations#goal-completeness)                 | Fully supported  | All third-party LLM providers | LLM only        |
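
The "Applicable span" column of the template table can be read as a simple lookup. The following is a minimal sketch that mirrors it in Python, for example to sanity-check a planned evaluation setup in your own tooling. The names `LLM_ONLY_TEMPLATES`, `ALL_SPAN_KIND_TEMPLATES`, and `template_applies` are hypothetical and not part of any Datadog SDK or API, and the lowercase span kind strings (such as `"llm"`) are an assumption about how span kinds are represented.

```python
# Hypothetical helper, not part of any Datadog SDK: it only mirrors the
# "Applicable span" column of the template evaluations table.

# Templates the table marks as "LLM only".
LLM_ONLY_TEMPLATES = frozenset({
    "Hallucination",
    "Tool Selection",
    "Tool Argument Correctness",
    "Goal Completeness",
})

# Templates the table marks as "All span kinds".
ALL_SPAN_KIND_TEMPLATES = frozenset({
    "Failure to Answer",
    "Sentiment",
    "Toxicity",
    "Prompt Injection",
    "Topic Relevancy",
})


def template_applies(evaluation: str, span_kind: str) -> bool:
    """Return True if the table lists this template as applicable to the span kind.

    span_kind is assumed to be a lowercase string such as "llm" or "workflow".
    """
    if evaluation in LLM_ONLY_TEMPLATES:
        return span_kind == "llm"
    if evaluation in ALL_SPAN_KIND_TEMPLATES:
        return True
    raise ValueError(f"Unknown template evaluation: {evaluation!r}")
```

For instance, under these assumptions `template_applies("Hallucination", "workflow")` is `False`, while `template_applies("Toxicity", "workflow")` is `True`.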
