---
title: Datasets
description: >-
  Using datasets in Agent Observability Experiments, including how to create,
  retrieve, and manage datasets, as well as information about versioning.
breadcrumbs: Docs > Agent Observability > Experiments > Datasets
---

# Datasets

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md). ({% placeholder "user-datadog-site-name" /%}).
{% /alert %}

{% /callout %}

In Agent Observability Experiments, a *dataset* is a collection of *inputs*, *expected outputs*, and *metadata* that represent scenarios you want to test your agent on. Each dataset is associated with a *project*.

Each record in a dataset contains:

- **input** (required): Represents all the information that the agent can access in a task.
- **expected output** (optional): Also called *ground truth*, represents the ideal answer that the agent should output. You can use *expected output* to store the actual output of the app, as well as any intermediary results you want to assess.
- **metadata** (optional): Contains any useful information to categorize the record and use for further analysis. For example: topics, tags, descriptions, notes.
- **id** (optional): A user-defined identifier for the record. Must be 128 characters or fewer and contain only letters, numbers, `_`, `-`, or `.`. If not provided, the SDK generates one automatically.
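
The record ID rules above can be checked locally before you build a dataset. The following is a minimal sketch, not part of the SDK: the helper `is_valid_record_id` is a hypothetical name, and it assumes that "letters" and "numbers" mean ASCII letters and digits and that an ID must contain at least one character.

```python
import re

# Assumption: "letters" and "numbers" mean ASCII letters and digits,
# and an ID has between 1 and 128 characters.
_RECORD_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,128}")


def is_valid_record_id(record_id: str) -> bool:
    """Hypothetical helper: check the documented length and character rules."""
    return _RECORD_ID_PATTERN.fullmatch(record_id) is not None


print(is_valid_record_id("japan-capital"))  # True
print(is_valid_record_id("japan capital"))  # False: contains a space
print(is_valid_record_id("a" * 129))        # False: longer than 128 characters
```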

Datasets enable systematic testing and regression detection by providing consistent evaluation scenarios across experiments.

### Creating a dataset{% #creating-a-dataset %}

You can create datasets from production data or CSV files, or construct them programmatically.

{% tab title="From CSV files" %}
To create a dataset from a CSV file, use `LLMObs.create_dataset_from_csv()`:

```python
from ddtrace.llmobs import LLMObs

# Create dataset from CSV
dataset = LLMObs.create_dataset_from_csv(
    csv_path="questions.csv",
    dataset_name="capitals-of-the-world",
    project_name="capitals-project",              # Optional: defaults to the project name from LLMObs.enable
    description="Geography quiz dataset",         # Optional: Dataset description
    input_data_columns=["question", "category"],  # Columns to use as input
    expected_output_columns=["answer"],           # Optional: Columns to use as expected output
    metadata_columns=["difficulty"],              # Optional: Additional columns as metadata
    id_column="record_id",                        # Optional: Column to use as record IDs
    csv_delimiter=","                             # Optional: Defaults to comma
)

# Example "questions.csv":
# record_id,question,category,answer,difficulty
# japan-capital,What is the capital of Japan?,geography,Tokyo,medium
# brazil-capital,What is the capital of Brazil?,geography,Brasília,medium
```

**Notes**:

- CSV files must have a header row
- Maximum field size is 10MB
- All columns not specified in `input_data_columns`, `expected_output_columns`, or `id_column` are automatically treated as metadata
- The dataset is automatically pushed to Datadog after creation
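
The column mapping described in these notes can be previewed locally with Python's standard `csv` module. This is only an illustrative sketch: `split_row` is a hypothetical helper, not an SDK function, and the SDK's actual parsing may differ in details such as how expected output is shaped.

```python
import csv
import io

CSV_TEXT = """record_id,question,category,answer,difficulty
japan-capital,What is the capital of Japan?,geography,Tokyo,medium
"""


def split_row(row, input_cols, expected_cols, id_col=None):
    """Hypothetical helper: group one CSV row into record fields."""
    used = set(input_cols) | set(expected_cols)
    if id_col:
        used.add(id_col)
    record = {
        "input_data": {c: row[c] for c in input_cols},
        "expected_output": {c: row[c] for c in expected_cols},
        # Columns not claimed above fall through to metadata.
        "metadata": {c: v for c, v in row.items() if c not in used},
    }
    if id_col:
        record["id"] = row[id_col]
    return record


# DictReader treats the first row as the header row.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
records = [
    split_row(row, ["question", "category"], ["answer"], id_col="record_id")
    for row in reader
]
print(records[0]["metadata"])  # {'difficulty': 'medium'}
```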

{% /tab %}

{% tab title="Manual creation" %}
To manually create a dataset, use `LLMObs.create_dataset()`:

```python
from ddtrace.llmobs import LLMObs

dataset = LLMObs.create_dataset(
    dataset_name="capitals-of-the-world",
    project_name="capitals-project", # optional, defaults to project_name used in LLMObs.enable
    description="Questions about world capitals",
    records=[
        {
            "id": "china-capital",                                             # optional, user-defined record ID
            "input_data": {"question": "What is the capital of China?"},       # required, JSON or string
            "expected_output": "Beijing",                                      # optional, JSON or string
            "metadata": {"difficulty": "easy"}                                 # optional, JSON
        },
        {
            "input_data": {"question": "Which city serves as the capital of South Africa?"},
            "expected_output": "Pretoria",
            "metadata": {"difficulty": "medium"}
        }
    ]
)
# View dataset in Datadog UI
print(f"View dataset: {dataset.url}")
```

{% /tab %}

{% tab title="From production traces" %}
Add production traces to datasets manually through the UI or automatically with Automations.

**Manual selection (UI)**:

1. Navigate to [AI Observability > Traces](https://app.datadoghq.com/llm/traces). You can also add a new Automation from [Settings > Automations](https://app.datadoghq.com/llm/settings/automations).
1. Find a trace you want to include in a dataset.
1. Click **Add to Dataset**.
1. Choose an existing dataset or create a dataset.
1. The trace's input, output, and metadata are automatically extracted.

**Automatic routing (Automations)**:

{% alert level="info" %}
Automations apply going forward: new traces matching your rule are routed to the dataset as they arrive. Existing traces matching the filter are not added retroactively.
{% /alert %}

Automations enable you to continuously route production traces to datasets based on configurable rules, keeping your datasets current with production behavior without manual intervention.

To set up automatic dataset updates:

1. Navigate to [AI Observability > Traces](https://app.datadoghq.com/llm/traces).
1. Apply filters to identify traces you want to route (evaluation failures, latency thresholds, specific applications). See [Automation Rules > Supported filter fields](https://docs.datadoghq.com/llm_observability/monitoring/automation_rules.md#supported-filter-fields) for what's allowed.
1. Click **Automate Query**.
1. Configure sampling rate (for example, 10% of matching traces).
1. Select **Add to Dataset** as the action.
1. Choose an existing dataset or create a dataset.

After creating an automation, manage it from [AI Observability > Settings > Automations](https://app.datadoghq.com/llm/settings/automations):

- Enable/disable: Control whether new traces are added to the dataset.
- Edit: Modify filters, sampling rates, or target datasets as your needs change.
- Delete: Remove automations that are no longer needed.

**Dataset limits:**

- Datasets populated by automations are capped at 20,000 records.
- These datasets are read-only to prevent accidental modification of automated data.
- To modify records, clone the dataset first.

**Example use cases for Automations:**

- Sample 10% of traces with failed evaluations to build a failure dataset.
- Collect edge cases where latency exceeds thresholds.
- Maintain a diverse dataset with stratified sampling across user segments.
- Automatically capture new failure patterns as they emerge in production.

{% /tab %}

### Retrieving a dataset{% #retrieving-a-dataset %}

To retrieve a project's existing dataset from Datadog:

```python
dataset = LLMObs.pull_dataset(
    dataset_name="capitals-of-the-world",
    project_name="capitals-project", # optional, defaults to the project name from LLMObs.enable
    version=1 # optional, defaults to the latest version
)

# Get dataset length
print(len(dataset))
```

#### Exporting a dataset to pandas{% #exporting-a-dataset-to-pandas %}

The `Dataset` class also provides the `as_dataframe()` method, which converts a dataset to a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

{% alert level="info" %}
[Pandas](https://pandas.pydata.org/docs/index.html) is required for this operation. To install pandas, run `pip install pandas`.
{% /alert %}

```python
# Convert dataset to pandas DataFrame
df = dataset.as_dataframe()
print(df.head())

# DataFrame output with MultiIndex columns:
#                                   input_data     expected_output  metadata
#    question                       category       answer           difficulty
# 0  What is the capital of Japan?  geography      Tokyo            medium
# 1  What is the capital of Brazil? geography      Brasília         medium
```

The DataFrame has a MultiIndex structure with the following columns:

- `input_data`: Contains all input fields from `input_data_columns`
- `expected_output`: Contains all output fields from `expected_output_columns`
- `metadata`: Contains any additional fields from `metadata_columns`
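
Two-level columns are selected with standard pandas indexing. The sketch below builds a stand-in DataFrame by hand rather than calling `as_dataframe()`, so it runs without a Datadog connection. The second-level names (`question`, `category`, `answer`, `difficulty`) are assumed from the CSV example.

```python
import pandas as pd

# Stand-in DataFrame with the two-level column layout described above.
columns = pd.MultiIndex.from_tuples([
    ("input_data", "question"),
    ("input_data", "category"),
    ("expected_output", "answer"),
    ("metadata", "difficulty"),
])
df = pd.DataFrame(
    [
        ["What is the capital of Japan?", "geography", "Tokyo", "medium"],
        ["What is the capital of Brazil?", "geography", "Brasília", "medium"],
    ],
    columns=columns,
)

# Select every input field by its top-level label.
inputs = df["input_data"]

# Select a single field with a (top level, second level) tuple.
answers = df[("expected_output", "answer")]

print(list(inputs.columns))  # ['question', 'category']
print(answers.tolist())      # ['Tokyo', 'Brasília']
```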

### Dataset versioning{% #dataset-versioning %}

Datasets are automatically versioned to track changes over time. Versioning information enables reproducibility and allows experiments to reference specific dataset versions.

The `Dataset` object has a field, `current_version`, which corresponds to the latest version; previous versions are subject to a 90-day retention window.

Dataset versions start at `0`, and each new version increments the version by 1.

#### When new dataset versions are created{% #when-new-dataset-versions-are-created %}

A new dataset version is created when:

- Adding records
- Updating records (changes to `input_data`, `expected_output`, or `metadata` fields)
- Deleting records

Dataset versions are **NOT** created when updating the dataset name or description.
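
These rules can be pictured with a small local model. `ToyDataset` is a hypothetical class written only for illustration. It is not the SDK's `Dataset` class, and it assumes that each method call corresponds to one saved change.

```python
class ToyDataset:
    """Hypothetical local model of the versioning rules; not the SDK class."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.version = 0  # versions start at 0

    def add_record(self, record):
        self.records.append(record)
        self.version += 1  # adding a record creates a new version

    def update_record(self, index, record):
        self.records[index] = record
        self.version += 1  # updating a record creates a new version

    def delete_record(self, index):
        del self.records[index]
        self.version += 1  # deleting a record creates a new version

    def rename(self, name):
        self.name = name  # renaming leaves the version unchanged


toy = ToyDataset("capitals-of-the-world")
toy.add_record({"input_data": "What is the capital of Japan?"})
toy.update_record(0, {"input_data": "What is the capital of Brazil?"})
toy.rename("world-capitals")
print(toy.version)  # 2
```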

#### Version retention{% #version-retention %}

- The active version of a Dataset is retained for 3 years.
- Previous versions (**NOT** the content of `current_version`) are retained for 90 days.
- The 90-day retention period resets when a previous version is used, for example when an experiment reads it.
- After 90 consecutive days without use, a previous version is eligible for permanent deletion and may no longer be accessible.

**Example of version retention behavior**

After you publish version `12`, version `11` becomes a previous version with a 90-day retention window. If you run an experiment with version `11` 25 days later, the 90-day window **restarts**. If version `11` then goes unused for another 90 days, it may be deleted.
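
The arithmetic in this example can be worked through with Python's `datetime` module. The dates are hypothetical, and the sketch assumes the window is counted in whole days from the last use. The exact moment Datadog deletes a version is not specified here.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=90)


def eligible_for_deletion_on(last_used: date) -> date:
    """Date a previous version becomes eligible for deletion if not used again."""
    return last_used + RETENTION


# Hypothetical date on which version 12 is published and 11 becomes previous.
became_previous = date(2025, 1, 1)

# An experiment reads version 11 after 25 days, restarting the window.
used_again = became_previous + timedelta(days=25)

print(eligible_for_deletion_on(became_previous))  # 2025-04-01
print(eligible_for_deletion_on(used_again))       # 2025-04-26
```

In this sketch, using version `11` on day 25 moves its eligibility date from day 90 to day 115.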

### Accessing and managing dataset records{% #accessing-and-managing-dataset-records %}

You can access dataset records using standard Python indexing:

```python
# Get a single record
record = dataset[0]

# Get multiple records
records = dataset[1:3]

# Iterate through records
for record in dataset:
    print(record["input_data"])
```
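
Because records are accessed like dictionaries, ordinary Python idioms apply to them. The sketch below filters records by a metadata field. To stay runnable without a Datadog connection, it uses a plain list of dictionaries named `records` as a stand-in for a pulled dataset. The record contents are taken from the manual creation example.

```python
# Stand-in for a pulled dataset: a plain list of record dictionaries.
records = [
    {
        "input_data": {"question": "What is the capital of China?"},
        "expected_output": "Beijing",
        "metadata": {"difficulty": "easy"},
    },
    {
        "input_data": {"question": "Which city serves as the capital of South Africa?"},
        "expected_output": "Pretoria",
        "metadata": {"difficulty": "medium"},
    },
]

# Keep only records tagged with a given difficulty; metadata is optional,
# so fall back to an empty dict when it is missing.
medium = [r for r in records if r.get("metadata", {}).get("difficulty") == "medium"]

for record in medium:
    print(record["input_data"]["question"], "->", record["expected_output"])
```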

The `Dataset` class provides methods to manage records: `append()`, `update()`, and `delete()`. Changes are not saved until you call `push()`, which sends them to Datadog.

```python
# Add a new record
dataset.append({
    "id": "switzerland-capital",
    "input_data": {"question": "What is the capital of Switzerland?"},
    "expected_output": "Bern",
    "metadata": {"difficulty": "easy"}
})

# Update an existing record
dataset.update(0, {
    "input_data": {"question": "What is the capital of China?"},
    "expected_output": "Beijing",
    "metadata": {"difficulty": "medium"}
})

# Delete a record
dataset.delete(1)  # Deletes the second record

# Save changes to Datadog
dataset.push()
```

### Customizing the dataset table{% #customizing-the-dataset-table %}

When viewing a dataset's records, you can customize the table to quickly scan and compare records without expanding each one individually.

#### Column picker{% #column-picker %}

Use the column picker to toggle columns on or off and drag to reorder them.

#### Custom columns{% #custom-columns %}

Extract specific fields from your dataset records and display them as dedicated table columns. To add a custom column, type a field path in the Add Column input at the top of the table. You can add multiple custom columns and reorder them with drag-and-drop. Column configuration is saved to your browser's local storage per project.