---
title: Datasets
description: >-
  Using datasets in LLM Observability Experiments, including how to create,
  retrieve, and manage datasets, as well as information about versioning.
breadcrumbs: Docs > LLM Observability > Experiments > Datasets
---

# Datasets

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}

In LLM Observability Experiments, a *dataset* is a collection of *inputs*, *expected outputs*, and *metadata* that represents scenarios you want to test your agent on. Each dataset is associated with a *project*.

Each record in a dataset contains:

- **input** (required): Represents all the information that the agent can access in a task.
- **expected output** (optional): Also called *ground truth*, represents the ideal answer that the agent should output. You can use *expected output* to store the actual output of the app, as well as any intermediary results you want to assess.
- **metadata** (optional): Contains any useful information to categorize the record and use for further analysis. For example: topics, tags, descriptions, notes.

Datasets enable systematic testing and regression detection by providing consistent evaluation scenarios across experiments.
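A single record is a plain mapping with the three fields described above. A minimal sketch (the field names follow the `records` argument of `LLMObs.create_dataset` shown below; the values are illustrative):

```python
# One dataset record: input_data is required; the other fields are optional.
record = {
    "input_data": {  # everything the agent can access for this task
        "question": "What is the capital of France?",
        "category": "geography",
    },
    "expected_output": "Paris",          # ground truth for evaluation
    "metadata": {"difficulty": "easy"},  # free-form fields for analysis
}

print(record["input_data"]["question"])
```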

### Creating a dataset{% #creating-a-dataset %}

You can create datasets from production data, CSV files, or manually construct them programmatically.

{% tab title="From CSV files" %}
To create a dataset from a CSV file, use `LLMObs.create_dataset_from_csv()`:

```python
# Create dataset from CSV
dataset = LLMObs.create_dataset_from_csv(
    csv_path="questions.csv",
    dataset_name="capitals-of-the-world",
    project_name="capitals-project",              # Optional: defaults to the project name from LLMObs.enable
    description="Geography quiz dataset",         # Optional: Dataset description
    input_data_columns=["question", "category"],  # Columns to use as input
    expected_output_columns=["answer"],           # Optional: Columns to use as expected output
    metadata_columns=["difficulty"],              # Optional: Additional columns as metadata
    csv_delimiter=","                             # Optional: Defaults to comma
)

# Example "questions.csv":
# question,category,answer,difficulty
# What is the capital of Japan?,geography,Tokyo,medium
# What is the capital of Brazil?,geography,Brasília,medium
```

**Notes**:

- CSV files must have a header row
- Maximum field size is 10MB
- All columns not specified in `input_data_columns` or `expected_output_columns` are automatically treated as metadata
- The dataset is automatically pushed to Datadog after creation

{% /tab %}

{% tab title="Manual creation" %}
To manually create a dataset, use `LLMObs.create_dataset()`:

```python
from ddtrace.llmobs import LLMObs

dataset = LLMObs.create_dataset(
    dataset_name="capitals-of-the-world",
    project_name="capitals-project", # optional, defaults to project_name used in LLMObs.enable
    description="Questions about world capitals",
    records=[
        {
            "input_data": {"question": "What is the capital of China?"},       # required, JSON or string
            "expected_output": "Beijing",                                      # optional, JSON or string
            "metadata": {"difficulty": "easy"}                                 # optional, JSON
        },
        {
            "input_data": {"question": "Which city serves as the capital of South Africa?"},
            "expected_output": "Pretoria",
            "metadata": {"difficulty": "medium"}
        }
    ]
)
# View dataset in Datadog UI
print(f"View dataset: {dataset.url}")
```

{% /tab %}

{% tab title="From production traces" %}
Add production traces to datasets manually through the UI or automatically with Automations.

**Manual selection (UI)**:

1. Navigate to [**AI Observability > Traces**](https://app.datadoghq.com/llm/traces). You can also add a new Automation from [Settings > Automations](https://app.datadoghq.com/llm/settings/automations).
1. Find a trace you want to include in a dataset.
1. Click **Add to Dataset**.
1. Choose an existing dataset or create a new one.
1. The trace's input, output, and metadata are automatically extracted.

**Automatic routing (Automations)**:

Automations enable you to continuously route production traces to datasets based on configurable rules, keeping your datasets current with production behavior without manual intervention. Automation rules apply only to new traces generated after the rule is created, not to existing historical traces.

To set up automatic dataset updates:

1. Navigate to [**AI Observability > Traces**](https://app.datadoghq.com/llm/traces).
1. Apply filters to identify traces you want to route (evaluation failures, latency thresholds, specific applications). See the example queries in [Search Syntax](https://docs.datadoghq.com/logs/explorer/search_syntax/).
1. Click **Automate Query**.
1. Configure sampling rate (for example, 10% of matching traces).
1. Select **Add to Dataset** as the action.
1. Choose an existing dataset or create a new one.

After creating an automation, manage it from [**AI Observability > Settings > Automations**](https://app.datadoghq.com/llm/settings/automations):

- **Enable/disable**: Control whether new traces are added to the dataset.
- **Edit**: Modify filters, sampling rates, or target datasets as your needs change.
- **Delete**: Remove automations that are no longer needed.

**Dataset limits:**

- Datasets populated by automations are capped at 20,000 records.
- These datasets are read-only to prevent accidental modification of automated data.
- To modify records, clone the dataset first.

**Example use cases for Automations:**

- Sample 10% of traces with failed evaluations to build a failure dataset.
- Collect edge cases where latency exceeds thresholds.
- Maintain a diverse dataset with stratified sampling across user segments.
- Automatically capture new failure patterns as they emerge in production.

{% /tab %}

### Retrieving a dataset{% #retrieving-a-dataset %}

To retrieve a project's existing dataset from Datadog:

```python
dataset = LLMObs.pull_dataset(
    dataset_name="capitals-of-the-world",
    project_name="capitals-project", # optional, defaults to the project name from LLMObs.enable
    version=1 # optional, defaults to the latest version
)

# Get dataset length
print(len(dataset))
```

#### Exporting a dataset to pandas{% #exporting-a-dataset-to-pandas %}

The Dataset class also provides the method `as_dataframe()`, which converts a dataset into a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

{% alert level="info" %}
[Pandas](https://pandas.pydata.org/docs/index.html) is required for this operation. To install pandas, run `pip install pandas`.
{% /alert %}

```python
# Convert dataset to pandas DataFrame
df = dataset.as_dataframe()
print(df.head())

# DataFrame output with MultiIndex columns:
#                                   input_data     expected_output  metadata
#    question                       category       answer           difficulty
# 0  What is the capital of Japan?  geography      Tokyo            medium
# 1  What is the capital of Brazil? geography      Brasília         medium
```

The DataFrame has a MultiIndex structure with the following columns:

- `input_data`: Contains all input fields from `input_data_columns`
- `expected_output`: Contains all output fields from `expected_output_columns`
- `metadata`: Contains any additional fields from `metadata_columns`
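With MultiIndex columns, you can select a whole top-level group or a single sub-column using standard pandas indexing. A minimal sketch, constructing a DataFrame with the same column structure as the example output above (the data here is illustrative, not pulled from Datadog):

```python
import pandas as pd

# Build a DataFrame with the same MultiIndex column structure as as_dataframe()
columns = pd.MultiIndex.from_tuples([
    ("input_data", "question"),
    ("input_data", "category"),
    ("expected_output", "answer"),
    ("metadata", "difficulty"),
])
df = pd.DataFrame(
    [["What is the capital of Japan?", "geography", "Tokyo", "medium"],
     ["What is the capital of Brazil?", "geography", "Brasília", "medium"]],
    columns=columns,
)

# Selecting a top-level group returns a sub-DataFrame of all its fields
inputs = df["input_data"]

# Selecting a (group, field) tuple returns a single Series
answers = df[("expected_output", "answer")]
print(answers.tolist())  # ['Tokyo', 'Brasília']
```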

### Dataset versioning{% #dataset-versioning %}

Datasets are automatically versioned to track changes over time. Versioning information enables reproducibility and allows experiments to reference specific dataset versions.

The `Dataset` object has a field, `current_version`, which corresponds to the latest version; previous versions are subject to a 90-day retention window.

Dataset versions start at `0`, and each change increments the version number by 1.

#### When new dataset versions are created{% #when-new-dataset-versions-are-created %}

A new dataset version is created when:

- Adding records
- Updating records (changes to `input` or `expected_output` fields)
- Deleting records

Dataset versions are **NOT** created for changes to `metadata` fields, or when updating the dataset name or description.

#### Version retention{% #version-retention %}

- The active version of a Dataset is retained for 3 years.
- Previous versions (**NOT** the content of `current_version`) are retained for 90 days.
- The 90-day retention period resets when a previous version is used — for example, when an experiment reads a version.
- After 90 consecutive days without use, a previous version is eligible for permanent deletion and may no longer be accessible.

**Example of version retention behavior**

After you publish version `12`, version `11` becomes a previous version with a 90-day retention window. After 25 days, you run an experiment with version `11`, which **restarts** the 90-day window. If another 90 days then pass without version `11` being used, it becomes eligible for permanent deletion.
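The retention arithmetic above can be sketched with standard-library dates (the dates are illustrative; only the 90-day window comes from the retention policy):

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention window for previous dataset versions

def deletion_eligible_after(last_used: date) -> date:
    """Date on which a previous version becomes eligible for deletion,
    assuming it is not read again before then."""
    return last_used + timedelta(days=RETENTION_DAYS)

# Version 11 becomes a previous version when version 12 is published:
published_v12 = date(2025, 1, 1)

# Reading version 11 in an experiment 25 days later restarts the window:
last_used = published_v12 + timedelta(days=25)
print(deletion_eligible_after(last_used))  # 90 days after the last use
```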

### Accessing and managing dataset records{% #accessing-and-managing-dataset-records %}

You can access dataset records using standard Python indexing:

```python
# Get a single record
record = dataset[0]

# Get multiple records
records = dataset[1:3]

# Iterate through records
for record in dataset:
    print(record["input_data"])
```

The Dataset class provides methods to manage records: `append()`, `update()`, and `delete()`. Call `push()` to save your changes to Datadog.

```python
# Add a new record
dataset.append({
    "input_data": {"question": "What is the capital of Switzerland?"},
    "expected_output": "Bern",
    "metadata": {"difficulty": "easy"}
})

# Update an existing record
dataset.update(0, {
    "input_data": {"question": "What is the capital of China?"},
    "expected_output": "Beijing",
    "metadata": {"difficulty": "medium"}
})

# Delete a record
dataset.delete(1)  # Deletes the second record

# Save changes to Datadog
dataset.push()
```
