---
title: Reading Experiment Results
description: Read and understand the results of your experiments.
---

# Reading Experiment Results

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md). ({% placeholder "user-datadog-site-name" /%}).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

After you [launch an experiment](https://docs.datadoghq.com/experiments/plan_and_launch_experiments.md), the experiment results page is the central place to analyze it. From this page, you can:

- **Measure metrics**: Review scorecards that compare control and treatment performance on your decision metrics.
- **Analyze results further**: Break metric lift down by user segments or plot lift over time to understand how your change performed across cohorts.
- **Inspect session replays**: Open individual user session replays to see how specific users experienced each variant.
- **Document learnings**: Record conclusions and takeaways for your team.

The following sections explain the metric scorecard and how to explore results.

## Experiment diagnostics{% #experiment-diagnostics %}

Datadog runs [experiment diagnostics](https://docs.datadoghq.com/experiments/diagnostics.md) alongside experiment analysis to check exposure data, metric data, randomization, and analysis health. Review diagnostic warnings before interpreting results, especially when a metric is missing, unexpectedly zero, or marked with a warning.

## Metric scorecard{% #metric-scorecard %}

The experiment results page shows a scorecard for each decision metric. Each row summarizes how one metric compared between the treatment and control variants.

{% image
   source="https://docs.dd-static.net/images/product_analytics/experiment/exp_reading_exps_scorecard.9f3ed3c51dd277cfd6030c2403700163.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/product_analytics/experiment/exp_reading_exps_scorecard.9f3ed3c51dd277cfd6030c2403700163.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="The experiment results overview showing a decision metrics table with control and treatment values, relative lift, and confidence interval bars for three metrics." /%}

### What the scorecard shows{% #what-the-scorecard-shows %}

For each metric, the scorecard displays:

- **Control and treatment values**: The average per-subject metric value in each variant.
- **Relative lift**: The percent change in that average between treatment and control.
- **Confidence interval**: A range of lift values consistent with the observed data, shown as a bar centered on the relative lift estimate.

The width and interpretation of the confidence interval depend on the experiment's configured [analysis method](https://docs.datadoghq.com/experiments/statistics/analysis_methods.md).

{% collapsible-section #how-metrics-are-calculated %}
#### How metrics are calculated

Datadog analyzes experiments at the **subject** level—the unit you configured when you set up the experiment, typically a user. Datadog computes a metric value for each enrolled subject (for example, revenue per user or whether the user completed a signup). These per-subject values form a distribution for each variant. Datadog's statistical engine then compares these distributions between control and treatment.

**Relative lift** measures how much the treatment shifted the average per-subject metric value compared to the control:

```
Relative lift = (Treatment − Control) / Control
```

A relative lift of 10% means the treatment group's average per-subject value is 10% higher than the control group's average. Negative lift means the treatment performed worse on average.
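
As a minimal sketch of the formula above, the following Python snippet averages per-subject values for each variant and computes the relative lift. The function name and the sample values are illustrative assumptions, not Datadog's implementation, which also estimates the uncertainty around this point estimate.

```python
from statistics import mean


def relative_lift(control_values, treatment_values):
    """Return (treatment mean - control mean) / control mean."""
    control_mean = mean(control_values)
    treatment_mean = mean(treatment_values)
    if control_mean == 0:
        # The ratio has no meaning when the control average is zero.
        raise ValueError("relative lift is undefined when the control mean is zero")
    return (treatment_mean - control_mean) / control_mean


# Hypothetical revenue per enrolled user, one value per subject.
control = [10.0, 0.0, 20.0, 10.0]    # average 10.0
treatment = [12.0, 0.0, 22.0, 10.0]  # average 11.0

lift = relative_lift(control, treatment)
print(f"Relative lift: {lift:.1%}")  # prints "Relative lift: 10.0%"
```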
{% /collapsible-section %}

### Confidence intervals{% #confidence-intervals %}

The confidence interval is a range of lift values that are consistent with the observed data. The true lift could fall outside this range, but values inside the interval are more consistent with what the experiment measured.

- If the **entire interval is above zero**, the result is statistically significant in the positive direction. An improvement at least this large is unlikely to occur if there is no true effect.
- If the **entire interval is below zero**, the result is statistically significant in the negative direction. The treatment likely reduced the metric.
- If the **interval crosses zero**, the result is not statistically significant. The result is consistent with a true effect of zero.
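
The three cases above can be sketched as a small Python helper. The function name and return labels are hypothetical, and treating a bound that sits exactly on zero as "crossing" is an assumption made for this example.

```python
def interpret_interval(lower, upper):
    """Classify a lift confidence interval by its position relative to zero."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        return "significant positive"   # entire interval above zero
    if upper < 0:
        return "significant negative"   # entire interval below zero
    return "not significant"            # interval includes zero


# Hypothetical intervals, expressed as fractions (0.02 means +2% lift).
print(interpret_interval(0.02, 0.08))    # significant positive
print(interpret_interval(-0.06, -0.01))  # significant negative
print(interpret_interval(-0.01, 0.05))   # not significant
```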

Use the interval width as an indicator of precision: a narrower interval means a more precise estimate of lift; a wider interval means more uncertainty, often because the sample is smaller or the metric is noisy.

If [multiple testing correction](https://docs.datadoghq.com/experiments/statistics/multiple_testing_correction.md) is enabled, confidence intervals are wider because Datadog controls the family-wise error rate across the experiment's metric and treatment-variant comparisons.
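
To see why a correction widens intervals, consider the sketch below. It uses a Bonferroni adjustment with a normal approximation, which is one textbook way to control the family-wise error rate. This page does not state which correction Datadog applies, so treat the method, the function names, and the numbers as assumptions for illustration only.

```python
from statistics import NormalDist


def critical_value(alpha, comparisons=1):
    """Two-sided normal critical value after splitting alpha across comparisons."""
    adjusted_alpha = alpha / comparisons  # Bonferroni adjustment (assumed method)
    return NormalDist().inv_cdf(1 - adjusted_alpha / 2)


def lift_interval(estimate, std_error, alpha=0.05, comparisons=1):
    """Return (lower, upper) for a lift estimate with the given standard error."""
    z = critical_value(alpha, comparisons)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical lift estimate of 5% with a standard error of 2%.
uncorrected = lift_interval(0.05, 0.02)
# Hypothetical experiment: 3 metrics x 2 treatment variants = 6 comparisons.
corrected = lift_interval(0.05, 0.02, comparisons=6)

print("Uncorrected:", uncorrected)
print("Corrected:  ", corrected)
```

With these made-up inputs, the uncorrected interval sits entirely above zero while the corrected interval crosses it, so the same estimate can stop being statistically significant once the correction is applied.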

### Global lift{% #global-lift %}

Experiments typically enroll only a subset of eligible users. Switch to the Global lift tab on the metric scorecard to estimate how rolling out the treatment to all eligible users would affect your overall metric totals. See [Global Lift](https://docs.datadoghq.com/experiments/global_lift.md) for the full methodology.

{% image
   source="https://docs.dd-static.net/images/product_analytics/experiment/exp_reading_global_lift.10947ba9154fbe39b1b5d7574caea4b1.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/product_analytics/experiment/exp_reading_global_lift.10947ba9154fbe39b1b5d7574caea4b1.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="The Global lift tab of the experiment scorecard showing control and treatment average metric values, coverage, and global lift for each decision metric." /%}

For each metric, the Global lift tab displays:

- **Control and treatment values**: The average per-subject metric value in each variant—the same values shown on the main scorecard tab.
- **Coverage**: The estimated proportion of your global metric total associated with the experiment's eligible population (excluding the effect of the experiment).
- **Global lift**: The estimated change to your overall metric totals if the treatment were released to all eligible users. Datadog calculates global lift as the product of coverage and the experiment's local (relative) lift.
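
The global lift calculation described above can be sketched in a few lines of Python. The function name and the input values are hypothetical; see the Global Lift page for how Datadog estimates coverage.

```python
def global_lift(coverage, relative_lift):
    """Global lift as the product of coverage and local (relative) lift."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage is a proportion and must be between 0 and 1")
    return coverage * relative_lift


# Hypothetical: the eligible population accounts for 40% of the global metric
# total, and the experiment measured a 10% relative lift.
estimate = global_lift(coverage=0.4, relative_lift=0.10)
print(f"Global lift: {estimate:.1%}")  # prints "Global lift: 4.0%"
```

In this example, a 10% local lift translates to a 4% change in the overall metric total because only 40% of that total comes from users who would receive the treatment.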

## Exploring results{% #exploring-results %}

From the metric scorecard, hover over a metric name to view exploration options. The options available depend on your metric's data source.

### Chart{% #chart %}

Click Chart on any metric to open an interactive visualization of how that metric performed during the experiment. Within the chart, you can:

- **Split by segmentation properties**: Compare lift across cohorts such as device type or user tier. Properties reflect subject attributes at the initial time of exposure.
- **Plot lift over time**: See how lift trends across the experiment, plotted by calendar date or by days since each subject's first experiment exposure.
- **Add filters**: Narrow the chart to a specific subset of subjects.
- **Switch lift types**: Toggle between relative lift and absolute lift (Treatment − Control).

The example below shows a segment-level breakout by country. Use this view to see whether certain cohorts reacted differently to the new experience.

{% image
   source="https://docs.dd-static.net/images/product_analytics/experiment/exp_segment_view.0310c285096a01282426f2a823c60ca4.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/product_analytics/experiment/exp_segment_view.0310c285096a01282426f2a823c60ca4.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="Segment-level view of a metric split by Country ISO Code, showing a bar chart of relative lift and a data table with control and treatment values per country." /%}
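
If you export per-subject data for follow-up analysis, a segment-level breakout like the one in the chart can be approximated with the sketch below. The row layout, function name, and sample data are assumptions for illustration, with each subject's segment taken from its attributes at first exposure. The sketch reports both relative and absolute lift per segment and does not compute confidence intervals.

```python
from collections import defaultdict
from statistics import mean


def lift_by_segment(rows):
    """Compute per-segment lift from (segment, variant, value) rows.

    variant must be "control" or "treatment"; value is one subject's metric.
    """
    grouped = defaultdict(lambda: {"control": [], "treatment": []})
    for segment, variant, value in rows:
        grouped[segment][variant].append(value)

    results = {}
    for segment, values in grouped.items():
        if not values["control"] or not values["treatment"]:
            continue  # skip segments that are missing one of the variants
        control_mean = mean(values["control"])
        treatment_mean = mean(values["treatment"])
        absolute = treatment_mean - control_mean
        # Relative lift is undefined when the control average is zero.
        relative = absolute / control_mean if control_mean != 0 else None
        results[segment] = {"absolute_lift": absolute, "relative_lift": relative}
    return results


# Hypothetical per-subject rows keyed by country.
rows = [
    ("US", "control", 10.0), ("US", "control", 10.0),
    ("US", "treatment", 12.0), ("US", "treatment", 12.0),
    ("DE", "control", 5.0), ("DE", "control", 5.0),
    ("DE", "treatment", 5.0), ("DE", "treatment", 4.0),
]

for segment, lifts in lift_by_segment(rows).items():
    print(segment, lifts)
```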

### Copy SQL{% #copy-sql %}

For [warehouse-native metrics](https://docs.datadoghq.com/experiments/guide/connecting_a_data_warehouse.md), click Copy SQL to copy a simplified version of the pipeline logic Datadog used to calculate the result. Paste the query into your warehouse to audit the result or run follow-up analysis.

{% image
   source="https://docs.dd-static.net/images/product_analytics/experiment/exposure-sql/copy-sql.47232728bfa17d23f9894c2eaf6539d9.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/product_analytics/experiment/exposure-sql/copy-sql.47232728bfa17d23f9894c2eaf6539d9.png?auto=format&fit=max&w=850&dpr=2 2x"
   alt="The experiment results page with the Copy SQL button highlighted on a warehouse metric." /%}

### Replays{% #replays %}

For metrics built on [RUM](https://docs.datadoghq.com/real_user_monitoring.md) or [Product Analytics](https://docs.datadoghq.com/product_analytics.md) data, click Replays to watch [session replays](https://docs.datadoghq.com/session_replay.md) for users enrolled in the experiment. Review how subjects in each variant experienced the product.

## Further reading{% #further-reading %}

- [Make data-driven design decisions with Product Analytics](https://www.datadoghq.com/blog/datadog-product-analytics/)
- [Analytics Explorer](https://docs.datadoghq.com/product_analytics/analytics_explorer.md)
- [Experiment Diagnostics](https://docs.datadoghq.com/experiments/diagnostics.md)