---
title: Understand Datadog retention policy to efficiently retain trace data
description: Learn how Datadog's Intelligent retention filter and custom retention filters keep the error, high-latency, and business-critical traces that matter.
breadcrumbs: >-
  Docs > APM > Tracing Guides > Understand Datadog retention policy to
  efficiently retain trace data
---

# Understand Datadog retention policy to efficiently retain trace data

## Ingesting and retaining the traces you care about{% #ingesting-and-retaining-the-traces-you-care-about %}

Most traces generated by your applications are repetitive, and it is rarely useful to ingest and retain them all. For successful requests, retaining a **representative sample** of your applications' traffic is enough, since you can't possibly scan through dozens of individual traced requests every second.

What's most important are the traces that contain symptoms of potential issues in your infrastructure, that is, **traces with errors or unusual latency**. In addition, for specific **endpoints that are critical to your business**, you might want to retain 100% of the traffic, to ensure that you are able to investigate and troubleshoot any customer problem in great detail.

{% image
   source="https://datadog-docs.imgix.net/images/tracing/guide/leveraging_diversity_sampling/relevant_traces.f0b48848bbcaebbcd9c6d1d6471b2190.png?auto=format"
   alt="Relevant traces are retained by storing a combination of high-latency traces, error traces, and business critical traces." /%}

## How Datadog's retention policy helps you retain what matters{% #how-datadogs-retention-policy-helps-you-retain-what-matters %}

Datadog provides two main ways of retaining data past 15 minutes:

- The Intelligent retention filter which is always enabled.
- Custom tag-based retention filters that you can manually configure.

{% image
   source="https://datadog-docs.imgix.net/images/tracing/guide/leveraging_diversity_sampling/datadog_captures_relevant_traces.e7e34caadb2f367478b9f3f9c3d25315.png?auto=format"
   alt="Datadog captures relevant error and latency traces through the Intelligent retention filter, and business critical traces through custom retention filters." /%}

### Diversity sampling algorithm: Intelligent retention filter{% #diversity-sampling-algorithm-intelligent-retention-filter %}

By default, the Intelligent retention filter keeps a representative selection of traces without requiring you to create dozens of custom retention filters.

For each combination of `environment`, `service`, `operation`, and `resource`, it keeps at least one span (and the associated distributed trace) every 15 minutes at the `p75`, `p90`, and `p95` latency percentiles, as well as a representative selection of errors for each distinct response status code.
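The selection rule above can be sketched locally. This is a simplified, hypothetical model of the behavior (the real filter runs on live trace data inside Datadog), assuming each span is a dict with `env`, `service`, `operation`, `resource`, `duration_ms`, and `status_code` fields:

```python
from collections import defaultdict

def diversity_sample(spans):
    """Pick a representative subset of spans, roughly mimicking the
    Intelligent retention filter: per (env, service, operation, resource)
    combination, keep the spans closest to the p75/p90/p95 latency
    percentiles, plus one error span per distinct response status code."""
    by_combo = defaultdict(list)
    for span in spans:
        key = (span["env"], span["service"], span["operation"], span["resource"])
        by_combo[key].append(span)

    retained = []
    for combo_spans in by_combo.values():
        ordered = sorted(combo_spans, key=lambda s: s["duration_ms"])
        # Keep the spans sitting at the p75, p90, and p95 latency percentiles.
        for pct in (0.75, 0.90, 0.95):
            idx = min(int(pct * len(ordered)), len(ordered) - 1)
            retained.append(ordered[idx])
        # Keep one error span per distinct response status code.
        seen_codes = set()
        for span in combo_spans:
            code = span["status_code"]
            if code >= 500 and code not in seen_codes:
                seen_codes.add(code)
                retained.append(span)
    # Deduplicate while preserving order (a span can match several rules).
    unique = {id(s): s for s in retained}
    return list(unique.values())
```

Run over one 15-minute window of spans, this keeps a handful of latency-representative spans plus the errors, instead of the whole window.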

To learn more, read the [Intelligent retention filter documentation](https://docs.datadoghq.com/tracing/trace_pipeline/trace_retention#datadog-intelligent-retention-filter).

### Tag-based retention filters{% #tag-based-retention-filters %}

[Tag-based retention filters](https://docs.datadoghq.com/tracing/trace_pipeline/trace_retention) provide the flexibility to keep traces that are the most critical to your business. When indexing spans with retention filters, the associated trace is also stored, which ensures that you keep visibility over the entire request and its distributed context.
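As a sketch of what a custom retention filter looks like, here is a minimal example of defining one through the Datadog API. The payload shape follows the public `POST /api/v2/apm/config/retention-filters` endpoint, but the filter name, query, and rate are hypothetical placeholders; check the API reference for the authoritative schema:

```python
import json
import urllib.request

# Hypothetical filter: retain 100% of checkout spans for enterprise merchants.
payload = {
    "data": {
        "type": "apm_retention_filter",
        "attributes": {
            "name": "enterprise-checkouts",  # hypothetical filter name
            "enabled": True,
            "rate": 1.0,  # keep 100% of matching spans
            "filter_type": "spans-sampling-processor",
            "filter": {"query": "@merchant.tier:enterprise resource_name:*checkout*"},
        },
    }
}

def create_retention_filter(api_key, app_key, site="datadoghq.com"):
    """Send the filter definition to Datadog (requires valid keys)."""
    req = urllib.request.Request(
        f"https://api.{site}/api/v2/apm/config/retention-filters",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the filter indexes every matching span and its full trace, a `rate` of `1.0` on a narrowly scoped query is how you guarantee complete visibility on a business-critical endpoint without indexing all traffic.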

## Searching and analyzing indexed span data effectively{% #searching-and-analyzing-indexed-span-data-effectively %}

The set of data captured by diversity sampling is **not uniformly sampled** (that is, it is not proportionally representative of the full traffic). It is biased towards errors and high-latency traces. If you want to build analytics only on top of a uniformly sampled dataset, exclude the diversity-sampled spans by adding `-retained_by:diversity_sampling` to your query in the Trace Explorer.

For example, to measure the number of checkout operations grouped by merchant tier on your application, **excluding the diversity sampling dataset** ensures that you perform this analysis on top of a representative set of data, and so proportions of `basic`, `enterprise`, and `premium` checkouts are realistic:

{% image
   source="https://datadog-docs.imgix.net/images/tracing/guide/leveraging_diversity_sampling/checkout_ops_by_tier.8854c7197001dfd74b42972dd2fef944.png?auto=format"
   alt="Number of checkout operations by tier, analytics that exclude diversity-sampled data" /%}

On the other hand, if you want to measure the number of unique merchants by merchant tier, **include the diversity sampling dataset**, which might capture additional merchant IDs not caught by custom retention filters:

{% image
   source="https://datadog-docs.imgix.net/images/tracing/guide/leveraging_diversity_sampling/nb_merchants_by_merchant_tier.92f57c6535d04c8b381ad2a65c623d69.png?auto=format"
   alt="Number of unique merchants by tier, analytics that include diversity-sampled data" /%}
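The two analyses above can be illustrated locally. In this hypothetical sketch, each indexed span carries a `retained_by` field mirroring the Trace Explorer facet, and the `-retained_by:diversity_sampling` filter corresponds to dropping those spans before aggregating:

```python
from collections import Counter

def checkout_counts(spans, include_diversity=False):
    """Count checkout spans per merchant tier, optionally excluding the
    diversity-sampled (non-uniform) part of the indexed dataset."""
    kept = [
        s for s in spans
        if include_diversity or s["retained_by"] != "diversity_sampling"
    ]
    return Counter(s["merchant_tier"] for s in kept)

spans = [
    {"merchant_tier": "basic", "retained_by": "retention_filter"},
    {"merchant_tier": "premium", "retained_by": "retention_filter"},
    {"merchant_tier": "premium", "retained_by": "retention_filter"},
    # Diversity sampling over-represents rare or error traffic:
    {"merchant_tier": "enterprise", "retained_by": "diversity_sampling"},
]

print(checkout_counts(spans))                           # uniform dataset only
print(checkout_counts(spans, include_diversity=True))   # all indexed spans
```

Excluding the diversity-sampled span keeps the tier proportions representative of real traffic; including it surfaces the extra `enterprise` merchant that only diversity sampling happened to capture.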

## Further reading

- [Controlling trace indexing for retention](https://docs.datadoghq.com/tracing/trace_pipeline/trace_retention/)
