Understand Datadog’s retention policy to retain trace data efficiently
Ingesting and retaining the traces you care about
Most traces generated by your applications are repetitive, so ingesting and retaining all of them is rarely useful. For successful requests, retaining a representative sample of your applications’ traffic is enough, since you can’t possibly scan through dozens of individual traced requests every second.
What matters most are the traces that contain symptoms of potential issues in your infrastructure, that is, traces with errors or unusual latency. In addition, for specific endpoints that are critical to your business, you might want to retain 100% of the traffic to ensure that you can investigate and troubleshoot any customer problem in great detail.
How Datadog’s retention policy helps you retain what matters
Datadog provides two main ways of retaining data past 15 minutes:
Diversity sampling algorithm: Intelligent retention filter
By default, the Intelligent retention filter keeps a representative selection of traces without requiring you to create dozens of custom retention filters.
For each combination of environment, service, operation, and resource, it keeps at least one span (and the associated distributed trace) for each of the p75, p90, and p95 latency percentiles, at most every 15 minutes, as well as a representative selection of errors for each distinct response status code.
To learn more, read the Intelligent retention filter documentation.
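To make the selection logic above more concrete, here is a simplified Python sketch of diversity sampling. It is not Datadog’s implementation: the Span fields, batch (rather than streaming) processing, picking the span closest to each percentile, and keeping the first errored span per status code are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles

WINDOW_SECONDS = 15 * 60  # hypothetical: evaluate each key once per 15-minute window


@dataclass(frozen=True)
class Span:
    # Hypothetical span shape; field names are illustrative, not Datadog's schema.
    trace_id: str
    env: str
    service: str
    operation: str
    resource: str
    start: float        # seconds since the epoch
    duration_ms: float
    status_code: int
    error: bool


def diversity_sample(spans):
    """Return the trace IDs a simplified diversity sampler would retain."""
    # Group spans by (env, service, operation, resource) and 15-minute window.
    groups = defaultdict(list)
    for s in spans:
        window = int(s.start // WINDOW_SECONDS)
        groups[(s.env, s.service, s.operation, s.resource, window)].append(s)

    kept = set()
    for group in groups.values():
        durations = sorted(s.duration_ms for s in group)
        if len(durations) >= 2:
            # n=20 yields 19 cut points at 5%, 10%, ..., 95%:
            # index 14 is p75, index 17 is p90, index 18 is p95.
            cuts = quantiles(durations, n=20, method="inclusive")
            targets = [cuts[14], cuts[17], cuts[18]]
        else:
            targets = durations  # a single span is its own percentile
        for target in targets:
            # Keep the span whose latency is closest to each percentile.
            closest = min(group, key=lambda s: abs(s.duration_ms - target))
            kept.add(closest.trace_id)
        # Keep one errored span for each distinct status code.
        seen_codes = set()
        for s in group:
            if s.error and s.status_code not in seen_codes:
                seen_codes.add(s.status_code)
                kept.add(s.trace_id)
    return kept
```

With 20 successful spans of 1 to 20 ms in one window, this sketch keeps only the spans nearest the p75, p90, and p95 latencies, which is why a handful of spans can stand in for a busy endpoint.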
Tag-based retention filters
Tag-based retention filters provide the flexibility to keep the traces that are most critical to your business. When spans are indexed by a retention filter, the associated trace is also stored, so you retain visibility into the entire request and its distributed context.
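The following Python sketch illustrates the idea of a tag-based retention filter that keeps whole traces. The RetentionFilter class, the span dictionary shape, the tag names, and the hash-based retention rate are hypothetical and do not reflect how Datadog evaluates filters internally.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RetentionFilter:
    # Hypothetical filter: every tag key/value pair must match the span's tags.
    name: str
    tags: dict
    rate: float = 1.0  # fraction of matching traces to keep (1.0 = 100%)

    def matches(self, span_tags: dict) -> bool:
        return all(span_tags.get(k) == v for k, v in self.tags.items())

    def selects(self, trace_id: str) -> bool:
        # Deterministic per-trace decision, so a trace is kept or dropped consistently.
        bucket = zlib.crc32(trace_id.encode()) % 10_000
        return bucket < self.rate * 10_000


def apply_retention_filters(spans, filters):
    """Return every span belonging to a trace that at least one filter selected.

    spans: list of dicts with 'trace_id', 'span_id', and 'tags' keys (hypothetical shape).
    """
    retained_traces = set()
    for span in spans:
        for f in filters:
            if f.matches(span["tags"]) and f.selects(span["trace_id"]):
                retained_traces.add(span["trace_id"])
                break
    # Keeping the whole trace preserves the request's distributed context.
    return [s for s in spans if s["trace_id"] in retained_traces]
```

In this sketch, a filter on the checkout service also keeps the downstream payments span from the same trace, even though that span does not match the filter itself.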
Searching and analyzing indexed span data effectively
The set of data captured by diversity sampling is not uniformly sampled (that is, it is not proportionally representative of the full traffic). It is biased towards errors and high-latency traces. If you want to base your analytics only on a uniformly sampled dataset, exclude the spans that are sampled for diversity reasons by adding the -retained_by:diversity_sampling query parameter in the Trace Explorer.
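A small worked example shows how this bias can skew a metric such as the error rate. The span dictionaries and the retention_filter value are hypothetical; only the diversity_sampling value comes from the query above.

```python
def uniform_only(spans):
    """Drop diversity-sampled spans, mirroring the -retained_by:diversity_sampling query."""
    return [s for s in spans if s.get("retained_by") != "diversity_sampling"]


def error_rate(spans):
    """Fraction of spans flagged as errors (0.0 for an empty list)."""
    if not spans:
        return 0.0
    return sum(1 for s in spans if s["error"]) / len(spans)


# Hypothetical data: 100 uniformly retained spans with 2 errors,
# plus 5 diversity-sampled spans, 4 of which are errors.
spans = (
    [{"retained_by": "retention_filter", "error": i < 2} for i in range(100)]
    + [{"retained_by": "diversity_sampling", "error": i < 4} for i in range(5)]
)

print(f"all retained spans: {error_rate(spans):.1%}")                # 5.7%
print(f"uniform subset only: {error_rate(uniform_only(spans)):.1%}")  # 2.0%
```

Five extra spans nearly triple the apparent error rate here, because diversity sampling deliberately favors errors.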
For example, to measure the number of checkout operations grouped by merchant tier in your application, exclude the diversity sampling dataset. This ensures that you perform the analysis on a representative set of data, so the proportions of basic, enterprise, and premium checkouts are realistic.
On the other hand, if you want to measure the number of unique merchants by merchant tier, include the diversity sampling dataset, which might capture additional merchant IDs not caught by custom retention filters.
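The two analyses can be contrasted in a short Python sketch. The merchant_tier and merchant_id tag names, the retention_filter value, and the span dictionaries are hypothetical stand-ins for the facets you would query in the Trace Explorer.

```python
from collections import Counter


def tier_proportions(spans):
    """Share of checkouts per merchant tier, computed on the uniformly sampled subset only."""
    uniform = [s for s in spans if s["retained_by"] != "diversity_sampling"]
    counts = Counter(s["merchant_tier"] for s in uniform)
    total = sum(counts.values())
    return {tier: n / total for tier, n in counts.items()}


def unique_merchants_by_tier(spans):
    """Distinct merchant IDs per tier, using every retained span (diversity sampling included)."""
    merchants = {}
    for s in spans:
        merchants.setdefault(s["merchant_tier"], set()).add(s["merchant_id"])
    return {tier: len(ids) for tier, ids in merchants.items()}


# Hypothetical retained spans for a checkout operation.
spans = [
    {"retained_by": "retention_filter", "merchant_tier": "basic", "merchant_id": "m1"},
    {"retained_by": "retention_filter", "merchant_tier": "basic", "merchant_id": "m2"},
    {"retained_by": "retention_filter", "merchant_tier": "enterprise", "merchant_id": "m3"},
    {"retained_by": "retention_filter", "merchant_tier": "premium", "merchant_id": "m4"},
    {"retained_by": "diversity_sampling", "merchant_tier": "premium", "merchant_id": "m5"},
    {"retained_by": "diversity_sampling", "merchant_tier": "premium", "merchant_id": "m6"},
]

print(tier_proportions(spans))          # {'basic': 0.5, 'enterprise': 0.25, 'premium': 0.25}
print(unique_merchants_by_tier(spans))  # {'basic': 2, 'enterprise': 1, 'premium': 3}
```

Counting premium checkouts with the diversity-sampled spans would overstate that tier's share, while dropping them would hide merchants m5 and m6 from the distinct count.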