Trace Retention and Ingestion

With Tracing without Limits™, you can fully customize both the ingestion of traces into Datadog and the retention of those traces for 15 days.

To track or monitor your usage of Tracing without Limits™, see the Usage Metrics documentation.

Retention Filters

After spans have been ingested by Datadog, some will be kept for 15 days according to the retention filters that have been set on your account. By default, the only retention filter enabled will be the Intelligent Retention Filter, which retains error traces and traces from different latency distributions.

You can also create any number of additional tag-based retention filters for your services.

Note: Admin rights are required to create, modify, or disable retention filters.

In the Datadog app, on the Retention Filters tab, you can see the following information:

Filter Name
The name of each filter used to index spans. By default, the only filter is Datadog Intelligent Retention.
Filter Query
The tag-based query for each filter.
Retention Rate
A percentage from 0 to 100% of how many matching spans will be indexed by Datadog.
Spans Indexed
The number of spans indexed by the filter over the selected time period.
Last Updated
The date and user who last modified the retention filter.
Enabled toggle
Allows filters to be turned on and off.

In addition to the ‘Spans Indexed’ column per retention filter, there is also the metric datadog.estimated_usage.apm.indexed_spans that you can use to track spans indexed by retention filters.

For more information, refer to the Usage Metrics documentation, or see the dashboard available in your account.

Datadog Intelligent Retention Filter

Intelligent retention is always active for your services, and it keeps a proportion of traces to help you monitor the health of your applications. All top level spans are indexed for the traces kept by the intelligent retention filter.

Intelligent Retention retains:

  • A representative selection of errors, ensuring error diversity (for example, response code 400s, 500s).
  • High-latency traces at the different percentiles (p75, p90, p95).
  • Traces associated with every resource that has any traffic, for any time window selection.
  • True maximum duration trace for each time window.

If there are specific tags, facets, or groups of traces that you want to investigate in detail, meaning you want to retain more than what Intelligent Retention retains, then create your own retention filter. For example, you might want to keep more than a representative selection of errors from your production environment. To ensure all production errors are retained and available for search and analytics for 15 days, create a 100 percent retention filter scoped to env:prod and status:error. As discussed below, this may have an impact on your bill.
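As a rough illustration of what a filter scoped to env:prod and status:error selects, the sketch below models a filter as a set of required tags plus a retention rate. The RetentionFilter class, its fields, and the sample spans are hypothetical names invented for this example. They are not part of any Datadog API, and real filters are created in the Datadog app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionFilter:
    """Hypothetical model of a tag-based retention filter (illustration only)."""
    name: str
    tags: dict           # every tag listed here must be present on the span
    rate_percent: float  # 0 to 100

    def matches(self, span_tags: dict) -> bool:
        # A span matches only if it carries all of the filter's tags.
        return all(span_tags.get(key) == value for key, value in self.tags.items())


# The example from the text: keep 100% of production errors.
prod_errors = RetentionFilter("prod-errors", {"env": "prod", "status": "error"}, 100)

# Made-up span tags, used only to exercise the matching logic.
spans = [
    {"env": "prod", "status": "error", "service": "checkout"},
    {"env": "prod", "status": "ok", "service": "checkout"},
    {"env": "staging", "status": "error", "service": "checkout"},
]
matching = [s for s in spans if prod_errors.matches(s)]
print(len(matching))
```

Only the first span carries both tags, so it is the only one the filter would consider for retention.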

Create your own Retention Filter

To customize what spans are indexed and retained for 15 days, you can create, modify, and disable additional filters based on tags, and set a percentage of spans matching each filter to be retained. Any span that is retained will have its corresponding trace saved as well, and when it is viewed, the complete trace will be available. In order to be searched by tag in Search and Analytics, however, the span that directly contains the searched-upon tag must have been indexed by a retention filter.

  1. Name your filter.
  2. Set the tags to filter on. A span is indexed only if it matches all of them.
  3. Select whether this filter will retain any span that matches the criteria, or only top level spans.
  4. Set a percentage of spans matching these tags to be indexed.
  5. Save your new filter.

Note: Selecting “Top-Level Spans for Services Only” means the retention filter retains and indexes only the selected proportion of each service's top level spans. Use this if you want to index only top level spans with matching tags. If “All Spans” is selected, the retention filter retains and indexes the selected proportion of all spans of the distributed trace, irrespective of their hierarchy. This may have an impact on your bill. While you set up a retention filter, the visual indicator in the app shows how many matching spans have been detected over the time period.
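To make the difference between the two scopes concrete, here is a minimal sketch that counts how many spans of one trace would be eligible under each choice and multiplies by the retention rate. The scope labels "top_level" and "all", the span dictionaries, and both function names are assumptions made for this example, and every span is assumed to already match the filter's tags.

```python
def eligible_spans(trace, scope):
    """Return the spans a filter could index under the given scope.

    scope is "top_level" or "all"; both labels are assumptions for this sketch.
    """
    if scope == "top_level":
        return [span for span in trace if span["top_level"]]
    if scope == "all":
        return list(trace)
    raise ValueError(f"unknown scope: {scope!r}")


def expected_indexed(trace, scope, rate_percent):
    # Expected number of indexed spans = eligible spans x retention rate.
    return len(eligible_spans(trace, scope)) * rate_percent / 100


# A made-up distributed trace with two top level spans and two child spans.
trace = [
    {"name": "web.request", "top_level": True},
    {"name": "db.query", "top_level": False},
    {"name": "cache.get", "top_level": False},
    {"name": "auth.request", "top_level": True},
]
print(expected_indexed(trace, "top_level", 50))  # 1.0
print(expected_indexed(trace, "all", 50))        # 2.0
```

At the same retention rate, the "all" scope indexes more spans per trace, which is why it is the choice more likely to affect your bill.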

For example, you can create filters to keep all traces for:

  • Credit card transactions over $100.
  • High-priority customers using a mission-critical feature of your SaaS solution.
  • Specific versions of an online delivery service application.

Ingestion Controls

Ingestion Controls affect what traces are sent by your applications to Datadog. Stats and metrics are always calculated based on all traces, and are not impacted by ingestion controls.

Many instrumented services send 100% of their traces to Datadog by default. At volumes of up to 50 traces per second, the Datadog Agent does not drop or sample any spans by default. High-volume services, or services that experience intermittent traffic, are less likely to send 100% of their spans by default. This 50-traces-per-second default ingestion is based on Intelligent Retention and keeps diverse traces by default.

For the best experience, set services to send 100% of their traces so that all traces can be used for live search and analytics.

Note: If you are seeing numbers below 100% for Ingestion Rate, ensure you are using Agent 6.19+ or 7.19+ as these versions increased the default rate.
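For a back-of-the-envelope feel for the 50-traces-per-second default, the sketch below estimates the share of traces that fits under that volume. This is a deliberate simplification and an assumption of this example: it treats the default as a flat cap and ignores intermittent traffic, payload size, and how the Agent actually selects traces.

```python
DEFAULT_TRACES_PER_SECOND = 50  # default volume cited in the text


def approx_ingestion_rate(generated_tps, limit_tps=DEFAULT_TRACES_PER_SECOND):
    """Rough fraction of traces ingested if only a flat per-second cap applied."""
    if generated_tps <= 0:
        return 1.0
    return min(1.0, limit_tps / generated_tps)


for tps in (20, 50, 200):
    print(tps, f"{approx_ingestion_rate(tps):.0%}")
```

Under this simplified model, a service producing 200 traces per second would see roughly a quarter of its traces ingested, while services at or below 50 traces per second stay at 100%.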

In the Datadog app, on the ‘Ingestion Controls’ tab, you can see the following information:

Root Service
The name of each service instrumented and sending traces to Datadog.
Data Ingested
Amount of data ingested by Datadog over the selected time period.
Ingestion Rate
A percentage from 0 to 100% of how many of the spans that are produced by the service are being ingested by Datadog. Any number lower than 100% means some traces are not being ingested by Datadog from the Datadog Agent, and these traces will be dropped by the Datadog Agent after metrics and stats have been calculated.
Ingestion Breakdown
A detailed breakdown of the destination of every trace generated by the service. See Ingestion Breakdown for more information.
Tracers Configuration
Shows Default unless changed by using the instructions in-app to configure the tracer. See Change the Default Ingestion Rate for more information. If all hosts with this service deployed are configured to send a specific volume of traces, this indicator displays Fully Configured. If only a portion of hosts with this service deployed are configured, the label will instead show Partially Configured.
Dropped Spans
The percentage of incoming spans dropped by the Datadog Agent. If this percentage is higher than 0%, the service can be configured by clicking anywhere on the service row. See Change the Default Ingestion Rate for more information.
Traces Ingested per Second
Average number of traces per second ingested into Datadog for the service over the selected time period.
Spans Ingested
Number of spans ingested by Datadog over the selected time period.

In addition to the Data Ingested column for each service, there are also two metrics: datadog.estimated_usage.apm.ingested_spans and datadog.estimated_usage.apm.ingested_bytes. These metrics are tagged by service and env, and top lists are available within the Trace Analytics Dashboard to show where the highest ingestion volumes are occurring. See the Usage Metrics documentation for more information.
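Because these metrics are tagged by service and env, they can be scoped and grouped like any other metric. The sketch below only assembles query strings in the general Datadog metric query form (`aggregator:metric{scope} by {tags}`). The usage_query helper is a hypothetical convenience for this example, and the env:prod scope is an assumed tag value.

```python
def usage_query(metric, scope="*", group_by=()):
    """Build a metric query string such as 'sum:metric{scope} by {tag}'."""
    query = f"sum:{metric}{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query


INGESTED_SPANS = "datadog.estimated_usage.apm.ingested_spans"
INGESTED_BYTES = "datadog.estimated_usage.apm.ingested_bytes"

# One query per metric, scoped to an assumed env:prod tag and split by service.
for metric in (INGESTED_SPANS, INGESTED_BYTES):
    print(usage_query(metric, scope="env:prod", group_by=("service",)))
```

Strings like these could be pasted into a dashboard widget or monitor to see which services contribute the most ingested volume.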

Change the Default Ingestion Rate

To specify that a specific percentage of a service’s traffic should be sent, add a generated code snippet to your tracer configuration for that service.

  1. Select the service you want to change the ingested span percent for.
  2. Choose the service language.
  3. Choose the desired ingestion percentage.
  4. Apply the appropriate configuration generated from these choices to the indicated service and redeploy.
  5. Confirm on the Data Ingestion page that your new percentage has been applied.

To ingest 100% of your traces for live search and analytics, and to have the most control with retention filters, Datadog recommends configuring all services to send 100% of their traces.

To configure 100% ingestion on every service instrumented with a Datadog tracing library, set the following environment variable in the tracer configuration:

    DD_TRACE_SAMPLE_RATE=1.0


Note: This may impact your bill if your total ingestion exceeds the included GBs. For more information, see the APM Billing page.
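One way to apply DD_TRACE_SAMPLE_RATE=1.0 when a service is launched from a Python script is to build the child process environment explicitly. This is a minimal sketch: the tracer_env helper is a hypothetical name, and the 0.0 to 1.0 range check is an assumption of this example rather than something stated on this page.

```python
import os


def tracer_env(sample_rate, base_env=None):
    """Return a copy of the environment with DD_TRACE_SAMPLE_RATE set."""
    # Assumed valid range for the sample rate: 0.0 (nothing) to 1.0 (everything).
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    env = dict(os.environ if base_env is None else base_env)
    env["DD_TRACE_SAMPLE_RATE"] = str(sample_rate)
    return env


env = tracer_env(1.0, base_env={})
print(env["DD_TRACE_SAMPLE_RATE"])  # 1.0
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting the instrumented service. In containerized deployments, the same variable would normally be set in the container or orchestrator configuration instead.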

Ingestion Breakdown

The Ingestion Breakdown column breaks down the destination of all traces originating from the service. It can help you understand lower than expected ingestion rates and missing traces.

The breakdown is composed of the following parts:

  • Complete traces ingested (green): The percentage of traces that have been ingested by Datadog.

  • Complete traces not retained (gray): The percentage of traces that have intentionally not been forwarded to Datadog by the agent or the tracer. This can happen for one of two reasons depending on your configuration:

    1. You use the default behavior, in which the Agent and the tracers intelligently set the service ingestion rate. See Change the Default Ingestion Rate to configure this behavior.
    2. You changed the default ingestion rate to less than 100%.
  • Complete traces dropped by the tracer rate limiter (orange): When you choose to configure the ingestion rate of a service, you explicitly define the ingestion rate that your service should have. However, as a protection mechanism, a rate limiter set to 100 traces per second by default is automatically enabled. To configure this rate limiter, open a support ticket so we can guide you through the process.

  • Traces dropped due to the Agent CPU limit (red): The Agent has a configuration option that limits its CPU usage. After this limit is reached, the Agent stops accepting traces from the tracers. Change the Agent configuration to adjust how much CPU is allocated to the Agent.
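The four parts of the breakdown are shares of the same total, so they can be derived from per-category trace counts. The sketch below shows that arithmetic. The category keys and the counts are illustrative values invented for this example, not data from a real service.

```python
def ingestion_breakdown(counts):
    """Convert per-category trace counts into percentages of all traces."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Illustrative counts only; keys are hypothetical labels for the four colors.
counts = {
    "ingested": 700,          # green
    "not_retained": 200,      # gray
    "rate_limited": 80,       # orange
    "agent_cpu_dropped": 20,  # red
}
breakdown = ingestion_breakdown(counts)
print(breakdown)
```

With these sample counts, 70% of traces are ingested, and the remaining 30% is split across the three reasons a trace can fail to reach Datadog.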

Traces dropped before ingestion

You won’t get 100% trace ingestion if you have not set the environment variable configuration DD_TRACE_SAMPLE_RATE=1.0 for Tracing without Limits, and:

  • your applications generate above 50 traces per second;
  • your applications send intermittent traffic loads; or
  • your applications' traces are large or otherwise have complicated payloads.

In this case, some traces will be dropped by the Datadog Agent after stats are computed, so that metrics calculated will be based on 100% of your traces.

If you are seeing ingestion rates below 100% within Datadog and would like to send all your traces, enable Tracing without Limits by setting the environment variable as described above. If you have questions, contact our support team.

App Analytics to Tracing Without Limits

Before October 20, 2020, Datadog offered App Analytics to index spans for performing analytics. While this is no longer the recommended setup configuration and is not needed to use Trace Search and Analytics, the legacy instructions are available within the App Analytics setup page.

All existing App Analytics filters have been automatically transitioned to Retention Filters. You can continue to use the unchanged filters or modify them as needed. Transitioned filters are marked with an "i" icon that identifies them as Legacy App Analytics Filters.

Note: Existing App Analytics filters can be edited within Datadog, but only by editing the transitioned retention filters. Legacy filters are read only on the settings page in-app.