Ingestion Sampling with OpenTelemetry

Overview

OpenTelemetry SDKs and the OpenTelemetry Collector provide sampling capabilities, as ingesting 100% of traces is often unnecessary to gain visibility into the health of your applications. Configure sampling rates before sending traces to Datadog so that you ingest the data most relevant to your business and observability goals while controlling overall costs.

This document demonstrates two primary methods for sending traces to Datadog with OpenTelemetry: using the OpenTelemetry Collector, and using Datadog Agent OTLP ingestion.

Note: Datadog doesn’t support running the OpenTelemetry Collector and the Datadog Agent on the same host.

Using the OpenTelemetry Collector

With this method, the OpenTelemetry Collector receives traces from OpenTelemetry SDKs and exports them to Datadog using the Datadog Exporter. In this scenario, APM trace metrics are computed by the Datadog Connector:

OpenTelemetry APM Metrics computation using the Collector

Choose this method if you require the advanced processing capabilities of the OpenTelemetry Collector, such as tail-based sampling. To configure the Collector to receive traces, follow the instructions on OpenTelemetry Collector and Datadog Exporter.
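
As a rough illustration of this setup, the following Collector configuration sketch receives OTLP traces, computes APM trace metrics with the Datadog Connector, and exports traces and metrics with the Datadog Exporter. It assumes the API key is provided in a DD_API_KEY environment variable and that the default OTLP ports are used; adjust both to your environment.

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317   # assumed default OTLP gRPC port
          http:
            endpoint: 0.0.0.0:4318   # assumed default OTLP HTTP port

    processors:
      batch:

    connectors:
      # Computes APM trace metrics from the traces it receives.
      datadog/connector:

    exporters:
      datadog:
        api:
          key: ${env:DD_API_KEY}     # assumption: key supplied through the environment

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [datadog/connector, datadog]
        metrics:
          receivers: [datadog/connector]
          processors: [batch]
          exporters: [datadog]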

Using Datadog Agent OTLP ingestion

With this method, the Datadog Agent receives traces directly from OpenTelemetry SDKs using the OTLP protocol. This allows you to send traces to Datadog without running a separate OpenTelemetry Collector service. In this scenario, APM trace metrics are computed by the Agent:

OpenTelemetry APM Metrics computation using the Datadog Agent

Choose this method if you prefer a simpler setup without the need for a separate OpenTelemetry Collector service. To configure the Datadog Agent to receive traces using OTLP, follow the instructions on OTLP Ingestion by the Datadog Agent.
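
As a minimal sketch, OTLP ingestion can be enabled in the Agent's configuration file with a block like the following. The endpoints shown are the conventional OTLP ports and are an assumption here; follow the linked instructions for the values that apply to your Agent version and deployment.

    otlp_config:
      receiver:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317   # OTLP over gRPC
          http:
            endpoint: 0.0.0.0:4318   # OTLP over HTTP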

Reducing ingestion volume

With OpenTelemetry, you can configure sampling in the OpenTelemetry SDKs, in the OpenTelemetry Collector, and in the Datadog Agent:

  • Head-based sampling in the OpenTelemetry SDKs
  • Tail-based sampling in the OpenTelemetry Collector
  • Probabilistic sampling in the Datadog Agent

Head-based sampling

At the SDK level, you can implement head-based sampling, in which the sampling decision is made at the beginning of the trace. This type of sampling is particularly useful for high-throughput applications, where you have a clear understanding of which traces are most important to ingest and want to make sampling decisions early in the tracing process.

Configuring

To configure head-based sampling, use the TraceIdRatioBased or ParentBased samplers provided by the OpenTelemetry SDKs. These allow you to implement deterministic head-based sampling based on the trace_id at the SDK level.
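
Many OpenTelemetry SDKs can also select these samplers through the standard OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables, although support varies by language. The sketch below sets them in a Kubernetes-style container env list; the 25% ratio is an arbitrary example, and the surrounding container spec is assumed to exist.

    env:
      # ParentBased sampler that delegates root decisions to TraceIdRatioBased.
      - name: OTEL_TRACES_SAMPLER
        value: "parentbased_traceidratio"
      # Ratio of root traces to keep (example value: 25%).
      - name: OTEL_TRACES_SAMPLER_ARG
        value: "0.25"

With this combination, child spans follow the decision already made for their parent, so a trace is kept or dropped as a whole.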

Considerations

Head-based sampling affects the computation of APM metrics. Only sampled traces are sent to the OpenTelemetry Collector or Datadog Agent, which perform metrics computation.

To approximate unsampled metrics from sampled metrics, use formulas and functions with the sampling rate configured in the SDK.
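
As a worked illustration of this scaling (the numbers are hypothetical), a count measured on sampled traces can be divided by the SDK sampling rate to estimate the unsampled count:

    estimated_total ≈ sampled_count / sampling_rate

    Example: 2,500 sampled hits at a sampling rate of 0.25
    estimated_total ≈ 2,500 / 0.25 = 10,000 hits

This kind of scaling applies to count-style metrics such as hits and errors. The result is an estimate, and it is less reliable when few traces are sampled.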

Use the ingestion volume control guide to read more about the implications of setting up trace sampling on trace analytics monitors and metrics from spans.

Tail-based sampling

At the OpenTelemetry Collector level, you can implement tail-based sampling, in which the sampling decision is made after the spans of a trace have been received. This allows you to define more advanced rules to maintain visibility over traces with errors or high latency.

Configuring

To configure sampling at the Collector level, use the Tail Sampling Processor to sample traces based on a set of rules, or the Probabilistic Sampling Processor to sample a fixed percentage of traces.
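
For illustration, a Tail Sampling Processor configuration along these lines keeps traces with errors, keeps slow traces, and samples a share of the rest. The policy names, the 10-second wait, the 500 ms threshold, and the 10% rate are all example values, not recommendations.

    processors:
      tail_sampling:
        # How long to wait for a trace's spans before deciding (example value).
        decision_wait: 10s
        policies:
          # Keep every trace that contains a span with an error status.
          - name: keep-errors
            type: status_code
            status_code:
              status_codes: [ERROR]
          # Keep traces that take longer than the threshold.
          - name: keep-slow-traces
            type: latency
            latency:
              threshold_ms: 500
          # Keep a percentage of traces regardless of their content.
          - name: sample-remaining
            type: probabilistic
            probabilistic:
              sampling_percentage: 10

A trace is kept if any one of the policies decides to sample it.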

Considerations

A limitation of tail-based sampling is that all spans for a given trace must be received by the same collector instance for effective sampling decisions. If a trace is distributed across multiple collector instances, and tail-based sampling is used, some parts of that trace may not be sent to Datadog.

To ensure that APM metrics are computed based on 100% of the applications’ traffic while using collector-level tail-based sampling, use the Datadog Connector.

The Datadog Connector is available starting with v0.83.0. Read Switch from Datadog Processor to Datadog Connector for OpenTelemetry APM Metrics if migrating from an older version.
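
The fragment below sketches how the pipelines can be arranged so that the Datadog Connector sees all traces before sampling is applied. It assumes an otlp receiver and a datadog exporter are defined elsewhere in the same file; the pipeline name traces/sampling and the single policy are example choices.

    connectors:
      datadog/connector:

    processors:
      tail_sampling:
        policies:
          - name: keep-errors          # example policy
            type: status_code
            status_code:
              status_codes: [ERROR]

    service:
      pipelines:
        # All traces reach the connector, so metrics reflect 100% of traffic.
        traces:
          receivers: [otlp]
          exporters: [datadog/connector]
        # Sampling happens only after the connector has seen the traces.
        traces/sampling:
          receivers: [datadog/connector]
          processors: [tail_sampling]
          exporters: [datadog]
        metrics:
          receivers: [datadog/connector]
          exporters: [datadog]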

See the ingestion volume control guide for information about the implications of setting up trace sampling on trace analytics monitors and metrics from spans.

Probabilistic sampling

When using Datadog Agent OTLP ingestion, a probabilistic sampler is available starting with Agent v7.44.0.

Configuring

To configure probabilistic sampling, either:

  • Set the DD_OTLP_CONFIG_TRACES_PROBABILISTIC_SAMPLER_SAMPLING_PERCENTAGE environment variable, or

  • Add the following YAML to your Agent’s configuration file:

    otlp_config:
      # ...
      traces:
        probabilistic_sampler:
          sampling_percentage: 50 # In this example, 50% of traces are captured.
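
For the environment variable option, a containerized Agent might be configured as in the Docker Compose sketch below. The service name is hypothetical, the 50% value mirrors the example above, and the other settings an Agent needs for OTLP ingestion are omitted.

    services:
      datadog-agent:                   # hypothetical service name
        image: gcr.io/datadoghq/agent:7
        environment:
          - DD_API_KEY=${DD_API_KEY}   # assumption: key supplied by the host environment
          # Capture 50% of traces with the probabilistic sampler.
          - DD_OTLP_CONFIG_TRACES_PROBABILISTIC_SAMPLER_SAMPLING_PERCENTAGE=50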
    

Considerations

The probabilistic sampler ignores spans for which the sampling priority is set at the SDK level. Spans not captured by the probabilistic sampler may still be captured by the Datadog Agent’s error and rare samplers.

Monitoring ingested volumes in Datadog

Use the APM Estimated Usage dashboard and the datadog.estimated_usage.apm.ingested_bytes metric to get visibility into your ingested volumes over a specific time period. Filter the dashboard to specific environments and services to see which services are responsible for the largest shares of the ingested volume.

If the ingestion volume is higher than expected, consider adjusting your sampling rates.
