Ingestion Sampling with OpenTelemetry
Overview
If your applications and services are instrumented with OpenTelemetry libraries, you can:
In the first and second scenarios, APM RED metrics (request and error counts and latency distributions by service, operation, and resource) are computed in the Datadog Exporter. In the third scenario, the Datadog Agent computes these metrics.
Both APM metrics and distributed traces help you monitor application performance. Metrics help you spot increases in latency or error rates for specific resources, while distributed traces let you drill down to individual requests.
Why sampling is useful
The Datadog tracing libraries, the Datadog Agent, the OpenTelemetry SDKs, and the OpenTelemetry Collector all provide sampling capabilities because, for most services, you do not need to ingest 100% of traces to gain visibility into the health of your applications.
Configuring sampling rates before sending traces to Datadog allows you to:
- Ingest the data that is most relevant to your business and your observability goals.
- Reduce network costs by not sending unused trace data to the Datadog platform.
- Control and manage your overall costs.
Reducing your ingestion volume
With OpenTelemetry, you can configure sampling both in the OpenTelemetry SDKs and in the OpenTelemetry Collector:
- Head-based Sampling in the OpenTelemetry SDKs
- Tail-based Sampling in the OpenTelemetry Collector
SDK-level sampling
At the SDK level, you can implement head-based sampling, in which the sampling decision is made at the beginning of the trace. This type of sampling is particularly useful for high-throughput applications where you know you do not need visibility into 100% of the traffic to monitor application health. It can also help control the overhead introduced by OpenTelemetry.
TraceIdRatioBased and ParentBased are the SDK’s built-in samplers that allow you to implement deterministic head-based sampling based on the trace_id at the SDK level.
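The idea behind deterministic, trace_id-based sampling can be sketched in plain Python. This is an illustration only, not the SDK's own implementation: the function names `should_sample` and `parent_based` are hypothetical, and the sketch assumes the decision compares the low 64 bits of the trace ID against a threshold derived from the ratio. The exact bits and comparison may differ between SDKs.

```python
from typing import Optional

# Assumption: only the low 64 bits of the 128-bit trace ID are compared.
TRACE_ID_LOW_BITS = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Return True when the trace ID falls inside the sampled fraction."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # The threshold grows linearly with the ratio over the 64-bit ID space.
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_BITS) < bound


def parent_based(parent_sampled: Optional[bool], trace_id: int, ratio: float) -> bool:
    """Follow the parent's decision when one exists, else decide from the trace ID."""
    if parent_sampled is not None:
        return parent_sampled
    return should_sample(trace_id, ratio)
```

Because the decision depends only on the trace ID and the ratio, every service that applies the same ratio reaches the same decision for a given trace, which is what keeps sampled traces complete.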
With head-based sampling, the APM metrics are computed on the sampled traffic, since only the sampled traffic is sent to the OpenTelemetry Collector or Datadog Agent, which is where the metrics calculation is done.
To get accurate stats, you can upscale the metrics by using formulas and functions in Datadog dashboards and monitors, provided that you know the configured sampling rate in the SDK.
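The upscaling arithmetic amounts to dividing a sampled count by the sampling rate. The following is a minimal sketch with a hypothetical helper name, `upscale`; it assumes a single uniform sampling rate was applied to all of the traffic behind the metric.

```python
def upscale(sampled_count: float, sampling_rate: float) -> float:
    """Estimate the unsampled count from a count measured on sampled traffic."""
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    return sampled_count / sampling_rate


# Example: 250 requests observed with a 25% sampling rate in the SDK.
estimated_requests = upscale(250, 0.25)
```

In this example, `estimated_requests` is 1000.0. The same division applies to other counts, such as error counts; a ratio of two counts measured under the same uniform rate, such as an error rate, needs no correction.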
Use the ingestion volume control guide to read more about the implications of setting up trace sampling on trace analytics monitors and metrics from spans.
Collector-level sampling
At the OpenTelemetry Collector level, you can do tail-based sampling, which allows you to define more advanced rules to retain greater visibility into traces with errors or high latency.
The Tail Sampling Processor and Probabilistic Sampling Processor allow you to sample traces based on a set of rules at the collector level.
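To illustrate the kind of rule set a tail-based sampler can apply once a whole trace is available, here is a minimal Python sketch. It is not the Collector's implementation: the `Span` type, the `keep_trace` function, and the thresholds are hypothetical, and the sketch assumes three rules evaluated in order (keep errors, keep slow traces, keep a baseline fraction of the rest).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    trace_id: int
    duration_ms: float
    is_error: bool = False


def keep_trace(spans: List[Span], latency_threshold_ms: float, baseline_ratio: float) -> bool:
    """Decide whether to keep a complete trace, given all of its spans."""
    if not spans:
        return False
    # Rule 1: always keep traces that contain an error.
    if any(span.is_error for span in spans):
        return True
    # Rule 2: always keep traces with at least one slow span.
    if max(span.duration_ms for span in spans) >= latency_threshold_ms:
        return True
    # Rule 3: keep a deterministic fraction of the remaining traces.
    bound = round(baseline_ratio * (1 << 64))
    return (spans[0].trace_id & ((1 << 64) - 1)) < bound
```

Unlike a head-based decision, this one can only be made after the spans of the trace have been collected, which is why the sampler needs to see the entire trace.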
Note: Tail sampling’s main limitation is that all spans for a given trace must be received by the same Collector instance for effective sampling decisions. If a trace is distributed across multiple Collector instances, there’s a risk that some parts of the trace are dropped while other parts of the same trace are sent to Datadog.
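One common way to satisfy this constraint is to route spans by trace ID before they reach the sampling Collectors. The sketch below shows the principle only; `pick_collector` and the host names are hypothetical, and a plain modulo is used for brevity where a real deployment would more likely use consistent hashing to limit reshuffling when instances are added or removed.

```python
from typing import Sequence


def pick_collector(trace_id: int, collectors: Sequence[str]) -> str:
    """Map a trace ID to one Collector so that all of its spans land together."""
    if not collectors:
        raise ValueError("at least one collector is required")
    return collectors[trace_id % len(collectors)]


# Hypothetical Collector instances sitting behind the routing layer.
collectors = ["collector-a:4317", "collector-b:4317", "collector-c:4317"]
target = pick_collector(7, collectors)
```

Every span carrying the same trace ID resolves to the same instance, so the tail sampler on that instance sees the complete trace.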
To ensure that APM metrics are computed based on 100% of the applications’ traffic while using collector-level tail-based sampling, use the Datadog Connector.
See the ingestion volume control guide for information about the implications of setting up trace sampling on trace analytics monitors and metrics from spans.
Sampling with the Datadog Agent
When using Datadog Agent OTLP Ingest, a probabilistic sampler is available starting with Agent version 7.44.0. Configure it using the DD_OTLP_CONFIG_TRACES_PROBABILISTIC_SAMPLER_SAMPLING_PERCENTAGE environment variable, or set the following YAML in your Agent’s configuration file:
otlp_config:
  # ...
  traces:
    probabilistic_sampler:
      sampling_percentage: 50
In the above example, 50% of traces are captured.
Note: Probabilistic sampler properties ensure that only complete traces are ingested, assuming you use the same sampling percentage across all Agents.
The probabilistic sampler ignores spans for which the sampling priority is already set at the SDK level. Additionally, spans not caught by the probabilistic sampler might still be captured by the Datadog Agent’s error and rare samplers, ensuring a higher representation of errors and rare endpoint traces in the ingested dataset.
Monitor ingested volumes from the Datadog UI
You can use the APM Estimated Usage dashboard and the estimated usage metric datadog.estimated_usage.apm.ingested_bytes to get visibility into your ingested volumes for a specific time period. Filter the dashboard to specific environments and services to see which services are responsible for the largest shares of the ingested volume.