
Overview

Sometimes problems can occur even when you try to ensure your Observability Pipelines deployments are sized for the expected load. For example, an application might start generating more data than usual, or the downstream service you are sending data to might start responding more slowly than expected. To address these issues:

  • Observability Pipelines propagates backpressure, which signals that the system cannot process events immediately upon receiving them.
  • Observability Pipelines components also have in-memory buffering in case the next component is busy processing incoming data.

Backpressure

Backpressure is a signal that events cannot be processed as soon as they are received, and that signal is propagated up through the pipeline. For example, if a processor is sending events to a destination, and the destination cannot process them as fast as it receives them, backpressure is propagated from the destination to the processor and all the way back to the source.

Backpressure determines whether the system should slow down the consumption or acceptance of events because it is too busy to handle more work. In some cases, though, the system should not immediately propagate backpressure, because doing so would constantly slow down upstream components and could cause issues outside of Observability Pipelines. For example, the system avoids slowing down upstream processes when a component only barely exceeds its saturation threshold. It can also absorb temporary slowdowns and outages in the external services that receive data from destinations.
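To illustrate the concept (this is a conceptual model only, not the Worker's actual implementation), the following Python sketch shows how a bounded buffer propagates backpressure: when the downstream consumer slows, the buffer fills, put() blocks, and the upstream producer is forced to slow down as well.

```python
import queue
import threading
import time

# A small bounded buffer between a "source" and a "destination".
# Conceptual model only; the sizes and timings are arbitrary.
buffer = queue.Queue(maxsize=5)

def source():
    for i in range(20):
        # put() blocks while the buffer is full: that blocking is the
        # backpressure signal propagating back up to the source.
        buffer.put(f"event-{i}")
        print(f"source ingested event-{i}")

def destination():
    while True:
        event = buffer.get()
        time.sleep(0.5)  # simulate a slow downstream service
        print(f"destination sent {event}")
        buffer.task_done()

threading.Thread(target=destination, daemon=True).start()
source()
buffer.join()  # wait until every buffered event has been drained
```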

Component buffers

All components in Observability Pipelines have a small in-memory buffer between them to ensure a smooth handoff of events as they traverse your pipeline. These buffers are not intended for large-scale buffering. Sources and processors have a buffer capacity of 100 events.

By default, destinations have an in-memory buffer that can store 500 events. Destinations in particular are susceptible to intermittent latency and outages, because they send events over a network to an external service. The size of destination buffers is configurable, allowing you to set it based on your pipeline’s throughput. As long as there is still space in the buffer, your source keeps ingesting events and does not propagate backpressure.

Destination buffer behavior

If a destination becomes unavailable, events start to fill the destination buffer. The destination retries indefinitely to ensure the pipeline flows again as soon as the destination becomes available. If the buffer fills up during this time, it blocks new events from being processed upstream. This eventually results in backpressure propagation, which stops your source from ingesting any new events.
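As a rough illustration, with the default 500-event destination buffer, a pipeline ingesting about 1,000 events per second fills the buffer after roughly half a second of downstream unavailability; beyond that point, backpressure propagates upstream and the source stops ingesting new events until the destination recovers and the buffer drains.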

Which buffer type to use for a destination

There are two types of buffers you can use for your destination:

  • Memory buffers prioritize throughput over durability; they can handle significant bandwidth, but their contents do not persist across Worker restarts.
  • Disk buffers prioritize durability over throughput. Disk buffers write to the page cache first, then flush to disk if the data is not immediately transmitted by the destination. Disk buffers wait at most 500 ms before calling fsync and flushing a data file to disk. A disk buffer flushes more frequently if a data file reaches its maximum size of 128 MB before 500 ms have elapsed since the last flush.
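To make the flush policy concrete, here is a minimal Python sketch of the two triggers described above (illustrative only, not the Worker's actual code): a flush happens once 500 ms have elapsed since the last flush, or sooner if the current data file reaches its 128 MB maximum.

```python
import time

FLUSH_INTERVAL_SECONDS = 0.5             # flush at most 500 ms after the last flush
MAX_DATA_FILE_BYTES = 128 * 1024 * 1024  # 128 MB data file limit

class DiskBufferSketch:
    """Conceptual model of the disk buffer's flush triggers."""

    def __init__(self):
        self.pending_bytes = 0              # written to the page cache, not yet fsynced
        self.last_flush = time.monotonic()

    def write(self, record: bytes) -> None:
        self.pending_bytes += len(record)
        if self._should_flush():
            self._flush()

    def _should_flush(self) -> bool:
        interval_elapsed = time.monotonic() - self.last_flush >= FLUSH_INTERVAL_SECONDS
        data_file_full = self.pending_bytes >= MAX_DATA_FILE_BYTES
        return interval_elapsed or data_file_full

    def _flush(self) -> None:
        # In the real buffer, this is where the data file is fsynced to disk.
        self.pending_bytes = 0
        self.last_flush = time.monotonic()
```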

Both types of buffering help to prevent backpressure from propagating back to your source and application when a destination is temporarily unavailable. Specific reasons you might choose one over the other:

  • Memory buffers
    • You plan on sending a high bandwidth of data through your Worker, which a disk buffer might not be able to keep up with.
    • You are okay with potential data loss.
  • Disk buffers
    • The bandwidth of data you plan on sending through your pipeline is unlikely to get bottlenecked by I/O as the buffer flushes to disk.
    • You need to minimize any potential data loss which might occur if the Worker unexpectedly shuts down.

The following comparison shows the differences between the memory buffer and the disk buffer:

  • Default size
    • Memory buffer: 500 events
    • Disk buffer: Configurable (minimum buffer size: 256 MB, maximum buffer size: 500 GB)
  • Performance
    • Memory buffer: Higher
    • Disk buffer: Lower
  • Durability through an unexpected Worker restart or crash
    • Memory buffer: None
    • Disk buffer: Events are flushed to disk at least once every 500 ms
  • Data loss due to an unexpected restart or crash
    • Memory buffer: All buffered data is lost
    • Disk buffer: All buffered data is retained
  • Data loss on graceful shutdown
    • Memory buffer: All buffered data is lost
    • Disk buffer: None (all data is flushed to disk before exit)

Kubernetes persistent volumes

If you enable disk buffering for destinations, you must enable Kubernetes persistent volumes in the Observability Pipelines Helm chart. With disk buffering enabled, events are first sent to the buffer, written to the persistent volumes, and then sent downstream.
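For example, after deploying the Worker you can verify that the persistent volumes and claims backing the disk buffers were provisioned with generic checks such as kubectl get pvc and kubectl get pv in the Worker's namespace (exact resource names depend on your Helm configuration).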

Buffer metrics

Use these metrics to analyze buffer performance. All metrics are emitted on a one-second interval, unless otherwise stated. Note: Counter metrics, such as pipelines.buffer_received_events_total, represent the count per second and not the cumulative total, even though total is in the metric name.

Tags for metrics

  • Use the component_id tag to filter or group by individual components.
  • Use the component_type tag to filter or group by sources, processors, or destinations. Note: For processors, use component_type:transform.
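For example, an illustrative query such as avg:pipelines.buffer_size_events{*} by {component_id} graphs each destination buffer's depth separately, and sum:pipelines.buffer_discarded_events_total{intentional:false} by {component_id} surfaces events dropped due to errors.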

Destination buffer metrics

These metrics are specific to destination buffers, located upstream of a destination. Each destination emits its own respective buffer metrics.

pipelines.buffer_size_events
Description: Number of events in a destination’s buffer.
Metric type: gauge
pipelines.buffer_size_bytes
Description: Number of bytes in a destination’s buffer.
Metric type: gauge
pipelines.buffer_received_events_total
Description: Events received by a destination’s buffer.
Metric type: counter
pipelines.buffer_received_bytes_total
Description: Bytes received by a destination’s buffer.
Metric type: counter
pipelines.buffer_sent_events_total
Description: Events sent downstream by a destination’s buffer.
Metric type: counter
pipelines.buffer_sent_bytes_total
Description: Bytes sent downstream by a destination’s buffer.
Metric type: counter
pipelines.buffer_discarded_events_total
Description: Events discarded by the buffer.
Metric type: counter
Additional tags: intentional:true means an incoming event was dropped because the buffer was configured to drop the newest logs when it’s full. intentional:false means the event was dropped due to an error.
pipelines.buffer_discarded_bytes_total
Description: Bytes discarded by the buffer.
Metric type: counter
Additional tags: intentional:true means an incoming event was dropped because the buffer was configured to drop the newest logs when it’s full. intentional:false means the event was dropped due to an error.
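For example, a destination buffer whose pipelines.buffer_size_events keeps climbing while pipelines.buffer_sent_events_total stays flat usually indicates that the downstream service is not keeping up, and that backpressure may soon reach your source.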

Source buffer metrics

These metrics are specific to source buffers, located downstream of a source. Each source emits its own respective buffer metrics. Note: Source buffers are not configurable, but these metrics can help monitor backpressure as it propagates to your pipeline’s source.

pipelines.source_buffer_utilization
Description: Event count in a source’s buffer.
Metric type: histogram
pipelines.source_buffer_utilization_level
Description: Number of events in a source’s buffer.
Metric type: gauge
pipelines.source_buffer_utilization_mean
Description: The exponentially weighted moving average (EWMA) of the number of events in the source’s buffer.
Metric type: gauge
pipelines.source_buffer_max_size_events
Description: A source buffer’s maximum event capacity.
Metric type: gauge
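For example, if pipelines.source_buffer_utilization_level approaches pipelines.source_buffer_max_size_events, backpressure has propagated all the way to the source, which then slows or pauses ingestion until downstream components catch up.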

Processor buffer metrics

These metrics are specific to processor buffers, located upstream of a processor. Each processor emits its own respective buffer metrics. Note: Processor buffers are not configurable, but these metrics can help monitor backpressure as it propagates through your pipeline’s processors.

pipelines.transform_buffer_utilization
Description: Event count in a processor’s buffer.
Metric type: histogram
pipelines.transform_buffer_utilization_level
Description: Event count in a processor’s buffer.
Metric type: gauge
pipelines.transform_buffer_utilization_mean
Description: The exponentially weighted moving average (EWMA) of the number of events in a processor’s buffer.
Metric type: gauge
pipelines.transform_buffer_max_size_events
Description: A processor buffer’s maximum event capacity.
Metric type: gauge
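For example, comparing pipelines.transform_buffer_utilization_level across processors with the component_id tag can show which processor in a chain is the bottleneck.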

Deprecated buffer metrics

These metrics are still emitted by the Observability Pipelines Worker for backwards compatibility. Datadog recommends using the replacements when possible.

pipelines.buffer_events
Description: Number of events in a destination’s buffer. Use pipelines.buffer_size_events instead.
Metric type: gauge
pipelines.buffer_byte_size
Description: Number of bytes in a destination’s buffer. Use pipelines.buffer_size_bytes instead.
Metric type: gauge

Further reading