---
title: Agent Retry and Buffering Logic
description: >-
  Learn about the Datadog Agent's retry strategies and backoff behavior,
  buffering mechanisms and limits, and data drop conditions and loss scenarios.
breadcrumbs: Docs > Agent > Agent Guides > Agent Retry and Buffering Logic
---

# Agent Retry and Buffering Logic

## Overview{% #overview %}

This guide describes the Datadog Agent's behavior when it fails to send HTTP requests to the Metrics, Logs, APM, and Processes intake endpoints. All retry strategies use exponential backoff with randomized jitter. See the [backoff source code](https://github.com/DataDog/datadog-agent/blob/main/pkg/util/backoff/backoff.go) for implementation details.

{% alert level="info" %}
A failed HTTP request in this guide refers to any request that does not result in a `2xx` HTTP response.
{% /alert %}

{% tab title="Metrics" %}
### Metrics retry strategy{% #metrics-retry-strategy %}

The Agent retries failed HTTP requests using an exponential backoff strategy. The Agent uses the following default retry configurations for the metrics intake:

- Base backoff time: 2 seconds
- Maximum backoff time: [64 seconds](https://github.com/DataDog/datadog-agent/blob/main/pkg/util/backoff/backoff.go#L47)
- Maximum backoff time is reached after 6 retries
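
The defaults above imply the schedule sketched below. This is a minimal illustration of capped exponential backoff, not the Agent's implementation: the function names and the jitter range (a random value between half the ceiling and the ceiling) are assumptions made for this example. See the linked backoff source for the actual logic.

```python
import random

BASE_BACKOFF_S = 2   # base backoff time from this guide
MAX_BACKOFF_S = 64   # maximum backoff time from this guide


def backoff_ceiling(retry: int) -> int:
    """Upper bound, in seconds, of the wait before the given retry (1-based)."""
    if retry < 1:
        raise ValueError("retry must be >= 1")
    return min(BASE_BACKOFF_S * 2 ** (retry - 1), MAX_BACKOFF_S)


def backoff_with_jitter(retry: int) -> float:
    """Randomized wait. The [ceiling / 2, ceiling] range is an assumption."""
    ceiling = backoff_ceiling(retry)
    return random.uniform(ceiling / 2, ceiling)


schedule = [backoff_ceiling(n) for n in range(1, 8)]
print(schedule)  # [2, 4, 8, 16, 32, 64, 64]
```

With a 2-second base that doubles on each attempt, the ceiling reaches 64 seconds on the sixth retry and stays there.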

The Agent retries failed requests for the following scenarios:

- Network timeouts
- HTTP `4xx` responses (see note for exceptions)
- HTTP `5xx` responses

{% alert level="info" %}
For `4xx` responses, the Agent does not retry requests with status codes `400`, `403`, or `413`. The Agent retries requests that return a `404` response because they often indicate a configuration or availability issue that could be resolved.
{% /alert %}
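
The retry rules for the metrics intake can be summarized as a small predicate. The sketch below is illustrative only: `metrics_should_retry` is a hypothetical name, and because this guide does not describe `1xx` or `3xx` handling, the sketch treats those as not retried. Network timeouts carry no status code and are not modeled here.

```python
# Status codes this guide lists as not retried for the metrics intake.
METRICS_NON_RETRIABLE = {400, 403, 413}


def metrics_should_retry(status: int) -> bool:
    """Return True if a metrics request with this HTTP status would be retried."""
    if 200 <= status < 300:
        return False  # success, nothing to retry
    if 400 <= status < 500:
        return status not in METRICS_NON_RETRIABLE
    if 500 <= status < 600:
        return True
    # 1xx and 3xx are not covered by this guide; this sketch does not retry them.
    return False


print(metrics_should_retry(404), metrics_should_retry(403))  # True False
```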

### Metrics buffering mechanisms and limits{% #metrics-buffering-mechanisms-and-limits %}

When the Agent fails to send a metric to the Datadog intake, it compresses and stores the metric in an in-memory retry buffer. See [Buffer configurations](#buffer-configurations) for the available settings.

The Agent also supports an optional [on-disk retry buffer](https://docs.datadoghq.com/agent/configuration/network.md#data-buffering). If you enable this setting, the Agent:

1. Fills the in-memory buffer until it is full
1. Evicts older payloads from memory and serializes them to disk
1. Retries payloads in the following order:
   1. In-memory payloads (newest first)
   1. On-disk payloads (newest first)

This prioritization helps ensure that the Agent sends recent and live metrics before it backfills older data.
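
The ordering described above can be modeled with a toy buffer. This is a sketch under simplifying assumptions, not the Agent's forwarder: `RetryBuffer` is a hypothetical class, the in-memory limit is counted in payloads rather than bytes, and a plain list stands in for serialized on-disk files.

```python
from collections import deque


class RetryBuffer:
    """Toy model of an in-memory retry buffer that overflows to disk."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory
        self.memory = deque()  # oldest payload on the left, newest on the right
        self.disk = []         # stands in for serialized on-disk payloads

    def add(self, payload: str) -> None:
        self.memory.append(payload)
        # When memory is over its limit, evict the oldest payloads to disk.
        while len(self.memory) > self.max_in_memory:
            self.disk.append(self.memory.popleft())

    def retry_order(self) -> list:
        # In-memory payloads first (newest first), then on-disk (newest first).
        return list(reversed(self.memory)) + list(reversed(self.disk))


buf = RetryBuffer(max_in_memory=2)
for name in ["p1", "p2", "p3", "p4"]:
    buf.add(name)
print(buf.retry_order())  # ['p4', 'p3', 'p2', 'p1']
```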

#### Buffer configurations{% #buffer-configurations %}

The Datadog Agent has the following default configurations for metric retry buffering:

- On-disk buffer size: 2 GB
- Maximum disk usage ratio: 0.8
- Maximum in-memory buffer size: 15 MB

You can change the maximum in-memory buffer size using the `forwarder_retry_queue_payloads_max_size` [setting](https://docs.datadoghq.com/agent/configuration/network.md#data-buffering).

#### Restart and shutdown behavior{% #restart-and-shutdown-behavior %}

During restart, the Agent:

- Drops in-memory payloads
- Preserves and resends on-disk payloads

During shutdown, the Agent:

- Flushes in-flight requests
- Does not flush payloads in retry queues (both in-memory and on-disk)

{% /tab %}

{% tab title="Logs" %}
### Logs retry strategy{% #logs-retry-strategy %}

The Logs Agent retries failed HTTP requests indefinitely using an exponential backoff strategy. The Agent uses the following default retry configurations for the logs intake:

- Base backoff time: 2 seconds
- Maximum backoff time: 120 seconds

The Agent retries failed log payloads until the logs intake endpoint becomes available.

{% alert level="info" %}
The Logs Agent does not retry requests with status codes `400`, `401`, `403`, or `413`.
{% /alert %}
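
As a rough illustration of the logs defaults, the sketch below combines the retry exceptions with a capped backoff ceiling. The function names are hypothetical, and plain doubling from the base is an assumption; the guide only states that the backoff is exponential with a 120-second maximum.

```python
LOGS_BASE_BACKOFF_S = 2     # base backoff time from this guide
LOGS_MAX_BACKOFF_S = 120    # maximum backoff time from this guide
LOGS_NON_RETRIABLE = {400, 401, 403, 413}


def logs_should_retry(status: int) -> bool:
    """Any non-2xx status is retried unless the guide lists it as an exception."""
    if 200 <= status < 300:
        return False
    return status not in LOGS_NON_RETRIABLE


def logs_backoff_ceiling(retry: int) -> int:
    """Upper bound, in seconds, of the wait before the given retry (1-based)."""
    return min(LOGS_BASE_BACKOFF_S * 2 ** (retry - 1), LOGS_MAX_BACKOFF_S)


ceilings = [logs_backoff_ceiling(n) for n in range(1, 9)]
print(ceilings)  # [2, 4, 8, 16, 32, 64, 120, 120]
```

Because retries are indefinite, the ceiling stays at 120 seconds for every later attempt.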

### Logs buffering mechanisms and limits{% #logs-buffering-mechanisms-and-limits %}

#### Backpressure and consumption{% #backpressure-and-consumption %}

The Logs Agent guarantees log delivery during transmission. When a payload fails to send, the Agent applies backpressure: it stops reading from the log source, then resumes from the last known position when the intake becomes available.
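
The following sketch illustrates the idea of backpressure with a read offset that only advances after a successful send. It is a simplified model, not the Logs Agent's pipeline: `drain`, `flaky_send`, and `healthy_send` are hypothetical names, and lines are sent one at a time for clarity.

```python
from typing import Callable, List


def drain(lines: List[str], offset: int, send: Callable[[str], bool]) -> int:
    """Send lines starting at offset and stop at the first failure.

    Returns the new offset: the index of the first line not yet delivered.
    """
    while offset < len(lines):
        if not send(lines[offset]):
            break  # backpressure: stop reading until the intake recovers
        offset += 1
    return offset


source = ["a", "b", "c", "d"]
delivered = []


def flaky_send(line: str) -> bool:
    # Hypothetical intake that accepts two lines, then becomes unavailable.
    if len(delivered) >= 2:
        return False
    delivered.append(line)
    return True


def healthy_send(line: str) -> bool:
    delivered.append(line)
    return True


offset = drain(source, 0, flaky_send)         # stops with offset == 2
offset = drain(source, offset, healthy_send)  # resumes from the saved offset
print(offset, delivered)  # 4 ['a', 'b', 'c', 'd']
```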

#### Data loss scenarios{% #data-loss-scenarios %}

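Because the Agent stops reading while the intake is unavailable, logs can be lost if the source files are rotated or removed before the Agent resumes:

- **Kubernetes**: Log files may rotate before intake recovery
- **Host-based systems**: Files may be removed by tools such as `logrotate`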

#### Log buffer limits{% #log-buffer-limits %}

- HTTP logs:

  - Buffer limit: not configurable

- TCP logs:

  - Buffer limit: 100 log lines
  - The Agent sends logs line by line

#### Registry and restart behavior{% #registry-and-restart-behavior %}

The Logs Agent maintains a registry that tracks log sources and current read offsets. The Agent flushes the registry to disk every second and reloads it when the Agent restarts. You cannot configure this process.

On restart, the Agent resumes reading from the position recorded in the registry.
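
A registry of this kind can be pictured as a small file of offsets that is rewritten periodically and reloaded at startup. The sketch below is an illustration under assumptions: the JSON layout, the file name, and the function names are hypothetical and do not reflect the Agent's actual registry format. Writing to a temporary file and renaming it is a design choice of this sketch that keeps a crash mid-write from corrupting the saved offsets.

```python
import json
import os
import tempfile


def flush_registry(path: str, offsets: dict) -> None:
    """Write offsets to a temporary file, then rename it over the registry."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, path)


def load_registry(path: str) -> dict:
    """Return saved offsets, or an empty registry if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


registry_path = os.path.join(tempfile.mkdtemp(), "registry.json")
flush_registry(registry_path, {"/var/log/app.log": 1024})

# After a simulated restart, reading resumes from the recorded offset.
restored = load_registry(registry_path)
print(restored["/var/log/app.log"])  # 1024
```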

### Advanced shipping configuration{% #advanced-shipping-configuration %}

#### Dual shipping{% #dual-shipping %}

When you enable [dual shipping](https://docs.datadoghq.com/agent/configuration/dual-shipping.md?tab=helm&site=us):

- The Agent sends logs to the first available endpoint
- The Agent drops payloads for any endpoint that fails
- Log consumption continues as long as at least one endpoint succeeds
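
One way to read the rules above is sketched below. It is a simplified model, not the Agent's sender logic: `ship` and the endpoint names are hypothetical, each endpoint is tried once per payload, and `is_reliable` is not modeled.

```python
from typing import Callable, Dict, List


def ship(payload: str, endpoints: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Try every endpoint once and return the names that accepted the payload.

    A payload that an endpoint rejects is dropped for that endpoint.
    """
    return [name for name, send in endpoints.items() if send(payload)]


endpoints = {
    "primary": lambda payload: False,   # simulated failing endpoint
    "secondary": lambda payload: True,  # simulated healthy endpoint
}
accepted = ship("log line", endpoints)
keep_reading = len(accepted) > 0  # consumption continues if any endpoint succeeded
print(accepted, keep_reading)  # ['secondary'] True
```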

For the Agent logic when `is_reliable` is enabled, see [Logs Dual Shipping](https://docs.datadoghq.com/agent/configuration/dual-shipping.md?tab=helm#environment-variable-configuration-6).
{% /tab %}

{% tab title="APM" %}
### APM retry strategy{% #apm-retry-strategy %}

The Agent retries failed APM requests using an exponential backoff strategy. The Agent uses the following default retry configurations for the APM intake:

- Base backoff time: 2 seconds
- Maximum backoff time: 10 seconds

The Agent retries failed requests for the following scenarios:

- Network connectivity errors
- HTTP `408` responses
- HTTP `5xx` responses

{% alert level="info" %}
You cannot configure the retry behavior and retriable status codes for APM.
{% /alert %}
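
The APM retry conditions are narrower than those for metrics and logs. The predicate below is a sketch with a hypothetical name, `apm_should_retry`, where `None` represents a request that failed without any HTTP response.

```python
from typing import Optional


def apm_should_retry(status: Optional[int]) -> bool:
    """status is None when the request failed without an HTTP response."""
    if status is None:
        return True  # network connectivity error
    return status == 408 or 500 <= status < 600


print(apm_should_retry(408), apm_should_retry(404))  # True False
```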

### APM buffering mechanisms and limits{% #apm-buffering-mechanisms-and-limits %}

#### In-memory queues{% #in-memory-queues %}

The Agent compresses and stores failed APM payloads in memory, dropping them when queues are full.

#### Stats{% #stats %}

- Queue size: configurable using `apm_config.stats_writer.queue_size`
- Default queue size calculation:
  - `int(max(1, max memory / payload size))`
  - Example: `int(max(1, (250 * 1024 * 1024) / 1500000)) = 174` [payloads](https://github.com/DataDog/datadog-agent/blob/7.43.1/pkg/trace/writer/stats.go#L73-L83)
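
The default calculation can be reproduced directly. The constant and function names below are chosen for this sketch; the values are the ones used in the example.

```python
MAX_MEMORY_BYTES = 250 * 1024 * 1024  # max memory from the example
PAYLOAD_SIZE_BYTES = 1_500_000        # payload size from the example


def default_queue_size(max_memory: int, payload_size: int) -> int:
    """Number of payloads that fit in max_memory, with a floor of one."""
    return int(max(1, max_memory / payload_size))


print(default_queue_size(MAX_MEMORY_BYTES, PAYLOAD_SIZE_BYTES))  # 174
```

The `max(1, ...)` floor means the queue always holds at least one payload, even when a single payload is larger than the memory budget.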

### Advanced shipping configuration{% #advanced-shipping-configuration %}

#### Dual shipping{% #dual-shipping %}

When you enable [dual shipping](https://docs.datadoghq.com/agent/configuration/dual-shipping.md?tab=helm&site=us) for the APM intake, each endpoint has an independent sender and queue.
{% /tab %}

{% tab title="Processes" %}
### Processes retry strategy{% #processes-retry-strategy %}

The Agent retries failed Processes requests using an exponential backoff strategy. The Agent uses the same default retry configurations as the metrics intake:

- Base backoff time: 2 seconds
- Maximum backoff time: [64 seconds](https://github.com/DataDog/datadog-agent/blob/main/pkg/util/backoff/backoff.go#L47)
- Maximum backoff time is reached after 6 retries

See **Metrics retry strategy** for complete details on retry scenarios and exceptions.

{% alert level="info" %}
On-disk buffering is not supported for Processes.
{% /alert %}

### Processes buffering mechanisms and limits{% #processes-buffering-mechanisms-and-limits %}

The Process Agent uses the **metrics forwarder** for downstream delivery. Before forwarding check results, the Process Agent stores them in an in-memory queue.

#### Queue mechanism{% #queue-mechanism %}

The in-memory queue buffers data when the intake is unavailable or during transmission delays.

#### Buffer limits{% #buffer-limits %}

- **Queue size**: 256 payloads (`DefaultProcessQueueSize`)
- **Queue memory**: [60 MB](https://github.com/DataDog/datadog-agent/blob/main/pkg/config/setup/process.go#L34-L36) (`DefaultProcessQueueBytes`)

With checks running every 10 seconds, these settings buffer approximately 30 to 40 minutes of process data, depending on the Agent version.

#### Version-specific queue behavior{% #version-specific-queue-behavior %}

**Agent versions 7.38 and earlier:**

- Process and Connections (NPM) payloads share a single queue
- Buffer limits apply to the combined payloads
- Buffers approximately 30 minutes of combined data

**Agent versions 7.39 and later:**

- Process and Connections (NPM) payloads use separate queues
- Each payload type has independent buffer limits
- Default settings buffer approximately 40 minutes of process data
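
The figure of approximately 40 minutes is consistent with a back-of-the-envelope estimate from the default queue size. The sketch below assumes that each check run enqueues exactly one payload and that the 60 MB memory limit is not reached first; neither assumption is stated in this guide.

```python
QUEUE_SIZE_PAYLOADS = 256  # DefaultProcessQueueSize
CHECK_INTERVAL_S = 10      # check interval stated in this guide

# Assumption: each check run enqueues exactly one payload.
buffered_minutes = QUEUE_SIZE_PAYLOADS * CHECK_INTERVAL_S / 60
print(round(buffered_minutes, 1))  # 42.7
```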

{% /tab %}
