Metrics retry strategy
The Agent retries failed HTTP requests using an exponential backoff strategy. The Agent uses the following default retry configurations for the metrics intake:
- Base backoff time: 2 seconds
- Maximum backoff time: 64 seconds
- Maximum backoff time is reached after 6 retries
The Agent retries failed requests for the following scenarios:
- Network timeouts
- HTTP 4xx responses (except the status codes listed below)
- HTTP 5xx responses
For 4xx responses, the Agent does not retry requests with status codes 400, 403, or 413.
Requests that return a 404 response are retried because they often indicate a configuration or availability issue that could be resolved.
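The following is a minimal sketch of the retry rules above, not the Agent's implementation. It assumes the delay doubles from the base on each retry and is capped at the maximum (2 × 2^5 = 64 seconds, which matches the cap being reached on the 6th retry). It ignores any randomization the real forwarder may apply. The function names are hypothetical.

```python
from typing import Optional

# Defaults from the metrics retry strategy described above.
BASE_BACKOFF_S = 2
MAX_BACKOFF_S = 64
# 4xx codes the Agent does not retry; other 4xx codes (including 404) are retried.
NON_RETRYABLE_4XX = {400, 403, 413}


def backoff_seconds(retry: int) -> int:
    """Delay before the given 1-based retry: doubles from the base, capped at the maximum."""
    if retry < 1:
        raise ValueError("retry numbers start at 1")
    return min(BASE_BACKOFF_S * 2 ** (retry - 1), MAX_BACKOFF_S)


def should_retry(status: Optional[int] = None, timed_out: bool = False) -> bool:
    """Return True if a failed metrics request falls into a retried category."""
    if timed_out:
        return True  # network timeouts are retried
    if status is None:
        return False
    if 400 <= status < 500:
        return status not in NON_RETRYABLE_4XX
    return 500 <= status < 600  # 5xx responses are retried


if __name__ == "__main__":
    print([backoff_seconds(n) for n in range(1, 8)])  # [2, 4, 8, 16, 32, 64, 64]
```

A 404 passes the 4xx check because only 400, 403, and 413 are excluded.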
Metrics buffering mechanisms and limits
When the Agent fails to send a metric to the Datadog intake, it compresses and stores the metric in an in-memory retry buffer. See Buffer configurations for the available settings.
The Agent also supports an optional on-disk retry buffer. If you enable this setting, the Agent:
- Fills the in-memory buffer until it is full
- Evicts older payloads from memory and serializes them to disk
- Retries payloads in the following order:
  - In-memory payloads (newest first)
  - On-disk payloads (newest first)
This prioritization helps ensure that the Agent sends recent and live metrics before it backfills older data.
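The eviction and retry order above can be modeled roughly as follows. This is a hypothetical sketch: Python's zlib stands in for whatever compression the Agent uses, a list stands in for the on-disk files, and the real size and disk limits are not modeled.

```python
import zlib
from collections import deque


class RetryBuffer:
    """Hypothetical model of the metrics retry buffer with on-disk spillover enabled."""

    def __init__(self, max_memory_bytes: int) -> None:
        self.max_memory_bytes = max_memory_bytes
        self.memory: deque = deque()  # compressed payloads, oldest on the left
        self.memory_bytes = 0
        self.disk: list = []  # stands in for on-disk files, oldest first

    def add(self, payload: bytes) -> None:
        compressed = zlib.compress(payload)
        self.memory.append(compressed)
        self.memory_bytes += len(compressed)
        # Once memory is over budget, evict the oldest payloads to "disk".
        while self.memory_bytes > self.max_memory_bytes:
            oldest = self.memory.popleft()
            self.memory_bytes -= len(oldest)
            self.disk.append(oldest)

    def retry_order(self) -> list:
        """In-memory payloads newest first, then on-disk payloads newest first."""
        in_memory = [zlib.decompress(p) for p in reversed(self.memory)]
        on_disk = [zlib.decompress(p) for p in reversed(self.disk)]
        return in_memory + on_disk


if __name__ == "__main__":
    size = len(zlib.compress(b"p1"))
    buf = RetryBuffer(max_memory_bytes=2 * size)  # room for two payloads
    for name in (b"p1", b"p2", b"p3", b"p4"):
        buf.add(name)
    print(buf.retry_order())  # [b'p4', b'p3', b'p2', b'p1']
```

The two newest payloads stay in memory and are retried first. Older data is backfilled from disk afterward.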
Buffer configurations
The Datadog Agent has the following default configurations for metric retry buffering:
- On-disk buffer size: 2 GB
- Maximum disk usage ratio: 0.8
- Maximum in-memory buffer size: 15 MB
You can change the maximum in-memory buffer size with the forwarder_retry_queue_payloads_max_size setting.
Restart and shutdown behavior
During restart, the Agent:
- Drops in-memory payloads
- Preserves and resends on-disk payloads
During shutdown, the Agent:
- Flushes in-flight requests
- Does not flush payloads in retry queues (both in-memory and on-disk)
Logs retry strategy
The Logs Agent retries failed HTTP requests indefinitely using an exponential backoff strategy. The Agent uses the following default retry configurations for the logs intake:
- Base backoff time: 2 seconds
- Maximum backoff time: 120 seconds
The Agent retries failed log payloads until the logs intake endpoint becomes available.
The Logs Agent does not retry requests with status codes 400, 401, 403, or 413.
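The following sketch shows how indefinite retries with a 120-second cap might look. It assumes a hypothetical `send` callable that returns an HTTP status code, or None on a network error. Sleeping is injectable so the loop can be exercised without waiting. Status codes other than 2xx and the four non-retryable codes are treated as retryable.

```python
import time
from typing import Callable, Optional

BASE_BACKOFF_S = 2
MAX_BACKOFF_S = 120
# Status codes the Logs Agent does not retry.
NON_RETRYABLE = {400, 401, 403, 413}


def logs_backoff_seconds(retry: int) -> int:
    """Delay before the given 1-based retry, doubling from 2 s up to a 120 s cap."""
    return min(BASE_BACKOFF_S * 2 ** (retry - 1), MAX_BACKOFF_S)


def send_until_accepted(
    send: Callable[[bytes], Optional[int]],
    payload: bytes,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Retry with no attempt limit; stop on success or a non-retryable status."""
    retry = 0
    while True:
        status = send(payload)
        if status is not None and (200 <= status < 300 or status in NON_RETRYABLE):
            return status
        retry += 1
        sleep(logs_backoff_seconds(retry))


if __name__ == "__main__":
    replies = iter([None, 503, 202])  # network error, server error, then accepted
    waits = []
    print(send_until_accepted(lambda _: next(replies), b"log line", sleep=waits.append), waits)
    # prints: 202 [2, 4]
```

Because there is no attempt limit, the cap matters: after the 7th retry every wait is 120 seconds.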
Logs buffering mechanisms and limits
Backpressure and consumption
The Logs Agent guarantees log delivery during transmission. When a payload fails to send, the Agent applies backpressure. It stops reading from the log source, then resumes reading from the last known position once the intake becomes available.
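The following is a minimal sketch of offset-based backpressure. A list of lines stands in for a log file, and the class and `send` callable are hypothetical. The key point is that the offset only advances after the intake accepts a line, so a failure leaves the position where reading should resume.

```python
from typing import Callable, List


class BackpressuredTailer:
    """Hypothetical reader: only advances its offset after the intake accepts a line."""

    def __init__(self, lines: List[str], send: Callable[[str], bool]) -> None:
        self.lines = lines  # stands in for a log file
        self.send = send  # returns True when the intake accepts the line
        self.offset = 0  # last known position: index of the next unsent line

    def pump(self) -> int:
        """Send lines from the current offset; stop at the first failure. Returns lines sent."""
        sent = 0
        while self.offset < len(self.lines):
            if not self.send(self.lines[self.offset]):
                break  # backpressure: stop reading and keep the offset for later
            self.offset += 1
            sent += 1
        return sent


if __name__ == "__main__":
    intake_up = False
    received = []

    def send(line: str) -> bool:
        if intake_up:
            received.append(line)
        return intake_up

    tailer = BackpressuredTailer(["a", "b", "c"], send)
    print(tailer.pump(), tailer.offset)  # 0 0: intake down, nothing consumed
    intake_up = True
    print(tailer.pump(), received)  # 3 ['a', 'b', 'c']
```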
Data loss scenarios
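Because the Agent stops reading during backpressure, logs can be lost if the source files disappear before the intake recovers:
- Kubernetes: Log files may rotate before the intake recovers
- Host-based systems: Files may be removed by tools such as logrotate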
Log buffer limits
HTTP logs:
TCP logs:
- Buffer limit: 100 log lines
- The Agent sends logs line by line
Registry and restart behavior
The Logs Agent maintains a registry that tracks log sources and current read offsets. The Agent flushes the registry to disk every second and reloads it when the Agent restarts. You cannot configure this process.
On restart, the Agent resumes reading from the position recorded in the registry.
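The following sketch illustrates how a registry of read offsets could be persisted and reloaded. The JSON format, file name, and function names are assumptions for illustration, not the Agent's actual registry format. In this model, a caller would invoke flush_registry on a one-second timer to match the flush interval described above. Writing to a temporary file and renaming it keeps a crash mid-write from leaving a truncated registry.

```python
import json
import os
import tempfile
from typing import Dict


def flush_registry(path: str, offsets: Dict[str, int]) -> None:
    """Persist source -> offset atomically (write a temp file, then rename over the old one)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, path)


def load_registry(path: str) -> Dict[str, int]:
    """Reload offsets at startup; a missing registry means no known positions."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "registry.json")  # hypothetical location
        flush_registry(path, {"/var/log/app.log": 4096})
        # After a restart, reading resumes from the recorded offset.
        print(load_registry(path))  # {'/var/log/app.log': 4096}
```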
Advanced shipping configuration
Dual shipping
When you enable dual shipping:
- The Agent sends logs to the first available endpoint
- The Agent drops payloads for any endpoint that fails
- Log consumption continues as long as at least one endpoint succeeds
For the Agent's behavior when is_reliable is enabled, see Logs Dual Shipping.
APM retry strategy
The Agent retries failed APM requests using an exponential backoff strategy. The Agent uses the following default retry configurations for the APM intake:
- Base backoff time: 2 seconds
- Maximum backoff time: 10 seconds
The Agent retries failed requests for the following scenarios:
- Network connectivity errors
- HTTP 408 responses
- HTTP 5xx responses
You cannot configure the retry behavior or the retriable status codes for APM.
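The following is a minimal sketch of the APM retry rules above, assuming the delay doubles from the 2-second base and is capped at 10 seconds. The function names are hypothetical. Unlike the metrics intake, no 4xx code other than 408 is retried.

```python
from typing import Optional

BASE_BACKOFF_S = 2
MAX_BACKOFF_S = 10


def apm_backoff_seconds(retry: int) -> int:
    """Delay before the given 1-based retry: 2, 4, 8, then capped at 10 seconds."""
    return min(BASE_BACKOFF_S * 2 ** (retry - 1), MAX_BACKOFF_S)


def apm_should_retry(status: Optional[int] = None, network_error: bool = False) -> bool:
    """Retry on connectivity errors, HTTP 408, and HTTP 5xx; nothing else."""
    if network_error:
        return True
    if status is None:
        return False
    return status == 408 or 500 <= status < 600


if __name__ == "__main__":
    print([apm_backoff_seconds(n) for n in range(1, 6)])  # [2, 4, 8, 10, 10]
```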
APM buffering mechanisms and limits
In-memory queues
The Agent compresses and stores failed APM payloads in memory, dropping them when queues are full.
Stats
- Configurable using apm_config.stats_writer.queue_size
- Default calculation: int(max(1, max memory / payload size))
- Example, with 250 MiB of maximum memory and a 1,500,000-byte payload size: int(max(1, (250 * 1024 * 1024) / 1500000)) = 174 payloads
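The default calculation translates directly into the following sketch. The function name is hypothetical, and both arguments are assumed to be in bytes. The max(1, ...) guard keeps at least one queue slot even when the memory budget is smaller than a single payload.

```python
def default_stats_queue_size(max_memory_bytes: float, payload_size_bytes: float) -> int:
    """Mirror of the documented default: int(max(1, max memory / payload size))."""
    return int(max(1, max_memory_bytes / payload_size_bytes))


if __name__ == "__main__":
    print(default_stats_queue_size(250 * 1024 * 1024, 1_500_000))  # 174
    print(default_stats_queue_size(1_000, 1_500_000))  # 1 (floor from max(1, ...))
```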
Advanced shipping configuration
Dual shipping
When you enable dual shipping for the APM intake, each endpoint has an independent sender and queue.
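The following hypothetical sketch illustrates independent per-endpoint queues. It assumes that an incoming payload is dropped when a queue is full. The real Agent may choose a different payload to drop. Because each endpoint owns its queue, a backlog at one intake does not cause drops at the other.

```python
from collections import deque
from typing import Deque, List


class EndpointSender:
    """Hypothetical per-endpoint sender with its own bounded in-memory queue."""

    def __init__(self, name: str, max_payloads: int) -> None:
        self.name = name
        self.queue: Deque[bytes] = deque()
        self.max_payloads = max_payloads
        self.dropped = 0

    def enqueue(self, payload: bytes) -> None:
        if len(self.queue) >= self.max_payloads:
            self.dropped += 1  # this endpoint's queue is full: drop the payload
            return
        self.queue.append(payload)


def fan_out(senders: List[EndpointSender], payload: bytes) -> None:
    """Give every endpoint its own copy; a full queue on one endpoint does not affect others."""
    for sender in senders:
        sender.enqueue(payload)


if __name__ == "__main__":
    primary = EndpointSender("primary", max_payloads=10)
    secondary = EndpointSender("secondary", max_payloads=2)  # its intake is down; nothing drains it
    for i in range(3):
        fan_out([primary, secondary], f"payload-{i}".encode())
    print(len(primary.queue), primary.dropped)  # 3 0
    print(len(secondary.queue), secondary.dropped)  # 2 1
```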
Processes retry strategy
The Agent retries failed Processes requests using an exponential backoff strategy. The Agent uses the same default retry configurations as the metrics intake:
- Base backoff time: 2 seconds
- Maximum backoff time: 64 seconds
- Maximum backoff time is reached after 6 retries
See Metrics retry strategy for complete details on retry scenarios and exceptions.
On-disk buffering is not supported for Processes.
Processes buffering mechanisms and limits
The Process Agent uses the metrics forwarder for downstream delivery. Before forwarding check results, the Process Agent stores them in an in-memory queue.
Queue mechanism
The in-memory queue buffers data when the intake is unavailable or during transmission delays.
Buffer limits
- Queue size: 256 payloads (DefaultProcessQueueSize)
- Queue memory: 60 MB (DefaultProcessQueueBytes)
With checks running every 10 seconds, these settings buffer approximately 30 minutes of process data.
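The following is a rough model of a queue bounded by both a payload count and a byte total. It assumes the oldest payloads are evicted until both limits hold, and it reads 60 MB as 60 MiB. Both are assumptions, and the class is hypothetical rather than the Process Agent's code.

```python
from collections import deque
from typing import Deque

DEFAULT_PROCESS_QUEUE_SIZE = 256  # payloads
DEFAULT_PROCESS_QUEUE_BYTES = 60 * 1024 * 1024  # 60 MB, interpreted here as 60 MiB


class ProcessQueue:
    """Hypothetical bounded queue enforcing both a payload-count and a byte limit."""

    def __init__(self, max_items: int = DEFAULT_PROCESS_QUEUE_SIZE,
                 max_bytes: int = DEFAULT_PROCESS_QUEUE_BYTES) -> None:
        self.items: Deque[bytes] = deque()
        self.max_items = max_items
        self.max_bytes = max_bytes
        self.total_bytes = 0

    def add(self, payload: bytes) -> None:
        self.items.append(payload)
        self.total_bytes += len(payload)
        # Assumption: evict the oldest payloads until both limits hold again.
        while len(self.items) > self.max_items or self.total_bytes > self.max_bytes:
            evicted = self.items.popleft()
            self.total_bytes -= len(evicted)


if __name__ == "__main__":
    q = ProcessQueue(max_items=2, max_bytes=1024)
    for payload in (b"check-1", b"check-2", b"check-3"):
        q.add(payload)
    print(list(q.items))  # [b'check-2', b'check-3']
```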
Version-specific queue behavior
Agent versions 7.38 and earlier:
- Process and Connections (NPM) payloads share a single queue
- Buffer limits apply to the combined payloads
- Buffers approximately 30 minutes of combined data
Agent versions 7.39 and later:
- Process and Connections (NPM) payloads use separate queues
- Each payload type has independent buffer limits
- Default settings buffer approximately 40 minutes of process data
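The approximately 40-minute figure for separate queues can be checked with simple arithmetic. The following sketch assumes one payload per 10-second check run. The average payload size is a hypothetical input, and 60 MB is read as 60 MiB. With the 256-payload limit, 256 × 10 s = 2,560 s, or about 43 minutes, as long as the byte limit is not reached first. That holds while payloads average under roughly 240 KB (60 MiB / 256).

```python
def buffered_minutes(queue_size: int, queue_bytes: int,
                     avg_payload_bytes: int, interval_s: float) -> float:
    """Minutes of data held, assuming one payload per check run and uniform payload sizes."""
    # Whichever limit is hit first determines how many payloads fit.
    payloads = min(queue_size, queue_bytes // avg_payload_bytes)
    return payloads * interval_s / 60


if __name__ == "__main__":
    limit = 60 * 1024 * 1024  # 60 MB read as 60 MiB (assumption)
    print(round(buffered_minutes(256, limit, 100_000, 10), 1))  # 42.7: count limit binds
    print(round(buffered_minutes(256, limit, 500_000, 10), 1))  # 20.8: byte limit binds
```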