BYOC Logs Production Best Practices
This product is not supported for your selected
Datadog site. (
).
Overview
This page outlines operational best practices for running BYOC (Bring Your Own Cloud) Logs in production to help reduce common deployment issues.
Enable monitoring from the start
Set up BYOC Logs monitoring before sending production traffic. Without monitoring, diagnosing ingestion or search issues is difficult.
- Deploy the Datadog Agent (or standalone DogStatsD) in the same cluster as BYOC Logs.
- Verify that BYOC Logs metrics appear in the out-of-the-box dashboard.
- Confirm that key metrics are reporting:
indexed_events.count, search_requests.count, disk.available_space.gauge.
See Monitor BYOC Logs for detailed setup instructions.
Use the podSize Helm parameter
Always use the indexer.podSize and searcher.podSize parameters in the Helm chart instead of manually setting CPU and memory resource limits.
The podSize parameter automatically configures component-specific settings (cache sizes, queue limits, concurrent search settings) that are tuned for each tier. Manually setting resources without these tuned values can lead to degraded performance.
indexer:
podSize: xlarge # 4 vCPUs, 16 GB RAM + tuned settings
searcher:
podSize: 2xlarge # 8 vCPUs, 32 GB RAM + tuned settings
See Helm chart sizing tiers for the full list of available sizes and their configurations.
Indexers use a write-ahead log (WAL) to temporarily buffer data before uploading splits to object storage. Configure persistent volumes to prevent data loss if an indexer pod restarts.
Recommended configuration:
- Size: At least 250 GB per indexer pod
- Storage class: Network-attached block storage (for example,
gp3 on AWS, Persistent Disk on GCP, Managed Disks on Azure)
Note: Local SSDs are not recommended because the WAL is not replicated. Ephemeral disks can result in data loss if the disk fails. Use network-attached storage for built-in redundancy, and always enable persistent volumes for production deployments.
Example Helm values:
indexer:
persistentVolume:
enabled: true
storage: 250Gi
storageClass: gp3
Monitor disk capacity on indexers
Indexer disk usage grows as the WAL accumulates data and during merge operations. If disk space runs out, indexers stop ingesting logs.
- Set up an alert on
disk.available_space.gauge that notifies you when available space drops below 20% of total capacity. - Monitor
pending_merge_ops.gauge. A growing backlog of pending merges can indicate indexers are falling behind.
Scale searchers based on your query patterns
Searcher sizing depends more on query patterns than ingestion volume:
- Term searches (for example,
status:error AND service:web): Lower resource requirements - Aggregation queries (timeseries, top lists, percentiles): Higher CPU and memory requirements, especially for multi-day time ranges
If you observe search timeouts or slow dashboard loads, adjust capacity:
- Increase the number of searcher pods.
- Use a larger
searcher.podSize for more memory per pod (especially for aggregation-heavy workloads).
Use Lambda search offloading on AWS
On AWS, BYOC Logs can offload leaf search operations to AWS Lambda. Instead of provisioning searcher pods for peak query load, Lambda handles overflow automatically.
This is useful when:
- Your query load has significant peaks (for example, during incidents or business hours)
- You want to avoid over-provisioning searchers for occasional heavy aggregation queries
- You want to search over large time ranges without adding permanent searcher capacity
With Lambda offloading enabled, you can run fewer searcher pods sized for your baseline query load, and let Lambda absorb spikes. See Lambda Search Offloading for setup instructions.
Keep your Helm chart version up to date
BYOC Logs improvements and bug fixes are delivered through Helm chart updates.
Refresh the Datadog repository and upgrade to the latest chart version with your existing values file:
helm repo update datadog
helm upgrade <RELEASE_NAME> datadog/cloudprem \
--namespace datadog-byoc-logs \
--values values.yaml
To list the available chart versions before upgrading:
helm search repo datadog/cloudprem --versions
Further reading
Additional helpful documentation, links, and articles: