---
title: BYOC Logs Production Best Practices
description: Recommendations for running BYOC Logs reliably in production.
breadcrumbs: Docs > BYOC Logs > Operate BYOC Logs > BYOC Logs Production Best Practices
---

# BYOC Logs Production Best Practices

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com, us2.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md). ({% placeholder "user-datadog-site-name" /%}).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

This page outlines operational best practices for running BYOC (Bring Your Own Cloud) Logs in production to help reduce common deployment issues.

## Enable monitoring from the start{% #enable-monitoring-from-the-start %}

Set up BYOC Logs monitoring before sending production traffic. Without monitoring, diagnosing ingestion or search issues is difficult.

1. Deploy the Datadog Agent (or standalone DogStatsD) in the same cluster as BYOC Logs.
1. Verify that BYOC Logs metrics appear in the [out-of-the-box dashboard](https://app.datadoghq.com/dashboard/lists?q=byoc&p=1).
1. Confirm that key metrics are reporting: `indexed_events.count`, `search_requests.count`, `disk.available_space.gauge`.

See [Monitor BYOC Logs](https://docs.datadoghq.com/byoc-logs/operate/monitoring.md) for detailed setup instructions.

## Use the podSize Helm parameter{% #use-the-podsize-helm-parameter %}

Always use the `indexer.podSize` and `searcher.podSize` parameters in the Helm chart instead of manually setting CPU and memory resource limits.

The `podSize` parameter automatically configures component-specific settings (cache sizes, queue limits, concurrent search settings) that are tuned for each tier. Manually setting resources without these tuned values can lead to degraded performance.

```yaml
indexer:
  podSize: xlarge   # 4 vCPUs, 16 GB RAM + tuned settings
searcher:
  podSize: 2xlarge  # 8 vCPUs, 32 GB RAM + tuned settings
```

See [Helm chart sizing tiers](https://docs.datadoghq.com/byoc-logs/operate/sizing.md#helm-chart-sizing-tiers) for the full list of available sizes and their configurations.

## Configure persistent storage for indexers{% #configure-persistent-storage-for-indexers %}

Indexers use a write-ahead log (WAL) to temporarily buffer data before uploading splits to object storage. Configure persistent volumes to prevent data loss if an indexer pod restarts.

**Recommended configuration**:

- **Size**: At least 250 GB per indexer pod
- **Storage class**: Network-attached block storage (for example, `gp3` on AWS, Persistent Disk on GCP, Managed Disks on Azure)

**Note**: Local SSDs are not recommended because the WAL is not replicated. Ephemeral disks can result in data loss if the disk fails. Use network-attached storage for built-in redundancy, and always enable persistent volumes for production deployments.

Example Helm values:

```yaml
indexer:
  persistentVolume:
    enabled: true
    storage: 250Gi
    storageClass: gp3
```

## Monitor disk capacity on indexers{% #monitor-disk-capacity-on-indexers %}

Indexer disk usage grows as the WAL accumulates data and during merge operations. If disk space runs out, indexers stop ingesting logs.

- Set up an alert on `disk.available_space.gauge` that notifies you when available space drops below 20% of total capacity.
- Monitor `pending_merge_ops.gauge`. A growing backlog of pending merges can indicate indexers are falling behind.

## Scale searchers based on your query patterns{% #scale-searchers-based-on-your-query-patterns %}

Searcher sizing depends more on query patterns than ingestion volume:

- **Term searches** (for example, `status:error AND service:web`): Lower resource requirements
- **Aggregation queries** (timeseries, top lists, percentiles): Higher CPU and memory requirements, especially for multi-day time ranges

If you observe search timeouts or slow dashboard loads, adjust capacity:

1. Increase the number of searcher pods.
1. Use a larger `searcher.podSize` for more memory per pod (especially for aggregation-heavy workloads).

## Use Lambda search offloading on AWS{% #use-lambda-search-offloading-on-aws %}

On AWS, BYOC Logs can offload leaf search operations to AWS Lambda. Instead of provisioning searcher pods for peak query load, Lambda handles overflow automatically.

This is useful when:

- Your query load has significant peaks (for example, during incidents or business hours)
- You want to avoid over-provisioning searchers for occasional heavy aggregation queries
- You want to search over large time ranges without adding permanent searcher capacity

With Lambda offloading enabled, you can run fewer searcher pods sized for your baseline query load, and let Lambda absorb spikes. See [Lambda Search Offloading](https://docs.datadoghq.com/byoc-logs/configure/lambda.md) for setup instructions.

## Keep your Helm chart version up to date{% #keep-your-helm-chart-version-up-to-date %}

BYOC Logs improvements and bug fixes are delivered through Helm chart updates.

Refresh the Datadog repository and upgrade to the latest chart version with your existing values file:

```shell
helm repo update datadog
helm upgrade <RELEASE_NAME> datadog/cloudprem \
  --namespace datadog-byoc-logs \
  --values values.yaml
```

To list the available chart versions before upgrading:

```shell
helm search repo datadog/cloudprem --versions
```

## Further reading{% #further-reading %}

- [Cluster Sizing](https://docs.datadoghq.com/byoc-logs/operate/sizing.md)
- [Monitor BYOC Logs](https://docs.datadoghq.com/byoc-logs/operate/monitoring.md)
- [Troubleshooting](https://docs.datadoghq.com/byoc-logs/operate/troubleshooting.md)
