
Support for installing the DDOT Collector as a gateway on Kubernetes is in Preview.

This guide assumes you are familiar with deploying the DDOT Collector as a DaemonSet. For more information, see Install the DDOT Collector as a DaemonSet on Kubernetes.

Overview

The OpenTelemetry Collector can be deployed in multiple ways. The DaemonSet pattern is a common deployment where a Collector instance runs on every Kubernetes node alongside the core Datadog Agent.

Architecture diagram of the OpenTelemetry Collector daemonset pattern. A Kubernetes cluster contains three nodes. On each node, an application instrumented with OpenTelemetry sends OTLP data to a local Agent DaemonSet. The Agent DaemonSet then forwards this data directly to the Datadog backend.

The gateway pattern provides an additional deployment option that uses a centralized, standalone Collector service. This gateway layer can perform actions such as tail-based sampling, aggregation, filtering, and routing before exporting the data to one or more backends such as Datadog. It acts as a central point for managing and enforcing observability policies.

Architecture diagram of the OpenTelemetry Collector gateway pattern. Applications send OTLP data to local Agent DaemonSets running on each node. The DaemonSets forward this data to a central load balancer, which distributes it to a separate deployment of gateway Collector pods. These gateway pods then process and send the telemetry data to Datadog.

When you enable the gateway:

  1. A Kubernetes Deployment (<RELEASE_NAME>-datadog-otel-agent-gateway-deployment) manages the standalone gateway Collector pods.
  2. A Kubernetes Service (<RELEASE_NAME>-datadog-otel-agent-gateway) exposes the gateway pods and provides load balancing.
  3. The existing DaemonSet Collector pods are configured by default to send their telemetry data to the gateway service instead of directly to Datadog.
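
For example, you can verify that these objects exist with kubectl, using the same <RELEASE_NAME> placeholder as above:

kubectl get deployment <RELEASE_NAME>-datadog-otel-agent-gateway-deployment
kubectl get service <RELEASE_NAME>-datadog-otel-agent-gateway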

Requirements

Before you begin, ensure you have the following:

  • Datadog Account: A Datadog account, along with your Datadog API key and application key (both are referenced in the values.yaml examples below).
  • Software:
    • A Kubernetes cluster (v1.29+). EKS Fargate and GKE Autopilot are not supported.
    • Helm (v3+).
    • Datadog Helm chart version 3.137.1 or higher.
    • kubectl.

Installation and configuration

This guide uses the Datadog Helm chart to configure the DDOT Collector gateway.
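
If you have not already added the Datadog Helm repository, a standard setup looks like this:

helm repo add datadog https://helm.datadoghq.com
helm repo update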

Deploying the gateway with a DaemonSet

To get started, enable both the gateway and the DaemonSet Collector in your values.yaml file. This is the most common setup.

# values.yaml
targetSystem: "linux"
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  # Enable the Collector in the Agent DaemonSet
  otelCollector:
    enabled: true

# Enable the standalone Gateway Deployment
otelAgentGateway:
  enabled: true
  replicas: 3
  nodeSelector:
    # Example selector to place gateway pods on specific nodes
    gateway: "true"
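
The example nodeSelector above expects nodes labeled gateway=true. As a minimal sketch (the node and release names are placeholders), label the target nodes and then install or upgrade the release:

kubectl label nodes <NODE_NAME> gateway=true
helm upgrade --install <RELEASE_NAME> datadog/datadog -f values.yaml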

In this case, the DaemonSet Collector uses a default config that sends OTLP data to the gateway's Kubernetes service:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  debug:
    verbosity: detailed
  otlphttp:
    endpoint: http://<RELEASE_NAME>-datadog-otel-agent-gateway:4318
    tls:
      insecure: true
processors:
  batch:
    timeout: 10s
connectors:
  datadog/connector:
    traces:
      compute_top_level_by_span_kind: true
      peer_tags_aggregation: true
      compute_stats_by_span_kind: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp, datadog/connector]
    metrics:
      receivers: [otlp, datadog/connector]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]

The gateway Collector uses a default config that listens on the service ports and sends data to Datadog:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  debug:
    verbosity: detailed
  datadog:
    api:
      key: ${env:DD_API_KEY}
processors:
  batch:
    timeout: 10s
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
  • Always configure otelAgentGateway.affinity or otelAgentGateway.nodeSelector to control which nodes the gateway pods are scheduled on.
  • Adjust otelAgentGateway.replicas (default: 1) to scale the number of gateway pods based on your needs.
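
For example, a hypothetical otelAgentGateway.affinity block that restricts gateway pods to nodes labeled node-role: gateway (the label key and value are assumptions) might look like this:

# values.yaml (excerpt)
otelAgentGateway:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role
                operator: In
                values: ["gateway"]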

Deploying a standalone gateway

If you have an existing DaemonSet deployment, you can deploy the gateway independently.

# values.yaml
targetSystem: "linux"
fullnameOverride: "gw-only"
agents:
  enabled: false
clusterAgent:
  enabled: false
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
otelAgentGateway:
  enabled: true
  replicas: 3
  nodeSelector:
    gateway: "true"

After deploying the gateway, you must update the configuration of your existing DaemonSet Collectors to send data to the new gateway service endpoint (for example, http://gw-only-otel-agent-gateway:4318).
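
For example, a minimal exporter snippet for the DaemonSet Collector configuration, assuming the gw-only override from the example above:

exporters:
  otlphttp:
    endpoint: http://gw-only-otel-agent-gateway:4318
    tls:
      insecure: true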

Customizing Collector configurations

You can override the default configurations for both the DaemonSet and gateway Collectors using the datadog.otelCollector.config and otelAgentGateway.config values, respectively.

# values.yaml
targetSystem: "linux"
fullnameOverride: "my-gw"
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  # Enable and configure the DaemonSet Collector
  otelCollector:
    enabled: true
    config: |
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: "localhost:4317"
      exporters:
        otlp:
          endpoint: http://my-gw-otel-agent-gateway:4317
          tls:
            insecure: true
      service:
        pipelines:
          traces:
            receivers: [otlp]
            exporters: [otlp]
          metrics:
            receivers: [otlp]
            exporters: [otlp]
          logs:
            receivers: [otlp]
            exporters: [otlp]

# Enable and configure the gateway Collector
otelAgentGateway:
  enabled: true
  replicas: 3
  nodeSelector:
    gateway: "true"
  ports:
    - containerPort: 4317
      name: "otel-grpc"
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
    exporters:
      datadog:
        api:
          key: ${env:DD_API_KEY}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [datadog]
        metrics:
          receivers: [otlp]
          exporters: [datadog]
        logs:
          receivers: [otlp]
          exporters: [datadog]
If you set fullnameOverride, the gateway's Kubernetes service name becomes <fullnameOverride>-otel-agent-gateway. The ports defined in otelAgentGateway.ports are exposed on this service. Ensure these ports match the OTLP receiver configuration in the gateway and the OTLP exporter configuration in the DaemonSet.
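
To confirm which ports the gateway service exposes, you can inspect it with kubectl (using the my-gw override from the example above):

kubectl get service my-gw-otel-agent-gateway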

Advanced use cases

Tail sampling with the load balancing exporter

A primary use case for the gateway is tail-based sampling. To ensure that all spans for a given trace are processed by the same gateway pod, use the load balancing exporter in your DaemonSet Collectors. This exporter consistently routes spans based on a key, such as traceID.

In the values.yaml below:

  1. The DaemonSet Collector (datadog.otelCollector) is configured with the loadbalancing exporter, which uses the Kubernetes service resolver to discover and route data to the gateway pods.
  2. The gateway Collector (otelAgentGateway) uses the tail_sampling processor to sample traces based on defined policies before exporting them to Datadog.
# values.yaml
targetSystem: "linux"
fullnameOverride: "my-gw"
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY> 
  otelCollector:
    enabled: true
    # RBAC permissions are required for the k8s resolver in the loadbalancing exporter
    rbac:
      create: true
      rules:
        - apiGroups: [""]
          resources: ["endpoints"]
          verbs: ["get", "watch", "list"]
    config: |
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: "localhost:4317"
      exporters:
        loadbalancing:
          routing_key: "traceID"
          protocol:
            otlp:
              tls:
                insecure: true
          resolver:
            k8s:
              service: my-gw-otel-agent-gateway
              ports:
                - 4317
      service:
        pipelines:
          traces:
            receivers: [otlp]
            exporters: [loadbalancing]

otelAgentGateway:
  enabled: true
  replicas: 3
  ports:
    - containerPort: 4317
      name: "otel-grpc"
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
    processors:
      tail_sampling:
        decision_wait: 10s
        policies: <Add your sampling policies here>
    connectors:
      datadog/connector:
    exporters:
      datadog:
        api:
          key: ${env:DD_API_KEY}
    service:
      pipelines:
        traces/sample:
          receivers: [otlp]
          processors: [tail_sampling]
          exporters: [datadog]
        traces:
          receivers: [otlp]
          exporters: [datadog/connector]
        metrics:
          receivers: [datadog/connector]
          exporters: [datadog]
To ensure APM Stats are calculated on 100% of your traces before sampling, the datadog/connector runs in a separate pipeline without the tail_sampling processor. The Connector can run in either the DaemonSet or the gateway layer.
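
The policies field is left as a placeholder in the configuration above. As a hypothetical illustration (the policy names and sampling percentage are assumptions), a policy set that keeps all error traces and 10% of the remaining traffic might look like this:

tail_sampling:
  decision_wait: 10s
  policies:
    - name: keep-errors
      type: status_code
      status_code:
        status_codes: [ERROR]
    - name: sample-10-percent
      type: probabilistic
      probabilistic:
        sampling_percentage: 10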

Using a custom Collector image

To use a custom-built Collector image for your gateway, specify the image repository and tag under agents.image. This follows the same process as the DaemonSet deployment. For more details, see Use Custom OpenTelemetry Components.

# values.yaml
targetSystem: "linux"
agents:
  enabled: false
  image:
    repository: <YOUR REPO>
    tag: <IMAGE TAG>
    doNotCheckTag: true
clusterAgent:
  enabled: false
otelAgentGateway:
  enabled: true
  ports:
    - containerPort: 4317
      name: "otel-grpc"
  config: |
    <YOUR CONFIG>

Deploying a multi-layer gateway

For advanced scenarios, you can deploy multiple gateway layers to create a processing chain. To do this, deploy each layer as a separate Helm release, starting from the final layer and working backward.

  1. Deploy Layer 1 (Final Layer): This layer receives from Layer 2 and exports to Datadog.

    # layer-1-values.yaml
    targetSystem: "linux"
    fullnameOverride: "gw-layer-1"
    agents:
      enabled: false
    clusterAgent:
      enabled: false
    otelAgentGateway:
      enabled: true
      replicas: 3
      nodeSelector:
        gateway: "gw-node-1"
      ports:
        - containerPort: 4317
          hostPort: 4317
          name: "otel-grpc"
      config: |
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: "0.0.0.0:4317"
        exporters:
          datadog:
            api:
              key: <DATADOG_API_KEY>
        service:
          pipelines:
            traces:
              receivers: [otlp]
              exporters: [datadog]
            metrics:
              receivers: [otlp]
              exporters: [datadog]
            logs:
              receivers: [otlp]
              exporters: [datadog]
    
  2. Deploy Layer 2 (Intermediate Layer): This layer receives from the DaemonSet and exports to Layer 1.

    # layer-2-values.yaml
    targetSystem: "linux"
    fullnameOverride: "gw-layer-2"
    agents:
      enabled: false
    clusterAgent:
      enabled: false
    otelAgentGateway:
      enabled: true
      replicas: 3
      nodeSelector:
        gateway: "gw-node-2"
      ports:
        - containerPort: 4317
          hostPort: 4317
          name: "otel-grpc"
      config: |
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: "0.0.0.0:4317"
        exporters:
          otlp:
            endpoint: http://gw-layer-1-otel-agent-gateway:4317
            tls:
              insecure: true
        service:
          pipelines:
            traces:
              receivers: [otlp]
              exporters: [otlp]
            metrics:
              receivers: [otlp]
              exporters: [otlp]
            logs:
              receivers: [otlp]
              exporters: [otlp]
    
  3. Deploy DaemonSet: Configure the DaemonSet to export to Layer 2.

    # daemonset-values.yaml
    targetSystem: "linux"
    datadog:
      apiKey: <DATADOG_API_KEY>
      appKey: <DATADOG_APP_KEY>
      otelCollector:
        enabled: true
        config: |
          receivers:
            otlp:
              protocols:
                grpc:
                  endpoint: "localhost:4317"
          exporters:
            otlp:
              endpoint: http://gw-layer-2-otel-agent-gateway:4317
              tls:
                insecure: true
          service:
            pipelines:
              traces:
                receivers: [otlp]
                exporters: [otlp]
              metrics:
                receivers: [otlp]
                exporters: [otlp]
              logs:
                receivers: [otlp]
                exporters: [otlp]
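
With the three values files in place, the layers can be installed in order as separate Helm releases; a sketch with illustrative release names:

helm install gw-layer-1 datadog/datadog -f layer-1-values.yaml
helm install gw-layer-2 datadog/datadog -f layer-2-values.yaml
helm install datadog-agent datadog/datadog -f daemonset-values.yaml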
    

Known limitations

  • Gateway pods on Fleet Automation: Standalone gateway pods are not yet visible on the Fleet Automation page. Only DaemonSet Collectors are displayed. This is being actively addressed.
  • Startup race condition: When deploying the DaemonSet and gateway in the same release, DaemonSet pods might start before the gateway service is ready, causing initial connection error logs. The OTLP exporter automatically retries, so these logs can be safely ignored. Alternatively, deploy the gateway first and wait for it to become ready before deploying the DaemonSet, as shown in the sketch after this list.
  • infraattributes processor requirement: The infraattributes processor requires a datadog exporter to be defined in the same Collector configuration, even if it is not referenced in any pipeline; the Collector fails to start without it. To resolve this, add a datadog exporter to your configuration.
  • Ignorable core Agent connection logs: Gateway pods might generate warning logs about failing to connect to a core Datadog Agent (for example, grpc: addrConn.createTransport failed to connect). This occurs because the gateway deployment does not include a core Agent in the same pod. These logs are expected and can be safely ignored. This is being actively addressed.
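
For the startup race condition above, a minimal sketch that waits for the gateway rollout to finish before installing the DaemonSet release, using the Deployment name from the overview:

kubectl rollout status deployment/<RELEASE_NAME>-datadog-otel-agent-gateway-deployment --timeout=120s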

Further reading