The OpenTelemetry Kubernetes integration is in Preview. To request access, contact your Datadog account team.

Overview

Collect Kubernetes metrics using the OpenTelemetry Collector to gain comprehensive insights into your cluster’s health and performance. This integration uses a combination of OpenTelemetry receivers to gather data, which populates the Kubernetes - Overview dashboard.

The 'Kubernetes - Overview' dashboard, showing metrics for containers, including status and resource usage of your cluster and its containers.

This integration requires the kube-state-metrics service and uses a two-collector architecture to gather data.

The kube-state-metrics service generates detailed metrics about the state of Kubernetes objects such as deployments, nodes, and pods. The two OpenTelemetry Collectors are:

  • A Cluster Collector, deployed as a Kubernetes Deployment, gathers cluster-wide metrics (for example, the total number of deployments).
  • A Node Collector, deployed as a Kubernetes DaemonSet, runs on each node to collect node-specific metrics (for example, CPU and memory usage per node).

This approach ensures that cluster-level metrics are collected only once, preventing data duplication, while node-level metrics are gathered from every node in the cluster.
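
As a rough illustration of this split, the Helm values for the two Collectors might differ as sketched below. This is not the content of the configuration files used later in this guide. The mode, replicaCount, and presets keys are values of the open-telemetry/opentelemetry-collector chart; which presets the actual files enable is an assumption.

```yaml
# Hypothetical values for the Cluster Collector (one file):
# a single Deployment replica, so cluster-wide metrics are reported once.
mode: deployment
replicaCount: 1
presets:
  clusterMetrics:
    enabled: true   # enables the k8s_cluster receiver
---
# Hypothetical values for the Node Collector (a second file):
# a DaemonSet, so one Collector pod runs on every node.
mode: daemonset
presets:
  kubeletMetrics:
    enabled: true   # enables the kubeletstats receiver
  hostMetrics:
    enabled: true   # enables the hostmetrics receiver
```

Running the cluster-level receiver in a DaemonSet instead would report the same cluster-wide metrics once per node, which is the duplication this architecture avoids.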

Setup

To collect Kubernetes metrics with OpenTelemetry, you need to deploy kube-state-metrics and configure both of the above OpenTelemetry Collectors in your cluster.

Prerequisites

  • Helm: The setup uses Helm to deploy resources. To install Helm, see the official Helm documentation.
  • Collector Image: This guide uses the otel/opentelemetry-collector-contrib:0.130.0 image or newer.

Installation

1. Install kube-state-metrics

Run the following commands to add the prometheus-community Helm repository and install kube-state-metrics:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics

2. Create a Datadog API Key Secret

Create a Kubernetes secret to store your Datadog API key securely.

export DD_API_KEY="<YOUR_DATADOG_API_KEY>"
kubectl create secret generic datadog-secret --from-literal=api-key="$DD_API_KEY"
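
The Collectors need to read this secret at runtime. One way to wire it in is sketched below, assuming the chart's extraEnvs value and the Datadog exporter's api.key setting. The configuration files used in this guide may already contain an equivalent, so treat this as an illustration rather than a required step.

```yaml
# Sketch of Helm values: expose the secret as an environment variable...
extraEnvs:
  - name: DD_API_KEY
    valueFrom:
      secretKeyRef:
        name: datadog-secret   # the secret created above
        key: api-key           # the key passed to --from-literal
# ...and reference that variable from the Datadog exporter.
config:
  exporters:
    datadog:
      api:
        key: ${env:DD_API_KEY}
```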

3. Install the OpenTelemetry Collectors

  1. Add the OpenTelemetry Helm chart repository:

    helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
    helm repo update
    
  2. Download the configuration files for the two Collectors, daemonset-collector.yaml and cluster-collector.yaml.

  3. Set your cluster name as an environment variable and use Helm to deploy both the Cluster and Node Collectors. Make sure the paths to the YAML files are correct.

    # Set your cluster name
    export K8S_CLUSTER_NAME="<YOUR_CLUSTER_NAME>"
    
    # Install the Node Collector (DaemonSet)
    helm install otel-daemon-collector open-telemetry/opentelemetry-collector \
      -f daemonset-collector.yaml \
      --set image.repository=otel/opentelemetry-collector-contrib \
      --set image.tag=0.130.0 \
      --set-string "config.processors.resource.attributes[0].key=k8s.cluster.name" \
      --set-string "config.processors.resource.attributes[0].value=${K8S_CLUSTER_NAME}"
    
    # Install the Cluster Collector (Deployment)
    helm install otel-cluster-collector open-telemetry/opentelemetry-collector \
      -f cluster-collector.yaml \
      --set image.repository=otel/opentelemetry-collector-contrib \
      --set image.tag=0.130.0 \
      --set-string "config.processors.resource.attributes[0].key=k8s.cluster.name" \
      --set-string "config.processors.resource.attributes[0].value=${K8S_CLUSTER_NAME}"
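
The two --set-string flags in each command populate the first entry of the resource processor's attributes list. Expressed as Helm values, the result is roughly the fragment below. The action field is an assumption: the flags do not set it, so it would have to come from the downloaded configuration file.

```yaml
config:
  processors:
    resource:
      attributes:
        - key: k8s.cluster.name
          value: <YOUR_CLUSTER_NAME>   # the value of K8S_CLUSTER_NAME
          action: upsert               # assumed; not set by the flags above
```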
    

Metric metadata configuration

Some metrics require manual metadata updates in Datadog to ensure they are interpreted and displayed correctly.

To edit a metric’s metadata:

  1. Go to Metrics > Summary.
  2. Select the metric you want to edit.
  3. Click Edit in the side panel.
  4. Edit the metadata as needed.
  5. Click Save.

Repeat this process for each of the metrics listed in the following table:

| Metric Name | Metric Type | Unit |
|---|---|---|
| k8s.pod.cpu.usage | Gauge | core |
| k8s.pod.network.io | Gauge | byte_in_binary_bytes_family per second |
| k8s.pod.network.errors | Gauge | byte_in_binary_bytes_family per second |

Correlating traces with infrastructure metrics

To correlate your APM traces with Kubernetes infrastructure metrics, Datadog uses unified service tagging. This requires setting three standard resource attributes on telemetry from both your application and your infrastructure. Datadog automatically maps these OpenTelemetry attributes to the standard Datadog tags (env, service, and version) used for correlation.

The required OpenTelemetry attributes are:

  • service.name
  • service.version
  • deployment.environment.name (formerly deployment.environment)

This ensures that telemetry from your application is consistently tagged, allowing Datadog to link traces, metrics, and logs to the same service.

Application configuration

Set the following environment variables in your application’s container specification to tag outgoing telemetry:

spec:
  containers:
    - name: my-container
      env:
        - name: OTEL_SERVICE_NAME
          value: "<SERVICE_NAME>"
        - name: OTEL_SERVICE_VERSION
          value: "<SERVICE_VERSION>"
        - name: OTEL_ENVIRONMENT
          value: "<ENVIRONMENT>"
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "service.name=$(OTEL_SERVICE_NAME),service.version=$(OTEL_SERVICE_VERSION),deployment.environment.name=$(OTEL_ENVIRONMENT)"

Infrastructure configuration

Add the corresponding annotations to your Kubernetes Deployment metadata. The k8sattributes processor in the Collector uses these annotations to enrich infrastructure metrics with service context.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    # Use resource.opentelemetry.io/ for the k8sattributes processor
    resource.opentelemetry.io/service.name: "<SERVICE_NAME>"
    resource.opentelemetry.io/service.version: "<SERVICE_VERSION>"
    resource.opentelemetry.io/deployment.environment.name: "<ENVIRONMENT>"
spec:
  template:
    metadata:
      annotations:
        resource.opentelemetry.io/service.name: "<SERVICE_NAME>"
        resource.opentelemetry.io/service.version: "<SERVICE_VERSION>"
        resource.opentelemetry.io/deployment.environment.name: "<ENVIRONMENT>"
# ... rest of the manifest
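
For the Collector to act on these annotations, the k8sattributes processor has to be told to extract them. A minimal sketch follows, assuming a Collector version whose k8sattributes processor supports the otel_annotations option. The metadata list is illustrative, and the configuration files used in this guide may already enable an equivalent.

```yaml
processors:
  k8sattributes:
    extract:
      # Turn resource.opentelemetry.io/<key> pod annotations
      # into resource attributes named <key>.
      otel_annotations: true
      # Illustrative set of standard Kubernetes attributes to attach.
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.node.name
```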

Data collected

This integration collects metrics using several OpenTelemetry receivers.
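
As a sketch of how such receivers are typically combined, a Node Collector metrics pipeline might look like the fragment below. The component names are the standard Collector identifiers, but the exact pipeline in this guide's configuration files is an assumption, and each listed component must also be defined in its own section of the configuration.

```yaml
service:
  pipelines:
    metrics:
      # Node-level receivers; a Cluster Collector would list
      # k8s_cluster and prometheus here instead.
      receivers: [kubeletstats, hostmetrics]
      processors: [k8sattributes, resource, batch]
      exporters: [datadog]
```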

kube-state-metrics (using Prometheus receiver)

Metrics scraped from the kube-state-metrics endpoint provide information about the state of Kubernetes API objects.
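
A scrape configuration for this endpoint might look like the following sketch. The service address assumes the Helm release name kube-state-metrics, the default namespace, and the chart's default port 8080; adjust it if your installation differs.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kube-state-metrics
          scrape_interval: 30s
          static_configs:
            # Assumed in-cluster address of the kube-state-metrics service.
            - targets: ["kube-state-metrics.default.svc.cluster.local:8080"]
```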

Kubelet stats receiver

The kubeletstatsreceiver collects metrics from the Kubelet on each node, focusing on pod, container, and volume resource usage.

| OTEL | DATADOG | DESCRIPTION | FILTER |
|---|---|---|---|
| container.filesystem.usage | kubernetes.filesystem.usage | Container filesystem usage | |
| container.memory.rss | kubernetes.memory.rss | Container memory rss | |
| k8s.node.filesystem.usage | kubernetes.filesystem.usage | Node filesystem usage | |
| k8s.node.memory.rss | kubernetes.memory.rss | Node memory rss | |
| k8s.node.memory.usage | kubernetes.memory.usage | Node memory usage | |
| k8s.node.network.errors | kubernetes.network.tx_errors | Node network errors | direction: transmit |
| k8s.node.network.errors | kubernetes.network.rx_errors | Node network errors | direction: receive |
| k8s.node.network.io | kubernetes.io.read_bytes | Node network IO | direction: receive |
| k8s.node.network.io | kubernetes.io.write_bytes | Node network IO | direction: transmit |
| k8s.node.network.io | kubernetes.network.rx_bytes | Node network IO | direction: receive |
| k8s.node.network.io | kubernetes.network.tx_bytes | Node network IO | direction: transmit |
| k8s.pod.filesystem.usage | kubernetes.filesystem.usage | Pod filesystem usage | |
| k8s.pod.memory.node.utilization | kubernetes.memory.usage_pct | Pod memory utilization as a ratio of the node’s capacity | |
| k8s.pod.memory.rss | kubernetes.memory.rss | Pod memory rss | |
| k8s.pod.memory.usage | kubernetes.memory.usage | Pod memory usage | |
| k8s.pod.network.errors | kubernetes.network.tx_errors | Pod network errors | direction: transmit |
| k8s.pod.network.errors | kubernetes.network.rx_errors | Pod network errors | direction: receive |
| k8s.pod.network.io | kubernetes.network.rx_bytes | Pod network IO | direction: receive |
| k8s.pod.network.io | kubernetes.network.tx_bytes | Pod network IO | direction: transmit |
| k8s.pod.network.io | kubernetes.io.write_bytes | Pod network IO | direction: transmit |
| k8s.pod.network.io | kubernetes.io.read_bytes | Pod network IO | direction: receive |
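
A kubeletstats receiver configuration that produces node, pod, container, and volume metrics might look like the sketch below. The K8S_NODE_NAME variable is an assumption (it is commonly injected through the Kubernetes Downward API), and the interval and TLS settings are illustrative rather than taken from this guide's configuration files.

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    # K8S_NODE_NAME is assumed to be set on the Collector pod.
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    # Skips verification of the Kubelet's certificate; illustrative only.
    insecure_skip_verify: true
    metric_groups:
      - node
      - pod
      - container
      - volume
```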

Kubernetes cluster receiver

The k8sclusterreceiver collects cluster-level metrics, such as the status and count of nodes, pods, and other objects.

| OTEL | DATADOG | DESCRIPTION |
|---|---|---|
| k8s.container.cpu_limit | kubernetes.cpu.limits | Maximum resource limit set for the container. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#resourcerequirements-v1-core for details |
| k8s.container.cpu_request | kubernetes.cpu.requests | Resource requested for the container. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#resourcerequirements-v1-core for details |
| k8s.container.memory_request | kubernetes.memory.requests | Resource requested for the container. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#resourcerequirements-v1-core for details |
| k8s.job.active_pods | kubernetes.pods.running | The number of actively running pods for a job |
Host metrics receiver

The hostmetricsreceiver gathers system-level metrics from each node in the cluster.

| OTEL | DATADOG | DESCRIPTION | FILTER | TRANSFORM |
|---|---|---|---|---|
| system.cpu.load_average.15m | system.load.15 | Average CPU Load over 15 minutes. | | |
| system.cpu.load_average.1m | system.load.1 | Average CPU Load over 1 minute. | | |
| system.cpu.load_average.5m | system.load.5 | Average CPU Load over 5 minutes. | | |
| system.cpu.utilization | system.cpu.idle | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state: idle | × 100 |
| system.cpu.utilization | system.cpu.iowait | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state: wait | × 100 |
| system.cpu.utilization | system.cpu.stolen | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state: steal | × 100 |
| system.cpu.utilization | system.cpu.system | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state: system | × 100 |
| system.cpu.utilization | system.cpu.user | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | state: user | × 100 |
| system.filesystem.utilization | system.disk.in_use | Fraction of filesystem bytes used. | | |
| system.memory.usage | system.mem.total | Bytes of memory in use. | | × 1048576 |
| system.memory.usage | system.mem.usable | Bytes of memory in use. | state: free, cached, buffered | × 1048576 |
| system.network.io | system.net.bytes_rcvd | The number of bytes transmitted and received. | direction: receive | |
| system.network.io | system.net.bytes_sent | The number of bytes transmitted and received. | direction: transmit | |
| system.paging.usage | system.swap.free | Swap (unix) or pagefile (windows) usage. | state: free | × 1048576 |
| system.paging.usage | system.swap.used | Swap (unix) or pagefile (windows) usage. | state: used | × 1048576 |
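
A hostmetrics receiver configuration covering the metrics in this table might look like the sketch below. The system.cpu.utilization and system.filesystem.utilization metrics are optional in the receiver and have to be enabled explicitly. The root_path value assumes the node's root filesystem is mounted into the Collector pod at /hostfs, which this guide does not state.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    # Assumes the host filesystem is mounted at /hostfs in the Collector pod.
    root_path: /hostfs
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true   # optional metric, disabled by default
      load: {}
      memory: {}
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true   # optional metric, disabled by default
      network: {}
      paging: {}
```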

See OpenTelemetry Metrics Mapping for more information.
