This section documents specifics and provides a good base configuration for all major Kubernetes distributions.
These configurations can then be customized to add any Datadog feature.
In an EKS cluster, you can install the Operator using Helm or as an EKS add-on.
The configuration below is meant to work with either setup (Helm or EKS add-on) when the Agent is installed in the same namespace as the Datadog Operator.
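For reference, a minimal DatadogAgent manifest for EKS could look like the following sketch. The placeholder values, the <AGENT_NAMESPACE> placeholder, and the choice to set credentials inline are assumptions; adapt them to your setup (for example, reference a Kubernetes Secret instead).

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
  namespace: <AGENT_NAMESPACE>  # assumed placeholder: use the same namespace as the Datadog Operator
spec:
  global:
    clusterName: <CLUSTER_NAME>
    site: <DATADOG_SITE>
    credentials:
      apiKey: <DATADOG_API_KEY>
      appKey: <DATADOG_APP_KEY>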
AKS requires a specific configuration for the Kubelet integration due to how AKS sets up its SSL certificates. Additionally, the optional Admission Controller feature requires a specific configuration to prevent an error when reconciling the webhook.
Replace <DATADOG_SITE> with your Datadog site (for example, datadoghq.com for US1). Ensure that you use the correct site for your account.
Using spec.nodeName as the Kubelet host keeps TLS verification enabled. However, in some AKS clusters, DNS resolution for spec.nodeName inside Pods does not work. This has been reported on all AKS Windows nodes, as well as on Linux nodes when the cluster is set up in a Virtual Network using custom DNS. In that case, use the first AKS configuration provided: remove any setting for the Kubelet host path (which defaults to status.hostIP) and use tlsVerify: false. This setting is required; do NOT set the Kubelet host path and tlsVerify: false in the same configuration.
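As an illustration only, a Helm values sketch for the spec.nodeName-based Kubelet configuration might look like the following. The hostCAPath value shown is an assumption about where AKS nodes store the Kubelet server certificate; verify it against your nodes.

datadog:
  kubelet:
    # Resolve the Kubelet through the node name so TLS verification stays enabled
    host:
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    # Assumed default location of the Kubelet server certificate on AKS nodes
    hostCAPath: /etc/kubernetes/certs/kubeletserver.crt

If spec.nodeName does not resolve in your cluster, drop the host and hostCAPath settings entirely and set datadog.kubelet.tlsVerify to false instead, as described above.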
GKE can be configured in two different modes of operation:
Standard: You manage the cluster’s underlying infrastructure, giving you node configuration flexibility.
Autopilot: GKE provisions and manages the cluster’s underlying infrastructure, including nodes and node pools, giving you an optimized cluster with a hands-off experience.
Depending on the operation mode of your cluster, the Datadog Agent needs to be configured differently.
Since Agent 7.26, no specific configuration is required for GKE (whether you run Docker or containerd).
Note: When using COS (Container Optimized OS), the eBPF-based OOM Kill and TCP Queue Length checks are supported starting from version 3.0.1 of the Helm chart. To enable these checks, set datadog.systemProbe.enableDefaultKernelHeadersPaths to false.
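For example, in your Helm values:

datadog:
  systemProbe:
    # Disable the default kernel headers paths so the eBPF-based checks work on COS
    enableDefaultKernelHeadersPaths: false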
GKE Autopilot requires some configuration, shown below.
Datadog recommends that you specify resource limits for the Agent container. Autopilot sets a relatively low default limit (50m CPU, 100Mi memory) that may cause the Agent container to be quickly OOMKilled depending on your environment. If applicable, also specify resource limits for the Trace Agent and Process Agent containers. Additionally, you may wish to create a priority class for the Agent to ensure it is scheduled.
Note: Cloud Network Monitoring is supported from version 3.100.0 of the Helm chart and requires GKE version 1.32.1-gke.1729000 or later.
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  clusterName: <CLUSTER_NAME>
  # The site of the Datadog intake to send Agent data to (example: `us3.datadoghq.com`)
  # Default value is `datadoghq.com` (the US1 site)
  # Documentation: https://docs.datadoghq.com/getting_started/site/
  site: <DATADOG_SITE>
agents:
  containers:
    agent:
      # resources for the Agent container
      resources:
        requests:
          cpu: 200m
          memory: 256Mi
    traceAgent:
      # resources for the Trace Agent container
      resources:
        requests:
          cpu: 100m
          memory: 200Mi
    processAgent:
      # resources for the Process Agent container
      resources:
        requests:
          cpu: 100m
          memory: 200Mi
    systemProbe:
      # resources for the System Probe container
      resources:
        requests:
          cpu: 100m
          memory: 400Mi
  priorityClassCreate: true
providers:
  gke:
    autopilot: true
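To enable Cloud Network Monitoring on a supported Autopilot cluster (per the note above), the following sketch assumes the Helm chart's datadog.networkMonitoring.enabled key; confirm the option against the chart version you use.

datadog:
  networkMonitoring:
    # Enable Cloud Network Monitoring (assumes chart >= 3.100.0 and a supported GKE version)
    enabled: true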
Using Spot Pods in GKE Autopilot clusters introduces taints to the corresponding Spot GKE nodes. When using Spot Pods, additional configuration is required to provide the Agent DaemonSet with a matching toleration.
agents:
  # (...)
  # agents.tolerations -- Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
  tolerations:
    - effect: NoSchedule
      key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
Similarly, when using GKE Autopilot compute classes to run workloads with specific hardware requirements, take note of the taints that GKE Autopilot applies to those nodes and add matching tolerations to the Agent DaemonSet. You can match the tolerations set on your corresponding pods. For example, for the Scale-Out compute class, use a toleration like the one sketched below.
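This sketch assumes that Autopilot taints Scale-Out nodes with cloud.google.com/compute-class=Scale-Out:NoSchedule; confirm the exact taint on your nodes before applying it.

agents:
  tolerations:
    - effect: NoSchedule
      # Assumed taint key/value applied by Autopilot to Scale-Out compute class nodes
      key: cloud.google.com/compute-class
      operator: Equal
      value: "Scale-Out"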
OpenShift comes with hardened security by default (SELinux and SecurityContextConstraints, or SCC). As a result, it requires some specific configurations:
Elevated SCC access for the Node Agent and Cluster Agent
Kubelet API certificates may not always be signed by cluster CA
Tolerations are required to schedule the Node Agent on master and infra nodes
Cluster name should be set, as it cannot be retrieved automatically from the cloud provider
(Optional) Set hostNetwork: true in the Node Agent to allow the Agent to make requests to cloud provider metadata services (IMDS)
This core configuration supports OpenShift 3.11 and OpenShift 4, but it works best with OpenShift 4.
Additionally, log collection and APM have slightly different requirements.
The use of Unix Domain Socket (UDS) for APM and DogStatsD can work in OpenShift. However, Datadog does not recommend this, as it requires additional privileged permissions and SCC access to both your Datadog Agent pod and your application pod. Without these, your application pod can fail to deploy. Datadog recommends disabling the UDS option to avoid this, allowing the Admission Controller to inject the appropriate TCP/IP setting or Service setting for APM connectivity.
When using the Datadog Operator in OpenShift, Datadog recommends that you use the Operator Lifecycle Manager to deploy the Datadog Operator from OperatorHub in your OpenShift Cluster web console. Refer to the Operator install steps. The configuration below works with that setup, which creates the ClusterRole and ClusterRoleBinding that grant access to the SCC for the specified ServiceAccount datadog-agent-scc. This DatadogAgent configuration should be deployed in the same namespace as the Datadog Operator.
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
  namespace: openshift-operators # set as the same namespace where the Datadog Operator was deployed
spec:
  features:
    logCollection:
      enabled: true
      containerCollectAll: true
    apm:
      enabled: true
      hostPortConfig:
        enabled: true
      unixDomainSocketConfig:
        enabled: false
    dogstatsd:
      unixDomainSocketConfig:
        enabled: false
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
      appKey: <DATADOG_APP_KEY>
    clusterName: <CLUSTER_NAME>
    kubelet:
      tlsVerify: false
  override:
    clusterAgent:
      serviceAccountName: datadog-agent-scc
    nodeAgent:
      serviceAccountName: datadog-agent-scc
      hostNetwork: true
      securityContext:
        runAsUser: 0
        seLinuxOptions:
          level: s0
          role: system_r
          type: spc_t
          user: system_u
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/infra
          operator: Exists
          effect: NoSchedule
Note: The nodeAgent.securityContext.seLinuxOptions override is necessary for log collection when deploying with the Operator. If log collection is not enabled, you can omit this override.
The configuration below creates custom SCCs for the Agent and Cluster Agent Service Accounts.
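As a sketch only, assuming the Helm chart's podSecurity.securityContextConstraints options for the Node Agent and Cluster Agent, the values could look like the following; verify these keys against the chart version you use.

datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  clusterName: <CLUSTER_NAME>
  kubelet:
    # Kubelet API certificates may not be signed by the cluster CA
    tlsVerify: false
agents:
  podSecurity:
    securityContextConstraints:
      # Create a custom SCC for the Node Agent ServiceAccount (assumed chart option)
      create: true
clusterAgent:
  podSecurity:
    securityContextConstraints:
      # Create a custom SCC for the Cluster Agent ServiceAccount (assumed chart option)
      create: true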
TKG requires some small configuration changes, shown below. For example, setting a toleration is required for the controller to schedule the Node Agent on the master nodes.
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  kubelet:
    # Set tlsVerify to false since the Kubelet certificates are self-signed
    tlsVerify: false
  # Disable the `kube-state-metrics` dependency chart installation.
  kubeStateMetricsEnabled: false
  # Enable the new `kubernetes_state_core` check.
  kubeStateMetricsCore:
    enabled: true
# Add a toleration so that the agent can be scheduled on the control plane nodes.
agents:
  tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule