Admission Controller responds to the creation of new pods within your Kubernetes cluster: at pod creation, the Cluster Agent receives a request from Kubernetes and responds with the details of what changes (if any) to make to the pod.
Therefore, Admission Controller does not mutate existing pods within your cluster. If you recently enabled the Admission Controller or made other environmental changes, delete your existing pod and let Kubernetes recreate it. This ensures that Admission Controller updates your pod.
The Cluster Agent responds to labels and annotations on the created pod, not on the workload (Deployment, DaemonSet, CronJob, etc.) that created that pod. Set these labels and annotations in the workload’s pod template so that they are applied to each pod it creates.
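As a minimal sketch of this distinction, the following Deployment sets the label on the pod template rather than on the Deployment itself. The workload name, `app` label, and image are hypothetical placeholders; `admission.datadoghq.com/enabled` is the pod label the Admission Controller looks for when it is not configured to mutate unlabelled pods.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical workload name
  labels:
    app: example-app             # workload-level labels are not evaluated at pod admission
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
        admission.datadoghq.com/enabled: "true"   # pod-level label, evaluated at pod creation
    spec:
      containers:
        - name: example-app
          image: example/app:latest   # hypothetical image
```

Because the label lives under `spec.template.metadata`, every pod the Deployment creates carries it, including pods recreated after you delete an existing one.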
Admission Controller’s injection mode (socket, hostip, service) is set by the configuration of your Cluster Agent. For example, if you have socket mode enabled in your Agent, Admission Controller also uses socket mode.
If you are using GKE Autopilot or OpenShift, you need to use a specific injection mode.
GKE Autopilot restricts the use of any volumes with a hostPath. Therefore, if Admission Controller uses socket mode, the Pods are blocked from scheduling by the GKE Warden.
Enabling GKE Autopilot mode in the Helm chart disables socket mode to prevent this from occurring. To enable APM, enable the port and use the hostip or service method instead. The Admission Controller then defaults to hostip to match.
OpenShift has SecurityContextConstraints (SCCs) that are required to deploy pods with extra permissions, such as a volume with a hostPath. Datadog components are deployed with SCCs to allow activity specific to Datadog pods, but Datadog does not create SCCs for other pods. The Admission Controller might add the socket-based configuration to your application pods, causing them to fail to deploy.
If you are using OpenShift, use hostip mode. The following configuration enables hostip mode by disabling the socket options:
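A minimal sketch of such a configuration, assuming a Helm-based install with the Datadog chart. Key names can differ between chart versions, so verify them against your chart's `values.yaml` before applying.

```yaml
datadog:
  apm:
    portEnabled: true        # expose the trace Agent port so hostip mode can reach it
    socketEnabled: false     # disable the APM Unix domain socket (requires a hostPath volume)
  dogstatsd:
    useSocketVolume: false   # disable the DogStatsD socket volume (requires a hostPath volume)
```

With both socket options off, application pods are no longer given a hostPath volume, which is what the SCCs would otherwise reject.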
The Cluster Agent’s status output provides information to verify that it has created the datadog-webhook for the MutatingWebhookConfiguration and has a valid certificate.
Run the following command:
% kubectl exec -it <Cluster Agent Pod> -- agent status
The Cluster Agent’s debug logs also show the webhook and its certificate being reconciled:
<TIMESTAMP> | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/secret/controller.go:73 in Run) | Starting secrets controller for default/webhook-certificate
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:148 in enqueue) | Adding object with key default/webhook-certificate to the queue
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/secret/controller.go:140 in enqueue) | Adding object with key default/webhook-certificate to the queue
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:148 in enqueue) | Adding object with key datadog-webhook to the queue
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/util.go:47 in func1) | Sync done for informer admissionregistration.k8s.io/v1/mutatingwebhookconfigurations in 101.116625ms, last resource version: 152728
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_v1.go:140 in reconcile) | The Webhook datadog-webhook was found, updating it
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/secret/controller.go:211 in reconcile) | The certificate is up-to-date, doing nothing. Duration before expiration: 8558h17m27.909792831s
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/secret/controller.go:174 in processNextWorkItem) | Secret default/webhook-certificate reconciled successfully
<TIMESTAMP> | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:176 in processNextWorkItem) | Webhook datadog-webhook reconciled successfully
If you do not see that the datadog-webhook webhook has been reconciled successfully, ensure that you have correctly enabled Admission Controller according to the configuration instructions.
If you see errors with the injection for a given pod, contact Datadog support with your Datadog configuration and your pod configuration.
If you do not see injection attempts for any pod, verify your mutateUnlabelled setting and ensure that your pod labels match the expected values. If they match, the problem is likely with the networking between the control plane, webhook, and service. See Networking for further information.
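For reference, a sketch of where this setting lives, assuming a Helm-based install with the Datadog chart (verify the key names against your chart version):

```yaml
clusterAgent:
  admissionController:
    enabled: true
    # When false, only pods labeled admission.datadoghq.com/enabled: "true" are mutated.
    # When true, pods are mutated unless labeled admission.datadoghq.com/enabled: "false".
    mutateUnlabelled: false
```

If `mutateUnlabelled` is false and a pod lacks the label, the absence of injection attempts is expected behavior rather than a networking problem.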
Kubernetes Network Policies help you control different ingress (inbound) and egress (outbound) flows of traffic to your pods.
If you are using network policies, Datadog recommends creating corresponding policies for the Cluster Agent to ensure connectivity to its pod over the Admission Controller port (8000). You can do this with the following configuration:
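A minimal sketch of such a policy, written as a plain Kubernetes manifest. The policy name, namespace, and pod selector label are assumptions; adjust them to match how your Cluster Agent is deployed.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-admission-controller   # hypothetical policy name
  namespace: default                 # assumption: namespace where the Cluster Agent runs
spec:
  podSelector:
    matchLabels:
      app: datadog-cluster-agent     # assumption: replace with your Cluster Agent pod labels
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 8000                 # Admission Controller port
```

The ingress rule has no `from` clause, so it admits traffic on port 8000 from any source. That is the simplest choice here because the control plane's source address is not always selectable with pod or namespace selectors.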
When a pod is created, the Kubernetes control plane sends a request to datadog-webhook, which routes it through the service to the Cluster Agent pod. This request requires inbound connectivity from the control plane to the node that the Cluster Agent is on, over its Admission Controller port (8000). After this request is resolved, the Cluster Agent mutates your pod to configure the network connection for the Datadog tracer.
Depending on your Kubernetes distribution, this connectivity may require additional security rules and Admission Controller settings.
In an EKS cluster, you can deploy the Cluster Agent pod on any of your Linux-based nodes by default. These nodes and their EC2 instances need a security group with the following inbound rule:
Protocol: TCP
Port range: 8000, or a range that covers 8000
Source: The ID of either the cluster security group, or one of your cluster’s additional security groups. You can find these IDs in the EKS console, under the Networking tab for your EKS cluster.
This security group rule allows the control plane to access the node and the downstream Cluster Agent over port 8000.
If you have multiple managed node groups, each with distinct security groups, add this inbound rule to each security group.
To check for connectivity failures, delete one of your pods to re-trigger a request through Admission Controller. When the request fails, the control plane logs contain entries that resemble the following:
W0908 <TIMESTAMP> 10 dispatcher.go:202] Failed calling webhook, failing open datadog.webhook.auto.instrumentation: failed calling webhook "datadog.webhook.auto.instrumentation": failed to call webhook: Post "https://datadog-cluster-agent-admission-controller.default.svc:443/injectlib?timeout=10s": context deadline exceeded
E0908 <TIMESTAMP> 10 dispatcher.go:206] failed calling webhook "datadog.webhook.auto.instrumentation": failed to call webhook: Post "https://datadog-cluster-agent-admission-controller.default.svc:443/injectlib?timeout=10s": context deadline exceeded
These examples come from a Cluster Agent deployed in the default namespace; the service DNS name changes to match the namespace you use.
You may also see failures for the other Admission Controller webhooks, such as datadog.webhook.tags and datadog.webhook.config.
Note: EKS often generates two log streams within the CloudWatch log group for the cluster. Be sure to check both for these types of logs.
In a GKE cluster, you can also edit an existing firewall rule. By default, the network for your cluster has a firewall rule named gke-<CLUSTER_NAME>-master. Ensure that this rule’s source filters include your cluster control plane’s CIDR block, then edit the rule to allow access over protocol tcp on port 8000.
If you are using Rancher with an EKS cluster or a private GKE cluster, additional configuration is required. For more information, see Rancher Webhook - Common Issues in the Rancher documentation.
Note: The Datadog Admission Controller webhook operates similarly to the Rancher webhook, so the same steps apply, but Datadog needs access to port 8000 instead of Rancher’s 9443.