Kubernetes v1.2 introduced Horizontal Pod Autoscaling. Kubernetes v1.6 added autoscaling on custom metrics that are user-defined and collected from within the cluster. Kubernetes v1.10 introduced support for external metrics, so that users can autoscale on any metric from outside the cluster, such as those collected by Datadog. The custom and external metric providers, as opposed to the metrics server, are resources that must be implemented and registered by the user. As of v1.0.0, the Custom Metrics Server in the Datadog Cluster Agent implements the External Metrics Provider interface for external metrics.
This guide explains how to set up and autoscale your Kubernetes workload based on your Datadog metrics.
Autoscaling over external metrics does not require the Node Agent to be running — you only need the metrics to be available in your Datadog account. Nevertheless, this guide describes autoscaling an NGINX deployment based on NGINX metrics, collected by a Node Agent.
The following assumptions are made:
To spin up the Datadog Cluster Agent, perform the following steps:
Create appropriate RBAC rules. The Datadog Cluster Agent acts as a proxy between the API Server and the Node Agent, and it needs to have access to some cluster-level resources.
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/cluster-agent/agent-rbac.yaml"
Which should produce the following output:
clusterrole.rbac.authorization.k8s.io "datadog-cluster-agent" created
clusterrolebinding.rbac.authorization.k8s.io "datadog-cluster-agent" created
serviceaccount "datadog-cluster-agent" created
Create the Datadog Cluster Agent and its services. Add your <APP_KEY> in the Deployment manifest of the Datadog Cluster Agent, and enable HPA processing by setting the DD_EXTERNAL_METRICS_PROVIDER_ENABLED variable to true.
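For reference, the relevant portion of the Cluster Agent Deployment manifest looks roughly like the following sketch; the image tag and the <API_KEY>/<APP_KEY> placeholders are assumptions to be replaced with your own values:

```yaml
# Sketch: env section of the Datadog Cluster Agent container.
# <API_KEY> and <APP_KEY> are placeholders for your Datadog keys.
spec:
  containers:
    - name: cluster-agent
      image: datadog/cluster-agent:latest
      env:
        - name: DD_API_KEY
          value: "<API_KEY>"
        - name: DD_APP_KEY
          value: "<APP_KEY>"
        # Enables the External Metrics Provider (HPA processing).
        - name: DD_EXTERNAL_METRICS_PROVIDER_ENABLED
          value: "true"
```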
Spin up the resources:
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/cluster-agent/agent-services.yaml"
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/hpa-example/cluster-agent-hpa-svc.yaml"
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/cluster-agent/cluster-agent-deployment.yaml"
Note: The first service is used for communication between the Node Agents and the Datadog Cluster Agent, while the second is used by Kubernetes to register the External Metrics Provider.
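As a sketch of the second of these, the External Metrics Provider service exposes port 443 and forwards it to the Cluster Agent pods (the selector label here is an assumption and must match your Deployment):

```yaml
# Sketch: service that Kubernetes uses to reach the Cluster Agent's
# Custom Metrics Server over port 443.
apiVersion: v1
kind: Service
metadata:
  name: datadog-custom-metrics-server
spec:
  selector:
    app: datadog-cluster-agent
  ports:
    - port: 443
      protocol: TCP
      targetPort: 443
```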
At this point, you should see:
PODS:
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
default     datadog-cluster-agent-7b7f6d5547-cmdtc   1/1     Running   0          28m

SVCS:
NAMESPACE   NAME                            TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
default     datadog-custom-metrics-server   ClusterIP   192.168.254.87    <none>        443/TCP    28m
default     datadog-cluster-agent           ClusterIP   192.168.254.197   <none>        5005/TCP   28m
Once the Datadog Cluster Agent is up and running, register it as an External Metrics Provider via the service exposing port 443. To do so, apply the following RBAC rules:
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/hpa-example/rbac-hpa.yaml"
Which should produce the following results:
clusterrolebinding.rbac.authorization.k8s.io "system:auth-delegator" created
rolebinding.rbac.authorization.k8s.io "dca" created
apiservice.apiregistration.k8s.io "v1beta1.external.metrics.k8s.io" created
clusterrole.rbac.authorization.k8s.io "external-metrics-reader" created
clusterrolebinding.rbac.authorization.k8s.io "external-metrics-reader" created
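The key piece of that manifest is the APIService object, which registers the Cluster Agent's service as the external metrics API. A sketch, assuming the service name and namespace used earlier in this guide:

```yaml
# Sketch: registers the Cluster Agent's Custom Metrics Server as the
# provider behind v1beta1.external.metrics.k8s.io.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: datadog-custom-metrics-server
    namespace: default
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```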
Once you have the Datadog Cluster Agent running and the service registered, create an HPA manifest and let the Datadog Cluster Agent pull metrics from Datadog.
At this point, you should see:
PODS:
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
default     datadog-agent-4c5pp                      1/1     Running   0          14m
default     datadog-agent-ww2da                      1/1     Running   0          14m
default     datadog-agent-2qqd3                      1/1     Running   0          14m
[...]
default     datadog-cluster-agent-7b7f6d5547-cmdtc   1/1     Running   0          16m
Now, create a Horizontal Pod Autoscaler manifest. If you take a look at the hpa-manifest.yaml file, you see that:

- The HPA autoscales the NGINX Deployment; the maximum number of replicas is 3 and the minimum is 1.
- The metric used is nginx.net.request_per_s, and the scope is kube_container_name: nginx. This metric format corresponds to the Datadog one.
Every 30 seconds, Kubernetes queries the Datadog Cluster Agent to get the value of this metric and autoscales proportionally if necessary. For advanced use cases, it is possible to have several metrics in the same HPA, as you can see in the Kubernetes horizontal pod autoscale documentation. The largest of the proposed values is the one chosen.
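As a sketch of what such a manifest can look like, assuming a target of 9 requests per second (the threshold used later in this guide):

```yaml
# Sketch: HPA autoscaling the nginx Deployment on the external
# Datadog metric nginx.net.request_per_s, scoped by container name.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginxext
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  metrics:
    - type: External
      external:
        metricName: nginx.net.request_per_s
        metricSelector:
          matchLabels:
            kube_container_name: nginx
        targetAverageValue: 9
```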
Create the NGINX deployment:
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/hpa-example/nginx.yaml"
Then, apply the HPA manifest.
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/hpa-example/hpa-manifest.yaml"
You should be seeing your NGINX pod running with the corresponding service:
POD:
default   nginx-6757dd8769-5xzp2   1/1   Running   0   3m

SVC:
NAMESPACE   NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
default     nginx   ClusterIP   192.168.251.36   <none>        8090/TCP   3m

HPAS:
NAMESPACE   NAME       REFERENCE          TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
default     nginxext   Deployment/nginx   0/9 (avg)   1         3         1          3m
At this point, the setup is ready to be stressed. As a result of the stress, Kubernetes autoscales the NGINX pods.
Curl the IP of the NGINX service as follows:
You should receive output resembling:
$ curl 192.168.254.216:8090/nginx_status

Active connections: 1
server accepts handled requests
 1 1 1
Reading: 0 Writing: 1 Waiting: 0
Behind the scenes, the number of requests per second also increased. This metric is collected by the Node Agent, which detects the NGINX pod through its annotations using Autodiscovery. For more information on how Autodiscovery works, see the Autodiscovery documentation. So, if you stress the service, you should see the uptick in your Datadog app. Because you reference this metric in your HPA manifest, the Datadog Cluster Agent also pulls its latest value regularly. Then, when Kubernetes queries the Datadog Cluster Agent for this value and notices that the number is going above the threshold, it autoscales accordingly.
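To sketch how that discovery works, the NGINX pod template can carry Autodiscovery annotations along these lines (the container name "nginx" and the status URL are assumptions matching the port used in this guide):

```yaml
# Sketch: Autodiscovery annotations telling the Node Agent to run
# the nginx check against this pod's "nginx" container.
metadata:
  annotations:
    ad.datadoghq.com/nginx.check_names: '["nginx"]'
    ad.datadoghq.com/nginx.init_configs: '[{}]'
    ad.datadoghq.com/nginx.instances: '[{"nginx_status_url": "http://%%host%%:8090/nginx_status"}]'
```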
To do this, run:
while true; do curl <nginx_svc>:8090/nginx_status; sleep 0.1; done
You should soon see the number of requests per second spiking and going above 9, the threshold over which the NGINX pods autoscale. Then, you should see new NGINX pods being created:
PODS:
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
default     datadog-cluster-agent-7b7f6d5547-cmdtc   1/1     Running   0          9m
default     nginx-6757dd8769-5xzp2                   1/1     Running   0          2m
default     nginx-6757dd8769-k6h6x                   1/1     Running   0          2m
default     nginx-6757dd8769-vzd5b                   1/1     Running   0          29m

HPAS:
NAMESPACE   NAME       REFERENCE          TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
default     nginxext   Deployment/nginx   30/9 (avg)   1         3         3          29m