The Datadog Cluster Agent provides a streamlined, centralized approach to collecting cluster-level monitoring data. By acting as a proxy between the API server and node-based Agents, the Cluster Agent helps to alleviate server load. It also relays cluster-level metadata to node-based Agents, allowing them to enrich the metadata of locally collected metrics.
Using the Datadog Cluster Agent helps you to:
Note: To leverage all features from the Datadog Cluster Agent, you must run Kubernetes v1.10+.
Review the manifests in the Datadog Cluster Agent RBAC folder.
datadog-agent directory, and run:
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrole.yaml"
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/serviceaccount.yaml"
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrolebinding.yaml"
This creates the appropriate ServiceAccount, ClusterRole, and ClusterRoleBinding.
Secure communication between the Agent and the Cluster Agent by creating a secret.
echo -n '<ThirtyX2XcharactersXlongXtoken>' | base64
Use this string in the
dca-secret.yaml file located in the
Alternatively, run this one-line command:
kubectl create secret generic datadog-auth-token --from-literal=token=<ThirtyX2XcharactersXlongXtoken>
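If you do not already have a token, the following is a minimal sketch of one way to produce it. It assumes a random hexadecimal string is acceptable as the token; the variable names `token` and `encoded` are illustrative only and do not come from the Datadog manifests.

```shell
# Generate a random 32-character hexadecimal token (16 random bytes).
token="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Fail early if the token is not exactly 32 characters long.
if [ "${#token}" -ne 32 ]; then
  echo "unexpected token length: ${#token}" >&2
  exit 1
fi

# Base64-encoded form, for use in a secret manifest.
# printf avoids the trailing newline that a plain echo would add.
encoded="$(printf '%s' "$token" | base64)"

# To create the secret directly instead, run:
#   kubectl create secret generic datadog-auth-token --from-literal=token="$token"
```

The length check matters because a stray newline or truncated value changes the token silently, and both Agents must be given exactly the same string.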
Refer to this secret with the environment variable DD_CLUSTER_AGENT_AUTH_TOKEN in the manifests of the Cluster Agent and the node-based Agent.
- name: DD_CLUSTER_AGENT_AUTH_TOKEN
  valueFrom:
    secretKeyRef:
      name: datadog-auth-token
      key: token
Note: This needs to be set in the manifest of the Cluster Agent and the node agent.
Alternatively, if you do not want to rely on environment variables, mount the datadog.yaml file. Datadog recommends using a ConfigMap. To do so, add the following to the Cluster Agent manifest:
[...]
        volumeMounts:
        - name: "dca-yaml"
          mountPath: "/etc/datadog-agent/datadog.yaml"
          subPath: "datadog-cluster.yaml"
      volumes:
        - name: "dca-yaml"
          configMap:
            name: "dca-yaml"
[...]
Then, create your datadog-cluster.yaml with the variables of your choice. Create the ConfigMap accordingly:
kubectl create configmap dca-yaml --from-file datadog-cluster.yaml
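As a rough illustration of what such a file might contain, the sketch below writes a small datadog-cluster.yaml to a scratch directory. The configuration keys are an assumption: they mirror the environment variables used elsewhere on this page (for example, `cluster_agent.auth_token` for DD_CLUSTER_AGENT_AUTH_TOKEN), so verify them against the Agent's configuration reference before relying on them. The `workdir` variable is hypothetical.

```shell
# Write a minimal datadog-cluster.yaml into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/datadog-cluster.yaml" <<'EOF'
# Placeholder values; replace them before use.
api_key: <DD_API_KEY>
cluster_agent:
  auth_token: <ThirtyX2XcharactersXlongXtoken>
collect_kubernetes_events: true
leader_election: true
EOF

# Then build the ConfigMap from it:
#   kubectl create configmap dca-yaml --from-file "$workdir/datadog-cluster.yaml"
```

With `--from-file`, the ConfigMap key is the file's base name, so the file must be called datadog-cluster.yaml for the `subPath` in the volume mount to resolve.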
Locate the following manifests, and replace <DD_API_KEY> with your API key:
kubectl apply -f Dockerfiles/manifests/cluster-agent/datadog-cluster-agent_service.yaml
kubectl apply -f Dockerfiles/manifests/cluster-agent/cluster-agent.yaml
At this point, you should see:
-> kubectl get deploy
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
datadog-cluster-agent   1         1         1            1           1d

-> kubectl get secret
NAME                 TYPE     DATA   AGE
datadog-auth-token   Opaque   1      1d

-> kubectl get pods -l app=datadog-cluster-agent
datadog-cluster-agent-8568545574-x9tc9   1/1   Running   0   2h

-> kubectl get service -l app=datadog-cluster-agent
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
datadog-cluster-agent   ClusterIP   10.100.202.234   none          5005/TCP   1d
Review the manifest found at Dockerfiles/manifests/cluster-agent/rbac/rbac-agent.yaml. This limits an Agent’s access to the kubelet API.
kubectl apply -f Dockerfiles/manifests/cluster-agent/rbac/rbac-agent.yaml
Add the following environment variables to the
- name: DD_CLUSTER_AGENT_ENABLED
  value: 'true'
- name: DD_CLUSTER_AGENT_AUTH_TOKEN
  valueFrom:
    secretKeyRef:
      name: datadog-auth-token
      key: token
  # value: "<ThirtyX2XcharactersXlongXtoken>" # If you are not using the secret, just set the string.
Create the DaemonSet with this command:
kubectl apply -f Dockerfiles/manifests/agent.yaml
kubectl get pods | grep agent
You should see:
datadog-agent-4k9cd                      1/1   Running   0   2h
datadog-agent-4v884                      1/1   Running   0   2h
datadog-agent-9d5bl                      1/1   Running   0   2h
datadog-agent-dtlkg                      1/1   Running   0   2h
datadog-agent-jllww                      1/1   Running   0   2h
datadog-agent-rdgwz                      1/1   Running   0   2h
datadog-agent-x5wk5                      1/1   Running   0   2h
[...]
datadog-cluster-agent-8568545574-x9tc9   1/1   Running   0   2h
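With many nodes, checking this listing by eye gets tedious. The helper below is a small sketch that reads a pod listing on standard input and succeeds only if every pod reports Running. The function name `all_running` is hypothetical, and it assumes the default `kubectl get pods` column layout, where STATUS is the third column.

```shell
# Succeeds (exit 0) only when at least one pod line is read and
# every pod line has "Running" in the third (STATUS) column.
# Header lines and short lines such as "[...]" are ignored.
all_running() {
  awk '$1 != "NAME" && NF >= 3 { n++; if ($3 != "Running") bad = 1 }
       END { exit (bad || n == 0) ? 1 : 0 }'
}

# Example usage against a live cluster:
#   kubectl get pods | grep agent | all_running && echo "all Agent pods are running"
```

An empty listing is treated as a failure so that a mistyped filter does not pass unnoticed.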
Kubernetes events are beginning to flow into your Datadog account, and relevant metrics collected by your Agents are tagged with their corresponding cluster-level metadata.
The available commands for the Datadog Cluster Agent are:
||Gives an overview of the components of the agent and their health.|
||Queries the local cache of the mapping between the pods living on
||Like the node Agent, the Cluster Agent can aggregate its logs and the configurations in use into an archive, which can be forwarded to the support team or extracted and used locally. Note: Run this command from within the Cluster Agent pod.|
The following environment variables are supported:
||Your Datadog API key.|
||Hostname to use for the Datadog Cluster Agent.|
||Port for the Datadog Cluster Agent to serve, default is
||Enables the cluster level metadata mapping, default is
||Configures the Agent to collect Kubernetes events. Defaults to
||Activates the leader election. You must set
||Used only if the leader election is activated. See the details in the leader election section. Value in seconds, 60 by default.|
||32-character token that must be shared between the node Agent and the Datadog Cluster Agent.|
||Configures the namespace where the Cluster Agent creates the configmaps required for the Leader Election, the Event Collection (optional) and the Horizontal Pod Autoscaling.|
||Name of the Kubernetes service Cluster Agents are exposed through. Default is
||Frequency in seconds to query the API Server to resync the local cache. The default is 5 minutes.|
||Timeout in seconds of the client communicating with the API Server. Default is 60 seconds.|
||Change the port for fetching expvar public variables from the Datadog Cluster Agent. The default port is
||Time to wait, in seconds, to process a batch of metrics from multiple Autoscalers. Defaults to 10 seconds.|
||Maximum age, in seconds, of a datapoint before it is considered invalid to serve. Defaults to 90 seconds.|
||Aggregator for the Datadog metrics. Applies to all Autoscalers processed. Choose among
||Size of the window, in seconds, used to query metrics from Datadog. Defaults to 300 seconds.|
||Rate to resync local cache of processed metrics with the global store. Useful when there are several replicas of the Cluster Agent.|
||Enable Cluster Check Autodiscovery. Default is
||Additional Autodiscovery configuration providers to use.|
||Additional Autodiscovery listeners to run.|
||Cluster name. It is added as an instance tag to all cluster check configurations.|
||Name of the instance tag set with the
||Time after which node-based Agents are considered down and removed from the pool. Default is 30 seconds.|
||Delay between acquiring leadership and starting the Cluster Checks logic, allows for all node-based Agents to register first. Default is 30 seconds.|
In order to collect events, you need the following environment variables in your
- name: DD_COLLECT_KUBERNETES_EVENTS
  value: "true"
- name: DD_LEADER_ELECTION
  value: "true"
Enabling the leader election ensures that only one agent collects the events.
In the Node Agent, set the environment variable DD_CLUSTER_AGENT_ENABLED to true.
The environment variable DD_KUBERNETES_METADATA_TAG_UPDATE_FREQ can be set to specify how often the Node Agents query the Datadog Cluster Agent.
Disable the Kubernetes metadata tag collection with
The Datadog Cluster Agent implements the External Metrics Provider’s interface (currently in beta). It can therefore serve Custom Metrics to Kubernetes for Horizontal Pod Autoscalers. It is referred to throughout the documentation as the Custom Metrics Server, per Kubernetes’ terminology.
To enable the Custom Metrics Server:
true in the Deployment of the Datadog Cluster Agent.
<DD_APP_KEY> as well as the <DD_API_KEY> in the Deployment of the Datadog Cluster Agent.
Refer to the dedicated Custom metrics server guide to configure the Custom Metrics Server and get more details about this feature.
Note: An HPA is required for values to be served on the external metrics route.
Starting with version 1.2.0, the Datadog Cluster Agent can extend the Autodiscovery mechanism for non-containerized cluster resources. To enable this, make the following changes to the Cluster Agent deployment:
DD_CLUSTER_NAME. It will be injected as a cluster_name instance tag to all configurations, to help you scope your metrics.
datadog-cluster-agent, ensure the DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME environment variable reflects that.
Two configuration sources are currently supported, described in the Autodiscovery documentation:
/conf.d folder, they will be automatically imported by the image’s entrypoint.
DD_EXTRA_LISTENERS environment variables to
Refer to the dedicated Cluster Checks Autodiscovery guide for more configuration and troubleshooting details on this feature.