Kubernetes Controller Manager

Supported OS: Linux, Windows, Mac OS

Integration version: 7.0.0

Kube Controller Manager dashboard

Overview

This check monitors the Kubernetes Controller Manager, part of the Kubernetes control plane.

Note: This check does not collect data for Amazon EKS clusters, because the control plane services are not exposed there.

Setup

Installation

The Kubernetes Controller Manager check is included in the Datadog Agent package, so you do not need to install anything else on your server.

Configuration

  1. To start collecting your kube_controller_manager performance data, edit the kube_controller_manager.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory. See the sample kube_controller_manager.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
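
As a rough illustration of step 1, a minimal kube_controller_manager.d/conf.yaml might look like the sketch below. The URL, port, and commented-out options are assumptions, not values taken from this page; check them against the sample conf.yaml shipped with your Agent version before relying on them.

```yaml
# Minimal sketch of kube_controller_manager.d/conf.yaml (values are assumptions).
init_config:

instances:
    # Assumed endpoint: the controller manager's legacy insecure port on the
    # local host. Replace the host and port with wherever your controller
    # manager actually serves /metrics.
  - prometheus_url: http://localhost:10252/metrics

    # Assumption: clusters that only serve metrics over the secure port
    # (commonly 10257) would use an HTTPS URL with token authentication,
    # along these lines:
    # prometheus_url: https://localhost:10257/metrics
    # bearer_token_auth: true
    # ssl_verify: false
```

The secure-port variant is left commented out because it depends on how the cluster was provisioned; enable only one prometheus_url per instance.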

This integration requires access to the controller manager’s metrics endpoint. To reach it, the Agent needs:

  • network access to the IP address and port of the controller-manager process
  • RBAC permission for the get verb on the /metrics endpoint (the default Datadog Helm chart already adds the required RBAC roles and bindings)
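
If you are not using the Helm chart, the RBAC requirement above could be met with manifests along the following lines. This is a sketch only: the role name, service account name, and namespace are hypothetical placeholders, and the manifests the Helm chart generates may differ.

```yaml
# Sketch: grant "get" on the /metrics non-resource URL to the Agent's
# service account. All names below are hypothetical placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent-metrics-reader    # hypothetical name
rules:
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent-metrics-reader    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent-metrics-reader
subjects:
  - kind: ServiceAccount
    name: datadog-agent                 # assumed service account name
    namespace: default                  # assumed namespace
```

A ClusterRole is used rather than a namespaced Role because /metrics is a non-resource URL, which can only be granted at cluster scope.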

Validation

Run the Agent’s status subcommand and look for kube_controller_manager under the Checks section.

Data Collected

Metrics

kube_controller_manager.goroutines
(gauge)
Number of goroutines that currently exist
kube_controller_manager.job_controller.terminated_pods_tracking_finalizer
(count)
Used to monitor whether the job controller is removing Pod finalizers from terminated Pods after accounting them in Job status
kube_controller_manager.leader_election.lease_duration
(gauge)
Duration of the leadership lease
kube_controller_manager.leader_election.transitions
(count)
Number of leadership transitions observed
kube_controller_manager.max_fds
(gauge)
Maximum allowed open file descriptors
kube_controller_manager.nodes.count
(gauge)
Number of registered nodes, per zone
kube_controller_manager.nodes.evictions
(count)
Count of node eviction events, per zone
kube_controller_manager.nodes.unhealthy
(gauge)
Number of unhealthy nodes, per zone
kube_controller_manager.open_fds
(gauge)
Number of open file descriptors
kube_controller_manager.queue.adds
(count)
Elements added, by queue
kube_controller_manager.queue.depth
(gauge)
Current depth, by queue
kube_controller_manager.queue.latency.count
(gauge)
Processing latency count, by queue (deprecated in Kubernetes v1.14)
kube_controller_manager.queue.latency.quantile
(gauge)
Processing latency quantiles, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.latency.sum
(gauge)
Processing latency sum, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.process_duration.count
(gauge)
How long it takes to process an item from the workqueue, by queue
kube_controller_manager.queue.process_duration.sum
(gauge)
Total workqueue processing time, by queue
Shown as second
kube_controller_manager.queue.queue_duration.count
(gauge)
How long an item stays in the queue before being requested, by queue
kube_controller_manager.queue.queue_duration.sum
(gauge)
Total time items stay in the queue before being requested, by queue
Shown as second
kube_controller_manager.queue.retries
(count)
Retries handled, by queue
kube_controller_manager.queue.work_duration.count
(gauge)
Work duration, by queue (deprecated in Kubernetes v1.14)
kube_controller_manager.queue.work_duration.quantile
(gauge)
Work duration quantiles, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.work_duration.sum
(gauge)
Work duration sum, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.work_longest_duration
(gauge)
How many seconds the longest-running processor has been running, by queue
Shown as second
kube_controller_manager.queue.work_unfinished_duration
(gauge)
How many seconds of work is in progress and has not yet been observed by process_duration, by queue
Shown as second
kube_controller_manager.rate_limiter.use
(gauge)
Usage of the rate limiter, by limiter
kube_controller_manager.slis.kubernetes_healthcheck
(gauge)
Result of a single controller manager healthcheck (alpha; requires k8s v1.26+)
kube_controller_manager.slis.kubernetes_healthcheck_total
(count)
Cumulative results of all controller manager healthchecks (alpha; requires k8s v1.26+)
kube_controller_manager.threads
(gauge)
Number of OS threads created

Events

The Kubernetes Controller Manager check does not include any events.

Service Checks

kube_controller_manager.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint.
Statuses: ok, critical

kube_controller_manager.leader_election.status
Returns CRITICAL if no replica is currently set as leader.
Statuses: ok, critical

kube_controller_manager.up
Returns CRITICAL if Kube Controller Manager is not healthy.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog Support.