---
title: Kubernetes Controller Manager
description: Monitors the Kubernetes Controller Manager
---

# Kubernetes Controller Manager
Integration version: 8.4.1
{% callout %}
# Important note for users on the following Datadog sites: us2.ddog-gov.com

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}


## Overview{% #overview %}

This check monitors the [Kubernetes Controller Manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager), part of the Kubernetes control plane.

**Note**: This check does not collect data for Amazon EKS clusters, because EKS does not expose the control plane services.

## Setup{% #setup %}

### Installation{% #installation %}

The Kubernetes Controller Manager check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package, so you do not need to install anything else on your server.

### Configuration{% #configuration %}

1. Edit the `kube_controller_manager.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your kube_controller_manager performance data. See the [sample kube_controller_manager.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/kube_controller_manager/datadog_checks/kube_controller_manager/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
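
As a rough illustration of step 1, a minimal `kube_controller_manager.d/conf.yaml` might look like the sketch below. The host, port, and TLS settings are assumptions for a controller manager serving metrics on its secure port on the same node; check them against your cluster and the sample file linked above before using them.

```yaml
init_config:

instances:
    # Assumed endpoint: controller manager secure port on the local node.
    # Replace the host and port with the values used in your cluster.
  - prometheus_url: https://localhost:10257/metrics
    # Assumption: authenticate with the Agent's service account token.
    bearer_token_auth: true
    # Assumption: the endpoint presents a self-signed certificate.
    ssl_verify: false
```

If your controller manager exposes metrics over plain HTTP or on a different port, only `prometheus_url` needs to change and the two options below it can be dropped.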

This integration requires access to the controller manager's metrics endpoint. To reach it, the Agent needs:

- network access to the IP and port of the controller-manager process
- `get` RBAC permission on the `/metrics` endpoint (the default Datadog Helm chart already adds the required RBAC roles and bindings)
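
If you are not using the Helm chart, the `get` permission on `/metrics` could be granted with something like the following sketch. The role name, service account name, and namespace are hypothetical placeholders, not values defined by this integration; substitute the ones your Agent actually runs under.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent-metrics-reader   # hypothetical name
rules:
    # /metrics is a non-resource URL, so it is listed under nonResourceURLs.
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent-metrics-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent-metrics-reader
subjects:
  - kind: ServiceAccount
    name: datadog-agent                # assumed Agent service account
    namespace: default                 # assumed namespace
```

A `ClusterRole` is used rather than a namespaced `Role` because non-resource URLs such as `/metrics` can only be granted at cluster scope.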

### Validation{% #validation %}

[Run the Agent's `status` subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `kube_controller_manager` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **kube\_controller\_manager.goroutines** (gauge) | Number of goroutines that currently exist |
| **kube\_controller\_manager.job\_controller.terminated\_pods\_tracking\_finalizer** (count) | Used to monitor whether the job controller is removing Pod finalizers from terminated Pods after accounting for them in Job status |
| **kube\_controller\_manager.leader\_election.lease\_duration** (gauge) | Duration of the leadership lease |
| **kube\_controller\_manager.leader\_election.transitions** (count) | Number of leadership transitions observed |
| **kube\_controller\_manager.max\_fds** (gauge) | Maximum allowed open file descriptors |
| **kube\_controller\_manager.nodes.count** (gauge) | Number of registered nodes, per zone |
| **kube\_controller\_manager.nodes.evictions** (count) | Count of node eviction events, per zone |
| **kube\_controller\_manager.nodes.unhealthy** (gauge) | Number of unhealthy nodes, per zone |
| **kube\_controller\_manager.open\_fds** (gauge) | Number of open file descriptors |
| **kube\_controller\_manager.queue.adds** (count) | Elements added, by queue |
| **kube\_controller\_manager.queue.depth** (gauge) | Current depth, by queue |
| **kube\_controller\_manager.queue.latency.count** (gauge) | Processing latency count, by queue (deprecated in Kubernetes v1.14) |
| **kube\_controller\_manager.queue.latency.quantile** (gauge) | Processing latency quantiles, by queue (deprecated in Kubernetes v1.14) *Shown as microsecond* |
| **kube\_controller\_manager.queue.latency.sum** (gauge) | Processing latency sum, by queue (deprecated in Kubernetes v1.14) *Shown as microsecond* |
| **kube\_controller\_manager.queue.process\_duration.count** (gauge) | How long processing an item from the workqueue takes, by queue |
| **kube\_controller\_manager.queue.process\_duration.sum** (gauge) | Total workqueue processing time, by queue *Shown as second* |
| **kube\_controller\_manager.queue.queue\_duration.count** (gauge) | How long an item stays in a queue before being requested, by queue |
| **kube\_controller\_manager.queue.queue\_duration.sum** (gauge) | Total time items stay in a queue before being requested, by queue *Shown as second* |
| **kube\_controller\_manager.queue.retries** (count) | Retries handled, by queue |
| **kube\_controller\_manager.queue.work\_duration.count** (gauge) | Work duration, by queue (deprecated in Kubernetes v1.14) |
| **kube\_controller\_manager.queue.work\_duration.quantile** (gauge) | Work duration quantiles, by queue (deprecated in Kubernetes v1.14) *Shown as microsecond* |
| **kube\_controller\_manager.queue.work\_duration.sum** (gauge) | Work duration sum, by queue (deprecated in Kubernetes v1.14) *Shown as microsecond* |
| **kube\_controller\_manager.queue.work\_longest\_duration** (gauge) | How many seconds the longest-running processor has been running, by queue *Shown as second* |
| **kube\_controller\_manager.queue.work\_unfinished\_duration** (gauge) | How many seconds of work are in progress and have not yet been observed by process_duration, by queue *Shown as second* |
| **kube\_controller\_manager.rate\_limiter.use** (gauge) | Usage of the rate limiter, by limiter |
| **kube\_controller\_manager.slis.kubernetes\_healthcheck** (gauge) | Result of a single controller manager healthcheck (alpha; requires Kubernetes v1.26+) |
| **kube\_controller\_manager.slis.kubernetes\_healthcheck\_total** (count) | Cumulative results of all controller manager healthchecks (alpha; requires Kubernetes v1.26+) |
| **kube\_controller\_manager.threads** (gauge) | Number of OS threads created |

### Events{% #events %}

The Kubernetes Controller Manager check does not include any events.

### Service Checks{% #service-checks %}

**kube\_controller\_manager.prometheus.health**

Returns `CRITICAL` if the check cannot access the metrics endpoint.

*Statuses: ok, critical*

**kube\_controller\_manager.leader\_election.status**

Returns `CRITICAL` if no replica is currently set as leader.

*Statuses: ok, critical*

**kube\_controller\_manager.up**

Returns `CRITICAL` if the Kubernetes Controller Manager is not healthy.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog Support](https://docs.datadoghq.com/help/).