---
title: Kubernetes Cluster Autoscaler
description: Integration for Kubernetes Cluster Autoscaler
breadcrumbs: Docs > Integrations > Kubernetes Cluster Autoscaler
---

# Kubernetes Cluster Autoscaler
**Integration version:** 3.4.1
{% callout %}
# Important note for users on the following Datadog sites: us2.ddog-gov.com

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

This check monitors [Kubernetes Cluster Autoscaler](https://docs.datadoghq.com/integrations/kubernetes_cluster_autoscaler.md) through the Datadog Agent.

**Minimum Agent version:** 7.54.1

## Setup{% #setup %}

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on applying these instructions.

### Installation{% #installation %}

The Kubernetes Cluster Autoscaler check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package (Agent >= 7.55.x). No additional installation is needed on your server.

### Configuration{% #configuration %}

1. Edit the `kubernetes_cluster_autoscaler.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your Kubernetes Cluster Autoscaler performance data. See the [sample kubernetes_cluster_autoscaler.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/kubernetes_cluster_autoscaler/datadog_checks/kubernetes_cluster_autoscaler/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
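For orientation, a minimal `kubernetes_cluster_autoscaler.d/conf.yaml` might look like the sketch below. It sets only `openmetrics_endpoint`, the parameter described under Metric collection. The `localhost` address is an assumption for illustration; point it at wherever your Cluster Autoscaler exposes its metrics, and consult the sample file linked above for the full set of options.

```yaml
# Minimal sketch of kubernetes_cluster_autoscaler.d/conf.yaml (host-based Agent).
init_config:

instances:
    # Assumption: the Cluster Autoscaler metrics endpoint is reachable from
    # this host on the default port 8085. Replace the address as needed.
  - openmetrics_endpoint: http://localhost:8085/metrics
```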

#### Metric collection{% #metric-collection %}

Make sure that the Prometheus-formatted metrics are exposed in your `kubernetes_cluster_autoscaler` cluster. For the Agent to start collecting metrics, the `kubernetes_cluster_autoscaler` pods need to be annotated.

[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-monitor-cluster-autoscaler) has metrics and livenessProbe endpoints that can be accessed on port `8085`. These endpoints are located under `/metrics` and `/health-check` and provide valuable information about the state of your cluster during scaling operations.

**Note**: To change the default port, use the `--address` flag.
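As an illustration of the note above, the following fragment of a Cluster Autoscaler pod spec passes `--address` to move the metrics listener to another port. The container name and port `9095` are hypothetical, and the fragment omits the other flags a real deployment needs (such as the cloud provider settings). If you change the port, the `containerPort` and the `openmetrics_endpoint` used by the check need to match it.

```yaml
# Illustrative fragment only, not a complete Deployment.
spec:
  containers:
    - name: cluster-autoscaler      # hypothetical container name
      command:
        - ./cluster-autoscaler
        - --address=:9095           # hypothetical port, replaces the default :8085
      ports:
        - name: app
          containerPort: 9095       # keep in sync with --address
```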

To configure the Cluster Autoscaler to expose metrics, do the following:

1. Enable access to the `/metrics` route and expose port `8085` for your Cluster Autoscaler deployment:

```
ports:
  - name: app
    containerPort: 8085
```

2. Instruct Prometheus to scrape the endpoint by adding the following annotation to your Cluster Autoscaler service:

```
prometheus.io/scrape: "true"
```
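To show where that annotation sits, here is a rough sketch of a Service in front of the Cluster Autoscaler. The Service name, namespace, and selector label are placeholders chosen for illustration and are likely to differ in your cluster. The annotation value is quoted because Kubernetes annotation values must be strings.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler          # hypothetical Service name
  namespace: kube-system            # assumption: adjust to your namespace
  annotations:
    prometheus.io/scrape: "true"    # quoted so YAML parses it as a string
spec:
  selector:
    app: cluster-autoscaler         # hypothetical pod label
  ports:
    - name: app
      port: 8085
      targetPort: 8085              # the containerPort exposed in step 1
```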

**Note**: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed.

The only parameters required for configuring the `kubernetes_cluster_autoscaler` check are:

- `CONTAINER_NAME`: The name of the Cluster Autoscaler controller container.
- `openmetrics_endpoint`: The location where the Prometheus-formatted metrics are exposed. The default port is `8085`. To configure a different port, use the `METRICS_PORT` [environment variable](https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/). In containerized environments, use `%%host%%` for [host autodetection](https://docs.datadoghq.com/agent/kubernetes/integrations.md).

```yaml
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: |
      {
        "kubernetes_cluster_autoscaler": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8085/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: '<CONTAINER_NAME>'
# (...)
```

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `kubernetes_cluster_autoscaler` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **kubernetes\_cluster\_autoscaler.cluster.cpu.current.cores** (gauge) | Current CPU cores usage in the cluster |
| **kubernetes\_cluster\_autoscaler.cluster.memory.current.bytes** (gauge) | Current memory usage in bytes in the cluster |
| **kubernetes\_cluster\_autoscaler.cluster.safe.to.autoscale** (gauge) | Indicates whether the cluster is safe to autoscale |
| **kubernetes\_cluster\_autoscaler.cpu.limits.cores** (gauge) | Total CPU cores limits set for pods in the cluster |
| **kubernetes\_cluster\_autoscaler.created.node.groups.count** (count) | Total count of node groups created in the cluster |
| **kubernetes\_cluster\_autoscaler.deleted.node.groups.count** (count) | Total count of node groups deleted in the cluster |
| **kubernetes\_cluster\_autoscaler.errors.count** (count) | Total count of errors occurred in the cluster |
| **kubernetes\_cluster\_autoscaler.evicted.pods.count** (count) | Total count of evicted pods in the cluster |
| **kubernetes\_cluster\_autoscaler.failed.scale.ups.count** (count) | Total count of failed scale-up operations in the cluster |
| **kubernetes\_cluster\_autoscaler.function.duration.seconds.bucket** (count) | Duration of a specific function in the cluster (bucket) |
| **kubernetes\_cluster\_autoscaler.function.duration.seconds.count** (count) | Duration of a specific function in the cluster (count) |
| **kubernetes\_cluster\_autoscaler.function.duration.seconds.sum** (count) | Duration of a specific function in the cluster (sum) |
| **kubernetes\_cluster\_autoscaler.go.gc.duration.seconds.count** (count) | A summary of the pause duration of garbage collection cycles. *Shown as second* |
| **kubernetes\_cluster\_autoscaler.go.gc.duration.seconds.quantile** (gauge) | A summary of the pause duration of garbage collection cycles. *Shown as second* |
| **kubernetes\_cluster\_autoscaler.go.gc.duration.seconds.sum** (count) | A summary of the pause duration of garbage collection cycles. *Shown as second* |
| **kubernetes\_cluster\_autoscaler.go.goroutines** (gauge) | Number of goroutines that currently exist |
| **kubernetes\_cluster\_autoscaler.go.info** (gauge) | Information about the Go environment |
| **kubernetes\_cluster\_autoscaler.go.memstats.alloc\_bytes** (gauge) | Number of bytes allocated and still in use. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.alloc\_bytes.count** (count) | Total number of bytes allocated even if freed. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.buck\_hash.sys\_bytes** (gauge) | Number of bytes used by the profiling bucket hash table. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.frees.count** (count) | Total number of frees |
| **kubernetes\_cluster\_autoscaler.go.memstats.gc.sys\_bytes** (gauge) | Number of bytes used for garbage collection system metadata. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.heap.alloc\_bytes** (gauge) | Number of heap bytes allocated and still in use. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.heap.idle\_bytes** (gauge) | Number of heap bytes waiting to be used. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.heap.inuse\_bytes** (gauge) | Number of heap bytes that are in use. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.heap.objects** (gauge) | Number of allocated objects. *Shown as object* |
| **kubernetes\_cluster\_autoscaler.go.memstats.heap.released\_bytes** (gauge) | Number of heap bytes released to OS. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.heap.sys\_bytes** (gauge) | Number of heap bytes obtained from system. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.lookups.count** (count) | Total number of pointer lookups |
| **kubernetes\_cluster\_autoscaler.go.memstats.mallocs.count** (count) | Total number of mallocs |
| **kubernetes\_cluster\_autoscaler.go.memstats.mcache.inuse\_bytes** (gauge) | Number of bytes in use by mcache structures. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.mcache.sys\_bytes** (gauge) | Number of bytes used for mcache structures obtained from system. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.mspan.inuse\_bytes** (gauge) | Number of bytes in use by mspan structures. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.mspan.sys\_bytes** (gauge) | Number of bytes used for mspan structures obtained from system. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.next.gc\_bytes** (gauge) | Number of heap bytes when next garbage collection will take place. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.other.sys\_bytes** (gauge) | Number of bytes used for other system allocations. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.stack.inuse\_bytes** (gauge) | Number of bytes in use by the stack allocator. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.stack.sys\_bytes** (gauge) | Number of bytes obtained from system for stack allocator. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.memstats.sys\_bytes** (gauge) | Number of bytes obtained from system. *Shown as byte* |
| **kubernetes\_cluster\_autoscaler.go.threads** (gauge) | Number of OS threads created. *Shown as thread* |
| **kubernetes\_cluster\_autoscaler.last.activity** (gauge) | Timestamp of the last activity in the cluster |
| **kubernetes\_cluster\_autoscaler.max.nodes.count** (gauge) | Maximum number of nodes allowed in the cluster |
| **kubernetes\_cluster\_autoscaler.memory.limits.bytes** (gauge) | Total memory limits set for pods in the cluster |
| **kubernetes\_cluster\_autoscaler.nap.enabled** (gauge) | Indicates whether Node Auto-Provisioning (NAP) is enabled in the cluster |
| **kubernetes\_cluster\_autoscaler.node.groups.count** (gauge) | Number of node groups in the cluster |
| **kubernetes\_cluster\_autoscaler.nodes.count** (gauge) | Number of nodes in cluster |
| **kubernetes\_cluster\_autoscaler.old.unregistered.nodes.removed.count** (count) | Total count of old unregistered nodes removed from the cluster |
| **kubernetes\_cluster\_autoscaler.scaled.down.gpu.nodes.count** (count) | Total count of GPU nodes scaled down in the cluster |
| **kubernetes\_cluster\_autoscaler.scaled.down.nodes.count** (count) | Total count of nodes scaled down in the cluster |
| **kubernetes\_cluster\_autoscaler.scaled.up.gpu.nodes.count** (count) | Total count of GPU nodes scaled up in the cluster |
| **kubernetes\_cluster\_autoscaler.scaled.up.nodes.count** (count) | Total count of nodes scaled up in the cluster |
| **kubernetes\_cluster\_autoscaler.skipped.scale.events.count** (count) | Total count of skipped scale events in the cluster |
| **kubernetes\_cluster\_autoscaler.unneeded.nodes.count** (gauge) | Total count of unneeded nodes in the cluster |
| **kubernetes\_cluster\_autoscaler.unschedulable.pods.count** (gauge) | Number of unschedulable pods in the cluster |

### Events{% #events %}

The Kubernetes Cluster Autoscaler integration does not include any events.

### Service Checks{% #service-checks %}

The Kubernetes Cluster Autoscaler integration includes the following service check:

**kubernetes\_cluster\_autoscaler.openmetrics.health**

Returns `CRITICAL` if the Agent is unable to connect to the Kubernetes Cluster Autoscaler OpenMetrics endpoint, otherwise returns `OK`.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).