Kubernetes Data Collected

This page lists data collected by the Datadog Agent when deployed on a Kubernetes cluster. The set of metrics collected may vary depending on the version of Kubernetes in use.

Note: For Windows containers, see Limited metrics for Windows deployments.

Metrics

Kubernetes


kubernetes.cpu.capacity (gauge)	The number of cores in this machine Shown as core
kubernetes.cpu.limits (gauge)	The limit of CPU cores set Shown as core
kubernetes.cpu.requests (gauge)	The requested CPU cores Shown as core
kubernetes.cpu.usage.total (gauge)	The number of cores used Shown as nanocore
kubernetes.diskio.io_service_bytes.stats.total (gauge)	The amount of disk space the container uses. Shown as byte
kubernetes.filesystem.usage (gauge)	The amount of disk used. Requires Docker container runtime. Shown as byte
kubernetes.filesystem.usage_pct (gauge)	The percentage of disk used. Requires Docker container runtime. Shown as fraction
kubernetes.memory.capacity (gauge)	The amount of memory (in bytes) in this machine Shown as byte
kubernetes.memory.limits (gauge)	The limit of memory set Shown as byte
kubernetes.memory.requests (gauge)	The requested memory Shown as byte
kubernetes.memory.usage (gauge)	The amount of memory used Shown as byte
kubernetes.network.rx_bytes (gauge)	The number of bytes per second received Shown as byte
kubernetes.network.tx_bytes (gauge)	The number of bytes per second transmitted Shown as byte
kubernetes.network_errors (gauge)	The number of network errors per second Shown as error

Note: For more information about kubernetes.cpu.* metrics, see Discrepancies in kubernetes.cpu.* and container.cpu.* metrics.
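
The table above reports kubernetes.cpu.usage.total in nanocores, while kubernetes.cpu.limits and kubernetes.cpu.requests are reported in cores. The following is a minimal Python sketch of the unit conversion needed before comparing them. The function names and sample values are hypothetical and are not part of the Datadog Agent or API.

```python
NANOCORES_PER_CORE = 1_000_000_000  # 1 core = 10^9 nanocores


def nanocores_to_cores(nanocores: float) -> float:
    """Convert a value in nanocores to cores."""
    return nanocores / NANOCORES_PER_CORE


def cpu_limit_utilization(usage_nanocores: float, limit_cores: float):
    """Return usage as a fraction of the CPU limit, or None if no limit is set."""
    if limit_cores <= 0:
        return None
    return nanocores_to_cores(usage_nanocores) / limit_cores


# Hypothetical sample values: 0.25 core used against a 0.5 core limit.
print(cpu_limit_utilization(250_000_000, 0.5))  # 0.5
```

Returning None when no limit is set avoids a division by zero for containers that declare no CPU limit.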

Kubelet

For more information, see the documentation for the Kubelet integration.


kubernetes.containers.last_state.terminated (gauge)	The number of containers that were previously terminated
kubernetes.pods.running (gauge)	The number of running pods
kubernetes.pods.expired (gauge)	The number of expired pods the check ignored
kubernetes.containers.running (gauge)	The number of running containers
kubernetes.containers.restarts (gauge)	The number of times the container has been restarted
kubernetes.containers.state.terminated (gauge)	The number of currently terminated containers
kubernetes.containers.state.waiting (gauge)	The number of currently waiting containers
kubernetes.cpu.load.10s.avg (gauge)	Container CPU load average over the last 10 seconds
kubernetes.cpu.system.total (gauge)	The number of cores used for system time Shown as core
kubernetes.cpu.user.total (gauge)	The number of cores used for user time Shown as core
kubernetes.cpu.cfs.periods (gauge)	Number of elapsed enforcement period intervals
kubernetes.cpu.cfs.throttled.periods (gauge)	Number of throttled period intervals
kubernetes.cpu.cfs.throttled.seconds (gauge)	Total time duration the container has been throttled
kubernetes.cpu.capacity (gauge)	The number of cores in this machine (available until Kubernetes v1.18) Shown as core
kubernetes.cpu.usage.total (gauge)	The number of cores used Shown as nanocore
kubernetes.cpu.limits (gauge)	The limit of CPU cores set Shown as core
kubernetes.cpu.requests (gauge)	The requested CPU cores Shown as core
kubernetes.filesystem.usage (gauge)	The amount of disk used Shown as byte
kubernetes.filesystem.usage_pct (gauge)	The percentage of disk used Shown as fraction
kubernetes.io.read_bytes (gauge)	The number of bytes read from the disk Shown as byte
kubernetes.io.write_bytes (gauge)	The number of bytes written to the disk Shown as byte
kubernetes.memory.capacity (gauge)	The amount of memory (in bytes) in this machine (available until Kubernetes v1.18) Shown as byte
kubernetes.memory.limits (gauge)	The limit of memory set Shown as byte
kubernetes.memory.sw_limit (gauge)	The limit of swap space set Shown as byte
kubernetes.memory.requests (gauge)	The requested memory Shown as byte
kubernetes.memory.usage (gauge)	Current memory usage in bytes including all memory regardless of when it was accessed Shown as byte
kubernetes.memory.working_set (gauge)	Current working set in bytes - this is what the OOM killer is watching for Shown as byte
kubernetes.memory.cache (gauge)	The amount of memory that is being used to cache data from disk (e.g. memory contents that can be associated precisely with a block on a block device) Shown as byte
kubernetes.memory.rss (gauge)	Size of RSS in bytes Shown as byte
kubernetes.memory.swap (gauge)	The amount of swap currently used by processes in this cgroup Shown as byte
kubernetes.memory.usage_pct (gauge)	The percentage of memory used per pod (memory limit must be set) Shown as fraction
kubernetes.memory.sw_in_use (gauge)	The percentage of swap space used Shown as fraction
kubernetes.network.rx_bytes (gauge)	The number of bytes per second received Shown as byte
kubernetes.network.rx_dropped (gauge)	The number of rx packets dropped per second Shown as packet
kubernetes.network.rx_errors (gauge)	The number of rx errors per second Shown as error
kubernetes.network.tx_bytes (gauge)	The number of bytes per second transmitted Shown as byte
kubernetes.network.tx_dropped (gauge)	The number of tx packets dropped per second Shown as packet
kubernetes.network.tx_errors (gauge)	The number of tx errors per second Shown as error
kubernetes.diskio.io_service_bytes.stats.total (gauge)	The amount of disk space the container uses Shown as byte
kubernetes.apiserver.certificate.expiration.count (gauge)	The count of remaining lifetime on the certificate used to authenticate a request Shown as second
kubernetes.apiserver.certificate.expiration.sum (gauge)	The sum of remaining lifetime on the certificate used to authenticate a request Shown as second
kubernetes.rest.client.requests (gauge)	The number of HTTP requests Shown as operation
kubernetes.rest.client.latency.count (gauge)	The count of request latency in seconds broken down by verb and URL
kubernetes.rest.client.latency.sum (gauge)	The sum of request latency in seconds broken down by verb and URL Shown as second
kubernetes.kubelet.pleg.discard_events (count)	The number of discard events in PLEG
kubernetes.kubelet.pleg.last_seen (gauge)	Timestamp in seconds when PLEG was last seen active Shown as second
kubernetes.kubelet.pleg.relist_duration.count (gauge)	The count of relisting pods in PLEG
kubernetes.kubelet.pleg.relist_duration.sum (gauge)	The sum of duration in seconds for relisting pods in PLEG Shown as second
kubernetes.kubelet.pleg.relist_interval.count (gauge)	The count of relisting pods in PLEG Shown as second
kubernetes.kubelet.pleg.relist_interval.sum (gauge)	The sum of interval in seconds between relisting in PLEG
kubernetes.kubelet.runtime.operations (count)	The number of runtime operations Shown as operation
kubernetes.kubelet.runtime.errors (gauge)	Cumulative number of runtime operations errors Shown as operation
kubernetes.kubelet.runtime.operations.duration.sum (gauge)	The sum of duration of operations Shown as operation
kubernetes.kubelet.runtime.operations.duration.count (gauge)	The count of operations
kubernetes.kubelet.network_plugin.latency.sum (gauge)	The sum of latency in microseconds of network plugin operations Shown as microsecond
kubernetes.kubelet.network_plugin.latency.count (gauge)	The count of network plugin operations by latency
kubernetes.kubelet.network_plugin.latency.quantile (gauge)	The quantiles of network plugin operations by latency
kubernetes.kubelet.volume.stats.available_bytes (gauge)	The number of available bytes in the volume Shown as byte
kubernetes.kubelet.volume.stats.capacity_bytes (gauge)	The capacity in bytes of the volume Shown as byte
kubernetes.kubelet.volume.stats.used_bytes (gauge)	The number of used bytes in the volume Shown as byte
kubernetes.kubelet.volume.stats.inodes (gauge)	The maximum number of inodes in the volume Shown as inode
kubernetes.kubelet.volume.stats.inodes_free (gauge)	The number of free inodes in the volume Shown as inode
kubernetes.kubelet.volume.stats.inodes_used (gauge)	The number of used inodes in the volume Shown as inode
kubernetes.ephemeral_storage.limits (gauge)	Ephemeral storage limit of the container (requires Kubernetes v1.8+) Shown as byte
kubernetes.ephemeral_storage.requests (gauge)	Ephemeral storage request of the container (requires Kubernetes v1.8+) Shown as byte
kubernetes.ephemeral_storage.usage (gauge)	Ephemeral storage usage of the pod Shown as byte
kubernetes.kubelet.evictions (count)	The number of pods that have been evicted from the kubelet (ALPHA in Kubernetes v1.16)
kubernetes.kubelet.cpu.usage (gauge)	The number of cores used by kubelet Shown as nanocore
kubernetes.kubelet.memory.usage (gauge)	Current kubelet memory usage in bytes Shown as byte
kubernetes.kubelet.memory.rss (gauge)	Size of kubelet RSS in bytes Shown as byte
kubernetes.runtime.cpu.usage (gauge)	The number of cores used by the runtime Shown as nanocore
kubernetes.runtime.memory.usage (gauge)	Current runtime memory usage in bytes Shown as byte
kubernetes.runtime.memory.rss (gauge)	Size of runtime RSS in bytes Shown as byte
kubernetes.kubelet.container.log_filesystem.used_bytes (gauge)	Bytes used by the container’s logs on the filesystem (requires Kubernetes v1.14+) Shown as byte
kubernetes.kubelet.pod.start.duration (gauge)	Duration in microseconds for a single pod to go from pending to running Shown as microsecond
kubernetes.kubelet.pod.worker.duration (gauge)	Duration in microseconds to sync a single pod. Broken down by operation type: create, update, or sync Shown as microsecond
kubernetes.kubelet.pod.worker.start.duration (gauge)	Duration in microseconds from seeing a pod to starting a worker Shown as microsecond
kubernetes.kubelet.docker.operations (count)	The number of docker operations Shown as operation
kubernetes.kubelet.docker.errors (count)	The number of docker operations errors Shown as operation
kubernetes.kubelet.docker.operations.duration.sum (gauge)	The sum of duration of docker operations Shown as operation
kubernetes.kubelet.docker.operations.duration.count (gauge)	The count of docker operations
kubernetes.go_threads (gauge)	Number of OS threads created
kubernetes.go_goroutines (gauge)	Number of goroutines that currently exist
kubernetes.liveness_probe.success.total (gauge)	Cumulative number of successful liveness probes for a container (ALPHA in Kubernetes v1.15)
kubernetes.liveness_probe.failure.total (gauge)	Cumulative number of failed liveness probes for a container (ALPHA in Kubernetes v1.15)
kubernetes.readiness_probe.success.total (gauge)	Cumulative number of successful readiness probes for a container (ALPHA in Kubernetes v1.15)
kubernetes.readiness_probe.failure.total (gauge)	Cumulative number of failed readiness probes for a container (ALPHA in Kubernetes v1.15)
kubernetes.startup_probe.success.total (gauge)	Cumulative number of successful startup probes for a container (ALPHA in Kubernetes v1.15)
kubernetes.startup_probe.failure.total (gauge)	Cumulative number of failed startup probes for a container (ALPHA in Kubernetes v1.15)
kubernetes.node.filesystem.usage (gauge)	The amount of disk used at node level Shown as byte
kubernetes.node.filesystem.usage_pct (gauge)	The percentage of disk space used at node level Shown as fraction
kubernetes.node.image.filesystem.usage (gauge)	The amount of disk used on image filesystem (node level) Shown as byte
kubernetes.node.image.filesystem.usage_pct (gauge)	The percentage of disk used (node level) Shown as fraction
kubernetes.pod.terminating.duration (gauge)	Amount of time the pod hangs in termination phase Shown as second
kubernetes.pod.resize.pending (gauge)	Number of pods with resource resize request in pending state
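
Several Kubelet metrics above come in `.sum` and `.count` pairs, for example kubernetes.rest.client.latency.sum with kubernetes.rest.client.latency.count, or kubernetes.kubelet.pleg.relist_duration.sum with kubernetes.kubelet.pleg.relist_duration.count. Assuming both values of a pair cover the same time window, dividing the sum by the count gives a mean per observation. The sketch below illustrates that arithmetic in Python; the helper name and sample values are hypothetical.

```python
def mean_from_sum_and_count(total: float, count: float):
    """Return the mean per observation, or None when nothing was observed."""
    if count <= 0:
        return None
    return total / count


# Hypothetical sample: 6.0 seconds of total latency across 24 requests.
print(mean_from_sum_and_count(6.0, 24))  # 0.25
```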

Kubernetes state metrics core

For more information, see the documentation for the Kubernetes state metrics core integration. This check requires Datadog Cluster Agent v1.12 or later.


kubernetes_state.apiservice.condition (gauge)	The current condition of this apiservice. Tags:`kube_namespace` `apiservice` `condition` `status`.
kubernetes_state.apiservice.count (gauge)	The current count of apiservices.
kubernetes_state.configmap.count (gauge)	Number of ConfigMaps. Requires ConfigMaps to be added to Cluster Agent collector. Tags: `kube_namespace`.
kubernetes_state.container.cpu_limit (gauge)	The value of CPU limit by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as cpu
kubernetes_state.container.cpu_limit.total (gauge)	The total value of CPU limits by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as cpu
kubernetes_state.container.cpu_requested (gauge)	The value of CPU requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as cpu
kubernetes_state.container.cpu_requested.total (gauge)	The total value of CPU requested by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as cpu
kubernetes_state.container.gpu_limit (gauge)	The value of GPU limit by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `mig_profile` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.gpu_limit.total (gauge)	The total value of GPU limits by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`.
kubernetes_state.container.gpu_requested (gauge)	The value of GPU requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `mig_profile` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.gpu_requested.total (gauge)	The total value of GPU requested by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`.
kubernetes_state.container.memory_limit (gauge)	The value of memory limit by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as byte
kubernetes_state.container.memory_limit.total (gauge)	The total value of memory limits by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as byte
kubernetes_state.container.memory_requested (gauge)	The value of memory requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as byte
kubernetes_state.container.memory_requested.total (gauge)	The total value of memory requested by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as byte
kubernetes_state.container.network_bandwidth_limit (gauge)	The value of network bandwidth limit for a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.network_bandwidth_requested (gauge)	The value of network bandwidth requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.ready (gauge)	Describes whether the container's readiness check succeeded. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.restarts (gauge)	The number of container restarts per container. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.running (gauge)	Describes whether the container is currently in running state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.status_report.count.terminated (gauge)	Describes the reason the container is currently in terminated state. Tags:`kube_namespace` `pod_name` `kube_container_name` `reason` (`env` `service` `version` from standard labels).
kubernetes_state.container.status_report.count.waiting (gauge)	Describes the reason the container is currently in waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` `reason` (`env` `service` `version` from standard labels).
kubernetes_state.container.terminated (gauge)	Describes whether the container is currently in terminated state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.waiting (gauge)	Describes whether the container is currently in waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.crd.condition (gauge)	The current condition of this custom resource definition. Tags: `customresourcedefinition` `condition` `status`.
kubernetes_state.crd.count (gauge)	Number of custom resource definitions.
kubernetes_state.cronjob.count (gauge)	Number of cronjobs. Tags:`kube_namespace`.
kubernetes_state.cronjob.duration_since_last_schedule (gauge)	The duration since the last time the cronjob was scheduled. Tags:`kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.cronjob.duration_since_last_successful (gauge)	The duration since the last time the cronjob was successfully scheduled. Tags:`kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.cronjob.spec_suspend (gauge)	Suspend flag tells the controller to suspend subsequent executions. Tags:`kube_namespace` `kube_cronjob` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.count (gauge)	Number of DaemonSets. Tags:`kube_namespace`.
kubernetes_state.daemonset.daemons_available (gauge)	The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.daemons_unavailable (gauge)	The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.desired (gauge)	The number of nodes that should be running the daemon pod. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.misscheduled (gauge)	The number of nodes that are running a daemon pod but are not supposed to. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.ready (gauge)	The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.scheduled (gauge)	The number of nodes that are running at least one daemon pod and are supposed to. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.updated (gauge)	The total number of nodes that are running an updated daemon pod. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.condition (gauge)	The current status conditions of a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.count (gauge)	Number of deployments. Tags:`kube_namespace`.
kubernetes_state.deployment.paused (gauge)	Whether the deployment is paused and will not be processed by the deployment controller. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas (gauge)	The number of replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_available (gauge)	The number of available replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_desired (gauge)	Number of desired pods for a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_ready (gauge)	The number of ready replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_unavailable (gauge)	The number of unavailable replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_updated (gauge)	The number of updated replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.rollingupdate.max_surge (gauge)	Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.rollingupdate.max_unavailable (gauge)	Maximum number of unavailable replicas during a rolling update of a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.rollout_duration (gauge)	Number of seconds since deployment rollout started. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels). Shown as second
kubernetes_state.endpoint.address_available (gauge)	Number of addresses available in endpoint. Tags:`endpoint` `kube_namespace`.
kubernetes_state.endpoint.address_not_ready (gauge)	Number of addresses not ready in endpoint. Tags:`endpoint` `kube_namespace`.
kubernetes_state.endpoint.count (gauge)	Number of endpoints. Tags:`kube_namespace`.
kubernetes_state.hpa.condition (gauge)	The condition of this autoscaler. Tags:`kube_namespace` `horizontalpodautoscaler` `condition` `status`.
kubernetes_state.hpa.count (gauge)	Number of horizontal pod autoscalers. Tags: `kube_namespace`.
kubernetes_state.hpa.current_replicas (gauge)	Current number of replicas of pods managed by this autoscaler. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.desired_replicas (gauge)	Desired number of replicas of pods managed by this autoscaler. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.max_replicas (gauge)	Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.min_replicas (gauge)	Lower limit for the number of pods that can be set by the autoscaler; defaults to 1. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.spec_target_metric (gauge)	The metric specifications used by this autoscaler when calculating the desired replica count. Tags:`kube_namespace` `horizontalpodautoscaler` `metric_name` `metric_target_type`.
kubernetes_state.hpa.status_target_metric (gauge)	The current metric status used by this autoscaler when calculating the desired replica count. Tags:`kube_namespace` `horizontalpodautoscaler` `metric_name` `metric_target_type`.
kubernetes_state.ingress.count (gauge)	Number of ingresses. Tags:`kube_namespace`.
kubernetes_state.ingress.path (gauge)	Information about the ingress path. Tags:`kube_namespace` `kube_ingress_path` `kube_ingress` `kube_service` `kube_service_port` `kube_ingress_host`.
kubernetes_state.initcontainer.cpu_limit (gauge)	Maximum number of CPUs a container can request. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels). Shown as cpu
kubernetes_state.initcontainer.cpu_requested (gauge)	Number of CPUs requested by the container. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels). Shown as cpu
kubernetes_state.initcontainer.memory_limit (gauge)	Maximum number of bytes a container can request. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels). Shown as byte
kubernetes_state.initcontainer.memory_requested (gauge)	Number of bytes of memory requested by the container. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels). Shown as byte
kubernetes_state.initcontainer.ready (gauge)	Indicates when the container is ready. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.initcontainer.restarts (gauge)	The number of restarts for the init container. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.initcontainer.running (gauge)	Indicates when the container is running. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.initcontainer.status_report.count.terminated (gauge)	Number of containers in a terminated state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.initcontainer.status_report.count.waiting (gauge)	Number of containers in a waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.initcontainer.waiting (gauge)	Describes whether the init container is currently in waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.job.completion.failed (gauge)	The job has failed its execution. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.completion.succeeded (gauge)	The job has completed its execution. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.count (gauge)	Number of jobs. Tags:`kube_namespace` `kube_cronjob`.
kubernetes_state.job.duration (gauge)	Time elapsed between the start and completion time of the job or the current time if the job is still running. Tags:`kube_job` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.failed (gauge)	The number of pods which reached Phase Failed. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.succeeded (gauge)	The number of pods which reached Phase Succeeded. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.limitrange.cpu.default (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.default_request (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.max (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.max_limit_request_ratio (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.min (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.memory.default (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.default_request (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.max (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.max_limit_request_ratio (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.min (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.namespace.count (gauge)	Number of namespaces. Tags:`phase`.
kubernetes_state.node.age (gauge)	The time in seconds since the creation of the node. Tags:`node`. Shown as second
kubernetes_state.node.by_condition (gauge)	The condition of a cluster node. Tags:`condition` `node` `status`.
kubernetes_state.node.count (gauge)	Number of nodes. Tags:`kernel_version` `os_image` `container_runtime_version` `kubelet_version`.
kubernetes_state.node.cpu_allocatable (gauge)	The allocatable CPU of a node that is available for scheduling. Tags:`node` `resource` `unit`. Shown as cpu
kubernetes_state.node.cpu_allocatable.total (gauge)	The total allocatable CPU of all nodes in the cluster that is available for scheduling. Shown as cpu
kubernetes_state.node.cpu_capacity (gauge)	The CPU capacity of a node. Tags:`node` `resource` `unit`. Shown as cpu
kubernetes_state.node.cpu_capacity.total (gauge)	The total CPU capacity of all nodes in the cluster. Shown as cpu
kubernetes_state.node.ephemeral_storage_allocatable (gauge)	The allocatable ephemeral-storage of a node that is available for scheduling. Tags:`node` `resource` `unit`.
kubernetes_state.node.ephemeral_storage_capacity (gauge)	The ephemeral-storage capacity of a node. Tags:`node` `resource` `unit`.
kubernetes_state.node.gpu_allocatable (gauge)	The allocatable GPU of a node that is available for scheduling. Tags:`node` `resource` `mig_profile` `unit`.
kubernetes_state.node.gpu_allocatable.total (gauge)	The total allocatable GPU of all nodes in the cluster that is available for scheduling.
kubernetes_state.node.gpu_capacity (gauge)	The GPU capacity of a node. Tags:`node` `resource` `mig_profile` `unit`.
kubernetes_state.node.gpu_capacity.total (gauge)	The total GPU capacity of all nodes in the cluster.
kubernetes_state.node.memory_allocatable (gauge)	The allocatable memory of a node that is available for scheduling. Tags:`node` `resource` `unit`. Shown as byte
kubernetes_state.node.memory_allocatable.total (gauge)	The total allocatable memory of all nodes in the cluster that is available for scheduling. Shown as byte
kubernetes_state.node.memory_capacity (gauge)	The memory capacity of a node. Tags:`node` `resource` `unit`. Shown as byte
kubernetes_state.node.memory_capacity.total (gauge)	The total memory capacity of all nodes in the cluster. Shown as byte
kubernetes_state.node.network_bandwidth_allocatable (gauge)	The allocatable network bandwidth of a node that is available for scheduling. Tags:`node` `resource` `unit`.
kubernetes_state.node.network_bandwidth_capacity (gauge)	The network bandwidth capacity of a node. Tags:`node` `resource` `unit`.
kubernetes_state.node.pods_allocatable (gauge)	The allocatable pods of a node that are available for scheduling. Tags:`node` `resource` `unit`.
kubernetes_state.node.pods_capacity (gauge)	The pods capacity of a node. Tags:`node` `resource` `unit`.
kubernetes_state.node.status (gauge)	Whether the node can schedule new pods. Tags:`node` `status`.
kubernetes_state.pdb.disruptions_allowed (gauge)	Number of pod disruptions that are currently allowed. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.pdb.pods_desired (gauge)	Minimum desired number of healthy pods. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.pdb.pods_healthy (gauge)	Current number of healthy pods. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.pdb.pods_total (gauge)	Total number of pods counted by this disruption budget. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.persistentvolume.by_phase (gauge)	The phase indicates whether a volume is available, bound to a claim, or released by a claim. Tags:`persistentvolume` `storageclass` `phase`.
kubernetes_state.persistentvolume.capacity (gauge)	Persistentvolume capacity in bytes. Tags:`persistentvolume` `storageclass`.
kubernetes_state.persistentvolumeclaim.access_mode (gauge)	The access mode(s) specified by the persistent volume claim. Tags:`kube_namespace` `persistentvolumeclaim` `access_mode` `storageclass`.
kubernetes_state.persistentvolumeclaim.request_storage (gauge)	The capacity of storage requested by the persistent volume claim. Tags:`kube_namespace` `persistentvolumeclaim` `storageclass`.
kubernetes_state.persistentvolumeclaim.status (gauge)	The phase the persistent volume claim is currently in. Tags:`kube_namespace` `persistentvolumeclaim` `phase` `storageclass`.
kubernetes_state.pod.age (gauge)	The time in seconds since the creation of the pod. Tags:`node` `kube_namespace` `pod_name` `pod_phase` (`env` `service` `version` from standard labels). Shown as second
kubernetes_state.pod.count (gauge)	Number of Pods. Tags:`node` `kube_namespace` `kube_<owner kind>`.
kubernetes_state.pod.ready (gauge)	Describes whether the pod is ready to serve requests. Tags:`node` `kube_namespace` `pod_name` `condition` (`env` `service` `version` from standard labels).
kubernetes_state.pod.scheduled (gauge)	Describes the status of the scheduling process for the pod. Tags:`node` `kube_namespace` `pod_name` `condition` (`env` `service` `version` from standard labels).
kubernetes_state.pod.status_phase (gauge)	The pod's current phase. Tags:`node` `kube_namespace` `pod_name` `pod_phase` (`env` `service` `version` from standard labels).
kubernetes_state.pod.tolerations (gauge)	Information about the pod tolerations
kubernetes_state.pod.unschedulable (gauge)	Describes the unschedulable status for the pod. Tags:`kube_namespace` `pod_name` (`env` `service` `version` from standard labels).
kubernetes_state.pod.uptime (gauge)	The time in seconds since the pod has been scheduled and acknowledged by the Kubelet. Tags:`node` `kube_namespace` `pod_name` `pod_phase` (`env` `service` `version` from standard labels).
kubernetes_state.pod.volumes.persistentvolumeclaims_readonly (gauge)	Describes whether a persistentvolumeclaim is mounted read only. Tags:`node` `kube_namespace` `pod_name` `volume` `persistentvolumeclaim` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.count (gauge)	Number of ReplicaSets. Tags:`kube_namespace` `kube_deployment`.
kubernetes_state.replicaset.fully_labeled_replicas (gauge)	The number of fully labeled replicas per ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.replicas (gauge)	The number of replicas per ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.replicas_desired (gauge)	Number of desired pods for a ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.replicas_ready (gauge)	The number of ready replicas per ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicationcontroller.fully_labeled_replicas (gauge)	The number of fully labeled replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas (gauge)	The number of replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas_available (gauge)	The number of available replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas_desired (gauge)	Number of desired pods for a ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas_ready (gauge)	The number of ready replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.resourcequota.count_configmaps.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.count_configmaps.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.count_secrets.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.count_secrets.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.pods.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.pods.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.requests.cpu.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.requests.cpu.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.secret.count (gauge)	Number of Secrets. Requires Secrets to be added to Cluster Agent collector. Tags: `kube_namespace`.
kubernetes_state.secret.type (gauge)	Type of the secret. Tags:`kube_namespace` `secret` `type`.
kubernetes_state.service.count (gauge)	Number of services. Tags:`kube_namespace` `type`.
kubernetes_state.service.type (gauge)	Service types. Tags:`kube_namespace` `kube_service` `type`.
kubernetes_state.statefulset.count (gauge)	Number of StatefulSets. Tags:`kube_namespace`.
kubernetes_state.statefulset.replicas (gauge)	The number of replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_current (gauge)	The number of current replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_desired (gauge)	Number of desired pods for a StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_ready (gauge)	The number of ready replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_updated (gauge)	The number of updated replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.vpa.count (gauge)	Number of vertical pod autoscalers. Tags: `kube_namespace`.
kubernetes_state.vpa.lower_bound (gauge)	Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.spec_container_maxallowed (gauge)	Maximum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.spec_container_minallowed (gauge)	Minimum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.target (gauge)	Target resources the VerticalPodAutoscaler recommends for the container. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.uncapped_target (gauge)	Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.update_mode (gauge)	Update mode of the VerticalPodAutoscaler. Tags:`kube_namespace` `verticalpodautoscaler` `target_api_version` `target_kind` `target_name` `update_mode`.
kubernetes_state.vpa.upperbound (gauge)	Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.

Note: You can configure Datadog Standard labels on your Kubernetes objects to get the `env`, `service`, and `version` tags.
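
As an illustration of the note above, Datadog's standard labels use the keys `tags.datadoghq.com/env`, `tags.datadoghq.com/service`, and `tags.datadoghq.com/version`. The Python sketch below builds that label mapping so it could be merged into an object's `metadata.labels`. The helper name `standard_labels`, the object name, and the sample values are hypothetical and only serve the example.

```python
from typing import Dict


def standard_labels(env: str, service: str, version: str) -> Dict[str, str]:
    """Return the Datadog standard labels for a Kubernetes object.

    The helper itself is illustrative; only the label keys follow
    Datadog's standard label convention.
    """
    return {
        "tags.datadoghq.com/env": env,
        "tags.datadoghq.com/service": service,
        "tags.datadoghq.com/version": version,
    }


# Hypothetical object metadata carrying the labels.
metadata = {
    "name": "web",
    "labels": standard_labels("prod", "web", "1.4.2"),
}
```

With these labels in place, metrics that list "(`env` `service` `version` from standard labels)" in their tags can be filtered by those three tags.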

Kubernetes state

Note: kubernetes_state.* metrics are gathered from the kube-state-metrics API. The kubernetes_state check is a legacy check. For an alternative, see Kubernetes state metrics core. Datadog recommends that you do not enable both checks simultaneously.


kubernetes_state.container.ready (gauge)	Whether the container's readiness check succeeded
kubernetes_state.container.running (gauge)	Whether the container is currently in running state
kubernetes_state.container.terminated (gauge)	Whether the container is currently in terminated state
kubernetes_state.container.status_report.count.terminated (gauge)	Count of the containers currently reporting a terminated state, with the reason as a tag
kubernetes_state.container.waiting (gauge)	Whether the container is currently in waiting state
kubernetes_state.container.status_report.count.waiting (gauge)	Count of the containers currently reporting a waiting state, with the reason as a tag
kubernetes_state.container.gpu.request (gauge)	The number of requested gpu devices by a container
kubernetes_state.container.gpu.limit (gauge)	The limit on gpu devices to be used by a container
kubernetes_state.container.restarts (gauge)	The number of restarts per container
kubernetes_state.container.cpu_requested (gauge)	The number of requested cpu cores by a container Shown as cpu
kubernetes_state.container.memory_requested (gauge)	The number of requested memory bytes by a container Shown as byte
kubernetes_state.container.cpu_limit (gauge)	The limit on cpu cores to be used by a container Shown as cpu
kubernetes_state.container.memory_limit (gauge)	The limit on memory to be used by a container Shown as byte
kubernetes_state.daemonset.scheduled (gauge)	The number of nodes that are running at least one daemon pod and are supposed to
kubernetes_state.daemonset.misscheduled (gauge)	The number of nodes running a daemon pod but are not supposed to
kubernetes_state.daemonset.desired (gauge)	The number of nodes that should be running the daemon pod
kubernetes_state.daemonset.ready (gauge)	The number of nodes that should be running the daemon pod and have one or more running and ready
kubernetes_state.daemonset.updated (gauge)	The number of nodes that run the updated daemon pod spec
kubernetes_state.deployment.count (gauge)	The number of deployments
kubernetes_state.deployment.replicas (gauge)	The number of replicas per deployment
kubernetes_state.deployment.replicas_available (gauge)	The number of available replicas per deployment
kubernetes_state.deployment.replicas_unavailable (gauge)	The number of unavailable replicas per deployment
kubernetes_state.deployment.replicas_updated (gauge)	The number of updated replicas per deployment
kubernetes_state.deployment.replicas_desired (gauge)	The number of desired replicas per deployment
kubernetes_state.deployment.paused (gauge)	Whether a deployment is paused
kubernetes_state.deployment.rollingupdate.max_unavailable (gauge)	Maximum number of unavailable replicas during a rolling update
kubernetes_state.endpoint.address_available (gauge)	Number of addresses available in endpoint
kubernetes_state.endpoint.address_not_ready (gauge)	Number of addresses not ready in endpoint
kubernetes_state.endpoint.created (gauge)	Unix creation timestamp
kubernetes_state.job.count (gauge)	The number of jobs
kubernetes_state.job.failed (count)	Observed number of failed pods in a job
kubernetes_state.job.succeeded (count)	Observed number of succeeded pods in a job
kubernetes_state.limitrange.cpu.min (gauge)	Minimum CPU request for this type
kubernetes_state.limitrange.cpu.max (gauge)	Maximum CPU limit for this type
kubernetes_state.limitrange.cpu.default (gauge)	Default CPU limit if not specified
kubernetes_state.limitrange.cpu.default_request (gauge)	Default CPU request if not specified
kubernetes_state.limitrange.cpu.max_limit_request_ratio (gauge)	Maximum CPU limit / request ratio
kubernetes_state.limitrange.memory.min (gauge)	Minimum memory request for this type
kubernetes_state.limitrange.memory.max (gauge)	Maximum memory limit for this type
kubernetes_state.limitrange.memory.default (gauge)	Default memory limit if not specified
kubernetes_state.limitrange.memory.default_request (gauge)	Default memory request if not specified
kubernetes_state.limitrange.memory.max_limit_request_ratio (gauge)	Maximum memory limit / request ratio
kubernetes_state.node.count (count)	The number of nodes Shown as node
kubernetes_state.node.cpu_capacity (gauge)	The total CPU resources of the node Shown as cpu
kubernetes_state.node.memory_capacity (gauge)	The total memory resources of the node Shown as byte
kubernetes_state.node.pods_capacity (gauge)	The total pod resources of the node
kubernetes_state.node.gpu.cards_allocatable (gauge)	The GPU resources of a node that are available for scheduling
kubernetes_state.node.gpu.cards_capacity (gauge)	The total GPU resources of the node
kubernetes_state.persistentvolumeclaim.status (gauge)	The phase the persistent volume claim is currently in
kubernetes_state.persistentvolumeclaim.request_storage (gauge)	Storage space request for a given pvc Shown as byte
kubernetes_state.persistentvolumes.by_phase (gauge)	Number of persistent volumes to sum by phase and storageclass
kubernetes_state.namespace.count (gauge)	The number of namespaces Shown as cpu
kubernetes_state.node.cpu_allocatable (gauge)	The CPU resources of a node that are available for scheduling Shown as cpu
kubernetes_state.node.memory_allocatable (gauge)	The memory resources of a node that are available for scheduling Shown as byte
kubernetes_state.node.pods_allocatable (gauge)	The pod resources of a node that are available for scheduling
kubernetes_state.node.status (gauge)	Submitted with a value of 1 for each node and tagged either ‘status:schedulable’ or ‘status:unschedulable’. Sum this metric by status to get the number of nodes in each status.
kubernetes_state.node.by_condition (gauge)	The condition of a cluster node
kubernetes_state.nodes.by_condition (gauge)	Sum by `condition` and `status` to get the number of nodes in a given condition.
kubernetes_state.hpa.min_replicas (gauge)	Lower limit for the number of pods that can be set by the autoscaler
kubernetes_state.hpa.max_replicas (gauge)	Upper limit for the number of pods that can be set by the autoscaler
kubernetes_state.hpa.desired_replicas (gauge)	Desired number of replicas of pods managed by this autoscaler
kubernetes_state.hpa.condition (gauge)	Observed condition of autoscalers; sum by condition and status
kubernetes_state.pdb.pods_desired (gauge)	Minimum desired number of healthy pods
kubernetes_state.pdb.disruptions_allowed (gauge)	Number of pod disruptions that are currently allowed
kubernetes_state.pdb.pods_healthy (gauge)	Current number of healthy pods
kubernetes_state.pdb.pods_total (gauge)	Total number of pods counted by this disruption budget
kubernetes_state.pod.ready (gauge)	In association with the `condition` tag, whether the pod is ready to serve requests, e.g. `condition:true` keeps the pods that are in a ready state
kubernetes_state.pod.scheduled (gauge)	Reports the status of the scheduling process for the pod with its tags
kubernetes_state.pod.unschedulable (gauge)	Reports pods that the Kubernetes scheduler cannot schedule on any node
kubernetes_state.pod.status_phase (gauge)	Sum by `phase` to get the number of pods in a given phase, and by `namespace` to break this down by namespace
kubernetes_state.replicaset.count (gauge)	The number of replicasets
kubernetes_state.replicaset.replicas (gauge)	The number of replicas per ReplicaSet
kubernetes_state.replicaset.fully_labeled_replicas (gauge)	The number of fully labeled replicas per ReplicaSet
kubernetes_state.replicaset.replicas_ready (gauge)	The number of ready replicas per ReplicaSet
kubernetes_state.replicaset.replicas_desired (gauge)	Number of desired pods for a ReplicaSet
kubernetes_state.replicationcontroller.replicas (gauge)	The number of replicas per ReplicationController
kubernetes_state.replicationcontroller.fully_labeled_replicas (gauge)	The number of fully labeled replicas per ReplicationController
kubernetes_state.replicationcontroller.replicas_ready (gauge)	The number of ready replicas per ReplicationController
kubernetes_state.replicationcontroller.replicas_desired (gauge)	Number of desired replicas for a ReplicationController
kubernetes_state.replicationcontroller.replicas_available (gauge)	The number of available replicas per ReplicationController
kubernetes_state.resourcequota.pods.used (gauge)	Observed number of pods used for a resource quota
kubernetes_state.resourcequota.services.used (gauge)	Observed number of services used for a resource quota
kubernetes_state.resourcequota.persistentvolumeclaims.used (gauge)	Observed number of persistent volume claims used for a resource quota
kubernetes_state.resourcequota.services.nodeports.used (gauge)	Observed number of node ports used for a resource quota
kubernetes_state.resourcequota.services.loadbalancers.used (gauge)	Observed number of loadbalancers used for a resource quota
kubernetes_state.resourcequota.requests.cpu.used (gauge)	Observed sum of CPU cores requested for a resource quota Shown as cpu
kubernetes_state.resourcequota.requests.memory.used (gauge)	Observed sum of memory bytes requested for a resource quota Shown as byte
kubernetes_state.resourcequota.requests.storage.used (gauge)	Observed sum of storage bytes requested for a resource quota Shown as byte
kubernetes_state.resourcequota.limits.cpu.used (gauge)	Observed sum of limits for CPU cores for a resource quota Shown as cpu
kubernetes_state.resourcequota.limits.memory.used (gauge)	Observed sum of limits for memory bytes for a resource quota Shown as byte
kubernetes_state.resourcequota.pods.limit (gauge)	Hard limit of the number of pods for a resource quota
kubernetes_state.resourcequota.services.limit (gauge)	Hard limit of the number of services for a resource quota
kubernetes_state.resourcequota.persistentvolumeclaims.limit (gauge)	Hard limit of the number of PVC for a resource quota
kubernetes_state.resourcequota.services.nodeports.limit (gauge)	Hard limit of the number of node ports for a resource quota
kubernetes_state.resourcequota.services.loadbalancers.limit (gauge)	Hard limit of the number of loadbalancers for a resource quota
kubernetes_state.resourcequota.requests.cpu.limit (gauge)	Hard limit on the total of CPU core requested for a resource quota Shown as cpu
kubernetes_state.resourcequota.requests.memory.limit (gauge)	Hard limit on the total of memory bytes requested for a resource quota Shown as byte
kubernetes_state.resourcequota.requests.storage.limit (gauge)	Hard limit on the total of storage bytes requested for a resource quota Shown as byte
kubernetes_state.resourcequota.limits.cpu.limit (gauge)	Hard limit on the sum of CPU core limits for a resource quota Shown as cpu
kubernetes_state.resourcequota.limits.memory.limit (gauge)	Hard limit on the sum of memory bytes limits for a resource quota Shown as byte
kubernetes_state.service.count (gauge)	Sum by namespace and type to count active services
kubernetes_state.statefulset.count (gauge)	The number of statefulsets
kubernetes_state.statefulset.replicas (gauge)	The number of replicas per statefulset
kubernetes_state.statefulset.replicas_desired (gauge)	The number of desired replicas per statefulset
kubernetes_state.statefulset.replicas_current (gauge)	The number of current replicas per StatefulSet
kubernetes_state.statefulset.replicas_ready (gauge)	The number of ready replicas per StatefulSet
kubernetes_state.statefulset.replicas_updated (gauge)	The number of updated replicas per StatefulSet
kubernetes_state.telemetry.payload.size (gauge)	The message size received from kube-state-metrics Shown as byte
kubernetes_state.telemetry.metrics.processed.count (count)	The number of metrics processed
kubernetes_state.telemetry.metrics.input.count (count)	The number of metrics received
kubernetes_state.telemetry.metrics.blacklist.count (count)	The number of metrics blacklisted by the check
kubernetes_state.telemetry.metrics.ignored.count (count)	The number of metrics ignored by the check
kubernetes_state.telemetry.collector.metrics.count (count)	The number of metrics by collector (kubernetes object kind) by kubernetes namespaces
kubernetes_state.vpa.lower_bound (gauge)	The vpa lower bound recommendation
kubernetes_state.vpa.target (gauge)	The vpa target recommendation
kubernetes_state.vpa.uncapped_target (gauge)	The vpa uncapped target recommendation
kubernetes_state.vpa.upperbound (gauge)	The vpa upper bound recommendation
kubernetes_state.vpa.update_mode (gauge)	The vpa update mode
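
Several metrics in this table, such as `kubernetes_state.node.status`, are submitted with a value of 1 per object and are meant to be summed by a tag. The Python sketch below shows that aggregation on made-up samples; the sample data and the helper name `sum_by_tag` are hypothetical and do not represent a Datadog API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical samples of kubernetes_state.node.status: (tags, value).
samples = [
    ({"node": "node-a", "status": "schedulable"}, 1),
    ({"node": "node-b", "status": "schedulable"}, 1),
    ({"node": "node-c", "status": "unschedulable"}, 1),
]


def sum_by_tag(
    points: Iterable[Tuple[Dict[str, str], int]], tag: str
) -> Dict[str, int]:
    """Sum metric values grouped by the value of one tag."""
    totals: Dict[str, int] = defaultdict(int)
    for tags, value in points:
        totals[tags[tag]] += value
    return dict(totals)


nodes_by_status = sum_by_tag(samples, "status")
# nodes_by_status == {"schedulable": 2, "unschedulable": 1}
```

The same grouping applies to metrics described as "sum by `condition` and `status`" or "sum by `phase`", with the relevant tag substituted.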

Kubernetes DNS


kubedns.cachemiss_count (gauge)	Number of DNS requests resulting in a cache miss. Shown as request
kubedns.cachemiss_count.count (count)	Instant number of DNS requests made resulting in a cache miss. Shown as request
kubedns.error_count (gauge)	Number of DNS requests resulting in an error. Shown as error
kubedns.error_count.count (count)	Instant number of DNS requests made resulting in an error. Shown as error
kubedns.request_count (gauge)	Total number of DNS requests made. Shown as request
kubedns.request_count.count (count)	Instant number of DNS requests made. Shown as request
kubedns.request_duration.seconds.count (gauge)	Number of requests on which the kubedns.request_duration.seconds.sum metric is evaluated. Shown as request
kubedns.request_duration.seconds.sum (gauge)	Time (in seconds) each request took to resolve. Shown as second
kubedns.response_size.bytes.count (gauge)	Number of responses on which the kubedns.response_size.bytes.sum metric is evaluated. Shown as response
kubedns.response_size.bytes.sum (gauge)	Size of the returned response in bytes. Shown as byte
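
The `.sum` and `.count` pairs above can be combined into an average: dividing `kubedns.request_duration.seconds.sum` by `kubedns.request_duration.seconds.count` gives a mean duration per request. The Python sketch below shows the arithmetic, assuming both values cover the same time window; the function name and the sample numbers are hypothetical.

```python
from typing import Optional


def mean_request_duration(
    duration_sum_seconds: float, request_count: float
) -> Optional[float]:
    """Return the mean seconds per request, or None when no requests were counted."""
    if request_count <= 0:
        return None
    return duration_sum_seconds / request_count


# Hypothetical values: 12.5 seconds spent across 500 requests.
average = mean_request_duration(12.5, 500)  # 0.025 seconds per request
```

The same division applies to `kubedns.response_size.bytes.sum` and `kubedns.response_size.bytes.count` to obtain a mean response size in bytes.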

Kubernetes proxy


kubeproxy.cpu.time (gauge)	Total user and system CPU time spent in seconds Shown as second
kubeproxy.mem.resident (gauge)	Resident memory size in bytes Shown as byte
kubeproxy.mem.virtual (gauge)	Virtual memory size in bytes Shown as byte
kubeproxy.rest.client.exec_plugin.certificate.rotation (gauge)	Histogram of the number of seconds the last auth exec plugin client certificate lived before being rotated. If auth exec plugin client certificates are unused, histogram will contain no data. Shown as second
kubeproxy.rest.client.exec_plugin.ttl (gauge)	Gauge of the shortest TTL (time-to-live) of the client certificate(s) managed by the auth exec plugin. The value is in seconds until certificate expiry (negative if already expired). If auth exec plugins are unused or manage no TLS certificates, the value will be +INF. (alpha) Shown as second
kubeproxy.rest.client.request.duration (gauge)	Request latency in seconds. Broken down by verb and URL. Shown as second
kubeproxy.rest.client.requests (gauge)	Number of HTTP requests partitioned by status code method and host Shown as request
kubeproxy.sync_proxy.rules.duration (gauge)	SyncProxyRules latency in seconds (alpha) Shown as second
kubeproxy.sync_proxy.rules.endpoint_changes.pending (gauge)	Pending proxy rules Endpoint changes (alpha)
kubeproxy.sync_proxy.rules.endpoint_changes.total (gauge)	Cumulative proxy rules Endpoint changes (alpha)
kubeproxy.sync_proxy.rules.iptables (gauge)	Number of proxy iptables rules programmed (alpha)
kubeproxy.sync_proxy.rules.iptables.restore_failures (gauge)	Cumulative proxy iptables restore failures (alpha)
kubeproxy.sync_proxy.rules.last_queued_timestamp (gauge)	The last time a sync of proxy rules was queued (alpha) Shown as second
kubeproxy.sync_proxy.rules.last_timestamp (gauge)	The last time proxy rules were successfully synced (alpha) Shown as second
kubeproxy.sync_proxy.rules.latency.count (gauge)	SyncProxyRules latency count (alpha)
kubeproxy.sync_proxy.rules.latency.sum (gauge)	SyncProxyRules latency sum (alpha) Shown as microsecond
kubeproxy.sync_proxy.rules.service_changes.pending (gauge)	Pending proxy rules Service changes (alpha)
kubeproxy.sync_proxy.rules.service_changes.total (gauge)	Cumulative proxy rules Service changes (alpha)

Kubernetes API server

For more information, see the documentation for the Kubernetes API server integration.


kube_apiserver.APIServiceRegistrationController_depth (gauge)	The current depth of workqueue: APIServiceRegistrationController
kube_apiserver.admission_controller_admission_duration_seconds.count (count)	The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_controller_admission_duration_seconds.sum (gauge)	The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit) Shown as second
kube_apiserver.admission_step_admission_latencies_seconds.count (count)	The admission sub-step latency histogram broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds.sum (gauge)	The admission sub-step latency broken out for each operation and API resource and step type (validate or admit) Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.count (count)	The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds_summary.quantile (gauge)	The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) quantile Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.sum (gauge)	The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) Shown as second
kube_apiserver.admission_webhook_admission_latencies_seconds.count (count)	The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_webhook_admission_latencies_seconds.sum (gauge)	The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit) Shown as second
kube_apiserver.aggregator_unavailable_apiservice (gauge)	Gauge of APIServices which are marked as unavailable broken down by APIService name (alpha; Kubernetes 1.14+)
kube_apiserver.apiserver_admission_webhook_fail_open_count (gauge)	Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_admission_webhook_fail_open_count.count (count)	Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_admission_webhook_request_total (gauge)	Admission webhook request total, identified by name and broken out for each admission type (alpha; Kubernetes 1.23+)
kube_apiserver.apiserver_admission_webhook_request_total.count (count)	Admission webhook request total, identified by name and broken out for each admission type (alpha; Kubernetes 1.23+)
kube_apiserver.apiserver_dropped_requests_total (gauge)	The accumulated number of requests dropped with ‘Try again later’ response Shown as request
kube_apiserver.apiserver_dropped_requests_total.count (count)	The monotonic count of requests dropped with ‘Try again later’ response Shown as request
kube_apiserver.apiserver_request_count (gauge)	The accumulated number of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (deprecated in Kubernetes 1.15) Shown as request
kube_apiserver.apiserver_request_count.count (count)	The monotonic count of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (deprecated in Kubernetes 1.15) Shown as request
kube_apiserver.apiserver_request_terminations_total.count (count)	The number of requests the apiserver terminated in self-defense (Kubernetes 1.17+) Shown as request
kube_apiserver.apiserver_request_total (gauge)	The accumulated number of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count) Shown as request
kube_apiserver.apiserver_request_total.count (count)	The monotonic count of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count.count) Shown as request
kube_apiserver.audit_event (gauge)	The accumulated number of audit events generated and sent to the audit backend Shown as event
kube_apiserver.audit_event.count (count)	The monotonic count of audit events generated and sent to the audit backend Shown as event
kube_apiserver.authenticated_user_requests (gauge)	The accumulated number of authenticated requests broken out by username Shown as request
kube_apiserver.authenticated_user_requests.count (count)	The monotonic count of authenticated requests broken out by username Shown as request
kube_apiserver.authentication_attempts.count (count)	The counter of authenticated attempts (Kubernetes 1.16+) Shown as request
kube_apiserver.authentication_duration_seconds.count (count)	The authentication duration histogram broken out by result (Kubernetes 1.17+)
kube_apiserver.authentication_duration_seconds.sum (gauge)	The authentication duration histogram broken out by result (Kubernetes 1.17+) Shown as second
kube_apiserver.current_inflight_requests (gauge)	The maximal number of currently used inflight request limit of this apiserver per request kind in last second.
kube_apiserver.envelope_encryption_dek_cache_fill_percent (gauge)	Percent of the cache slots currently occupied by cached DEKs.
kube_apiserver.etcd.db.total_size (gauge)	The total size of the etcd database file physically allocated in bytes (alpha; Kubernetes 1.19+) Shown as byte
kube_apiserver.etcd_object_counts (gauge)	The number of stored objects at the time of last check split by kind (alpha; deprecated in Kubernetes 1.22) Shown as object
kube_apiserver.etcd_request_duration_seconds.count (count)	Etcd request latencies count for each operation and object type (alpha)
kube_apiserver.etcd_request_duration_seconds.sum (gauge)	Etcd request latencies for each operation and object type (alpha) Shown as second
kube_apiserver.etcd_request_errors_total (count)	Etcd failed request counts for each operation and object type Shown as request
kube_apiserver.etcd_requests_total (count)	Etcd request counts for each operation and object type Shown as request
kube_apiserver.flowcontrol_current_executing_requests (gauge)	Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_current_executing_seats (gauge)	Number of seats (concurrency units) currently occupied by executing requests in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_current_inqueue_requests (count)	Number of requests currently pending in queues of the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_dispatched_requests_total (count)	Number of requests executed by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_nominal_limit_seats (gauge)	Nominal limit on the number of execution seats available to requests in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_rejected_requests_total.count (count)	Number of requests rejected by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_concurrency_limit (gauge)	Shared concurrency limit in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_wait_duration_seconds.count (count)	The request wait duration histogram count in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_wait_duration_seconds.sum (gauge)	The request wait duration histogram sum in the API Priority and Fairness subsystem Shown as second
kube_apiserver.go_goroutines (gauge)	The number of goroutines that currently exist
kube_apiserver.go_threads (gauge)	The number of OS threads created Shown as thread
kube_apiserver.grpc_client_handled_total (count)	The total number of RPCs completed by the client regardless of success or failure Shown as request
kube_apiserver.grpc_client_msg_received_total (count)	The total number of gRPC stream messages received by the client Shown as message
kube_apiserver.grpc_client_msg_sent_total (count)	The total number of gRPC stream messages sent by the client Shown as message
kube_apiserver.grpc_client_started_total (count)	The total number of RPCs started on the client Shown as request
kube_apiserver.http_requests_total (gauge)	The accumulated number of HTTP requests made Shown as request
kube_apiserver.http_requests_total.count (count)	The monotonic count of the number of HTTP requests made Shown as request
kube_apiserver.kubernetes_feature_enabled (gauge)	Whether a Kubernetes feature gate is enabled or not, identified by name and stage (alpha; Kubernetes 1.26+)
kube_apiserver.longrunning_gauge (gauge)	The gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope, and component. Not all requests are tracked this way. Shown as request
kube_apiserver.process_cpu_total (count)	Total user and system CPU time spent in seconds. Shown as second
kube_apiserver.process_resident_memory_bytes (gauge)	The resident memory size in bytes Shown as byte
kube_apiserver.process_virtual_memory_bytes (gauge)	The virtual memory size in bytes Shown as byte
kube_apiserver.registered_watchers (gauge)	The number of currently registered watchers for a given resource Shown as object
kube_apiserver.request_duration_seconds.count (count)	The count of observations in the response latency distribution (in seconds) for each verb, dry run value, group, version, resource, subresource, scope, and component
kube_apiserver.request_duration_seconds.sum (gauge)	The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component Shown as second
kube_apiserver.request_latencies.count (count)	The count of observations in the response latency distribution (in microseconds) for each verb, resource, and subresource
kube_apiserver.request_latencies.sum (gauge)	The response latency distribution in microseconds for each verb, resource and subresource Shown as microsecond
kube_apiserver.requested_deprecated_apis (gauge)	Gauge of deprecated APIs that have been requested, broken out by API group, version, resource, subresource, and removed_release Shown as request
kube_apiserver.rest_client_request_latency_seconds.count (count)	The count of request latency observations (in seconds) broken down by verb and URL
kube_apiserver.rest_client_request_latency_seconds.sum (gauge)	The request latency in seconds broken down by verb and URL Shown as second
kube_apiserver.rest_client_requests_total (gauge)	The accumulated number of HTTP requests partitioned by status code, method, and host Shown as request
kube_apiserver.rest_client_requests_total.count (count)	The monotonic count of HTTP requests partitioned by status code, method, and host Shown as request
kube_apiserver.slis.kubernetes_healthcheck (gauge)	Result of a single kubernetes apiserver healthcheck (alpha; requires k8s v1.26+)
kube_apiserver.slis.kubernetes_healthcheck_total (count)	The monotonic count of all kubernetes apiserver healthchecks (alpha; requires k8s v1.26+)
kube_apiserver.storage_list_evaluated_objects_total (gauge)	The number of objects tested in the course of serving a LIST request from storage (alpha; Kubernetes 1.23+) Shown as object
kube_apiserver.storage_list_fetched_objects_total (gauge)	The number of objects read from storage in the course of serving a LIST request (alpha; Kubernetes 1.23+) Shown as object
kube_apiserver.storage_list_returned_objects_total (gauge)	The number of objects returned for a LIST request from storage (alpha; Kubernetes 1.23+) Shown as object
kube_apiserver.storage_list_total (gauge)	The number of LIST requests served from storage (alpha; Kubernetes 1.23+) Shown as object
kube_apiserver.storage_objects (gauge)	The number of stored objects at the time of last check split by kind (Kubernetes 1.21+; replaces etcd_object_counts) Shown as object
kube_apiserver.watch_events_sizes.count (count)	The watch event size distribution (Kubernetes 1.16+)
kube_apiserver.watch_events_sizes.sum (gauge)	The watch event size distribution (Kubernetes 1.16+) Shown as byte
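
Several entries above come in `.sum` and `.count` pairs (for example, kube_apiserver.request_duration_seconds). A common way to read such a pair is as an average over an interval. The following is a minimal sketch, assuming both values are cumulative totals as exposed on the API server's Prometheus endpoint; the function name is hypothetical, and if the values you query are already per-interval deltas, only the final division applies.

```python
from typing import Optional


def mean_over_interval(sum_start: float, count_start: float,
                       sum_end: float, count_end: float) -> Optional[float]:
    """Mean observed value between two cumulative (sum, count) samples.

    Returns None when no new observations were recorded or when the
    totals went backwards (for example, after a process restart).
    """
    delta_count = count_end - count_start
    delta_sum = sum_end - sum_start
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


# 30 new requests took 6 seconds in total: 0.2 seconds on average.
print(mean_over_interval(120.0, 400, 126.0, 430))
```

Returning None instead of zero keeps an idle interval or a counter reset from being read as "zero latency".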

Kubernetes controller manager

For more information, see the documentation for the Kubernetes controller manager integration.


kube_controller_manager.goroutines (gauge)	Number of goroutines that currently exist
kube_controller_manager.job_controller.terminated_pods_tracking_finalizer (count)	Used to monitor whether the job controller is removing Pod finalizers from terminated Pods after accounting for them in Job status
kube_controller_manager.leader_election.lease_duration (gauge)	Duration of the leadership lease
kube_controller_manager.leader_election.transitions (count)	Number of leadership transitions observed
kube_controller_manager.max_fds (gauge)	Maximum allowed open file descriptors
kube_controller_manager.nodes.count (gauge)	Number of registered nodes, per zone
kube_controller_manager.nodes.evictions (count)	Count of node eviction events, per zone
kube_controller_manager.nodes.unhealthy (gauge)	Number of unhealthy nodes, per zone
kube_controller_manager.open_fds (gauge)	Number of open file descriptors
kube_controller_manager.queue.adds (count)	Elements added, by queue
kube_controller_manager.queue.depth (gauge)	Current depth, by queue
kube_controller_manager.queue.latency.count (gauge)	Processing latency count, by queue (deprecated in kubernetes v1.14)
kube_controller_manager.queue.latency.quantile (gauge)	Processing latency quantiles, by queue (deprecated in kubernetes v1.14) Shown as microsecond
kube_controller_manager.queue.latency.sum (gauge)	Processing latency sum, by queue (deprecated in kubernetes v1.14) Shown as microsecond
kube_controller_manager.queue.process_duration.count (gauge)	How long processing an item from workqueue takes, by queue
kube_controller_manager.queue.process_duration.sum (gauge)	Total workqueue processing time, by queue Shown as second
kube_controller_manager.queue.queue_duration.count (gauge)	How long an item stays in a queue before being requested, by queue
kube_controller_manager.queue.queue_duration.sum (gauge)	Total time items stay in a queue before being requested, by queue Shown as second
kube_controller_manager.queue.retries (count)	Retries handled, by queue
kube_controller_manager.queue.work_duration.count (gauge)	Work duration, by queue (deprecated in kubernetes v1.14)
kube_controller_manager.queue.work_duration.quantile (gauge)	Work duration quantiles, by queue (deprecated in kubernetes v1.14) Shown as microsecond
kube_controller_manager.queue.work_duration.sum (gauge)	Work duration sum, by queue (deprecated in kubernetes v1.14) Shown as microsecond
kube_controller_manager.queue.work_longest_duration (gauge)	How many seconds the longest-running processor has been running, by queue Shown as second
kube_controller_manager.queue.work_unfinished_duration (gauge)	How many seconds of work are in progress and haven’t yet been observed by process_duration, by queue Shown as second
kube_controller_manager.rate_limiter.use (gauge)	Usage of the rate limiter, by limiter
kube_controller_manager.slis.kubernetes_healthcheck (gauge)	Result of a single controller manager healthcheck (alpha; requires k8s v1.26+)
kube_controller_manager.slis.kubernetes_healthcheck_total (count)	Cumulative results of all controller manager healthchecks (alpha; requires k8s v1.26+)
kube_controller_manager.threads (gauge)	Number of OS threads created
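
The open_fds and max_fds gauges above are often easier to interpret as a ratio. The following is a minimal sketch, assuming both values were read for the same process at the same time; the function name is hypothetical and not part of the integration.

```python
from typing import Optional


def fd_utilization(open_fds: float, max_fds: float) -> Optional[float]:
    """Fraction of the file descriptor limit currently in use."""
    if max_fds <= 0:
        # No usable limit was reported, so the ratio is undefined.
        return None
    return open_fds / max_fds


# 256 open descriptors out of a limit of 1024.
print(fd_utilization(256, 1024))  # 0.25
```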

Kubernetes metrics server

For more information, see the documentation for the Kubernetes metrics server integration.


kube_metrics_server.authenticated_user.requests (count)	Counter of authenticated requests broken out by username
kube_metrics_server.go.gc_duration_seconds.count (gauge)	Number of GC invocations
kube_metrics_server.go.gc_duration_seconds.quantile (gauge)	GC invocation duration quantiles
kube_metrics_server.go.gc_duration_seconds.sum (gauge)	GC invocation duration sum
kube_metrics_server.go.goroutines (gauge)	Number of goroutines that currently exist
kube_metrics_server.kubelet_summary_request_duration.count (gauge)	Number of Kubelet summary requests
kube_metrics_server.kubelet_summary_request_duration.sum (gauge)	The Kubelet summary request latencies sum
kube_metrics_server.kubelet_summary_scrapes_total (count)	Total number of attempted Summary API scrapes done by Metrics Server
kube_metrics_server.manager_tick_duration.count (gauge)	The number of observations of time spent collecting and storing metrics
kube_metrics_server.manager_tick_duration.sum (gauge)	The total time spent collecting and storing metrics
kube_metrics_server.process.max_fds (gauge)	Maximum number of open file descriptors
kube_metrics_server.process.open_fds (gauge)	Number of open file descriptors
kube_metrics_server.scraper_duration.count (gauge)	Number of observations of time spent scraping sources
kube_metrics_server.scraper_duration.sum (gauge)	Time spent scraping sources
kube_metrics_server.scraper_last_time (gauge)	Time of the last scrape performed by metrics-server, measured since the Unix epoch
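
Because kube_metrics_server.scraper_last_time is a timestamp rather than a duration, it is usually converted to an age before alerting on it. The following is a minimal sketch, assuming the value is expressed in seconds since the Unix epoch (the unit is not stated above); the function name is hypothetical.

```python
import time
from typing import Optional


def seconds_since_last_scrape(scraper_last_time: float,
                              now: Optional[float] = None) -> float:
    """Age of the last scrape in seconds, never negative."""
    current = time.time() if now is None else now
    # Clamp at zero so small clock differences do not produce a negative age.
    return max(0.0, current - scraper_last_time)


# A scrape recorded 45 seconds before the reference time.
print(seconds_since_last_scrape(1_700_000_000.0, now=1_700_000_045.0))  # 45.0
```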

Kubernetes scheduler

For more information, see the documentation for the Kubernetes scheduler integration.


kube_scheduler.binding_duration.count (gauge)	Number of binding latency observations
kube_scheduler.binding_duration.sum (gauge)	Total binding latency in seconds
kube_scheduler.cache.lookups (count)	Number of equivalence cache lookups, by whether or not a cache entry was found
kube_scheduler.client.http.requests (count)	Number of HTTP requests, partitioned by status code, method, and host
kube_scheduler.client.http.requests_duration.count (gauge)	Number of client requests. Broken down by verb and URL
kube_scheduler.client.http.requests_duration.sum (gauge)	Total latency. Broken down by verb and URL
kube_scheduler.gc_duration_seconds.count (gauge)	Number of GC invocations
kube_scheduler.gc_duration_seconds.quantile (gauge)	GC invocation duration quantiles
kube_scheduler.gc_duration_seconds.sum (gauge)	GC invocation duration sum
kube_scheduler.goroutine_by_scheduling_operation (gauge)	Number of running goroutines split by the work they do such as binding (alpha; requires k8s v1.26+)
kube_scheduler.goroutines (gauge)	Number of goroutines that currently exist
kube_scheduler.max_fds (gauge)	Maximum allowed open file descriptors
kube_scheduler.open_fds (gauge)	Number of open file descriptors
kube_scheduler.pending_pods (gauge)	Number of pending pods, by the queue type (requires k8s v1.15+)
kube_scheduler.pod_preemption.attempts (count)	Number of preemption attempts in the cluster so far
kube_scheduler.pod_preemption.victims.count (gauge)	Number of selected pods during the latest preemption round
kube_scheduler.pod_preemption.victims.sum (gauge)	Total selected pods during the latest preemption round
kube_scheduler.queue.incoming_pods (count)	Number of pods added to scheduling queues by event and queue type (requires k8s v1.17+)
kube_scheduler.schedule_attempts (gauge)	Number of attempts to schedule pods, by the result. ‘unschedulable’ means a pod could not be scheduled, while ‘error’ means an internal scheduler problem.
kube_scheduler.scheduling.algorithm.predicate_duration.count (gauge)	Number of scheduling algorithm predicate evaluations
kube_scheduler.scheduling.algorithm.predicate_duration.sum (gauge)	Total scheduling algorithm predicate evaluation duration
kube_scheduler.scheduling.algorithm.preemption_duration.count (gauge)	Number of scheduling algorithm preemption evaluations
kube_scheduler.scheduling.algorithm.preemption_duration.sum (gauge)	Total scheduling algorithm preemption evaluation duration
kube_scheduler.scheduling.algorithm.priority_duration.count (gauge)	Number of scheduling algorithm priority evaluations
kube_scheduler.scheduling.algorithm.priority_duration.sum (gauge)	Total scheduling algorithm priority evaluation duration
kube_scheduler.scheduling.algorithm_duration.count (gauge)	Number of scheduling algorithm latency observations
kube_scheduler.scheduling.algorithm_duration.sum (gauge)	Total scheduling algorithm latency
kube_scheduler.scheduling.attempt_duration.count (gauge)	Number of scheduling attempt latency observations (scheduling algorithm + binding) (requires k8s v1.23+)
kube_scheduler.scheduling.attempt_duration.sum (gauge)	Total scheduling attempt latency in seconds (scheduling algorithm + binding) (requires k8s v1.23+)
kube_scheduler.scheduling.e2e_scheduling_duration.count (gauge)	Number of end-to-end scheduling latency observations (scheduling algorithm + binding)
kube_scheduler.scheduling.e2e_scheduling_duration.sum (gauge)	Total end-to-end scheduling latency (scheduling algorithm + binding)
kube_scheduler.scheduling.pod.scheduling_attempts.count (gauge)	Number of attempts to successfully schedule a pod (requires k8s v1.23+)
kube_scheduler.scheduling.pod.scheduling_attempts.sum (gauge)	Total number of attempts to successfully schedule a pod (requires k8s v1.23+)
kube_scheduler.scheduling.pod.scheduling_duration.count (gauge)	Number of end-to-end latency observations for a pod being scheduled, which may include multiple scheduling attempts (requires k8s v1.23+)
kube_scheduler.scheduling.pod.scheduling_duration.sum (gauge)	Total end-to-end latency for a pod being scheduled, which may include multiple scheduling attempts (requires k8s v1.23+)
kube_scheduler.scheduling.scheduling_duration.count (gauge)	Number of scheduling latency observations split by sub-parts of the scheduling operation
kube_scheduler.scheduling.scheduling_duration.quantile (gauge)	Scheduling latency quantiles split by sub-parts of the scheduling operation
kube_scheduler.scheduling.scheduling_duration.sum (gauge)	Total scheduling latency split by sub-parts of the scheduling operation
kube_scheduler.slis.kubernetes_healthcheck (gauge)	Result of a single scheduler healthcheck (alpha; requires k8s v1.26+)
kube_scheduler.slis.kubernetes_healthcheck_total (count)	Cumulative results of all scheduler healthchecks (alpha; requires k8s v1.26+)
kube_scheduler.threads (gauge)	Number of OS threads created
kube_scheduler.volume_scheduling_duration.count (gauge)	Number of volume scheduling stage latency observations
kube_scheduler.volume_scheduling_duration.sum (gauge)	Total volume scheduling stage latency
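
kube_scheduler.schedule_attempts is broken out by result, so the share of attempts that did not place a pod can be derived from it. The following is a minimal sketch, assuming the per-result values have already been collected into a dictionary; the function name and the ‘scheduled’ key are assumptions for illustration, while ‘unschedulable’ and ‘error’ are the results named in the description above.

```python
from typing import Dict, Optional


def failed_attempt_ratio(attempts_by_result: Dict[str, float]) -> Optional[float]:
    """Fraction of scheduling attempts that ended as unschedulable or error."""
    total = sum(attempts_by_result.values())
    if total <= 0:
        # No attempts were recorded, so the ratio is undefined.
        return None
    failed = (attempts_by_result.get("unschedulable", 0)
              + attempts_by_result.get("error", 0))
    return failed / total


# Hypothetical sample: 100 attempts, 10 of which did not place a pod.
sample = {"scheduled": 90, "unschedulable": 8, "error": 2}
print(failed_attempt_ratio(sample))  # 0.1
```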

Events

Backoff
Conflict
Delete
DeletingAllPods
Didn’t have enough resource
Error
Failed
FailedCreate
FailedDelete
FailedMount
FailedSync
Failedvalidation
FreeDiskSpaceFailed
HostPortConflict
InsufficientFreeCPU
InsufficientFreeMemory
InvalidDiskCapacity
Killing
KubeletsetupFailed
NodeNotReady
NodeoutofDisk
OutofDisk
Rebooted
TerminatedAllPods
Unable
Unhealthy

Service checks

Kubelet

For more information, see the documentation for the Kubelet integration.

kubernetes.kubelet.check.ping

Returns CRITICAL if the Kubelet doesn’t respond to ping. Returns OK otherwise.

Statuses: ok, critical

kubernetes.kubelet.check.docker

Returns CRITICAL if the Docker service isn’t running on the Kubelet. Returns OK otherwise.

Statuses: ok, critical

kubernetes.kubelet.check.syncloop

Returns CRITICAL if the syncloop health check is down. Returns OK otherwise.

Statuses: ok, critical

kubernetes.kubelet.check

Returns CRITICAL if the overall Kubelet health check is down. Returns OK otherwise.

Statuses: ok, critical

Kubernetes controller manager

For more information, see the documentation for the Kubernetes controller manager integration.

kube_controller_manager.prometheus.health

Returns CRITICAL if the check cannot access the metrics endpoint.

Statuses: ok, critical

kube_controller_manager.leader_election.status

Returns CRITICAL if no replica is currently set as leader.

Statuses: ok, critical

kube_controller_manager.up

Returns CRITICAL if Kube Controller Manager is not healthy.

Statuses: ok, critical

Kubernetes metrics server

For more information, see the documentation for the Kubernetes metrics server integration.

kube_metrics_server.prometheus.health

Returns CRITICAL if the check cannot access the metrics endpoint.

Statuses: ok, critical

kube_metrics_server.up

Returns CRITICAL if Kubernetes Metrics Server is not healthy.

Statuses: ok, critical

Kubernetes scheduler

For more information, see the documentation for the Kubernetes scheduler integration.

kube_scheduler.prometheus.health

Returns CRITICAL if the check cannot access the metrics endpoint.

Statuses: ok, critical

kube_scheduler.leader_election.status

Returns CRITICAL if no replica is currently set as leader.

Statuses: ok, critical

kube_scheduler.up

Returns CRITICAL if Kube Scheduler is not healthy.

Statuses: ok, critical

Kubernetes state metrics core

For more information, see the documentation for the Kubernetes state metrics core integration.

kubernetes_state.cronjob.complete: Whether the last job of the cronjob failed. Tags: kube_cronjob, kube_namespace (env, service, version from standard labels).
kubernetes_state.cronjob.on_schedule_check: Alerts if the cronjob’s next schedule is in the past. Tags: kube_cronjob, kube_namespace (env, service, version from standard labels).
kubernetes_state.job.complete: Whether the job failed. Tags: kube_job or kube_cronjob, kube_namespace (env, service, version from standard labels).
kubernetes_state.node.ready: Whether the node is ready. Tags: node, condition, status.
kubernetes_state.node.out_of_disk: Whether the node is out of disk. Tags: node, condition, status.
kubernetes_state.node.disk_pressure: Whether the node is under disk pressure. Tags: node, condition, status.
kubernetes_state.node.network_unavailable: Whether the node network is unavailable. Tags: node, condition, status.
kubernetes_state.node.memory_pressure: Whether the node is under memory pressure. Tags: node, condition, status.
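
The rule described for kubernetes_state.cronjob.on_schedule_check (the next schedule is in the past) can be sketched as a simple timestamp comparison. This is an illustration only, assuming timezone-aware timestamps and the two statuses ok and critical; the function name is hypothetical, and the actual check may apply additional logic that is not described on this page.

```python
from datetime import datetime, timezone


def on_schedule_status(next_schedule: datetime, now: datetime) -> str:
    """Return 'critical' when the next scheduled run is already in the past."""
    return "critical" if next_schedule < now else "ok"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
overdue = datetime(2024, 1, 1, 11, 55, tzinfo=timezone.utc)
upcoming = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)

print(on_schedule_status(overdue, now))   # critical
print(on_schedule_status(upcoming, now))  # ok
```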