Kubernetes Data Collected

This page lists data collected by the Datadog Agent when deployed on a Kubernetes cluster.

The set of metrics collected may vary depending on the version of Kubernetes in use.

Metrics

Kubernetes

kubernetes.cpu.capacity
(gauge)
The number of cores in this machine
Shown as core
kubernetes.cpu.limits
(gauge)
The limit of cpu cores set
Shown as core
kubernetes.cpu.requests
(gauge)
The requested cpu cores
Shown as core
kubernetes.cpu.usage.total
(gauge)
The number of cores used
Shown as nanocore
kubernetes.diskio.io_service_bytes.stats.total
(gauge)
The amount of disk space the container uses.
Shown as byte
kubernetes.filesystem.usage
(gauge)
The amount of disk used. Requires Docker container runtime.
Shown as byte
kubernetes.filesystem.usage_pct
(gauge)
The percentage of disk used. Requires Docker container runtime.
Shown as fraction
kubernetes.memory.capacity
(gauge)
The amount of memory (in bytes) in this machine
Shown as byte
kubernetes.memory.limits
(gauge)
The limit of memory set
Shown as byte
kubernetes.memory.requests
(gauge)
The requested memory
Shown as byte
kubernetes.memory.usage
(gauge)
The amount of memory used
Shown as byte
kubernetes.network.rx_bytes
(gauge)
The amount of bytes per second received
Shown as byte
kubernetes.network.tx_bytes
(gauge)
The amount of bytes per second transmitted
Shown as byte
kubernetes.network_errors
(gauge)
The amount of network errors per second
Shown as error
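
Note that kubernetes.cpu.usage.total is reported in nanocores, while kubernetes.cpu.limits and kubernetes.cpu.requests are reported in cores. The Python sketch below shows one way to put them on the same scale before comparing them. The function names and sample values are hypothetical and are not part of the Datadog Agent or API.

```python
from typing import Optional

# One core is 10^9 nanocores (SI prefix "nano").
NANOCORES_PER_CORE = 1_000_000_000


def nanocores_to_cores(nanocores: float) -> float:
    """Convert a kubernetes.cpu.usage.total value (nanocores) to cores."""
    return nanocores / NANOCORES_PER_CORE


def cpu_usage_fraction_of_limit(
    usage_nanocores: float, limit_cores: Optional[float]
) -> Optional[float]:
    """Return usage as a fraction of the CPU limit.

    Returns None when no positive limit is set, because the ratio is
    undefined in that case (an assumption of this sketch).
    """
    if limit_cores is None or limit_cores <= 0:
        return None
    return nanocores_to_cores(usage_nanocores) / limit_cores


if __name__ == "__main__":
    # Made-up sample: 250,000,000 nanocores used against a 0.5 core limit.
    print(nanocores_to_cores(250_000_000))
    print(cpu_usage_fraction_of_limit(250_000_000, 0.5))
```

The same conversion applies to the other metrics shown as nanocore, such as kubernetes.kubelet.cpu.usage and kubernetes.runtime.cpu.usage.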

Kubelet

For more information, see the documentation for the Kubelet integration.

kubernetes.containers.last_state.terminated
(gauge)
The number of containers that were previously terminated
kubernetes.pods.running
(gauge)
The number of running pods
kubernetes.pods.expired
(gauge)
The number of expired pods the check ignored
kubernetes.containers.running
(gauge)
The number of running containers
kubernetes.containers.restarts
(gauge)
The number of times the container has been restarted
kubernetes.containers.state.terminated
(gauge)
The number of currently terminated containers
kubernetes.containers.state.waiting
(gauge)
The number of currently waiting containers
kubernetes.cpu.load.10s.avg
(gauge)
Container cpu load average over the last 10 seconds
kubernetes.cpu.system.total
(gauge)
The number of cores used for system time
Shown as core
kubernetes.cpu.user.total
(gauge)
The number of cores used for user time
Shown as core
kubernetes.cpu.cfs.periods
(gauge)
Number of elapsed enforcement period intervals
kubernetes.cpu.cfs.throttled.periods
(gauge)
Number of throttled period intervals
kubernetes.cpu.cfs.throttled.seconds
(gauge)
Total time duration the container has been throttled
kubernetes.cpu.capacity
(gauge)
The number of cores in this machine (available until kubernetes v1.18)
Shown as core
kubernetes.cpu.usage.total
(gauge)
The number of cores used
Shown as nanocore
kubernetes.cpu.limits
(gauge)
The limit of cpu cores set
Shown as core
kubernetes.cpu.requests
(gauge)
The requested cpu cores
Shown as core
kubernetes.filesystem.usage
(gauge)
The amount of disk used
Shown as byte
kubernetes.filesystem.usage_pct
(gauge)
The percentage of disk used
Shown as fraction
kubernetes.io.read_bytes
(gauge)
The amount of bytes read from the disk
Shown as byte
kubernetes.io.write_bytes
(gauge)
The amount of bytes written to the disk
Shown as byte
kubernetes.memory.capacity
(gauge)
The amount of memory (in bytes) in this machine (available until kubernetes v1.18)
Shown as byte
kubernetes.memory.limits
(gauge)
The limit of memory set
Shown as byte
kubernetes.memory.sw_limit
(gauge)
The limit of swap space set
Shown as byte
kubernetes.memory.requests
(gauge)
The requested memory
Shown as byte
kubernetes.memory.usage
(gauge)
Current memory usage in bytes including all memory regardless of when it was accessed
Shown as byte
kubernetes.memory.working_set
(gauge)
Current working set in bytes - this is what the OOM killer is watching for
Shown as byte
kubernetes.memory.cache
(gauge)
The amount of memory that is being used to cache data from disk (e.g. memory contents that can be associated precisely with a block on a block device)
Shown as byte
kubernetes.memory.rss
(gauge)
Size of RSS in bytes
Shown as byte
kubernetes.memory.swap
(gauge)
The amount of swap currently used by processes in this cgroup
Shown as byte
kubernetes.memory.usage_pct
(gauge)
The percentage of memory used per pod (memory limit must be set)
Shown as fraction
kubernetes.memory.sw_in_use
(gauge)
The percentage of swap space used
Shown as fraction
kubernetes.network.rx_bytes
(gauge)
The amount of bytes per second received
Shown as byte
kubernetes.network.rx_dropped
(gauge)
The amount of rx packets dropped per second
Shown as packet
kubernetes.network.rx_errors
(gauge)
The amount of rx errors per second
Shown as error
kubernetes.network.tx_bytes
(gauge)
The amount of bytes per second transmitted
Shown as byte
kubernetes.network.tx_dropped
(gauge)
The amount of tx packets dropped per second
Shown as packet
kubernetes.network.tx_errors
(gauge)
The amount of tx errors per second
Shown as error
kubernetes.diskio.io_service_bytes.stats.total
(gauge)
The amount of disk space the container uses
Shown as byte
kubernetes.apiserver.certificate.expiration.count
(gauge)
The count of remaining lifetime on the certificate used to authenticate a request
Shown as second
kubernetes.apiserver.certificate.expiration.sum
(gauge)
The sum of remaining lifetime on the certificate used to authenticate a request
Shown as second
kubernetes.rest.client.requests
(gauge)
The number of HTTP requests
Shown as operation
kubernetes.rest.client.latency.count
(gauge)
The count of request latency in seconds broken down by verb and URL
kubernetes.rest.client.latency.sum
(gauge)
The sum of request latency in seconds broken down by verb and URL
Shown as second
kubernetes.kubelet.pleg.discard_events
(count)
The number of discard events in PLEG
kubernetes.kubelet.pleg.last_seen
(gauge)
Timestamp in seconds when PLEG was last seen active
Shown as second
kubernetes.kubelet.pleg.relist_duration.count
(gauge)
The count of relisting pods in PLEG
kubernetes.kubelet.pleg.relist_duration.sum
(gauge)
The sum of duration in seconds for relisting pods in PLEG
Shown as second
kubernetes.kubelet.pleg.relist_interval.count
(gauge)
The count of intervals between relisting in PLEG
Shown as second
kubernetes.kubelet.pleg.relist_interval.sum
(gauge)
The sum of interval in seconds between relisting in PLEG
kubernetes.kubelet.runtime.operations
(count)
The number of runtime operations
Shown as operation
kubernetes.kubelet.runtime.errors
(gauge)
Cumulative number of runtime operations errors
Shown as operation
kubernetes.kubelet.runtime.operations.duration.sum
(gauge)
The sum of duration of operations
Shown as operation
kubernetes.kubelet.runtime.operations.duration.count
(gauge)
The count of operations
kubernetes.kubelet.network_plugin.latency.sum
(gauge)
The sum of latency in microseconds of network plugin operations
Shown as microsecond
kubernetes.kubelet.network_plugin.latency.count
(gauge)
The count of network plugin operations by latency
kubernetes.kubelet.network_plugin.latency.quantile
(gauge)
The quantiles of network plugin operations by latency
kubernetes.kubelet.volume.stats.available_bytes
(gauge)
The number of available bytes in the volume
Shown as byte
kubernetes.kubelet.volume.stats.capacity_bytes
(gauge)
The capacity in bytes of the volume
Shown as byte
kubernetes.kubelet.volume.stats.used_bytes
(gauge)
The number of used bytes in the volume
Shown as byte
kubernetes.kubelet.volume.stats.inodes
(gauge)
The maximum number of inodes in the volume
Shown as inode
kubernetes.kubelet.volume.stats.inodes_free
(gauge)
The number of free inodes in the volume
Shown as inode
kubernetes.kubelet.volume.stats.inodes_used
(gauge)
The number of used inodes in the volume
Shown as inode
kubernetes.ephemeral_storage.limits
(gauge)
Ephemeral storage limit of the container (requires kubernetes v1.8+)
Shown as byte
kubernetes.ephemeral_storage.requests
(gauge)
Ephemeral storage request of the container (requires kubernetes v1.8+)
Shown as byte
kubernetes.ephemeral_storage.usage
(gauge)
Ephemeral storage usage of the pod
Shown as byte
kubernetes.kubelet.evictions
(count)
The number of pods that have been evicted from the kubelet (ALPHA in kubernetes v1.16)
kubernetes.kubelet.cpu.usage
(gauge)
The number of cores used by kubelet
Shown as nanocore
kubernetes.kubelet.memory.usage
(gauge)
Current kubelet memory usage in bytes
Shown as byte
kubernetes.kubelet.memory.rss
(gauge)
Size of kubelet RSS in bytes
Shown as byte
kubernetes.runtime.cpu.usage
(gauge)
The number of cores used by the runtime
Shown as nanocore
kubernetes.runtime.memory.usage
(gauge)
Current runtime memory usage in bytes
Shown as byte
kubernetes.runtime.memory.rss
(gauge)
Size of runtime RSS in bytes
Shown as byte
kubernetes.kubelet.container.log_filesystem.used_bytes
(gauge)
Bytes used by the container's logs on the filesystem (requires kubernetes 1.14+)
Shown as byte
kubernetes.kubelet.pod.start.duration
(gauge)
Duration in microseconds for a single pod to go from pending to running
Shown as microsecond
kubernetes.kubelet.pod.worker.duration
(gauge)
Duration in microseconds to sync a single pod. Broken down by operation type: create, update, or sync
Shown as microsecond
kubernetes.kubelet.pod.worker.start.duration
(gauge)
Duration in microseconds from seeing a pod to starting a worker
Shown as microsecond
kubernetes.kubelet.docker.operations
(count)
The number of docker operations
Shown as operation
kubernetes.kubelet.docker.errors
(count)
The number of docker operations errors
Shown as operation
kubernetes.kubelet.docker.operations.duration.sum
(gauge)
The sum of duration of docker operations
Shown as operation
kubernetes.kubelet.docker.operations.duration.count
(gauge)
The count of docker operations
kubernetes.go_threads
(gauge)
Number of OS threads created
kubernetes.go_goroutines
(gauge)
Number of goroutines that currently exist
kubernetes.liveness_probe.success.total
(gauge)
Cumulative number of successful liveness probes for a container (ALPHA in kubernetes v1.15)
kubernetes.liveness_probe.failure.total
(gauge)
Cumulative number of failed liveness probes for a container (ALPHA in kubernetes v1.15)
kubernetes.readiness_probe.success.total
(gauge)
Cumulative number of successful readiness probes for a container (ALPHA in kubernetes v1.15)
kubernetes.readiness_probe.failure.total
(gauge)
Cumulative number of failed readiness probes for a container (ALPHA in kubernetes v1.15)
kubernetes.startup_probe.success.total
(gauge)
Cumulative number of successful startup probes for a container (ALPHA in kubernetes v1.15)
kubernetes.startup_probe.failure.total
(gauge)
Cumulative number of failed startup probes for a container (ALPHA in kubernetes v1.15)
kubernetes.node.filesystem.usage
(gauge)
The amount of disk used at node level
Shown as byte
kubernetes.node.filesystem.usage_pct
(gauge)
The percentage of disk space used at node level
Shown as fraction
kubernetes.node.image.filesystem.usage
(gauge)
The amount of disk used on image filesystem (node level)
Shown as byte
kubernetes.node.image.filesystem.usage_pct
(gauge)
The percentage of disk used on image filesystem (node level)
Shown as fraction
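
Several Kubelet metrics are exposed as .sum and .count pairs, for example kubernetes.kubelet.pleg.relist_duration.sum and kubernetes.kubelet.pleg.relist_duration.count. A mean per observation can be derived by dividing the sum by the count, assuming both values cover the same time window. The Python sketch below illustrates this; the function name and the sample numbers are hypothetical.

```python
from typing import Optional


def mean_from_sum_and_count(total: float, count: float) -> Optional[float]:
    """Return the mean value per observation for a .sum/.count metric pair.

    `total` is the .sum value and `count` is the matching .count value.
    The result is in the unit of the .sum metric (seconds for
    kubernetes.kubelet.pleg.relist_duration.sum, microseconds for
    kubernetes.kubelet.network_plugin.latency.sum).
    Returns None when there were no observations.
    """
    if count <= 0:
        return None
    return total / count


if __name__ == "__main__":
    # Made-up sample: 12 seconds spent across 4 PLEG relist operations.
    print(mean_from_sum_and_count(12.0, 4))
```

Returning None for an empty window, rather than 0, keeps "no observations" distinguishable from "observations that took no time".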

Kubernetes state metrics core

For more information, see the documentation for the Kubernetes state metrics core integration. This check requires Datadog Cluster Agent v1.12 or later.

kubernetes_state.apiservice.condition
(gauge)
The current condition of this apiservice. Tags:kube_namespace apiservice condition status.
kubernetes_state.apiservice.count
(gauge)
The current count of apiservices.
kubernetes_state.configmap.count
(gauge)
Number of ConfigMaps. Requires ConfigMaps to be added to Cluster Agent collector. Tags: kube_namespace.
kubernetes_state.container.cpu_limit
(gauge)
The value of CPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as cpu
kubernetes_state.container.cpu_limit.total
(gauge)
The total value of CPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as cpu
kubernetes_state.container.cpu_requested
(gauge)
The value of CPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as cpu
kubernetes_state.container.cpu_requested.total
(gauge)
The total value of CPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as cpu
kubernetes_state.container.gpu_limit
(gauge)
The value of GPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels).
kubernetes_state.container.gpu_limit.total
(gauge)
The total value of GPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
kubernetes_state.container.gpu_requested
(gauge)
The value of GPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels).
kubernetes_state.container.gpu_requested.total
(gauge)
The total value of GPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
kubernetes_state.container.memory_limit
(gauge)
The value of memory limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as byte
kubernetes_state.container.memory_limit.total
(gauge)
The total value of memory limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as byte
kubernetes_state.container.memory_requested
(gauge)
The value of memory requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as byte
kubernetes_state.container.memory_requested.total
(gauge)
The total value of memory requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as byte
kubernetes_state.container.network_bandwidth_limit
(gauge)
The value of network bandwidth limit for a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
kubernetes_state.container.network_bandwidth_requested
(gauge)
The value of network bandwidth requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
kubernetes_state.container.ready
(gauge)
Describes whether the container's readiness check succeeded. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.restarts
(gauge)
The number of container restarts per container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.running
(gauge)
Describes whether the container is currently in running state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.status_report.count.terminated
(gauge)
Describes the reason the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels).
kubernetes_state.container.status_report.count.waiting
(gauge)
Describes the reason the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels).
kubernetes_state.container.terminated
(gauge)
Describes whether the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.waiting
(gauge)
Describes whether the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.crd.condition
(gauge)
The current condition of this custom resource definition. Tags: customresourcedefinition condition status.
kubernetes_state.crd.count
(gauge)
Number of custom resource definitions.
kubernetes_state.cronjob.count
(gauge)
Number of cronjobs. Tags:kube_namespace.
kubernetes_state.cronjob.duration_since_last_schedule
(gauge)
The duration since the last time the cronjob was scheduled. Tags:kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.cronjob.spec_suspend
(gauge)
Suspend flag tells the controller to suspend subsequent executions. Tags:kube_namespace kube_cronjob (env service version from standard labels).
kubernetes_state.daemonset.count
(gauge)
Number of DaemonSets. Tags:kube_namespace.
kubernetes_state.daemonset.daemons_available
(gauge)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.daemons_unavailable
(gauge)
The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.desired
(gauge)
The number of nodes that should be running the daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.misscheduled
(gauge)
The number of nodes that are running a daemon pod but are not supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.ready
(gauge)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.scheduled
(gauge)
The number of nodes that are running at least one daemon pod and are supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.updated
(gauge)
The total number of nodes that are running an updated daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.deployment.condition
(gauge)
The current status conditions of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.count
(gauge)
Number of deployments. Tags:kube_namespace.
kubernetes_state.deployment.paused
(gauge)
Whether the deployment is paused and will not be processed by the deployment controller. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas
(gauge)
The number of replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_available
(gauge)
The number of available replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_desired
(gauge)
Number of desired pods for a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_ready
(gauge)
The number of ready replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_unavailable
(gauge)
The number of unavailable replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_updated
(gauge)
The number of updated replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollingupdate.max_surge
(gauge)
Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollingupdate.max_unavailable
(gauge)
Maximum number of unavailable replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.endpoint.address_available
(gauge)
Number of addresses available in endpoint. Tags:endpoint kube_namespace.
kubernetes_state.endpoint.address_not_ready
(gauge)
Number of addresses not ready in endpoint. Tags:endpoint kube_namespace.
kubernetes_state.endpoint.count
(gauge)
Number of endpoints. Tags:kube_namespace.
kubernetes_state.hpa.condition
(gauge)
The condition of this autoscaler. Tags:kube_namespace horizontalpodautoscaler condition status.
kubernetes_state.hpa.count
(gauge)
Number of horizontal pod autoscalers. Tags: kube_namespace.
kubernetes_state.hpa.current_replicas
(gauge)
Current number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.desired_replicas
(gauge)
Desired number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.max_replicas
(gauge)
Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.min_replicas
(gauge)
Lower limit for the number of pods that can be set by the autoscaler; defaults to 1. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.spec_target_metric
(gauge)
The metric specifications used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type.
kubernetes_state.hpa.status_target_metric
(gauge)
The current metric status used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type.
kubernetes_state.ingress.count
(gauge)
Number of ingresses. Tags:kube_namespace.
kubernetes_state.ingress.path
(gauge)
Information about the ingress path. Tags:kube_namespace kube_ingress_path kube_ingress kube_service kube_service_port kube_ingress_host.
kubernetes_state.initcontainer.restarts
(gauge)
The number of restarts for the init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.waiting
(gauge)
Describes whether the init container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.job.completion.failed
(gauge)
The job has failed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.completion.succeeded
(gauge)
The job has completed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.count
(gauge)
Number of jobs. Tags:kube_namespace kube_cronjob.
kubernetes_state.job.duration
(gauge)
Time elapsed between the start and completion time of the job or the current time if the job is still running. Tags:kube_job kube_namespace (env service version from standard labels).
kubernetes_state.job.failed
(gauge)
The number of pods which reached Phase Failed. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.succeeded
(gauge)
The number of pods which reached Phase Succeeded. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.limitrange.cpu.default
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.default_request
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.max
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.max_limit_request_ratio
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.min
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.memory.default
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.default_request
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.max
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.max_limit_request_ratio
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.min
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.namespace.count
(gauge)
Number of namespaces. Tags:phase.
kubernetes_state.node.age
(gauge)
The time in seconds since the creation of the node. Tags:node.
Shown as second
kubernetes_state.node.by_condition
(gauge)
The condition of a cluster node. Tags:condition node status.
kubernetes_state.node.count
(gauge)
Number of nodes. Tags:kernel_version os_image container_runtime_version kubelet_version.
kubernetes_state.node.cpu_allocatable
(gauge)
The allocatable CPU of a node that is available for scheduling. Tags:node resource unit.
Shown as cpu
kubernetes_state.node.cpu_allocatable.total
(gauge)
The total allocatable CPU of all nodes in the cluster that is available for scheduling.
Shown as cpu
kubernetes_state.node.cpu_capacity
(gauge)
The CPU capacity of a node. Tags:node resource unit.
Shown as cpu
kubernetes_state.node.cpu_capacity.total
(gauge)
The total CPU capacity of all nodes in the cluster.
Shown as cpu
kubernetes_state.node.ephemeral_storage_allocatable
(gauge)
The allocatable ephemeral-storage of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.ephemeral_storage_capacity
(gauge)
The ephemeral-storage capacity of a node. Tags:node resource unit.
kubernetes_state.node.gpu_allocatable
(gauge)
The allocatable GPU of a node that is available for scheduling. Tags:node resource mig_profile unit.
kubernetes_state.node.gpu_allocatable.total
(gauge)
The total allocatable GPU of all nodes in the cluster that is available for scheduling.
kubernetes_state.node.gpu_capacity
(gauge)
The GPU capacity of a node. Tags:node resource mig_profile unit.
kubernetes_state.node.gpu_capacity.total
(gauge)
The total GPU capacity of all nodes in the cluster.
kubernetes_state.node.memory_allocatable
(gauge)
The allocatable memory of a node that is available for scheduling. Tags:node resource unit.
Shown as byte
kubernetes_state.node.memory_allocatable.total
(gauge)
The total allocatable memory of all nodes in the cluster that is available for scheduling.
Shown as byte
kubernetes_state.node.memory_capacity
(gauge)
The memory capacity of a node. Tags:node resource unit.
Shown as byte
kubernetes_state.node.memory_capacity.total
(gauge)
The total memory capacity of all nodes in the cluster.
Shown as byte
kubernetes_state.node.network_bandwidth_allocatable
(gauge)
The allocatable network bandwidth of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.network_bandwidth_capacity
(gauge)
The network bandwidth capacity of a node. Tags:node resource unit.
kubernetes_state.node.pods_allocatable
(gauge)
The allocatable pods of a node that are available for scheduling. Tags:node resource unit.
kubernetes_state.node.pods_capacity
(gauge)
The pods capacity of a node. Tags:node resource unit.
kubernetes_state.node.status
(gauge)
Whether the node can schedule new pods. Tags:node status.
kubernetes_state.pdb.disruptions_allowed
(gauge)
Number of pod disruptions that are currently allowed. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_desired
(gauge)
Minimum desired number of healthy pods. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_healthy
(gauge)
Current number of healthy pods. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_total
(gauge)
Total number of pods counted by this disruption budget. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.persistentvolume.by_phase
(gauge)
The phase indicates if a volume is available, bound to a claim, or released by a claim. Tags:persistentvolume storageclass phase.
kubernetes_state.persistentvolume.capacity
(gauge)
Persistentvolume capacity in bytes. Tags:persistentvolume storageclass.
kubernetes_state.persistentvolumeclaim.access_mode
(gauge)
The access mode(s) specified by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim access_mode storageclass.
kubernetes_state.persistentvolumeclaim.request_storage
(gauge)
The capacity of storage requested by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim storageclass.
kubernetes_state.persistentvolumeclaim.status
(gauge)
The phase the persistent volume claim is currently in. Tags:kube_namespace persistentvolumeclaim phase storageclass.
kubernetes_state.pod.age
(gauge)
The time in seconds since the creation of the pod. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
Shown as second
kubernetes_state.pod.count
(gauge)
Number of Pods. Tags:node kube_namespace kube_<owner kind>.
kubernetes_state.pod.ready
(gauge)
Describes whether the pod is ready to serve requests. Tags:node kube_namespace pod_name condition (env service version from standard labels).
kubernetes_state.pod.scheduled
(gauge)
Describes the status of the scheduling process for the pod. Tags:node kube_namespace pod_name condition (env service version from standard labels).
kubernetes_state.pod.status_phase
(gauge)
The pod's current phase. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
kubernetes_state.pod.tolerations
(gauge)
Information about the pod tolerations
kubernetes_state.pod.unschedulable
(gauge)
Describes the unschedulable status for the pod. Tags:kube_namespace pod_name (env service version from standard labels).
kubernetes_state.pod.uptime
(gauge)
The time in seconds since the pod has been scheduled and acknowledged by the Kubelet. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
kubernetes_state.pod.volumes.persistentvolumeclaims_readonly
(gauge)
Describes whether a persistentvolumeclaim is mounted read only. Tags:node kube_namespace pod_name volume persistentvolumeclaim (env service version from standard labels).
kubernetes_state.replicaset.count
(gauge)
Number of ReplicaSets. Tags:kube_namespace kube_deployment.
kubernetes_state.replicaset.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas
(gauge)
The number of replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas_desired
(gauge)
Number of desired pods for a ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas_ready
(gauge)
The number of ready replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicationcontroller.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas
(gauge)
The number of replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_available
(gauge)
The number of available replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_desired
(gauge)
Number of desired pods for a ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_ready
(gauge)
The number of ready replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.resourcequota.count_configmaps.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_configmaps.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_secrets.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_secrets.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.pods.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.pods.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.requests.cpu.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.requests.cpu.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.secret.count
(gauge)
Number of Secrets. Requires Secrets to be added to Cluster Agent collector. Tags: kube_namespace.
kubernetes_state.secret.type
(gauge)
Type of the secret. Tags:kube_namespace secret type.
kubernetes_state.service.count
(gauge)
Number of services. Tags:kube_namespace type.
kubernetes_state.service.type
(gauge)
Service types. Tags:kube_namespace kube_service type.
kubernetes_state.statefulset.count
(gauge)
Number of StatefulSets. Tags:kube_namespace.
kubernetes_state.statefulset.replicas
(gauge)
The number of replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_current
(gauge)
The number of current replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_desired
(gauge)
Number of desired pods for a StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_ready
(gauge)
The number of ready replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_updated
(gauge)
The number of updated replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.vpa.count
(gauge)
Number of VerticalPodAutoscalers. Tags: kube_namespace.
kubernetes_state.vpa.lower_bound
(gauge)
Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.spec_container_maxallowed
(gauge)
Maximum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.spec_container_minallowed
(gauge)
Minimum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.target
(gauge)
Target resources the VerticalPodAutoscaler recommends for the container. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.uncapped_target
(gauge)
Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.update_mode
(gauge)
Update mode of the VerticalPodAutoscaler. Tags:kube_namespace verticalpodautoscaler target_api_version target_kind target_name update_mode.
kubernetes_state.vpa.upperbound
(gauge)
Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.

Kubernetes state

Note: kubernetes_state.* metrics are gathered from the kube-state-metrics API. The kubernetes_state check is a legacy check. For an alternative, see Kubernetes state metrics core. Datadog recommends that you do not enable both checks simultaneously.

kubernetes_state.container.ready
(gauge)
Whether the container's readiness check succeeded
kubernetes_state.container.running
(gauge)
Whether the container is currently in running state
kubernetes_state.container.terminated
(gauge)
Whether the container is currently in terminated state
kubernetes_state.container.status_report.count.terminated
(gauge)
Count of the containers currently reporting a terminated state, with the reason as a tag
kubernetes_state.container.waiting
(gauge)
Whether the container is currently in waiting state
kubernetes_state.container.status_report.count.waiting
(gauge)
Count of the containers currently reporting a waiting state, with the reason as a tag
kubernetes_state.container.gpu.request
(gauge)
The number of requested gpu devices by a container
kubernetes_state.container.gpu.limit
(gauge)
The limit on gpu devices to be used by a container
kubernetes_state.container.restarts
(gauge)
The number of restarts per container
kubernetes_state.container.cpu_requested
(gauge)
The number of requested cpu cores by a container
Shown as cpu
kubernetes_state.container.memory_requested
(gauge)
The number of requested memory bytes by a container
Shown as byte
kubernetes_state.container.cpu_limit
(gauge)
The limit on cpu cores to be used by a container
Shown as cpu
kubernetes_state.container.memory_limit
(gauge)
The limit on memory to be used by a container
Shown as byte
kubernetes_state.daemonset.scheduled
(gauge)
The number of nodes that are running at least one daemon pod and are supposed to
kubernetes_state.daemonset.misscheduled
(gauge)
The number of nodes that are running a daemon pod but are not supposed to
kubernetes_state.daemonset.desired
(gauge)
The number of nodes that should be running the daemon pod
kubernetes_state.daemonset.ready
(gauge)
The number of nodes that should be running the daemon pod and have one or more running and ready
kubernetes_state.daemonset.updated
(gauge)
The number of nodes that run the updated daemon pod spec
kubernetes_state.deployment.count
(gauge)
The number of deployments
kubernetes_state.deployment.replicas
(gauge)
The number of replicas per deployment
kubernetes_state.deployment.replicas_available
(gauge)
The number of available replicas per deployment
kubernetes_state.deployment.replicas_unavailable
(gauge)
The number of unavailable replicas per deployment
kubernetes_state.deployment.replicas_updated
(gauge)
The number of updated replicas per deployment
kubernetes_state.deployment.replicas_desired
(gauge)
The number of desired replicas per deployment
kubernetes_state.deployment.paused
(gauge)
Whether a deployment is paused
kubernetes_state.deployment.rollingupdate.max_unavailable
(gauge)
Maximum number of unavailable replicas during a rolling update
kubernetes_state.endpoint.address_available
(gauge)
Number of addresses available in endpoint
kubernetes_state.endpoint.address_not_ready
(gauge)
Number of addresses not ready in endpoint
kubernetes_state.endpoint.created
(gauge)
Unix creation timestamp
kubernetes_state.job.count
(gauge)
The number of jobs
kubernetes_state.job.failed
(count)
Observed number of failed pods in a job
kubernetes_state.job.succeeded
(count)
Observed number of succeeded pods in a job
kubernetes_state.limitrange.cpu.min
(gauge)
Minimum CPU request for this type
kubernetes_state.limitrange.cpu.max
(gauge)
Maximum CPU limit for this type
kubernetes_state.limitrange.cpu.default
(gauge)
Default CPU limit if not specified
kubernetes_state.limitrange.cpu.default_request
(gauge)
Default CPU request if not specified
kubernetes_state.limitrange.cpu.max_limit_request_ratio
(gauge)
Maximum CPU limit / request ratio
kubernetes_state.limitrange.memory.min
(gauge)
Minimum memory request for this type
kubernetes_state.limitrange.memory.max
(gauge)
Maximum memory limit for this type
kubernetes_state.limitrange.memory.default
(gauge)
Default memory limit if not specified
kubernetes_state.limitrange.memory.default_request
(gauge)
Default memory request if not specified
kubernetes_state.limitrange.memory.max_limit_request_ratio
(gauge)
Maximum memory limit / request ratio
kubernetes_state.node.count
(count)
The number of nodes
Shown as node
kubernetes_state.node.cpu_capacity
(gauge)
The total CPU resources of the node
Shown as cpu
kubernetes_state.node.memory_capacity
(gauge)
The total memory resources of the node
Shown as byte
kubernetes_state.node.pods_capacity
(gauge)
The total pod resources of the node
kubernetes_state.node.gpu.cards_allocatable
(gauge)
The GPU resources of a node that are available for scheduling
kubernetes_state.node.gpu.cards_capacity
(gauge)
The total GPU resources of the node
kubernetes_state.persistentvolumeclaim.status
(gauge)
The phase the persistent volume claim is currently in
kubernetes_state.persistentvolumeclaim.request_storage
(gauge)
Storage space request for a given pvc
Shown as byte
kubernetes_state.persistentvolumes.by_phase
(gauge)
Number of persistent volumes to sum by phase and storageclass
kubernetes_state.namespace.count
(gauge)
The number of namespaces
Shown as cpu
kubernetes_state.node.cpu_allocatable
(gauge)
The CPU resources of a node that are available for scheduling
Shown as cpu
kubernetes_state.node.memory_allocatable
(gauge)
The memory resources of a node that are available for scheduling
Shown as byte
kubernetes_state.node.pods_allocatable
(gauge)
The pod resources of a node that are available for scheduling
kubernetes_state.node.status
(gauge)
Submitted with a value of 1 for each node and tagged either 'status:schedulable' or 'status:unschedulable'. Sum this metric by status to get the number of nodes in each status.
kubernetes_state.node.by_condition
(gauge)
The condition of a cluster node
kubernetes_state.nodes.by_condition
(gauge)
Sum by condition and status to get the number of nodes in a given condition.
kubernetes_state.hpa.min_replicas
(gauge)
Lower limit for the number of pods that can be set by the autoscaler
kubernetes_state.hpa.max_replicas
(gauge)
Upper limit for the number of pods that can be set by the autoscaler
kubernetes_state.hpa.desired_replicas
(gauge)
Desired number of replicas of pods managed by this autoscaler
kubernetes_state.hpa.condition
(gauge)
Observed condition of autoscalers. Sum by condition and status.
kubernetes_state.pdb.pods_desired
(gauge)
Minimum desired number of healthy pods
kubernetes_state.pdb.disruptions_allowed
(gauge)
Number of pod disruptions that are currently allowed
kubernetes_state.pdb.pods_healthy
(gauge)
Current number of healthy pods
kubernetes_state.pdb.pods_total
(gauge)
Total number of pods counted by this disruption budget
kubernetes_state.pod.ready
(gauge)
In association with the condition tag, whether the pod is ready to serve requests, e.g. condition:true keeps the pods that are in a ready state
kubernetes_state.pod.scheduled
(gauge)
Reports the status of the scheduling process for the pod with its tags
kubernetes_state.pod.unschedulable
(gauge)
Reports pods that the Kubernetes scheduler cannot schedule on any node
kubernetes_state.pod.status_phase
(gauge)
Sum by phase to get the number of pods in a given phase, and by namespace to break this down by namespace
kubernetes_state.replicaset.count
(gauge)
The number of replicasets
kubernetes_state.replicaset.replicas
(gauge)
The number of replicas per ReplicaSet
kubernetes_state.replicaset.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicaSet
kubernetes_state.replicaset.replicas_ready
(gauge)
The number of ready replicas per ReplicaSet
kubernetes_state.replicaset.replicas_desired
(gauge)
Number of desired pods for a ReplicaSet
kubernetes_state.replicationcontroller.replicas
(gauge)
The number of replicas per ReplicationController
kubernetes_state.replicationcontroller.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicationController
kubernetes_state.replicationcontroller.replicas_ready
(gauge)
The number of ready replicas per ReplicationController
kubernetes_state.replicationcontroller.replicas_desired
(gauge)
Number of desired replicas for a ReplicationController
kubernetes_state.replicationcontroller.replicas_available
(gauge)
The number of available replicas per ReplicationController
kubernetes_state.resourcequota.pods.used
(gauge)
Observed number of pods used for a resource quota
kubernetes_state.resourcequota.services.used
(gauge)
Observed number of services used for a resource quota
kubernetes_state.resourcequota.persistentvolumeclaims.used
(gauge)
Observed number of persistent volume claims used for a resource quota
kubernetes_state.resourcequota.services.nodeports.used
(gauge)
Observed number of node ports used for a resource quota
kubernetes_state.resourcequota.services.loadbalancers.used
(gauge)
Observed number of loadbalancers used for a resource quota
kubernetes_state.resourcequota.requests.cpu.used
(gauge)
Observed sum of CPU cores requested for a resource quota
Shown as cpu
kubernetes_state.resourcequota.requests.memory.used
(gauge)
Observed sum of memory bytes requested for a resource quota
Shown as byte
kubernetes_state.resourcequota.requests.storage.used
(gauge)
Observed sum of storage bytes requested for a resource quota
Shown as byte
kubernetes_state.resourcequota.limits.cpu.used
(gauge)
Observed sum of limits for CPU cores for a resource quota
Shown as cpu
kubernetes_state.resourcequota.limits.memory.used
(gauge)
Observed sum of limits for memory bytes for a resource quota
Shown as byte
kubernetes_state.resourcequota.pods.limit
(gauge)
Hard limit of the number of pods for a resource quota
kubernetes_state.resourcequota.services.limit
(gauge)
Hard limit of the number of services for a resource quota
kubernetes_state.resourcequota.persistentvolumeclaims.limit
(gauge)
Hard limit of the number of persistent volume claims for a resource quota
kubernetes_state.resourcequota.services.nodeports.limit
(gauge)
Hard limit of the number of node ports for a resource quota
kubernetes_state.resourcequota.services.loadbalancers.limit
(gauge)
Hard limit of the number of loadbalancers for a resource quota
kubernetes_state.resourcequota.requests.cpu.limit
(gauge)
Hard limit on the total of CPU cores requested for a resource quota
Shown as cpu
kubernetes_state.resourcequota.requests.memory.limit
(gauge)
Hard limit on the total of memory bytes requested for a resource quota
Shown as byte
kubernetes_state.resourcequota.requests.storage.limit
(gauge)
Hard limit on the total of storage bytes requested for a resource quota
Shown as byte
kubernetes_state.resourcequota.limits.cpu.limit
(gauge)
Hard limit on the sum of CPU core limits for a resource quota
Shown as cpu
kubernetes_state.resourcequota.limits.memory.limit
(gauge)
Hard limit on the sum of memory bytes limits for a resource quota
Shown as byte
kubernetes_state.service.count
(gauge)
Sum by namespace and type to count active services
kubernetes_state.statefulset.count
(gauge)
The number of statefulsets
kubernetes_state.statefulset.replicas
(gauge)
The number of replicas per statefulset
kubernetes_state.statefulset.replicas_desired
(gauge)
The number of desired replicas per statefulset
kubernetes_state.statefulset.replicas_current
(gauge)
The number of current replicas per StatefulSet
kubernetes_state.statefulset.replicas_ready
(gauge)
The number of ready replicas per StatefulSet
kubernetes_state.statefulset.replicas_updated
(gauge)
The number of updated replicas per StatefulSet
kubernetes_state.telemetry.payload.size
(gauge)
The message size received from kube-state-metrics
Shown as byte
kubernetes_state.telemetry.metrics.processed.count
(count)
The number of metrics processed
kubernetes_state.telemetry.metrics.input.count
(count)
The number of metrics received
kubernetes_state.telemetry.metrics.blacklist.count
(count)
The number of metrics blacklisted by the check
kubernetes_state.telemetry.metrics.ignored.count
(count)
The number of metrics ignored by the check
kubernetes_state.telemetry.collector.metrics.count
(count)
The number of metrics by collector (kubernetes object kind) by kubernetes namespaces
kubernetes_state.vpa.lower_bound
(gauge)
The vpa lower bound recommendation
kubernetes_state.vpa.target
(gauge)
The vpa target recommendation
kubernetes_state.vpa.uncapped_target
(gauge)
The vpa uncapped target recommendation
kubernetes_state.vpa.upperbound
(gauge)
The vpa upper bound recommendation
kubernetes_state.vpa.update_mode
(gauge)
The vpa update mode
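Several of the descriptions above (for example kubernetes_state.node.status and kubernetes_state.pod.status_phase) say to sum the metric by a tag to obtain a count. The following is a minimal sketch of that aggregation in Python, assuming each point arrives as a value plus a list of "key:value" tags. The sample points and the sum_by_tag helper are hypothetical and are not part of any Datadog API.

```python
from collections import defaultdict


def sum_by_tag(points, tag_key):
    """Sum point values grouped by the value of one tag key.

    points: iterable of (value, tags), where tags is a list of "key:value" strings.
    Points that do not carry tag_key are skipped.
    """
    totals = defaultdict(float)
    for value, tags in points:
        for tag in tags:
            key, _, tag_value = tag.partition(":")
            if key == tag_key:
                totals[tag_value] += value
    return dict(totals)


# Hypothetical samples: one point with a value of 1 per node,
# tagged with the node's scheduling status.
node_status_points = [
    (1, ["node:node-a", "status:schedulable"]),
    (1, ["node:node-b", "status:schedulable"]),
    (1, ["node:node-c", "status:unschedulable"]),
]

nodes_by_status = sum_by_tag(node_status_points, "status")
print(nodes_by_status)  # {'schedulable': 2.0, 'unschedulable': 1.0}
```

The same helper applies to a phase or condition tag, provided the points carry that tag.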

Kubernetes DNS

kubedns.cachemiss_count
(gauge)
Number of DNS requests resulting in a cache miss.
Shown as request
kubedns.cachemiss_count.count
(count)
Instant number of DNS requests made resulting in a cache miss.
Shown as request
kubedns.error_count
(gauge)
Number of DNS requests resulting in an error.
Shown as error
kubedns.error_count.count
(count)
Instant number of DNS requests made resulting in an error.
Shown as error
kubedns.request_count
(gauge)
Total number of DNS requests made.
Shown as request
kubedns.request_count.count
(count)
Instant number of DNS requests made.
Shown as request
kubedns.request_duration.seconds.count
(gauge)
Number of requests on which the kubedns.request_duration.seconds.sum metric is evaluated.
Shown as request
kubedns.request_duration.seconds.sum
(gauge)
Time (in seconds) each request took to resolve.
Shown as second
kubedns.response_size.bytes.count
(gauge)
Number of responses on which the kubedns.response_size.bytes.sum metric is evaluated.
Shown as response
kubedns.response_size.bytes.sum
(gauge)
Size of the returned response in bytes.
Shown as byte
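The paired .sum and .count metrics above (kubedns.request_duration.seconds and kubedns.response_size.bytes) can be combined into an average. The sketch below assumes both values are cumulative, as Prometheus-style histogram sums and counts usually are, and divides the change in the sum by the change in the count between two samples. The mean_per_interval helper and the sample numbers are hypothetical.

```python
def mean_per_interval(prev, curr):
    """Mean observation between two samples of a cumulative (sum, count) pair.

    prev, curr: tuples of (sum, count) taken at two points in time.
    Returns None when no new observations were recorded (or the counter reset).
    """
    delta_sum = curr[0] - prev[0]
    delta_count = curr[1] - prev[1]
    if delta_count <= 0:
        return None
    return delta_sum / delta_count


# Hypothetical samples of (request_duration.seconds.sum, request_duration.seconds.count).
previous_sample = (12.0, 400)
current_sample = (13.5, 700)

mean_latency = mean_per_interval(previous_sample, current_sample)
print(mean_latency)  # 1.5 seconds over 300 requests
```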

Kubernetes proxy

kubeproxy.cpu.time
(gauge)
Total user and system CPU time spent in seconds
Shown as second
kubeproxy.mem.resident
(gauge)
Resident memory size in bytes
Shown as byte
kubeproxy.mem.virtual
(gauge)
Virtual memory size in bytes
Shown as byte
kubeproxy.rest.client.exec_plugin.certificate.rotation
(gauge)
Histogram of the number of seconds the last auth exec plugin client certificate lived before being rotated. If auth exec plugin client certificates are unused, histogram will contain no data.
Shown as second
kubeproxy.rest.client.exec_plugin.ttl
(gauge)
Gauge of the shortest TTL (time-to-live) of the client certificate(s) managed by the auth exec plugin. The value is in seconds until certificate expiry (negative if already expired). If auth exec plugins are unused or manage no TLS certificates, the value will be +INF. (alpha)
Shown as second
kubeproxy.rest.client.request.duration
(gauge)
Request latency in seconds. Broken down by verb and URL.
Shown as second
kubeproxy.rest.client.requests
(gauge)
Number of HTTP requests partitioned by status code, method, and host
Shown as request
kubeproxy.sync_proxy.rules.duration
(gauge)
SyncProxyRules latency in seconds (alpha)
Shown as second
kubeproxy.sync_proxy.rules.endpoint_changes.pending
(gauge)
Pending proxy rules Endpoint changes (alpha)
kubeproxy.sync_proxy.rules.endpoint_changes.total
(gauge)
Cumulative proxy rules Endpoint changes (alpha)
kubeproxy.sync_proxy.rules.iptables
(gauge)
Number of proxy iptables rules programmed (alpha)
kubeproxy.sync_proxy.rules.iptables.restore_failures
(gauge)
Cumulative proxy iptables restore failures (alpha)
kubeproxy.sync_proxy.rules.last_queued_timestamp
(gauge)
The last time a sync of proxy rules was queued (alpha)
Shown as second
kubeproxy.sync_proxy.rules.last_timestamp
(gauge)
The last time proxy rules were successfully synced (alpha)
Shown as second
kubeproxy.sync_proxy.rules.latency.count
(gauge)
SyncProxyRules latency count (alpha)
kubeproxy.sync_proxy.rules.latency.sum
(gauge)
SyncProxyRules latency sum (alpha)
Shown as microsecond
kubeproxy.sync_proxy.rules.service_changes.pending
(gauge)
Pending proxy rules Service changes (alpha)
kubeproxy.sync_proxy.rules.service_changes.total
(gauge)
Cumulative proxy rules Service changes (alpha)
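The description of kubeproxy.rest.client.exec_plugin.ttl above defines three cases: a positive number of seconds until expiry, a negative value once the certificate has expired, and +INF when no certificate is managed. A minimal sketch of interpreting such a value follows; the describe_cert_ttl helper and its seven-day warning threshold are assumptions made for illustration.

```python
import math


def describe_cert_ttl(ttl_seconds, warn_below=7 * 24 * 3600):
    """Classify a certificate TTL value expressed in seconds.

    warn_below is a hypothetical threshold (seven days by default).
    """
    if math.isinf(ttl_seconds) and ttl_seconds > 0:
        # +INF: auth exec plugins are unused or manage no TLS certificates.
        return "unused"
    if ttl_seconds < 0:
        # Negative: the certificate has already expired.
        return "expired"
    if ttl_seconds < warn_below:
        return "expiring_soon"
    return "ok"


print(describe_cert_ttl(float("inf")))    # unused
print(describe_cert_ttl(-5))              # expired
print(describe_cert_ttl(3600))            # expiring_soon
print(describe_cert_ttl(30 * 24 * 3600))  # ok
```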

Kubernetes API server

For more information, see the documentation for the Kubernetes API server integration.

kube_apiserver.APIServiceRegistrationController_depth
(gauge)
The current depth of workqueue: APIServiceRegistrationController
kube_apiserver.admission_controller_admission_duration_seconds.count
(count)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_controller_admission_duration_seconds.sum
(gauge)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds.count
(count)
The admission sub-step latency histogram broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds.sum
(gauge)
The admission sub-step latency broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.count
(count)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds_summary.quantile
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) quantile
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.sum
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_webhook_admission_latencies_seconds.count
(count)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_webhook_admission_latencies_seconds.sum
(gauge)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.aggregator_unavailable_apiservice
(gauge)
Gauge of APIServices which are marked as unavailable broken down by APIService name (alpha; Kubernetes 1.14+)
kube_apiserver.apiserver_admission_webhook_fail_open_count
(gauge)
Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_admission_webhook_fail_open_count.count
(count)
Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_dropped_requests_total
(gauge)
The accumulated number of requests dropped with 'Try again later' response
Shown as request
kube_apiserver.apiserver_dropped_requests_total.count
(count)
The monotonic count of requests dropped with 'Try again later' response
Shown as request
kube_apiserver.apiserver_request_count
(gauge)
The accumulated number of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_request_count.count
(count)
The monotonic count of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_request_terminations_total.count
(count)
The number of requests the apiserver terminated in self-defense (Kubernetes 1.17+)
Shown as request
kube_apiserver.apiserver_request_total
(gauge)
The accumulated number of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count)
Shown as request
kube_apiserver.apiserver_request_total.count
(count)
The monotonic count of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count.count)
Shown as request
kube_apiserver.audit_event
(gauge)
The accumulated number of audit events generated and sent to the audit backend
Shown as event
kube_apiserver.audit_event.count
(count)
The monotonic count of audit events generated and sent to the audit backend
Shown as event
kube_apiserver.authenticated_user_requests
(gauge)
The accumulated number of authenticated requests broken out by username
Shown as request
kube_apiserver.authenticated_user_requests.count
(count)
The monotonic count of authenticated requests broken out by username
Shown as request
kube_apiserver.authentication_attempts.count
(count)
The counter of authenticated attempts (Kubernetes 1.16+)
Shown as request
kube_apiserver.authentication_duration_seconds.count
(count)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
kube_apiserver.authentication_duration_seconds.sum
(gauge)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
Shown as second
kube_apiserver.current_inflight_requests
(gauge)
The maximum number of in-flight requests currently counted against this apiserver's in-flight request limit, per request kind, in the last second.
kube_apiserver.envelope_encryption_dek_cache_fill_percent
(gauge)
Percent of the cache slots currently occupied by cached DEKs.
kube_apiserver.etcd.db.total_size
(gauge)
The total size of the etcd database file physically allocated in bytes (alpha; Kubernetes 1.19+)
Shown as byte
kube_apiserver.etcd_object_counts
(gauge)
The number of stored objects at the time of last check split by kind (alpha; deprecated in Kubernetes 1.22)
Shown as object
kube_apiserver.etcd_request_duration_seconds.count
(count)
Etcd request latencies count for each operation and object type (alpha)
kube_apiserver.etcd_request_duration_seconds.sum
(gauge)
Etcd request latencies for each operation and object type (alpha)
Shown as second
kube_apiserver.etcd_request_errors_total
(count)
Etcd failed request counts for each operation and object type
Shown as request
kube_apiserver.etcd_requests_total
(count)
Etcd request counts for each operation and object type
Shown as request
kube_apiserver.flowcontrol_current_executing_requests
(gauge)
Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_current_inqueue_requests
(count)
Number of requests currently pending in queues of the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_dispatched_requests_total
(count)
Number of requests executed by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_rejected_requests_total.count
(count)
Number of requests rejected by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_concurrency_limit
(gauge)
Shared concurrency limit in the API Priority and Fairness subsystem
kube_apiserver.go_goroutines
(gauge)
The number of goroutines that currently exist
kube_apiserver.go_threads
(gauge)
The number of OS threads created
Shown as thread
kube_apiserver.grpc_client_handled_total
(count)
The total number of RPCs completed by the client regardless of success or failure
Shown as request
kube_apiserver.grpc_client_msg_received_total
(count)
The total number of gRPC stream messages received by the client
Shown as message
kube_apiserver.grpc_client_msg_sent_total
(count)
The total number of gRPC stream messages sent by the client
Shown as message
kube_apiserver.grpc_client_started_total
(count)
The total number of RPCs started on the client
Shown as request
kube_apiserver.http_requests_total
(gauge)
The accumulated number of HTTP requests made
Shown as request
kube_apiserver.http_requests_total.count
(count)
The monotonic count of the number of HTTP requests made
Shown as request
kube_apiserver.kubernetes_feature_enabled
(gauge)
Whether a Kubernetes feature gate is enabled or not, identified by name and stage (alpha; Kubernetes 1.26+)
kube_apiserver.longrunning_gauge
(gauge)
The gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope, and component. Not all requests are tracked this way.
Shown as request
kube_apiserver.process_resident_memory_bytes
(gauge)
The resident memory size in bytes
Shown as byte
kube_apiserver.process_virtual_memory_bytes
(gauge)
The virtual memory size in bytes
Shown as byte
kube_apiserver.registered_watchers
(gauge)
The number of currently registered watchers for a given resource
Shown as object
kube_apiserver.request_duration_seconds.count
(count)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component count
kube_apiserver.request_duration_seconds.sum
(gauge)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component
Shown as second
kube_apiserver.request_latencies.count
(count)
The response latency distribution in microseconds for each verb, resource, and subresource count
kube_apiserver.request_latencies.sum
(gauge)
The response latency distribution in microseconds for each verb, resource and subresource
Shown as microsecond
kube_apiserver.requested_deprecated_apis
(gauge)
Gauge of deprecated APIs that have been requested, broken out by API group, version, resource, subresource, and removed_release
Shown as request
kube_apiserver.rest_client_request_latency_seconds.count
(count)
The request latency in seconds broken down by verb and URL count
kube_apiserver.rest_client_request_latency_seconds.sum
(gauge)
The request latency in seconds broken down by verb and URL
Shown as second
kube_apiserver.rest_client_requests_total
(gauge)
The accumulated number of HTTP requests partitioned by status code, method, and host
Shown as request
kube_apiserver.rest_client_requests_total.count
(count)
The monotonic count of HTTP requests partitioned by status code, method, and host
Shown as request
kube_apiserver.slis.kubernetes_healthcheck
(gauge)
Result of a single kubernetes apiserver healthcheck (alpha; requires k8s v1.26+)
kube_apiserver.slis.kubernetes_healthcheck_total
(count)
The monotonic count of all kubernetes apiserver healthchecks (alpha; requires k8s v1.26+)
kube_apiserver.storage_list_evaluated_objects_total
(gauge)
The number of objects tested in the course of serving a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_fetched_objects_total
(gauge)
The number of objects read from storage in the course of serving a LIST request (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_returned_objects_total
(gauge)
The number of objects returned for a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_total
(gauge)
The number of LIST requests served from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_objects
(gauge)
The number of stored objects at the time of last check, split by kind (Kubernetes 1.21+; replaces etcd_object_counts)
Shown as object
kube_apiserver.watch_events_sizes.count
(count)
The watch event size distribution (Kubernetes 1.16+)
kube_apiserver.watch_events_sizes.sum
(gauge)
The watch event size distribution (Kubernetes 1.16+)
Shown as byte
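
Several of the metrics above are reported as `.sum` and `.count` pairs (for example, kube_apiserver.request_latencies.sum and kube_apiserver.request_latencies.count). As a minimal sketch, assuming both values have already been converted to increases over the same time interval, a mean can be derived by dividing one by the other. The function name and the sample numbers are hypothetical and are not part of the Datadog Agent.

```python
from typing import Optional


def mean_from_sum_and_count(sum_increase: float, count_increase: float) -> Optional[float]:
    """Return the mean observation over an interval, or None if nothing was observed.

    Assumption: both arguments are increases measured over the same interval.
    """
    if count_increase <= 0:
        return None
    return sum_increase / count_increase


# Hypothetical interval: 1,500,000 microseconds of total latency across 3 requests.
mean_latency_us = mean_from_sum_and_count(1_500_000.0, 3)
print(mean_latency_us)
```

The unit of the result follows the unit of the `.sum` metric (microseconds for request_latencies, seconds for rest_client_request_latency_seconds).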

Kubernetes controller manager

For more information, see the documentation for the Kubernetes controller manager integration.

kube_controller_manager.goroutines
(gauge)
Number of goroutines that currently exist
kube_controller_manager.job_controller.terminated_pods_tracking_finalizer
(count)
Used to monitor whether the job controller is removing Pod finalizers from terminated Pods after accounting them in Job status
kube_controller_manager.leader_election.lease_duration
(gauge)
Duration of the leadership lease
kube_controller_manager.leader_election.transitions
(count)
Number of leadership transitions observed
kube_controller_manager.max_fds
(gauge)
Maximum allowed open file descriptors
kube_controller_manager.nodes.count
(gauge)
Number of registered nodes, per zone
kube_controller_manager.nodes.evictions
(count)
Count of node eviction events, per zone
kube_controller_manager.nodes.unhealthy
(gauge)
Number of unhealthy nodes, per zone
kube_controller_manager.open_fds
(gauge)
Number of open file descriptors
kube_controller_manager.queue.adds
(count)
Elements added, by queue
kube_controller_manager.queue.depth
(gauge)
Current depth, by queue
kube_controller_manager.queue.latency.count
(gauge)
Processing latency count, by queue (deprecated in Kubernetes v1.14)
kube_controller_manager.queue.latency.quantile
(gauge)
Processing latency quantiles, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.latency.sum
(gauge)
Processing latency sum, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.process_duration.count
(gauge)
How long processing an item from workqueue takes, by queue
kube_controller_manager.queue.process_duration.sum
(gauge)
Total workqueue processing time, by queue
Shown as second
kube_controller_manager.queue.queue_duration.count
(gauge)
How long an item stays in a queue before being requested, by queue
kube_controller_manager.queue.queue_duration.sum
(gauge)
Total time items stay in a queue before being requested, by queue
Shown as second
kube_controller_manager.queue.retries
(count)
Retries handled, by queue
kube_controller_manager.queue.work_duration.count
(gauge)
Work duration count, by queue (deprecated in Kubernetes v1.14)
kube_controller_manager.queue.work_duration.quantile
(gauge)
Work duration quantiles, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.work_duration.sum
(gauge)
Work duration sum, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.work_longest_duration
(gauge)
How many seconds the longest-running processor has been running, by queue
Shown as second
kube_controller_manager.queue.work_unfinished_duration
(gauge)
How many seconds of work have been done on in-progress items that haven't yet been observed by process_duration, by queue
Shown as second
kube_controller_manager.rate_limiter.use
(gauge)
Usage of the rate limiter, by limiter
kube_controller_manager.slis.kubernetes_healthcheck
(gauge)
Result of a single controller manager healthcheck (alpha; requires k8s v1.26+)
kube_controller_manager.slis.kubernetes_healthcheck_total
(count)
Cumulative results of all controller manager healthchecks (alpha; requires k8s v1.26+)
kube_controller_manager.threads
(gauge)
Number of OS threads created
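
The open_fds and max_fds gauges above are easiest to read together. The following is a minimal sketch, assuming the two values were sampled at the same time, of how a file descriptor utilization ratio could be computed from them. The function name and sample values are hypothetical.

```python
def fd_utilization(open_fds: float, max_fds: float) -> float:
    """Return the fraction of allowed file descriptors currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds


# Hypothetical sample: 256 descriptors open out of 1024 allowed.
print(fd_utilization(256, 1024))
```

A ratio approaching 1.0 suggests the process is close to its descriptor limit.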

Kubernetes metrics server

For more information, see the documentation for the Kubernetes metrics server integration.

kube_metrics_server.authenticated_user.requests
(count)
Counter of authenticated requests broken out by username
kube_metrics_server.go.gc_duration_seconds.count
(gauge)
Number of GC invocations
kube_metrics_server.go.gc_duration_seconds.quantile
(gauge)
GC invocation duration quantiles
kube_metrics_server.go.gc_duration_seconds.sum
(gauge)
Sum of GC invocation durations
kube_metrics_server.go.goroutines
(gauge)
Number of goroutines that currently exist
kube_metrics_server.kubelet_summary_request_duration.count
(gauge)
Number of Kubelet summary requests
kube_metrics_server.kubelet_summary_request_duration.sum
(gauge)
The Kubelet summary request latencies sum
kube_metrics_server.kubelet_summary_scrapes_total
(count)
Total number of attempted Summary API scrapes done by Metrics Server
kube_metrics_server.manager_tick_duration.count
(gauge)
The total time spent collecting and storing metrics
kube_metrics_server.manager_tick_duration.sum
(gauge)
The total time spent collecting and storing metrics
kube_metrics_server.process.max_fds
(gauge)
Maximum number of open file descriptors
kube_metrics_server.process.open_fds
(gauge)
Number of open file descriptors
kube_metrics_server.scraper_duration.count
(gauge)
Time spent scraping sources
kube_metrics_server.scraper_duration.sum
(gauge)
Time spent scraping sources
kube_metrics_server.scraper_last_time
(gauge)
Last time metrics-server performed a scrape, expressed as time since the Unix epoch
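
Because kube_metrics_server.scraper_last_time is a timestamp rather than a duration, it usually has to be compared with the current time to be useful. The sketch below assumes the value is expressed in seconds since the Unix epoch, which this page does not state explicitly. The function name and sample values are hypothetical.

```python
import time
from typing import Optional


def scrape_age_seconds(last_scrape_epoch: float, now: Optional[float] = None) -> float:
    """Return how many seconds have passed since the last scrape.

    Assumption: last_scrape_epoch is in seconds since the Unix epoch.
    """
    current = time.time() if now is None else now
    # Clamp at zero so small clock skew does not produce a negative age.
    return max(0.0, current - last_scrape_epoch)


# Hypothetical values: the last scrape happened 90 seconds before "now".
print(scrape_age_seconds(1_700_000_000.0, now=1_700_000_090.0))
```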

Kubernetes scheduler

For more information, see the documentation for the Kubernetes scheduler integration.

kube_scheduler.binding_duration.count
(gauge)
Number of binding latency observations
kube_scheduler.binding_duration.sum
(gauge)
Total binding latency in seconds
kube_scheduler.cache.lookups
(count)
Number of equivalence cache lookups, by whether or not a cache entry was found
kube_scheduler.client.http.requests
(count)
Number of HTTP requests, partitioned by status code, method, and host
kube_scheduler.client.http.requests_duration.count
(gauge)
Number of client requests. Broken down by verb and URL
kube_scheduler.client.http.requests_duration.sum
(gauge)
Total latency. Broken down by verb and URL
kube_scheduler.gc_duration_seconds.count
(gauge)
Number of GC invocations
kube_scheduler.gc_duration_seconds.quantile
(gauge)
GC invocation duration quantiles
kube_scheduler.gc_duration_seconds.sum
(gauge)
Sum of GC invocation durations
kube_scheduler.goroutine_by_scheduling_operation
(gauge)
Number of running goroutines split by the work they do such as binding (alpha; requires k8s v1.26+)
kube_scheduler.goroutines
(gauge)
Number of goroutines that currently exist
kube_scheduler.max_fds
(gauge)
Maximum allowed open file descriptors
kube_scheduler.open_fds
(gauge)
Number of open file descriptors
kube_scheduler.pending_pods
(gauge)
Number of pending pods, by the queue type (requires k8s v1.15+)
kube_scheduler.pod_preemption.attempts
(count)
Total number of preemption attempts in the cluster so far
kube_scheduler.pod_preemption.victims.count
(gauge)
Number of selected pods during the latest preemption round
kube_scheduler.pod_preemption.victims.sum
(gauge)
Total selected pods during the latest preemption round
kube_scheduler.queue.incoming_pods
(count)
Number of pods added to scheduling queues by event and queue type (requires k8s v1.17+)
kube_scheduler.schedule_attempts
(gauge)
Number of attempts to schedule pods, by the result. 'unschedulable' means a pod could not be scheduled, while 'error' means an internal scheduler problem.
kube_scheduler.scheduling.algorithm.predicate_duration.count
(gauge)
Number of scheduling algorithm predicate evaluations
kube_scheduler.scheduling.algorithm.predicate_duration.sum
(gauge)
Total scheduling algorithm predicate evaluation duration
kube_scheduler.scheduling.algorithm.preemption_duration.count
(gauge)
Number of scheduling algorithm preemption evaluations
kube_scheduler.scheduling.algorithm.preemption_duration.sum
(gauge)
Total scheduling algorithm preemption evaluation duration
kube_scheduler.scheduling.algorithm.priority_duration.count
(gauge)
Number of scheduling algorithm priority evaluations
kube_scheduler.scheduling.algorithm.priority_duration.sum
(gauge)
Total scheduling algorithm priority evaluation duration
kube_scheduler.scheduling.algorithm_duration.count
(gauge)
Number of scheduling algorithm latency observations
kube_scheduler.scheduling.algorithm_duration.sum
(gauge)
Total scheduling algorithm latency
kube_scheduler.scheduling.attempt_duration.count
(gauge)
Scheduling attempt latency in seconds (scheduling algorithm + binding) (requires k8s v1.23+)
kube_scheduler.scheduling.attempt_duration.sum
(gauge)
Total scheduling attempt latency in seconds (scheduling algorithm + binding) (requires k8s v1.23+)
kube_scheduler.scheduling.e2e_scheduling_duration.count
(gauge)
Number of end-to-end scheduling latency observations (scheduling algorithm + binding)
kube_scheduler.scheduling.e2e_scheduling_duration.sum
(gauge)
Total end-to-end scheduling latency (scheduling algorithm + binding)
kube_scheduler.scheduling.pod.scheduling_attempts.count
(gauge)
Number of attempts to successfully schedule a pod (requires k8s v1.23+)
kube_scheduler.scheduling.pod.scheduling_attempts.sum
(gauge)
Total number of attempts to successfully schedule a pod (requires k8s v1.23+)
kube_scheduler.scheduling.pod.scheduling_duration.count
(gauge)
End-to-end latency for a pod being scheduled, which may include multiple scheduling attempts (requires k8s v1.23+)
kube_scheduler.scheduling.pod.scheduling_duration.sum
(gauge)
Total end-to-end latency for a pod being scheduled, which may include multiple scheduling attempts (requires k8s v1.23+)
kube_scheduler.scheduling.scheduling_duration.count
(gauge)
Number of scheduling latency observations, split by sub-parts of the scheduling operation
kube_scheduler.scheduling.scheduling_duration.quantile
(gauge)
Scheduling latency quantiles split by sub-parts of the scheduling operation
kube_scheduler.scheduling.scheduling_duration.sum
(gauge)
Total scheduling latency split by sub-parts of the scheduling operation
kube_scheduler.slis.kubernetes_healthcheck
(gauge)
Result of a single scheduler healthcheck (alpha; requires k8s v1.26+)
kube_scheduler.slis.kubernetes_healthcheck_total
(count)
Cumulative results of all scheduler healthchecks (alpha; requires k8s v1.26+)
kube_scheduler.threads
(gauge)
Number of OS threads created
kube_scheduler.volume_scheduling_duration.count
(gauge)
Number of volume scheduling stage latency observations
kube_scheduler.volume_scheduling_duration.sum
(gauge)
Total Volume scheduling stage latency
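
kube_scheduler.schedule_attempts is broken down by result, where 'unschedulable' and 'error' indicate attempts that did not place a pod. As a rough sketch, the share of failed attempts could be computed from per-result counts as shown below. The 'scheduled' key used for successful attempts, the function name, and the sample numbers are assumptions made for illustration.

```python
from typing import Mapping


def failed_attempt_ratio(attempts_by_result: Mapping[str, float]) -> float:
    """Return the fraction of scheduling attempts that were unschedulable or errored."""
    total = sum(attempts_by_result.values())
    if total == 0:
        return 0.0
    failed = attempts_by_result.get("unschedulable", 0) + attempts_by_result.get("error", 0)
    return failed / total


# Hypothetical counts per result over one interval.
sample = {"scheduled": 90, "unschedulable": 8, "error": 2}
print(failed_attempt_ratio(sample))
```

Keeping 'unschedulable' and 'error' separate in any alerting is usually worthwhile, since the first points at cluster capacity or constraints and the second at the scheduler itself.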

Events

  • Backoff
  • Conflict
  • Delete
  • DeletingAllPods
  • Didn’t have enough resource
  • Error
  • Failed
  • FailedCreate
  • FailedDelete
  • FailedMount
  • FailedSync
  • Failedvalidation
  • FreeDiskSpaceFailed
  • HostPortConflict
  • InsufficientFreeCPU
  • InsufficientFreeMemory
  • InvalidDiskCapacity
  • Killing
  • KubeletsetupFailed
  • NodeNotReady
  • NodeoutofDisk
  • OutofDisk
  • Rebooted
  • TerminatedAllPods
  • Unable
  • Unhealthy

Service checks

Kubelet

For more information, see the documentation for the Kubelet integration.

kubernetes.kubelet.check.ping
Returns CRITICAL if the Kubelet doesn’t respond to Ping. Returns OK otherwise.
Statuses: ok, critical

kubernetes.kubelet.check.docker
Returns CRITICAL if the Docker service doesn’t run on the Kubelet. Returns OK otherwise.
Statuses: ok, critical

kubernetes.kubelet.check.syncloop
Returns CRITICAL if the syncloop health check is down. Returns OK otherwise.
Statuses: ok, critical

kubernetes.kubelet.check
Returns CRITICAL if the overall Kubelet health check is down. Returns OK otherwise.
Statuses: ok, critical

Kubernetes controller manager

For more information, see the documentation for the Kubernetes controller manager integration.

kube_controller_manager.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint.
Statuses: ok, critical

kube_controller_manager.leader_election.status
Returns CRITICAL if no replica is currently set as leader.
Statuses: ok, critical

kube_controller_manager.up
Returns CRITICAL if Kube Controller Manager is not healthy.
Statuses: ok, critical

Kubernetes metrics server

For more information, see the documentation for the Kubernetes metrics server integration.

kube_metrics_server.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint.
Statuses: ok, critical

kube_metrics_server.up
Returns CRITICAL if Kubernetes Metrics Server is not healthy.
Statuses: ok, critical

Kubernetes scheduler

For more information, see the documentation for the Kubernetes scheduler integration.

kube_scheduler.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint.
Statuses: ok, critical

kube_scheduler.leader_election.status
Returns CRITICAL if no replica is currently set as leader.
Statuses: ok, critical

kube_scheduler.up
Returns CRITICAL if Kube Scheduler is not healthy.
Statuses: ok, critical
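
The service checks above each report one of two statuses, ok or critical. As a small illustration, assuming the latest status of each check has already been collected into a dictionary by some other means, the checks that need attention could be picked out as follows. The function name and the sample statuses are hypothetical.

```python
from typing import Dict, List


def critical_checks(statuses: Dict[str, str]) -> List[str]:
    """Return the names of all service checks currently reporting 'critical'."""
    return sorted(
        name for name, status in statuses.items() if status.lower() == "critical"
    )


# Hypothetical latest statuses, keyed by service check name.
latest = {
    "kube_scheduler.prometheus.health": "ok",
    "kube_scheduler.leader_election.status": "ok",
    "kube_scheduler.up": "critical",
}
print(critical_checks(latest))
```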

Kubernetes state metrics core

For more information, see the documentation for the Kubernetes state metrics core integration.

kubernetes_state.cronjob.complete
Whether the last job of the cronjob failed or not. Tags: kube_cronjob, kube_namespace (env, service, version from standard labels).
kubernetes_state.cronjob.on_schedule_check
Alert if the cronjob’s next schedule is in the past. Tags: kube_cronjob, kube_namespace (env, service, version from standard labels).
kubernetes_state.job.complete
Whether the job failed or not. Tags: kube_job or kube_cronjob, kube_namespace (env, service, version from standard labels).
kubernetes_state.node.ready
Whether the node is ready. Tags: node, condition, status.
kubernetes_state.node.out_of_disk
Whether the node is out of disk. Tags: node, condition, status.
kubernetes_state.node.disk_pressure
Whether the node is under disk pressure. Tags: node, condition, status.
kubernetes_state.node.network_unavailable
Whether the node network is unavailable. Tags: node, condition, status.
kubernetes_state.node.memory_pressure
Whether the node is under memory pressure. Tags: node condition status.

Further Reading