Kubernetes State Core

Supported OS: Linux, macOS, Windows

Integration version: 1.0.0


Overview

Get metrics from the Kubernetes service in real time to:

Visualize and monitor Kubernetes states.
Be notified about Kubernetes failovers and events.

The Kubernetes State Metrics Core check leverages kube-state-metrics version 2+ and includes major performance and tagging improvements compared to the legacy kubernetes_state check.

Unlike the legacy check, the Kubernetes State Metrics Core check does not require you to deploy kube-state-metrics in your cluster.

Kubernetes State Metrics Core is a better alternative to the legacy kubernetes_state check because it offers more granular metrics and tags. See the Migration from kubernetes_state to kubernetes_state_core and Data Collected sections for more details.

Setup

Installation

The Kubernetes State Metrics Core check is included in the Datadog Cluster Agent image, so you don’t need to install anything else on your Kubernetes servers.

Requirements

Datadog Cluster Agent v1.12+

Configuration

If you use Helm, add the following to your values.yaml:

datadog:
  # (...)
  kubeStateMetricsCore:
    enabled: true

If you use the Datadog Operator, enable the kubernetes_state_core check by setting spec.features.kubeStateMetricsCore.enabled to true in the DatadogAgent resource:

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
  features:
    kubeStateMetricsCore:
      enabled: true

Note: Datadog Operator v0.7.0 or greater is required.

Migration from kubernetes_state to kubernetes_state_core

Tags removal

In the original kubernetes_state check, several tags have been flagged as deprecated and replaced by new tags. To determine your migration path, check which tags are submitted with your metrics.

In the kubernetes_state_core check, only the non-deprecated tags are submitted. Before migrating from kubernetes_state to kubernetes_state_core, verify that only official tags are used in monitors and dashboards.

Here is the mapping between deprecated tags and the official tags that have replaced them:

deprecated tag	official tag
cluster_name	kube_cluster_name
container	kube_container_name
cronjob	kube_cronjob
daemonset	kube_daemon_set
deployment	kube_deployment
hpa	horizontalpodautoscaler
image	image_name
job	kube_job
job_name	kube_job
namespace	kube_namespace
phase	pod_phase
pod	pod_name
replicaset	kube_replica_set
replicationcontroller	kube_replication_controller
statefulset	kube_stateful_set
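To audit the tags used in monitors and dashboards before migrating, the mapping above can be applied mechanically. The following Python sketch is a hypothetical helper, not a Datadog tool: `migrate_tags` and `find_deprecated` are illustrative names, and the sketch assumes tags are plain `key:value` strings. It only renames tag keys; it does not account for changes in tag values, such as the new meaning of kube_job.

```python
# Deprecated-to-official tag keys, copied from the mapping table above.
DEPRECATED_TO_OFFICIAL = {
    "cluster_name": "kube_cluster_name",
    "container": "kube_container_name",
    "cronjob": "kube_cronjob",
    "daemonset": "kube_daemon_set",
    "deployment": "kube_deployment",
    "hpa": "horizontalpodautoscaler",
    "image": "image_name",
    "job": "kube_job",
    "job_name": "kube_job",
    "namespace": "kube_namespace",
    "phase": "pod_phase",
    "pod": "pod_name",
    "replicaset": "kube_replica_set",
    "replicationcontroller": "kube_replication_controller",
    "statefulset": "kube_stateful_set",
}


def migrate_tags(tags: list[str]) -> list[str]:
    """Rewrite deprecated keys in "key:value" tags; other tags pass through unchanged."""
    migrated = []
    for tag in tags:
        key, sep, value = tag.partition(":")
        new_key = DEPRECATED_TO_OFFICIAL.get(key, key)
        migrated.append(f"{new_key}{sep}{value}")
    return migrated


def find_deprecated(tags: list[str]) -> list[str]:
    """Return only the tags whose key appears in the deprecated column."""
    return [tag for tag in tags if tag.partition(":")[0] in DEPRECATED_TO_OFFICIAL]


print(migrate_tags(["namespace:prod", "pod:web-1", "env:staging"]))
# ['kube_namespace:prod', 'pod_name:web-1', 'env:staging']
```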

Backward-incompatible changes

The Kubernetes State Metrics Core check is not backward compatible. Read the following changes carefully before migrating from the legacy kubernetes_state check.

kubernetes_state.node.by_condition: A new metric with node name granularity. The legacy metric kubernetes_state.nodes.by_condition is deprecated in favor of this one. Note: This metric is backported into the legacy check, where both metrics (it and the legacy metric it replaces) are available.
kubernetes_state.persistentvolume.by_phase: A new metric with persistentvolume name granularity. It replaces kubernetes_state.persistentvolumes.by_phase.
kubernetes_state.pod.status_phase: The metric is tagged with pod-level tags, such as pod_name.
kubernetes_state.node.count: The metric is no longer tagged with host. It aggregates the node count by kernel_version, os_image, container_runtime_version, and kubelet_version.
kubernetes_state.container.waiting and kubernetes_state.container.status_report.count.waiting: These metrics no longer emit a 0 value if no pods are waiting. They only report non-zero values.
kube_job: In kubernetes_state, the kube_job tag value is the CronJob name if the Job had CronJob as an owner, otherwise it is the Job name. In kubernetes_state_core, the kube_job tag value is always the Job name, and a new kube_cronjob tag key is added with the CronJob name as the tag value. When migrating to kubernetes_state_core, it’s recommended to use the new tag or kube_job:foo*, where foo is the CronJob name, for query filters.
kubernetes_state.job.succeeded: In kubernetes_state, kubernetes_state.job.succeeded was a count type. In kubernetes_state_core, it is a gauge type.
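The kube_job change can be modeled in a few lines. This Python sketch is illustrative only: the `Job` class, the helper functions, and the Job name `backup-28391520` are hypothetical, and the real tag values are computed by the checks themselves. It shows why an exact `kube_job:<CronJob name>` filter stops matching after the migration, while `kube_cronjob:<CronJob name>` or a `kube_job:<CronJob name>*` prefix filter still matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    cronjob_owner: Optional[str] = None  # name of the owning CronJob, if any (hypothetical model)


def legacy_tags(job: Job) -> list[str]:
    # kubernetes_state: kube_job holds the CronJob name when a CronJob owns the Job.
    return [f"kube_job:{job.cronjob_owner or job.name}"]


def core_tags(job: Job) -> list[str]:
    # kubernetes_state_core: kube_job is always the Job name; kube_cronjob is added separately.
    tags = [f"kube_job:{job.name}"]
    if job.cronjob_owner:
        tags.append(f"kube_cronjob:{job.cronjob_owner}")
    return tags


def matches_prefix_filter(tags: list[str], key: str, prefix: str) -> bool:
    """Mimic a `key:prefix*` query filter against a list of tags."""
    return any(tag.startswith(f"{key}:{prefix}") for tag in tags)


job = Job(name="backup-28391520", cronjob_owner="backup")
print(legacy_tags(job))  # ['kube_job:backup']
print(core_tags(job))    # ['kube_job:backup-28391520', 'kube_cronjob:backup']
```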

Node-level tag assignment

Host- or node-level tags no longer appear on cluster-centric metrics. Only metrics that relate to an actual node in the cluster, such as kubernetes_state.node.by_condition or kubernetes_state.container.restarts, continue to inherit their respective host- or node-level tags.

To add tags globally, use the DD_TAGS environment variable or the corresponding Helm or Operator setting. To add tags to this check instance only, mount a custom kubernetes_state_core.yaml into the Cluster Agent. The following examples set global tags with Helm and with the Datadog Operator, respectively:

datadog:
  kubeStateMetricsCore:
    enabled: true
  tags: 
    - "<TAG_KEY>:<TAG_VALUE>"

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
    tags:
      - "<TAG_KEY>:<TAG_VALUE>"
  features:
    kubeStateMetricsCore:
      enabled: true

Metrics such as kubernetes_state.container.memory_limit.total or kubernetes_state.node.count aggregate values across groups within a cluster, so no host- or node-level tags are added to them.
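As a rough model of what this aggregation means for kubernetes_state.node.count, the following Python sketch counts nodes per combination of kernel_version, os_image, container_runtime_version, and kubelet_version, and leaves the node name out of the grouping key, so no host tag survives. The node records and their field values are hypothetical; the check computes this metric from the Kubernetes API, not from code like this.

```python
from collections import Counter

# Grouping tags for kubernetes_state.node.count, as listed in this document.
NODE_COUNT_TAGS = ("kernel_version", "os_image", "container_runtime_version", "kubelet_version")


def node_count(nodes: list[dict[str, str]]) -> Counter:
    """Count nodes per tag combination; the node name is deliberately not part of the key."""
    return Counter(
        tuple(f"{key}:{node[key]}" for key in NODE_COUNT_TAGS) for node in nodes
    )


# Hypothetical node records.
nodes = [
    {"name": "node-a", "kernel_version": "5.15", "os_image": "ubuntu-22.04",
     "container_runtime_version": "containerd-1.7", "kubelet_version": "v1.29"},
    {"name": "node-b", "kernel_version": "5.15", "os_image": "ubuntu-22.04",
     "container_runtime_version": "containerd-1.7", "kubelet_version": "v1.29"},
    {"name": "node-c", "kernel_version": "5.15", "os_image": "ubuntu-22.04",
     "container_runtime_version": "containerd-1.7", "kubelet_version": "v1.28"},
]

for tags, count in node_count(nodes).items():
    print(count, tags)
```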

Legacy check

Enabling kubeStateMetricsCore in your Helm values.yaml configures the Agent to ignore the auto configuration file for the legacy kubernetes_state check, so that the two checks do not run simultaneously.

If you want to run both checks simultaneously during the migration phase, set the ignoreLegacyKSMCheck field to false in your values.yaml.

Note: ignoreLegacyKSMCheck only makes the Agent ignore the auto configuration for the legacy kubernetes_state check. Custom kubernetes_state configurations must be removed manually.

Because the Kubernetes State Metrics Core check no longer requires kube-state-metrics to be deployed in your cluster, you can stop deploying kube-state-metrics as part of the Datadog Helm chart. To do this, add the following to your Helm values.yaml:

datadog:
  # (...)
  kubeStateMetricsEnabled: false

Important Note: The Kubernetes State Metrics Core check is an alternative to the legacy kubernetes_state check. Datadog recommends not enabling both checks simultaneously to guarantee consistent metrics.
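The interaction between kubeStateMetricsCore, ignoreLegacyKSMCheck, and custom configuration files can be summarized as a small decision function. This Python sketch is a hypothetical model of the behavior described in this section, not Agent code; `active_checks` and its parameters are illustrative names, and it assumes that, unless ignored, the legacy auto configuration would otherwise schedule the kubernetes_state check.

```python
def active_checks(
    ksm_core_enabled: bool,
    ignore_legacy_ksm_check: bool = True,
    has_custom_legacy_config: bool = False,
) -> set[str]:
    """Return which kube-state checks would run under the rules described above (hypothetical model)."""
    checks: set[str] = set()
    if ksm_core_enabled:
        checks.add("kubernetes_state_core")
    # The legacy auto configuration is skipped only when the core check is enabled
    # and ignoreLegacyKSMCheck is left enabled.
    legacy_auto_config_ignored = ksm_core_enabled and ignore_legacy_ksm_check
    # A custom kubernetes_state configuration is never ignored; it must be removed manually.
    if not legacy_auto_config_ignored or has_custom_legacy_config:
        checks.add("kubernetes_state")
    return checks


print(sorted(active_checks(True)))  # ['kubernetes_state_core']
print(sorted(active_checks(True, ignore_legacy_ksm_check=False)))  # ['kubernetes_state', 'kubernetes_state_core']
```

The last case corresponds to the migration phase; once dashboards and monitors are updated, the recommended end state is the first case.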

Data Collected

Metrics


kubernetes_state.apiservice.condition (gauge)	The current condition of this apiservice. Tags:`kube_namespace` `apiservice` `condition` `status`.
kubernetes_state.apiservice.count (gauge)	The current count of apiservices.
kubernetes_state.configmap.count (gauge)	Number of ConfigMaps. Requires ConfigMaps to be added to Cluster Agent collector. Tags: `kube_namespace`.
kubernetes_state.container.cpu_limit (gauge)	The value of CPU limit by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as cpu
kubernetes_state.container.cpu_limit.total (gauge)	The total value of CPU limits by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as cpu
kubernetes_state.container.cpu_requested (gauge)	The value of CPU requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as cpu
kubernetes_state.container.cpu_requested.total (gauge)	The total value of CPU requested by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as cpu
kubernetes_state.container.gpu_limit (gauge)	The value of GPU limit by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `mig_profile` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.gpu_limit.total (gauge)	The total value of GPU limits by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`.
kubernetes_state.container.gpu_requested (gauge)	The value of GPU requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `mig_profile` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.gpu_requested.total (gauge)	The total value of GPU requested by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`.
kubernetes_state.container.memory_limit (gauge)	The value of memory limit by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as byte
kubernetes_state.container.memory_limit.total (gauge)	The total value of memory limits by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as byte
kubernetes_state.container.memory_requested (gauge)	The value of memory requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels). Shown as byte
kubernetes_state.container.memory_requested.total (gauge)	The total value of memory requested by all containers in the cluster. Tags:`kube_namespace` `kube_container_name` `kube_<owner kind>`. Shown as byte
kubernetes_state.container.network_bandwidth_limit (gauge)	The value of network bandwidth limit for a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.network_bandwidth_requested (gauge)	The value of network bandwidth requested by a container. Tags:`kube_namespace` `pod_name` `kube_container_name` `node` `resource` `unit` (`env` `service` `version` from standard labels).
kubernetes_state.container.ready (gauge)	Describes whether the container's readiness check succeeded. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.restarts (gauge)	The number of container restarts per container. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.running (gauge)	Describes whether the container is currently in running state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.status_report.count.terminated (gauge)	Describes the reason the container is currently in terminated state. Tags:`kube_namespace` `pod_name` `kube_container_name` `reason` (`env` `service` `version` from standard labels).
kubernetes_state.container.status_report.count.waiting (gauge)	Describes the reason the container is currently in waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` `reason` (`env` `service` `version` from standard labels).
kubernetes_state.container.terminated (gauge)	Describes whether the container is currently in terminated state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.container.waiting (gauge)	Describes whether the container is currently in waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.crd.condition (gauge)	The current condition of this custom resource definition. Tags: `customresourcedefinition` `condition` `status`.
kubernetes_state.crd.count (gauge)	Number of custom resource definitions.
kubernetes_state.cronjob.count (gauge)	Number of cronjobs. Tags:`kube_namespace`.
kubernetes_state.cronjob.duration_since_last_schedule (gauge)	The duration since the last time the cronjob was scheduled. Tags:`kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.cronjob.duration_since_last_successful (gauge)	The duration since the last time the cronjob was successfully scheduled. Tags:`kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.cronjob.spec_suspend (gauge)	Suspend flag tells the controller to suspend subsequent executions. Tags:`kube_namespace` `kube_cronjob` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.count (gauge)	Number of DaemonSets. Tags:`kube_namespace`.
kubernetes_state.daemonset.daemons_available (gauge)	The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.daemons_unavailable (gauge)	The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.desired (gauge)	The number of nodes that should be running the daemon pod. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.misscheduled (gauge)	The number of nodes that are running a daemon pod but are not supposed to. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.ready (gauge)	The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.scheduled (gauge)	The number of nodes that are running at least one daemon pod and are supposed to. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.daemonset.updated (gauge)	The total number of nodes that are running the updated daemon pod. Tags:`kube_daemon_set` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.condition (gauge)	The current status conditions of a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.count (gauge)	Number of deployments. Tags:`kube_namespace`.
kubernetes_state.deployment.paused (gauge)	Whether the deployment is paused and will not be processed by the deployment controller. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas (gauge)	The number of replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_available (gauge)	The number of available replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_desired (gauge)	Number of desired pods for a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_ready (gauge)	The number of ready replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_unavailable (gauge)	The number of unavailable replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.replicas_updated (gauge)	The number of updated replicas per deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.rollingupdate.max_surge (gauge)	Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.deployment.rollingupdate.max_unavailable (gauge)	Maximum number of unavailable replicas during a rolling update of a deployment. Tags:`kube_deployment` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.endpoint.address_available (gauge)	Number of addresses available in endpoint. Tags:`endpoint` `kube_namespace`.
kubernetes_state.endpoint.address_not_ready (gauge)	Number of addresses not ready in endpoint. Tags:`endpoint` `kube_namespace`.
kubernetes_state.endpoint.count (gauge)	Number of endpoints. Tags:`kube_namespace`.
kubernetes_state.hpa.condition (gauge)	The condition of this autoscaler. Tags:`kube_namespace` `horizontalpodautoscaler` `condition` `status`.
kubernetes_state.hpa.count (gauge)	Number of horizontal pod autoscalers. Tags: `kube_namespace`.
kubernetes_state.hpa.current_replicas (gauge)	Current number of replicas of pods managed by this autoscaler. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.desired_replicas (gauge)	Desired number of replicas of pods managed by this autoscaler. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.max_replicas (gauge)	Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.min_replicas (gauge)	Lower limit for the number of pods that can be set by the autoscaler; defaults to 1. Tags:`kube_namespace` `horizontalpodautoscaler`.
kubernetes_state.hpa.spec_target_metric (gauge)	The metric specifications used by this autoscaler when calculating the desired replica count. Tags:`kube_namespace` `horizontalpodautoscaler` `metric_name` `metric_target_type`.
kubernetes_state.hpa.status_target_metric (gauge)	The current metric status used by this autoscaler when calculating the desired replica count. Tags:`kube_namespace` `horizontalpodautoscaler` `metric_name` `metric_target_type`.
kubernetes_state.ingress.count (gauge)	Number of ingresses. Tags:`kube_namespace`.
kubernetes_state.ingress.path (gauge)	Information about the ingress path. Tags:`kube_namespace` `kube_ingress_path` `kube_ingress` `kube_service` `kube_service_port` `kube_ingress_host`.
kubernetes_state.initcontainer.restarts (gauge)	The number of restarts for the init container. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.initcontainer.waiting (gauge)	Describes whether the init container is currently in waiting state. Tags:`kube_namespace` `pod_name` `kube_container_name` (`env` `service` `version` from standard labels).
kubernetes_state.job.completion.failed (gauge)	The job has failed its execution. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.completion.succeeded (gauge)	The job has completed its execution. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.count (gauge)	Number of jobs. Tags:`kube_namespace` `kube_cronjob`.
kubernetes_state.job.duration (gauge)	Time elapsed between the start and completion time of the job or the current time if the job is still running. Tags:`kube_job` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.failed (gauge)	The number of pods which reached Phase Failed. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.succeeded (gauge)	The number of pods which reached Phase Succeeded. Tags:`kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.limitrange.cpu.default (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.default_request (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.max (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.max_limit_request_ratio (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.cpu.min (gauge)	Information about CPU limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as cpu
kubernetes_state.limitrange.memory.default (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.default_request (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.max (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.max_limit_request_ratio (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.limitrange.memory.min (gauge)	Information about memory limit range usage by constraint. Tags:`kube_namespace` `limitrange` `type`. Shown as byte
kubernetes_state.namespace.count (gauge)	Number of namespaces. Tags:`phase`.
kubernetes_state.node.age (gauge)	The time in seconds since the creation of the node. Tags:`node`. Shown as second
kubernetes_state.node.by_condition (gauge)	The condition of a cluster node. Tags:`condition` `node` `status`.
kubernetes_state.node.count (gauge)	Number of nodes. Tags:`kernel_version` `os_image` `container_runtime_version` `kubelet_version`.
kubernetes_state.node.cpu_allocatable (gauge)	The allocatable CPU of a node that is available for scheduling. Tags:`node` `resource` `unit`. Shown as cpu
kubernetes_state.node.cpu_allocatable.total (gauge)	The total allocatable CPU of all nodes in the cluster that is available for scheduling. Shown as cpu
kubernetes_state.node.cpu_capacity (gauge)	The CPU capacity of a node. Tags:`node` `resource` `unit`. Shown as cpu
kubernetes_state.node.cpu_capacity.total (gauge)	The total CPU capacity of all nodes in the cluster. Shown as cpu
kubernetes_state.node.ephemeral_storage_allocatable (gauge)	The allocatable ephemeral-storage of a node that is available for scheduling. Tags:`node` `resource` `unit`.
kubernetes_state.node.ephemeral_storage_capacity (gauge)	The ephemeral-storage capacity of a node. Tags:`node` `resource` `unit`.
kubernetes_state.node.gpu_allocatable (gauge)	The allocatable GPU of a node that is available for scheduling. Tags:`node` `resource` `mig_profile` `unit`.
kubernetes_state.node.gpu_allocatable.total (gauge)	The total allocatable GPU of all nodes in the cluster that is available for scheduling.
kubernetes_state.node.gpu_capacity (gauge)	The GPU capacity of a node. Tags:`node` `resource` `mig_profile` `unit`.
kubernetes_state.node.gpu_capacity.total (gauge)	The total GPU capacity of all nodes in the cluster.
kubernetes_state.node.memory_allocatable (gauge)	The allocatable memory of a node that is available for scheduling. Tags:`node` `resource` `unit`. Shown as byte
kubernetes_state.node.memory_allocatable.total (gauge)	The total allocatable memory of all nodes in the cluster that is available for scheduling. Shown as byte
kubernetes_state.node.memory_capacity (gauge)	The memory capacity of a node. Tags:`node` `resource` `unit`. Shown as byte
kubernetes_state.node.memory_capacity.total (gauge)	The total memory capacity of all nodes in the cluster. Shown as byte
kubernetes_state.node.network_bandwidth_allocatable (gauge)	The allocatable network bandwidth of a node that is available for scheduling. Tags:`node` `resource` `unit`.
kubernetes_state.node.network_bandwidth_capacity (gauge)	The network bandwidth capacity of a node. Tags:`node` `resource` `unit`.
kubernetes_state.node.pods_allocatable (gauge)	The allocatable pod capacity of a node that is available for scheduling. Tags:`node` `resource` `unit`.
kubernetes_state.node.pods_capacity (gauge)	The pods capacity of a node. Tags:`node` `resource` `unit`.
kubernetes_state.node.status (gauge)	Whether the node can schedule new pods. Tags:`node` `status`.
kubernetes_state.pdb.disruptions_allowed (gauge)	Number of pod disruptions that are currently allowed. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.pdb.pods_desired (gauge)	Minimum desired number of healthy pods. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.pdb.pods_healthy (gauge)	Current number of healthy pods. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.pdb.pods_total (gauge)	Total number of pods counted by this disruption budget. Tags:`kube_namespace` `poddisruptionbudget`.
kubernetes_state.persistentvolume.by_phase (gauge)	The phase indicates whether a volume is available, bound to a claim, or released by a claim. Tags:`persistentvolume` `storageclass` `phase`.
kubernetes_state.persistentvolume.capacity (gauge)	Persistentvolume capacity in bytes. Tags:`persistentvolume` `storageclass`.
kubernetes_state.persistentvolumeclaim.access_mode (gauge)	The access mode(s) specified by the persistent volume claim. Tags:`kube_namespace` `persistentvolumeclaim` `access_mode` `storageclass`.
kubernetes_state.persistentvolumeclaim.request_storage (gauge)	The capacity of storage requested by the persistent volume claim. Tags:`kube_namespace` `persistentvolumeclaim` `storageclass`.
kubernetes_state.persistentvolumeclaim.status (gauge)	The phase the persistent volume claim is currently in. Tags:`kube_namespace` `persistentvolumeclaim` `phase` `storageclass`.
kubernetes_state.pod.age (gauge)	The time in seconds since the creation of the pod. Tags:`node` `kube_namespace` `pod_name` `pod_phase` (`env` `service` `version` from standard labels). Shown as second
kubernetes_state.pod.count (gauge)	Number of Pods. Tags:`node` `kube_namespace` `kube_<owner kind>`.
kubernetes_state.pod.ready (gauge)	Describes whether the pod is ready to serve requests. Tags:`node` `kube_namespace` `pod_name` `condition` (`env` `service` `version` from standard labels).
kubernetes_state.pod.scheduled (gauge)	Describes the status of the scheduling process for the pod. Tags:`node` `kube_namespace` `pod_name` `condition` (`env` `service` `version` from standard labels).
kubernetes_state.pod.status_phase (gauge)	The pod's current phase. Tags:`node` `kube_namespace` `pod_name` `pod_phase` (`env` `service` `version` from standard labels).
kubernetes_state.pod.tolerations (gauge)	Information about the pod tolerations
kubernetes_state.pod.unschedulable (gauge)	Describes the unschedulable status for the pod. Tags:`kube_namespace` `pod_name` (`env` `service` `version` from standard labels).
kubernetes_state.pod.uptime (gauge)	The time in seconds since the pod has been scheduled and acknowledged by the Kubelet. Tags:`node` `kube_namespace` `pod_name` `pod_phase` (`env` `service` `version` from standard labels).
kubernetes_state.pod.volumes.persistentvolumeclaims_readonly (gauge)	Describes whether a persistentvolumeclaim is mounted read only. Tags:`node` `kube_namespace` `pod_name` `volume` `persistentvolumeclaim` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.count (gauge)	Number of ReplicaSets. Tags:`kube_namespace` `kube_deployment`.
kubernetes_state.replicaset.fully_labeled_replicas (gauge)	The number of fully labeled replicas per ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.replicas (gauge)	The number of replicas per ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.replicas_desired (gauge)	Number of desired pods for a ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicaset.replicas_ready (gauge)	The number of ready replicas per ReplicaSet. Tags:`kube_namespace` `kube_replica_set` (`env` `service` `version` from standard labels).
kubernetes_state.replicationcontroller.fully_labeled_replicas (gauge)	The number of fully labeled replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas (gauge)	The number of replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas_available (gauge)	The number of available replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas_desired (gauge)	Number of desired pods for a ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.replicationcontroller.replicas_ready (gauge)	The number of ready replicas per ReplicationController. Tags:`kube_namespace` `kube_replication_controller`.
kubernetes_state.resourcequota.count_configmaps.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.count_configmaps.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.count_secrets.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.count_secrets.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.pods.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.pods.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.requests.cpu.limit (gauge)	Information about resource quota limits by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.resourcequota.requests.cpu.used (gauge)	Information about resource quota usage by resource. Tags:`kube_namespace` `resourcequota`.
kubernetes_state.secret.count (gauge)	Number of Secrets. Requires Secrets to be added to Cluster Agent collector. Tags: `kube_namespace`.
kubernetes_state.secret.type (gauge)	The type of the secret. Tags:`kube_namespace` `secret` `type`.
kubernetes_state.service.count (gauge)	Number of services. Tags:`kube_namespace` `type`.
kubernetes_state.service.type (gauge)	Service types. Tags:`kube_namespace` `kube_service` `type`.
kubernetes_state.statefulset.count (gauge)	Number of StatefulSets. Tags:`kube_namespace`.
kubernetes_state.statefulset.replicas (gauge)	The number of replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_current (gauge)	The number of current replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_desired (gauge)	Number of desired pods for a StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_ready (gauge)	The number of ready replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.statefulset.replicas_updated (gauge)	The number of updated replicas per StatefulSet. Tags:`kube_namespace` `kube_stateful_set` (`env` `service` `version` from standard labels).
kubernetes_state.vpa.count (gauge)	Number of vertical pod autoscalers. Tags: `kube_namespace`.
kubernetes_state.vpa.lower_bound (gauge)	Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.spec_container_maxallowed (gauge)	Maximum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.spec_container_minallowed (gauge)	Minimum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.target (gauge)	Target resources the VerticalPodAutoscaler recommends for the container. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.uncapped_target (gauge)	Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.
kubernetes_state.vpa.update_mode (gauge)	Update mode of the VerticalPodAutoscaler. Tags:`kube_namespace` `verticalpodautoscaler` `target_api_version` `target_kind` `target_name` `update_mode`.
kubernetes_state.vpa.upperbound (gauge)	Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:`kube_namespace` `verticalpodautoscaler` `kube_container_name` `resource` `target_api_version` `target_kind` `target_name` `unit`.

Note: You can configure Datadog standard labels on your Kubernetes objects to get the `env`, `service`, and `version` tags.
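To see where the tags and spec-based values of the `kubernetes_state.vpa.*` metrics come from, consider the following hypothetical VerticalPodAutoscaler (`autoscaling.k8s.io/v1` API). All names and values are placeholders, and the field-to-metric mapping in the comments is inferred from the metric descriptions above. The `target`, `uncapped_target`, `lower_bound`, and `upperbound` metrics are not set here: they reflect the recommendation that the VPA recommender writes to the object's status.

```yaml
# Hypothetical VerticalPodAutoscaler; all names and values are placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # verticalpodautoscaler tag
  namespace: shop            # kube_namespace tag
spec:
  targetRef:                 # target_api_version, target_kind, target_name tags
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"       # kubernetes_state.vpa.update_mode (update_mode tag)
  resourcePolicy:
    containerPolicies:
      - containerName: web   # kube_container_name tag
        minAllowed:          # kubernetes_state.vpa.spec_container_minallowed
          cpu: 100m          # the resource and unit tags distinguish cpu from memory
          memory: 128Mi
        maxAllowed:          # kubernetes_state.vpa.spec_container_maxallowed
          cpu: "1"
          memory: 1Gi
```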

Events

The Kubernetes State Metrics Core check does not include any events.

Default labels as tags

Default recommended Kubernetes and Helm labels

Recommended Label	Tag
`app.kubernetes.io/name`	`kube_app_name`
`app.kubernetes.io/instance`	`kube_app_instance`
`app.kubernetes.io/version`	`kube_app_version`
`app.kubernetes.io/component`	`kube_app_component`
`app.kubernetes.io/part-of`	`kube_app_part_of`
`app.kubernetes.io/managed-by`	`kube_app_managed_by`
`helm.sh/chart`	`helm_chart`
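As an illustration, the following `metadata` section of a hypothetical Deployment sets each recommended label; the comment on each line names the tag it maps to according to the table above. The object name and label values are placeholders.

```yaml
# metadata section of a hypothetical Deployment; all values are placeholders.
metadata:
  name: web
  labels:
    app.kubernetes.io/name: web            # kube_app_name
    app.kubernetes.io/instance: web-prod   # kube_app_instance
    app.kubernetes.io/version: "1.4.2"     # kube_app_version
    app.kubernetes.io/component: frontend  # kube_app_component
    app.kubernetes.io/part-of: shop        # kube_app_part_of
    app.kubernetes.io/managed-by: Helm     # kube_app_managed_by
    helm.sh/chart: web-0.3.1               # helm_chart
```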

Default recommended Kubernetes node labels

Recommended Label	Tag
`topology.kubernetes.io/region`	`kube_region`
`topology.kubernetes.io/zone`	`kube_zone`
`failure-domain.beta.kubernetes.io/region`	`kube_region`
`failure-domain.beta.kubernetes.io/zone`	`kube_zone`
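To check which of these labels your nodes actually carry, and therefore which values `kube_region` and `kube_zone` can take, you can print them as extra columns with `kubectl get nodes` and its `-L` (`--label-columns`) flag. This is a sketch that assumes `kubectl` is configured against the cluster you monitor.

```shell
# Current topology labels, shown as REGION and ZONE columns per node.
kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone

# Older clusters may still use the deprecated failure-domain labels instead.
kubectl get nodes -L failure-domain.beta.kubernetes.io/region,failure-domain.beta.kubernetes.io/zone
```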

Datadog labels (Unified Service Tagging)

Datadog Label	Tag
`tags.datadoghq.com/env`	`env`
`tags.datadoghq.com/service`	`service`
`tags.datadoghq.com/version`	`version`
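A minimal sketch, assuming a hypothetical Deployment named `web`: the three Datadog labels are set both on the Deployment itself and on its pod template, so that metrics reported for the Deployment and for its pods can each carry `env`, `service`, and `version`. The names, values, and `<IMAGE>` placeholder are illustrative only.

```yaml
# Hypothetical Deployment with Unified Service Tagging labels; values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    tags.datadoghq.com/env: prod         # env tag on Deployment metrics
    tags.datadoghq.com/service: web      # service tag
    tags.datadoghq.com/version: "1.4.2"  # version tag (label values are strings)
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
        # Repeated on the pod template so pod-level metrics get the same tags.
        tags.datadoghq.com/env: prod
        tags.datadoghq.com/service: web
        tags.datadoghq.com/version: "1.4.2"
    spec:
      containers:
        - name: web
          image: <IMAGE>
```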

Service Checks

kubernetes_state.cronjob.complete: Whether the last Job run by the CronJob failed. Tags: `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.cronjob.on_schedule_check: Alerts if the CronJob’s next scheduled run is in the past. Tags: `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.job.complete: Whether the Job failed. Tags: `kube_job` or `kube_cronjob`, `kube_namespace` (`env` `service` `version` from standard labels).
kubernetes_state.node.ready: Whether the node is ready. Tags: `node` `condition` `status`.
kubernetes_state.node.out_of_disk: Whether the node is out of disk. Tags: `node` `condition` `status`.
kubernetes_state.node.disk_pressure: Whether the node is under disk pressure. Tags: `node` `condition` `status`.
kubernetes_state.node.network_unavailable: Whether the node network is unavailable. Tags: `node` `condition` `status`.
kubernetes_state.node.memory_pressure: Whether the node is under memory pressure. Tags: `node` `condition` `status`.

Validation

Run the Cluster Agent’s status subcommand inside your Cluster Agent container and look for kubernetes_state_core under the Checks section.
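If you want to script this check, for example in a post-deploy step, one option is a small shell function that reads the status output on stdin and succeeds only when it mentions kubernetes_state_core. The function name has_ksm_core_check is hypothetical, and the usage line assumes the Cluster Agent binary in the container is named datadog-cluster-agent. A match only shows that the check is listed; read its section of the status output for errors.

```shell
#!/usr/bin/env bash
# Reads Cluster Agent status output on stdin and succeeds (exit status 0)
# only if the text mentions the kubernetes_state_core check.
has_ksm_core_check() {
  grep -q 'kubernetes_state_core'
}

# Usage (replace the angle-bracket placeholders with your own values):
#   kubectl exec -n <NAMESPACE> <CLUSTER_AGENT_POD> -- datadog-cluster-agent status \
#     | has_ksm_core_check && echo "kubernetes_state_core is listed"
```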

Troubleshooting

Timeout errors

By default, the Kubernetes State Metrics Core check waits 10 seconds for a response from the Kubernetes API server. For large clusters, the request may time out, resulting in missing metrics.

To avoid this, set the Cluster Agent environment variable DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT to a value, in seconds, higher than the default of 10.

If you deploy with the Datadog Operator, update your datadog-agent.yaml with the following configuration:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    clusterAgent:
      env:
        - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
          value: "<value_greater_than_10>"  # seconds; env values must be quoted strings, for example "60"

Then apply the new configuration:

kubectl apply -n $DD_NAMESPACE -f datadog-agent.yaml

If you deploy with Helm, update your datadog-values.yaml with the following configuration:

clusterAgent:
  env:
    - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
      value: "<value_greater_than_10>"  # seconds; env values must be quoted strings, for example "60"

Then upgrade your Helm chart:

helm upgrade -f datadog-values.yaml <RELEASE_NAME> datadog/datadog
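Whichever method you use, you can confirm that the variable reached the Cluster Agent. One way is `kubectl set env` with `--list`, which prints a workload's environment without modifying it. This sketch assumes the Cluster Agent runs as a Deployment; <CLUSTER_AGENT_DEPLOYMENT> is a placeholder for its name in your cluster.

```shell
# Print the Cluster Agent Deployment's environment and keep only the timeout setting.
kubectl set env deployment/<CLUSTER_AGENT_DEPLOYMENT> -n $DD_NAMESPACE --list \
  | grep DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
```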

Need help? Contact Datadog support.