Kubernetes State Core

Supported OS: Linux, Mac OS, Windows

To find out if this integration is available in your organization, see your Datadog Integrations page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email support@ddog-gov.com.

Overview

Get metrics from your Kubernetes service in real time to:

  • Visualize and monitor Kubernetes states.
  • Be notified about Kubernetes failovers and events.

The Kubernetes State Metrics Core check leverages kube-state-metrics version 2+ and includes major performance and tagging improvements compared to the legacy kubernetes_state check.

Unlike the legacy check, the Kubernetes State Metrics Core check no longer requires you to deploy kube-state-metrics in your cluster.

Kubernetes State Metrics Core is a better alternative to the legacy kubernetes_state check because it provides more granular metrics and tags. See Major Changes and Data Collected for details.

Setup

Installation

The Kubernetes State Metrics Core check is included in the Datadog Cluster Agent image, so you don't need to install anything else on your Kubernetes servers.

Requirements

  • Datadog Cluster Agent v1.12+

Configuration

In your Helm values.yaml, add the following:

datadog:
  # (...)
  kubeStateMetricsCore:
    enabled: true

To enable the kubernetes_state_core check with the Datadog Operator, set spec.features.kubeStateMetricsCore.enabled to true in the DatadogAgent resource:

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
  features:
    kubeStateMetricsCore:
      enabled: true

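Note: Datadog Operator v0.7.0 or later is required.

Migrating from kubernetes_state to kubernetes_state_core

Tag removal

In the original kubernetes_state check, several tags are flagged as deprecated and replaced by new tags. To determine your migration path, check which tags are submitted with your metrics.

The kubernetes_state_core check submits only the official (non-deprecated) tags. Before migrating from kubernetes_state to kubernetes_state_core, verify that your monitors and dashboards use only official tags.

The following table maps each deprecated tag to the official tag that replaces it: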

Deprecated tag           Official tag
cluster_name             kube_cluster_name
container                kube_container_name
cronjob                  kube_cronjob
daemonset                kube_daemon_set
deployment               kube_deployment
hpa                      horizontalpodautoscaler
image                    image_name
job                      kube_job
job_name                 kube_job
namespace                kube_namespace
phase                    pod_phase
pod                      pod_name
replicaset               kube_replica_set
replicationcontroller    kube_replication_controller
statefulset              kube_stateful_set
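To find monitors and dashboards that still use deprecated tags, it can help to rewrite their queries mechanically. The following Python sketch applies the table above to a metric query string. It is a hypothetical helper, not a Datadog tool, and it assumes the common `aggregator:metric{key:value,...} by {key,...}` query shape. Review every rewritten query, because some tags also change their values during the migration (for example, kube_job for Jobs owned by a CronJob).

```python
import re

# Deprecated kubernetes_state tag keys and the official keys that replace
# them, copied from the table above.
DEPRECATED_TO_OFFICIAL = {
    "cluster_name": "kube_cluster_name",
    "container": "kube_container_name",
    "cronjob": "kube_cronjob",
    "daemonset": "kube_daemon_set",
    "deployment": "kube_deployment",
    "hpa": "horizontalpodautoscaler",
    "image": "image_name",
    "job": "kube_job",
    "job_name": "kube_job",
    "namespace": "kube_namespace",
    "phase": "pod_phase",
    "pod": "pod_name",
    "replicaset": "kube_replica_set",
    "replicationcontroller": "kube_replication_controller",
    "statefulset": "kube_stateful_set",
}

# A tag key in a "key:value" filter: a run of [a-z_] followed by ":" that is
# not the tail of a longer identifier or a metric-name segment, so that
# "kube_namespace:" and "kubernetes_state.pod.ready" are left alone.
FILTER_KEY = re.compile(r"(?<![\w.])([a-z_]+)(?=:)")
# A group-by clause such as "by {deployment,pod}".
GROUP_BY = re.compile(r"by \{([^}]*)\}")


def _official(key: str) -> str:
    """Return the official tag key, or the key itself if it is not deprecated."""
    return DEPRECATED_TO_OFFICIAL.get(key, key)


def _rewrite_group_by(match) -> str:
    keys = (_official(k.strip()) for k in match.group(1).split(","))
    return "by {" + ",".join(keys) + "}"


def migrate_query(query: str) -> str:
    """Rewrite deprecated tag keys in filters and in the group-by clause."""
    query = FILTER_KEY.sub(lambda m: _official(m.group(1)), query)
    return GROUP_BY.sub(_rewrite_group_by, query)
```

Aggregator prefixes such as `avg:` also match the filter pattern, but they are not deprecated tag keys, so they pass through unchanged. Queries that already use official tags are returned as-is, which makes the helper safe to run repeatedly.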

Backward-incompatible changes

The Kubernetes State Metrics Core check is not backward compatible. Read the following changes carefully before migrating from the legacy kubernetes_state check.

kubernetes_state.node.by_condition
A new metric with node-name granularity. It replaces the legacy metric kubernetes_state.nodes.by_condition, which is deprecated. Note: This metric is also backported to the legacy check, where both metrics (this one and the legacy metric it replaces) are available.
kubernetes_state.persistentvolume.by_phase
A new metric with persistent-volume-name granularity. It replaces kubernetes_state.persistentvolumes.by_phase.
kubernetes_state.pod.status_phase
The metric is tagged with pod-level tags, such as pod_name.
kubernetes_state.node.count
The metric is no longer tagged with host. It counts nodes aggregated by kernel_version, os_image, container_runtime_version, and kubelet_version.
kubernetes_state.container.waiting and kubernetes_state.container.status_report.count.waiting
These metrics no longer emit a value of 0 when no pods are waiting; they report only non-zero values.
kube_job
In kubernetes_state, the value of the kube_job tag is the CronJob name if the Job has a CronJob as its owner, and the Job name otherwise. In kubernetes_state_core, the kube_job tag value is always the Job name, and a new kube_cronjob tag key carries the CronJob name. When migrating to kubernetes_state_core, use the new tag or kube_job:foo* (where foo is the CronJob name) in your query filters.
kubernetes_state.job.succeeded
In the legacy kubernetes_state check, kubernetes_state.job.succeeded is of type count. In kubernetes_state_core, it is of type gauge.
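The kube_job change is the easiest one to get wrong in existing queries. The following Python sketch restates the tagging rules described above for a single Job so you can compare the two checks side by side. The function names are hypothetical and do not correspond to any Datadog API; the only Kubernetes behavior relied on is that Jobs created by a CronJob are named after it with a suffix appended.

```python
from typing import Dict, Optional


def legacy_job_tags(job_name: str, cronjob: Optional[str]) -> Dict[str, str]:
    """kube_job tagging in the legacy kubernetes_state check."""
    # A Job owned by a CronJob is reported under the CronJob's name.
    return {"kube_job": cronjob if cronjob else job_name}


def core_job_tags(job_name: str, cronjob: Optional[str]) -> Dict[str, str]:
    """kube_job and kube_cronjob tagging in kubernetes_state_core."""
    # kube_job is always the Job's own name; the owner goes in kube_cronjob.
    tags = {"kube_job": job_name}
    if cronjob:
        tags["kube_cronjob"] = cronjob
    return tags


def cronjob_query_filter(cronjob: str) -> str:
    """Filter matching the Jobs a CronJob created, as recommended above."""
    # CronJob-created Jobs are named "<cronjob name>-<suffix>", so a prefix
    # wildcard on kube_job catches all of them.
    return f"kube_job:{cronjob}*"
```

Note that the prefix wildcard also matches Jobs of any other CronJob whose name starts with the same string (for example, backup and backup-weekly). When that matters, filtering on kube_cronjob is the more precise choice.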

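Node-level tagging

Host or node-level tags no longer appear on cluster-centric metrics. Only metrics that relate to an actual node in the cluster, such as kubernetes_state.node.by_condition and kubernetes_state.container.restarts, continue to inherit their respective host or node-level tags.

To add tags globally, use the DD_TAGS environment variable or the corresponding Helm or Operator configuration. Instance-specific tags can be specified by mounting a custom kubernetes_state_core.yaml into the Cluster Agent.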

Helm (values.yaml):

datadog:
  kubeStateMetricsCore:
    enabled: true
  tags:
    - "<TAG_KEY>:<TAG_VALUE>"

Datadog Operator (DatadogAgent resource):

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
    tags:
      - "<TAG_KEY>:<TAG_VALUE>"
  features:
    kubeStateMetricsCore:
      enabled: true

Metrics such as kubernetes_state.container.memory_limit.total and kubernetes_state.node.count are aggregate counts of groups within the cluster, so no host or node-level tags are attached to them.
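Because cluster-level totals are computed over groups of objects, no individual host or node ends up in the resulting series. The following Python sketch models that aggregation for kubernetes_state.node.count and kubernetes_state.container.memory_limit.total. It is an illustration under stated assumptions, not the check's implementation: the input dictionaries and the owner field (holding the kube_<owner kind> tag, for example "kube_deployment:web") are hypothetical, and the total is assumed to be a plain sum.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Tags that kubernetes_state.node.count is aggregated by.
NODE_COUNT_KEYS = ("kernel_version", "os_image",
                   "container_runtime_version", "kubelet_version")


def node_count(nodes: Iterable[Dict[str, str]]) -> Counter:
    """Number of nodes per combination of the four grouping tags.

    The node name is never part of the key, which is why the resulting
    series carry no host or node tag.
    """
    return Counter(tuple(n[k] for k in NODE_COUNT_KEYS) for n in nodes)


def memory_limit_total(containers: Iterable[Dict]) -> Dict[Tuple[str, str, str], float]:
    """Sum of per-container memory limits per (namespace, container, owner).

    Assumes a plain sum; pod_name and node are dropped from the grouping.
    """
    totals: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for c in containers:
        key = (c["kube_namespace"], c["kube_container_name"], c["owner"])
        totals[key] += c["memory_limit"]
    return dict(totals)
```

Two containers with the same namespace, container name, and owner but running on different nodes fall into the same group, so the node they ran on cannot be recovered from the total.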

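Legacy check

Enabling kubeStateMetricsCore in your Helm values.yaml configures the Agent to ignore the auto configuration file for the legacy kubernetes_state check. This prevents both checks from running at the same time.

If you still want to run both checks simultaneously during the migration phase, disable the ignoreLegacyKSMCheck field in your values.yaml.

Note: ignoreLegacyKSMCheck only makes the Agent ignore the auto configuration for the legacy kubernetes_state check. Custom kubernetes_state configuration files must be removed manually.

Because the Kubernetes State Metrics Core check does not require kube-state-metrics to be deployed in your cluster, you can disable the kube-state-metrics deployment that ships with the Datadog Helm chart. To do this, add the following to your Helm values.yaml: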

datadog:
  # (...)
  kubeStateMetricsEnabled: false

Important note: The Kubernetes State Metrics Core check replaces the legacy kubernetes_state check. To keep metrics consistent, Datadog recommends not enabling both checks at the same time.

Data Collected

Metrics

kubernetes_state.apiservice.condition
(gauge)
The current condition of this apiservice. Tags:kube_namespace apiservice condition status.
kubernetes_state.apiservice.count
(gauge)
The current count of apiservices.
kubernetes_state.configmap.count
(gauge)
Number of ConfigMaps. Requires ConfigMaps to be added to Cluster Agent collector. Tags: kube_namespace.
kubernetes_state.container.cpu_limit
(gauge)
The value of CPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as cpu
kubernetes_state.container.cpu_limit.total
(gauge)
The total value of CPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as cpu
kubernetes_state.container.cpu_requested
(gauge)
The value of CPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as cpu
kubernetes_state.container.cpu_requested.total
(gauge)
The total value of CPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as cpu
kubernetes_state.container.gpu_limit
(gauge)
The value of GPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels).
kubernetes_state.container.gpu_limit.total
(gauge)
The total value of GPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
kubernetes_state.container.gpu_requested
(gauge)
The value of GPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels).
kubernetes_state.container.gpu_requested.total
(gauge)
The total value of GPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
kubernetes_state.container.memory_limit
(gauge)
The value of memory limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as byte
kubernetes_state.container.memory_limit.total
(gauge)
The total value of memory limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as byte
kubernetes_state.container.memory_requested
(gauge)
The value of memory requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as byte
kubernetes_state.container.memory_requested.total
(gauge)
The total value of memory requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as byte
kubernetes_state.container.network_bandwidth_limit
(gauge)
The value of network bandwidth limit for a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
kubernetes_state.container.network_bandwidth_requested
(gauge)
The value of network bandwidth requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
kubernetes_state.container.ready
(gauge)
Describes whether the container's readiness check succeeded. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.restarts
(gauge)
The number of container restarts per container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.running
(gauge)
Describes whether the container is currently in running state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.status_report.count.terminated
(gauge)
Describes the reason the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels).
kubernetes_state.container.status_report.count.waiting
(gauge)
Describes the reason the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels).
kubernetes_state.container.terminated
(gauge)
Describes whether the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.waiting
(gauge)
Describes whether the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.crd.condition
(gauge)
The current condition of this custom resource definition. Tags: customresourcedefinition condition status.
kubernetes_state.crd.count
(gauge)
Number of custom resource definitions.
kubernetes_state.cronjob.count
(gauge)
Number of cronjobs. Tags:kube_namespace.
kubernetes_state.cronjob.duration_since_last_schedule
(gauge)
The duration since the last time the cronjob was scheduled. Tags:kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.cronjob.duration_since_last_successful
(gauge)
The duration since the last time the cronjob was successfully scheduled. Tags:kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.cronjob.spec_suspend
(gauge)
Suspend flag tells the controller to suspend subsequent executions. Tags:kube_namespace kube_cronjob (env service version from standard labels).
kubernetes_state.daemonset.count
(gauge)
Number of DaemonSets. Tags:kube_namespace.
kubernetes_state.daemonset.daemons_available
(gauge)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.daemons_unavailable
(gauge)
The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.desired
(gauge)
The number of nodes that should be running the daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.misscheduled
(gauge)
The number of nodes that are running a daemon pod but are not supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.ready
(gauge)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.scheduled
(gauge)
The number of nodes that are running at least one daemon pod and are supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.updated
(gauge)
The total number of nodes that are running updated daemon pods. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.deployment.condition
(gauge)
The current status conditions of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.count
(gauge)
Number of deployments. Tags:kube_namespace.
kubernetes_state.deployment.paused
(gauge)
Whether the deployment is paused and will not be processed by the deployment controller. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas
(gauge)
The number of replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_available
(gauge)
The number of available replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_desired
(gauge)
Number of desired pods for a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_ready
(gauge)
The number of ready replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_unavailable
(gauge)
The number of unavailable replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_updated
(gauge)
The number of updated replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollingupdate.max_surge
(gauge)
Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollingupdate.max_unavailable
(gauge)
Maximum number of unavailable replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollout_duration
(gauge)
Number of seconds since deployment rollout started. Tags:kube_deployment kube_namespace (env service version from standard labels).
Shown as second
kubernetes_state.endpoint.address_available
(gauge)
Number of addresses available in endpoint. Tags:endpoint kube_namespace.
kubernetes_state.endpoint.address_not_ready
(gauge)
Number of addresses not ready in endpoint. Tags:endpoint kube_namespace.
kubernetes_state.endpoint.count
(gauge)
Number of endpoints. Tags:kube_namespace.
kubernetes_state.hpa.condition
(gauge)
The condition of this autoscaler. Tags:kube_namespace horizontalpodautoscaler condition status.
kubernetes_state.hpa.count
(gauge)
Number of horizontal pod autoscalers. Tags: kube_namespace.
kubernetes_state.hpa.current_replicas
(gauge)
Current number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.desired_replicas
(gauge)
Desired number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.max_replicas
(gauge)
Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.min_replicas
(gauge)
Lower limit for the number of pods that can be set by the autoscaler; defaults to 1. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.spec_target_metric
(gauge)
The metric specifications used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type.
kubernetes_state.hpa.status_target_metric
(gauge)
The current metric status used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type.
kubernetes_state.ingress.count
(gauge)
Number of ingresses. Tags:kube_namespace.
kubernetes_state.ingress.path
(gauge)
Information about the ingress path. Tags:kube_namespace kube_ingress_path kube_ingress kube_service kube_service_port kube_ingress_host.
kubernetes_state.initcontainer.cpu_limit
(gauge)
The CPU limit of an init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
Shown as cpu
kubernetes_state.initcontainer.cpu_requested
(gauge)
Number of CPUs requested by the init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
Shown as cpu
kubernetes_state.initcontainer.memory_limit
(gauge)
The memory limit of an init container, in bytes. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
Shown as byte
kubernetes_state.initcontainer.memory_requested
(gauge)
Bytes of memory requested by the init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
Shown as byte
kubernetes_state.initcontainer.ready
(gauge)
Describes whether the init container is ready. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.restarts
(gauge)
The number of restarts for the init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.running
(gauge)
Describes whether the init container is running. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.status_report.count.terminated
(gauge)
Number of containers in a terminated state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.status_report.count.waiting
(gauge)
Number of containers in a waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.waiting
(gauge)
Describes whether the init container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.job.completion.failed
(gauge)
The job has failed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.completion.succeeded
(gauge)
The job has completed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.count
(gauge)
Number of jobs. Tags:kube_namespace kube_cronjob.
kubernetes_state.job.duration
(gauge)
Time elapsed between the start and completion time of the job or the current time if the job is still running. Tags:kube_job kube_namespace (env service version from standard labels).
kubernetes_state.job.failed
(gauge)
The number of pods which reached Phase Failed. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.succeeded
(gauge)
The number of pods which reached Phase Succeeded. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.limitrange.cpu.default
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.default_request
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.max
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.max_limit_request_ratio
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.min
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.memory.default
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.default_request
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.max
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.max_limit_request_ratio
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.min
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.namespace.count
(gauge)
Number of namespaces. Tags:phase.
kubernetes_state.node.age
(gauge)
The time in seconds since the creation of the node. Tags:node.
Shown as second
kubernetes_state.node.by_condition
(gauge)
The condition of a cluster node. Tags:condition node status.
kubernetes_state.node.count
(gauge)
Number of nodes. Tags:kernel_version os_image container_runtime_version kubelet_version.
kubernetes_state.node.cpu_allocatable
(gauge)
The allocatable CPU of a node that is available for scheduling. Tags:node resource unit.
Shown as cpu
kubernetes_state.node.cpu_allocatable.total
(gauge)
The total allocatable CPU of all nodes in the cluster that is available for scheduling.
Shown as cpu
kubernetes_state.node.cpu_capacity
(gauge)
The CPU capacity of a node. Tags:node resource unit.
Shown as cpu
kubernetes_state.node.cpu_capacity.total
(gauge)
The total CPU capacity of all nodes in the cluster.
Shown as cpu
kubernetes_state.node.ephemeral_storage_allocatable
(gauge)
The allocatable ephemeral-storage of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.ephemeral_storage_capacity
(gauge)
The ephemeral-storage capacity of a node. Tags:node resource unit.
kubernetes_state.node.gpu_allocatable
(gauge)
The allocatable GPU of a node that is available for scheduling. Tags:node resource mig_profile unit.
kubernetes_state.node.gpu_allocatable.total
(gauge)
The total allocatable GPU of all nodes in the cluster that is available for scheduling.
kubernetes_state.node.gpu_capacity
(gauge)
The GPU capacity of a node. Tags:node resource mig_profile unit.
kubernetes_state.node.gpu_capacity.total
(gauge)
The total GPU capacity of all nodes in the cluster.
kubernetes_state.node.memory_allocatable
(gauge)
The allocatable memory of a node that is available for scheduling. Tags:node resource unit.
Shown as byte
kubernetes_state.node.memory_allocatable.total
(gauge)
The total allocatable memory of all nodes in the cluster that is available for scheduling.
Shown as byte
kubernetes_state.node.memory_capacity
(gauge)
The memory capacity of a node. Tags:node resource unit.
Shown as byte
kubernetes_state.node.memory_capacity.total
(gauge)
The total memory capacity of all nodes in the cluster.
Shown as byte
kubernetes_state.node.network_bandwidth_allocatable
(gauge)
The allocatable network bandwidth of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.network_bandwidth_capacity
(gauge)
The network bandwidth capacity of a node. Tags:node resource unit.
kubernetes_state.node.pods_allocatable
(gauge)
The allocatable pod capacity of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.pods_capacity
(gauge)
The pod capacity of a node. Tags:node resource unit.
kubernetes_state.node.status
(gauge)
Whether the node can schedule new pods. Tags:node status.
kubernetes_state.pdb.disruptions_allowed
(gauge)
Number of pod disruptions that are currently allowed. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_desired
(gauge)
Minimum desired number of healthy pods. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_healthy
(gauge)
Current number of healthy pods. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_total
(gauge)
Total number of pods counted by this disruption budget. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.persistentvolume.by_phase
(gauge)
The phase of a persistent volume, indicating whether it is available, bound to a claim, or released by a claim. Tags:persistentvolume storageclass phase.
kubernetes_state.persistentvolume.capacity
(gauge)
PersistentVolume capacity in bytes. Tags:persistentvolume storageclass.
kubernetes_state.persistentvolumeclaim.access_mode
(gauge)
The access mode(s) specified by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim access_mode storageclass.
kubernetes_state.persistentvolumeclaim.request_storage
(gauge)
The capacity of storage requested by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim storageclass.
kubernetes_state.persistentvolumeclaim.status
(gauge)
The phase the persistent volume claim is currently in. Tags:kube_namespace persistentvolumeclaim phase storageclass.
kubernetes_state.pod.age
(gauge)
The time in seconds since the creation of the pod. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
Shown as second
kubernetes_state.pod.count
(gauge)
Number of Pods. Tags:node kube_namespace kube_<owner kind>.
kubernetes_state.pod.ready
(gauge)
Describes whether the pod is ready to serve requests. Tags:node kube_namespace pod_name condition (env service version from standard labels).
kubernetes_state.pod.scheduled
(gauge)
Describes the status of the scheduling process for the pod. Tags:node kube_namespace pod_name condition (env service version from standard labels).
kubernetes_state.pod.status_phase
(gauge)
The pod's current phase. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
kubernetes_state.pod.tolerations
(gauge)
Information about the pod tolerations.
kubernetes_state.pod.unschedulable
(gauge)
Describes the unschedulable status for the pod. Tags:kube_namespace pod_name (env service version from standard labels).
kubernetes_state.pod.uptime
(gauge)
The time in seconds since the pod has been scheduled and acknowledged by the Kubelet. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
kubernetes_state.pod.volumes.persistentvolumeclaims_readonly
(gauge)
Describes whether a persistentvolumeclaim is mounted read only. Tags:node kube_namespace pod_name volume persistentvolumeclaim (env service version from standard labels).
kubernetes_state.replicaset.count
(gauge)
Number of ReplicaSets. Tags:kube_namespace kube_deployment.
kubernetes_state.replicaset.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas
(gauge)
The number of replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas_desired
(gauge)
Number of desired pods for a ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas_ready
(gauge)
The number of ready replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicationcontroller.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas
(gauge)
The number of replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_available
(gauge)
The number of available replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_desired
(gauge)
Number of desired pods for a ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_ready
(gauge)
The number of ready replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.resourcequota.count_configmaps.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_configmaps.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_secrets.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_secrets.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.pods.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.pods.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.requests.cpu.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.requests.cpu.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.secret.count
(gauge)
Number of Secrets. Requires Secrets to be added to Cluster Agent collector. Tags: kube_namespace.
kubernetes_state.secret.type
(gauge)
The type of the secret. Tags:kube_namespace secret type.
kubernetes_state.service.count
(gauge)
Number of services. Tags:kube_namespace type.
kubernetes_state.service.type
(gauge)
Service types. Tags:kube_namespace kube_service type.
kubernetes_state.statefulset.count
(gauge)
Number of StatefulSets. Tags:kube_namespace.
kubernetes_state.statefulset.replicas
(gauge)
The number of replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_current
(gauge)
The number of current replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_desired
(gauge)
Number of desired pods for a StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_ready
(gauge)
The number of ready replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_updated
(gauge)
The number of updated replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.vpa.count
(gauge)
Number of vertical pod autoscalers. Tags: kube_namespace.
kubernetes_state.vpa.lower_bound
(gauge)
Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.spec_container_maxallowed
(gauge)
Maximum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.spec_container_minallowed
(gauge)
Minimum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.target
(gauge)
Target resources the VerticalPodAutoscaler recommends for the container. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.uncapped_target
(gauge)
Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.update_mode
(gauge)
Update mode of the VerticalPodAutoscaler. Tags:kube_namespace verticalpodautoscaler target_api_version target_kind target_name update_mode.
kubernetes_state.vpa.upperbound
(gauge)
Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.

Note: You can configure Datadog Standard labels on your Kubernetes objects to get the env, service, and version tags.

Events

The Kubernetes State Metrics Core check does not include any events.

Default labels as tags

Default recommended Kubernetes and Helm labels

Recommended label               Tag
app.kubernetes.io/name          kube_app_name
app.kubernetes.io/instance      kube_app_instance
app.kubernetes.io/version       kube_app_version
app.kubernetes.io/component     kube_app_component
app.kubernetes.io/part-of       kube_app_part_of
app.kubernetes.io/managed-by    kube_app_managed_by
helm.sh/chart                   helm_chart

Default recommended Kubernetes node labels

Recommended label                           Tag
topology.kubernetes.io/region               kube_region
topology.kubernetes.io/zone                 kube_zone
failure-domain.beta.kubernetes.io/region    kube_region
failure-domain.beta.kubernetes.io/zone      kube_zone

Datadog labels (Unified Service Tagging)

Datadog label                 Tag
tags.datadoghq.com/env        env
tags.datadoghq.com/service    service
tags.datadoghq.com/version    version
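The Agent applies these mappings itself; nothing needs to be configured. The following Python sketch only restates the three tables above as one lookup, which can help predict the tags an object will carry before you build queries on them. It is a hypothetical helper, and merging the tables into a single dictionary is a simplification: it does not distinguish which object kinds each table applies to.

```python
from typing import Dict, List

# Default label-to-tag mappings, merged from the three tables above.
LABEL_TO_TAG: Dict[str, str] = {
    "app.kubernetes.io/name": "kube_app_name",
    "app.kubernetes.io/instance": "kube_app_instance",
    "app.kubernetes.io/version": "kube_app_version",
    "app.kubernetes.io/component": "kube_app_component",
    "app.kubernetes.io/part-of": "kube_app_part_of",
    "app.kubernetes.io/managed-by": "kube_app_managed_by",
    "helm.sh/chart": "helm_chart",
    "topology.kubernetes.io/region": "kube_region",
    "topology.kubernetes.io/zone": "kube_zone",
    "failure-domain.beta.kubernetes.io/region": "kube_region",
    "failure-domain.beta.kubernetes.io/zone": "kube_zone",
    "tags.datadoghq.com/env": "env",
    "tags.datadoghq.com/service": "service",
    "tags.datadoghq.com/version": "version",
}


def labels_to_tags(labels: Dict[str, str]) -> List[str]:
    """Return the sorted, de-duplicated "tag:value" strings for known labels.

    Labels without a default mapping (for example "team") are skipped. The
    set removes duplicates when a node carries both the topology and the
    legacy failure-domain label with the same value.
    """
    return sorted({f"{LABEL_TO_TAG[k]}:{v}" for k, v in labels.items() if k in LABEL_TO_TAG})
```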

Service Checks

kubernetes_state.cronjob.complete
Whether the last job of the cronjob failed. Tags: kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.cronjob.on_schedule_check
Alerts if the cronjob's next schedule is in the past. Tags: kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.complete
Whether the job failed. Tags: kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.node.ready
Whether the node is ready. Tags: node condition status.
kubernetes_state.node.out_of_disk
Whether the node is out of disk. Tags: node condition status.
kubernetes_state.node.disk_pressure
Whether the node is under disk pressure. Tags: node condition status.
kubernetes_state.node.network_unavailable
Whether the node network is unavailable. Tags: node condition status.
kubernetes_state.node.memory_pressure
Whether the node is under memory pressure. Tags: node condition status.

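Validation

Run the Cluster Agent's status subcommand inside the Cluster Agent container and look for kubernetes_state_core under the Checks section.

Troubleshooting

Timeout errors

By default, the Kubernetes State Metrics Core check waits 10 seconds for a response from the Kubernetes API server. On large clusters, requests can time out, resulting in missing metrics.

You can avoid this by setting the environment variable DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT to a value greater than the default of 10 seconds.

If you deploy with the Datadog Operator, update your datadog-agent.yaml with the following configuration: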

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    clusterAgent:
      env:
        - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
          value: <value_greater_than_10>

Then apply the new configuration:

kubectl apply -n $DD_NAMESPACE -f datadog-agent.yaml

If you deploy with Helm, update your datadog-values.yaml with the following configuration:

clusterAgent:
  env:
    - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
      value: <value_greater_than_10>

Then upgrade your Helm chart:

helm upgrade -f datadog-values.yaml <RELEASE_NAME> datadog/datadog

Need help? Contact Datadog support.

Further Reading