Supported OS: Linux, Mac OS, Windows

Overview

Get metrics from the Kubernetes service in real time to:

  • Visualize and monitor the state of Kubernetes.
  • Be notified about Kubernetes failovers and events.

The Kubernetes State Metrics Core check leverages kube-state-metrics version 2+ and includes major performance and tagging improvements compared to the legacy kubernetes_state check.

As opposed to the legacy check, the Kubernetes State Metrics Core check no longer requires you to deploy kube-state-metrics in your cluster.

Kubernetes State Metrics Core is a better alternative to the legacy kubernetes_state check because it provides more granular metrics and tags. See Major Changes and Data Collected for details.

Setup

Installation

The Kubernetes State Metrics Core check is included in the Datadog Cluster Agent image, so you don't need to install anything else on your Kubernetes servers.

Requirements

  • Datadog Cluster Agent v1.12+

Configuration

In your Helm values.yaml, add the following:

datadog:
  # (...)
  kubeStateMetricsCore:
    enabled: true

To enable the kubernetes_state_core check, set spec.features.kubeStateMetricsCore.enabled to true in the DatadogAgent resource:

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
  features:
    kubeStateMetricsCore:
      enabled: true

Note: Datadog Operator v0.7.0 or later is required.

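Migration from kubernetes_state to kubernetes_state_core

Tags removal

In the original kubernetes_state check, several tags have been flagged as deprecated and replaced by new tags. To determine your migration path, check which tags are submitted with your metrics.

In the kubernetes_state_core check, only the official (non-deprecated) tags are submitted. Before migrating from kubernetes_state to kubernetes_state_core, verify that your monitors and dashboards use only official tags.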

Here is the mapping between deprecated tags and the official tags that replace them:

Deprecated tag          Official tag
cluster_name            kube_cluster_name
container               kube_container_name
cronjob                 kube_cronjob
daemonset               kube_daemon_set
deployment              kube_deployment
hpa                     horizontalpodautoscaler
image                   image_name
job                     kube_job
job_name                kube_job
namespace               kube_namespace
phase                   pod_phase
pod                     pod_name
replicaset              kube_replica_set
replicationcontroller   kube_replication_controller
statefulset             kube_stateful_set

Backward incompatible changes

The Kubernetes State Metrics Core check is not backward compatible. Read the changes carefully before migrating from the legacy kubernetes_state check.

kubernetes_state.node.by_condition
A new metric with node name granularity. The legacy metric kubernetes_state.nodes.by_condition is deprecated in favor of this one. Note: This metric is also backported into the legacy check, where both metrics (this one and the legacy metric it replaces) are available.
kubernetes_state.persistentvolume.by_phase
A new metric with persistent volume name granularity. It replaces kubernetes_state.persistentvolumes.by_phase.
kubernetes_state.pod.status_phase
The metric is tagged with pod-level tags, such as pod_name.
kubernetes_state.node.count
The metric is no longer tagged with host. It aggregates the node count by kernel_version, os_image, container_runtime_version, and kubelet_version.
kubernetes_state.container.waiting and kubernetes_state.container.status_report.count.waiting
These metrics no longer emit a 0 value when no pods are waiting. They only report non-zero values.
kube_job
In kubernetes_state, the kube_job tag value is the CronJob name if the Job had a CronJob as its owner; otherwise it is the Job name. In kubernetes_state_core, the kube_job tag value is always the Job name, and a new kube_cronjob tag key is added with the CronJob name as its value. When migrating to kubernetes_state_core, it is recommended to use the new tag or kube_job:foo* (where foo is the CronJob name) in query filters.
kubernetes_state.job.succeeded
In the legacy kubernetes_state check, kubernetes_state.job.succeeded was a count type. In kubernetes_state_core, it is a gauge type.

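Node-level tagging

Host or node-level tags no longer appear on cluster-centric metrics. Only metrics relative to an actual node in the cluster, such as kubernetes_state.node.by_condition or kubernetes_state.container.restarts, continue to inherit their respective host or node-level tags.

To add tags globally, use the DD_TAGS environment variable, or use the corresponding Helm or Operator configuration. Instance-only tags can be specified by mounting a custom kubernetes_state_core.yaml into the Cluster Agent.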

Helm:

datadog:
  kubeStateMetricsCore:
    enabled: true
  tags:
    - "<TAG_KEY>:<TAG_VALUE>"

Operator:

kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
    tags:
      - "<TAG_KEY>:<TAG_VALUE>"
  features:
    kubeStateMetricsCore:
      enabled: true

Metrics like kubernetes_state.container.memory_limit.total or kubernetes_state.node.count are aggregate counts of groups within a cluster, so host or node-level tags are not added to them.

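Legacy check

Enabling kubeStateMetricsCore in your Helm values.yaml configures the Agent to ignore the auto-configuration file for the legacy kubernetes_state check. The goal is to avoid running both checks simultaneously.

If you still want to enable both checks simultaneously during the migration phase, disable the ignoreLegacyKSMCheck field in your values.yaml.

Note: ignoreLegacyKSMCheck only controls whether the Agent ignores the auto-configuration for the legacy kubernetes_state check; when it is disabled, the Agent no longer ignores that auto-configuration. Custom kubernetes_state configuration files must be removed manually.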
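
As an illustration of running both checks during a migration window, the fragment below is a minimal sketch of the relevant values.yaml section. It assumes that ignoreLegacyKSMCheck sits under datadog.kubeStateMetricsCore, which is how recent versions of the Datadog Helm chart appear to expose it; this document does not state the exact path, so confirm it against the values.yaml of your chart version.

```yaml
datadog:
  kubeStateMetricsCore:
    # Keep the new kubernetes_state_core check enabled.
    enabled: true
    # Assumed key location: when false, the Agent stops ignoring the
    # auto-configuration of the legacy kubernetes_state check, so both
    # checks run side by side.
    ignoreLegacyKSMCheck: false
```

Once monitors and dashboards have been moved to the official tags, setting the field back to true (or removing the override) returns to running only the new check.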

The Kubernetes State Metrics Core check no longer requires you to deploy kube-state-metrics in your cluster, so you can disable the kube-state-metrics deployment that is part of the Datadog Helm Chart. To do this, add the following to your Helm values.yaml:

datadog:
  # (...)
  kubeStateMetricsEnabled: false

Important note: The Kubernetes State Metrics Core check is an alternative to the legacy kubernetes_state check. Datadog recommends not enabling both checks simultaneously to guarantee consistent metrics.

Data Collected

Metrics

kubernetes_state.apiservice.condition
(gauge)
The current condition of this apiservice. Tags:kube_namespace apiservice condition status.
kubernetes_state.apiservice.count
(gauge)
The current count of apiservices.
kubernetes_state.configmap.count
(gauge)
Number of ConfigMaps. Requires ConfigMaps to be added to Cluster Agent collector. Tags: kube_namespace.
kubernetes_state.container.cpu_limit
(gauge)
The value of CPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as cpu
kubernetes_state.container.cpu_limit.total
(gauge)
The total value of CPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as cpu
kubernetes_state.container.cpu_requested
(gauge)
The value of CPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as cpu
kubernetes_state.container.cpu_requested.total
(gauge)
The total value of CPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as cpu
kubernetes_state.container.gpu_limit
(gauge)
The value of GPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels).
kubernetes_state.container.gpu_limit.total
(gauge)
The total value of GPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
kubernetes_state.container.gpu_requested
(gauge)
The value of GPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels).
kubernetes_state.container.gpu_requested.total
(gauge)
The total value of GPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
kubernetes_state.container.memory_limit
(gauge)
The value of memory limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as byte
kubernetes_state.container.memory_limit.total
(gauge)
The total value of memory limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as byte
kubernetes_state.container.memory_requested
(gauge)
The value of memory requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
Shown as byte
kubernetes_state.container.memory_requested.total
(gauge)
The total value of memory requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind>.
Shown as byte
kubernetes_state.container.network_bandwidth_limit
(gauge)
The value of network bandwidth limit for a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
kubernetes_state.container.network_bandwidth_requested
(gauge)
The value of network bandwidth requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).
kubernetes_state.container.ready
(gauge)
Describes whether the container's readiness check succeeded. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.restarts
(gauge)
The number of container restarts per container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.running
(gauge)
Describes whether the container is currently in running state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.status_report.count.terminated
(gauge)
Describes the reason the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels).
kubernetes_state.container.status_report.count.waiting
(gauge)
Describes the reason the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels).
kubernetes_state.container.terminated
(gauge)
Describes whether the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.container.waiting
(gauge)
Describes whether the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.crd.condition
(gauge)
The current condition of this custom resource definition. Tags: customresourcedefinition condition status.
kubernetes_state.crd.count
(gauge)
Number of custom resource definitions.
kubernetes_state.cronjob.count
(gauge)
Number of cronjobs. Tags:kube_namespace.
kubernetes_state.cronjob.duration_since_last_schedule
(gauge)
The duration since the last time the cronjob was scheduled. Tags:kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.cronjob.spec_suspend
(gauge)
Suspend flag tells the controller to suspend subsequent executions. Tags:kube_namespace kube_cronjob (env service version from standard labels).
kubernetes_state.daemonset.count
(gauge)
Number of DaemonSets. Tags:kube_namespace.
kubernetes_state.daemonset.daemons_available
(gauge)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.daemons_unavailable
(gauge)
The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.desired
(gauge)
The number of nodes that should be running the daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.misscheduled
(gauge)
The number of nodes running a daemon pod but are not supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.ready
(gauge)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.scheduled
(gauge)
The number of nodes running at least one daemon pod and are supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.daemonset.updated
(gauge)
The total number of nodes that are running updated daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels).
kubernetes_state.deployment.condition
(gauge)
The current status conditions of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.count
(gauge)
Number of deployments. Tags:kube_namespace.
kubernetes_state.deployment.paused
(gauge)
Whether the deployment is paused and will not be processed by the deployment controller. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas
(gauge)
The number of replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_available
(gauge)
The number of available replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_desired
(gauge)
Number of desired pods for a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_ready
(gauge)
The number of ready replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_unavailable
(gauge)
The number of unavailable replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.replicas_updated
(gauge)
The number of updated replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollingupdate.max_surge
(gauge)
Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.deployment.rollingupdate.max_unavailable
(gauge)
Maximum number of unavailable replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels).
kubernetes_state.endpoint.address_available
(gauge)
Number of addresses available in endpoint. Tags:endpoint kube_namespace.
kubernetes_state.endpoint.address_not_ready
(gauge)
Number of addresses not ready in endpoint. Tags:endpoint kube_namespace.
kubernetes_state.endpoint.count
(gauge)
Number of endpoints. Tags:kube_namespace.
kubernetes_state.hpa.condition
(gauge)
The condition of this autoscaler. Tags:kube_namespace horizontalpodautoscaler condition status.
kubernetes_state.hpa.count
(gauge)
Number of horizontal pod autoscalers. Tags: kube_namespace.
kubernetes_state.hpa.current_replicas
(gauge)
Current number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.desired_replicas
(gauge)
Desired number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.max_replicas
(gauge)
Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.min_replicas
(gauge)
Lower limit for the number of pods that can be set by the autoscaler; defaults to 1. Tags:kube_namespace horizontalpodautoscaler.
kubernetes_state.hpa.spec_target_metric
(gauge)
The metric specifications used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type.
kubernetes_state.hpa.status_target_metric
(gauge)
The current metric status used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type.
kubernetes_state.ingress.count
(gauge)
Number of ingresses. Tags:kube_namespace.
kubernetes_state.ingress.path
(gauge)
Information about the ingress path. Tags:kube_namespace kube_ingress_path kube_ingress kube_service kube_service_port kube_ingress_host .
kubernetes_state.initcontainer.restarts
(gauge)
The number of restarts for the init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.initcontainer.waiting
(gauge)
Describes whether the init container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels).
kubernetes_state.job.completion.failed
(gauge)
The job has failed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.completion.succeeded
(gauge)
The job has completed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.count
(gauge)
Number of jobs. Tags:kube_namespace kube_cronjob.
kubernetes_state.job.duration
(gauge)
Time elapsed between the start and completion time of the job, or the current time if the job is still running. Tags:kube_job kube_namespace (env service version from standard labels).
kubernetes_state.job.failed
(gauge)
The number of pods which reached Phase Failed. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.succeeded
(gauge)
The number of pods which reached Phase Succeeded. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.limitrange.cpu.default
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.default_request
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.max
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.max_limit_request_ratio
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.cpu.min
(gauge)
Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as cpu
kubernetes_state.limitrange.memory.default
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.default_request
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.max
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.max_limit_request_ratio
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.limitrange.memory.min
(gauge)
Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type.
Shown as byte
kubernetes_state.namespace.count
(gauge)
Number of namespaces. Tags:phase.
kubernetes_state.node.age
(gauge)
The time in seconds since the creation of the node. Tags:node.
Shown as second
kubernetes_state.node.by_condition
(gauge)
The condition of a cluster node. Tags:condition node status.
kubernetes_state.node.count
(gauge)
Number of nodes. Tags:kernel_version os_image container_runtime_version kubelet_version.
kubernetes_state.node.cpu_allocatable
(gauge)
The allocatable CPU of a node that is available for scheduling. Tags:node resource unit.
Shown as cpu
kubernetes_state.node.cpu_allocatable.total
(gauge)
The total allocatable CPU of all nodes in the cluster that is available for scheduling.
Shown as cpu
kubernetes_state.node.cpu_capacity
(gauge)
The CPU capacity of a node. Tags:node resource unit.
Shown as cpu
kubernetes_state.node.cpu_capacity.total
(gauge)
The total CPU capacity of all nodes in the cluster.
Shown as cpu
kubernetes_state.node.ephemeral_storage_allocatable
(gauge)
The allocatable ephemeral-storage of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.ephemeral_storage_capacity
(gauge)
The ephemeral-storage capacity of a node. Tags:node resource unit.
kubernetes_state.node.gpu_allocatable
(gauge)
The allocatable GPU of a node that is available for scheduling. Tags:node resource mig_profile unit.
kubernetes_state.node.gpu_allocatable.total
(gauge)
The total allocatable GPU of all nodes in the cluster that is available for scheduling.
kubernetes_state.node.gpu_capacity
(gauge)
The GPU capacity of a node. Tags:node resource mig_profile unit.
kubernetes_state.node.gpu_capacity.total
(gauge)
The total GPU capacity of all nodes in the cluster.
kubernetes_state.node.memory_allocatable
(gauge)
The allocatable memory of a node that is available for scheduling. Tags:node resource unit.
Shown as byte
kubernetes_state.node.memory_allocatable.total
(gauge)
The total allocatable memory of all nodes in the cluster that is available for scheduling.
Shown as byte
kubernetes_state.node.memory_capacity
(gauge)
The memory capacity of a node. Tags:node resource unit.
Shown as byte
kubernetes_state.node.memory_capacity.total
(gauge)
The total memory capacity of all nodes in the cluster.
Shown as byte
kubernetes_state.node.network_bandwidth_allocatable
(gauge)
The allocatable network bandwidth of a node that is available for scheduling. Tags:node resource unit.
kubernetes_state.node.network_bandwidth_capacity
(gauge)
The network bandwidth capacity of a node. Tags:node resource unit.
kubernetes_state.node.pods_allocatable
(gauge)
The allocatable pods of a node that are available for scheduling. Tags:node resource unit.
kubernetes_state.node.pods_capacity
(gauge)
The pods capacity of a node. Tags:node resource unit.
kubernetes_state.node.status
(gauge)
Whether the node can schedule new pods. Tags:node status.
kubernetes_state.pdb.disruptions_allowed
(gauge)
Number of pod disruptions that are currently allowed. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_desired
(gauge)
Minimum desired number of healthy pods. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_healthy
(gauge)
Current number of healthy pods. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.pdb.pods_total
(gauge)
Total number of pods counted by this disruption budget. Tags:kube_namespace poddisruptionbudget.
kubernetes_state.persistentvolume.by_phase
(gauge)
The phase indicates if a volume is available, bound to a claim, or released by a claim. Tags:persistentvolume storageclass phase.
kubernetes_state.persistentvolume.capacity
(gauge)
Persistentvolume capacity in bytes. Tags:persistentvolume storageclass.
kubernetes_state.persistentvolumeclaim.access_mode
(gauge)
The access mode(s) specified by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim access_mode storageclass.
kubernetes_state.persistentvolumeclaim.request_storage
(gauge)
The capacity of storage requested by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim storageclass.
kubernetes_state.persistentvolumeclaim.status
(gauge)
The phase the persistent volume claim is currently in. Tags:kube_namespace persistentvolumeclaim phase storageclass.
kubernetes_state.pod.age
(gauge)
The time in seconds since the creation of the pod. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
Shown as second
kubernetes_state.pod.count
(gauge)
Number of Pods. Tags:node kube_namespace kube_<owner kind>.
kubernetes_state.pod.ready
(gauge)
Describes whether the pod is ready to serve requests. Tags:node kube_namespace pod_name condition (env service version from standard labels).
kubernetes_state.pod.scheduled
(gauge)
Describes the status of the scheduling process for the pod. Tags:node kube_namespace pod_name condition (env service version from standard labels).
kubernetes_state.pod.status_phase
(gauge)
The pod's current phase. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
kubernetes_state.pod.tolerations
(gauge)
Information about the pod tolerations.
kubernetes_state.pod.unschedulable
(gauge)
Describes the unschedulable status for the pod. Tags:kube_namespace pod_name (env service version from standard labels).
kubernetes_state.pod.uptime
(gauge)
The time in seconds since the pod has been scheduled and acknowledged by the Kubelet. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).
kubernetes_state.pod.volumes.persistentvolumeclaims_readonly
(gauge)
Describes whether a persistentvolumeclaim is mounted read only. Tags:node kube_namespace pod_name volume persistentvolumeclaim (env service version from standard labels).
kubernetes_state.replicaset.count
(gauge)
Number of ReplicaSets. Tags:kube_namespace kube_deployment.
kubernetes_state.replicaset.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas
(gauge)
The number of replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas_desired
(gauge)
Number of desired pods for a ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicaset.replicas_ready
(gauge)
The number of ready replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels).
kubernetes_state.replicationcontroller.fully_labeled_replicas
(gauge)
The number of fully labeled replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas
(gauge)
The number of replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_available
(gauge)
The number of available replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_desired
(gauge)
Number of desired pods for a ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.replicationcontroller.replicas_ready
(gauge)
The number of ready replicas per ReplicationController. Tags:kube_namespace kube_replication_controller.
kubernetes_state.resourcequota.count_configmaps.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_configmaps.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_secrets.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.count_secrets.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.pods.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.pods.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.requests.cpu.limit
(gauge)
Information about resource quota limits by resource. Tags:kube_namespace resourcequota.
kubernetes_state.resourcequota.requests.cpu.used
(gauge)
Information about resource quota usage by resource. Tags:kube_namespace resourcequota.
kubernetes_state.secret.count
(gauge)
Number of Secrets. Requires Secrets to be added to Cluster Agent collector. Tags: kube_namespace.
kubernetes_state.secret.type
(gauge)
The type of the secret. Tags:kube_namespace secret type.
kubernetes_state.service.count
(gauge)
Number of services. Tags:kube_namespace type.
kubernetes_state.service.type
(gauge)
Service types. Tags:kube_namespace kube_service type.
kubernetes_state.statefulset.count
(gauge)
Number of StatefulSets. Tags:kube_namespace.
kubernetes_state.statefulset.replicas
(gauge)
The number of replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_current
(gauge)
The number of current replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_desired
(gauge)
Number of desired pods for a StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_ready
(gauge)
The number of ready replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.statefulset.replicas_updated
(gauge)
The number of updated replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels).
kubernetes_state.vpa.count
(gauge)
Number of vertical pod autoscalers. Tags: kube_namespace.
kubernetes_state.vpa.lower_bound
(gauge)
Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.spec_container_maxallowed
(gauge)
Maximum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.spec_container_minallowed
(gauge)
Minimum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.target
(gauge)
Target resources the VerticalPodAutoscaler recommends for the container. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.uncapped_target
(gauge)
Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.
kubernetes_state.vpa.update_mode
(gauge)
Update mode of the VerticalPodAutoscaler. Tags:kube_namespace verticalpodautoscaler target_api_version target_kind target_name update_mode.
kubernetes_state.vpa.upperbound
(gauge)
Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit.

Note: You can configure Datadog Standard labels on your Kubernetes objects to get the env, service, and version tags.
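
Two metrics in the list above, kubernetes_state.configmap.count and kubernetes_state.secret.count, require ConfigMaps and Secrets to be added to the Cluster Agent collectors. The fragment below is a sketch of how this might look in a Helm values.yaml. The key names collectConfigMaps and collectSecretMetrics are assumptions based on the Datadog Helm chart and are not given in this document, so check them against the values.yaml of the chart version you deploy.

```yaml
datadog:
  kubeStateMetricsCore:
    enabled: true
    # Assumed chart options (verify against your chart version):
    # add the ConfigMap and Secret collectors so that the
    # corresponding count metrics are reported.
    collectConfigMaps: true
    collectSecretMetrics: true
```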

Events

The Kubernetes State Metrics Core check does not include any events.

Default labels as tags

Default recommended Kubernetes and Helm labels

Recommended label               Tag
app.kubernetes.io/name          kube_app_name
app.kubernetes.io/instance      kube_app_instance
app.kubernetes.io/version       kube_app_version
app.kubernetes.io/component     kube_app_component
app.kubernetes.io/part-of       kube_app_part_of
app.kubernetes.io/managed-by    kube_app_managed_by
helm.sh/chart                   helm_chart
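
To make the mapping above concrete, here is a small sketch of a Pod manifest carrying some of the recommended labels, with the tag each label is expected to produce noted in a comment. The Pod name, label values, and image placeholder are hypothetical and only serve as an illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkout-7d9f                          # hypothetical name
  labels:
    app.kubernetes.io/name: checkout           # -> kube_app_name:checkout
    app.kubernetes.io/instance: checkout-prod  # -> kube_app_instance:checkout-prod
    app.kubernetes.io/version: "2.4.1"         # -> kube_app_version:2.4.1
    helm.sh/chart: checkout-0.3.0              # -> helm_chart:checkout-0.3.0
spec:
  containers:
    - name: checkout
      image: <IMAGE>                           # placeholder, not a real image
```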

Default recommended Kubernetes node labels

Recommended label                           Tag
topology.kubernetes.io/region               kube_region
topology.kubernetes.io/zone                 kube_zone
failure-domain.beta.kubernetes.io/region    kube_region
failure-domain.beta.kubernetes.io/zone      kube_zone

Datadog labels (Unified Service Tagging)

Datadog label                 Tag
tags.datadoghq.com/env        env
tags.datadoghq.com/service    service
tags.datadoghq.com/version    version
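
As a sketch of how the Datadog labels in the table above are typically applied, the Deployment below sets them both on the Deployment itself and on its Pod template, so that metrics for either object can carry the env, service, and version tags. The workload name, label values, and image placeholder are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-store                            # hypothetical workload
  labels:
    tags.datadoghq.com/env: "prod"           # -> env:prod
    tags.datadoghq.com/service: "web-store"  # -> service:web-store
    tags.datadoghq.com/version: "1.2.0"      # -> version:1.2.0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-store
  template:
    metadata:
      labels:
        app: web-store
        # Repeat the labels on the Pod template so Pod-level
        # metrics are tagged as well.
        tags.datadoghq.com/env: "prod"
        tags.datadoghq.com/service: "web-store"
        tags.datadoghq.com/version: "1.2.0"
    spec:
      containers:
        - name: web-store
          image: <IMAGE>                     # placeholder, not a real image
```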

Service Checks

kubernetes_state.cronjob.complete
Whether the last job of the cronjob failed or not. Tags: kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.cronjob.on_schedule_check
Alerts if the cronjob's next schedule is in the past. Tags: kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.job.complete
Whether the job failed or not. Tags: kube_job or kube_cronjob kube_namespace (env service version from standard labels).
kubernetes_state.node.ready
Whether the node is ready. Tags: node condition status.
kubernetes_state.node.out_of_disk
Whether the node is out of disk. Tags: node condition status.
kubernetes_state.node.disk_pressure
Whether the node is under disk pressure. Tags: node condition status.
kubernetes_state.node.network_unavailable
Whether the node network is unavailable. Tags: node condition status.
kubernetes_state.node.memory_pressure
Whether the node is under memory pressure. Tags: node condition status.

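Validation

Run the Cluster Agent's status subcommand inside your Cluster Agent container and look for kubernetes_state_core under the Checks section.

Troubleshooting

Timeout errors

By default, the Kubernetes State Metrics Core check waits 10 seconds for a response from the Kubernetes API server. For large clusters, the request may time out, resulting in missing metrics.

You can avoid this by setting the environment variable DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT to a value higher than the default 10 seconds.

If you use the Datadog Operator, update your datadog-agent.yaml with the following configuration: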

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    clusterAgent:
      env:
        - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
          value: <value_greater_than_10>

Then apply the new configuration:

kubectl apply -n $DD_NAMESPACE -f datadog-agent.yaml

If you use Helm, update your datadog-values.yaml with the following configuration:

clusterAgent:
  env:
    - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
      value: <value_greater_than_10>

Then upgrade your Helm chart:

helm upgrade -f datadog-values.yaml <RELEASE_NAME> datadog/datadog

Need help? Contact Datadog support.

Further Reading