Kubernetes Controller Manager


Integration version 5.1.1

Kube Controller Manager dashboard

Overview

This check monitors the Kubernetes Controller Manager, which is part of the Kubernetes control plane.

Note: This check does not collect data from Amazon EKS clusters, because those services are not exposed.

Setup

Installation

The Kubernetes Controller Manager check is included in the Datadog Agent package, so you do not need to install anything else on your server.

Configuration

  1. Edit the kube_controller_manager.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your kube_controller_manager performance data. For all available configuration options, see the sample kube_controller_manager.d/conf.yaml.

  2. Restart the Agent.
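Step 1 can be scripted. The following is a minimal sketch, assuming a Linux host install whose configuration directory is /etc/datadog-agent, and assuming that `prometheus_url` is the option that points the check at the metrics endpoint (confirm both against the sample conf.yaml). The endpoint URL in the example is hypothetical; 10257 is the controller manager's default secure port.

```python
from pathlib import Path

# Assumption: typical Linux Agent layout; adjust for your platform.
CONF_DIR = Path("/etc/datadog-agent/conf.d")


def render_conf(prometheus_url: str) -> str:
    """Return a minimal conf.yaml body with a single check instance.

    Assumption: `prometheus_url` is the option naming the metrics endpoint.
    """
    return (
        "instances:\n"
        f"  - prometheus_url: {prometheus_url}\n"
    )


def write_conf(prometheus_url: str, conf_dir: Path = CONF_DIR) -> Path:
    """Write the config to <conf_dir>/kube_controller_manager.d/conf.yaml."""
    target = conf_dir / "kube_controller_manager.d" / "conf.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_conf(prometheus_url))
    return target


if __name__ == "__main__":
    # Hypothetical endpoint on the controller manager's default secure port.
    print(render_conf("https://localhost:10257/metrics"), end="")
```

After writing the file, restart the Agent (step 2) so it picks up the new instance.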

This integration requires access to the Controller Manager's metrics endpoint. To access the metrics endpoint, you need:

  • network access to the IP address and port of the controller-manager process
  • RBAC permissions for the metrics endpoint (the default Datadog Helm chart already adds the correct RBAC roles and bindings)
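To confirm both requirements before enabling the check, you can fetch the endpoint by hand. The sketch below uses only the standard library; it assumes you already have a bearer token for a service account whose RBAC role may GET the /metrics path, and that the endpoint serves the Prometheus text format. `fetch_metrics` and `sample_values` are hypothetical helpers, not part of the Agent.

```python
import ssl
import urllib.request
from typing import List


def fetch_metrics(url: str, token: str, verify_tls: bool = True) -> str:
    """GET the metrics endpoint with a bearer token and return the body."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Control-plane components often use self-signed serving certificates.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")


def sample_values(text: str, name: str) -> List[float]:
    """Return the value of every sample of metric `name`, across all label sets."""
    values = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric = line.split("{", 1)[0].split(" ", 1)[0]
        if metric != name:
            continue
        # The value follows the closing brace of the labels, or the bare name.
        rest = line.rsplit("}", 1)[1] if "{" in line else line.split(" ", 1)[1]
        values.append(float(rest.split()[0]))
    return values
```

An HTTP 401 or 403 from `fetch_metrics` usually points at the RBAC requirement, while a connection error points at network access to the process.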

Validation

Run the Agent's status subcommand and look for kube_controller_manager under the Checks section.
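If you validate many hosts, the status check can be automated. This is a simple text-scan sketch, assuming the binary is `datadog-agent` on a host install (usually `agent` inside the containerized Agent) and assuming the section header reads either "Checks" or "Running Checks", which may vary by Agent version. Both helpers are hypothetical.

```python
import subprocess

CHECK_HEADERS = ("Checks", "Running Checks")  # assumption: varies by version


def run_status(binary: str = "datadog-agent") -> str:
    """Run `<binary> status` and return its standard output."""
    result = subprocess.run(
        [binary, "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_listed(status_output: str, check: str = "kube_controller_manager") -> bool:
    """Return True if `check` appears on a line after a Checks section header."""
    in_checks = False
    for line in status_output.splitlines():
        stripped = line.strip()
        if stripped in CHECK_HEADERS:
            in_checks = True
        elif in_checks and stripped.startswith(check):
            return True
    return False
```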

Data Collected

Metrics

kube_controller_manager.goroutines
(gauge)
Number of goroutines that currently exist
kube_controller_manager.job_controller.terminated_pods_tracking_finalizer
(count)
Used to monitor whether the job controller removes Pod finalizers from terminated Pods after accounting for them in the Job status
kube_controller_manager.leader_election.lease_duration
(gauge)
Duration of the leadership lease
kube_controller_manager.leader_election.transitions
(count)
Number of leadership transitions observed
kube_controller_manager.max_fds
(gauge)
Maximum allowed open file descriptors
kube_controller_manager.nodes.count
(gauge)
Number of registered nodes, per zone
kube_controller_manager.nodes.evictions
(count)
Count of node eviction events, per zone
kube_controller_manager.nodes.unhealthy
(gauge)
Number of unhealthy nodes, per zone
kube_controller_manager.open_fds
(gauge)
Number of open file descriptors
kube_controller_manager.queue.adds
(count)
Elements added, by queue
kube_controller_manager.queue.depth
(gauge)
Current depth, by queue
kube_controller_manager.queue.latency.count
(gauge)
Processing latency count, by queue (deprecated in Kubernetes v1.14)
kube_controller_manager.queue.latency.quantile
(gauge)
Processing latency quantiles, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.latency.sum
(gauge)
Processing latency sum, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.process_duration.count
(gauge)
Number of observations of how long processing an item from the workqueue takes, by queue
kube_controller_manager.queue.process_duration.sum
(gauge)
Total workqueue processing time, by queue
Shown as second
kube_controller_manager.queue.queue_duration.count
(gauge)
Number of observations of how long an item stays in a queue before being requested, by queue
kube_controller_manager.queue.queue_duration.sum
(gauge)
Total time items stay in a queue before being requested, by queue
Shown as second
kube_controller_manager.queue.retries
(count)
Retries handled, by queue
kube_controller_manager.queue.work_duration.count
(gauge)
Work duration, by queue (deprecated in Kubernetes v1.14)
kube_controller_manager.queue.work_duration.quantile
(gauge)
Work duration quantiles, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.work_duration.sum
(gauge)
Work duration sum, by queue (deprecated in Kubernetes v1.14)
Shown as microsecond
kube_controller_manager.queue.work_longest_duration
(gauge)
Number of seconds the longest-running processor has been running, by queue
Shown as second
kube_controller_manager.queue.work_unfinished_duration
(gauge)
Seconds of in-progress work that has not yet been observed by process_duration, by queue
Shown as second
kube_controller_manager.rate_limiter.use
(gauge)
Usage of the rate limiter, by limiter
kube_controller_manager.slis.kubernetes_healthcheck
(gauge)
Result of a single controller manager healthcheck (alpha; requires k8s v1.26+)
kube_controller_manager.slis.kubernetes_healthcheck_total
(count)
Cumulative results of all controller manager healthchecks (alpha; requires k8s v1.26+)
kube_controller_manager.threads
(gauge)
Number of OS threads created
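The .sum and .count pairs above (for example queue.process_duration.sum and queue.process_duration.count) are most useful as a ratio: the average time per item over an interval. A minimal sketch, assuming both values are the raw cumulative counters of the same Prometheus histogram, sampled at the start and end of the interval; `average_per_item` is a hypothetical helper.

```python
from typing import Optional


def average_per_item(sum_start: float, count_start: float,
                     sum_end: float, count_end: float) -> Optional[float]:
    """Average seconds per item between two samples of a cumulative sum/count pair."""
    items = count_end - count_start
    if items <= 0:
        # No new observations, or a counter reset (e.g. process restart).
        return None
    return (sum_end - sum_start) / items


if __name__ == "__main__":
    # 10 items processed while total processing time grew by 2.5 s.
    print(average_per_item(10.0, 100, 12.5, 110))  # 0.25
```

Dividing the deltas rather than the raw totals keeps the result tied to the chosen interval instead of the whole process lifetime.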

Events

The Kubernetes Controller Manager check does not include any events.

Service Checks

kube_controller_manager.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint.
Statuses: ok, critical

kube_controller_manager.leader_election.status
Returns CRITICAL if no replica is currently set as leader.
Statuses: ok, critical
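To reproduce the leader_election.status rule by hand, you can inspect the controller manager's leader-election Lease. The sketch below only illustrates the documented rule (CRITICAL when no replica holds the lease); it is not the check's implementation. It assumes a coordination.k8s.io/v1 Lease, typically named kube-controller-manager in the kube-system namespace, and ignores lease expiry.

```python
import json


def leader_status(lease: dict) -> str:
    """Map a Lease object to "ok" when a holder is set, else "critical"."""
    holder = (lease.get("spec") or {}).get("holderIdentity")
    return "ok" if holder else "critical"


if __name__ == "__main__":
    # e.g. output of: kubectl get lease kube-controller-manager -n kube-system -o json
    sample = '{"spec": {"holderIdentity": "node-1_abc", "leaseDurationSeconds": 15}}'
    print(leader_status(json.loads(sample)))  # ok
```

A stricter variant would also compare spec.renewTime plus spec.leaseDurationSeconds against the current time, so that a stale holder counts as no leader.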

kube_controller_manager.up
Returns CRITICAL if the Kube Controller Manager is not healthy.
Statuses: ok, critical
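When the up or prometheus.health service check turns CRITICAL, a manual probe helps separate "process unhealthy" from "endpoint unreachable". This is a hypothetical probe, assuming the controller manager serves a /healthz endpoint (for example https://localhost:10257/healthz) that answers 200 with the body "ok" and that its serving certificate is self-signed; it is not how the Agent computes these statuses.

```python
import ssl
import urllib.request


def health_status(url: str, timeout: float = 5.0) -> str:
    """Return "ok" if GET `url` answers 200 with body "ok", else "critical"."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # assumption: self-signed serving cert
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            body = resp.read().decode("utf-8").strip()
            return "ok" if resp.status == 200 and body == "ok" else "critical"
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts; ValueError a malformed URL.
        return "critical"
```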

Troubleshooting

Need help? Contact Datadog support.