Kubernetes API server metrics

Supported OS Linux Windows Mac OS

Integration version4.3.1

Kubernetes API Server dashboard

Overview

This check monitors Kube_apiserver_metrics.

Setup

Installation

The Kube_apiserver_metrics check is included in the Datadog Agent package, so you do not need to install anything else on your server.

Configuration

The main use case to run the kube_apiserver_metrics check is as a Cluster Level Check. See the documentation for Cluster Level Checks. You can annotate the service of your apiserver with the following:

annotations:
  ad.datadoghq.com/endpoints.check_names: '["kube_apiserver_metrics"]'
  ad.datadoghq.com/endpoints.init_configs: '[{}]'
  ad.datadoghq.com/endpoints.instances:
    '[{ "prometheus_url": "https://%%host%%:%%port%%/metrics", "bearer_token_auth": "true" }]'

Then the Datadog Cluster Agent schedules the check(s) for each endpoint onto Datadog Agent(s).

You can also run the check by configuring the endpoints directly in the kube_apiserver_metrics.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory. You must add cluster_check: true to your configuration file when using a static configuration file or ConfigMap to configure cluster checks. See the sample kube_apiserver_metrics.d/conf.yaml for all available configuration options.

By default the Agent running the check tries to get the service account bearer token to authenticate against the APIServer. If you are not using RBACs, set bearer_token_auth to false.

Finally, if you run the Datadog Agent on the master nodes, you can rely on Autodiscovery to schedule the check. It is automatic if you are running the official image registry.k8s.io/kube-apiserver.

Validation

Run the Agent’s status subcommand and look for kube_apiserver_metrics under the Checks section.

Data Collected

Metrics

kube_apiserver.APIServiceRegistrationController_depth
(gauge)
The current depth of workqueue: APIServiceRegistrationController
kube_apiserver.admission_controller_admission_duration_seconds.count
(count)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_controller_admission_duration_seconds.sum
(gauge)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds.count
(count)
The admission sub-step latency histogram broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds.sum
(gauge)
The admission sub-step latency broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.count
(count)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds_summary.quantile
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) quantile
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.sum
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_webhook_admission_latencies_seconds.count
(count)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_webhook_admission_latencies_seconds.sum
(gauge)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.aggregator_unavailable_apiservice
(gauge)
Gauge of APIServices which are marked as unavailable broken down by APIService name (alpha; Kubernetes 1.14+)
kube_apiserver.apiserver_admission_webhook_fail_open_count
(gauge)
Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_admission_webhook_fail_open_count.count
(count)
Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_dropped_requests_total
(gauge)
The accumulated number of requests dropped with 'Try again later' response
Shown as request
kube_apiserver.apiserver_dropped_requests_total.count
(count)
The monotonic count of requests dropped with 'Try again later' response
Shown as request
kube_apiserver.apiserver_request_count
(gauge)
The accumulated number of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_request_count.count
(count)
The monotonic count of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_request_terminations_total.count
(count)
The number of requests the apiserver terminated in self-defense (Kubernetes 1.17+)
Shown as request
kube_apiserver.apiserver_request_total
(gauge)
The accumulated number of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserverrequestcount)
Shown as request
kube_apiserver.apiserver_request_total.count
(count)
The monotonic count of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserverrequestcount.count)
Shown as request
kube_apiserver.audit_event
(gauge)
The accumulated number audit events generated and sent to the audit backend
Shown as event
kube_apiserver.audit_event.count
(count)
The monotonic count of audit events generated and sent to the audit backend
Shown as event
kube_apiserver.authenticated_user_requests
(gauge)
The accumulated number of authenticated requests broken out by username
Shown as request
kube_apiserver.authenticated_user_requests.count
(count)
The monotonic count of authenticated requests broken out by username
Shown as request
kube_apiserver.authentication_attempts.count
(count)
The counter of authenticated attempts (Kubernetes 1.16+)
Shown as request
kube_apiserver.authentication_duration_seconds.count
(count)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
kube_apiserver.authentication_duration_seconds.sum
(gauge)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
Shown as second
kube_apiserver.current_inflight_requests
(gauge)
The maximal number of currently used inflight request limit of this apiserver per request kind in last second.
kube_apiserver.envelope_encryption_dek_cache_fill_percent
(gauge)
Percent of the cache slots currently occupied by cached DEKs.
kube_apiserver.etcd.db.total_size
(gauge)
The total size of the etcd database file physically allocated in bytes (alpha; Kubernetes 1.19+)
Shown as byte
kube_apiserver.etcd_object_counts
(gauge)
The number of stored objects at the time of last check split by kind (alpha; deprecated in Kubernetes 1.22)
Shown as object
kube_apiserver.etcd_request_duration_seconds.count
(count)
Etcd request latencies count for each operation and object type (alpha)
kube_apiserver.etcd_request_duration_seconds.sum
(gauge)
Etcd request latencies for each operation and object type (alpha)
Shown as second
kube_apiserver.etcd_request_errors_total
(count)
Etcd failed request counts for each operation and object type
Shown as request
kube_apiserver.etcd_requests_total
(count)
Etcd request counts for each operation and object type
Shown as request
kube_apiserver.flowcontrol_current_executing_requests
(gauge)
Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_current_inqueue_requests
(count)
Number of requests currently pending in queues of the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_dispatched_requests_total
(count)
Number of requests executed by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_rejected_requests_total.count
(count)
Number of requests rejected by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_concurrency_limit
(gauge)
Shared concurrency limit in the API Priority and Fairness subsystem
kube_apiserver.go_goroutines
(gauge)
The number of goroutines that currently exist
kube_apiserver.go_threads
(gauge)
The number of OS threads created
Shown as thread
kube_apiserver.grpc_client_handled_total
(count)
The total number of RPCs completed by the client regardless of success or failure
Shown as request
kube_apiserver.grpc_client_msg_received_total
(count)
The total number of gRPC stream messages received by the client
Shown as message
kube_apiserver.grpc_client_msg_sent_total
(count)
The total number of gRPC stream messages sent by the client
Shown as message
kube_apiserver.grpc_client_started_total
(count)
The total number of RPCs started on the client
Shown as request
kube_apiserver.http_requests_total
(gauge)
The accumulated number of HTTP requests made
Shown as request
kube_apiserver.http_requests_total.count
(count)
The monotonic count of the number of HTTP requests made
Shown as request
kube_apiserver.kubernetes_feature_enabled
(gauge)
Whether a Kubernetes feature gate is enabled or not, identified by name and stage (alpha; Kubernetes 1.26+)
kube_apiserver.longrunning_gauge
(gauge)
The gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope, and component. Not all requests are tracked this way.
Shown as request
kube_apiserver.process_resident_memory_bytes
(gauge)
The resident memory size in bytes
Shown as byte
kube_apiserver.process_virtual_memory_bytes
(gauge)
The virtual memory size in bytes
Shown as byte
kube_apiserver.registered_watchers
(gauge)
The number of currently registered watchers for a given resource
Shown as object
kube_apiserver.request_duration_seconds.count
(count)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component count
kube_apiserver.request_duration_seconds.sum
(gauge)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component
Shown as second
kube_apiserver.request_latencies.count
(count)
The response latency distribution in microseconds for each verb, resource, and subresource count
kube_apiserver.request_latencies.sum
(gauge)
The response latency distribution in microseconds for each verb, resource and subresource
Shown as microsecond
kube_apiserver.requested_deprecated_apis
(gauge)
Gauge of deprecated APIs that have been requested, broken out by API group, version, resource, subresource, and removed_release
Shown as request
kube_apiserver.rest_client_request_latency_seconds.count
(count)
The request latency in seconds broken down by verb and URL count
kube_apiserver.rest_client_request_latency_seconds.sum
(gauge)
The request latency in seconds broken down by verb and URL
Shown as second
kube_apiserver.rest_client_requests_total
(gauge)
The accumulated number of HTTP requests partitioned by status code method and host
Shown as request
kube_apiserver.rest_client_requests_total.count
(count)
The monotonic count of HTTP requests partitioned by status code method and host
Shown as request
kube_apiserver.slis.kubernetes_healthcheck
(gauge)
Result of a single kubernetes apiserver healthcheck (alpha; requires k8s v1.26+)
kube_apiserver.slis.kubernetes_healthcheck_total
(count)
The monotonic count of all kubernetes apiserver healthchecks (alpha; requires k8s v1.26+)
kube_apiserver.storage_list_evaluated_objects_total
(gauge)
The number of objects tested in the course of serving a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_fetched_objects_total
(gauge)
The number of objects read from storage in the course of serving a LIST request (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_returned_objects_total
(gauge)
The number of objects returned for a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_total
(gauge)
The number of LIST requests served from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_objects
(gauge)
The number of stored objects at the time of last check split by kind (Kubernetes 1.21+; replaces etcdobjectcounts)
Shown as object
kube_apiserver.watch_events_sizes.count
(count)
The watch event size distribution (Kubernetes 1.16+)
kube_apiserver.watch_events_sizes.sum
(gauge)
The watch event size distribution (Kubernetes 1.16+)
Shown as byte

Service Checks

Kube_apiserver_metrics does not include any service checks.

Events

Kube_apiserver_metrics does not include any events.

Troubleshooting

Need help? Contact Datadog support.