Kubernetes API server metrics

Supported OS Linux Mac OS Windows

Integrationv3.2.0

Overview

This check monitors Kube_apiserver_metrics.

Setup

Installation

The Kube_apiserver_metrics check is included in the Datadog Agent package, so you do not need to install anything else on your server.

Note: A tile is not included on the Datadog site for this integration. Follow the configuration steps below to configure this integration.

Configuration

The main use case to run the kube_apiserver_metrics check is as a Cluster Level Check. See the documentation for Cluster Level Checks. You can annotate the service of your apiserver with the following:

annotations:
  ad.datadoghq.com/endpoints.check_names: '["kube_apiserver_metrics"]'
  ad.datadoghq.com/endpoints.init_configs: '[{}]'
  ad.datadoghq.com/endpoints.instances:
    '[{ "prometheus_url": "https://%%host%%:%%port%%/metrics", "bearer_token_auth": "true" }]'

Then the Datadog Cluster Agent schedules the check(s) for each endpoint onto Datadog Agent(s).

You can also run the check by configuring the endpoints directly in the kube_apiserver_metrics.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory. You must add cluster_check: true to your configuration file when using a static configuration file or ConfigMap to configure cluster checks. See the sample kube_apiserver_metrics.d/conf.yaml for all available configuration options.

By default the Agent running the check tries to get the service account bearer token to authenticate against the APIServer. If you are not using RBACs, set bearer_token_auth to false.

Finally, if you run the Datadog Agent on the master nodes, you can rely on Autodiscovery to schedule the check. It is automatic if you are running the official image k8s.gcr.io/kube-apiserver.

Validation

Run the Agent’s status subcommand and look for kube_apiserver_metrics under the Checks section.

Data Collected

Metrics

kube_apiserver.longrunning_gauge
(gauge)
The gauge of all active long-running apiserver requests broken out by verb API resource and scope. Not all requests are tracked this way.
Shown as request
kube_apiserver.current_inflight_requests
(gauge)
The maximal number of currently used inflight request limit of this apiserver per request kind in last second.
kube_apiserver.audit_event
(gauge)
The accumulated number audit events generated and sent to the audit backend
Shown as event
kube_apiserver.go_threads
(gauge)
The number of OS threads created
Shown as thread
kube_apiserver.go_goroutines
(gauge)
The number of goroutines that currently exist
kube_apiserver.APIServiceRegistrationController_depth
(gauge)
The current depth of workqueue: APIServiceRegistrationController
kube_apiserver.etcd_request_duration_seconds.sum
(gauge)
Etcd request latencies for each operation and object type (alpha)
Shown as second
kube_apiserver.etcd_request_duration_seconds.count
(count)
Etcd request latencies count for each operation and object type (alpha)
kube_apiserver.etcd_object_counts
(gauge)
The number of stored objects at the time of last check split by kind (alpha; deprecated in Kubernetes 1.22)
Shown as object
kube_apiserver.etcd.db.total_size
(gauge)
The total size of the etcd database file physically allocated in bytes (alpha; Kubernetes 1.19+)
Shown as byte
kube_apiserver.storage_objects
(gauge)
The number of stored objects at the time of last check split by kind (Kubernetes 1.21+; replaces etcd_object_counts)
Shown as object
kube_apiserver.storage_list_total
(gauge)
The number of LIST requests served from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_fetched_objects_total
(gauge)
The number of objects read from storage in the course of serving a LIST request (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_evaluated_objects_total
(gauge)
The number of objects tested in the course of serving a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_returned_objects_total
(gauge)
The number of objects returned for a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.rest_client_requests_total
(gauge)
The accumulated number of HTTP requests partitioned by status code method and host
Shown as request
kube_apiserver.apiserver_request_count
(gauge)
The accumulated number of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_dropped_requests_total
(gauge)
The accumulated number of requests dropped with 'Try again later' response
Shown as request
kube_apiserver.http_requests_total
(gauge)
The accumulated number of HTTP requests made
Shown as request
kube_apiserver.authenticated_user_requests
(gauge)
The accumulated number of authenticated requests broken out by username
Shown as request
kube_apiserver.audit_event.count
(count)
The monotonic count of audit events generated and sent to the audit backend
Shown as event
kube_apiserver.rest_client_requests_total.count
(count)
The monotonic count of HTTP requests partitioned by status code method and host
Shown as request
kube_apiserver.apiserver_request_count.count
(count)
The monotonic count of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_dropped_requests_total.count
(count)
The monotonic count of requests dropped with 'Try again later' response
Shown as request
kube_apiserver.http_requests_total.count
(count)
The monotonic count of the number of HTTP requests made
Shown as request
kube_apiserver.authenticated_user_requests.count
(count)
The monotonic count of authenticated requests broken out by username
Shown as request
kube_apiserver.apiserver_request_total
(gauge)
The accumulated number of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count)
Shown as request
kube_apiserver.apiserver_request_total.count
(count)
The monotonic count of apiserver requests broken out for each verb API resource client and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count.count)
Shown as request
kube_apiserver.rest_client_request_latency_seconds.sum
(gauge)
The request latency in seconds broken down by verb and URL
Shown as second
kube_apiserver.rest_client_request_latency_seconds.count
(count)
The request latency in seconds broken down by verb and URL count
kube_apiserver.admission_webhook_admission_latencies_seconds.sum
(gauge)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.admission_webhook_admission_latencies_seconds.count
(count)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds.sum
(gauge)
The admission sub-step latency broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds.count
(count)
The admission sub-step latency histogram broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds_summary.sum
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.count
(count)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds_summary.quantile
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) quantile
Shown as second
kube_apiserver.admission_controller_admission_duration_seconds.sum
(gauge)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.admission_controller_admission_duration_seconds.count
(count)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.request_latencies.sum
(gauge)
The response latency distribution in microseconds for each verb, resource and subresource
Shown as microsecond
kube_apiserver.request_latencies.count
(count)
The response latency distribution in microseconds for each verb, resource, and subresource count
kube_apiserver.request_duration_seconds.sum
(gauge)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component
Shown as second
kube_apiserver.request_duration_seconds.count
(count)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component count
kube_apiserver.registered_watchers
(gauge)
The number of currently registered watchers for a given resource
Shown as object
kube_apiserver.process_resident_memory_bytes
(gauge)
The resident memory size in bytes
Shown as byte
kube_apiserver.process_virtual_memory_bytes
(gauge)
The virtual memory size in bytes
Shown as byte
kube_apiserver.watch_events_sizes.sum
(gauge)
The watch event size distribution (Kubernetes 1.16+)
Shown as byte
kube_apiserver.watch_events_sizes.count
(count)
The watch event size distribution (Kubernetes 1.16+)
kube_apiserver.authentication_duration_seconds.sum
(gauge)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
Shown as second
kube_apiserver.authentication_duration_seconds.count
(count)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
kube_apiserver.authentication_attempts.count
(count)
The counter of authenticated attempts (Kubernetes 1.16+)
Shown as request
kube_apiserver.apiserver_request_terminations_total.count
(count)
The number of requests the apiserver terminated in self-defense (Kubernetes 1.17+)
Shown as request
kube_apiserver.grpc_client_handled_total
(count)
The total number of RPCs completed by the client regardless of success or failure
Shown as request
kube_apiserver.grpc_client_msg_received_total
(count)
The total number of gRPC stream messages received by the client
Shown as message
kube_apiserver.grpc_client_msg_sent_total
(count)
The total number of gRPC stream messages sent by the client
Shown as message
kube_apiserver.grpc_client_started_total
(count)
The total number of RPCs started on the client
Shown as request

Service Checks

Kube_apiserver_metrics does not include any service checks.

Events

Kube_apiserver_metrics does not include any events.

Troubleshooting

Need help? Contact Datadog support.