Kubernetes API Server Metrics

Supported OS: Windows, Mac OS

Integration version: 6.2.0

To find out if this integration is available in your organization, see your Datadog Integrations page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email support@ddog-gov.com.

Kubernetes API Server dashboard

Overview

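This check monitors Kube_apiserver_metrics: the metrics that the Kubernetes API server (kube-apiserver) exposes in Prometheus format on its /metrics endpoint.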

Setup

Installation

The Kube_apiserver_metrics check is included in the Datadog Agent package, so you do not need to install anything else on your server.

Configuration

If your Kubernetes cluster has master nodes and runs a pod and container for the kube-apiserver image, the Datadog Agent automatically discovers that pod and configures the integration from its kube_apiserver_metrics.d/auto_conf.yaml file.

However, if you use a managed Kubernetes distribution such as GKE, EKS, or AKS, there may be no running kube-apiserver pod for the Agent to discover.

In that case, you can set up the integration against the kubernetes service in the default namespace.

Parameters
<INTEGRATION_NAME>["kube_apiserver_metrics"]
<INIT_CONFIG>[{}]
<INSTANCE_CONFIG>[{"prometheus_url": "https://%%host%%:%%port%%/metrics"}]
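The %%host%% and %%port%% values in the instance configuration are Autodiscovery template variables that the Agent fills in from the discovered endpoint. As a rough illustration only, the hypothetical Python helper below performs the same kind of substitution on a string. The function name and the sample address are assumptions; the real resolution happens inside the Agent, not in code like this.

```python
import re

# Hypothetical helper: illustrates template-variable substitution only.
# The Datadog Agent performs the real resolution; this is not its code.
TEMPLATE_VAR = re.compile(r"%%(host|port)%%")


def resolve_template(url: str, host: str, port: int) -> str:
    """Replace %%host%% and %%port%% with a discovered endpoint's values."""
    values = {"host": host, "port": str(port)}
    return TEMPLATE_VAR.sub(lambda match: values[match.group(1)], url)


instance = {"prometheus_url": "https://%%host%%:%%port%%/metrics"}
# Sample endpoint address, made up for illustration.
print(resolve_template(instance["prometheus_url"], "10.0.0.1", 443))
```

Because the pattern matches only host and port, any other %%...%% variable is left untouched by this sketch.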

See the sample kube_apiserver_metrics.yaml for all available configuration options.

Service annotations

You can annotate the kubernetes service in your default namespace with the following:

ad.datadoghq.com/endpoints.checks: |
  {
    "kube_apiserver_metrics": {
      "instances": [
        {
          "prometheus_url": "https://%%host%%:%%port%%/metrics"
        }
      ]
    }
  }
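
Alternatively, you can use the endpoints.check_names, endpoints.init_configs, and endpoints.instances annotations:

annotations: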
  ad.datadoghq.com/endpoints.check_names: '["kube_apiserver_metrics"]'
  ad.datadoghq.com/endpoints.init_configs: '[{}]'
  ad.datadoghq.com/endpoints.instances:
    '[{ "prometheus_url": "https://%%host%%:%%port%%/metrics"}]'
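Both annotation styles above describe the same check. The following standard-library Python sketch parses each style into a common mapping from check name to configuration, so the two can be compared. It assumes one instance per check name in the position-aligned check_names/init_configs/instances style, and an empty init_config when the endpoints.checks style omits it; the function names are hypothetical.

```python
import json

PREFIX = "ad.datadoghq.com/endpoints"


def parse_checks_style(annotations: dict) -> dict:
    """Parse the single JSON object stored under <prefix>.checks."""
    checks = json.loads(annotations[f"{PREFIX}.checks"])
    return {
        name: {
            # Assumption: a missing init_config is treated as empty.
            "init_config": cfg.get("init_config", {}),
            "instances": cfg["instances"],
        }
        for name, cfg in checks.items()
    }


def parse_indexed_style(annotations: dict) -> dict:
    """Pair check_names, init_configs and instances by list position."""
    names = json.loads(annotations[f"{PREFIX}.check_names"])
    init_configs = json.loads(annotations[f"{PREFIX}.init_configs"])
    instances = json.loads(annotations[f"{PREFIX}.instances"])
    if not (len(names) == len(init_configs) == len(instances)):
        raise ValueError("check_names, init_configs and instances differ in length")
    # Assumption: each position holds exactly one instance for that check.
    return {
        name: {"init_config": init, "instances": [instance]}
        for name, init, instance in zip(names, init_configs, instances)
    }


# The two annotation sets from the example above, as plain strings.
checks_style = {
    f"{PREFIX}.checks": """
      {"kube_apiserver_metrics": {"instances": [
        {"prometheus_url": "https://%%host%%:%%port%%/metrics"}]}}
    """
}
indexed_style = {
    f"{PREFIX}.check_names": '["kube_apiserver_metrics"]',
    f"{PREFIX}.init_configs": "[{}]",
    f"{PREFIX}.instances": '[{ "prometheus_url": "https://%%host%%:%%port%%/metrics"}]',
}

print(parse_checks_style(checks_style) == parse_indexed_style(indexed_style))
```

A length mismatch between the three position-aligned lists is the most common way the second style goes wrong, which is why the sketch rejects it explicitly.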

The Datadog Cluster Agent then schedules a check for each endpoint onto the Datadog Agents.

Local file

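You can also run the check by configuring the endpoints directly in the kube_apiserver_metrics.yaml file, located in the conf.d/ folder at the root of your Agent's configuration directory, and dispatching it as a Cluster Check.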

Note: If you use a local file or a ConfigMap, add cluster_check: true to the configuration file so that it is set up as a Cluster Check.

Set up the Cluster Check by providing the configuration to the Cluster Agent, for example in your Helm values:

clusterAgent:
  confd:
    kube_apiserver_metrics.yaml: |-
      advanced_ad_identifiers:
        - kube_endpoints:
            name: "kubernetes"
            namespace: "default"
      cluster_check: true
      init_config:
      instances:
        - prometheus_url: "https://%%host%%:%%port%%/metrics"

Or, with the Datadog Operator, in the DatadogAgent resource:

spec:
#(...)
  override:
    clusterAgent:
      extraConfd:
        configDataMap:
          kube_apiserver_metrics.yaml: |-
            advanced_ad_identifiers:
              - kube_endpoints:
                  name: "kubernetes"
                  namespace: "default"
            cluster_check: true
            init_config:
            instances:
              - prometheus_url: "https://%%host%%:%%port%%/metrics"

This configuration makes the Agent send requests to the kubernetes service in the default namespace, at each of the service's endpoint IP addresses and on the defined port.
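To picture which URLs this produces, the hypothetical Python sketch below expands a Kubernetes Endpoints object, shaped like the output of kubectl get endpoints kubernetes -n default -o json, into one https://IP:port/metrics URL per address and port. The sample IP addresses and port are made up, the sketch reads only the ready addresses list, and it is not how the Cluster Agent is implemented.

```python
# Hypothetical sketch: expand a core/v1 Endpoints object into scrape URLs.
# Field names (subsets, addresses, ip, ports, port) follow the Endpoints API.
def endpoint_urls(endpoints: dict, path: str = "/metrics") -> list:
    urls = []
    for subset in endpoints.get("subsets", []):
        ports = [p["port"] for p in subset.get("ports", [])]
        # Only ready addresses are used; notReadyAddresses are ignored here.
        for address in subset.get("addresses", []):
            for port in ports:
                urls.append(f"https://{address['ip']}:{port}{path}")
    return urls


# Made-up sample resembling the kubernetes service's Endpoints object.
sample_endpoints = {
    "metadata": {"name": "kubernetes", "namespace": "default"},
    "subsets": [
        {
            "addresses": [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
            "ports": [{"name": "https", "port": 443, "protocol": "TCP"}],
        }
    ],
}

for url in endpoint_urls(sample_endpoints):
    print(url)
```

Each resulting URL corresponds to one endpoint, which matches the per-endpoint scheduling the Cluster Agent performs.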

Validation

Run the Agent's status subcommand and look for kube_apiserver_metrics under the Checks section.

Data Collected

Metrics

kube_apiserver.APIServiceRegistrationController_depth
(gauge)
The current depth of workqueue: APIServiceRegistrationController
kube_apiserver.admission_controller_admission_duration_seconds.count
(count)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_controller_admission_duration_seconds.sum
(gauge)
The admission controller latency histogram in seconds identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds.count
(count)
The admission sub-step latency histogram broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds.sum
(gauge)
The admission sub-step latency broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.count
(count)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) count
kube_apiserver.admission_step_admission_latencies_seconds_summary.quantile
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit) quantile
Shown as second
kube_apiserver.admission_step_admission_latencies_seconds_summary.sum
(gauge)
The admission sub-step latency summary broken out for each operation and API resource and step type (validate or admit)
Shown as second
kube_apiserver.admission_webhook_admission_latencies_seconds.count
(count)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit) count
kube_apiserver.admission_webhook_admission_latencies_seconds.sum
(gauge)
The admission webhook latency identified by name and broken out for each operation and API resource and type (validate or admit)
Shown as second
kube_apiserver.aggregator_unavailable_apiservice
(gauge)
Gauge of APIServices which are marked as unavailable broken down by APIService name (alpha; Kubernetes 1.14+)
kube_apiserver.apiserver_admission_webhook_fail_open_count
(gauge)
Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_admission_webhook_fail_open_count.count
(count)
Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating).
kube_apiserver.apiserver_admission_webhook_request_total
(gauge)
Admission webhook request total, identified by name and broken out for each admission type (alpha; Kubernetes 1.23+)
kube_apiserver.apiserver_admission_webhook_request_total.count
(count)
Admission webhook request total, identified by name and broken out for each admission type (alpha; Kubernetes 1.23+)
kube_apiserver.apiserver_dropped_requests_total
(gauge)
The accumulated number of requests dropped with ‘Try again later’ response
Shown as request
kube_apiserver.apiserver_dropped_requests_total.count
(count)
The monotonic count of requests dropped with ‘Try again later’ response
Shown as request
kube_apiserver.apiserver_request_count
(gauge)
The accumulated number of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_request_count.count
(count)
The monotonic count of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (deprecated in Kubernetes 1.15)
Shown as request
kube_apiserver.apiserver_request_terminations_total.count
(count)
The number of requests the apiserver terminated in self-defense (Kubernetes 1.17+)
Shown as request
kube_apiserver.apiserver_request_total
(gauge)
The accumulated number of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count)
Shown as request
kube_apiserver.apiserver_request_total.count
(count)
The monotonic count of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code (Kubernetes 1.15+; replaces apiserver_request_count.count)
Shown as request
kube_apiserver.audit_event
(gauge)
The accumulated number of audit events generated and sent to the audit backend
Shown as event
kube_apiserver.audit_event.count
(count)
The monotonic count of audit events generated and sent to the audit backend
Shown as event
kube_apiserver.authenticated_user_requests
(gauge)
The accumulated number of authenticated requests broken out by username
Shown as request
kube_apiserver.authenticated_user_requests.count
(count)
The monotonic count of authenticated requests broken out by username
Shown as request
kube_apiserver.authentication_attempts.count
(count)
The counter of authentication attempts (Kubernetes 1.16+)
Shown as request
kube_apiserver.authentication_duration_seconds.count
(count)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
kube_apiserver.authentication_duration_seconds.sum
(gauge)
The authentication duration histogram broken out by result (Kubernetes 1.17+)
Shown as second
kube_apiserver.current_inflight_requests
(gauge)
The maximum number of in-flight requests in use against this apiserver's in-flight request limit, per request kind, in the last second.
kube_apiserver.envelope_encryption_dek_cache_fill_percent
(gauge)
Percent of the cache slots currently occupied by cached DEKs.
kube_apiserver.etcd.db.total_size
(gauge)
The total size of the etcd database file physically allocated in bytes (alpha; Kubernetes 1.19+)
Shown as byte
kube_apiserver.etcd_object_counts
(gauge)
The number of stored objects at the time of last check split by kind (alpha; deprecated in Kubernetes 1.22)
Shown as object
kube_apiserver.etcd_request_duration_seconds.count
(count)
Etcd request latencies count for each operation and object type (alpha)
kube_apiserver.etcd_request_duration_seconds.sum
(gauge)
Etcd request latencies for each operation and object type (alpha)
Shown as second
kube_apiserver.etcd_request_errors_total
(count)
Etcd failed request counts for each operation and object type
Shown as request
kube_apiserver.etcd_requests_total
(count)
Etcd request counts for each operation and object type
Shown as request
kube_apiserver.flowcontrol_current_executing_requests
(gauge)
Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_current_executing_seats
(gauge)
Number of seats (concurrency units) currently occupied by executing requests in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_current_inqueue_requests
(count)
Number of requests currently pending in queues of the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_dispatched_requests_total
(count)
Number of requests executed by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_nominal_limit_seats
(gauge)
Nominal limit on the number of execution seats available to requests in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_rejected_requests_total.count
(count)
Number of requests rejected by API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_concurrency_limit
(gauge)
Shared concurrency limit in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_wait_duration_seconds.count
(count)
The request wait duration histogram count in the API Priority and Fairness subsystem
kube_apiserver.flowcontrol_request_wait_duration_seconds.sum
(gauge)
The request wait duration histogram sum in the API Priority and Fairness subsystem
Shown as second
kube_apiserver.go_goroutines
(gauge)
The number of goroutines that currently exist
kube_apiserver.go_threads
(gauge)
The number of OS threads created
Shown as thread
kube_apiserver.grpc_client_handled_total
(count)
The total number of RPCs completed by the client regardless of success or failure
Shown as request
kube_apiserver.grpc_client_msg_received_total
(count)
The total number of gRPC stream messages received by the client
Shown as message
kube_apiserver.grpc_client_msg_sent_total
(count)
The total number of gRPC stream messages sent by the client
Shown as message
kube_apiserver.grpc_client_started_total
(count)
The total number of RPCs started on the client
Shown as request
kube_apiserver.http_requests_total
(gauge)
The accumulated number of HTTP requests made
Shown as request
kube_apiserver.http_requests_total.count
(count)
The monotonic count of the number of HTTP requests made
Shown as request
kube_apiserver.kubernetes_feature_enabled
(gauge)
Whether a Kubernetes feature gate is enabled or not, identified by name and stage (alpha; Kubernetes 1.26+)
kube_apiserver.longrunning_gauge
(gauge)
The gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope, and component. Not all requests are tracked this way.
Shown as request
kube_apiserver.process_cpu_total
(count)
Total user and system CPU time spent in seconds.
Shown as second
kube_apiserver.process_resident_memory_bytes
(gauge)
The resident memory size in bytes
Shown as byte
kube_apiserver.process_virtual_memory_bytes
(gauge)
The virtual memory size in bytes
Shown as byte
kube_apiserver.registered_watchers
(gauge)
The number of currently registered watchers for a given resource
Shown as object
kube_apiserver.request_duration_seconds.count
(count)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component count
kube_apiserver.request_duration_seconds.sum
(gauge)
The response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope, and component
Shown as second
kube_apiserver.request_latencies.count
(count)
The response latency distribution in microseconds for each verb, resource, and subresource count
kube_apiserver.request_latencies.sum
(gauge)
The response latency distribution in microseconds for each verb, resource and subresource
Shown as microsecond
kube_apiserver.requested_deprecated_apis
(gauge)
Gauge of deprecated APIs that have been requested, broken out by API group, version, resource, subresource, and removed_release
Shown as request
kube_apiserver.rest_client_request_latency_seconds.count
(count)
The request latency in seconds broken down by verb and URL count
kube_apiserver.rest_client_request_latency_seconds.sum
(gauge)
The request latency in seconds broken down by verb and URL
Shown as second
kube_apiserver.rest_client_requests_total
(gauge)
The accumulated number of HTTP requests partitioned by status code, method, and host
Shown as request
kube_apiserver.rest_client_requests_total.count
(count)
The monotonic count of HTTP requests partitioned by status code, method, and host
Shown as request
kube_apiserver.slis.kubernetes_healthcheck
(gauge)
Result of a single Kubernetes apiserver healthcheck (alpha; Kubernetes 1.26+)
kube_apiserver.slis.kubernetes_healthcheck_total
(count)
The monotonic count of all Kubernetes apiserver healthchecks (alpha; Kubernetes 1.26+)
kube_apiserver.storage_list_evaluated_objects_total
(gauge)
The number of objects tested in the course of serving a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_fetched_objects_total
(gauge)
The number of objects read from storage in the course of serving a LIST request (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_returned_objects_total
(gauge)
The number of objects returned for a LIST request from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_list_total
(gauge)
The number of LIST requests served from storage (alpha; Kubernetes 1.23+)
Shown as object
kube_apiserver.storage_objects
(gauge)
The number of stored objects at the time of last check split by kind (Kubernetes 1.21+; replaces etcd_object_counts)
Shown as object
kube_apiserver.watch_events_sizes.count
(count)
The watch event size distribution (Kubernetes 1.16+)
kube_apiserver.watch_events_sizes.sum
(gauge)
The watch event size distribution (Kubernetes 1.16+)
Shown as byte
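Many metrics above come in .count and .sum pairs taken from a Prometheus histogram or summary. As a worked sketch, assume two raw cumulative samples of apiserver_request_duration_seconds_count and apiserver_request_duration_seconds_sum scraped from /metrics (these correspond to kube_apiserver.request_duration_seconds.count and .sum). The request rate is then the change in count divided by the elapsed time, and the average latency is the change in sum divided by the change in count. The sample numbers and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    """One scrape of a histogram's cumulative _count and _sum series."""
    timestamp: float  # seconds since some fixed origin
    count: float      # cumulative number of observations
    total: float      # cumulative sum of observed values, in seconds


def deltas(prev: Sample, curr: Sample) -> Tuple[float, float]:
    """Return (delta_count, delta_sum), treating a drop as a counter reset."""
    if curr.count < prev.count:
        # The counter restarted (for example, the apiserver restarted),
        # so the current cumulative values are the best available delta.
        return curr.count, curr.total
    return curr.count - prev.count, curr.total - prev.total


def request_rate(prev: Sample, curr: Sample) -> float:
    """Observations per second between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_count, _ = deltas(prev, curr)
    return delta_count / elapsed


def average_latency(prev: Sample, curr: Sample) -> Optional[float]:
    """Mean observed value (seconds) between samples, or None if no requests."""
    delta_count, delta_sum = deltas(prev, curr)
    if delta_count == 0:
        return None
    return delta_sum / delta_count


# Hypothetical samples taken 60 seconds apart.
before = Sample(timestamp=0.0, count=1000, total=50.0)
after = Sample(timestamp=60.0, count=1600, total=80.0)
print(request_rate(before, after))     # 10.0
print(average_latency(before, after))  # 0.05
```

The same arithmetic applies to the other pairs, keeping their units in mind: kube_apiserver.request_latencies.sum is reported in microseconds, so divide it by 1,000,000 before comparing it with the seconds-based metrics.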

Service Checks

Kube_apiserver_metrics does not include any service checks.

Events

Kube_apiserver_metrics does not include any events.

Troubleshooting

Need help? Contact Datadog support.