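Google Kubernetes Engine (GKE) is a powerful cluster manager and orchestration system for running Docker containers.
Collecting metrics from Google Kubernetes Engine lets you do the following.
This integration comes with two separate preset dashboards: a standard dashboard and an enhanced dashboard.
The standard dashboard provides observability into GKE with minimal configuration. The enhanced dashboard requires additional configuration steps but provides more real-time Kubernetes metrics. It is usually the better starting point when you clone and customize a dashboard to monitor workloads in production.
Unlike self-hosted Kubernetes clusters, the GKE control plane is managed by Google and is not accessible to the Datadog Agent running in the cluster. As a result, observability into the GKE control plane requires the Google Cloud integration, even if you primarily use the Datadog Agent to monitor your clusters.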
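If you haven't already, set up the Google Cloud Platform integration first. No additional installation steps are required to use the standard metrics and the preset dashboard.
To populate the enhanced dashboard and to enable APM tracing, logging, profiling, security, and other Datadog services, install the Datadog Agent on your GKE cluster.
To populate the control plane metrics, you must enable GKE control plane metrics. Control plane metrics give you visibility into the operation of the Kubernetes control plane, which Google manages in GKE.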
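As a minimal sketch of that last step, the Python snippet below assembles a `gcloud container clusters update` command that requests control plane metrics. The `--monitoring` flag and the component names (`API_SERVER`, `SCHEDULER`, `CONTROLLER_MANAGER`) are assumptions that should be verified against the current GKE documentation. The cluster name and location are placeholders, and `build_enable_command` is a hypothetical helper, not part of any Datadog or Google library.

```python
import shlex

# Components passed to --monitoring. SYSTEM keeps the default system metrics;
# the other three request control plane metrics. Treat these names as
# assumptions to verify against the current gcloud reference.
CONTROL_PLANE_COMPONENTS = ("SYSTEM", "API_SERVER", "SCHEDULER", "CONTROLLER_MANAGER")


def build_enable_command(cluster: str, location: str) -> list[str]:
    """Return the argv list for enabling control plane metrics on a cluster."""
    if not cluster or not location:
        raise ValueError("cluster and location are required")
    return [
        "gcloud", "container", "clusters", "update", cluster,
        f"--location={location}",
        f"--monitoring={','.join(CONTROL_PLANE_COMPONENTS)}",
    ]


if __name__ == "__main__":
    # Placeholder values; the command is printed, not executed.
    print(shlex.join(build_enable_command("my-cluster", "us-central1")))
```

Printing the command instead of running it keeps the sketch safe to try; review the output before pasting it into a shell with real cluster values.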
Google Kubernetes Engine logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already, set up logging with the Datadog Dataflow template.
Once this is done, export your Google Kubernetes Engine logs from Google Cloud Logging to the Pub/Sub topic:
Go to the GCP Logs Explorer page and filter for Kubernetes and GKE logs.
Click Create Sink and name the sink accordingly.
Select "Cloud Pub/Sub" as the destination and choose the Pub/Sub topic that was created for that purpose. Note: The Pub/Sub topic can be located in a different project.
Click Create and wait for the confirmation message to appear.
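The sink created in these steps can also be described as data. The sketch below builds a dictionary shaped like the Cloud Logging `LogSink` resource, with `name`, `destination`, and `filter` fields. The resource types in the filter, the sink name, and the project and topic IDs are illustrative assumptions rather than values from this page, and `build_gke_log_sink` is a hypothetical helper. Adjust the filter to match the logs you actually want to export.

```python
# Resource types commonly attached to GKE and Kubernetes log entries.
# This list is an assumption for illustration; trim or extend it as needed.
GKE_RESOURCE_TYPES = ("k8s_cluster", "k8s_node", "k8s_pod", "k8s_container")


def build_gke_log_sink(sink_name: str, topic_project: str, topic_id: str) -> dict:
    """Build a LogSink-shaped dict that routes GKE logs to a Pub/Sub topic."""
    quoted = " OR ".join(f'"{t}"' for t in GKE_RESOURCE_TYPES)
    return {
        "name": sink_name,
        # The topic may live in a different project than the sink.
        "destination": f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic_id}",
        "filter": f"resource.type=({quoted})",
    }


if __name__ == "__main__":
    # Placeholder names; nothing is sent to Google Cloud.
    sink = build_gke_log_sink("gke-to-datadog", "my-logging-project", "export-logs-to-datadog")
    for key, value in sink.items():
        print(f"{key}: {value}")
```

Keeping the topic project as a separate argument mirrors the note above that the Pub/Sub topic can belong to a different project than the sink.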
gcp.gke.container.accelerator.duty_cycle (gauge) | Percent of time over the past sample period during which the accelerator was actively processing. Shown as percent |
gcp.gke.container.accelerator.memory_total (gauge) | Total accelerator memory. Shown as byte |
gcp.gke.container.accelerator.memory_used (gauge) | Total accelerator memory allocated. Shown as byte |
gcp.gke.container.accelerator.request (gauge) | Number of accelerator devices requested by the container. Shown as device |
gcp.gke.container.cpu.core_usage_time (count) | Cumulative CPU usage on all cores used by the container. Shown as second |
gcp.gke.container.cpu.limit_cores (gauge) | CPU cores limit of the container. Shown as core |
gcp.gke.container.cpu.limit_utilization (gauge) | Fraction of the CPU limit that is currently in use on the instance. Shown as fraction |
gcp.gke.container.cpu.request_cores (gauge) | Number of CPU cores requested by the container. Shown as core |
gcp.gke.container.cpu.request_utilization (gauge) | Fraction of the requested CPU that is currently in use on the instance. Shown as fraction |
gcp.gke.container.ephemeral_storage.limit_bytes (gauge) | Local ephemeral storage limit. Shown as byte |
gcp.gke.container.ephemeral_storage.request_bytes (gauge) | Local ephemeral storage request. Shown as byte |
gcp.gke.container.ephemeral_storage.used_bytes (gauge) | Local ephemeral storage usage. Shown as byte |
gcp.gke.container.memory.limit_bytes (gauge) | Memory limit of the container. Shown as byte |
gcp.gke.container.memory.limit_utilization (gauge) | Fraction of the memory limit that is currently in use on the instance. Shown as fraction |
gcp.gke.container.memory.page_fault_count (count) | Number of page faults, broken down by type. Shown as fault |
gcp.gke.container.memory.request_bytes (gauge) | Memory request of the container. Shown as byte |
gcp.gke.container.memory.request_utilization (gauge) | Fraction of the requested memory that is currently in use on the instance. Shown as fraction |
gcp.gke.container.memory.used_bytes (gauge) | Memory usage of the container. Shown as byte |
gcp.gke.container.restart_count (count) | Number of times the container has restarted. Shown as occurrence |
gcp.gke.container.uptime (gauge) | Time in seconds that the container has been running. Shown as second |
gcp.gke.node.cpu.allocatable_cores (gauge) | Number of allocatable CPU cores on the node. Shown as core |
gcp.gke.node.cpu.allocatable_utilization (gauge) | Fraction of the allocatable CPU that is currently in use on the instance. Shown as fraction |
gcp.gke.node.cpu.core_usage_time (count) | Cumulative CPU usage on all cores used on the node. Shown as second |
gcp.gke.node.cpu.total_cores (gauge) | Total number of CPU cores on the node. Shown as core |
gcp.gke.node.ephemeral_storage.allocatable_bytes (gauge) | Local ephemeral storage bytes allocatable on the node. Shown as byte |
gcp.gke.node.ephemeral_storage.inodes_free (gauge) | Free number of inodes on local ephemeral storage. |
gcp.gke.node.ephemeral_storage.inodes_total (gauge) | Total number of inodes on local ephemeral storage. |
gcp.gke.node.ephemeral_storage.total_bytes (gauge) | Total ephemeral storage bytes on the node. Shown as byte |
gcp.gke.node.ephemeral_storage.used_bytes (gauge) | Local ephemeral storage bytes used by the node. Shown as byte |
gcp.gke.node.memory.allocatable_bytes (gauge) | Number of bytes of memory allocatable on the node. Shown as byte |
gcp.gke.node.memory.allocatable_utilization (gauge) | Fraction of the allocatable memory that is currently in use on the instance. Shown as fraction |
gcp.gke.node.memory.total_bytes (gauge) | Total number of bytes of memory on the node. Shown as byte |
gcp.gke.node.memory.used_bytes (gauge) | Cumulative memory bytes used by the node. Shown as byte |
gcp.gke.node.network.received_bytes_count (count) | Cumulative number of bytes received by the node over the network. Shown as byte |
gcp.gke.node.network.sent_bytes_count (count) | Cumulative number of bytes transmitted by the node over the network. Shown as byte |
gcp.gke.node.pid_limit (gauge) | Max PID of OS on the node. |
gcp.gke.node.pid_used (gauge) | Number of running processes in the OS on the node. |
gcp.gke.node_daemon.cpu.core_usage_time (count) | Cumulative CPU usage on all cores used by the node level system daemon. Shown as second |
gcp.gke.node_daemon.memory.used_bytes (gauge) | Memory usage by the system daemon. Shown as byte |
gcp.gke.pod.network.received_bytes_count (count) | Cumulative number of bytes received by the pod over the network. Shown as byte |
gcp.gke.pod.network.sent_bytes_count (count) | Cumulative number of bytes transmitted by the pod over the network. Shown as byte |
gcp.gke.pod.volume.total_bytes (gauge) | Total number of disk bytes available to the pod. Shown as byte |
gcp.gke.pod.volume.used_bytes (gauge) | Number of disk bytes used by the pod. Shown as byte |
gcp.gke.pod.volume.utilization (gauge) | Fraction of the volume that is currently being used by the instance. Shown as fraction |
gcp.gke.control_plane.apiserver.admission_controller_admission_duration_seconds (gauge) | Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). Shown as second |
gcp.gke.control_plane.apiserver.admission_step_admission_duration_seconds (gauge) | Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). Shown as second |
gcp.gke.control_plane.apiserver.admission_webhook_admission_duration_seconds (gauge) | Admission webhook latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). Shown as second |
gcp.gke.control_plane.apiserver.current_inflight_requests (gauge) | Maximum number of inflight requests currently in use by this apiserver, per request kind. Shown as request |
gcp.gke.control_plane.apiserver.request_duration_seconds (gauge) | Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. Shown as second |
gcp.gke.control_plane.apiserver.request_total (gauge) | Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code. Shown as request |
gcp.gke.control_plane.apiserver.response_sizes (gauge) | Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. Shown as byte |
gcp.gke.control_plane.apiserver.storage_objects (gauge) | Number of stored objects at the time of last check split by kind. Shown as object |
gcp.gke.control_plane.controller_manager.node_collector_evictions_number (count) | Number of Node evictions that happened since current instance of NodeController started. Shown as event |
gcp.gke.control_plane.scheduler.pending_pods (gauge) | Number of pending pods, by the queue type. Shown as event |
gcp.gke.control_plane.scheduler.pod_scheduling_duration_seconds (gauge) | End-to-end latency for a pod being scheduled. Shown as second |
gcp.gke.control_plane.scheduler.preemption_attempts_total (count) | Total preemption attempts in the cluster so far. Shown as attempt |
gcp.gke.control_plane.scheduler.preemption_victims (gauge) | Number of selected preemption victims. Shown as event |
gcp.gke.control_plane.scheduler.scheduling_attempt_duration_seconds (gauge) | Scheduling attempt latency in seconds. Shown as second |
gcp.gke.control_plane.scheduler.schedule_attempts_total (gauge) | Number of attempts to schedule pods. Shown as attempt |
gcp.gke.control_plane.apiserver.aggregator_unavailable_apiservice (gauge) | (Deprecated) |
gcp.gke.control_plane.apiserver.audit_event_total (gauge) | (Deprecated) Accumulated number of audit events generated and sent to the audit backend. Shown as event |
gcp.gke.control_plane.apiserver.audit_level_total (gauge) | (Deprecated) |
gcp.gke.control_plane.apiserver.audit_requests_rejected_total (gauge) | (Deprecated) Shown as request |
gcp.gke.control_plane.apiserver.client_certificate_expiration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.etcd_object_counts (gauge) | (Deprecated) Number of stored objects split by kind. Shown as object |
gcp.gke.control_plane.apiserver.etcd_request_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.init_events_total (gauge) | (Deprecated) Shown as event |
gcp.gke.control_plane.apiserver.longrunning_gauge (gauge) | (Deprecated) Gauge of all active long-running apiserver requests. Shown as request |
gcp.gke.control_plane.apiserver.registered_watchers (gauge) | (Deprecated) Number of currently registered watchers for a given resource. Shown as object |
gcp.gke.control_plane.apiserver.workqueue_adds_total (count) | (Deprecated) |
gcp.gke.control_plane.apiserver.workqueue_depth (gauge) | (Deprecated) |
gcp.gke.control_plane.apiserver.workqueue_longest_running_processor_seconds (gauge) | (Deprecated) Number of seconds that the longest running processor has been running. Shown as second |
gcp.gke.control_plane.apiserver.workqueue_queue_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.workqueue_retries_total (count) | (Deprecated) |
gcp.gke.control_plane.apiserver.workqueue_unfinished_work_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.workqueue_work_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.cloudprovider_gce_api_request_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.cronjob_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by cronjob controller |
gcp.gke.control_plane.controller_manager.daemon_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by daemon controller |
gcp.gke.control_plane.controller_manager.deployment_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by deployment controller |
gcp.gke.control_plane.controller_manager.endpoint_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by endpoint controller |
gcp.gke.control_plane.controller_manager.gc_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by GC controller |
gcp.gke.control_plane.controller_manager.job_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by job controller |
gcp.gke.control_plane.controller_manager.leader_election_master_status (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.namespace_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by namespace controller |
gcp.gke.control_plane.controller_manager.node_collector_evictions_number (count) | (Deprecated) Count of node eviction events. |
gcp.gke.control_plane.controller_manager.node_collector_unhealthy_nodes_in_zone (gauge) | (Deprecated) Number of unhealthy nodes |
gcp.gke.control_plane.controller_manager.node_collector_zone_health (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.node_collector_zone_size (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.node_ipam_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by IPAM controller |
gcp.gke.control_plane.controller_manager.node_lifecycle_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by lifecycle controller |
gcp.gke.control_plane.controller_manager.persistentvolume_protection_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by persistent volume protection controller |
gcp.gke.control_plane.controller_manager.persistentvolumeclaim_protection_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by persistent volume claim protection controller |
gcp.gke.control_plane.controller_manager.replicaset_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by ReplicaSet controller |
gcp.gke.control_plane.controller_manager.replication_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by replication controller |
gcp.gke.control_plane.controller_manager.route_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by route controller |
gcp.gke.control_plane.controller_manager.service_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by service controller |
gcp.gke.control_plane.controller_manager.serviceaccount_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by service account controller |
gcp.gke.control_plane.controller_manager.serviceaccount_tokens_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by service account tokens controller |
gcp.gke.control_plane.controller_manager.workqueue_adds_total (count) | (Deprecated) |
gcp.gke.control_plane.controller_manager.workqueue_depth (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.workqueue_longest_running_processor_seconds (gauge) | (Deprecated) Number of seconds that the longest running processor has been running. Shown as second |
gcp.gke.control_plane.controller_manager.workqueue_queue_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.workqueue_retries_total (count) | (Deprecated) |
gcp.gke.control_plane.controller_manager.workqueue_unfinished_work_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.workqueue_work_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.scheduler.binding_duration_seconds (gauge) | (Deprecated) Binding latency in seconds. Shown as second |
gcp.gke.control_plane.scheduler.e2e_scheduling_duration_seconds (gauge) | (Deprecated) Total e2e scheduling latency. Shown as second |
gcp.gke.control_plane.scheduler.framework_extension_point_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.scheduler.leader_election_master_status (gauge) | (Deprecated) |
gcp.gke.control_plane.scheduler.scheduling_algorithm_duration_seconds (gauge) | (Deprecated) Total scheduling algorithm latency. Shown as second |
gcp.gke.control_plane.scheduler.scheduling_algorithm_preemption_evaluation_seconds (gauge) | (Deprecated) Shown as second |
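Several of the metrics above are related by simple arithmetic: the utilization metrics are reported as fractions, which is what you get by dividing a usage gauge by the matching limit or request gauge. The sketch below shows that relationship and builds a metric query string for one of the listed metrics. The helper names, the `cluster_name` tag, and the sample values are assumptions for illustration; check which tags are available in your own account.

```python
def utilization(used: float, limit: float) -> float:
    """Fraction of a limit in use, for example used_bytes / limit_bytes."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used / limit


def metric_query(metric: str, aggregator: str = "avg", **tags: str) -> str:
    """Build a query string such as avg:metric{key:value}."""
    # With no tags, fall back to the wildcard scope.
    scope = ",".join(f"{k}:{v}" for k, v in sorted(tags.items())) or "*"
    return f"{aggregator}:{metric}{{{scope}}}"


if __name__ == "__main__":
    # 384 MiB used against a 512 MiB limit.
    fraction = utilization(384 * 1024**2, 512 * 1024**2)
    print(f"{fraction:.2f} ({fraction:.0%})")  # 0.75 (75%)
    print(metric_query("gcp.gke.container.memory.used_bytes", cluster_name="demo"))
```

Multiplying a fraction metric by 100 gives a percentage, which is often easier to read on a dashboard widget.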
The Google Kubernetes Engine integration does not include any events.
The Google Kubernetes Engine integration does not include any service checks.
Need help? Contact Datadog support.