Google Kubernetes Engine (GKE) is a powerful cluster manager and orchestration system for running your Docker containers.
Use this integration to collect metrics from Google Kubernetes Engine.
This integration comes with two separate preset dashboards, a standard dashboard and an enhanced dashboard.
The standard dashboard provides observability in GKE with a simple configuration. The enhanced dashboard requires additional configuration steps, but provides more real-time Kubernetes metrics, and is often a better place to start from when cloning and customizing a dashboard for monitoring workloads in production.
Unlike self-hosted Kubernetes clusters, the GKE control plane is managed by Google and not accessible by a Datadog Agent running in the cluster. Therefore, observability into the GKE control plane requires the Google integration even if you are primarily using the Datadog Agent to monitor your clusters.
If you haven’t already, set up the Google Cloud Platform integration first. There are no other installation steps for the standard metrics and preset dashboard.
To populate the enhanced dashboard and enable APM tracing, logging, profiling, security, and other Datadog services, install the Datadog Agent into your GKE cluster.
To populate the control plane metrics, you must enable GKE control plane metrics. Control plane metrics give you visibility into the operation of the Kubernetes control plane, which is managed by Google in GKE.
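As an alternative to the console, control plane metrics can usually be enabled with the gcloud CLI. The sketch below only assembles the command as an argument list so it can be reviewed before running. The cluster name and location are hypothetical placeholders. The `--location` flag and the `--monitoring` component names (`SYSTEM`, `API_SERVER`, `SCHEDULER`, `CONTROLLER_MANAGER`) are assumptions to verify against the GKE documentation for your gcloud version.

```python
import shlex

# Control plane components whose metrics GKE can export.
# Assumption: these names match the values accepted by `--monitoring`
# in your gcloud version.
CONTROL_PLANE_COMPONENTS = ("API_SERVER", "SCHEDULER", "CONTROLLER_MANAGER")


def build_enable_command(cluster, location, components=CONTROL_PLANE_COMPONENTS):
    """Return the gcloud argv that enables control plane metrics on a cluster."""
    # SYSTEM is kept in the list so the standard system metrics stay enabled.
    monitoring = ",".join(("SYSTEM",) + tuple(components))
    return [
        "gcloud", "container", "clusters", "update", cluster,
        f"--location={location}",
        f"--monitoring={monitoring}",
    ]


# Hypothetical cluster name and location, for illustration only.
command = build_enable_command("my-cluster", "us-central1")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where gcloud is installed and authenticated.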
Google Kubernetes Engine logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven’t already, set up logging with the Datadog Dataflow template.
Once this is done, export your Google Kubernetes Engine logs from Google Cloud Logging to the Pub/Sub topic:
1. Go to the GCP Logs Explorer page and filter Kubernetes and GKE logs.
2. Click Create Sink and name the sink accordingly.
3. Choose “Cloud Pub/Sub” as the destination and select the Pub/Sub topic that was created for that purpose. Note: The Pub/Sub topic can be located in a different project.
4. Click Create and wait for the confirmation message to show up.
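The sink created in the steps above comes down to three values: a name, a log filter, and a Pub/Sub destination. The following sketch composes them as plain strings, which can help when scripting the same setup. The project ID, topic ID, and sink name are hypothetical, and the filter assumes that the four `k8s_*` monitored resource types cover the logs you want to export.

```python
# GKE-related monitored resource types used by Cloud Logging.
GKE_RESOURCE_TYPES = ("k8s_cluster", "k8s_node", "k8s_pod", "k8s_container")


def gke_log_filter(resource_types=GKE_RESOURCE_TYPES):
    """Build a Logging query that matches the given GKE resource types."""
    quoted = " OR ".join(f'"{rtype}"' for rtype in resource_types)
    return f"resource.type=({quoted})"


def pubsub_destination(project_id, topic_id):
    """Build the sink destination string for a Pub/Sub topic."""
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"


# Hypothetical names, for illustration only. The topic may live in a
# different project than the one that owns the sink.
sink = {
    "name": "gke-logs-to-datadog",
    "filter": gke_log_filter(),
    "destination": pubsub_destination("my-project", "export-logs-to-datadog"),
}
for key, value in sink.items():
    print(f"{key}: {value}")
```

Narrowing `GKE_RESOURCE_TYPES` (for example, to `k8s_container` only) reduces the volume of logs forwarded to the Pub/Sub topic.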
gcp.gke.container.accelerator.duty_cycle (gauge) | Percent of time over the past sample period during which the accelerator was actively processing. Shown as percent |
gcp.gke.container.accelerator.memory_total (gauge) | Total accelerator memory. Shown as byte |
gcp.gke.container.accelerator.memory_used (gauge) | Total accelerator memory allocated. Shown as byte |
gcp.gke.container.accelerator.request (gauge) | Number of accelerator devices requested by the container. Shown as device |
gcp.gke.container.cpu.core_usage_time (count) | Cumulative CPU usage on all cores used by the container. Shown as second |
gcp.gke.container.cpu.limit_cores (gauge) | CPU cores limit of the container. Shown as core |
gcp.gke.container.cpu.limit_utilization (gauge) | Fraction of the CPU limit that is currently in use on the instance. Shown as fraction |
gcp.gke.container.cpu.request_cores (gauge) | Number of CPU cores requested by the container. Shown as core |
gcp.gke.container.cpu.request_utilization (gauge) | Fraction of the requested CPU that is currently in use on the instance. Shown as fraction |
gcp.gke.container.ephemeral_storage.limit_bytes (gauge) | Local ephemeral storage limit. Shown as byte |
gcp.gke.container.ephemeral_storage.request_bytes (gauge) | Local ephemeral storage request. Shown as byte |
gcp.gke.container.ephemeral_storage.used_bytes (gauge) | Local ephemeral storage usage. Shown as byte |
gcp.gke.container.memory.limit_bytes (gauge) | Memory limit of the container. Shown as byte |
gcp.gke.container.memory.limit_utlization (gauge) | Fraction of the memory limit that is currently in use on the instance. Shown as fraction |
gcp.gke.container.memory.page_fault_count (count) | Number of page faults, broken down by type. Shown as fault |
gcp.gke.container.memory.request_bytes (gauge) | Memory request of the container. Shown as byte |
gcp.gke.container.memory.request_utilization (gauge) | Fraction of the requested memory that is currently in use on the instance. Shown as fraction |
gcp.gke.container.memory.used_bytes (gauge) | Memory usage of the container. Shown as byte |
gcp.gke.container.restart_count (count) | Number of times the container has restarted. Shown as occurrence |
gcp.gke.container.uptime (gauge) | Time in seconds that the container has been running. Shown as second |
gcp.gke.node.cpu.allocatable_cores (gauge) | Number of allocatable CPU cores on the node. Shown as core |
gcp.gke.node.cpu.allocatable_utilization (gauge) | Fraction of the allocatable CPU that is currently in use on the instance. Shown as fraction |
gcp.gke.node.cpu.core_usage_time (count) | Cumulative CPU usage on all cores used on the node. Shown as second |
gcp.gke.node.cpu.total_cores (gauge) | Total number of CPU cores on the node. Shown as core |
gcp.gke.node.ephemeral_storage.allocatable_bytes (gauge) | Local ephemeral storage bytes allocatable on the node. Shown as byte |
gcp.gke.node.ephemeral_storage.inodes_free (gauge) | Number of free inodes on local ephemeral storage. |
gcp.gke.node.ephemeral_storage.inodes_total (gauge) | Total number of inodes on local ephemeral storage. |
gcp.gke.node.ephemeral_storage.total_bytes (gauge) | Total ephemeral storage bytes on the node. Shown as byte |
gcp.gke.node.ephemeral_storage.used_bytes (gauge) | Local ephemeral storage bytes used by the node. Shown as byte |
gcp.gke.node.memory.allocatable_bytes (gauge) | Number of bytes of memory allocatable on the node. Shown as byte |
gcp.gke.node.memory.allocatable_utilization (gauge) | Fraction of the allocatable memory that is currently in use on the instance. Shown as fraction |
gcp.gke.node.memory.total_bytes (gauge) | Total number of bytes of memory on the node. Shown as byte |
gcp.gke.node.memory.used_bytes (gauge) | Cumulative memory bytes used by the node. Shown as byte |
gcp.gke.node.network.received_bytes_count (count) | Cumulative number of bytes received by the node over the network. Shown as byte |
gcp.gke.node.network.sent_bytes_count (count) | Cumulative number of bytes transmitted by the node over the network. Shown as byte |
gcp.gke.node.pid_limit (gauge) | Max PID of OS on the node. |
gcp.gke.node.pid_used (gauge) | Number of running processes in the OS on the node. |
gcp.gke.node_daemon.cpu.core_usage_time (count) | Cumulative CPU usage on all cores used by the node level system daemon. Shown as second |
gcp.gke.node_daemon.memory.used_bytes (gauge) | Memory usage by the system daemon. Shown as byte |
gcp.gke.pod.network.received_bytes_count (count) | Cumulative number of bytes received by the pod over the network. Shown as byte |
gcp.gke.pod.network.sent_bytes_count (count) | Cumulative number of bytes transmitted by the pod over the network. Shown as byte |
gcp.gke.pod.volume.total_bytes (gauge) | Total number of disk bytes available to the pod. Shown as byte |
gcp.gke.pod.volume.used_bytes (gauge) | Number of disk bytes used by the pod. Shown as byte |
gcp.gke.pod.volume.utilization (gauge) | Fraction of the volume that is currently being used by the instance. Shown as fraction |
gcp.gke.control_plane.apiserver.admission_controller_admission_duration_seconds (gauge) | Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). Shown as second |
gcp.gke.control_plane.apiserver.admission_step_admission_duration_seconds (gauge) | Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). Shown as second |
gcp.gke.control_plane.apiserver.admission_webhook_admission_duration_seconds (gauge) | Admission webhook latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). Shown as second |
gcp.gke.control_plane.apiserver.current_inflight_requests (gauge) | Maximum number of currently used in-flight requests on this apiserver, per request kind. Shown as request |
gcp.gke.control_plane.apiserver.request_duration_seconds (gauge) | Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. Shown as second |
gcp.gke.control_plane.apiserver.request_total (gauge) | Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code. Shown as request |
gcp.gke.control_plane.apiserver.response_sizes (gauge) | Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. Shown as byte |
gcp.gke.control_plane.apiserver.storage_objects (gauge) | Number of stored objects at the time of last check split by kind. Shown as object |
gcp.gke.control_plane.controller_manager.node_collector_evictions_number (count) | Number of Node evictions that happened since current instance of NodeController started. Shown as event |
gcp.gke.control_plane.scheduler.pending_pods (gauge) | Number of pending pods, by the queue type. Shown as event |
gcp.gke.control_plane.scheduler.pod_scheduling_duration_seconds (gauge) | End-to-end latency for a pod being scheduled. Shown as second |
gcp.gke.control_plane.scheduler.preemption_attempts_total (count) | Total preemption attempts in the cluster so far. Shown as attempt |
gcp.gke.control_plane.scheduler.preemption_victims (gauge) | Number of selected preemption victims. Shown as event |
gcp.gke.control_plane.scheduler.scheduling_attempt_duration_seconds (gauge) | Scheduling attempt latency in seconds. Shown as second |
gcp.gke.control_plane.scheduler.schedule_attempts_total (gauge) | Number of attempts to schedule pods. Shown as attempt |
gcp.gke.control_plane.apiserver.aggregator_unavailable_apiservice (gauge) | (Deprecated) |
gcp.gke.control_plane.apiserver.audit_event_total (gauge) | (Deprecated) Accumulated number of audit events generated and sent to the audit backend. Shown as event |
gcp.gke.control_plane.apiserver.audit_level_total (gauge) | (Deprecated) |
gcp.gke.control_plane.apiserver.audit_requests_rejected_total (gauge) | (Deprecated) Shown as request |
gcp.gke.control_plane.apiserver.client_certificate_expiration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.etcd_object_counts (gauge) | (Deprecated) Number of stored objects split by kind. Shown as object |
gcp.gke.control_plane.apiserver.etcd_request_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.init_events_total (gauge) | (Deprecated) Shown as event |
gcp.gke.control_plane.apiserver.longrunning_gauge (gauge) | (Deprecated) Gauge of all active long-running apiserver requests. Shown as request |
gcp.gke.control_plane.apiserver.registered_watchers (gauge) | (Deprecated) Number of currently registered watchers for a given resource. Shown as object |
gcp.gke.control_plane.apiserver.workqueue_adds_total (count) | (Deprecated) |
gcp.gke.control_plane.apiserver.workqueue_depth (gauge) | (Deprecated) |
gcp.gke.control_plane.apiserver.workqueue_longest_running_processor_seconds (gauge) | (Deprecated) Number of seconds that the longest running processor has been running. Shown as second |
gcp.gke.control_plane.apiserver.workqueue_queue_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.workqueue_retries_total (count) | (Deprecated) |
gcp.gke.control_plane.apiserver.workqueue_unfinished_work_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.apiserver.workqueue_work_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.cloudprovider_gce_api_request_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.cronjob_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by cronjob controller |
gcp.gke.control_plane.controller_manager.daemon_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by daemon controller |
gcp.gke.control_plane.controller_manager.deployment_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by deployment controller |
gcp.gke.control_plane.controller_manager.endpoint_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by endpoint controller |
gcp.gke.control_plane.controller_manager.gc_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by GC controller |
gcp.gke.control_plane.controller_manager.job_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by job controller |
gcp.gke.control_plane.controller_manager.leader_election_master_status (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.namespace_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by namespace controller |
gcp.gke.control_plane.controller_manager.node_collector_evictions_number (count) | (Deprecated) Count of node eviction events. |
gcp.gke.control_plane.controller_manager.node_collector_unhealthy_nodes_in_zone (gauge) | (Deprecated) Number of unhealthy nodes |
gcp.gke.control_plane.controller_manager.node_collector_zone_health (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.node_collector_zone_size (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.node_ipam_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by IPAM controller |
gcp.gke.control_plane.controller_manager.node_lifecycle_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by lifecycle controller |
gcp.gke.control_plane.controller_manager.persistentvolume_protection_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by persistent volume protection controller |
gcp.gke.control_plane.controller_manager.persistentvolumeclaim_protection_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by persistent volume claim protection controller |
gcp.gke.control_plane.controller_manager.replicaset_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by ReplicaSet controller |
gcp.gke.control_plane.controller_manager.replication_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by replication controller |
gcp.gke.control_plane.controller_manager.route_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by route controller |
gcp.gke.control_plane.controller_manager.service_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by service controller |
gcp.gke.control_plane.controller_manager.serviceaccount_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by service account controller |
gcp.gke.control_plane.controller_manager.serviceaccount_tokens_controller_rate_limiter_use (gauge) | (Deprecated) Usage of the rate limiter by service account tokens controller |
gcp.gke.control_plane.controller_manager.workqueue_adds_total (count) | (Deprecated) |
gcp.gke.control_plane.controller_manager.workqueue_depth (gauge) | (Deprecated) |
gcp.gke.control_plane.controller_manager.workqueue_longest_running_processor_seconds (gauge) | (Deprecated) Number of seconds that the longest running processor has been running. Shown as second |
gcp.gke.control_plane.controller_manager.workqueue_queue_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.workqueue_retries_total (count) | (Deprecated) |
gcp.gke.control_plane.controller_manager.workqueue_unfinished_work_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.controller_manager.workqueue_work_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.scheduler.binding_duration_seconds (gauge) | (Deprecated) Binding latency in seconds. Shown as second |
gcp.gke.control_plane.scheduler.e2e_scheduling_duration_seconds (gauge) | (Deprecated) Total e2e scheduling latency. Shown as second |
gcp.gke.control_plane.scheduler.framework_extension_point_duration_seconds (gauge) | (Deprecated) Shown as second |
gcp.gke.control_plane.scheduler.leader_election_master_status (gauge) | (Deprecated) |
gcp.gke.control_plane.scheduler.scheduling_algorithm_duration_seconds (gauge) | (Deprecated) Total scheduling algorithm latency. Shown as second |
gcp.gke.control_plane.scheduler.scheduling_algorithm_preemption_evaluation_seconds (gauge) | (Deprecated) Shown as second |
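Any metric in the table above can be read back through the Datadog timeseries query endpoint (`/api/v1/query`). The sketch below only builds the request URL. The helper name, the default `avg` aggregator, and the one-hour window are illustrative choices, and the `site` value should match your Datadog site. An actual request also needs the `DD-API-KEY` and `DD-APPLICATION-KEY` headers, which are left out here.

```python
import time
from urllib.parse import urlencode


def build_query_url(metric, scope="*", aggregator="avg", window_seconds=3600,
                    site="datadoghq.com", now=None):
    """Build a timeseries query URL for one metric over a trailing window."""
    # `now` can be injected to get a reproducible URL; otherwise use the clock.
    now = int(time.time()) if now is None else int(now)
    params = {
        "from": now - window_seconds,  # window start, in epoch seconds
        "to": now,                     # window end, in epoch seconds
        "query": f"{aggregator}:{metric}{{{scope}}}",
    }
    return f"https://api.{site}/api/v1/query?{urlencode(params)}"


# Fixed timestamp used only to make the example reproducible.
url = build_query_url("gcp.gke.container.cpu.limit_utilization", now=1_700_000_000)
print(url)
```

Passing a narrower `scope` (a tag filter such as a cluster or project tag, depending on how your metrics are tagged) limits the query to a subset of containers.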
The Google Kubernetes Engine integration does not include any events.
The Google Kubernetes Engine integration does not include any service checks.
Need help? Contact Datadog support.