Google Cloud Dataproc

Overview

Google Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.

Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud Dataproc.

Setup

Installation

If you haven't already, set up the Google Cloud Platform integration first. There are no other installation steps.

Log collection

Google Cloud Dataproc logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already, set up logging with the Datadog Dataflow template.

Once this is done, export your Google Cloud Dataproc logs from Google Cloud Logging to the Pub/Sub topic:

1. Go to the Google Cloud Logging page and filter the Google Cloud Dataproc logs.
2. Click Create Export and name the sink.
3. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub topic that was created for that purpose. Note: The Pub/Sub topic can be located in a different project.
4. Click Create and wait for the confirmation message to appear.
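
The console steps above can also be scripted. The following is a minimal sketch, not an official Datadog or Google procedure: it only assembles the arguments for `gcloud logging sinks create` and does not run the command. The sink name, project ID, and topic name are hypothetical placeholders, and the filter assumes that Dataproc cluster logs use the `cloud_dataproc_cluster` resource type.

```python
import shlex


def build_sink_command(sink_name: str, project_id: str, topic: str) -> list[str]:
    """Assemble a `gcloud logging sinks create` command for Dataproc logs.

    All values are supplied by the caller; nothing is executed here.
    """
    # Destination format Cloud Logging expects for a Pub/Sub topic.
    # project_id is the project that owns the topic, which may differ
    # from the project that owns the logs.
    destination = f"pubsub.googleapis.com/projects/{project_id}/topics/{topic}"
    # Assumption: Dataproc cluster logs carry this monitored-resource type.
    log_filter = 'resource.type="cloud_dataproc_cluster"'
    return [
        "gcloud", "logging", "sinks", "create",
        sink_name,
        destination,
        f"--log-filter={log_filter}",
    ]


# Hypothetical names, for illustration only.
command = build_sink_command(
    "dataproc-to-datadog", "my-project", "export-logs-to-datadog"
)
print(shlex.join(command))
```

Review the printed command before running it. After a sink is created, its writer identity usually needs permission to publish to the topic (the Pub/Sub Publisher role), especially when the topic lives in a different project.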

Data Collected

Metrics


gcp.dataproc.batch.spark.executors (gauge)	Indicates the number of Batch Spark executors. Shown as worker
gcp.dataproc.cluster.capacity_deviation (gauge)	Difference between the expected node count in the cluster and the actual active YARN node managers.
gcp.dataproc.cluster.hdfs.datanodes (gauge)	Indicates the number of HDFS DataNodes that are running inside a cluster. Shown as node
gcp.dataproc.cluster.hdfs.storage_capacity (gauge)	Indicates the capacity of the HDFS system running on a cluster, in GB. Shown as gibibyte
gcp.dataproc.cluster.hdfs.storage_utilization (gauge)	The percentage of HDFS storage currently used. Shown as percent
gcp.dataproc.cluster.hdfs.unhealthy_blocks (gauge)	Indicates the number of unhealthy blocks inside the cluster. Shown as block
gcp.dataproc.cluster.job.completion_time.avg (gauge)	The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed. Shown as millisecond
gcp.dataproc.cluster.job.completion_time.samplecount (count)	Sample count for cluster job completion time. Shown as millisecond
gcp.dataproc.cluster.job.completion_time.sumsqdev (gauge)	Sum of squared deviation for cluster job completion time. Shown as second
gcp.dataproc.cluster.job.duration.avg (gauge)	The time jobs have spent in a given state. Shown as millisecond
gcp.dataproc.cluster.job.duration.samplecount (count)	Sample count for cluster job duration. Shown as millisecond
gcp.dataproc.cluster.job.duration.sumsqdev (gauge)	Sum of squared deviation for cluster job duration. Shown as second
gcp.dataproc.cluster.job.failed_count (count)	Indicates the number of jobs that have failed on a cluster. Shown as job
gcp.dataproc.cluster.job.running_count (gauge)	Indicates the number of jobs that are running on a cluster. Shown as job
gcp.dataproc.cluster.job.submitted_count (count)	Indicates the number of jobs that have been submitted to a cluster. Shown as job
gcp.dataproc.cluster.mig_instances.failed_count (count)	Indicates the number of instance failures for a managed instance group.
gcp.dataproc.cluster.nodes.expected (gauge)	Indicates the number of nodes that are expected in a cluster. Shown as node
gcp.dataproc.cluster.nodes.failed_count (count)	Indicates the number of nodes that have failed in a cluster. Shown as node
gcp.dataproc.cluster.nodes.recovered_count (count)	Indicates the number of nodes that were detected as failed and have been successfully removed from the cluster. Shown as node
gcp.dataproc.cluster.nodes.running (gauge)	Indicates the number of nodes in running state. Shown as node
gcp.dataproc.cluster.operation.completion_time.avg (gauge)	The time operations took to complete from the time the user submits an operation to the time Dataproc reports it is completed. Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.samplecount (count)	Sample count for cluster operation completion time. Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.sumsqdev (gauge)	Sum of squared deviation for cluster operation completion time. Shown as second
gcp.dataproc.cluster.operation.duration.avg (gauge)	The time operations have spent in a given state. Shown as millisecond
gcp.dataproc.cluster.operation.duration.samplecount (count)	Sample count for cluster operation duration. Shown as millisecond
gcp.dataproc.cluster.operation.duration.sumsqdev (gauge)	Sum of squared deviation for cluster operation duration. Shown as second
gcp.dataproc.cluster.operation.failed_count (count)	Indicates the number of operations that have failed on a cluster. Shown as operation
gcp.dataproc.cluster.operation.running_count (gauge)	Indicates the number of operations that are running on a cluster. Shown as operation
gcp.dataproc.cluster.operation.submitted_count (count)	Indicates the number of operations that have been submitted to a cluster. Shown as operation
gcp.dataproc.cluster.yarn.allocated_memory_percentage (gauge)	The percentage of YARN memory that is allocated. Shown as percent
gcp.dataproc.cluster.yarn.apps (gauge)	Indicates the number of active YARN applications.
gcp.dataproc.cluster.yarn.containers (gauge)	Indicates the number of YARN containers. Shown as container
gcp.dataproc.cluster.yarn.memory_size (gauge)	Indicates the YARN memory size in GB. Shown as gibibyte
gcp.dataproc.cluster.yarn.nodemanagers (gauge)	Indicates the number of YARN NodeManagers running inside the cluster.
gcp.dataproc.cluster.yarn.pending_memory_size (gauge)	The current memory request, in GB, that is pending to be fulfilled by the scheduler. Shown as gibibyte
gcp.dataproc.cluster.yarn.virtual_cores (gauge)	Indicates the number of virtual cores in YARN. Shown as core
gcp.dataproc.job.state (gauge)	Indicates whether a job is currently in a particular state or not.
gcp.dataproc.job.yarn.memory_seconds (gauge)	Indicates the Memory Seconds consumed by the `job_id` job per yarn `application_id`.
gcp.dataproc.job.yarn.vcore_seconds (gauge)	Indicates the VCore Seconds consumed by the `job_id` job per yarn `application_id`.
gcp.dataproc.node.problem_count (count)	Total number of times a specific type of problem has occurred.
gcp.dataproc.node.yarn.nodemanager.health (gauge)	YARN NodeManager health state.
gcp.dataproc.session.spark.executors (gauge)	Indicates the number of Session Spark executors. Shown as worker
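
Several of the metrics above come in `.avg`, `.samplecount`, and `.sumsqdev` triplets, which mirror the fields of a Google Cloud Monitoring distribution. As a rough sketch, assuming `sumsqdev` is the sum of squared deviations from the mean and that the values cover the same interval in consistent units, a population standard deviation can be derived as shown here. The function name and the sample values are hypothetical.

```python
import math


def stddev_from_distribution(sample_count: float, sum_sq_dev: float) -> float:
    """Population standard deviation from a count and a sum of squared deviations."""
    if sample_count <= 0:
        # No samples in the interval: the spread is undefined, so report 0.0.
        return 0.0
    return math.sqrt(sum_sq_dev / sample_count)


# Hypothetical values: 4 jobs whose squared deviations sum to 36.
print(stddev_from_distribution(4, 36.0))  # 3.0
```

The units listed for these metrics differ (millisecond for `.avg` and `.samplecount`, second for `.sumsqdev`), so confirm what your data actually reports before combining them.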

Events

The Google Cloud Dataproc integration does not include any events.

Service Checks

The Google Cloud Dataproc integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.
