
Google Cloud Dataproc


Overview

Google Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.

Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud Dataproc.

Setup

Installation

If you haven’t already, set up the Google Cloud Platform integration first. There are no other installation steps.

Log collection

Google Cloud Dataproc logs are collected with Stackdriver and sent to a Cloud Pub/Sub topic with an HTTP push forwarder. If you haven’t already, set up a Cloud Pub/Sub topic with an HTTP push forwarder.

Once this is done, export your Google Cloud Dataproc logs from Stackdriver to the Pub/Sub topic:

  1. Go to the Stackdriver Logging page and filter for the Google Cloud Dataproc logs.
  2. Click Create Export and name the sink.
  3. Choose “Cloud Pub/Sub” as the destination and select the Pub/Sub topic created for that purpose. Note: The topic can be located in a different project.
  4. Click Create and wait for the confirmation message to appear.
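The same export can also be scripted from the command line. The Python sketch below only builds the arguments for `gcloud logging sinks create`; it does not execute them. It assumes the Dataproc logs are selected with the monitored resource type `cloud_dataproc_cluster` (adjust the filter to match what you chose on the Logging page), and the sink name, project ID, and topic ID in the example are hypothetical placeholders. The destination is written as a full `pubsub.googleapis.com/projects/<project>/topics/<topic>` path, which is how a topic in a different project is targeted.

```python
import shlex
from typing import List

# Assumed log filter for Dataproc cluster logs; adjust to match your own selection.
DATAPROC_LOG_FILTER = 'resource.type="cloud_dataproc_cluster"'


def pubsub_sink_destination(topic_project: str, topic_id: str) -> str:
    """Return the Cloud Logging sink destination for a Pub/Sub topic.

    The topic's project may differ from the project that owns the sink.
    """
    return f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic_id}"


def create_sink_command(
    sink_name: str,
    topic_project: str,
    topic_id: str,
    log_filter: str = DATAPROC_LOG_FILTER,
) -> List[str]:
    """Build (but do not run) the argv for `gcloud logging sinks create`."""
    return [
        "gcloud", "logging", "sinks", "create",
        sink_name,
        pubsub_sink_destination(topic_project, topic_id),
        f"--log-filter={log_filter}",
    ]


if __name__ == "__main__":
    # Hypothetical names: replace with your own sink, project, and topic.
    argv = create_sink_command("dataproc-to-datadog", "my-logging-project", "datadog-logs")
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(argv))
```

Keeping the command as a list of arguments (for example, to pass to `subprocess.run`) avoids shell-quoting problems with the double quotes inside the filter; `shlex.join` is used here only to print a copy-pasteable version.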

Data Collected

Metrics

gcp.dataproc.cluster.hdfs.datanodes
(gauge)
Indicates the number of HDFS DataNodes that are running inside a cluster.
Shown as node
gcp.dataproc.cluster.hdfs.storage_capacity
(gauge)
Indicates the capacity of the HDFS system running on the cluster, in GB.
Shown as gibibyte
gcp.dataproc.cluster.hdfs.storage_utilization
(gauge)
The percentage of HDFS storage currently used.
Shown as percent
gcp.dataproc.cluster.hdfs.unhealthy_blocks
(gauge)
Indicates the number of unhealthy blocks inside the cluster.
Shown as block
gcp.dataproc.cluster.job.completion_time.avg
(gauge)
The time jobs took to complete, from the time the user submits a job to the time Dataproc reports it as completed.
Shown as millisecond
gcp.dataproc.cluster.job.completion_time.samplecount
(count)
Sample count for cluster job completion time
Shown as millisecond
gcp.dataproc.cluster.job.completion_time.sumsqdev
(gauge)
Sum of squared deviation for cluster job completion time
Shown as second
gcp.dataproc.cluster.job.duration.avg
(gauge)
The time jobs have spent in a given state.
Shown as millisecond
gcp.dataproc.cluster.job.duration.samplecount
(count)
Sample count for cluster job duration
Shown as millisecond
gcp.dataproc.cluster.job.duration.sumsqdev
(gauge)
Sum of squared deviation for cluster job duration
Shown as second
gcp.dataproc.cluster.job.failed_count
(count)
Indicates the number of jobs that have failed on a cluster.
Shown as job
gcp.dataproc.cluster.job.running_count
(gauge)
Indicates the number of jobs that are running on a cluster.
Shown as job
gcp.dataproc.cluster.job.submitted_count
(count)
Indicates the number of jobs that have been submitted to a cluster.
Shown as job
gcp.dataproc.cluster.operation.completion_time.avg
(gauge)
The time operations took to complete, from the time the user submits an operation to the time Dataproc reports it as completed.
Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.samplecount
(count)
Sample count for cluster operation completion time
Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.sumsqdev
(gauge)
Sum of squared deviation for cluster operation completion time
Shown as second
gcp.dataproc.cluster.operation.duration.avg
(gauge)
The time operations have spent in a given state.
Shown as millisecond
gcp.dataproc.cluster.operation.duration.samplecount
(count)
Sample count for cluster operation duration
Shown as millisecond
gcp.dataproc.cluster.operation.duration.sumsqdev
(gauge)
Sum of squared deviation for cluster operation duration
Shown as second
gcp.dataproc.cluster.operation.failed_count
(count)
Indicates the number of operations that have failed on a cluster.
Shown as operation
gcp.dataproc.cluster.operation.running_count
(gauge)
Indicates the number of operations that are running on a cluster.
Shown as operation
gcp.dataproc.cluster.operation.submitted_count
(count)
Indicates the number of operations that have been submitted to a cluster.
Shown as operation
gcp.dataproc.cluster.yarn.allocated_memory_percentage
(gauge)
The percentage of YARN memory that is allocated.
Shown as percent
gcp.dataproc.cluster.yarn.apps
(gauge)
Indicates the number of active YARN applications.
gcp.dataproc.cluster.yarn.containers
(gauge)
Indicates the number of YARN containers.
Shown as container
gcp.dataproc.cluster.yarn.memory_size
(gauge)
Indicates the YARN memory size in GB.
Shown as gibibyte
gcp.dataproc.cluster.yarn.nodemanagers
(gauge)
Indicates the number of YARN NodeManagers running inside the cluster.
gcp.dataproc.cluster.yarn.pending_memory_size
(gauge)
The current memory request, in GB, that is pending to be fulfilled by the scheduler.
Shown as gibibyte
gcp.dataproc.cluster.yarn.virtual_cores
(gauge)
Indicates the number of virtual cores in YARN.
Shown as core
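The job and operation timing metrics above are reported as three aggregates: `.avg`, `.samplecount`, and `.sumsqdev`. Assuming `.sumsqdev` is the sum of squared deviations from the mean (as in Cloud Monitoring distribution values), the population standard deviation can be recovered as sqrt(sumsqdev / samplecount). The Python sketch below shows only that arithmetic; the function name is hypothetical and no Datadog or Google Cloud API is called.

```python
import math


def distribution_stddev(sample_count: int, sum_sq_dev: float) -> float:
    """Population standard deviation from a sample count and a sum of squared deviations.

    Assumes sum_sq_dev is the sum of squared deviations from the mean,
    expressed in the square of the unit you want the result in.
    """
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    if sum_sq_dev < 0:
        raise ValueError("sum_sq_dev cannot be negative")
    # Variance is the mean squared deviation; the standard deviation is its square root.
    return math.sqrt(sum_sq_dev / sample_count)


if __name__ == "__main__":
    # Hypothetical values: 3 jobs with durations 1000, 2000, and 3000 ms
    # have a mean of 2000 ms and a sum of squared deviations of 2,000,000 ms^2.
    print(distribution_stddev(3, 2_000_000.0))
```

The list above shows `.avg` in milliseconds but `.sumsqdev` in seconds, so confirm the units of the values you retrieve before comparing the resulting standard deviation with the average.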

Events

The Google Cloud Dataproc integration does not include any events.

Service Checks

The Google Cloud Dataproc integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.