Google Cloud Dataproc

Overview

Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.

Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud Dataproc.

Setup

Installation

If you haven't already, set up the Google Cloud Platform integration first. No other installation steps are required.

Log collection

Google Cloud Dataproc logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub through an HTTP push forwarder. If you haven't already, set up a Cloud Pub/Sub with an HTTP push forwarder.

Once this is done, export your Google Cloud Dataproc logs from Google Cloud Logging to the Pub/Sub:

  1. Go to the Google Cloud Logging page and filter the Google Cloud Dataproc logs.
  2. Click Create Export and name the sink.
  3. Choose Cloud Pub/Sub as the destination and select the Pub/Sub that was created for that purpose. Note: The Pub/Sub can be located in a different project.
  4. Click Create and wait for the confirmation message to appear.
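
As a rough illustration of what the export steps above configure, the sketch below builds the three fields a log sink needs: a name, a Pub/Sub destination, and a filter. It calls no Google Cloud API. The function name `build_dataproc_sink`, the project and topic IDs, and the choice of `resource.type="cloud_dataproc_cluster"` as the filter are assumptions made for this example; adjust the filter to match the Dataproc logs you actually want to export.

```python
from typing import Optional


def build_dataproc_sink(
    project_id: str,
    topic_id: str,
    sink_name: str,
    topic_project_id: Optional[str] = None,
) -> dict:
    """Return a sink description for exporting Dataproc logs to Pub/Sub.

    Hypothetical helper: it only assembles a dictionary and does not
    create anything in Google Cloud.
    """
    # The Pub/Sub topic may live in a different project than the logs.
    topic_project = topic_project_id or project_id
    return {
        "name": sink_name,
        # Destination format used by Cloud Logging for Pub/Sub topics.
        "destination": (
            f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic_id}"
        ),
        # Assumed filter: logs emitted by Dataproc cluster resources only.
        "filter": 'resource.type="cloud_dataproc_cluster"',
    }


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    sink = build_dataproc_sink(
        project_id="my-project",
        topic_id="datadog-export",
        sink_name="dataproc-to-datadog",
        topic_project_id="shared-logging-project",
    )
    for key, value in sink.items():
        print(f"{key}: {value}")
```

Passing `topic_project_id` covers the case noted in step 3, where the Pub/Sub sits in a different project from the one producing the logs.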

Data collected

Metrics

gcp.dataproc.batch.spark.executors
(gauge)
Indicates the number of Batch Spark executors.
Shown as worker
gcp.dataproc.cluster.capacity_deviation
(gauge)
Difference between the expected node count in the cluster and the actual active YARN node managers.
gcp.dataproc.cluster.hdfs.datanodes
(gauge)
Indicates the number of HDFS DataNodes that are running inside a cluster.
Shown as node
gcp.dataproc.cluster.hdfs.storage_capacity
(gauge)
Indicates the capacity, in GB, of the HDFS system running on a cluster.
Shown as gibibyte
gcp.dataproc.cluster.hdfs.storage_utilization
(gauge)
The percentage of HDFS storage currently used.
Shown as percent
gcp.dataproc.cluster.hdfs.unhealthy_blocks
(gauge)
Indicates the number of unhealthy blocks inside the cluster.
Shown as block
gcp.dataproc.cluster.job.completion_time.avg
(gauge)
The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed.
Shown as millisecond
gcp.dataproc.cluster.job.completion_time.samplecount
(count)
Sample count for cluster job completion time.
Shown as millisecond
gcp.dataproc.cluster.job.completion_time.sumsqdev
(gauge)
Sum of squared deviation for cluster job completion time.
Shown as second
gcp.dataproc.cluster.job.duration.avg
(gauge)
The time jobs have spent in a given state.
Shown as millisecond
gcp.dataproc.cluster.job.duration.samplecount
(count)
Sample count for cluster job duration.
Shown as millisecond
gcp.dataproc.cluster.job.duration.sumsqdev
(gauge)
Sum of squared deviation for cluster job duration.
Shown as second
gcp.dataproc.cluster.job.failed_count
(count)
Indicates the number of jobs that have failed on a cluster.
Shown as job
gcp.dataproc.cluster.job.running_count
(gauge)
Indicates the number of jobs that are running on a cluster.
Shown as job
gcp.dataproc.cluster.job.submitted_count
(count)
Indicates the number of jobs that have been submitted to a cluster.
Shown as job
gcp.dataproc.cluster.mig_instances.failed_count
(count)
Indicates the number of instance failures for a managed instance group.
gcp.dataproc.cluster.nodes.expected
(gauge)
Indicates the number of nodes that are expected in a cluster.
Shown as node
gcp.dataproc.cluster.nodes.failed_count
(count)
Indicates the number of nodes that have failed in a cluster.
Shown as node
gcp.dataproc.cluster.nodes.recovered_count
(count)
Indicates the number of nodes that were detected as failed and have been successfully removed from the cluster.
Shown as node
gcp.dataproc.cluster.nodes.running
(gauge)
Indicates the number of nodes in running state.
Shown as node
gcp.dataproc.cluster.operation.completion_time.avg
(gauge)
The time operations took to complete from the time the user submits an operation to the time Dataproc reports it is completed.
Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.samplecount
(count)
Sample count for cluster operation completion time.
Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.sumsqdev
(gauge)
Sum of squared deviation for cluster operation completion time.
Shown as second
gcp.dataproc.cluster.operation.duration.avg
(gauge)
The time operations have spent in a given state.
Shown as millisecond
gcp.dataproc.cluster.operation.duration.samplecount
(count)
Sample count for cluster operation duration.
Shown as millisecond
gcp.dataproc.cluster.operation.duration.sumsqdev
(gauge)
Sum of squared deviation for cluster operation duration.
Shown as second
gcp.dataproc.cluster.operation.failed_count
(count)
Indicates the number of operations that have failed on a cluster.
Shown as operation
gcp.dataproc.cluster.operation.running_count
(gauge)
Indicates the number of operations that are running on a cluster.
Shown as operation
gcp.dataproc.cluster.operation.submitted_count
(count)
Indicates the number of operations that have been submitted to a cluster.
Shown as operation
gcp.dataproc.cluster.yarn.allocated_memory_percentage
(gauge)
The percentage of YARN memory that is allocated.
Shown as percent
gcp.dataproc.cluster.yarn.apps
(gauge)
Indicates the number of active YARN applications.
gcp.dataproc.cluster.yarn.containers
(gauge)
Indicates the number of YARN containers.
Shown as container
gcp.dataproc.cluster.yarn.memory_size
(gauge)
Indicates the YARN memory size in GB.
Shown as gibibyte
gcp.dataproc.cluster.yarn.nodemanagers
(gauge)
Indicates the number of YARN NodeManagers running inside the cluster.
gcp.dataproc.cluster.yarn.pending_memory_size
(gauge)
The current memory request, in GB, that is pending to be fulfilled by the scheduler.
Shown as gibibyte
gcp.dataproc.cluster.yarn.virtual_cores
(gauge)
Indicates the number of virtual cores in YARN.
Shown as core
gcp.dataproc.job.state
(gauge)
Indicates whether the job is currently in a particular state.
gcp.dataproc.job.yarn.memory_seconds
(gauge)
Indicates the Memory Seconds consumed by the job_id job per YARN application_id.
gcp.dataproc.job.yarn.vcore_seconds
(gauge)
Indicates the VCore Seconds consumed by the job_id job per YARN application_id.
gcp.dataproc.node.problem_count
(count)
Total number of times a specific type of problem has occurred.
gcp.dataproc.node.yarn.nodemanager.health
(gauge)
YARN NodeManager health state.
gcp.dataproc.session.spark.executors
(gauge)
Indicates the number of Session Spark executors.
Shown as worker
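
The completion_time and duration metrics above come in groups of three suffixes: avg, samplecount, and sumsqdev. As a minimal sketch, assuming sumsqdev is the sum of squared deviations from the mean and that it is expressed on the same unit basis as the samples, a population standard deviation can be derived as sqrt(sumsqdev / samplecount). The function name below is hypothetical, and because the listed units differ between the suffixes (millisecond versus second), check the units of your data before relying on the result.

```python
import math


def stddev_from_distribution(sample_count: float, sum_sq_dev: float) -> float:
    """Population standard deviation from a sample count and a sum of
    squared deviations. Hypothetical helper for illustration only."""
    # With no samples there is no spread to report.
    if sample_count <= 0:
        return 0.0
    # Variance is the mean squared deviation; its square root is the stddev.
    return math.sqrt(sum_sq_dev / sample_count)


if __name__ == "__main__":
    # Made-up values standing in for *.samplecount and *.sumsqdev.
    print(stddev_from_distribution(sample_count=4, sum_sq_dev=16.0))
```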

Events

The Google Cloud Dataproc integration does not include any events.

Service checks

The Google Cloud Dataproc integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.