Google Cloud Dataproc

Overview

Google Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters simply and cost-efficiently.

Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud Dataproc.

Setup

Installation

If you haven't already, set up the Google Cloud Platform integration first. There are no other installation steps.

Log collection

Google Cloud Dataproc logs are collected with Google Cloud Logging and sent to a Cloud Pub/Sub with an HTTP push forwarder. If you haven't already, set up a Cloud Pub/Sub with an HTTP push forwarder.

Once this is done, export your Google Cloud Dataproc logs from Google Cloud Logging to the Pub/Sub:

  1. Go to the Google Cloud Logging page and filter the Google Cloud Dataproc logs.
  2. Click Create Export and name the sink.
  3. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub that was created for the export. Note: The Pub/Sub can be located in a different project.
  4. Click Create and wait for the confirmation message to appear.
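
The export configured in the steps above amounts to a log sink with a name, a filter, and a Pub/Sub destination. The following is a minimal sketch of what such a sink definition might look like as data, not the integration's own tooling. The sink name, project ID, and topic ID are hypothetical placeholders, and the filter `resource.type="cloud_dataproc_cluster"` is an assumption about which Dataproc logs you want; adjust it to match the filter used in step 1.

```python
import json


def build_dataproc_sink(sink_name: str, project_id: str, topic_id: str) -> dict:
    """Return a sink-style definition that routes Dataproc logs to a Pub/Sub topic."""
    return {
        "name": sink_name,
        # Pub/Sub destinations for Cloud Logging sinks use this URI form.
        # project_id is the project that owns the topic, which may differ
        # from the project that produces the logs.
        "destination": f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}",
        # Assumed filter: Dataproc cluster logs only.
        "filter": 'resource.type="cloud_dataproc_cluster"',
    }


if __name__ == "__main__":
    # All three arguments are hypothetical example values.
    sink = build_dataproc_sink("dataproc-to-datadog", "my-project", "export-logs-to-datadog")
    print(json.dumps(sink, indent=2))
```

Keeping the topic's project as an explicit argument mirrors the note in step 3: the Pub/Sub does not have to live in the same project as the Dataproc clusters.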

Data Collected

Metrics

gcp.dataproc.batch.spark.executors
(gauge)
Indicates the number of Batch Spark executors.
Shown as worker
gcp.dataproc.cluster.hdfs.datanodes
(gauge)
Indicates the number of HDFS DataNodes that are running inside a cluster.
Shown as node
gcp.dataproc.cluster.hdfs.storage_capacity
(gauge)
Indicates the capacity of the HDFS system running on a cluster in GB.
Shown as gibibyte
gcp.dataproc.cluster.hdfs.storage_utilization
(gauge)
The percentage of HDFS storage currently used.
Shown as percent
gcp.dataproc.cluster.hdfs.unhealthy_blocks
(gauge)
Indicates the number of unhealthy blocks inside the cluster.
Shown as block
gcp.dataproc.cluster.job.completion_time.avg
(gauge)
The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed.
Shown as millisecond
gcp.dataproc.cluster.job.completion_time.samplecount
(count)
Sample count for cluster job completion time.
Shown as millisecond
gcp.dataproc.cluster.job.completion_time.sumsqdev
(gauge)
Sum of squared deviation for cluster job completion time.
Shown as second
gcp.dataproc.cluster.job.duration.avg
(gauge)
The time jobs have spent in a given state.
Shown as millisecond
gcp.dataproc.cluster.job.duration.samplecount
(count)
Sample count for cluster job duration.
Shown as millisecond
gcp.dataproc.cluster.job.duration.sumsqdev
(gauge)
Sum of squared deviation for cluster job duration.
Shown as second
gcp.dataproc.cluster.job.failed_count
(count)
Indicates the number of jobs that have failed on a cluster.
Shown as job
gcp.dataproc.cluster.job.running_count
(gauge)
Indicates the number of jobs that are running on a cluster.
Shown as job
gcp.dataproc.cluster.job.submitted_count
(count)
Indicates the number of jobs that have been submitted to a cluster.
Shown as job
gcp.dataproc.cluster.nodes.expected
(gauge)
Indicates the number of nodes that are expected in a cluster.
Shown as node
gcp.dataproc.cluster.nodes.failed_count
(count)
Indicates the number of nodes that have failed in a cluster.
Shown as node
gcp.dataproc.cluster.nodes.recovered_count
(count)
Indicates the number of nodes that were detected as failed and have been successfully removed from the cluster.
Shown as node
gcp.dataproc.cluster.nodes.running
(gauge)
Indicates the number of nodes in running state.
Shown as node
gcp.dataproc.cluster.operation.completion_time.avg
(gauge)
The time operations took to complete from the time the user submits an operation to the time Dataproc reports it is completed.
Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.samplecount
(count)
Sample count for cluster operation completion time.
Shown as millisecond
gcp.dataproc.cluster.operation.completion_time.sumsqdev
(gauge)
Sum of squared deviation for cluster operation completion time.
Shown as second
gcp.dataproc.cluster.operation.duration.avg
(gauge)
The time operations have spent in a given state.
Shown as millisecond
gcp.dataproc.cluster.operation.duration.samplecount
(count)
Sample count for cluster operation duration.
Shown as millisecond
gcp.dataproc.cluster.operation.duration.sumsqdev
(gauge)
Sum of squared deviation for cluster operation duration.
Shown as second
gcp.dataproc.cluster.operation.failed_count
(count)
Indicates the number of operations that have failed on a cluster.
Shown as operation
gcp.dataproc.cluster.operation.running_count
(gauge)
Indicates the number of operations that are running on a cluster.
Shown as operation
gcp.dataproc.cluster.operation.submitted_count
(count)
Indicates the number of operations that have been submitted to a cluster.
Shown as operation
gcp.dataproc.cluster.yarn.allocated_memory_percentage
(gauge)
The percentage of YARN memory that is allocated.
Shown as percent
gcp.dataproc.cluster.yarn.apps
(gauge)
Indicates the number of active YARN applications.
gcp.dataproc.cluster.yarn.containers
(gauge)
Indicates the number of YARN containers.
Shown as container
gcp.dataproc.cluster.yarn.memory_size
(gauge)
Indicates the YARN memory size in GB.
Shown as gibibyte
gcp.dataproc.cluster.yarn.nodemanagers
(gauge)
Indicates the number of YARN NodeManagers running inside the cluster.
gcp.dataproc.cluster.yarn.pending_memory_size
(gauge)
The current memory request, in GB, that is pending to be fulfilled by the scheduler.
Shown as gibibyte
gcp.dataproc.cluster.yarn.virtual_cores
(gauge)
Indicates the number of virtual cores in YARN.
Shown as core
gcp.dataproc.job.state
(gauge)
Indicates whether a job is currently in a particular state or not.
gcp.dataproc.session.spark.executors
(gauge)
Indicates the number of Session Spark executors.
Shown as worker
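
The `.samplecount` and `.sumsqdev` metrics listed above can be combined to estimate how much individual job or operation times vary around the `.avg` value. The sketch below assumes that `sumsqdev` is the sum of squared differences from the mean over `samplecount` samples, so the population standard deviation is the square root of their ratio. The function name is hypothetical, and the units of the inputs should be checked against your own data before interpreting the result.

```python
import math


def stddev_from_distribution(sample_count: float, sum_sq_dev: float) -> float:
    """Estimate the population standard deviation from distribution summary values."""
    if sample_count <= 0:
        # No samples in the interval, so there is no spread to report.
        return 0.0
    # Variance is the mean squared deviation; the standard deviation is its root.
    return math.sqrt(sum_sq_dev / sample_count)


if __name__ == "__main__":
    # Hypothetical values read from
    # gcp.dataproc.cluster.job.completion_time.samplecount and .sumsqdev.
    print(stddev_from_distribution(4, 36.0))
```

A large result relative to the average suggests that completion times are uneven across jobs, which the average alone would hide.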

Events

The Google Cloud Dataproc integration does not include any events.

Service Checks

The Google Cloud Dataproc integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.