Google Cloud Dataflow

Overview

Google Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness.

Use the Datadog Google Cloud integration to collect metrics from Google Cloud Dataflow.

Setup

Installation

If you haven’t already, set up the Google Cloud Platform integration first. No other installation steps are required.

Data Collected

Metrics

gcp.dataflow.job.billable_shuffle_data_processed
(gauge)
The billable bytes of shuffle data processed by this Dataflow job.
shown as byte
gcp.dataflow.job.current_num_vcpus
(gauge)
The number of vCPUs currently being used by this Dataflow job.
shown as cpu
gcp.dataflow.job.current_shuffle_slots
(gauge)
The current shuffle slots used by this Dataflow job.
gcp.dataflow.job.data_watermark_age
(gauge)
The age (time since event timestamp) of the most recent item of data that has been fully processed by the pipeline.
shown as second
gcp.dataflow.job.elapsed_time
(gauge)
Duration that the current run of this pipeline has been in the Running state so far, in seconds. When a run completes, this stays at the duration of that run until the next run starts.
shown as second
gcp.dataflow.job.element_count
(count)
Number of elements added to the PCollection so far.
shown as item
gcp.dataflow.job.estimated_byte_count
(count)
An estimated number of bytes added to the PCollection so far.
shown as byte
gcp.dataflow.job.is_failed
(gauge)
Whether this job has failed.
gcp.dataflow.job.system_lag
(gauge)
The current maximum duration that an item of data has been awaiting processing, in seconds.
shown as second
gcp.dataflow.job.total_memory_usage_time
(gauge)
The total GB seconds of memory allocated to this Dataflow job.
shown as gibibyte
gcp.dataflow.job.total_pd_usage_time
(gauge)
The total GB seconds of persistent disk used by all workers associated with this Dataflow job.
shown as gibibyte
gcp.dataflow.job.total_shuffle_data_processed
(gauge)
The total bytes of shuffle data processed by this Dataflow job.
shown as byte
gcp.dataflow.job.total_streaming_data_processed
(gauge)
The total bytes of streaming data processed by this Dataflow job.
shown as byte
gcp.dataflow.job.total_vcpu_time
(gauge)
The total vCPU seconds used by this Dataflow job.
gcp.dataflow.job.user_counter
(gauge)
A user-defined counter metric.
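
As an illustration of how the metrics above might be used, the following minimal Python sketch assembles a Datadog metric monitor query string for one of them. The helper name build_monitor_query, the job_name tag, the example-job value, and the 300-second threshold are all hypothetical and chosen only for illustration; check which tags your own Dataflow metrics actually carry before relying on them.

```python
def build_monitor_query(metric, threshold, tags=None, window="last_5m",
                        time_agg="avg", space_agg="max"):
    """Return a metric monitor query string of the form
    time_agg(window):space_agg:metric{scope} > threshold.

    `tags` is an optional dict of tag key/value pairs used as the scope;
    when it is empty, the query covers everything ("*").
    """
    # Sort tags so the generated query is stable across runs.
    scope = ",".join(f"{k}:{v}" for k, v in sorted((tags or {}).items())) or "*"
    return f"{time_agg}({window}):{space_agg}:{metric}{{{scope}}} > {threshold}"


# Hypothetical example: alert when system lag exceeds 300 seconds
# for a job tagged job_name:example-job (tag name is an assumption).
query = build_monitor_query(
    "gcp.dataflow.job.system_lag",
    300,
    tags={"job_name": "example-job"},
)
print(query)
```

The resulting string can be pasted into a metric monitor definition. The same helper works for other metrics in this list, for example gcp.dataflow.job.data_watermark_age, with a threshold suited to that metric's unit.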

Events

The Google Cloud Dataflow integration does not include any events.

Service Checks

The Google Cloud Dataflow integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.