Dataproc Job

A Dataproc Job in Google Cloud is a workload submitted to a Dataproc cluster for big data processing. Jobs can be written against open-source frameworks such as Apache Spark, Hadoop, Hive, and Pig. Users submit jobs directly without managing the underlying infrastructure; Dataproc handles resource allocation, scaling, and monitoring. This allows efficient execution of data processing, analytics, and machine learning workloads on managed clusters.
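For orientation, here is a minimal sketch of submitting a PySpark job with the google-cloud-dataproc Python client library. The project ID, region, cluster name, and gs:// paths are hypothetical placeholders, not values from this page:

```python
# Minimal sketch: submit a PySpark job to an existing Dataproc cluster.
# All identifiers below (project, region, cluster, bucket) are placeholders.
from google.cloud import dataproc_v1

project_id = "my-project"    # hypothetical project
region = "us-central1"       # hypothetical region
cluster_name = "my-cluster"  # hypothetical existing Dataproc cluster

# The JobControllerClient must point at the regional Dataproc endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# placement names the target cluster; pyspark_job points at the driver script.
job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/word_count.py"},
}

# Submit as a long-running operation and block until the job finishes.
operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()

# Output-only fields described in the table below are populated on the
# returned Job resource.
print(response.driver_output_resource_uri)
print(response.status.state)
```

The returned Job resource carries the output-only fields listed in the table below, such as driver_output_resource_uri and the job status.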

gcp.dataproc_job

Fields

ID | Type | Data Type | Description
--- | --- | --- | ---
_key | core | string |
ancestors | core | array<string> |
datadog_display_name | core | string |
done | core | bool | Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and the status.state field will indicate if it was successful, failed, or cancelled.
driver_control_files_uri | core | string | Output only. If present, the location of miscellaneous control files which can be used as part of job setup and handling. If not present, control files might be placed in the same location as driver_output_uri.
driver_output_resource_uri | core | string | Output only. A URI pointing to the location of the stdout of the job's driver program.
driver_scheduling_config | core | json | Optional. Driver scheduling configuration.
flink_job | core | json | Optional. Job is a Flink job.
gcp_status | core | json | Output only. The job status. Additional application-specific status information might be contained in the type_job and yarn_applications fields.
hadoop_job | core | json | Optional. Job is a Hadoop job.
hive_job | core | json | Optional. Job is a Hive job.
job_uuid | core | string | Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that might be reused over time.
labels | core | array<string> | Optional. The labels to associate with this job. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a job.
organization_id | core | string |
parent | core | string |
pig_job | core | json | Optional. Job is a Pig job.
placement | core | json | Required. Job information, including how, when, and where to run the job.
presto_job | core | json | Optional. Job is a Presto job.
project_id | core | string |
project_number | core | string |
pyspark_job | core | json | Optional. Job is a PySpark job.
reference | core | json | Optional. The fully qualified reference to the job, which can be used to obtain the equivalent REST path of the job resource. If this property is not specified when a job is created, the server generates a job_id.
resource_name | core | string |
scheduling | core | json | Optional. Job scheduling configuration.
spark_job | core | json | Optional. Job is a Spark job.
spark_r_job | core | json | Optional. Job is a SparkR job.
spark_sql_job | core | json | Optional. Job is a SparkSql job.
status_history | core | json | Output only. The previous job status.
tags | core | hstore |
trino_job | core | json | Optional. Job is a Trino job.
yarn_applications | core | json | Output only. The collection of YARN applications spun up by this job. Beta Feature: This report is available for testing purposes only. It might be changed before final release.
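
The output-only fields above can also be read back from an existing job. Below is a minimal sketch, assuming a previously submitted job; my-project and my-job-id are hypothetical placeholders:

```python
# Minimal sketch: fetch a job and inspect the output-only fields from the
# table above. Project, region, and job ID are placeholders.
from google.cloud import dataproc_v1

region = "us-central1"  # hypothetical region
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = job_client.get_job(
    request={"project_id": "my-project", "region": region, "job_id": "my-job-id"}
)

print(job.done)              # whether the job has completed
print(job.job_uuid)          # stable UUID, unlike the reusable reference.job_id
print(job.status.state)      # current state (see gcp_status)
for prior in job.status_history:  # previous job statuses
    print(prior.state, prior.state_start_time)
```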