Dataproc Job

A Dataproc Job in Google Cloud is a workload submitted to a Dataproc cluster for big data processing. Jobs can be written against open-source frameworks such as Apache Spark, Hadoop, Hive, and Pig. Users submit jobs directly without managing the underlying infrastructure; Dataproc handles resource allocation, scaling, and monitoring. This allows efficient execution of data processing, analytics, and machine learning workloads on managed clusters.
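For orientation, here is a minimal sketch of submitting a PySpark job with the google-cloud-dataproc Python client library. The project ID, region, cluster name, and gs:// paths are hypothetical placeholders, not values from this page:

```python
# Minimal sketch: submit a PySpark job to an existing Dataproc cluster.
# All identifiers below (project, region, cluster, bucket) are placeholders.
from google.cloud import dataproc_v1

project_id = "my-project"    # hypothetical project
region = "us-central1"       # hypothetical region
cluster_name = "my-cluster"  # hypothetical existing Dataproc cluster

# The JobControllerClient must point at the regional Dataproc endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# placement names the target cluster; pyspark_job points at the driver script.
job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/word_count.py"},
}

# Submit as a long-running operation and block until the job finishes.
operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()

# Output-only fields described in the table below are populated on the
# returned Job resource.
print(response.driver_output_resource_uri)
print(response.status.state)
```

The returned Job resource carries the output-only fields listed in the table below, such as driver_output_resource_uri and the job status.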

gcp.dataproc_job

Fields

ID | Type | Data Type | Description
--- | --- | --- | ---
_key | core | string |
ancestors | core | array<string> |
datadog_display_name | core | string |
done | core | bool | Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and the status.state field will indicate if it was successful, failed, or cancelled.
driver_control_files_uri | core | string | Output only. If present, the location of miscellaneous control files which can be used as part of job setup and handling. If not present, control files might be placed in the same location as driver_output_uri.
driver_output_resource_uri | core | string | Output only. A URI pointing to the location of the stdout of the job's driver program.
driver_scheduling_config | core | json | Optional. Driver scheduling configuration.
flink_job | core | json | Optional. Job is a Flink job.
gcp_status | core | json | Output only. The job status. Additional application-specific status information might be contained in the type_job and yarn_applications fields.
hadoop_job | core | json | Optional. Job is a Hadoop job.
hive_job | core | json | Optional. Job is a Hive job.
job_uuid | core | string | Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that might be reused over time.
labels | core | array<string> | Optional. The labels to associate with this job. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a job.
organization_id | core | string |
parent | core | string |
pig_job | core | json | Optional. Job is a Pig job.
placement | core | json | Required. Job information, including how, when, and where to run the job.
presto_job | core | json | Optional. Job is a Presto job.
project_id | core | string |
project_number | core | string |
pyspark_job | core | json | Optional. Job is a PySpark job.
reference | core | json | Optional. The fully qualified reference to the job, which can be used to obtain the equivalent REST path of the job resource. If this property is not specified when a job is created, the server generates a job_id.
resource_name | core | string |
scheduling | core | json | Optional. Job scheduling configuration.
spark_job | core | json | Optional. Job is a Spark job.
spark_r_job | core | json | Optional. Job is a SparkR job.
spark_sql_job | core | json | Optional. Job is a SparkSql job.
status_history | core | json | Output only. The previous job status.
tags | core | hstore |
trino_job | core | json | Optional. Job is a Trino job.
yarn_applications | core | json | Output only. The collection of YARN applications spun up by this job. Beta Feature: This report is available for testing purposes only. It might be changed before final release.
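
The output-only fields above can also be read back from an existing job. Below is a minimal sketch, assuming a previously submitted job; my-project and my-job-id are hypothetical placeholders:

```python
# Minimal sketch: fetch a job and inspect the output-only fields from the
# table above. Project, region, and job ID are placeholders.
from google.cloud import dataproc_v1

region = "us-central1"  # hypothetical region
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = job_client.get_job(
    request={"project_id": "my-project", "region": region, "job_id": "my-job-id"}
)

print(job.done)              # whether the job has completed
print(job.job_uuid)          # stable UUID, unlike the reusable reference.job_id
print(job.status.state)      # current state (see gcp_status)
for prior in job.status_history:  # previous job statuses
    print(prior.state, prior.state_start_time)
```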