
gcp_dataproc_batch

ancestors

Type: UNORDERED_LIST_STRING

create_time

Type: TIMESTAMP
Provider name: createTime
Description: Output only. The time when the batch was created.

creator

Type: STRING
Provider name: creator
Description: Output only. The email address of the user who created the batch.

environment_config

Type: STRUCT
Provider name: environmentConfig
Description: Optional. Environment configuration for the batch execution.

  • execution_config
    Type: STRUCT
    Provider name: executionConfig
    Description: Optional. Execution configuration for a workload.
    • authentication_config
      Type: STRUCT
      Provider name: authenticationConfig
      Description: Optional. Authentication configuration used to set the default identity for the workload execution. The config specifies the type of identity (service account or user) that will be used by workloads to access resources on the project(s).
      • user_workload_authentication_type
        Type: STRING
        Provider name: userWorkloadAuthenticationType
        Description: Optional. Authentication type for the user workload running in containers.
        Possible values:
        • AUTHENTICATION_TYPE_UNSPECIFIED - If AuthenticationType is unspecified then END_USER_CREDENTIALS is used for 3.0 and newer runtimes, and SERVICE_ACCOUNT is used for older runtimes.
        • SERVICE_ACCOUNT - Use service account credentials for authenticating to other services.
        • END_USER_CREDENTIALS - Use OAuth credentials associated with the workload creator/user for authenticating to other services.
    • idle_ttl
      Type: STRING
      Provider name: idleTtl
      Description: Optional. Applies to sessions only. The duration to keep the session alive while it’s idling. Exceeding this threshold causes the session to terminate. This field cannot be set on a batch workload. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration (https://developers.google.com/protocol-buffers/docs/proto3#json)). Defaults to 1 hour if not set. If both ttl and idle_ttl are specified for an interactive session, the conditions are treated as OR conditions: the workload will be terminated when it has been idle for idle_ttl or when ttl has been exceeded, whichever occurs first.
    • kms_key
      Type: STRING
      Provider name: kmsKey
      Description: Optional. The Cloud KMS key to use for encryption.
    • network_tags
      Type: UNORDERED_LIST_STRING
      Provider name: networkTags
      Description: Optional. Tags used for network traffic control.
    • network_uri
      Type: STRING
      Provider name: networkUri
      Description: Optional. Network URI to connect workload to.
    • service_account
      Type: STRING
      Provider name: serviceAccount
      Description: Optional. Service account used to execute the workload.
    • staging_bucket
      Type: STRING
      Provider name: stagingBucket
      Description: Optional. A Cloud Storage bucket used to stage workload dependencies, config files, and store workload output and other ephemeral data, such as Spark history files. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location according to the region where your workload is running, and then create and manage project-level, per-location staging and temporary buckets. This field requires a Cloud Storage bucket name, not a gs://… URI to a Cloud Storage bucket.
    • subnetwork_uri
      Type: STRING
      Provider name: subnetworkUri
      Description: Optional. Subnetwork URI to connect workload to.
    • ttl
      Type: STRING
      Provider name: ttl
      Description: Optional. The duration after which the workload will be terminated, specified as the JSON representation for Duration (https://protobuf.dev/programming-guides/proto3/#json). When the workload exceeds this duration, it will be unconditionally terminated without waiting for ongoing work to finish. If ttl is not specified for a batch workload, the workload will be allowed to run until it exits naturally (or run forever without exiting). If ttl is not specified for an interactive session, it defaults to 24 hours. If ttl is not specified for a batch that uses 2.1+ runtime version, it defaults to 4 hours. Minimum value is 10 minutes; maximum value is 14 days. If both ttl and idle_ttl are specified (for an interactive session), the conditions are treated as OR conditions: the workload will be terminated when it has been idle for idle_ttl or when ttl has been exceeded, whichever occurs first.
  • peripherals_config
    Type: STRUCT
    Provider name: peripheralsConfig
    Description: Optional. Peripherals configuration that workload has access to.
    • metastore_service
      Type: STRING
      Provider name: metastoreService
      Description: Optional. Resource name of an existing Dataproc Metastore service. Example: projects/[project_id]/locations/[region]/services/[service_id]
    • spark_history_server_config
      Type: STRUCT
      Provider name: sparkHistoryServerConfig
      Description: Optional. The Spark History Server configuration for the workload.
      • dataproc_cluster
        Type: STRING
        Provider name: dataprocCluster
        Description: Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload. Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]
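
Taken together, these fields form the environmentConfig object of a batch. Below is a minimal illustrative sketch of such a configuration using the provider (camelCase) names documented above; every project, network, bucket, key, and service name is a placeholder.

```python
# A hypothetical environmentConfig, expressed as a Python dict mirroring the
# provider field names above. All resource names and values are placeholders.
environment_config = {
    "executionConfig": {
        "serviceAccount": "batch-runner@my-project.iam.gserviceaccount.com",
        "subnetworkUri": "projects/my-project/regions/us-central1/subnetworks/default",
        "networkTags": ["dataproc-batch"],
        "stagingBucket": "my-staging-bucket",  # a bucket name, not a gs:// URI
        "kmsKey": "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key",
        # Durations use the JSON representation of Duration: seconds plus "s".
        "ttl": "14400s",  # terminate the workload unconditionally after 4 hours
    },
    "peripheralsConfig": {
        "metastoreService": "projects/my-project/locations/us-central1/services/my-metastore",
        "sparkHistoryServerConfig": {
            "dataprocCluster": "projects/my-project/regions/us-central1/clusters/my-phs"
        },
    },
}
```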

labels

Type: UNORDERED_LIST_STRING

name

Type: STRING
Provider name: name
Description: Output only. The resource name of the batch.

operation

Type: STRING
Provider name: operation
Description: Output only. The resource name of the operation associated with this batch.

organization_id

Type: STRING

parent

Type: STRING

project_id

Type: STRING

project_number

Type: STRING

pyspark_batch

Type: STRUCT
Provider name: pysparkBatch
Description: Optional. PySpark batch config.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS URIs of files to be placed in the working directory of each executor.
  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
  • main_python_file_uri
    Type: STRING
    Provider name: mainPythonFileUri
    Description: Required. The HCFS URI of the main Python file to use as the Spark driver. Must be a .py file.
  • python_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: pythonFileUris
    Description: Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
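
As a concrete illustration, a pysparkBatch config might look like the sketch below; all gs:// URIs are placeholders, and only mainPythonFileUri is required.

```python
# A hypothetical pysparkBatch dict mirroring the provider field names above.
pyspark_batch = {
    "mainPythonFileUri": "gs://my-bucket/jobs/etl.py",      # required; must be a .py file
    "args": ["--input", "gs://my-bucket/raw/"],             # plain driver args; avoid batch-property flags
    "pythonFileUris": ["gs://my-bucket/deps/helpers.zip"],  # .py, .egg, or .zip
    "jarFileUris": ["gs://my-bucket/jars/connector.jar"],   # added to driver and task classpath
    "fileUris": ["gs://my-bucket/conf/app.conf"],           # placed in each executor's working dir
    "archiveUris": ["gs://my-bucket/env/venv.tar.gz"],      # extracted into each executor's working dir
}
```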

resource_name

Type: STRING

runtime_config

Type: STRUCT
Provider name: runtimeConfig
Description: Optional. Runtime configuration for the batch execution.

  • autotuning_config
    Type: STRUCT
    Provider name: autotuningConfig
    Description: Optional. Autotuning configuration of the workload.
    • scenarios
      Type: UNORDERED_LIST_STRING
      Provider name: scenarios
      Description: Optional. Scenarios for which tunings are applied.
  • cohort
    Type: STRING
    Provider name: cohort
    Description: Optional. Cohort identifier. Identifies families of workloads that have the same shape, such as daily ETL jobs.
  • container_image
    Type: STRING
    Provider name: containerImage
    Description: Optional. Custom container image for the job runtime environment. If not specified, a default container image is used.
  • repository_config
    Type: STRUCT
    Provider name: repositoryConfig
    Description: Optional. Dependency repository configuration.
    • pypi_repository_config
      Type: STRUCT
      Provider name: pypiRepositoryConfig
      Description: Optional. Configuration for the PyPI repository.
      • pypi_repository
        Type: STRING
        Provider name: pypiRepository
        Description: Optional. PyPI repository address.
  • version
    Type: STRING
    Provider name: version
    Description: Optional. Version of the batch runtime.
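
A runtimeConfig sketch, for illustration; the runtime version, image path, scenario name, and repository URL are placeholder values.

```python
# A hypothetical runtimeConfig dict mirroring the provider field names above.
runtime_config = {
    "version": "2.2",  # batch runtime version (placeholder)
    "containerImage": "us-central1-docker.pkg.dev/my-project/my-repo/spark:latest",
    "cohort": "daily-etl",  # identifies a family of same-shaped workloads
    "autotuningConfig": {
        "scenarios": ["SCALING"]  # placeholder scenario name
    },
    "repositoryConfig": {
        "pypiRepositoryConfig": {
            "pypiRepository": "https://pypi.example.com/simple/"
        }
    },
}
```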

runtime_info

Type: STRUCT
Provider name: runtimeInfo
Description: Output only. Runtime information about batch execution.

  • approximate_usage
    Type: STRUCT
    Provider name: approximateUsage
    Description: Output only. Approximate workload resource usage, calculated when the workload completes (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)). Note: This metric calculation may change in the future, for example, to capture cumulative workload resource consumption during workload execution (see the Dataproc Serverless release notes (https://cloud.google.com/dataproc-serverless/docs/release-notes) for announcements, changes, fixes, and other Dataproc developments).
    • accelerator_type
      Type: STRING
      Provider name: acceleratorType
      Description: Optional. Accelerator type being used, if any.
    • milli_accelerator_seconds
      Type: INT64
      Provider name: milliAcceleratorSeconds
      Description: Optional. Accelerator usage in (milliAccelerator x seconds) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • milli_dcu_seconds
      Type: INT64
      Provider name: milliDcuSeconds
      Description: Optional. DCU (Dataproc Compute Units) usage in (milliDCU x seconds) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • milli_slot_seconds
      Type: INT64
      Provider name: milliSlotSeconds
      Description: Optional. Slot usage in (milliSlot x seconds).
    • shuffle_storage_gb_seconds
      Type: INT64
      Provider name: shuffleStorageGbSeconds
      Description: Optional. Shuffle storage usage in (GB x seconds) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • update_time
      Type: TIMESTAMP
      Provider name: updateTime
      Description: Optional. The timestamp of the usage metrics.
  • current_usage
    Type: STRUCT
    Provider name: currentUsage
    Description: Output only. Snapshot of current workload resource usage.
    • accelerator_type
      Type: STRING
      Provider name: acceleratorType
      Description: Optional. Accelerator type being used, if any.
    • milli_accelerator
      Type: INT64
      Provider name: milliAccelerator
      Description: Optional. Milli (one-thousandth) accelerator (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • milli_dcu
      Type: INT64
      Provider name: milliDcu
      Description: Optional. Milli (one-thousandth) Dataproc Compute Units (DCUs) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • milli_dcu_premium
      Type: INT64
      Provider name: milliDcuPremium
      Description: Optional. Milli (one-thousandth) Dataproc Compute Units (DCUs) charged at premium tier (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • milli_slot
      Type: INT64
      Provider name: milliSlot
      Description: Optional. Milli (one-thousandth) Slot usage of the workload.
    • shuffle_storage_gb
      Type: INT64
      Provider name: shuffleStorageGb
      Description: Optional. Shuffle Storage in gigabytes (GB) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • shuffle_storage_gb_premium
      Type: INT64
      Provider name: shuffleStorageGbPremium
      Description: Optional. Shuffle Storage in gigabytes (GB) charged at premium tier (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
    • snapshot_time
      Type: TIMESTAMP
      Provider name: snapshotTime
      Description: Optional. The timestamp of the usage snapshot.
  • diagnostic_output_uri
    Type: STRING
    Provider name: diagnosticOutputUri
    Description: Output only. A URI pointing to the location of the diagnostics tarball.
  • output_uri
    Type: STRING
    Provider name: outputUri
    Description: Output only. A URI pointing to the location of the stdout and stderr of the workload.
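
The approximate_usage counters are reported in milli-unit x seconds, while the pricing page quotes unit-hours, so converting is a matter of dividing by 1,000 and by 3,600. A small illustrative helper, assuming a dict shaped like the approximateUsage struct above:

```python
def usage_in_unit_hours(approximate_usage: dict) -> dict:
    """Convert milli-unit x second counters into unit-hours."""
    def to_unit_hours(milli_unit_seconds: int) -> float:
        # milli -> whole units (/1000), seconds -> hours (/3600)
        return milli_unit_seconds / 1000 / 3600

    return {
        "dcu_hours": to_unit_hours(int(approximate_usage.get("milliDcuSeconds", 0))),
        "accelerator_hours": to_unit_hours(int(approximate_usage.get("milliAcceleratorSeconds", 0))),
        # shuffleStorageGbSeconds is already whole GB x seconds, so only the
        # seconds -> hours conversion applies.
        "shuffle_storage_gb_hours": int(approximate_usage.get("shuffleStorageGbSeconds", 0)) / 3600,
    }

# 7,200,000 milliDCU-seconds is 7,200 DCU-seconds, i.e. 2 DCU-hours.
print(usage_in_unit_hours({"milliDcuSeconds": 7_200_000}))  # {'dcu_hours': 2.0, ...}
```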

spark_batch

Type: STRUCT
Provider name: sparkBatch
Description: Optional. Spark batch config.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS URIs of files to be placed in the working directory of each executor.
  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
  • main_class
    Type: STRING
    Provider name: mainClass
    Description: Optional. The name of the driver main class. The jar file that contains the class must be in the classpath or specified in jar_file_uris.
  • main_jar_file_uri
    Type: STRING
    Provider name: mainJarFileUri
    Description: Optional. The HCFS URI of the jar file that contains the main class.
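
For illustration, a sparkBatch that launches its driver by main class might look like the sketch below; the class name and URIs are placeholders, and typically either mainClass or mainJarFileUri is set, not both.

```python
# A hypothetical sparkBatch dict mirroring the provider field names above.
spark_batch = {
    "mainClass": "com.example.WordCount",                  # driver entry point
    "jarFileUris": ["gs://my-bucket/jars/wordcount.jar"],  # must contain mainClass
    "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
}
```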

spark_r_batch

Type: STRUCT
Provider name: sparkRBatch
Description: Optional. SparkR batch config.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the Spark driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS URIs of files to be placed in the working directory of each executor.
  • main_r_file_uri
    Type: STRING
    Provider name: mainRFileUri
    Description: Required. The HCFS URI of the main R file to use as the driver. Must be a .R or .r file.
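
A sparkRBatch sketch; the URIs are placeholders and only mainRFileUri is required.

```python
# A hypothetical sparkRBatch dict mirroring the provider field names above.
spark_r_batch = {
    "mainRFileUri": "gs://my-bucket/jobs/analysis.R",   # required; a .R or .r file
    "args": ["gs://my-bucket/input.csv"],
    "fileUris": ["gs://my-bucket/conf/settings.yaml"],  # placed in each executor's working dir
}
```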

spark_sql_batch

Type: STRUCT
Provider name: sparkSqlBatch
Description: Optional. Spark SQL batch config.

  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.
  • query_file_uri
    Type: STRING
    Provider name: queryFileUri
    Description: Required. The HCFS URI of the script that contains Spark SQL queries to execute.
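
A sparkSqlBatch sketch; both URIs are placeholders.

```python
# A hypothetical sparkSqlBatch dict mirroring the provider field names above.
spark_sql_batch = {
    "queryFileUri": "gs://my-bucket/queries/report.sql",     # required; script of Spark SQL queries
    "jarFileUris": ["gs://my-bucket/jars/custom-udfs.jar"],  # added to the Spark CLASSPATH
}
```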

state

Type: STRING
Provider name: state
Description: Output only. The state of the batch.
Possible values:

  • STATE_UNSPECIFIED - The batch state is unknown.
  • PENDING - The batch has been created but has not yet started running.
  • RUNNING - The batch is running.
  • CANCELLING - The batch is cancelling.
  • CANCELLED - The batch cancellation was successful.
  • SUCCEEDED - The batch completed successfully.
  • FAILED - The batch is no longer running due to an error.
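
Per the descriptions above, CANCELLED, SUCCEEDED, and FAILED all describe batches that are no longer running, so a status check only needs to treat those three as final. A minimal sketch:

```python
# States after which, per the descriptions above, the batch no longer runs.
TERMINAL_STATES = {"CANCELLED", "SUCCEEDED", "FAILED"}

def is_finished(state: str) -> bool:
    """True once the batch has stopped for any reason."""
    return state in TERMINAL_STATES

assert is_finished("FAILED")
assert not is_finished("RUNNING")
```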

state_history

Type: UNORDERED_LIST_STRUCT
Provider name: stateHistory
Description: Output only. Historical state information for the batch.

  • state
    Type: STRING
    Provider name: state
    Description: Output only. The state of the batch at this point in history.
    Possible values:
    • STATE_UNSPECIFIED - The batch state is unknown.
    • PENDING - The batch has been created but has not yet started running.
    • RUNNING - The batch is running.
    • CANCELLING - The batch is cancelling.
    • CANCELLED - The batch cancellation was successful.
    • SUCCEEDED - The batch completed successfully.
    • FAILED - The batch is no longer running due to an error.
  • state_message
    Type: STRING
    Provider name: stateMessage
    Description: Output only. Details about the state at this point in history.
  • state_start_time
    Type: TIMESTAMP
    Provider name: stateStartTime
    Description: Output only. The time when the batch entered the historical state.
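
Since each entry carries the state, an optional message, and the time the state began, the history can be replayed chronologically. A small sketch over a stateHistory list shaped like the fields above (the timestamps are placeholders):

```python
# Replay a batch's state transitions; entries mirror the fields above.
state_history = [
    {"state": "PENDING", "stateStartTime": "2024-01-01T00:00:00Z", "stateMessage": ""},
    {"state": "RUNNING", "stateStartTime": "2024-01-01T00:01:30Z", "stateMessage": ""},
]

for entry in sorted(state_history, key=lambda e: e["stateStartTime"]):
    print(f'{entry["stateStartTime"]}  {entry["state"]}  {entry.get("stateMessage", "")}')
```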

state_message

Type: STRING
Provider name: stateMessage
Description: Output only. Batch state details, such as a failure description if the state is FAILED.

state_time

Type: TIMESTAMP
Provider name: stateTime
Description: Output only. The time when the batch entered its current state.

tags

Type: UNORDERED_LIST_STRING

uuid

Type: STRING
Provider name: uuid
Description: Output only. A batch UUID (Universally Unique Identifier). The service generates this value when it creates the batch.
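
As a closing illustration, the output-only fields above can be read back with the google-cloud-dataproc Python client. This is a hedged sketch, not a definitive recipe; the project, region, and batch ID are placeholders, and it assumes the client library is installed with credentials available in the environment.

```python
# Fetch a batch and print a few of the output-only fields documented above.
from google.cloud import dataproc_v1

# Dataproc batches are served from a regional endpoint (region is a placeholder).
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)
batch = client.get_batch(
    name="projects/my-project/locations/us-central1/batches/my-batch"
)
print(batch.uuid, batch.state, batch.state_time, batch.creator)
```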