An Amazon EMR cluster is a managed big data environment in AWS for processing and analyzing large datasets with open-source tools such as Apache Spark, Hadoop, and Hive. It consists of a set of EC2 instances configured as master, core, and task nodes, which work together to run distributed data processing workloads. Amazon EMR handles provisioning, configuration, scaling, and cluster management, so you can run complex analytics and machine learning jobs without managing the underlying infrastructure manually.

aws.emr_cluster

Fields

| Title | ID | Type | Data Type | Description |
| --- | --- | --- | --- | --- |
|  | _key | core | string |  |
|  | account_id | core | string |  |
|  | applications | core | json | The applications installed on this cluster. |
|  | auto_scaling_role | core | string | An IAM role for automatic scaling policies. The default role is EMR_AutoScaling_DefaultRole. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate Amazon EC2 instances in an instance group. |
|  | auto_terminate | core | bool | Specifies whether the cluster should terminate after completing all steps. |
|  | cluster_arn | core | string | The Amazon Resource Name of the cluster. |
|  | custom_ami_id | core | string | Available only in Amazon EMR releases 5.7.0 and later. The ID of a custom Amazon EBS-backed Linux AMI if the cluster uses a custom AMI. |
|  | ebs_root_volume_iops | core | int64 | The IOPS of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. Available in Amazon EMR releases 6.15.0 and later. |
|  | ebs_root_volume_size | core | int64 | The size, in GiB, of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. Available in Amazon EMR releases 4.x and later. |
|  | ebs_root_volume_throughput | core | int64 | The throughput, in MiB/s, of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. Available in Amazon EMR releases 6.15.0 and later. |
|  | ec2_instance_attributes | core | json | Provides information about the Amazon EC2 instances in a cluster grouped by category. For example, key name, subnet ID, IAM instance profile, and so on. |
|  | id | core | string | The unique identifier for the cluster. |
|  | instance_collection_type | core | string | The instance fleet configuration is available only in Amazon EMR releases 4.8.0 and later, excluding 5.0.x versions. The instance group configuration of the cluster. A value of INSTANCE_GROUP indicates a uniform instance group configuration. A value of INSTANCE_FLEET indicates an instance fleets configuration. |
|  | kerberos_attributes | core | json | Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration. For more information see Use Kerberos Authentication in the Amazon EMR Management Guide. |
|  | log_encryption_kms_key_id | core | string | The KMS key used for encrypting log files. This attribute is only available with Amazon EMR 5.30.0 and later, excluding Amazon EMR 6.0.0. |
|  | log_uri | core | string | The path to the Amazon S3 location where logs for this cluster are stored. |
|  | master_public_dns_name | core | string | The DNS name of the master node. If the cluster is on a private subnet, this is the private DNS name. On a public subnet, this is the public DNS name. |
|  | name | core | string | The name of the cluster. This parameter can't contain the characters <, >, $, \|, or ` (backtick). |
|  | normalized_instance_hours | core | int64 | An approximation of the cost of the cluster, represented in m1.small/hours. This value is incremented one time for every hour an m1.small instance runs. Larger instances are weighted more, so an Amazon EC2 instance that is roughly four times more expensive would result in the normalized instance hours being incremented by four. This result is only an approximation and does not reflect the actual billing rate. |
|  | os_release_label | core | string | The Amazon Linux release specified in a cluster launch RunJobFlow request. If no Amazon Linux release was specified, the default Amazon Linux release is shown in the response. |
|  | outpost_arn | core | string | The Amazon Resource Name (ARN) of the Outpost where the cluster is launched. |
|  | placement_groups | core | json | Placement group configured for an Amazon EMR cluster. |
|  | release_label | core | string | The Amazon EMR release label, which determines the version of open-source application packages installed on the cluster. Release labels are in the form emr-x.x.x, where x.x.x is an Amazon EMR release version such as emr-5.14.0. For more information about Amazon EMR release versions and included application versions and features, see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/. The release label applies only to Amazon EMR releases version 4.0 and later. Earlier versions use AmiVersion. |
|  | repo_upgrade_on_boot | core | string | Applies only when CustomAmiID is used. Specifies the type of updates that the Amazon Linux AMI package repositories apply when an instance boots using the AMI. |
|  | requested_ami_version | core | string | The AMI version requested for this cluster. |
|  | running_ami_version | core | string | The AMI version running on this cluster. |
|  | scale_down_behavior | core | string | The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. TERMINATE_AT_INSTANCE_HOUR indicates that Amazon EMR terminates nodes at the instance-hour boundary, regardless of when the request to terminate the instance was submitted. This option is only available with Amazon EMR 5.1.0 and later and is the default for clusters created using that version. TERMINATE_AT_TASK_COMPLETION indicates that Amazon EMR adds nodes to a deny list and drains tasks from nodes before terminating the Amazon EC2 instances, regardless of the instance-hour boundary. With either behavior, Amazon EMR removes the least active nodes first and blocks instance termination if it could lead to HDFS corruption. TERMINATE_AT_TASK_COMPLETION is available only in Amazon EMR releases 4.1.0 and later, and is the default for versions of Amazon EMR earlier than 5.1.0. |
|  | security_configuration | core | string | The name of the security configuration applied to the cluster. |
|  | service_role | core | string | The IAM role that Amazon EMR assumes in order to access Amazon Web Services resources on your behalf. |
|  | status | core | json | The current status details about the cluster. |
|  | step_concurrency_level | core | int64 | Specifies the number of steps that can be executed concurrently. |
|  | tags | core | hstore |  |
|  | termination_protected | core | bool | Indicates whether Amazon EMR will lock the cluster to prevent the Amazon EC2 instances from being terminated by an API call or user intervention, or in the event of a cluster error. |
|  | unhealthy_node_replacement | core | bool | Indicates whether Amazon EMR should gracefully replace Amazon EC2 core instances that have degraded within the cluster. |
|  | visible_to_all_users | core | bool | Indicates whether the cluster is visible to IAM principals in the Amazon Web Services account associated with the cluster. When true, IAM principals in the Amazon Web Services account can perform Amazon EMR cluster actions on the cluster that their IAM policies allow. When false, only the IAM principal that created the cluster and the Amazon Web Services account root user can perform Amazon EMR actions, regardless of IAM permissions policies attached to other IAM principals. The default value is true if a value is not provided when creating a cluster using the Amazon EMR API RunJobFlow command, the CLI create-cluster command, or the Amazon Web Services Management Console. |
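
Several field descriptions tie a field's availability to the Amazon EMR release in release_label (for example, ebs_root_volume_iops from 6.15.0 onward). The following Python sketch shows one way such a check might look when post-processing rows from this table. The helper names parse_release_label and field_available are hypothetical, the version thresholds are copied from the descriptions above, and the code assumes release labels strictly follow the emr-x.x.x form.

```python
import re

# Minimum Amazon EMR release per field, copied from the field descriptions.
MIN_RELEASE = {
    "custom_ami_id": (5, 7, 0),
    "ebs_root_volume_iops": (6, 15, 0),
    "ebs_root_volume_throughput": (6, 15, 0),
    "log_encryption_kms_key_id": (5, 30, 0),
}

# Releases that the descriptions explicitly exclude.
EXCLUDED = {
    "log_encryption_kms_key_id": {(6, 0, 0)},
}


def parse_release_label(label):
    """Turn 'emr-5.14.0' into (5, 14, 0); return None if the label does not match."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label or "")
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def field_available(field, release_label):
    """Hypothetical helper: is `field` expected to be populated for this release?"""
    if field not in MIN_RELEASE:
        return True  # no release constraint recorded for this field
    version = parse_release_label(release_label)
    if version is None:
        return False  # missing label or a pre-4.0 cluster that uses AmiVersion
    if version in EXCLUDED.get(field, set()):
        return False
    return version >= MIN_RELEASE[field]
```

A call such as field_available("ebs_root_volume_iops", row["release_label"]) could then help decide whether an empty value reflects an older release or genuinely missing data, where row is an assumed dict keyed by the IDs in the table.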