Enable Data Jobs Monitoring for Spark on Amazon EMR

Data Jobs Monitoring gives visibility into the performance and reliability of Apache Spark applications on Amazon EMR.

Requirements

Amazon EMR Release 6.0.1 or later is required.

Setup

Follow these steps to enable Data Jobs Monitoring for Amazon EMR.

  1. Store your Datadog API key in AWS Secrets Manager (Recommended).
  2. Grant permissions to EMR EC2 instance profile.
  3. Create and configure your EMR cluster.
  4. Specify service tagging per Spark application.

Store your Datadog API key in AWS Secrets Manager (Recommended)

  1. Take note of your Datadog API key.
  2. In AWS Secrets Manager, choose Store a new secret.
    • Under Secret type, select Other type of secret.
    • Under Key/value pairs, add your Datadog API key as a key-value pair, where the key is dd_api_key.
    • Then, click Next.
  3. On the Configure secret page, enter a Secret name. You can use datadog/dd_api_key. Then, click Next.
  4. On the Configure rotation page, you can optionally turn on automatic rotation. Then, click Next.
  5. On the Review page, review your secret details. Then, click Store.
  6. In AWS Secrets Manager, open the secret you created. Take note of the Secret ARN.
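
The console steps above can also be approximated from the AWS CLI. The following is a minimal sketch, not the documented setup path: the function names build_secret_string and create_dd_secret are hypothetical, and it assumes the API key contains no characters that need JSON escaping and that the AWS CLI is installed with credentials allowed to create secrets.

```shell
#!/bin/bash

# Build the JSON payload that the console's "Key/value pairs" form stores:
# a single pair whose key is dd_api_key.
# Assumption: the API key needs no JSON escaping.
build_secret_string() {
  printf '{"dd_api_key":"%s"}' "$1"
}

# Hypothetical wrapper: create the secret and print its ARN.
# The secret name defaults to datadog/dd_api_key, as suggested above.
create_dd_secret() {
  local api_key="$1"
  local secret_name="${2:-datadog/dd_api_key}"
  aws secretsmanager create-secret \
    --name "$secret_name" \
    --secret-string "$(build_secret_string "$api_key")" \
    --query ARN --output text
}

# Usage (requires AWS credentials; not executed here):
#   create_dd_secret "<YOUR_DATADOG_API_KEY>"
```

The printed ARN is the value to note for the permissions step.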

Grant permissions to EMR EC2 instance profile

The EMR EC2 instance profile is an IAM role assigned to every EC2 instance in an Amazon EMR cluster when the instance launches. Follow the Amazon guide to prepare this role based on your application’s need to interact with other AWS services. Data Jobs Monitoring may require the additional permissions described below.

Permissions to get secret value using AWS Secrets Manager

These permissions are required if you are using AWS Secrets Manager.
  1. In your AWS IAM console, click on Access management > Roles in the left navigation bar.
  2. Click on the IAM role you plan to use as the instance profile for your EMR cluster.
  3. On the next page, under the Permissions tab, find the Permissions policies section. Click on Add permissions > Create inline policy.
  4. On the Specify permissions page, find the Select a service section. Under Service, select Secrets Manager.
    • Then, under Actions allowed, select GetSecretValue. This is a Read action.
    • Under Resources, select Specific. Then, next to Secret, click on Add ARNs and add the ARN of the secret you created in the first step on this page.
    • Click Next.
  5. On the next page, give your policy a name. Then, click Create policy.
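
For reference, the inline policy built by the console steps above should look roughly like the JSON printed by this sketch. The function name secret_read_policy is hypothetical, and the exact document the console generates may differ (for example, it may add a Sid field).

```shell
#!/bin/bash

# Print an IAM policy document that allows reading one specific secret.
# Argument: the Secret ARN noted when the secret was created.
secret_read_policy() {
  local secret_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "${secret_arn}"
    }
  ]
}
EOF
}

# Usage (assumes the AWS CLI and IAM write access; not executed here):
#   aws iam put-role-policy \
#     --role-name "<INSTANCE_PROFILE_ROLE>" \
#     --policy-name "<POLICY_NAME>" \
#     --policy-document "$(secret_read_policy "<SECRET_ARN>")"
```

Scoping Resource to the single secret ARN, rather than "*", mirrors the Specific selection in the console.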

Permissions to describe cluster

These permissions are required if you are NOT using the default role, EMR_EC2_DefaultRole.
  1. In your AWS IAM console, click on Access management > Roles in the left navigation bar.
  2. Click on the IAM role you plan to use as the instance profile for your EMR cluster.
  3. On the next page, under the Permissions tab, find the Permissions policies section. Click on Add permissions > Create inline policy.
  4. On the Specify permissions page, toggle on the JSON tab.
    • Then, copy and paste the following policy into the Policy editor:
    {
       "Version": "2012-10-17",
       "Statement": [
          {
             "Effect": "Allow",
             "Action": [
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListInstanceFleets",
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListInstanceGroups"
             ],
             "Resource": [
                "*"
             ]
          }
       ]
    }
    
    • Click Next.
  5. On the next page, give your policy a name. Then, click Create policy.

Take note of the name of the IAM role you plan to use as the instance profile for your EMR cluster.

Create and configure your EMR cluster

When you create a new EMR cluster in the Amazon EMR console, add a bootstrap action on the Create Cluster page:

  1. Save the following script to an S3 bucket that your EMR cluster can read. Take note of the path to this script.

    #!/bin/bash
    
    # Set required parameter DD_SITE to your Datadog site (for example, datadoghq.com)
    DD_SITE=
    
    # Set required parameter DD_API_KEY with Datadog API key.
    # The command below assumes the API key is stored in AWS Secrets Manager, with the secret name datadog/dd_api_key and the key dd_api_key.
    # IMPORTANT: Modify if you choose to manage and retrieve your secret differently.
    SECRET_NAME=datadog/dd_api_key
    DD_API_KEY=$(aws secretsmanager get-secret-value --secret-id $SECRET_NAME | jq -r .SecretString | jq -r '.["dd_api_key"]')
    
    # Optional parameters
    # Uncomment the following line to allow adding init script logs when reporting a failure back to Datadog. A failure is reported when the init script fails to start the Datadog Agent successfully.
    # export DD_DJM_ADD_LOGS_TO_FAILURE_REPORT=true
    
    # Download and run the latest init script
    DD_SITE=$DD_SITE DD_API_KEY=$DD_API_KEY bash -c "$(curl -L https://dd-data-jobs-monitoring-setup.s3.amazonaws.com/scripts/emr/emr_init_latest.sh)" || true
    

    The script above sets the required parameters, then downloads and runs the latest init script for Data Jobs Monitoring on EMR. To pin the script to a specific version, replace the file name in the URL with a versioned one. For example, use emr_init_1.4.0.sh for the last stable version.

  2. On the Create Cluster page, find the Bootstrap actions section. Click Add to bring up the Add bootstrap action dialog.

    • For Name, give your bootstrap action a name. You can use datadog_agent.
    • For Script location, enter the path to where you stored the init script in S3.
    • Click Add bootstrap action.
  3. On the Create Cluster page, find the Identity and Access Management (IAM) roles section. In the Instance profile dropdown, select the IAM role to which you granted permissions in Grant permissions to EMR EC2 instance profile.

When your cluster is created, this bootstrap action installs the Datadog Agent and downloads the Java tracer on each node of the cluster.
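
If you create clusters from the AWS CLI instead of the console, the same bootstrap action and instance profile can be passed to aws emr create-cluster. The sketch below only prints the command so that it can be reviewed first. The function name emr_create_cluster_cmd, the cluster name, release label, instance type, instance count, and service role are all placeholder assumptions to adapt to your environment.

```shell
#!/bin/bash

# Print (do not run) an `aws emr create-cluster` command that attaches the
# Datadog init script as a bootstrap action.
# $1: S3 path of the init script saved in step 1
# $2: name of the IAM role used as the EC2 instance profile
emr_create_cluster_cmd() {
  local script_s3_path="$1"
  local instance_profile="$2"
  echo "aws emr create-cluster" \
    "--name djm-example" \
    "--release-label emr-6.15.0" \
    "--applications Name=Spark" \
    "--instance-type m5.xlarge" \
    "--instance-count 3" \
    "--service-role EMR_DefaultRole" \
    "--ec2-attributes InstanceProfile=${instance_profile}" \
    "--bootstrap-actions Path=${script_s3_path},Name=datadog_agent"
}

# Usage: review the printed command, then run it with valid AWS credentials.
#   emr_create_cluster_cmd "s3://<BUCKET>/<PATH>/<SCRIPT>.sh" "<INSTANCE_PROFILE_ROLE>"
```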

Specify service tagging per Spark application

Tagging enables you to better filter, aggregate, and compare your telemetry in Datadog. You can configure tags by passing -Ddd.service, -Ddd.env, -Ddd.version, and -Ddd.tags options to your Spark driver and executor extraJavaOptions properties.

In Datadog, each job’s name corresponds to the value you set for -Ddd.service.

spark-submit \
 --conf spark.driver.extraJavaOptions="-Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
 --conf spark.executor.extraJavaOptions="-Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
 application.jar
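
Because the driver and executor options must carry the same values, it can help to build the option string once and reuse it. This is a minimal sketch: dd_java_opts is a hypothetical helper, and the example values (my-job, prod, 1.2.0, team:data) are placeholders.

```shell
#!/bin/bash

# Build the -Ddd.* option string shared by the Spark driver and executors.
# $1: service (job name)  $2: env  $3: version  $4: optional tags (k1:v1,k2:v2)
dd_java_opts() {
  local service="$1"
  local env="$2"
  local version="$3"
  local tags="$4"
  local opts="-Ddd.service=${service} -Ddd.env=${env} -Ddd.version=${version}"
  if [ -n "$tags" ]; then
    opts="${opts} -Ddd.tags=${tags}"
  fi
  printf '%s' "$opts"
}

# Usage (not executed here):
#   OPTS="$(dd_java_opts my-job prod 1.2.0 team:data,tier:batch)"
#   spark-submit \
#     --conf spark.driver.extraJavaOptions="$OPTS" \
#     --conf spark.executor.extraJavaOptions="$OPTS" \
#     application.jar
```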

Validation

In Datadog, view the Data Jobs Monitoring page to see a list of all your data processing jobs.

Troubleshooting

If you don’t see any data in DJM after installing the product, follow these steps.

The init script installs the Datadog Agent. To make sure it is properly installed, SSH into the cluster and run the Agent status command:

sudo datadog-agent status

If the Agent is not installed, view the installation logs located in /tmp/datadog-djm-init.log.
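
The two checks above can be combined into a small helper to run on a cluster node. This is a sketch under assumptions: check_agent is a hypothetical function, and its two optional arguments (the Agent command name and the log path) exist only so that the defaults can be overridden.

```shell
#!/bin/bash

# Report whether the Agent binary is on PATH; if not, show the tail of the
# init log so that the installation failure is visible.
# $1: Agent command (default: datadog-agent)
# $2: init log path (default: /tmp/datadog-djm-init.log)
check_agent() {
  local agent_cmd="${1:-datadog-agent}"
  local init_log="${2:-/tmp/datadog-djm-init.log}"
  if command -v "$agent_cmd" >/dev/null 2>&1; then
    echo "Agent found: run 'sudo $agent_cmd status' for details"
  elif [ -r "$init_log" ]; then
    echo "Agent not found; last lines of $init_log:"
    tail -n 20 "$init_log"
  else
    echo "Agent not found and $init_log is missing or unreadable"
  fi
}

# Usage on a cluster node:
#   check_agent
```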

If you need further assistance from Datadog support, add the following environment variable to the init script. This ensures that logs are sent to Datadog when a failure occurs.

export DD_DJM_ADD_LOGS_TO_FAILURE_REPORT=true

Advanced Configuration

Tag spans at runtime

You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.

// Add a tag to all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)

To remove a runtime tag:

// Remove the tag from all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)
