Data Jobs Monitoring for Spark on Kubernetes
Data Jobs Monitoring gives visibility into the performance and reliability of Apache Spark applications on Kubernetes.
Setup
Follow these steps to enable Data Jobs Monitoring for Spark on Kubernetes.
- Install the Datadog Agent on your Kubernetes cluster.
- Inject Spark instrumentation.
Install the Datadog Agent on your Kubernetes cluster
If you have already installed the Datadog Agent on your Kubernetes cluster, ensure that you have enabled the Datadog Admission Controller. You can then go to the next step, Inject Spark instrumentation.
You can install the Datadog Agent using the Datadog Operator or Helm.
Installation with the Datadog Operator
- Install the Datadog Operator by running the following commands:
  helm repo add datadog https://helm.datadoghq.com
  helm install my-datadog-operator datadog/datadog-operator
 
- Create a Kubernetes Secret to store your Datadog API key:
  kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY>
  Replace <DATADOG_API_KEY> with your Datadog API key.
 
- Create a file, datadog-agent.yaml, that contains the following configuration:
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  features:
    apm:
      enabled: true
      hostPortConfig:
        enabled: true
        hostPort: 8126
    admissionController:
      enabled: true
      mutateUnlabelled: false
    # (Optional) Uncomment the next three lines to enable logs collection
    # logCollection:
      # enabled: true
      # containerCollectAll: true
  global:
    site: <DATADOG_SITE>
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
  override:
    nodeAgent:
      image:
        tag: <DATADOG_AGENT_VERSION>
 
- Replace <DATADOG_SITE> with your Datadog site.
- Replace <DATADOG_AGENT_VERSION> with version 7.55.0 or later.
- Optional: Uncomment the logCollection section to start collecting application logs, which are correlated with Spark job run traces. Once enabled, logs are collected from all discovered containers by default. See the Kubernetes log collection documentation for more details on the setup process.
 
- Deploy the Datadog Agent with the above configuration file:
  kubectl apply -f /path/to/your/datadog-agent.yaml
 
Installation with Helm
- Create a Kubernetes Secret to store your Datadog API key:
  kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY>
  Replace <DATADOG_API_KEY> with your Datadog API key.
 
- Create a file, datadog-values.yaml, that contains the following configuration:
datadog:
  apiKeyExistingSecret: datadog-secret
  site: <DATADOG_SITE>
  apm:
    portEnabled: true
    port: 8126
  # (Optional) Uncomment the next three lines to enable logs collection
  # logs:
    # enabled: true
    # containerCollectAll: true
agents:
  image:
    tag: <DATADOG_AGENT_VERSION>
clusterAgent:
  admissionController:
    enabled: true
    mutateUnlabelled: false
 
- Replace <DATADOG_SITE> with your Datadog site.
- Replace <DATADOG_AGENT_VERSION> with version 7.55.0 or later.
- Optional: Uncomment the logs section to start collecting application logs, which are correlated with Spark job run traces. Once enabled, logs are collected from all discovered containers by default. See the Kubernetes log collection documentation for more details on the setup process.
- Run the following command:
  helm install <RELEASE_NAME> \
    -f datadog-values.yaml \
    --set targetSystem=<TARGET_SYSTEM> \
    datadog/datadog
 
- Replace <RELEASE_NAME> with your release name. For example, datadog-agent.
- Replace <TARGET_SYSTEM> with the name of your OS. For example, linux or windows.
 
 
Inject Spark instrumentation
When you run your Spark job, use the following configurations:
- spark.kubernetes.{driver,executor}.label.admission.datadoghq.com/enabled (Required): true
- spark.kubernetes.{driver,executor}.annotation.admission.datadoghq.com/java-lib.version (Required): latest
- spark.{driver,executor}.extraJavaOptions:
  - -Ddd.data.jobs.enabled=true (Required)
  - -Ddd.service (Optional): Your service name. Because this option sets the job name in Datadog, use a human-readable name.
  - -Ddd.env (Optional): Your environment, such as prod or dev.
  - -Ddd.version (Optional): Your version.
  - -Ddd.tags (Optional): Other tags you wish to add, in the format <KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>.
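The settings above have to be repeated for both the driver and the executors, which is easy to get wrong by hand. The following is a minimal sketch of one way to generate them; `datadog_spark_conf` and its parameters are hypothetical names chosen for illustration, not part of any Datadog or Spark API.

```python
from typing import Dict, List, Optional


def datadog_spark_conf(
    service: Optional[str] = None,
    env: Optional[str] = None,
    version: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
    java_lib_version: str = "latest",
) -> List[str]:
    """Return spark-submit arguments that enable Data Jobs Monitoring.

    Hypothetical helper: it only assembles the configuration keys
    listed above and does not call Spark or Datadog.
    """
    # The required JVM flag comes first, then only the optional flags supplied.
    java_opts = ["-Ddd.data.jobs.enabled=true"]
    if service:
        java_opts.append(f"-Ddd.service={service}")
    if env:
        java_opts.append(f"-Ddd.env={env}")
    if version:
        java_opts.append(f"-Ddd.version={version}")
    if tags:
        joined = ",".join(f"{key}:{value}" for key, value in tags.items())
        java_opts.append(f"-Ddd.tags={joined}")
    java_opts_value = " ".join(java_opts)

    args: List[str] = []
    # The same three settings are applied to the driver and to the executors.
    for role in ("driver", "executor"):
        conf = {
            f"spark.kubernetes.{role}.label.admission.datadoghq.com/enabled": "true",
            f"spark.kubernetes.{role}.annotation.admission.datadoghq.com/java-lib.version": java_lib_version,
            f"spark.{role}.extraJavaOptions": java_opts_value,
        }
        for key, value in conf.items():
            args += ["--conf", f"{key}={value}"]
    return args
```

The returned list can be spliced into an argument list such as `["spark-submit", *datadog_spark_conf(service="nightly-etl"), ...]` and passed to `subprocess.run`. Because no shell is involved, the spaces inside `extraJavaOptions` need no extra quoting.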
 
Example: spark-submit
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://<CLUSTER_ENDPOINT> \
  --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=<NAMESPACE> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<SERVICE_ACCOUNT> \
  --conf spark.kubernetes.authenticate.executor.serviceAccountName=<SERVICE_ACCOUNT> \
  --conf spark.kubernetes.driver.label.admission.datadoghq.com/enabled=true \
  --conf spark.kubernetes.executor.label.admission.datadoghq.com/enabled=true \
  --conf spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version=latest \
  --conf spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version=latest \
  --conf spark.driver.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
  --conf spark.executor.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
  local:///usr/lib/spark/examples/jars/spark-examples.jar 20
Example: AWS start-job-run
aws emr-containers start-job-run \
--virtual-cluster-id <EMR_CLUSTER_ID> \
--name myjob \
--execution-role-arn <EXECUTION_ROLE_ARN> \
--release-label emr-6.10.0-latest \
--job-driver '{
  "sparkSubmitJobDriver": {
    "entryPoint": "s3://BUCKET/spark-examples.jar",
    "sparkSubmitParameters": "--class <MAIN_CLASS> --conf spark.kubernetes.driver.label.admission.datadoghq.com/enabled=true --conf spark.kubernetes.executor.label.admission.datadoghq.com/enabled=true --conf spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version=latest --conf spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version=latest --conf spark.driver.extraJavaOptions=\"-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>\" --conf spark.executor.extraJavaOptions=\"-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>\""
  }
}'
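The nested quoting in the --job-driver argument is easy to break when edited by hand. One option is to build the JSON programmatically and let the serializer escape the inner quotes. The sketch below assumes only the payload shape shown in the example above; `job_driver_json` is a hypothetical helper name, and the parameter list is shortened for illustration.

```python
import json
from typing import List


def job_driver_json(entry_point: str, submit_parameters: List[str]) -> str:
    """Serialize a sparkSubmitJobDriver payload for the --job-driver flag."""
    payload = {
        "sparkSubmitJobDriver": {
            "entryPoint": entry_point,
            # The parameters are sent as one space-separated string.
            "sparkSubmitParameters": " ".join(submit_parameters),
        }
    }
    # json.dumps escapes the double quotes around the Java options.
    return json.dumps(payload)


java_opts = "-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME>"
params = [
    "--class <MAIN_CLASS>",
    "--conf spark.kubernetes.driver.label.admission.datadoghq.com/enabled=true",
    "--conf spark.kubernetes.executor.label.admission.datadoghq.com/enabled=true",
    "--conf spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version=latest",
    "--conf spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version=latest",
    f'--conf spark.driver.extraJavaOptions="{java_opts}"',
    f'--conf spark.executor.extraJavaOptions="{java_opts}"',
]
job_driver = job_driver_json("s3://BUCKET/spark-examples.jar", params)
print(job_driver)
```

The resulting string can be passed as the value of --job-driver, for example as a single element of a `subprocess.run` argument list, which avoids a second layer of shell quoting.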
Validation
In Datadog, view the Data Jobs Monitoring page to see a list of all your data processing jobs.
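If a job does not show up on the page, a reasonable first check is whether the Spark pods actually carry the label and annotation from the Inject Spark instrumentation step, since Spark copies those settings onto the driver and executor pods. The sketch below is a hypothetical helper, not part of Datadog's tooling. It assumes the pod manifest was obtained with `kubectl get pod <POD_NAME> -o json` and parsed into a dictionary.

```python
from typing import Any, Dict, List

LABEL = "admission.datadoghq.com/enabled"
ANNOTATION = "admission.datadoghq.com/java-lib.version"


def missing_injection_metadata(pod: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a pod manifest; empty means none."""
    metadata = pod.get("metadata", {})
    # Kubernetes omits these maps entirely when they are empty.
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}

    problems: List[str] = []
    if labels.get(LABEL) != "true":
        problems.append(f"label {LABEL} is not set to 'true'")
    if ANNOTATION not in annotations:
        problems.append(f"annotation {ANNOTATION} is missing")
    return problems
```

An empty result only shows that the pod was submitted with the expected metadata. It does not confirm that the Admission Controller injected the library or that the Agent received data.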
Advanced Configuration
Tag spans at runtime
You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.
// Add a tag to all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)
To remove a runtime tag:
// Remove the tag from all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)
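For PySpark jobs, setting and removing a tag can be paired in a context manager so the tag is always cleared, even if the computation fails. This is a sketch under two assumptions: that the property name is `spark.datadog.tags.` followed by the tag key, as the example above suggests, and that PySpark's `SparkContext.setLocalProperty` forwards `None` to the JVM as null. `datadog_tag` is a hypothetical helper name.

```python
from contextlib import contextmanager
from typing import Any, Iterator

# Assumed prefix, taken from the "spark.datadog.tags.key" example above.
TAG_PREFIX = "spark.datadog.tags."


@contextmanager
def datadog_tag(spark_context: Any, key: str, value: str) -> Iterator[None]:
    """Tag Spark spans started inside the with-block, then remove the tag."""
    prop = TAG_PREFIX + key
    spark_context.setLocalProperty(prop, value)
    try:
        yield
    finally:
        # None plays the role of null in the removal example above.
        spark_context.setLocalProperty(prop, None)
```

A job would use it as `with datadog_tag(spark.sparkContext, "team", "data"):` around the reads and actions to be tagged. As with the examples above, only spans that start inside the block receive the tag.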