---
title: Spark
description: Track failed task rates, shuffled bytes, and much more.
breadcrumbs: Docs > Integrations > Spark
---

# Spark
**Integration version:** 7.6.0
{% alert level="warning" %}
[Data Observability: Jobs Monitoring](https://docs.datadoghq.com/data_observability/jobs_monitoring/) helps you observe, troubleshoot, and cost-optimize your Spark and Databricks jobs and clusters. This page only documents how to ingest Spark metrics and logs.
{% /alert %}



## Overview{% #overview %}

This check monitors [Spark](https://spark.apache.org/) through the Datadog Agent. Collect Spark metrics for:

- Drivers and executors: RDD blocks, memory used, disk used, duration, etc.
- RDDs: partition count, memory used, and disk used.
- Tasks: number of tasks active, skipped, failed, and total.
- Job state: number of jobs active, completed, skipped, and failed.

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

### Installation{% #installation %}

The Spark check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your Mesos master (for Spark on Mesos), YARN ResourceManager (for Spark on YARN), or Spark master (for Spark Standalone).

### Configuration{% #configuration %}

{% tab title="Host" %}
#### Host{% #host %}

To configure this check for an Agent running on a host:

1. Edit the `spark.d/conf.yaml` file, in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/guide/agent-configuration-files/#agent-configuration-directory). The following parameters may require updating. See the [sample spark.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/spark/datadog_checks/spark/data/conf.yaml.example) for all available configuration options.

   ```yaml
   init_config:

   instances:
     - spark_url: http://localhost:8080 # Spark master web UI
       # spark_url: http://<Mesos_master>:5050 # Mesos master web UI
       # spark_url: http://<YARN_ResourceManager_address>:8088 # YARN ResourceManager address

       spark_cluster_mode: spark_yarn_mode # default
       # spark_cluster_mode: spark_standalone_mode
       # spark_cluster_mode: spark_mesos_mode
       # spark_cluster_mode: spark_driver_mode

       # Required; adds a 'cluster_name:<CLUSTER_NAME>' tag to all metrics
       cluster_name: "<CLUSTER_NAME>"
       # spark_pre_20_mode: true   # if you use Standalone Spark < v2.0
       # spark_proxy_enabled: true # if you have enabled the Spark UI proxy
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent).
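Before restarting, you can sanity-check that the URL you set as `spark_url` is reachable from the Agent host. A minimal probe sketch, assuming a Spark standalone master UI on `localhost:8080` (substitute the Mesos master `:5050` or YARN ResourceManager `:8088` address for those cluster modes):

   ```shell
   # Probe the endpoint the Agent will poll. SPARK_URL is a placeholder
   # environment variable used only for this sketch.
   SPARK_URL="${SPARK_URL:-http://localhost:8080}"
   if curl -fsS -o /dev/null --max-time 5 "$SPARK_URL"; then
     echo "spark_url reachable: $SPARK_URL"
   else
     echo "spark_url NOT reachable: $SPARK_URL (check host/port and firewall)"
   fi
   ```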

{% /tab %}

{% tab title="Containerized" %}
#### Containerized{% #containerized %}

For containerized environments, see the Autodiscovery Integration Templates, either for [Docker](https://docs.datadoghq.com/containers/docker/integrations/) or [Kubernetes](https://docs.datadoghq.com/agent/kubernetes/integrations/), for guidance on applying the parameters below.

| Parameter            | Value                                                             |
| -------------------- | ----------------------------------------------------------------- |
| `<INTEGRATION_NAME>` | `spark`                                                           |
| `<INIT_CONFIG>`      | blank or `{}`                                                     |
| `<INSTANCE_CONFIG>`  | `{"spark_url": "%%host%%:8080", "cluster_name":"<CLUSTER_NAME>"}` |
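On Kubernetes, for example, the parameters above map onto pod annotations. A sketch, assuming Autodiscovery annotations on a container named `spark` (the container name, image, and cluster name are placeholders to adapt to your deployment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark
  annotations:
    ad.datadoghq.com/spark.checks: |
      {
        "spark": {
          "init_config": {},
          "instances": [
            {
              "spark_url": "%%host%%:8080",
              "cluster_name": "<CLUSTER_NAME>"
            }
          ]
        }
      }
spec:
  containers:
    - name: spark
      image: <SPARK_IMAGE>
```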

{% /tab %}

### Log collection{% #log-collection %}

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
    logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `spark.d/conf.yaml` file. Change the `type`, `path`, and `service` parameter values based on your environment. See the [sample spark.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/spark/datadog_checks/spark/data/conf.yaml.example) for all available configuration options.

   ```yaml
    logs:
      - type: file
        path: <LOG_FILE_PATH>
        source: spark
        service: <SERVICE_NAME>
        # To handle multi-line logs that start with yyyy-mm-dd, use the following pattern:
        # log_processing_rules:
        #   - type: multi_line
        #     pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
        #     name: new_log_start_with_date
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent).
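The commented `multi_line` pattern above can be sanity-checked outside the Agent. A small sketch using Python's `re` module with the same regex, showing which lines would start a new log entry (the sample log lines are illustrative):

```python
import re

# Same regex as the commented log_processing_rules example above:
# matches lines beginning with a yyyy-mm-dd date.
pattern = re.compile(r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")

lines = [
    "2024-05-17 10:32:01 INFO DAGScheduler: Job 4 finished",
    "    at org.apache.spark.scheduler.Task.run(Task.scala:131)",  # continuation
    "2024-05-17 10:32:02 ERROR Executor: Exception in task 0.0",
]

for line in lines:
    # Lines matching the pattern start a new entry; others are appended
    # to the previous entry by the multi_line rule.
    starts_new_entry = pattern.match(line) is not None
    print(starts_new_entry, line[:40])
```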

To enable logs for Docker environments, see [Docker Log Collection](https://docs.datadoghq.com/agent/docker/log/).

### Validation{% #validation %}

Run the Agent's [status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information) and look for `spark` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| ------ | ----------- |
| **spark.driver.active\_tasks**(count)                                   | Number of active tasks in the driver*Shown as task*                                           |
| **spark.driver.completed\_tasks**(count)                                | Number of completed tasks in the driver*Shown as task*                                        |
| **spark.driver.disk\_used**(count)                                      | Amount of disk used in the driver*Shown as byte*                                              |
| **spark.driver.failed\_tasks**(count)                                   | Number of failed tasks in the driver*Shown as task*                                           |
| **spark.driver.max\_memory**(count)                                     | Maximum memory used in the driver*Shown as byte*                                              |
| **spark.driver.mem.total\_off\_heap\_storage**(count)                   | Total available off heap memory for storage*Shown as byte*                                    |
| **spark.driver.mem.total\_on\_heap\_storage**(count)                    | Total available on heap memory for storage*Shown as byte*                                     |
| **spark.driver.mem.used\_off\_heap\_storage**(count)                    | Used off heap memory currently for storage*Shown as byte*                                     |
| **spark.driver.mem.used\_on\_heap\_storage**(count)                     | Used on heap memory currently for storage*Shown as byte*                                      |
| **spark.driver.memory\_used**(count)                                    | Amount of memory used in the driver*Shown as byte*                                            |
| **spark.driver.peak\_mem.direct\_pool**(count)                          | Peak memory that the JVM is using for direct buffer pool*Shown as byte*                       |
| **spark.driver.peak\_mem.jvm\_heap\_memory**(count)                     | Peak memory usage of the heap that is used for object allocation*Shown as byte*               |
| **spark.driver.peak\_mem.jvm\_off\_heap\_memory**(count)                | Peak memory usage of non-heap memory that is used by the Java virtual machine*Shown as byte*  |
| **spark.driver.peak\_mem.major\_gc\_count**(count)                      | Total major GC count*Shown as byte*                                                           |
| **spark.driver.peak\_mem.major\_gc\_time**(count)                       | Elapsed total major GC time*Shown as millisecond*                                             |
| **spark.driver.peak\_mem.mapped\_pool**(count)                          | Peak memory that the JVM is using for mapped buffer pool*Shown as byte*                       |
| **spark.driver.peak\_mem.minor\_gc\_count**(count)                      | Total minor GC count*Shown as byte*                                                           |
| **spark.driver.peak\_mem.minor\_gc\_time**(count)                       | Elapsed total minor GC time*Shown as millisecond*                                             |
| **spark.driver.peak\_mem.off\_heap\_execution**(count)                  | Peak off heap execution memory in use*Shown as byte*                                          |
| **spark.driver.peak\_mem.off\_heap\_storage**(count)                    | Peak off heap storage memory in use*Shown as byte*                                            |
| **spark.driver.peak\_mem.off\_heap\_unified**(count)                    | Peak off heap memory (execution and storage)*Shown as byte*                                   |
| **spark.driver.peak\_mem.on\_heap\_execution**(count)                   | Peak on heap execution memory in use*Shown as byte*                                           |
| **spark.driver.peak\_mem.on\_heap\_storage**(count)                     | Peak on heap storage memory in use*Shown as byte*                                             |
| **spark.driver.peak\_mem.on\_heap\_unified**(count)                     | Peak on heap memory (execution and storage)*Shown as byte*                                    |
| **spark.driver.peak\_mem.process\_tree\_jvm**(count)                    | Virtual memory size*Shown as byte*                                                            |
| **spark.driver.peak\_mem.process\_tree\_jvm\_rss**(count)               | Resident Set Size: number of pages the process has in real memory*Shown as byte*              |
| **spark.driver.peak\_mem.process\_tree\_other**(count)                  | Virtual memory size for other kind of process*Shown as byte*                                  |
| **spark.driver.peak\_mem.process\_tree\_other\_rss**(count)             | Resident Set Size for other kind of process*Shown as byte*                                    |
| **spark.driver.peak\_mem.process\_tree\_python**(count)                 | Virtual memory size for Python*Shown as byte*                                                 |
| **spark.driver.peak\_mem.process\_tree\_python\_rss**(count)            | Resident Set Size for Python*Shown as byte*                                                   |
| **spark.driver.rdd\_blocks**(count)                                     | Number of RDD blocks in the driver*Shown as block*                                            |
| **spark.driver.total\_duration**(count)                                 | Time spent in the driver*Shown as millisecond*                                                |
| **spark.driver.total\_input\_bytes**(count)                             | Number of input bytes in the driver*Shown as byte*                                            |
| **spark.driver.total\_shuffle\_read**(count)                            | Number of bytes read during a shuffle in the driver*Shown as byte*                            |
| **spark.driver.total\_shuffle\_write**(count)                           | Number of shuffled bytes in the driver*Shown as byte*                                         |
| **spark.driver.total\_tasks**(count)                                    | Number of total tasks in the driver*Shown as task*                                            |
| **spark.executor.active\_tasks**(count)                                 | Number of active tasks in the application's executors*Shown as task*                          |
| **spark.executor.completed\_tasks**(count)                              | Number of completed tasks in the application's executors*Shown as task*                       |
| **spark.executor.count**(count)                                         | Number of executors*Shown as task*                                                            |
| **spark.executor.disk\_used**(count)                                    | Amount of disk space used by persisted RDDs in the application's executors*Shown as byte*     |
| **spark.executor.failed\_tasks**(count)                                 | Number of failed tasks in the application's executors*Shown as task*                          |
| **spark.executor.id.active\_tasks**(count)                              | Number of active tasks in this executor*Shown as task*                                        |
| **spark.executor.id.completed\_tasks**(count)                           | Number of completed tasks in this executor*Shown as task*                                     |
| **spark.executor.id.disk\_used**(count)                                 | Amount of disk space used by persisted RDDs in this executor*Shown as byte*                   |
| **spark.executor.id.failed\_tasks**(count)                              | Number of failed tasks in this executor*Shown as task*                                        |
| **spark.executor.id.max\_memory**(count)                                | Total amount of memory available for storage for this executor*Shown as byte*                 |
| **spark.executor.id.mem.total\_off\_heap\_storage**(count)              | Total available off heap memory for storage*Shown as byte*                                    |
| **spark.executor.id.mem.total\_on\_heap\_storage**(count)               | Total available on heap memory for storage*Shown as byte*                                     |
| **spark.executor.id.mem.used\_off\_heap\_storage**(count)               | Used off heap memory currently for storage*Shown as byte*                                     |
| **spark.executor.id.mem.used\_on\_heap\_storage**(count)                | Used on heap memory currently for storage*Shown as byte*                                      |
| **spark.executor.id.memory\_used**(count)                               | Amount of memory used for cached RDDs in this executor.*Shown as byte*                        |
| **spark.executor.id.peak\_mem.direct\_pool**(count)                     | Peak memory that the JVM is using for direct buffer pool*Shown as byte*                       |
| **spark.executor.id.peak\_mem.jvm\_heap\_memory**(count)                | Peak memory usage of the heap that is used for object allocation*Shown as byte*               |
| **spark.executor.id.peak\_mem.jvm\_off\_heap\_memory**(count)           | Peak memory usage of non-heap memory that is used by the Java virtual machine*Shown as byte*  |
| **spark.executor.id.peak\_mem.major\_gc\_count**(count)                 | Total major GC count*Shown as byte*                                                           |
| **spark.executor.id.peak\_mem.major\_gc\_time**(count)                  | Elapsed total major GC time*Shown as millisecond*                                             |
| **spark.executor.id.peak\_mem.mapped\_pool**(count)                     | Peak memory that the JVM is using for mapped buffer pool*Shown as byte*                       |
| **spark.executor.id.peak\_mem.minor\_gc\_count**(count)                 | Total minor GC count*Shown as byte*                                                           |
| **spark.executor.id.peak\_mem.minor\_gc\_time**(count)                  | Elapsed total minor GC time*Shown as millisecond*                                             |
| **spark.executor.id.peak\_mem.off\_heap\_execution**(count)             | Peak off heap execution memory in use*Shown as byte*                                          |
| **spark.executor.id.peak\_mem.off\_heap\_storage**(count)               | Peak off heap storage memory in use*Shown as byte*                                            |
| **spark.executor.id.peak\_mem.off\_heap\_unified**(count)               | Peak off heap memory (execution and storage)*Shown as byte*                                   |
| **spark.executor.id.peak\_mem.on\_heap\_execution**(count)              | Peak on heap execution memory in use*Shown as byte*                                           |
| **spark.executor.id.peak\_mem.on\_heap\_storage**(count)                | Peak on heap storage memory in use*Shown as byte*                                             |
| **spark.executor.id.peak\_mem.on\_heap\_unified**(count)                | Peak on heap memory (execution and storage)*Shown as byte*                                    |
| **spark.executor.id.peak\_mem.process\_tree\_jvm**(count)               | Virtual memory size*Shown as byte*                                                            |
| **spark.executor.id.peak\_mem.process\_tree\_jvm\_rss**(count)          | Resident Set Size: number of pages the process has in real memory*Shown as byte*              |
| **spark.executor.id.peak\_mem.process\_tree\_other**(count)             | Virtual memory size for other kind of process*Shown as byte*                                  |
| **spark.executor.id.peak\_mem.process\_tree\_other\_rss**(count)        | Resident Set Size for other kind of process*Shown as byte*                                    |
| **spark.executor.id.peak\_mem.process\_tree\_python**(count)            | Virtual memory size for Python*Shown as byte*                                                 |
| **spark.executor.id.peak\_mem.process\_tree\_python\_rss**(count)       | Resident Set Size for Python*Shown as byte*                                                   |
| **spark.executor.id.rdd\_blocks**(count)                                | Number of persisted RDD blocks in this executor*Shown as block*                               |
| **spark.executor.id.total\_duration**(count)                            | Time spent by the executor executing tasks*Shown as millisecond*                              |
| **spark.executor.id.total\_input\_bytes**(count)                        | Total number of input bytes in the executor*Shown as byte*                                    |
| **spark.executor.id.total\_shuffle\_read**(count)                       | Total number of bytes read during a shuffle in the executor*Shown as byte*                    |
| **spark.executor.id.total\_shuffle\_write**(count)                      | Total number of shuffled bytes in the executor*Shown as byte*                                 |
| **spark.executor.id.total\_tasks**(count)                               | Total number of tasks in this executor*Shown as task*                                         |
| **spark.executor.max\_memory**(count)                                   | Max memory across all executors working for a particular application*Shown as byte*           |
| **spark.executor.mem.total\_off\_heap\_storage**(count)                 | Total available off heap memory for storage*Shown as byte*                                    |
| **spark.executor.mem.total\_on\_heap\_storage**(count)                  | Total available on heap memory for storage*Shown as byte*                                     |
| **spark.executor.mem.used\_off\_heap\_storage**(count)                  | Used off heap memory currently for storage*Shown as byte*                                     |
| **spark.executor.mem.used\_on\_heap\_storage**(count)                   | Used on heap memory currently for storage*Shown as byte*                                      |
| **spark.executor.memory\_used**(count)                                  | Amount of memory used for cached RDDs in the application's executors*Shown as byte*           |
| **spark.executor.peak\_mem.direct\_pool**(count)                        | Peak memory that the JVM is using for direct buffer pool*Shown as byte*                       |
| **spark.executor.peak\_mem.jvm\_heap\_memory**(count)                   | Peak memory usage of the heap that is used for object allocation*Shown as byte*               |
| **spark.executor.peak\_mem.jvm\_off\_heap\_memory**(count)              | Peak memory usage of non-heap memory that is used by the Java virtual machine*Shown as byte*  |
| **spark.executor.peak\_mem.major\_gc\_count**(count)                    | Total major GC count*Shown as byte*                                                           |
| **spark.executor.peak\_mem.major\_gc\_time**(count)                     | Elapsed total major GC time*Shown as millisecond*                                             |
| **spark.executor.peak\_mem.mapped\_pool**(count)                        | Peak memory that the JVM is using for mapped buffer pool*Shown as byte*                       |
| **spark.executor.peak\_mem.minor\_gc\_count**(count)                    | Total minor GC count*Shown as byte*                                                           |
| **spark.executor.peak\_mem.minor\_gc\_time**(count)                     | Elapsed total minor GC time*Shown as millisecond*                                             |
| **spark.executor.peak\_mem.off\_heap\_execution**(count)                | Peak off heap execution memory in use*Shown as byte*                                          |
| **spark.executor.peak\_mem.off\_heap\_storage**(count)                  | Peak off heap storage memory in use*Shown as byte*                                            |
| **spark.executor.peak\_mem.off\_heap\_unified**(count)                  | Peak off heap memory (execution and storage)*Shown as byte*                                   |
| **spark.executor.peak\_mem.on\_heap\_execution**(count)                 | Peak on heap execution memory in use*Shown as byte*                                           |
| **spark.executor.peak\_mem.on\_heap\_storage**(count)                   | Peak on heap storage memory in use*Shown as byte*                                             |
| **spark.executor.peak\_mem.on\_heap\_unified**(count)                   | Peak on heap memory (execution and storage)*Shown as byte*                                    |
| **spark.executor.peak\_mem.process\_tree\_jvm**(count)                  | Virtual memory size*Shown as byte*                                                            |
| **spark.executor.peak\_mem.process\_tree\_jvm\_rss**(count)             | Resident Set Size: number of pages the process has in real memory*Shown as byte*              |
| **spark.executor.peak\_mem.process\_tree\_other**(count)                | Virtual memory size for other kind of process*Shown as byte*                                  |
| **spark.executor.peak\_mem.process\_tree\_other\_rss**(count)           | Resident Set Size for other kind of process*Shown as byte*                                    |
| **spark.executor.peak\_mem.process\_tree\_python**(count)               | Virtual memory size for Python*Shown as byte*                                                 |
| **spark.executor.peak\_mem.process\_tree\_python\_rss**(count)          | Resident Set Size for Python*Shown as byte*                                                   |
| **spark.executor.rdd\_blocks**(count)                                   | Number of persisted RDD blocks in the application's executors*Shown as block*                 |
| **spark.executor.total\_duration**(count)                               | Time spent by the application's executors executing tasks*Shown as millisecond*               |
| **spark.executor.total\_input\_bytes**(count)                           | Total number of input bytes in the application's executors*Shown as byte*                     |
| **spark.executor.total\_shuffle\_read**(count)                          | Total number of bytes read during a shuffle in the application's executors*Shown as byte*     |
| **spark.executor.total\_shuffle\_write**(count)                         | Total number of shuffled bytes in the application's executors*Shown as byte*                  |
| **spark.executor.total\_tasks**(count)                                  | Total number of tasks in the application's executors*Shown as task*                           |
| **spark.executor\_memory**(count)                                       | Maximum memory available for caching RDD blocks in the application's executors*Shown as byte* |
| **spark.job.count**(count)                                              | Number of jobs*Shown as task*                                                                 |
| **spark.job.num\_active\_stages**(count)                                | Number of active stages in the application*Shown as stage*                                    |
| **spark.job.num\_active\_tasks**(count)                                 | Number of active tasks in the application*Shown as task*                                      |
| **spark.job.num\_completed\_stages**(count)                             | Number of completed stages in the application*Shown as stage*                                 |
| **spark.job.num\_completed\_tasks**(count)                              | Number of completed tasks in the application*Shown as task*                                   |
| **spark.job.num\_failed\_stages**(count)                                | Number of failed stages in the application*Shown as stage*                                    |
| **spark.job.num\_failed\_tasks**(count)                                 | Number of failed tasks in the application*Shown as task*                                      |
| **spark.job.num\_skipped\_stages**(count)                               | Number of skipped stages in the application*Shown as stage*                                   |
| **spark.job.num\_skipped\_tasks**(count)                                | Number of skipped tasks in the application*Shown as task*                                     |
| **spark.job.num\_tasks**(count)                                         | Number of tasks in the application*Shown as task*                                             |
| **spark.rdd.count**(count)                                              | Number of RDDs                                                                                |
| **spark.rdd.disk\_used**(count)                                         | Amount of disk space used by persisted RDDs in the application*Shown as byte*                 |
| **spark.rdd.memory\_used**(count)                                       | Amount of memory used in the application's persisted RDDs*Shown as byte*                      |
| **spark.rdd.num\_cached\_partitions**(count)                            | Number of in-memory cached RDD partitions in the application                                  |
| **spark.rdd.num\_partitions**(count)                                    | Number of persisted RDD partitions in the application                                         |
| **spark.stage.count**(count)                                            | Number of stages*Shown as task*                                                               |
| **spark.stage.disk\_bytes\_spilled**(count)                             | Max size on disk of the spilled bytes in the application's stages*Shown as byte*              |
| **spark.stage.executor\_run\_time**(count)                              | Time spent by the executor in the application's stages*Shown as millisecond*                  |
| **spark.stage.input\_bytes**(count)                                     | Input bytes in the application's stages*Shown as byte*                                        |
| **spark.stage.input\_records**(count)                                   | Input records in the application's stages*Shown as record*                                    |
| **spark.stage.memory\_bytes\_spilled**(count)                           | Number of bytes spilled to disk in the application's stages*Shown as byte*                    |
| **spark.stage.num\_active\_tasks**(count)                               | Number of active tasks in the application's stages*Shown as task*                             |
| **spark.stage.num\_complete\_tasks**(count)                             | Number of complete tasks in the application's stages*Shown as task*                           |
| **spark.stage.num\_failed\_tasks**(count)                               | Number of failed tasks in the application's stages*Shown as task*                             |
| **spark.stage.output\_bytes**(count)                                    | Output bytes in the application's stages*Shown as byte*                                       |
| **spark.stage.output\_records**(count)                                  | Output records in the application's stages*Shown as record*                                   |
| **spark.stage.shuffle\_read\_bytes**(count)                             | Number of bytes read during a shuffle in the application's stages*Shown as byte*              |
| **spark.stage.shuffle\_read\_records**(count)                           | Number of records read during a shuffle in the application's stages*Shown as record*          |
| **spark.stage.shuffle\_write\_bytes**(count)                            | Number of shuffled bytes in the application's stages*Shown as byte*                           |
| **spark.stage.shuffle\_write\_records**(count)                          | Number of shuffled records in the application's stages*Shown as record*                       |
| **spark.streaming.statistics.avg\_input\_rate**(gauge)                  | Average streaming input data rate*Shown as byte*                                              |
| **spark.streaming.statistics.avg\_processing\_time**(gauge)             | Average application's streaming batch processing time*Shown as millisecond*                   |
| **spark.streaming.statistics.avg\_scheduling\_delay**(gauge)            | Average application's streaming batch scheduling delay*Shown as millisecond*                  |
| **spark.streaming.statistics.avg\_total\_delay**(gauge)                 | Average application's streaming batch total delay*Shown as millisecond*                       |
| **spark.streaming.statistics.batch\_duration**(gauge)                   | Application's streaming batch duration*Shown as millisecond*                                  |
| **spark.streaming.statistics.num\_active\_batches**(gauge)              | Number of active streaming batches*Shown as job*                                              |
| **spark.streaming.statistics.num\_active\_receivers**(gauge)            | Number of active streaming receivers*Shown as object*                                         |
| **spark.streaming.statistics.num\_inactive\_receivers**(gauge)          | Number of inactive streaming receivers*Shown as object*                                       |
| **spark.streaming.statistics.num\_processed\_records**(count)           | Number of processed streaming records*Shown as record*                                        |
| **spark.streaming.statistics.num\_received\_records**(count)            | Number of received streaming records*Shown as record*                                         |
| **spark.streaming.statistics.num\_receivers**(gauge)                    | Number of streaming application's receivers*Shown as object*                                  |
| **spark.streaming.statistics.num\_retained\_completed\_batches**(count) | Number of retained completed application's streaming batches*Shown as job*                    |
| **spark.streaming.statistics.num\_total\_completed\_batches**(count)    | Total number of completed application's streaming batches*Shown as job*                       |
| **spark.structured\_streaming.input\_rate**(gauge)                      | Average streaming input data rate*Shown as record*                                            |
| **spark.structured\_streaming.latency**(gauge)                          | Average latency for the structured streaming application.*Shown as millisecond*               |
| **spark.structured\_streaming.processing\_rate**(gauge)                 | Number of received streaming records per second*Shown as row*                                 |
| **spark.structured\_streaming.rows\_count**(gauge)                      | Count of rows.*Shown as row*                                                                  |
| **spark.structured\_streaming.used\_bytes**(gauge)                      | Number of bytes used in memory.*Shown as byte*                                                |

### Events{% #events %}

The Spark check does not include any events.

### Service Checks{% #service-checks %}

**spark.resource\_manager.can\_connect**

Returns `CRITICAL` if the Agent is unable to connect to the Spark instance's ResourceManager. Returns `OK` otherwise.

*Statuses: ok, critical*

**spark.application\_master.can\_connect**

Returns `CRITICAL` if the Agent is unable to connect to the Spark instance's ApplicationMaster. Returns `OK` otherwise.

*Statuses: ok, critical*

**spark.standalone\_master.can\_connect**

Returns `CRITICAL` if the Agent is unable to connect to the Spark instance's Standalone Master. Returns `OK` otherwise.

*Statuses: ok, critical*

**spark.mesos\_master.can\_connect**

Returns `CRITICAL` if the Agent is unable to connect to the Spark instance's Mesos Master. Returns `OK` otherwise.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

### Spark on Amazon EMR{% #spark-on-amazon-emr %}

To receive metrics for Spark on Amazon EMR, [use bootstrap actions](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html) to install the [Datadog Agent](https://docs.datadoghq.com/agent/):

For Agent v5, create the `/etc/dd-agent/conf.d/spark.yaml` configuration file with the [proper values on each EMR node](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-ssh.html).

For Agent v6/7, create the `/etc/datadog-agent/conf.d/spark.d/conf.yaml` configuration file with the [proper values on each EMR node](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-ssh.html).
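As a sketch, a bootstrap action for Agent v6/7 might stage that file like this. The paths and values are illustrative: `CONF_DIR` defaults to a local directory here so the script can be dry-run, but on an EMR node it would be `/etc/datadog-agent/conf.d` (written as root).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical bootstrap sketch: writes a minimal Spark check config.
# On a real EMR node, set CONF_DIR=/etc/datadog-agent/conf.d.
CONF_DIR="${CONF_DIR:-./conf.d}"
mkdir -p "${CONF_DIR}/spark.d"

cat > "${CONF_DIR}/spark.d/conf.yaml" <<'EOF'
init_config:

instances:
  - spark_url: http://localhost:8088   # YARN ResourceManager on EMR
    spark_cluster_mode: spark_yarn_mode
    cluster_name: <CLUSTER_NAME>
EOF

echo "Wrote ${CONF_DIR}/spark.d/conf.yaml"
```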

### Successful check but no metrics are collected{% #successful-check-but-no-metrics-are-collected %}

The Spark integration only collects metrics from running applications. If no applications are currently running, the check submits only its service checks and no metrics.
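You can check for running applications directly against the same REST API the check scrapes (`/api/v1/applications` on the host configured as `spark_url`). A sketch of how such a response could be filtered for incomplete applications; the hardcoded sample payload stands in for a live response fetched with, say, `urllib.request`:

```python
import json

# Sample of the JSON shape returned by Spark's /api/v1/applications
# endpoint; values are illustrative.
sample_payload = json.dumps([
    {
        "id": "app-20240517103201-0000",
        "name": "my-spark-job",
        "attempts": [{"completed": False}],
    }
])

def running_app_ids(payload: str) -> list[str]:
    """Return IDs of applications with at least one incomplete attempt."""
    apps = json.loads(payload)
    return [
        app["id"]
        for app in apps
        if any(not attempt.get("completed", True) for attempt in app["attempts"])
    ]

# A non-empty list means there are running apps for the check to report on.
print(running_app_ids(sample_payload))
```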

## Further Reading{% #further-reading %}

- [Troubleshoot and optimize data processing workloads with Data Jobs Monitoring](https://www.datadoghq.com/blog/data-jobs-monitoring/)
- [Observing the data lifecycle with Datadog](https://www.datadoghq.com/blog/data-observability-monitoring/)
- [Hadoop & Spark monitoring with Datadog](https://www.datadoghq.com/blog/monitoring-spark)
- [Monitoring Apache Spark applications running on Amazon EMR](https://www.datadoghq.com/blog/spark-emr-monitoring/)
