---
title: Yarn
description: Collect cluster-wide health metrics and track application progress.
---

# Yarn
**Integration version:** 8.3.0


## Overview{% #overview %}

This check collects metrics from your YARN ResourceManager, including (but not limited to):

- Cluster-wide metrics, such as number of running apps, running containers, unhealthy nodes, and more.
- Per-application metrics, such as app progress, elapsed running time, running containers, memory use, and more.
- Node metrics, such as available vCores, time of last health update, and more.

### Deprecation notice{% #deprecation-notice %}

`yarn.apps.<METRIC>` metrics are deprecated in favor of `yarn.apps.<METRIC>_gauge` metrics because `yarn.apps` metrics are incorrectly reported as a `RATE` instead of a `GAUGE`.

**Minimum Agent version:** 6.0.0
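
The renaming described in the deprecation notice can be applied mechanically to existing metric queries. The following is a minimal sketch, not an official migration tool: the helper name `migrate_query` is hypothetical, the list of suffixes is copied from the metrics table on this page, and the example query strings are illustrative only.

```python
import re

# Deprecated yarn.apps metric suffixes, copied from the metrics table on this page.
DEPRECATED_APP_METRICS = (
    "allocated_mb", "allocated_vcores", "elapsed_time", "finished_time",
    "memory_seconds", "progress", "running_containers", "started_time",
    "vcore_seconds",
)

# Match a deprecated name only when it is not followed by more name characters,
# so names that already end in "_gauge" are left untouched.
_PATTERN = re.compile(
    r"\byarn\.apps\.(" + "|".join(DEPRECATED_APP_METRICS) + r")(?![A-Za-z0-9_])"
)


def migrate_query(query: str) -> str:
    """Rewrite yarn.apps.<METRIC> to yarn.apps.<METRIC>_gauge (hypothetical helper)."""
    return _PATTERN.sub(lambda m: m.group(0) + "_gauge", query)
```

For example, `migrate_query("avg:yarn.apps.progress{*}")` yields a query on `yarn.apps.progress_gauge`, while queries that already use the `_gauge` names pass through unchanged.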

## Setup{% #setup %}

### Installation{% #installation %}

The YARN check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package, so you don't need to install anything else on your YARN ResourceManager.

### Configuration{% #configuration %}

{% tab title="Host" %}
#### Host{% #host %}

To configure this check for an Agent running on a host:

1. Edit the `yarn.d/conf.yaml` file in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/guide/agent-configuration-files/#agent-configuration-directory).

   ```yaml
   init_config:
   
   instances:
     ## @param resourcemanager_uri - string - required
     ## The YARN check retrieves metrics from YARN's ResourceManager. This
     ## check must be run from the Master Node and the ResourceManager URI must
     ## be specified below. The ResourceManager URI is composed of the
     ## ResourceManager's hostname and port.
     ## The ResourceManager hostname can be found in the yarn-site.xml conf file
     ## under the property yarn.resourcemanager.address
     ##
     ## The ResourceManager port can be found in the yarn-site.xml conf file under
     ## the property yarn.resourcemanager.webapp.address
     #
     - resourcemanager_uri: http://localhost:8088
   
       ## @param cluster_name - string - required - default: default_cluster
       ## A friendly name for the cluster.
       #
       cluster_name: default_cluster
   ```

   See the [example check configuration](https://github.com/DataDog/integrations-core/blob/master/yarn/datadog_checks/yarn/data/conf.yaml.example) for a comprehensive list and description of all check options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent) to start sending YARN metrics to Datadog.
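
Before restarting, it can help to confirm that the value of `resourcemanager_uri` is reachable from the host. The sketch below assumes the ResourceManager exposes its standard REST API under `ws/v1/cluster/metrics` on the configured URI; the function names are hypothetical and this is not the check's own implementation.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def cluster_metrics_url(resourcemanager_uri: str) -> str:
    """Build the cluster metrics URL from the configured resourcemanager_uri."""
    # A trailing slash makes urljoin append the path instead of replacing a segment.
    return urljoin(resourcemanager_uri.rstrip("/") + "/", "ws/v1/cluster/metrics")


def parse_cluster_metrics(body: str) -> dict:
    """Return the clusterMetrics object from a ResourceManager JSON response."""
    return json.loads(body)["clusterMetrics"]


def fetch_cluster_metrics(resourcemanager_uri: str, timeout: float = 5.0) -> dict:
    """Fetch and parse cluster metrics; raises on connection or HTTP errors."""
    with urlopen(cluster_metrics_url(resourcemanager_uri), timeout=timeout) as resp:
        return parse_cluster_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_cluster_metrics("http://localhost:8088")` from the host that runs the Agent should return a dictionary of cluster metrics if the URI is correct. A connection error suggests the hostname or port in the configuration needs to be revisited.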

{% /tab %}

{% tab title="Containerized" %}
#### Containerized{% #containerized %}

For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations/) for guidance on applying the parameters below.

| Parameter            | Value                                                                                   |
| -------------------- | --------------------------------------------------------------------------------------- |
| `<INTEGRATION_NAME>` | `yarn`                                                                                  |
| `<INIT_CONFIG>`      | blank or `{}`                                                                           |
| `<INSTANCE_CONFIG>`  | `{"resourcemanager_uri": "http://%%host%%:%%port%%", "cluster_name": "<CLUSTER_NAME>"}` |
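
As an illustration of how the parameters in the table fit together, the sketch below renders them as Kubernetes pod annotation values. It assumes the `ad.datadoghq.com/<CONTAINER_NAME>.check_names`, `.init_configs`, and `.instances` annotation keys; the function name and the container name used in the usage note are hypothetical. Treat the linked Autodiscovery page as the authoritative reference for the annotation format your Agent version expects.

```python
import json


def yarn_autodiscovery_annotations(container_name: str, cluster_name: str) -> dict:
    """Build annotation key/value pairs for the YARN check (illustrative only)."""
    prefix = f"ad.datadoghq.com/{container_name}"
    # %%host%% and %%port%% are template variables resolved by the Agent at runtime.
    instance = {
        "resourcemanager_uri": "http://%%host%%:%%port%%",
        "cluster_name": cluster_name,
    }
    return {
        f"{prefix}.check_names": json.dumps(["yarn"]),   # <INTEGRATION_NAME>
        f"{prefix}.init_configs": json.dumps([{}]),      # <INIT_CONFIG>
        f"{prefix}.instances": json.dumps([instance]),   # <INSTANCE_CONFIG>
    }
```

For example, `yarn_autodiscovery_annotations("resourcemanager", "my_cluster")` returns three annotations whose values are JSON strings, ready to paste into a pod template's `metadata.annotations`.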

##### Log collection{% #log-collection %}

1. Log collection is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `yarn.d/conf.yaml` file. Change the `type`, `path`, and `service` parameter values based on your environment. See the [sample yarn.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/yarn/datadog_checks/yarn/data/conf.yaml.example) for all available configuration options.

   ```yaml
   logs:
     - type: file
       path: <LOG_FILE_PATH>
       source: yarn
       service: <SERVICE_NAME>
       # To handle multi-line logs that start with yyyy-mm-dd, use the following pattern:
       # log_processing_rules:
       #   - type: multi_line
       #     pattern: \d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}
       #     name: new_log_start_with_date
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent).
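
To see what the commented multi-line pattern does, the sketch below applies the same regular expression to a few sample lines and groups continuation lines (such as stack traces) with the entry that precedes them. It assumes that a line beginning with the timestamp pattern starts a new log entry; the grouping function and the sample log lines are hypothetical and only approximate the Agent's behavior.

```python
import re

# Same pattern as in the commented log_processing_rules block.
NEW_LOG_START = re.compile(r"\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def group_multiline(lines):
    """Group raw lines into entries; a line starting with a timestamp opens a new entry."""
    entries = []
    for line in lines:
        if NEW_LOG_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation line: attach it to the current entry.
            entries[-1] += "\n" + line
    return entries


# Hypothetical sample lines for illustration.
sample = [
    "2024-01-15 10:32:01,123 ERROR Container launch failed",
    "java.lang.RuntimeException: boom",
    "\tat org.example.Foo.bar(Foo.java:42)",
    "2024-01-15 10:32:02,456 INFO Application accepted",
]
entries = group_multiline(sample)
```

With the sample above, the stack trace lines are folded into the first entry and the second timestamped line starts a new one.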

To enable logs for Docker environments, see [Docker Log Collection](https://docs.datadoghq.com/agent/docker/log/).
{% /tab %}

### Validation{% #validation %}

Run the [Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information) and look for `yarn` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **yarn.apps.allocated\_mb** (rate) | Deprecated. Use `yarn.apps.allocated_mb_gauge` instead. *Shown as mebibyte* |
| **yarn.apps.allocated\_mb\_gauge** (gauge) | The total memory (in MB) allocated to the application's running containers. *Shown as mebibyte* |
| **yarn.apps.allocated\_vcores** (rate) | Deprecated. Use `yarn.apps.allocated_vcores_gauge` instead. *Shown as core* |
| **yarn.apps.allocated\_vcores\_gauge** (gauge) | The total number of virtual cores allocated to the application's running containers. *Shown as core* |
| **yarn.apps.elapsed\_time** (rate) | Deprecated. Use `yarn.apps.elapsed_time_gauge` instead. *Shown as second* |
| **yarn.apps.elapsed\_time\_gauge** (gauge) | The elapsed time since the application started (in ms). *Shown as millisecond* |
| **yarn.apps.finished\_time** (rate) | Deprecated. Use `yarn.apps.finished_time_gauge` instead. *Shown as second* |
| **yarn.apps.finished\_time\_gauge** (gauge) | The time at which the application finished (in ms since epoch). *Shown as millisecond* |
| **yarn.apps.memory\_seconds** (rate) | Deprecated. Use `yarn.apps.memory_seconds_gauge` instead. *Shown as second* |
| **yarn.apps.memory\_seconds\_gauge** (gauge) | The amount of memory the application has allocated (megabyte-seconds). *Shown as mebibyte* |
| **yarn.apps.progress** (rate) | Deprecated. Use `yarn.apps.progress_gauge` instead. *Shown as percent* |
| **yarn.apps.progress\_gauge** (gauge) | The progress of the application, displayed as 0, 10, or 100, which represent the three states: not started, in progress, and completed. *Shown as percent* |
| **yarn.apps.running\_containers** (rate) | Deprecated. Use `yarn.apps.running_containers_gauge` instead. |
| **yarn.apps.running\_containers\_gauge** (gauge) | The number of containers currently running for the application. *Shown as container* |
| **yarn.apps.started\_time** (rate) | Deprecated. Use `yarn.apps.started_time_gauge` instead. *Shown as second* |
| **yarn.apps.started\_time\_gauge** (gauge) | The time at which the application started (in ms since epoch). *Shown as millisecond* |
| **yarn.apps.vcore\_seconds** (rate) | Deprecated. Use `yarn.apps.vcore_seconds_gauge` instead. *Shown as second* |
| **yarn.apps.vcore\_seconds\_gauge** (gauge) | The amount of CPU resources the application has allocated (virtual core-seconds). *Shown as core* |
| **yarn.metrics.active\_nodes** (gauge) | The number of active nodes. *Shown as node* |
| **yarn.metrics.allocated\_mb** (gauge) | The amount of allocated memory. *Shown as mebibyte* |
| **yarn.metrics.allocated\_virtual\_cores** (gauge) | The number of allocated virtual cores. *Shown as core* |
| **yarn.metrics.apps\_completed** (gauge) | The number of completed apps. *Shown as task* |
| **yarn.metrics.apps\_failed** (gauge) | The number of failed apps. *Shown as task* |
| **yarn.metrics.apps\_killed** (gauge) | The number of killed apps. *Shown as task* |
| **yarn.metrics.apps\_pending** (gauge) | The number of pending apps. *Shown as task* |
| **yarn.metrics.apps\_running** (gauge) | The number of running apps. *Shown as task* |
| **yarn.metrics.apps\_submitted** (gauge) | The number of submitted apps. *Shown as task* |
| **yarn.metrics.available\_mb** (gauge) | The amount of available memory. *Shown as mebibyte* |
| **yarn.metrics.available\_virtual\_cores** (gauge) | The number of available virtual cores. *Shown as core* |
| **yarn.metrics.containers\_allocated** (gauge) | The number of containers allocated. |
| **yarn.metrics.containers\_pending** (gauge) | The number of containers pending. |
| **yarn.metrics.containers\_reserved** (gauge) | The number of containers reserved. |
| **yarn.metrics.decommissioned\_nodes** (gauge) | The number of decommissioned nodes. *Shown as node* |
| **yarn.metrics.decommissioning\_nodes** (gauge) | The number of decommissioning nodes. *Shown as node* |
| **yarn.metrics.lost\_nodes** (gauge) | The number of lost nodes. *Shown as node* |
| **yarn.metrics.rebooted\_nodes** (gauge) | The number of rebooted nodes. *Shown as node* |
| **yarn.metrics.reserved\_mb** (gauge) | The size of reserved memory. *Shown as mebibyte* |
| **yarn.metrics.reserved\_virtual\_cores** (gauge) | The number of reserved virtual cores. *Shown as core* |
| **yarn.metrics.total\_mb** (gauge) | The amount of total memory. *Shown as mebibyte* |
| **yarn.metrics.total\_nodes** (gauge) | The total number of nodes. *Shown as node* |
| **yarn.metrics.total\_virtual\_cores** (gauge) | The total number of virtual cores. *Shown as core* |
| **yarn.metrics.unhealthy\_nodes** (gauge) | The number of unhealthy nodes. *Shown as node* |
| **yarn.node.avail\_memory\_mb** (gauge) | The total amount of memory currently available on the node (in MB). *Shown as mebibyte* |
| **yarn.node.available\_virtual\_cores** (gauge) | The total number of vCores available on the node. *Shown as core* |
| **yarn.node.last\_health\_update** (gauge) | The last time the node reported its health (in ms since epoch). *Shown as millisecond* |
| **yarn.node.num\_containers** (gauge) | The total number of containers currently running on the node. |
| **yarn.node.used\_memory\_mb** (gauge) | The total amount of memory currently used on the node (in MB). *Shown as mebibyte* |
| **yarn.node.used\_virtual\_cores** (gauge) | The total number of vCores currently used on the node. *Shown as core* |
| **yarn.queue.absolute\_capacity** (gauge) | The absolute capacity percentage this queue can use of the entire cluster. *Shown as percent* |
| **yarn.queue.absolute\_max\_capacity** (gauge) | The absolute maximum capacity percentage this queue can use of the entire cluster. *Shown as percent* |
| **yarn.queue.absolute\_used\_capacity** (gauge) | The absolute used capacity percentage this queue is using of the entire cluster. *Shown as percent* |
| **yarn.queue.am\_resource\_limit.memory** (gauge) | The maximum memory resources this queue can use for Application Masters (in MB). *Shown as mebibyte* |
| **yarn.queue.am\_resource\_limit.vcores** (gauge) | The maximum vCores this queue can use for Application Masters. *Shown as core* |
| **yarn.queue.capacity** (gauge) | The configured queue capacity in percentage relative to its parent queue. *Shown as percent* |
| **yarn.queue.max\_active\_applications** (gauge) | The maximum number of active applications this queue can have. *Shown as task* |
| **yarn.queue.max\_active\_applications\_per\_user** (gauge) | The maximum number of active applications per user this queue can have. *Shown as task* |
| **yarn.queue.max\_applications** (gauge) | The maximum number of applications this queue can have. *Shown as task* |
| **yarn.queue.max\_applications\_per\_user** (gauge) | The maximum number of applications per user this queue can have. *Shown as task* |
| **yarn.queue.max\_capacity** (gauge) | The configured maximum queue capacity in percentage relative to its parent queue. *Shown as percent* |
| **yarn.queue.num\_active\_applications** (gauge) | The number of active applications in this queue. *Shown as task* |
| **yarn.queue.num\_applications** (gauge) | The number of applications currently in the queue. *Shown as task* |
| **yarn.queue.num\_containers** (gauge) | The number of containers being used. |
| **yarn.queue.num\_pending\_applications** (gauge) | The number of pending applications in this queue. *Shown as task* |
| **yarn.queue.resources\_used.memory** (gauge) | The total memory resources this queue is using (in MB). *Shown as mebibyte* |
| **yarn.queue.resources\_used.vcores** (gauge) | The total vCores this queue is using. *Shown as core* |
| **yarn.queue.root.capacity** (gauge) | The configured queue capacity in percentage for root queue. *Shown as percent* |
| **yarn.queue.root.max\_capacity** (gauge) | The configured maximum queue capacity in percentage for root queue. *Shown as percent* |
| **yarn.queue.root.used\_capacity** (gauge) | The used queue capacity in percentage for root queue. *Shown as percent* |
| **yarn.queue.used\_am\_resource.memory** (gauge) | The memory resources used for Application Masters (in MB). *Shown as mebibyte* |
| **yarn.queue.used\_am\_resource.vcores** (gauge) | The vCores used for Application Masters. *Shown as core* |
| **yarn.queue.used\_capacity** (gauge) | The used queue capacity in percentage. *Shown as percent* |
| **yarn.queue.user\_am\_resource\_limit.memory** (gauge) | The maximum memory resources a user can use for Application Masters (in MB). *Shown as mebibyte* |
| **yarn.queue.user\_am\_resource\_limit.vcores** (gauge) | The maximum vCores a user can use for Application Masters. *Shown as core* |
| **yarn.queue.user\_limit** (gauge) | The minimum user limit percent set in the configuration. |
| **yarn.queue.user\_limit\_factor** (gauge) | The user limit factor set in the configuration. |

### Events{% #events %}

The YARN check does not include any events.

### Service Checks{% #service-checks %}

**yarn.can\_connect**

Returns `CRITICAL` if the Agent cannot connect to the ResourceManager URI to collect metrics, otherwise `OK`.

*Statuses: ok, critical*

**yarn.application.status**

By default, returns `OK` if the YARN application state is `NEW`, `NEW_SAVING`, `SUBMITTED`, `ACCEPTED`, `RUNNING`, or `FINISHED`; `UNKNOWN` if the application state is `ALL`; and `CRITICAL` if the application state is `FAILED` or `KILLED`.

*Statuses: ok, unknown, critical*
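
The default mapping described above can be written out as a lookup table. This is a minimal sketch for reference, not the check's implementation: the function name is hypothetical, and the fallback to `unknown` for states not listed above is an assumption rather than documented behavior.

```python
# Default application state -> service check status, as described in the text above.
DEFAULT_STATUS_BY_STATE = {
    "ALL": "unknown",
    "NEW": "ok",
    "NEW_SAVING": "ok",
    "SUBMITTED": "ok",
    "ACCEPTED": "ok",
    "RUNNING": "ok",
    "FINISHED": "ok",
    "FAILED": "critical",
    "KILLED": "critical",
}


def default_application_status(state: str) -> str:
    """Return the default yarn.application.status for a YARN application state."""
    # Assumption: any state not listed in the documentation maps to "unknown".
    return DEFAULT_STATUS_BY_STATE.get(state.upper(), "unknown")
```

A table like this is handy when designing monitors: for instance, an application that ends in `KILLED` is reported as `critical` by default, which may or may not be the alerting behavior you want for intentionally cancelled jobs.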

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Hadoop & HDFS Architecture: An Overview](https://www.datadoghq.com/blog/hadoop-architecture-overview)
- [How to monitor Hadoop metrics](https://www.datadoghq.com/blog/monitor-hadoop-metrics)
- [How to collect Hadoop metrics](https://www.datadoghq.com/blog/collecting-hadoop-metrics)
- [How to monitor Hadoop with Datadog](https://www.datadoghq.com/blog/monitor-hadoop-metrics-datadog)
