---
title: HDFS Namenode
description: Track cluster disk usage, volume failures, dead DataNodes, and more.
breadcrumbs: Docs > Integrations > HDFS Namenode
---

# HDFS Namenode
Integration version: 7.4.0


## Overview{% #overview %}

Monitor your primary *and* standby HDFS NameNodes to know when your cluster falls into a precarious state: when you're down to one NameNode remaining, or when it's time to add more capacity to the cluster. This Agent check collects metrics for remaining capacity, corrupt/missing blocks, dead DataNodes, filesystem load, under-replicated blocks, total volume failures (across all DataNodes), and many more.

Use this check (`hdfs_namenode`) together with its counterpart (`hdfs_datanode`) rather than the older two-in-one `hdfs` check, which is deprecated.

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on applying these instructions.

### Installation{% #installation %}

The HDFS NameNode check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package, so you don't need to install anything else on your NameNodes.

### Configuration{% #configuration %}

#### Connect the Agent{% #connect-the-agent %}

{% tab title="Host" %}
#### Host{% #host %}

To configure this check for an Agent running on a host:

1. Edit the `hdfs_namenode.d/conf.yaml` file, in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/guide/agent-configuration-files.md#agent-configuration-directory). See the [sample hdfs_namenode.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/hdfs_namenode/datadog_checks/hdfs_namenode/data/conf.yaml.example) for all available configuration options:

   ```yaml
   init_config:
   
   instances:
     ## @param hdfs_namenode_jmx_uri - string - required
     ## The HDFS NameNode check retrieves metrics from the HDFS NameNode's JMX
     ## interface via HTTP(S) (not a JMX remote connection). This check must be
     ## installed on an HDFS NameNode. The HDFS NameNode JMX URI is composed of
     ## the NameNode's hostname and port.
     ##
     ## The hostname and port can be found in the hdfs-site.xml conf file under
     ## the property dfs.namenode.http-address
     ## https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
     #
     - hdfs_namenode_jmx_uri: http://localhost:9870
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
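As a quick sanity check before restarting, you can confirm the JMX HTTP endpoint is reachable and returns the expected JSON. The sketch below parses an illustrative excerpt of the `/jmx` servlet response (the bean name follows Hadoop's `FSNamesystemState` MBean; the values are made up, not captured from a live cluster):

```python
import json

# Illustrative excerpt of the JSON served at
# http://localhost:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState
# Structure follows Hadoop's JMX JSON servlet; values are invented for the example.
sample_payload = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystemState",
      "CapacityTotal": 1099511627776,
      "CapacityRemaining": 549755813888,
      "NumLiveDataNodes": 3,
      "NumDeadDataNodes": 0
    }
  ]
}
"""

def remaining_capacity(payload: str) -> int:
    """Return CapacityRemaining (bytes) from the FSNamesystemState bean."""
    beans = json.loads(payload)["beans"]
    state = next(b for b in beans if b["name"].endswith("FSNamesystemState"))
    return state["CapacityRemaining"]

print(remaining_capacity(sample_payload))  # 549755813888
```

If a `curl` of your configured `hdfs_namenode_jmx_uri` plus `/jmx` returns beans in this shape, the Agent check should be able to collect from it.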

{% /tab %}

{% tab title="Containerized" %}
#### Containerized{% #containerized %}

For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on applying the parameters below.

| Parameter            | Value                                                |
| -------------------- | ---------------------------------------------------- |
| `<INTEGRATION_NAME>` | `hdfs_namenode`                                      |
| `<INIT_CONFIG>`      | blank or `{}`                                        |
| `<INSTANCE_CONFIG>`  | `{"hdfs_namenode_jmx_uri": "https://%%host%%:9870"}` |
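
On Kubernetes, these parameters typically map onto Autodiscovery pod annotations. A minimal sketch (the container name `namenode` is an assumption for illustration; adjust it to your pod spec):

```yaml
# Hypothetical pod annotations; "namenode" must match the container's name.
ad.datadoghq.com/namenode.check_names: '["hdfs_namenode"]'
ad.datadoghq.com/namenode.init_configs: '[{}]'
ad.datadoghq.com/namenode.instances: '[{"hdfs_namenode_jmx_uri": "https://%%host%%:9870"}]'
```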

#### Log collection{% #log-collection %}

**Available for Agent >6.0**

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in the `datadog.yaml` file with:

   ```yaml
     logs_enabled: true
   ```

1. Add this configuration block to your `hdfs_namenode.d/conf.yaml` file to start collecting your NameNode logs:

   ```yaml
     logs:
       - type: file
         path: /var/log/hadoop-hdfs/*.log
         source: hdfs_namenode
         service: <SERVICE_NAME>
   ```

   Change the `path` and `service` parameter values to match your environment.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).

{% /tab %}

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `hdfs_namenode` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| ------ | ----------- |
| **hdfs.namenode.blocks\_total** (gauge) | Total number of blocks *Shown as block* |
| **hdfs.namenode.capacity\_remaining** (gauge) | Remaining disk space left in bytes *Shown as byte* |
| **hdfs.namenode.capacity\_total** (gauge) | Total disk capacity in bytes *Shown as byte* |
| **hdfs.namenode.capacity\_used** (gauge) | Disk usage in bytes *Shown as byte* |
| **hdfs.namenode.corrupt\_blocks** (gauge) | Number of corrupt blocks *Shown as block* |
| **hdfs.namenode.estimated\_capacity\_lost\_total** (gauge) | Estimated capacity lost in bytes *Shown as byte* |
| **hdfs.namenode.files\_total** (gauge) | Total number of files *Shown as file* |
| **hdfs.namenode.fs\_lock\_queue\_length** (gauge) | Lock queue length |
| **hdfs.namenode.max\_objects** (gauge) | Maximum number of files HDFS supports *Shown as object* |
| **hdfs.namenode.missing\_blocks** (gauge) | Number of missing blocks *Shown as block* |
| **hdfs.namenode.num\_dead\_data\_nodes** (gauge) | Total number of dead data nodes *Shown as node* |
| **hdfs.namenode.num\_decom\_dead\_data\_nodes** (gauge) | Number of decommissioning dead data nodes *Shown as node* |
| **hdfs.namenode.num\_decom\_live\_data\_nodes** (gauge) | Number of decommissioning live data nodes *Shown as node* |
| **hdfs.namenode.num\_decommissioning\_data\_nodes** (gauge) | Number of decommissioning data nodes *Shown as node* |
| **hdfs.namenode.num\_live\_data\_nodes** (gauge) | Total number of live data nodes *Shown as node* |
| **hdfs.namenode.num\_stale\_data\_nodes** (gauge) | Number of stale data nodes *Shown as node* |
| **hdfs.namenode.num\_stale\_storages** (gauge) | Number of stale storages |
| **hdfs.namenode.pending\_deletion\_blocks** (gauge) | Number of pending deletion blocks *Shown as block* |
| **hdfs.namenode.pending\_replication\_blocks** (gauge) | Number of blocks pending replication *Shown as block* |
| **hdfs.namenode.scheduled\_replication\_blocks** (gauge) | Number of blocks scheduled for replication *Shown as block* |
| **hdfs.namenode.total\_load** (gauge) | Total load on the file system |
| **hdfs.namenode.under\_replicated\_blocks** (gauge) | Number of under replicated blocks *Shown as block* |
| **hdfs.namenode.volume\_failures\_total** (gauge) | Total volume failures |
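
For example, a cluster disk-usage percentage can be derived from the `capacity_used` and `capacity_total` gauges. A minimal sketch (the sample values are illustrative, not from a real cluster):

```python
def capacity_used_pct(capacity_used: float, capacity_total: float) -> float:
    """Percentage of total cluster disk capacity in use, from the two gauges above."""
    if capacity_total == 0:
        raise ValueError("capacity_total must be non-zero")
    return 100.0 * capacity_used / capacity_total

# Illustrative values: 412 GiB used of a 1 TiB cluster.
print(round(capacity_used_pct(412 * 2**30, 1024 * 2**30), 1))  # 40.2
```

The same ratio is what you would graph or alert on in a monitor to know when it's time to add capacity, as described in the Overview.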

### Events{% #events %}

The HDFS NameNode check does not include any events.

### Service Checks{% #service-checks %}

**hdfs.namenode.jmx.can\_connect**

Returns `CRITICAL` if the Agent cannot connect to the NameNode's JMX interface for any reason. Returns `OK` otherwise.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Hadoop & HDFS Architecture: An Overview](https://www.datadoghq.com/blog/hadoop-architecture-overview)
- [How to monitor Hadoop metrics](https://www.datadoghq.com/blog/monitor-hadoop-metrics)
- [How to collect Hadoop metrics](https://www.datadoghq.com/blog/collecting-hadoop-metrics)
- [How to monitor Hadoop with Datadog](https://www.datadoghq.com/blog/monitor-hadoop-metrics-datadog)
