---
title: Ceph
description: Collect per-pool performance metrics and monitor overall cluster status.
breadcrumbs: Docs > Integrations > Ceph
---

# Ceph
Supported OS · Integration version 4.4.0

## Overview{% #overview %}

Enable the Datadog-Ceph integration to:

- Track disk usage across storage pools
- Receive service checks in case of issues
- Monitor I/O performance metrics

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

### Installation{% #installation %}

The Ceph check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package, so you don't need to install anything else on your Ceph servers.

### Configuration{% #configuration %}

Edit the file `ceph.d/conf.yaml` in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/guide/agent-configuration-files/#agent-configuration-directory). See the [sample ceph.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/ceph/datadog_checks/ceph/data/conf.yaml.example) for all available configuration options:

```yaml
init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true # only if the ceph binary needs sudo on your nodes
```

If you enabled `use_sudo`, add a line like the following to your `sudoers` file:

```text
dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
```
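
To illustrate how `ceph_cmd` and `use_sudo` interact, here is a minimal sketch of how a check might build the command line it shells out to. The function name and the exact subcommand are illustrative, not the check's actual implementation; the point is that the sudoers rule above must whitelist the same binary path you set in `ceph_cmd`:

```python
def build_ceph_command(ceph_cmd="/usr/bin/ceph", use_sudo=False):
    """Assemble the command a check could shell out to (hypothetical helper).

    The real check's internals may differ, but whatever path ends up as
    cmd[0] (or cmd[1] with sudo) is the one the sudoers rule must match.
    """
    cmd = [ceph_cmd, "status", "--format", "json"]
    if use_sudo:
        # Prepending sudo only works non-interactively if the dd-agent
        # sudoers entry covers this exact path with NOPASSWD.
        cmd = ["sudo"] + cmd
    return cmd

print(build_ceph_command("/usr/bin/ceph", use_sudo=True))
```

If you point `ceph_cmd` at a wrapper script instead of the real binary, the sudoers rule must name the wrapper's path, since that is what `sudo` is asked to run.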

#### Log collection{% #log-collection %}

*Available for Agent versions >6.0*

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Next, edit `ceph.d/conf.yaml` by uncommenting the `logs` lines at the bottom. Update the logs `path` with the correct path to your Ceph log files.

   ```yaml
   logs:
     - type: file
       path: /var/log/ceph/*.log
       source: ceph
       service: "<APPLICATION_NAME>"
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent).

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information) and look for `ceph` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **ceph.aggregate\_pct\_used** (gauge) | Overall capacity usage metric. *Shown as percent* |
| **ceph.apply\_latency\_ms** (gauge) | Time taken to flush an update to disks. *Shown as millisecond* |
| **ceph.class\_pct\_used** (gauge) | Per-class percentage of raw storage used. *Shown as percent* |
| **ceph.commit\_latency\_ms** (gauge) | Time taken to commit an operation to the journal. *Shown as millisecond* |
| **ceph.misplaced\_objects** (gauge) | Number of misplaced objects. *Shown as item* |
| **ceph.misplaced\_total** (gauge) | Total number of objects if there are misplaced objects. *Shown as item* |
| **ceph.num\_full\_osds** (gauge) | Number of full OSDs. *Shown as item* |
| **ceph.num\_in\_osds** (gauge) | Number of participating storage daemons. *Shown as item* |
| **ceph.num\_mons** (gauge) | Number of monitor daemons. *Shown as item* |
| **ceph.num\_near\_full\_osds** (gauge) | Number of nearly full OSDs. *Shown as item* |
| **ceph.num\_objects** (gauge) | Object count for a given pool. *Shown as item* |
| **ceph.num\_osds** (gauge) | Number of known storage daemons. *Shown as item* |
| **ceph.num\_pgs** (gauge) | Number of placement groups available. *Shown as item* |
| **ceph.num\_pools** (gauge) | Number of pools. *Shown as item* |
| **ceph.num\_up\_osds** (gauge) | Number of online storage daemons. *Shown as item* |
| **ceph.op\_per\_sec** (gauge) | I/O operations per second for a given pool. *Shown as operation* |
| **ceph.osd.pct\_used** (gauge) | Percentage used of full/near-full OSDs. *Shown as percent* |
| **ceph.pgstate.active\_clean** (gauge) | Number of active+clean placement groups. *Shown as item* |
| **ceph.read\_bytes** (gauge) | Per-pool read bytes. *Shown as byte* |
| **ceph.read\_bytes\_sec** (gauge) | Bytes per second being read. *Shown as byte* |
| **ceph.read\_op\_per\_sec** (gauge) | Per-pool read operations per second. *Shown as operation* |
| **ceph.recovery\_bytes\_per\_sec** (gauge) | Rate of recovered bytes. *Shown as byte* |
| **ceph.recovery\_keys\_per\_sec** (gauge) | Rate of recovered keys. *Shown as item* |
| **ceph.recovery\_objects\_per\_sec** (gauge) | Rate of recovered objects. *Shown as item* |
| **ceph.total\_objects** (gauge) | Object count from the underlying object store. [v<=3 only] *Shown as item* |
| **ceph.write\_bytes** (gauge) | Per-pool write bytes. *Shown as byte* |
| **ceph.write\_bytes\_sec** (gauge) | Bytes per second being written. *Shown as byte* |
| **ceph.write\_op\_per\_sec** (gauge) | Per-pool write operations per second. *Shown as operation* |

**Note**: If you are running Ceph Luminous or later, the `ceph.osd.pct_used` metric is not included.
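
Most of these values map directly onto fields of Ceph's own status reporting. As a rough illustration, here is a sketch of how counts such as `ceph.num_osds` and `ceph.num_up_osds` relate to the output of `ceph status --format json`. The JSON sample below is hand-written for the example, not captured from a real cluster, and field nesting can vary between Ceph releases:

```python
import json

# Hand-written sample resembling the osdmap/pgmap sections of
# `ceph status --format json`; real output contains many more fields.
sample = json.loads("""
{
  "osdmap": {"num_osds": 4, "num_up_osds": 3, "num_in_osds": 4},
  "pgmap": {"num_pgs": 128}
}
""")

# Pick out the fields that back a few of the metrics in the table above.
metrics = {
    "ceph.num_osds": sample["osdmap"]["num_osds"],
    "ceph.num_up_osds": sample["osdmap"]["num_up_osds"],
    "ceph.num_in_osds": sample["osdmap"]["num_in_osds"],
    "ceph.num_pgs": sample["pgmap"]["num_pgs"],
}
print(metrics)
```

Comparing `num_up_osds` against `num_osds` in this sample is the same signal the `ceph.osd_down` service check below is built on: one of the four OSDs is known but not up.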

### Events{% #events %}

The Ceph check does not include any events.

### Service Checks{% #service-checks %}

**ceph.overall\_status**

Returns `OK` if your Ceph cluster status is `HEALTH_OK`, `WARNING` if it is `HEALTH_WARN`, `CRITICAL` otherwise.

*Statuses: ok, warning, critical*
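
Every check in this section follows the same severity mapping: `OK` when the condition is absent, `WARNING` when Ceph flags it at `HEALTH_WARN` severity, and `CRITICAL` for anything more severe (such as `HEALTH_ERR`). A minimal sketch of that mapping, with an illustrative function name rather than the check's actual code:

```python
def severity_from_health(condition_absent: bool, health: str) -> str:
    """Map a Ceph health condition to a Datadog service check status.

    Illustrative helper: OK when the condition is not present at all,
    WARNING when Ceph reports it as HEALTH_WARN, CRITICAL otherwise
    (for example HEALTH_ERR).
    """
    if condition_absent:
        return "OK"
    return "WARNING" if health == "HEALTH_WARN" else "CRITICAL"

print(severity_from_health(False, "HEALTH_ERR"))
```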

**ceph.osd\_down**

Returns `OK` if you have no down OSDs. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.osd\_orphan**

Returns `OK` if you have no orphan OSDs. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.osd\_full**

Returns `OK` if your OSDs are not full. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.osd\_nearfull**

Returns `OK` if your OSDs are not near full. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pool\_full**

Returns `OK` if your pools have not reached their quota. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pool\_near\_full**

Returns `OK` if your pools are not near reaching their quota. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pg\_availability**

Returns `OK` if there is full data availability. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pg\_degraded**

Returns `OK` if there is full data redundancy. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pg\_degraded\_full**

Returns `OK` if there is enough space in the cluster for data redundancy. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pg\_damaged**

Returns `OK` if there are no inconsistencies after data scrubbing. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pg\_not\_scrubbed**

Returns `OK` if the PGs were scrubbed recently. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.pg\_not\_deep\_scrubbed**

Returns `OK` if the PGs were deep scrubbed recently. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.cache\_pool\_near\_full**

Returns `OK` if the cache pools are not near full. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.too\_few\_pgs**

Returns `OK` if the number of PGs is above the min threshold. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.too\_many\_pgs**

Returns `OK` if the number of PGs is below the max threshold. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.object\_unfound**

Returns `OK` if all objects can be found. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.request\_slow**

Returns `OK` if requests are taking a normal time to process. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

**ceph.request\_stuck**

Returns `OK` if requests are taking a normal time to process. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.

*Statuses: ok, warning, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Monitor Ceph: From node status to cluster-wide performance](https://www.datadoghq.com/blog/monitor-ceph-datadog)
