Datadog-Ceph Integration

Overview

Enable the Datadog-Ceph integration to:

  • Track disk usage across storage pools
  • Receive service checks in case of issues
  • Monitor I/O performance metrics

Setup

Installation

The Ceph check is packaged with the Agent, so installing the Agent on your Ceph servers is the only installation step.

Configuration

Create a ceph.yaml file in the Agent’s conf.d directory. The sample ceph.yaml lists all available configuration options; a minimal configuration looks like this:

init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true               # only if the ceph binary needs sudo on your nodes

If you enabled use_sudo, add a line like the following to your sudoers file so that the dd-agent user can run the ceph binary without a password prompt:

dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
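
Before restarting the Agent, it can help to confirm that the configured binary is runnable with and without sudo. The following is a minimal sketch, not the check's actual implementation: the helper names build_ceph_command and can_run_ceph are hypothetical, and it assumes that "ceph status" is an acceptable probe command on your nodes. Run it as the dd-agent user so the sudoers rule is the one being exercised.

```python
import subprocess


def build_ceph_command(ceph_cmd="/usr/bin/ceph", use_sudo=False, args=("status",)):
    """Build the argument list for one ceph invocation (hypothetical helper)."""
    command = [ceph_cmd, *args]
    if use_sudo:
        # "sudo -n" fails instead of prompting, so a missing NOPASSWD
        # rule shows up as an error rather than a hung process.
        command = ["sudo", "-n", *command]
    return command


def can_run_ceph(ceph_cmd="/usr/bin/ceph", use_sudo=False):
    """Return True if the ceph command exits with status 0."""
    try:
        result = subprocess.run(
            build_ceph_command(ceph_cmd, use_sudo),
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or the cluster did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(build_ceph_command("/path/to/your/ceph", use_sudo=True))
    print(can_run_ceph("/path/to/your/ceph", use_sudo=True))
```

If can_run_ceph returns False with use_sudo=True but the binary exists, the sudoers entry is the first thing to recheck.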

Validation

Run the Agent’s info subcommand and look for ceph under the Checks section:

  Checks
  ======
    [...]

    ceph (5.19.0)
    -------------
    - instance #0 [OK]
    - Collected 24 metrics, 0 events & 1 service check

    [...]
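
If you want to script this validation step, the output can be scanned for the ceph block. The sketch below is illustrative only: the function name check_summary is hypothetical, and it assumes the info output keeps the layout shown in the sample above (a "name (version)" line, a dashed underline, then "- " lines), which may differ between Agent versions.

```python
import re

SAMPLE_INFO = """\
  Checks
  ======
    [...]

    ceph (5.19.0)
    -------------
    - instance #0 [OK]
    - Collected 24 metrics, 0 events & 1 service check

    [...]
"""


def check_summary(info_output, check_name="ceph"):
    """Return version, instance health, and metric count for one check, or None."""
    lines = info_output.splitlines()
    header = re.compile(r"^\s*" + re.escape(check_name) + r"\s+\(([^)]+)\)\s*$")
    for index, line in enumerate(lines):
        match = header.match(line)
        if not match:
            continue
        summary = {"version": match.group(1), "instances_ok": True, "metrics": None}
        for following in lines[index + 1:]:
            stripped = following.strip()
            if stripped and set(stripped) == {"-"}:
                continue  # dashed underline below the check name
            if not stripped.startswith("- "):
                break  # blank line or next section ends this check's block
            if stripped.startswith("- instance #") and "[OK]" not in stripped:
                summary["instances_ok"] = False
            collected = re.search(r"Collected (\d+) metrics", stripped)
            if collected:
                summary["metrics"] = int(collected.group(1))
        return summary
    return None  # the check does not appear in the output at all


if __name__ == "__main__":
    print(check_summary(SAMPLE_INFO))
```

A return value of None means the ceph check was not loaded, which usually points at a misplaced or malformed ceph.yaml.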

Data Collected

Metrics

ceph.commit_latency_ms
(gauge)
Time taken to commit an operation to the journal
shown as millisecond
ceph.apply_latency_ms
(gauge)
Time taken to flush an update to disks
shown as millisecond
ceph.op_per_sec
(gauge)
I/O operations per second for a given pool
shown as operation
ceph.read_bytes_sec
(gauge)
Bytes/second being read
shown as byte
ceph.write_bytes_sec
(gauge)
Bytes/second being written
shown as byte
ceph.num_osds
(gauge)
Number of known storage daemons
shown as item
ceph.num_in_osds
(gauge)
Number of participating storage daemons
shown as item
ceph.num_up_osds
(gauge)
Number of online storage daemons
shown as item
ceph.num_pgs
(gauge)
Number of placement groups available
shown as item
ceph.num_mons
(gauge)
Number of monitor daemons
shown as item
ceph.aggregate_pct_used
(gauge)
Overall capacity usage metric
shown as percent
ceph.total_objects
(gauge)
Object count from the underlying object store
shown as item
ceph.num_objects
(gauge)
Object count for a given pool
shown as item
ceph.read_bytes
(rate)
Per-pool read bytes
shown as byte
ceph.write_bytes
(rate)
Per-pool write bytes
shown as byte
ceph.num_pools
(gauge)
Number of pools
shown as item
ceph.pgstate.active_clean
(gauge)
Number of active+clean placement groups
shown as item
ceph.read_op_per_sec
(gauge)
Per-pool read operations/second
shown as operation
ceph.write_op_per_sec
(gauge)
Per-pool write operations/second
shown as operation
ceph.num_near_full_osds
(gauge)
Number of nearly full OSDs
shown as item
ceph.num_full_osds
(gauge)
Number of full OSDs
shown as item
ceph.osd.pct_used
(gauge)
Percentage of capacity used on full or nearly full OSDs
shown as percent
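
Several of these gauges are most useful in combination. As a rough sketch of how the OSD counts relate, the hypothetical helper below derives the number of down and out storage daemons from ceph.num_osds, ceph.num_up_osds, and ceph.num_in_osds. The function name and the returned keys are assumptions made for illustration; they are not metrics emitted by the check.

```python
def osd_availability(num_osds, num_up_osds, num_in_osds):
    """Derive down/out OSD counts from the three OSD gauges (illustrative)."""
    if not 0 <= num_up_osds <= num_osds or not 0 <= num_in_osds <= num_osds:
        raise ValueError("up and in counts must lie between 0 and num_osds")
    return {
        # Known daemons that are not currently online.
        "down": num_osds - num_up_osds,
        # Known daemons that are not participating in data placement.
        "out": num_osds - num_in_osds,
        # Share of known daemons that are online; 0.0 when no OSDs are known.
        "up_ratio": num_up_osds / num_osds if num_osds else 0.0,
    }


if __name__ == "__main__":
    print(osd_availability(num_osds=12, num_up_osds=11, num_in_osds=12))
```

A nonzero "down" count while "out" stays at zero typically means a daemon went offline recently and has not yet been marked out of the cluster.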

Events

The Ceph check does not include any events at this time.

Service Checks

  • ceph.overall_status: The Datadog Agent submits a service check for each of Ceph’s host health checks.
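
To make the relationship between Ceph health and service check status concrete, here is a minimal sketch of one plausible mapping. The status codes 0 to 3 are the standard Datadog service check values, and HEALTH_OK, HEALTH_WARN, and HEALTH_ERR are Ceph's health states. The mapping itself, the function name, and the treatment of unrecognized values as UNKNOWN are assumptions; the check's real logic may differ.

```python
# Standard Datadog service check status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from Ceph health states to service check statuses.
HEALTH_TO_STATUS = {
    "HEALTH_OK": OK,
    "HEALTH_WARN": WARNING,
    "HEALTH_ERR": CRITICAL,
}


def service_check_status(ceph_health):
    """Translate a Ceph health string into a service check status code."""
    # Unrecognized strings fall back to UNKNOWN (an assumption of this sketch).
    return HEALTH_TO_STATUS.get(ceph_health.strip().upper(), UNKNOWN)


if __name__ == "__main__":
    for health in ("HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR", "something else"):
        print(health, "->", service_check_status(health))
```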

Troubleshooting

Need help? Contact Datadog Support.

Further Reading