Enable the Datadog-Ceph integration to track disk usage across storage pools, monitor I/O performance, and keep tabs on the overall health and status of your Ceph cluster.
The Ceph check is included in the Datadog Agent package, so you don’t need to install anything else on your Ceph servers.
Edit the `ceph.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory. See the sample `ceph.d/conf.yaml` for all available configuration options:
```yaml
init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true # only if the ceph binary needs sudo on your nodes
```
If you enabled `use_sudo`, add a line like the following to your `sudoers` file:

```text
dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
```
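To confirm the entry works, you can run the check's command by hand. This sanity check is a suggestion rather than part of the official setup, and it assumes the Agent runs as the `dd-agent` user:

```shell
# Run the ceph binary as dd-agent through sudo in non-interactive mode (-n).
# With a correct NOPASSWD entry this prints cluster status instead of
# failing with a password prompt.
sudo -u dd-agent sudo -n /path/to/your/ceph status
```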
_Available for Agent versions >6.0_
Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

```yaml
logs_enabled: true
```
Next, edit `ceph.d/conf.yaml` by uncommenting the `logs` lines at the bottom. Update the logs `path` with the correct path to your Ceph log files.
```yaml
logs:
  - type: file
    path: /var/log/ceph/*.log
    source: ceph
    service: "<APPLICATION_NAME>"
```
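Configuration changes take effect only after an Agent restart. For example, on systemd-based Linux hosts:

```shell
# Restart the Datadog Agent so it picks up the updated configuration
sudo systemctl restart datadog-agent
```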
Run the Agent's status subcommand and look for `ceph` under the Checks section.
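On most Linux installations that subcommand looks like the following (the invocation differs on other platforms):

```shell
# Print the Agent's status report; a working setup shows a "ceph" entry
# under the "Checks" section of the output.
sudo datadog-agent status
```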
| Metric | Type | Description | Unit |
| --- | --- | --- | --- |
| `ceph.commit_latency_ms` | gauge | Time taken to commit an operation to the journal | millisecond |
| `ceph.apply_latency_ms` | gauge | Time taken to flush an update to disks | millisecond |
| `ceph.op_per_sec` | gauge | IO operations per second for a given pool | operation |
| `ceph.read_bytes_sec` | gauge | Bytes per second being read | byte |
| `ceph.write_bytes_sec` | gauge | Bytes per second being written | byte |
| `ceph.num_osds` | gauge | Number of known storage daemons | item |
| `ceph.num_in_osds` | gauge | Number of participating storage daemons | item |
| `ceph.num_up_osds` | gauge | Number of online storage daemons | item |
| `ceph.num_pgs` | gauge | Number of placement groups available | item |
| `ceph.num_mons` | gauge | Number of monitor daemons | item |
| `ceph.aggregate_pct_used` | gauge | Overall capacity usage | percent |
| `ceph.total_objects` | gauge | Object count from the underlying object store | item |
| `ceph.num_objects` | gauge | Object count for a given pool | item |
| `ceph.read_bytes` | gauge | Per-pool read bytes | byte |
| `ceph.write_bytes` | gauge | Per-pool write bytes | byte |
| `ceph.num_pools` | gauge | Number of pools | item |
| `ceph.pgstate.active_clean` | gauge | Number of active+clean placement groups | item |
| `ceph.read_op_per_sec` | gauge | Per-pool read operations per second | operation |
| `ceph.write_op_per_sec` | gauge | Per-pool write operations per second | operation |
| `ceph.num_near_full_osds` | gauge | Number of nearly full OSDs | item |
| `ceph.num_full_osds` | gauge | Number of full OSDs | item |
| `ceph.osd.pct_used` | gauge | Percentage used of full/near-full OSDs | percent |
Note: If you are running Ceph Luminous or later, you will not see the metric `ceph.osd.pct_used`.
The Ceph check does not include any events.
`ceph.overall_status`: The Datadog Agent submits a service check for each of Ceph's host health checks.

In addition to this service check, the Ceph check also collects a configurable list of health checks for Ceph Luminous and later (a configuration sketch follows the list below). By default, these are:
- `ceph.osd_down`: Returns `OK` if your OSDs are all up. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.osd_orphan`: Returns `OK` if you have no orphan OSDs. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.osd_full`: Returns `OK` if your OSDs are not full. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.osd_nearfull`: Returns `OK` if your OSDs are not near full. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pool_full`: Returns `OK` if your pools have not reached their quota. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pool_near_full`: Returns `OK` if your pools are not near reaching their quota. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pg_availability`: Returns `OK` if there is full data availability. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pg_degraded`: Returns `OK` if there is full data redundancy. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pg_degraded_full`: Returns `OK` if there is enough space in the cluster for data redundancy. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pg_damaged`: Returns `OK` if there are no inconsistencies after data scrubbing. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pg_not_scrubbed`: Returns `OK` if the PGs were scrubbed recently. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.pg_not_deep_scrubbed`: Returns `OK` if the PGs were deep scrubbed recently. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.cache_pool_near_full`: Returns `OK` if the cache pools are not near full. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.too_few_pgs`: Returns `OK` if the number of PGs is above the minimum threshold. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.too_many_pgs`: Returns `OK` if the number of PGs is below the maximum threshold. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.object_unfound`: Returns `OK` if all objects can be found. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.request_slow`: Returns `OK` if requests are taking a normal time to process. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
- `ceph.request_stuck`: Returns `OK` if requests are taking a normal time to process. Otherwise, returns `WARNING` if the severity is `HEALTH_WARN`, else `CRITICAL`.
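As a minimal sketch of how that list can be tuned, the sample `ceph.d/conf.yaml` bundled with recent versions of the check exposes a `collect_service_check_for` option; verify the option name against the `conf.yaml.example` shipped with your Agent before relying on it:

```yaml
instances:
  - ceph_cmd: /usr/bin/ceph
    # Submit service checks only for the Ceph health checks listed here;
    # omit this option to keep the default list described above.
    collect_service_check_for:
      - OSD_DOWN
      - OSD_NEARFULL
      - PG_DEGRADED
```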
Need help? Contact Datadog support.