Supported OS
![Mac OS]()

개요
Datadog-Ceph 통합을 활성화해 다음 작업을 수행할 수 있습니다.
- 스토리지 풀 전반의 디스크 사용량 추적
- 문제 시 서비스 점검 수신
- I/O 성능 메트릭 모니터링
설정
설치
Ceph 점검은 Datadog 에이전트 패키지에 포함되어 있으므로 Ceph 서버에서 아무 것도 설치할 필요가 없습니다.
구성
에이전트 설정 디렉터리 루트에 있는 conf.d/
폴더에서 ceph.d/conf.yaml
파일을 편집합니다.
사용 가능한 모든 옵션은 샘플 ceph.d/conf.yaml을 참조하세요.
init_config:
instances:
- ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
use_sudo: true # only if the ceph binary needs sudo on your nodes
use_sudo
을 활성화하면 다음과 같은 라인을 sudoers
파일에 추가합니다.
dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
로그 수집
Agent 버전 6.0 이상에서 사용 가능
Datadog 에이전트에서 로그 수집은 기본적으로 사용하지 않도록 설정되어 있습니다. datadog.yaml
파일에서 로그 수집을 사용하도록 설정합니다.
다음으로 아래에서 logs
라인의 주석을 제거하여 ceph.d/conf.yaml
을 편집합니다. Ceph 로그 파일에 대한 올바른 경로를 사용해 로그 path
를 업데이트합니다.
logs:
- type: file
path: /var/log/ceph/*.log
source: ceph
service: "<APPLICATION_NAME>"
Agent를 재시작합니다.
검증
에이전트 상태 하위 명령을 실행하고 점검 섹션 아래에서 ceph
를 찾으세요.
수집한 데이터
메트릭
| |
---|
ceph.aggregate_pct_used (gauge) | Overall capacity usage metric Shown as percent |
ceph.apply_latency_ms (gauge) | Time taken to flush an update to disks Shown as millisecond |
ceph.class_pct_used (gauge) | Per-class percentage of raw storage used Shown as percent |
ceph.commit_latency_ms (gauge) | Time taken to commit an operation to the journal Shown as millisecond |
ceph.misplaced_objects (gauge) | Number of objects misplaced Shown as item |
ceph.misplaced_total (gauge) | Total number of objects if there are misplaced objects Shown as item |
ceph.num_full_osds (gauge) | Number of full osds Shown as item |
ceph.num_in_osds (gauge) | Number of participating storage daemons Shown as item |
ceph.num_mons (gauge) | Number of monitor daemons Shown as item |
ceph.num_near_full_osds (gauge) | Number of nearly full osds Shown as item |
ceph.num_objects (gauge) | Object count for a given pool Shown as item |
ceph.num_osds (gauge) | Number of known storage daemons Shown as item |
ceph.num_pgs (gauge) | Number of placement groups available Shown as item |
ceph.num_pools (gauge) | Number of pools Shown as item |
ceph.num_up_osds (gauge) | Number of online storage daemons Shown as item |
ceph.op_per_sec (gauge) | IO operations per second for given pool Shown as operation |
ceph.osd.pct_used (gauge) | Percentage used of full/near full osds Shown as percent |
ceph.pgstate.active_clean (gauge) | Number of active+clean placement groups Shown as item |
ceph.read_bytes (gauge) | Per-pool read bytes Shown as byte |
ceph.read_bytes_sec (gauge) | Bytes/second being read Shown as byte |
ceph.read_op_per_sec (gauge) | Per-pool read operations/second Shown as operation |
ceph.recovery_bytes_per_sec (gauge) | Rate of recovered bytes Shown as byte |
ceph.recovery_keys_per_sec (gauge) | Rate of recovered keys Shown as item |
ceph.recovery_objects_per_sec (gauge) | Rate of recovered objects Shown as item |
ceph.total_objects (gauge) | Object count from the underlying object store. [v<=3 only] Shown as item |
ceph.write_bytes (gauge) | Per-pool write bytes Shown as byte |
ceph.write_bytes_sec (gauge) | Bytes/second being written Shown as byte |
ceph.write_op_per_sec (gauge) | Per-pool write operations/second Shown as operation |
Note: If you are running Ceph luminous or later, the ceph.osd.pct_used
metric is not included.
참고: Ceph luminous 이상 버전을 실행 중인 경우 ceph.osd.pct_used
메트릭이 포함되지 않습니다.
이벤트
Ceph 점검은 이벤트를 포함하지 않습니다.
서비스 점검
ceph.overall_status
Returns OK
if your ceph cluster status is HEALTH_OK, WARNING
if it’s HEALTH_WARNING, CRITICAL
otherwise.
Statuses: ok, warning, critical
ceph.osd_down
Returns OK
if you have no down OSD. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.osd_orphan
Returns OK
if you have no orphan OSD. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.osd_full
Returns OK
if your OSDs are not full. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.osd_nearfull
Returns OK
if your OSDs are not near full. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pool_full
Returns OK
if your pools have not reached their quota. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pool_near_full
Returns OK
if your pools are not near reaching their quota. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pg_availability
Returns OK
if there is full data availability. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pg_degraded
Returns OK
if there is full data redundancy. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pg_degraded_full
Returns OK
if there is enough space in the cluster for data redundancy. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pg_damaged
Returns OK
if there are no inconsistencies after data scrubing. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pg_not_scrubbed
Returns OK
if the PGs were scrubbed recently. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.pg_not_deep_scrubbed
Returns OK
if the PGs were deep scrubbed recently. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.cache_pool_near_full
Returns OK
if the cache pools are not near full. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.too_few_pgs
Returns OK
if the number of PGs is above the min threshold. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.too_many_pgs
Returns OK
if the number of PGs is below the max threshold. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.object_unfound
Returns OK
if all objects can be found. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.request_slow
Returns OK
requests are taking a normal time to process. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
ceph.request_stuck
Returns OK
requests are taking a normal time to process. Otherwise, returns WARNING
if the severity is HEALTH_WARN
, else CRITICAL
.
Statuses: ok, warning, critical
트러블슈팅
도움이 필요하신가요? Datadog 지원팀에 문의하세요.
참고 자료