Enable the Datadog-Ceph integration to collect the metrics and service checks described below.
The Ceph check is included in the Datadog Agent package, so you do not need to install anything else on your Ceph servers.
Edit the ceph.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample ceph.d/conf.yaml for all available configuration options.
init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true # only if the ceph binary needs sudo on your nodes
If you enable use_sudo, add a line like the following to your sudoers file:
dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
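To make the interplay of the two options concrete, here is a minimal Python sketch of how a command line could be assembled from ceph_cmd and use_sudo. The build_ceph_command helper is hypothetical and is not the check's actual implementation; it only illustrates why the sudoers entry must allow the exact binary path without a password prompt.

```python
from typing import Dict, List

DEFAULT_CEPH_CMD = "/usr/bin/ceph"  # default noted in the sample configuration


def build_ceph_command(instance: Dict[str, object], *args: str) -> List[str]:
    """Hypothetical helper: assemble the argv for one ceph invocation."""
    ceph_cmd = str(instance.get("ceph_cmd") or DEFAULT_CEPH_CMD)
    argv = [ceph_cmd, *args]
    if instance.get("use_sudo", False):
        # The Agent cannot answer a password prompt, hence NOPASSWD in sudoers.
        argv = ["sudo", *argv]
    return argv


instance = {"ceph_cmd": "/path/to/your/ceph", "use_sudo": True}
print(build_ceph_command(instance, "status"))
```

The path in the sudoers line and the value of ceph_cmd should be identical, since sudo matches the command by its full path.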
Available for Agent versions >6.0
Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
logs_enabled: true
Next, edit ceph.d/conf.yaml and uncomment the logs lines at the bottom. Update the logs path with the correct path to your Ceph log files.
logs:
  - type: file
    path: /var/log/ceph/*.log
    source: ceph
    service: "<APPLICATION_NAME>"
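Before restarting the Agent, it can help to confirm that the path pattern matches real files and that they are readable. The following is a small sketch, assuming Python's glob semantics are close enough to the Agent's wildcard handling for a simple *.log pattern; readable_log_files is a hypothetical helper, and it should be run as the user the Agent runs as for the permission check to be meaningful.

```python
import glob
import os
from typing import List


def readable_log_files(pattern: str) -> List[str]:
    """Return the files matching the glob pattern that the current user can read."""
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.isfile(path) and os.access(path, os.R_OK)
    )


if __name__ == "__main__":
    # Same pattern as in the logs section of ceph.d/conf.yaml.
    for path in readable_log_files("/var/log/ceph/*.log"):
        print(path)
```

An empty result suggests either a wrong path or missing read permission on the log files.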
Run the Agent's status subcommand and look for ceph under the Checks section.
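This lookup can be scripted. The sketch below assumes the Agent binary is available on the PATH as datadog-agent and that each check name begins its own line in the status output; the exact layout varies between Agent versions, so treat check_is_listed as a hypothetical helper rather than a stable parser.

```python
import subprocess


def agent_status_output() -> str:
    """Run the Agent's status subcommand and return its standard output."""
    result = subprocess.run(
        ["datadog-agent", "status"], capture_output=True, text=True, check=False
    )
    return result.stdout


def check_is_listed(status_output: str, check_name: str = "ceph") -> bool:
    """Return True if some line's first word is exactly the check name."""
    return any(
        line.split()[:1] == [check_name] for line in status_output.splitlines()
    )


# Typical use on a host running the Agent (may require elevated privileges):
#     print(check_is_listed(agent_status_output()))
```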
ceph.aggregate_pct_used (gauge) | Overall capacity usage metric. Shown as percent |
ceph.apply_latency_ms (gauge) | Time taken to flush an update to disks. Shown as millisecond |
ceph.commit_latency_ms (gauge) | Time taken to commit an operation to the journal. Shown as millisecond |
ceph.misplaced_objects (gauge) | Number of objects misplaced. Shown as item |
ceph.misplaced_total (gauge) | Total number of objects if there are misplaced objects. Shown as item |
ceph.num_full_osds (gauge) | Number of full OSDs. Shown as item |
ceph.num_in_osds (gauge) | Number of participating storage daemons. Shown as item |
ceph.num_mons (gauge) | Number of monitor daemons. Shown as item |
ceph.num_near_full_osds (gauge) | Number of nearly full OSDs. Shown as item |
ceph.num_objects (gauge) | Object count for a given pool. Shown as item |
ceph.num_osds (gauge) | Number of known storage daemons. Shown as item |
ceph.num_pgs (gauge) | Number of placement groups available. Shown as item |
ceph.num_pools (gauge) | Number of pools. Shown as item |
ceph.num_up_osds (gauge) | Number of online storage daemons. Shown as item |
ceph.op_per_sec (gauge) | IO operations per second for a given pool. Shown as operation |
ceph.osd.pct_used (gauge) | Percentage used of full/near full OSDs. Shown as percent |
ceph.pgstate.active_clean (gauge) | Number of active+clean placement groups. Shown as item |
ceph.read_bytes (gauge) | Per-pool read bytes. Shown as byte |
ceph.read_bytes_sec (gauge) | Bytes/second being read. Shown as byte |
ceph.read_op_per_sec (gauge) | Per-pool read operations/second. Shown as operation |
ceph.recovery_bytes_per_sec (gauge) | Rate of recovered bytes. Shown as byte |
ceph.recovery_keys_per_sec (gauge) | Rate of recovered keys. Shown as item |
ceph.recovery_objects_per_sec (gauge) | Rate of recovered objects. Shown as item |
ceph.total_objects (gauge) | Object count from the underlying object store. [v<=3 only] Shown as item |
ceph.write_bytes (gauge) | Per-pool write bytes. Shown as byte |
ceph.write_bytes_sec (gauge) | Bytes/second being written. Shown as byte |
ceph.write_op_per_sec (gauge) | Per-pool write operations/second. Shown as operation |
Note: If you are running Ceph luminous or later, the ceph.osd.pct_used metric is not included.
The Ceph check does not include any events.
ceph.overall_status
Returns OK if your Ceph cluster status is HEALTH_OK, WARNING if it is HEALTH_WARN, and CRITICAL otherwise.
Statuses: ok, warning, critical
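The mapping for ceph.overall_status can be written out as a short function. This is a sketch of the rule as stated, not the check's source code; overall_status is a hypothetical name, and it uses HEALTH_WARN, which is the string Ceph itself reports for the warning state.

```python
def overall_status(cluster_health: str) -> str:
    """Map a Ceph cluster health string to a service check status."""
    if cluster_health == "HEALTH_OK":
        return "ok"
    if cluster_health == "HEALTH_WARN":
        return "warning"
    # Anything else, including HEALTH_ERR or an unexpected value, is critical.
    return "critical"


for health in ("HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR"):
    print(health, "->", overall_status(health))
```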
ceph.osd_down
Returns OK if you have no down OSD. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.osd_orphan
Returns OK if you have no orphan OSD. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.osd_full
Returns OK if your OSDs are not full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.osd_nearfull
Returns OK if your OSDs are not near full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pool_full
Returns OK if your pools have not reached their quota. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pool_near_full
Returns OK if your pools are not near reaching their quota. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pg_availability
Returns OK if there is full data availability. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pg_degraded
Returns OK if there is full data redundancy. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pg_degraded_full
Returns OK if there is enough space in the cluster for data redundancy. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pg_damaged
Returns OK if there are no inconsistencies after data scrubbing. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pg_not_scrubbed
Returns OK if the PGs were scrubbed recently. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.pg_not_deep_scrubbed
Returns OK if the PGs were deep scrubbed recently. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.cache_pool_near_full
Returns OK if the cache pools are not near full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.too_few_pgs
Returns OK if the number of PGs is above the min threshold. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.too_many_pgs
Returns OK if the number of PGs is below the max threshold. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.object_unfound
Returns OK if all objects can be found. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.request_slow
Returns OK if requests are taking a normal time to process. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
ceph.request_stuck
Returns OK if requests are taking a normal time to process. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
Statuses: ok, warning, critical
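All of the per-condition service checks above (every check except ceph.overall_status) share one rule: OK when the condition is absent, WARNING when it is present with severity HEALTH_WARN, and CRITICAL for any other severity. The sketch below expresses that rule in Python. It assumes a simplified input, a dictionary from Ceph health check code to severity; that shape, the health_check_status name, and the pairing of a code such as OSD_DOWN with ceph.osd_down are illustrative assumptions rather than the check's actual implementation.

```python
from typing import Dict


def health_check_status(active_checks: Dict[str, str], check_code: str) -> str:
    """Return the service check status for one Ceph health check code.

    active_checks maps each currently raised health check code to its
    severity (assumed, simplified shape).
    """
    severity = active_checks.get(check_code)
    if severity is None:
        return "ok"  # the condition is not currently raised
    if severity == "HEALTH_WARN":
        return "warning"
    return "critical"  # HEALTH_ERR or any other severity


active = {"OSD_DOWN": "HEALTH_WARN", "OSD_FULL": "HEALTH_ERR"}
for code in ("OSD_DOWN", "OSD_FULL", "PG_DEGRADED"):
    print(code, "->", health_check_status(active, code))
```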
Need help? Contact Datadog support.