Cassandra

Supported OS: Linux, Windows, Mac OS

Integration version 1.18.0

Cassandra default dashboard

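Overview

Get metrics from Cassandra in real time to:

  • Visualize and monitor Cassandra states.
  • Be notified about Cassandra failovers and events.

Setup

Installation

The Cassandra check is included in the Datadog Agent package, so you don't need to install anything else on your Cassandra nodes. Using Oracle's JDK is recommended for this integration.

Note: This check has a limit of 350 metrics per instance. The number of returned metrics is shown on the status page. You can specify the metrics you are interested in by editing the configuration below. To learn how to customize the collected metrics, see the JMX documentation for detailed instructions. If you need to monitor more metrics, contact Datadog support.

Configuration

Metric collection
  1. The default configuration in the cassandra.d/conf.yaml file activates the collection of Cassandra metrics. See the sample cassandra.d/conf.yaml for all available configuration options.

  2. Restart the Agent.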
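
The note on the 350-metric limit mentions narrowing collection to the metrics of interest. The following is a minimal sketch of what such a filter might look like in cassandra.d/conf.yaml, not the shipped default. The host, port, and the specific bean and attribute selection are assumptions chosen for illustration; the JMX documentation and the sample cassandra.d/conf.yaml remain the reference for the exact options your Agent version supports.

```yaml
init_config:
  is_jmx: true
  collect_default_metrics: true

instances:
  - host: localhost          # assumption: the Agent runs on the Cassandra node
    port: 7199               # assumption: Cassandra's usual JMX port
    cassandra_aliasing: true
    conf:
      # Illustrative filter (not from the original page): collect a few
      # attributes of the client request latency beans.
      - include:
          domain: org.apache.cassandra.metrics
          type: ClientRequest
          name: Latency
          attribute:
            - 75thPercentile
            - 95thPercentile
            - OneMinuteRate
```

After editing the file, restart the Agent and check the status page to confirm the number of returned metrics stays under the limit.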

Log collection

Available for Agent versions >6.0

For containerized environments, follow the instructions on the Kubernetes Log Collection or Docker Log Collection pages.
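
For Kubernetes, log collection is typically configured through Autodiscovery annotations on the pod. The sketch below assumes a pod with a single container named cassandra; the pod name, container name, and image tag are hypothetical, and the source and service values mirror the host-based configuration shown in this section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cassandra                      # hypothetical pod name
  annotations:
    # The identifier after "ad.datadoghq.com/" must match the container name.
    ad.datadoghq.com/cassandra.logs: '[{"source": "cassandra", "service": "myapplication"}]'
spec:
  containers:
    - name: cassandra
      image: cassandra:4.1             # assumption: any Cassandra image works here
```

Log collection must also be enabled on the Agent itself for the annotation to take effect.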

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in the datadog.yaml file:

    logs_enabled: true
    
  2. Add this configuration block to your cassandra.d/conf.yaml file to start collecting your Cassandra logs:

      logs:
        - type: file
          path: /var/log/cassandra/*.log
          source: cassandra
          service: myapplication
          log_processing_rules:
            - type: multi_line
              name: log_start_with_date
              # pattern to match: DEBUG [ScheduledTasks:1] 2019-12-30
              pattern: '[A-Z]+ +\[[^\]]+\] +\d{4}-\d{2}-\d{2}'
    

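    Change the path and service parameter values to match your environment. See the sample cassandra.d/conf.yaml for all available configuration options.

    To make sure that stack traces are properly aggregated into a single log, you can add a multi-line processing rule such as the log_processing_rules entry above.

  3. Restart the Agent.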

Validation

Run the Agent's status subcommand and look for cassandra under the Checks section.

Data Collected

Metrics

cassandra.active_tasks
(gauge)
The number of tasks that the thread pool is actively executing.
Shown as task
cassandra.bloom_filter_false_ratio
(gauge)
The ratio of Bloom filter false positives to total checks.
Shown as fraction
cassandra.bytes_flushed.count
(gauge)
The amount of data that was flushed since (re)start.
Shown as byte
cassandra.cas_commit_latency.75th_percentile
(gauge)
The latency of paxos commit round - p75.
Shown as microsecond
cassandra.cas_commit_latency.95th_percentile
(gauge)
The latency of paxos commit round - p95.
Shown as microsecond
cassandra.cas_commit_latency.one_minute_rate
(gauge)
The number of paxos commit rounds per second.
Shown as operation
cassandra.cas_prepare_latency.75th_percentile
(gauge)
The latency of paxos prepare round - p75.
Shown as microsecond
cassandra.cas_prepare_latency.95th_percentile
(gauge)
The latency of paxos prepare round - p95.
Shown as microsecond
cassandra.cas_prepare_latency.one_minute_rate
(gauge)
The number of paxos prepare rounds per second.
Shown as operation
cassandra.cas_propose_latency.75th_percentile
(gauge)
The latency of paxos propose round - p75.
Shown as microsecond
cassandra.cas_propose_latency.95th_percentile
(gauge)
The latency of paxos propose round - p95.
Shown as microsecond
cassandra.cas_propose_latency.one_minute_rate
(gauge)
The number of paxos propose rounds per second.
Shown as operation
cassandra.col_update_time_delta_histogram.75th_percentile
(gauge)
The column update time delta - p75.
Shown as microsecond
cassandra.col_update_time_delta_histogram.95th_percentile
(gauge)
The column update time delta - p95.
Shown as microsecond
cassandra.col_update_time_delta_histogram.min
(gauge)
The column update time delta - min.
Shown as microsecond
cassandra.compaction_bytes_written.count
(gauge)
The amount of data that was compacted since (re)start.
Shown as byte
cassandra.compression_ratio
(gauge)
The compression ratio for all SSTables. Note: contrary to what the name suggests, a low value means high compression. The formula used is: 'size of the compressed SSTable / size of original'.
Shown as fraction
cassandra.currently_blocked_tasks
(gauge)
The number of currently blocked tasks for the thread pool.
Shown as task
cassandra.currently_blocked_tasks.count
(gauge)
The number of currently blocked tasks for the thread pool.
Shown as task
cassandra.db.droppable_tombstone_ratio
(gauge)
The estimate of the droppable tombstone ratio.
Shown as fraction
cassandra.dropped.one_minute_rate
(gauge)
The tasks dropped during execution for the thread pool.
Shown as thread
cassandra.exceptions.count
(gauge)
The number of exceptions thrown from 'Storage' metrics.
Shown as error
cassandra.key_cache_hit_rate
(gauge)
The key cache hit rate.
Shown as fraction
cassandra.latency.75th_percentile
(gauge)
The client request latency - p75.
Shown as microsecond
cassandra.latency.95th_percentile
(gauge)
The client request latency - p95.
Shown as microsecond
cassandra.latency.one_minute_rate
(gauge)
The number of client requests.
Shown as request
cassandra.live_disk_space_used.count
(gauge)
The disk space used by "live" SSTables (only counts in use files).
Shown as byte
cassandra.live_ss_table_count
(gauge)
Number of "live" (in use) SSTables.
Shown as file
cassandra.load.count
(gauge)
The disk space used by live data on a node.
Shown as byte
cassandra.max_partition_size
(gauge)
The size of the largest compacted partition.
Shown as byte
cassandra.max_row_size
(gauge)
The size of the largest compacted row.
Shown as byte
cassandra.mean_partition_size
(gauge)
The average size of compacted partitions.
Shown as byte
cassandra.mean_row_size
(gauge)
The average size of compacted rows.
Shown as byte
cassandra.net.down_endpoint_count
(gauge)
The number of unhealthy nodes in the cluster. This value represents each individual node's view of the cluster and thus should not be summed across reporting nodes.
Shown as node
cassandra.net.up_endpoint_count
(gauge)
The number of healthy nodes in the cluster. This value represents each individual node's view of the cluster and thus should not be summed across reporting nodes.
Shown as node
cassandra.pending_compactions
(gauge)
The number of pending compactions.
Shown as task
cassandra.pending_flushes.count
(gauge)
The number of pending flushes.
Shown as flush
cassandra.pending_tasks
(gauge)
The number of pending tasks for the thread pool.
Shown as task
cassandra.range_latency.75th_percentile
(gauge)
The local range request latency - p75.
Shown as microsecond
cassandra.range_latency.95th_percentile
(gauge)
The local range request latency - p95.
Shown as microsecond
cassandra.range_latency.one_minute_rate
(gauge)
The number of local range requests.
Shown as request
cassandra.read_latency.75th_percentile
(gauge)
The local read latency - p75.
Shown as microsecond
cassandra.read_latency.95th_percentile
(gauge)
The local read latency - p95.
Shown as microsecond
cassandra.read_latency.99th_percentile
(gauge)
The local read latency - p99.
Shown as microsecond
cassandra.read_latency.one_minute_rate
(gauge)
The number of local read requests.
Shown as read
cassandra.row_cache_hit.count
(gauge)
The number of row cache hits.
Shown as hit
cassandra.row_cache_hit_out_of_range.count
(gauge)
The number of row cache hits that do not satisfy the query filter and went to disk.
Shown as hit
cassandra.row_cache_miss.count
(gauge)
The number of table row cache misses.
Shown as miss
cassandra.snapshots_size
(gauge)
The disk space truly used by snapshots.
Shown as byte
cassandra.ss_tables_per_read_histogram.75th_percentile
(gauge)
The number of SSTable data files accessed per read - p75.
Shown as file
cassandra.ss_tables_per_read_histogram.95th_percentile
(gauge)
The number of SSTable data files accessed per read - p95.
Shown as file
cassandra.timeouts.count
(gauge)
Count of requests not acknowledged within configurable timeout window.
Shown as timeout
cassandra.timeouts.one_minute_rate
(gauge)
Recent timeout rate, as an exponentially weighted moving average over a one-minute interval.
Shown as timeout
cassandra.tombstone_scanned_histogram.75th_percentile
(gauge)
Number of tombstones scanned per read - p75.
Shown as record
cassandra.tombstone_scanned_histogram.95th_percentile
(gauge)
Number of tombstones scanned per read - p95.
Shown as record
cassandra.total_blocked_tasks
(gauge)
Total blocked tasks
Shown as task
cassandra.total_blocked_tasks.count
(count)
Total count of blocked tasks
Shown as task
cassandra.total_commit_log_size
(gauge)
The size used on disk by commit logs.
Shown as byte
cassandra.total_disk_space_used.count
(gauge)
Total disk space used by SSTables including obsolete ones waiting to be GC'd.
Shown as byte
cassandra.view_lock_acquire_time.75th_percentile
(gauge)
The time taken acquiring a partition lock for materialized view updates - p75.
Shown as microsecond
cassandra.view_lock_acquire_time.95th_percentile
(gauge)
The time taken acquiring a partition lock for materialized view updates - p95.
Shown as microsecond
cassandra.view_lock_acquire_time.one_minute_rate
(gauge)
The number of requests to acquire a partition lock for materialized view updates.
Shown as request
cassandra.view_read_time.75th_percentile
(gauge)
The time taken during the local read of a materialized view update - p75.
Shown as microsecond
cassandra.view_read_time.95th_percentile
(gauge)
The time taken during the local read of a materialized view update - p95.
Shown as microsecond
cassandra.view_read_time.one_minute_rate
(gauge)
The number of local reads for materialized view updates.
Shown as request
cassandra.waiting_on_free_memtable_space.75th_percentile
(gauge)
The time spent waiting for free memtable space either on- or off-heap - p75.
Shown as microsecond
cassandra.waiting_on_free_memtable_space.95th_percentile
(gauge)
The time spent waiting for free memtable space either on- or off-heap - p95.
Shown as microsecond
cassandra.write_latency.75th_percentile
(gauge)
The local write latency - p75.
Shown as microsecond
cassandra.write_latency.95th_percentile
(gauge)
The local write latency - p95.
Shown as microsecond
cassandra.write_latency.99th_percentile
(gauge)
The local write latency - p99.
Shown as microsecond
cassandra.write_latency.one_minute_rate
(gauge)
The number of local write requests.
Shown as write

Events

The Cassandra check does not include any events.

Service Checks

cassandra.can_connect
Returns CRITICAL if the Agent is unable to connect to and collect metrics from the monitored Cassandra instance, WARNING if no metrics are collected, and OK otherwise.
Statuses: ok, critical, warning

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Cassandra Nodetool Integration

Cassandra default dashboard

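Overview

This check collects metrics for your Cassandra cluster that are not available through the JMX integration. It uses the nodetool utility to collect them.

Setup

Installation

The Cassandra Nodetool check is included in the Datadog Agent package, so you don't need to install anything else on your Cassandra nodes.

Configuration

Follow the instructions below to configure this check for an Agent running on a host. For containerized environments, see the Containerized section.

Host

  1. Edit the cassandra_nodetool.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample cassandra_nodetool.d/conf.yaml for all available configuration options.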

    init_config:
    
    instances:
      ## @param keyspaces - list of string - required
      ## The list of keyspaces to monitor.
      ## An empty list results in no metrics being sent.
      #
      - keyspaces:
          - "<KEYSPACE_1>"
          - "<KEYSPACE_2>"
    
  2. Restart the Agent.
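
The configuration above lists only the required keyspaces parameter. As a hedged sketch, a slightly fuller cassandra_nodetool.d/conf.yaml might also state where nodetool lives and which JMX endpoint it should contact. The nodetool path, host, and port values below are assumptions for illustration; confirm the option names against the sample cassandra_nodetool.d/conf.yaml shipped with your Agent version.

```yaml
init_config:
  # Assumption: nodetool is installed at this path on the node.
  nodetool: /usr/bin/nodetool

instances:
  - host: localhost        # assumption: nodetool targets the local node
    port: 7199             # assumption: Cassandra's usual JMX port
    keyspaces:
      - "<KEYSPACE_1>"
      - "<KEYSPACE_2>"
```

Because an empty keyspaces list results in no metrics being sent, at least one keyspace must be named for the check to report anything.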

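Log collection

Cassandra Nodetool logs are collected by the Cassandra integration. See the log collection instructions for Cassandra.

Containerized

For containerized environments, use the official Prometheus exporter in the pod, and then use Autodiscovery in the Agent to find the pod and query the endpoint.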
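
One way this could look in practice is a set of Autodiscovery annotations that point the Agent's OpenMetrics check at the exporter. This is a sketch under stated assumptions, not a confirmed setup: the pod name, the container name cassandra-exporter, the metric namespace, and the <EXPORTER_IMAGE> and <EXPORTER_PORT> placeholders are all hypothetical and must be replaced with the values of the exporter you actually deploy.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cassandra                                  # hypothetical pod name
  annotations:
    # The identifier after "ad.datadoghq.com/" must match the container name.
    ad.datadoghq.com/cassandra-exporter.check_names: '["openmetrics"]'
    ad.datadoghq.com/cassandra-exporter.init_configs: '[{}]'
    ad.datadoghq.com/cassandra-exporter.instances: |
      [
        {
          "openmetrics_endpoint": "http://%%host%%:<EXPORTER_PORT>/metrics",
          "namespace": "cassandra_exporter",
          "metrics": [".*"]
        }
      ]
spec:
  containers:
    - name: cassandra-exporter
      image: "<EXPORTER_IMAGE>"                    # placeholder: exporter image
```

Matching every metric with ".*" is convenient for a first test; narrowing the list afterwards keeps the number of collected metrics manageable.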

Validation

Run the Agent's status subcommand and look for cassandra_nodetool under the Checks section.

Data Collected

Metrics

cassandra.nodetool.status.load
(gauge)
Amount of file system data under the cassandra data directory without snapshot content
Shown as byte
cassandra.nodetool.status.owns
(gauge)
Percentage of the data owned by the node per datacenter times the replication factor
Shown as percent
cassandra.nodetool.status.replication_availability
(gauge)
Percentage of data available per keyspace times replication factor
Shown as percent
cassandra.nodetool.status.replication_factor
(gauge)
Replication factor per keyspace
cassandra.nodetool.status.status
(gauge)
Node status: up (1) or down (0)

Events

The Cassandra_nodetool check does not include any events.

Service Checks

cassandra.nodetool.node_up
The agent sends this service check for each node of the monitored cluster. Returns CRITICAL if the node is down, otherwise OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading