Cassandra

Supported OS: Linux, Windows, macOS

Integration version 1.16.1

Cassandra default dashboard

Overview

Collect Cassandra metrics in real time to:

  • Visualize and monitor Cassandra states
  • Be notified about Cassandra failovers and events

Setup

Installation

The Cassandra check is included in the Datadog Agent package, so you don't need to install anything else on your Cassandra nodes. Using Oracle's JDK is recommended for this integration.

Note: this check has a limit of 350 metrics per instance. The number of returned metrics is shown on the info page. You can select the metrics that matter to you by editing the configuration below. To learn how to change the list of metrics to collect, see the JMX documentation for detailed instructions. If you need to monitor more than 350 metrics, contact Datadog support.

Configuration

Metric collection
  1. The default configuration of your cassandra.d/conf.yaml file activates the collection of your Cassandra metrics. See the sample cassandra.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Log collection

Available for Agent versions >6.0

For containerized environments, follow the instructions in the Kubernetes Log Collection or Docker Log Collection sections.

  1. Log collection is disabled by default in the Datadog Agent. Enable it in datadog.yaml:

    logs_enabled: true
    
  2. Add this configuration block to your cassandra.d/conf.yaml file to start collecting your Cassandra logs:

      logs:
        - type: file
          path: /var/log/cassandra/*.log
          source: cassandra
          service: myapplication
          log_processing_rules:
            - type: multi_line
              name: log_start_with_date
              # pattern to match: DEBUG [ScheduledTasks:1] 2019-12-30
              pattern: '[A-Z]+ +\[[^\]]+\] +\d{4}-\d{2}-\d{2}'
    

    Change the path and service parameter values and configure them for your environment. See the sample cassandra.d/conf.yaml for all available configuration options.

    To make sure that stack traces are aggregated into a single log, you can add a multi-line processing rule, such as the log_start_with_date rule shown above.

  3. Restart the Agent.
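To illustrate the effect of the log_start_with_date rule in the configuration above, here is a small Python sketch that applies the same pattern to a handful of lines. It assumes the pattern is tested at the beginning of each line and that any line that does not match is appended to the previous entry. The sample log lines and the group_multiline helper are invented for illustration, and this is not the Agent's actual implementation.

```python
import re

# Same pattern as the log_start_with_date rule in cassandra.d/conf.yaml.
LOG_START = re.compile(r"[A-Z]+ +\[[^\]]+\] +\d{4}-\d{2}-\d{2}")


def group_multiline(lines):
    """Group raw lines into log entries (hypothetical helper).

    A new entry starts whenever a line begins with the pattern;
    every other line is appended to the entry before it.
    """
    entries = []
    for line in lines:
        if LOG_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


# Invented sample lines: one error with a stack trace, then a new entry.
sample = [
    "ERROR [main] 2019-12-30 10:15:00,123 Example.java:42 - Exception encountered",
    "java.lang.RuntimeException: boom",
    "\tat org.example.Example.run(Example.java:42)",
    "DEBUG [ScheduledTasks:1] 2019-12-30 10:15:01,000 Example.java:50 - next entry",
]
entries = group_multiline(sample)
print(len(entries))  # 2: the stack trace stays attached to the ERROR line
```

Without such a rule, each line of the stack trace would be treated as a separate log.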

Validation

Run the Agent's status subcommand and look for cassandra under the Checks section.
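If you want to script this validation step, the following Python sketch shows one possible approach. It assumes the datadog-agent binary is on the PATH and that the current user is allowed to run it (many installs require elevated privileges). The helper names agent_status and check_is_listed are hypothetical and not part of the Agent. The sketch only looks for the check name anywhere in the output and does not parse the status format.

```python
import re
import subprocess


def check_is_listed(status_output: str, check_name: str) -> bool:
    """Return True if check_name appears as a standalone name in the text.

    The lookarounds keep "cassandra" from matching inside longer
    names such as "cassandra_nodetool".
    """
    pattern = rf"(?<![\w.]){re.escape(check_name)}(?![\w.])"
    return re.search(pattern, status_output) is not None


def agent_status(command=("datadog-agent", "status")) -> str:
    """Run the Agent's status subcommand and return its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    output = agent_status()
    print("cassandra listed:", check_is_listed(output, "cassandra"))
```

The same helper can be called with cassandra_nodetool as the check name.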

Data Collected

Metrics

cassandra.active_tasks
(gauge)
The number of tasks that the thread pool is actively executing.
Shown as task
cassandra.bloom_filter_false_ratio
(gauge)
The ratio of Bloom filter false positives to total checks.
Shown as fraction
cassandra.bytes_flushed.count
(gauge)
The amount of data that was flushed since (re)start.
Shown as byte
cassandra.cas_commit_latency.75th_percentile
(gauge)
The latency of paxos commit round - p75.
Shown as microsecond
cassandra.cas_commit_latency.95th_percentile
(gauge)
The latency of paxos commit round - p95.
Shown as microsecond
cassandra.cas_commit_latency.one_minute_rate
(gauge)
The number of paxos commit rounds per second.
Shown as operation
cassandra.cas_prepare_latency.75th_percentile
(gauge)
The latency of paxos prepare round - p75.
Shown as microsecond
cassandra.cas_prepare_latency.95th_percentile
(gauge)
The latency of paxos prepare round - p95.
Shown as microsecond
cassandra.cas_prepare_latency.one_minute_rate
(gauge)
The number of paxos prepare rounds per second.
Shown as operation
cassandra.cas_propose_latency.75th_percentile
(gauge)
The latency of paxos propose round - p75.
Shown as microsecond
cassandra.cas_propose_latency.95th_percentile
(gauge)
The latency of paxos propose round - p95.
Shown as microsecond
cassandra.cas_propose_latency.one_minute_rate
(gauge)
The number of paxos propose rounds per second.
Shown as operation
cassandra.col_update_time_delta_histogram.75th_percentile
(gauge)
The column update time delta - p75.
Shown as microsecond
cassandra.col_update_time_delta_histogram.95th_percentile
(gauge)
The column update time delta - p95.
Shown as microsecond
cassandra.col_update_time_delta_histogram.min
(gauge)
The column update time delta - min.
Shown as microsecond
cassandra.compaction_bytes_written.count
(gauge)
The amount of data that was compacted since (re)start.
Shown as byte
cassandra.compression_ratio
(gauge)
The compression ratio for all SSTables. Warning: contrary to what the name suggests, a low value means high compression. The formula used is: size of the compressed SSTable / size of the original.
Shown as fraction
cassandra.currently_blocked_tasks
(gauge)
The number of currently blocked tasks for the thread pool.
Shown as task
cassandra.currently_blocked_tasks.count
(gauge)
The number of currently blocked tasks for the thread pool.
Shown as task
cassandra.db.droppable_tombstone_ratio
(gauge)
The estimate of the droppable tombstone ratio.
Shown as fraction
cassandra.dropped.one_minute_rate
(gauge)
The tasks dropped during execution for the thread pool.
Shown as thread
cassandra.exceptions.count
(gauge)
The number of exceptions thrown from 'Storage' metrics.
Shown as error
cassandra.key_cache_hit_rate
(gauge)
The key cache hit rate.
Shown as fraction
cassandra.latency.75th_percentile
(gauge)
The client request latency - p75.
Shown as microsecond
cassandra.latency.95th_percentile
(gauge)
The client request latency - p95.
Shown as microsecond
cassandra.latency.one_minute_rate
(gauge)
The number of client requests.
Shown as request
cassandra.live_disk_space_used.count
(gauge)
The disk space used by "live" SSTables (only counts in use files).
Shown as byte
cassandra.live_ss_table_count
(gauge)
Number of "live" (in use) SSTables.
Shown as file
cassandra.load.count
(gauge)
The disk space used by live data on a node.
Shown as byte
cassandra.max_partition_size
(gauge)
The size of the largest compacted partition.
Shown as byte
cassandra.max_row_size
(gauge)
The size of the largest compacted row.
Shown as byte
cassandra.mean_partition_size
(gauge)
The average size of compacted partitions.
Shown as byte
cassandra.mean_row_size
(gauge)
The average size of compacted rows.
Shown as byte
cassandra.net.down_endpoint_count
(gauge)
The number of unhealthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes.
Shown as node
cassandra.net.up_endpoint_count
(gauge)
The number of healthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes.
Shown as node
cassandra.pending_compactions
(gauge)
The number of pending compactions.
Shown as task
cassandra.pending_flushes.count
(gauge)
The number of pending flushes.
Shown as flush
cassandra.pending_tasks
(gauge)
The number of pending tasks for the thread pool.
Shown as task
cassandra.range_latency.75th_percentile
(gauge)
The local range request latency - p75.
Shown as microsecond
cassandra.range_latency.95th_percentile
(gauge)
The local range request latency - p95.
Shown as microsecond
cassandra.range_latency.one_minute_rate
(gauge)
The number of local range requests.
Shown as request
cassandra.read_latency.75th_percentile
(gauge)
The local read latency - p75.
Shown as microsecond
cassandra.read_latency.95th_percentile
(gauge)
The local read latency - p95.
Shown as microsecond
cassandra.read_latency.99th_percentile
(gauge)
The local read latency - p99.
Shown as microsecond
cassandra.read_latency.one_minute_rate
(gauge)
The number of local read requests.
Shown as read
cassandra.row_cache_hit.count
(gauge)
The number of row cache hits.
Shown as hit
cassandra.row_cache_hit_out_of_range.count
(gauge)
The number of row cache hits that did not satisfy the query filter and went to disk.
Shown as hit
cassandra.row_cache_miss.count
(gauge)
The number of table row cache misses.
Shown as miss
cassandra.snapshots_size
(gauge)
The disk space truly used by snapshots.
Shown as byte
cassandra.ss_tables_per_read_histogram.75th_percentile
(gauge)
The number of SSTable data files accessed per read - p75.
Shown as file
cassandra.ss_tables_per_read_histogram.95th_percentile
(gauge)
The number of SSTable data files accessed per read - p95.
Shown as file
cassandra.timeouts.count
(gauge)
Count of requests not acknowledged within the configurable timeout window.
Shown as timeout
cassandra.timeouts.one_minute_rate
(gauge)
Recent timeout rate, as an exponentially weighted moving average over a one-minute interval.
Shown as timeout
cassandra.tombstone_scanned_histogram.75th_percentile
(gauge)
Number of tombstones scanned per read - p75.
Shown as record
cassandra.tombstone_scanned_histogram.95th_percentile
(gauge)
Number of tombstones scanned per read - p95.
Shown as record
cassandra.total_blocked_tasks
(gauge)
Total blocked tasks
Shown as task
cassandra.total_blocked_tasks.count
(count)
Total count of blocked tasks
Shown as task
cassandra.total_commit_log_size
(gauge)
The size used on disk by commit logs.
Shown as byte
cassandra.total_disk_space_used.count
(gauge)
Total disk space used by SSTables including obsolete ones waiting to be GC'd.
Shown as byte
cassandra.view_lock_acquire_time.75th_percentile
(gauge)
The time taken acquiring a partition lock for materialized view updates - p75.
Shown as microsecond
cassandra.view_lock_acquire_time.95th_percentile
(gauge)
The time taken acquiring a partition lock for materialized view updates - p95.
Shown as microsecond
cassandra.view_lock_acquire_time.one_minute_rate
(gauge)
The number of requests to acquire a partition lock for materialized view updates.
Shown as request
cassandra.view_read_time.75th_percentile
(gauge)
The time taken during the local read of a materialized view update - p75.
Shown as microsecond
cassandra.view_read_time.95th_percentile
(gauge)
The time taken during the local read of a materialized view update - p95.
Shown as microsecond
cassandra.view_read_time.one_minute_rate
(gauge)
The number of local reads for materialized view updates.
Shown as request
cassandra.waiting_on_free_memtable_space.75th_percentile
(gauge)
The time spent waiting for free memtable space either on- or off-heap - p75.
Shown as microsecond
cassandra.waiting_on_free_memtable_space.95th_percentile
(gauge)
The time spent waiting for free memtable space either on- or off-heap - p95.
Shown as microsecond
cassandra.write_latency.75th_percentile
(gauge)
The local write latency - p75.
Shown as microsecond
cassandra.write_latency.95th_percentile
(gauge)
The local write latency - p95.
Shown as microsecond
cassandra.write_latency.99th_percentile
(gauge)
The local write latency - p99.
Shown as microsecond
cassandra.write_latency.one_minute_rate
(gauge)
The number of local write requests.
Shown as write
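
Two of the descriptions above carry interpretation caveats: cassandra.compression_ratio is computed as compressed size / original size, and the cassandra.net.down_endpoint_count and cassandra.net.up_endpoint_count values are per-node views that should not be summed. The short Python sketch below works through both with made-up numbers. The helper names and the node names are hypothetical, and taking the maximum across nodes is only one possible way to combine the views, not a rule set by the integration.

```python
def space_saved(compression_ratio: float) -> float:
    """Fraction of disk space saved, given compressed size / original size."""
    return 1.0 - compression_ratio


def worst_view(per_node_counts: dict) -> int:
    """Largest down-node count reported by any single node.

    Each value is one node's own view of the whole cluster, so the
    views are compared with each other rather than added together.
    """
    return max(per_node_counts.values())


# Made-up readings, for illustration only.
ratio = 0.25  # compressed SSTables take a quarter of the original size
down_views = {"node-a": 1, "node-b": 1, "node-c": 0}

saved = space_saved(ratio)      # 0.75: a low ratio means high compression
worst = worst_view(down_views)  # 1: at most one node is seen as down
naive_sum = sum(down_views.values())  # 2: summing counts the same node twice
```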

Events

The Cassandra check does not include any events.

Service Checks

cassandra.can_connect
Returns CRITICAL if the Agent is unable to connect to the monitored Cassandra instance and collect metrics from it. Otherwise, returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Cassandra Nodetool integration

Cassandra default dashboard

Overview

This check collects metrics for your Cassandra cluster that are not available through the JMX integration. It uses the nodetool utility to collect them.

Setup

Installation

The Cassandra Nodetool check is included in the Datadog Agent package, so you don't need to install anything else on your Cassandra nodes.

Configuration

Follow the instructions below to configure this check for an Agent running on a host. For containerized environments, see the Containerized environment section.

Host

  1. Edit the cassandra_nodetool.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample cassandra_nodetool.d/conf.yaml for all available configuration options:

    init_config:
    
    instances:
      ## @param keyspaces - list of string - required
      ## The list of keyspaces to monitor.
      ## An empty list results in no metrics being sent.
      #
      - keyspaces:
          - "<KEYSPACE_1>"
          - "<KEYSPACE_2>"
    
  2. Restart the Agent.

Log collection

Cassandra Nodetool logs are collected by the Cassandra integration. See the log collection instructions for Cassandra.

Containerized environment

For containerized environments, use the official Prometheus exporter in the pod, and then use Autodiscovery in the Agent to find the pod and query the endpoint.

Validation

Run the Agent's status subcommand and look for cassandra_nodetool under the Checks section.

Data Collected

Metrics

cassandra.nodetool.status.load
(gauge)
Amount of file system data under the cassandra data directory without snapshot content
Shown as byte
cassandra.nodetool.status.owns
(gauge)
Percentage of the data owned by the node per datacenter times the replication factor
Shown as percent
cassandra.nodetool.status.replication_availability
(gauge)
Percentage of data available per keyspace times replication factor
Shown as percent
cassandra.nodetool.status.replication_factor
(gauge)
Replication factor per keyspace
cassandra.nodetool.status.status
(gauge)
Node status: up (1) or down (0)

Events

The Cassandra_nodetool check does not include any events.

Service Checks

cassandra.nodetool.node_up
The Agent sends this service check for each node of the monitored cluster. Returns CRITICAL if the node is down. Otherwise, returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading