Scylla

Agent Check

Supported OS: Linux, macOS, Windows

Overview

This Datadog-Scylla integration collects a majority of the exposed metrics by default, with the ability to customize additional groups based on specific user needs.

Scylla is an open-source NoSQL data store that can act as “a drop-in Apache Cassandra alternative.” It rearchitects the Cassandra model for modern hardware, reducing the size of required clusters while improving theoretical throughput and performance.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host.

Installation

The Scylla check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

  1. Edit the scylla.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Scylla performance data. See the sample scylla.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Log collection

Scylla has different modes of outputting logs depending on the environment it’s running in. See the Scylla documentation for more specifics on how the application generates logs.

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

       logs_enabled: true
  2. Uncomment and edit the logs configuration block in your scylla.d/conf.yaml file. Change the type, path, and service parameter values based on your environment. See the sample scylla.d/conf.yaml for all available configuration options.

       logs:
         - type: file
           path: <LOG_FILE_PATH>
           source: scylla
           service: <SERVICE_NAME>
           #To handle multi line that starts with yyyy-mm-dd use the following pattern
           #log_processing_rules:
           #  - type: multi_line
           #    pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
           #    name: new_log_start_with_date
  3. Restart the Agent.
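
The commented multi_line pattern in the logs block can be tried against sample lines before it is enabled. The following is a minimal Python sketch, assuming the pattern behaves the same way under Python's `re` module as it does in the Agent and that it is applied at the start of each line. The helper names and the sample log lines are hypothetical and are not real Scylla output.

```python
import re

# Pattern copied from the commented log_processing_rules block.
NEW_LOG_START = re.compile(r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")


def starts_new_log(line: str) -> bool:
    """Return True when the line begins with a yyyy-mm-dd date."""
    return NEW_LOG_START.match(line) is not None


def group_multiline(lines):
    """Attach lines that do not start with a date to the preceding entry."""
    entries = []
    for line in lines:
        if starts_new_log(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


# Hypothetical sample lines for illustration only.
sample = [
    "2020-03-14 10:00:00 first message",
    "    continuation of the first message",
    "2020-03-15 11:30:00 second message",
]

for entry in group_multiline(sample):
    print(repr(entry))
```

If the grouping does not match what you expect for your own log format, adjust the pattern before uncommenting the rule.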

See Datadog’s documentation for additional information on how to configure the Agent for log collection in Kubernetes environments.

Validation

Run the Agent’s status subcommand and look for scylla under the Checks section.

Data Collected

Metrics

scylla.alien.receive_batch_queue_length
(gauge)
Current receive batch queue length
scylla.alien.total_received_messages
(count)
Total number of received messages
scylla.alien.total_sent_messages
(count)
Total number of sent messages
scylla.batchlog_manager.total_write_replay_attempts
(count)
Counts write operations issued in a batchlog replay flow. A high value indicates a long batch replay list.
Shown as write
scylla.cache.active_reads
(gauge)
number of currently active reads
Shown as read
scylla.cache.bytes_total
(gauge)
total size of memory for the cache
Shown as byte
scylla.cache.bytes_used
(gauge)
current bytes used by the cache out of the total size of memory
Shown as byte
scylla.cache.concurrent_misses_same_key
(count)
total number of operations with misses on the same key
Shown as miss
scylla.cache.mispopulations
(count)
number of entries not inserted by reads
scylla.cache.partition_evictions
(count)
total number of evicted partitions
Shown as eviction
scylla.cache.partition_hits
(count)
number of partitions needed by reads and found in cache
Shown as hit
scylla.cache.partition_insertions
(count)
total number of partitions added to cache
scylla.cache.partition_merges
(count)
total number of partitions merged
scylla.cache.partition_misses
(count)
number of partitions needed by reads and missing in cache
Shown as miss
scylla.cache.partition_removals
(count)
total number of invalidated partitions
scylla.cache.partitions
(gauge)
total number of cached partitions
scylla.cache.pinned_dirty_memory_overload
(count)
amount of pinned bytes that we tried to unpin over the limit. This should stay constantly at 0, and any nonzero value indicates a bug
Shown as byte
scylla.cache.reads
(count)
number of started reads
Shown as get
scylla.cache.reads_with_misses
(count)
number of reads which had to read from sstables
Shown as read
scylla.cache.row_evictions
(count)
total number of rows evicted from cache
Shown as eviction
scylla.cache.row_hits
(count)
total number of rows needed by reads and found in cache
Shown as hit
scylla.cache.row_insertions
(count)
total number of rows added to cache
Shown as set
scylla.cache.row_misses
(count)
total number of rows needed by reads and missing in cache
Shown as miss
scylla.cache.row_removals
(count)
total number of invalidated rows
Shown as eviction
scylla.cache.rows
(gauge)
total number of cached rows
Shown as row
scylla.cache.rows_dropped_from_memtable
(count)
total number of rows in memtables which were dropped during cache update on memtable flush
Shown as row
scylla.cache.rows_merged_from_memtable
(count)
total number of rows in memtables which were merged with existing rows during cache update on memtable flush
Shown as row
scylla.cache.rows_processed_from_memtable
(count)
total number of rows in memtables which were processed during cache update on memtable flush
Shown as row
scylla.cache.sstable_partition_skips
(count)
number of times sstable reader was fast forwarded across partitions
scylla.cache.sstable_reader_recreations
(count)
number of times sstable reader was recreated due to memtable flush
scylla.cache.sstable_row_skips
(count)
number of times sstable reader was fast forwarded within a partition
Shown as row
scylla.cache.static_row_insertions
(count)
total number of static rows added to cache
Shown as row
scylla.commitlog.alloc
(count)
Counts a number of times a new mutation has been added to a segment. Divide bytes_written by this value to get the average number of bytes per mutation written to the disk.
scylla.commitlog.allocating_segments
(gauge)
Holds the number of not closed segments that still have some free space. This value should not get too high.
scylla.commitlog.bytes_written
(count)
Counts a number of bytes written to the disk. Divide this value by "alloc" to get the average number of bytes per mutation written to the disk.
Shown as byte
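
The division suggested for bytes_written and alloc can be sketched as follows. This is a minimal Python illustration, assuming both inputs are increases of the two counters over the same collection interval. The function name and the sample numbers are hypothetical.

```python
def avg_bytes_per_mutation(bytes_written_delta: float, alloc_delta: float):
    """Average bytes written to disk per mutation over one interval.

    Returns None when no mutations were added in the interval, so the
    caller never divides by zero.
    """
    if alloc_delta <= 0:
        return None
    return bytes_written_delta / alloc_delta


# Hypothetical increases of scylla.commitlog.bytes_written and
# scylla.commitlog.alloc over the same interval.
print(avg_bytes_per_mutation(4096.0, 16.0))
```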
scylla.commitlog.cycle
(count)
Counts a number of commitlog write cycles - when the data is written from the internal memory buffer to the disk.
scylla.commitlog.disk_total_bytes
(gauge)
Holds the size of disk space, in bytes, used for data so far. An excessively high value indicates a bottleneck in the path that writes to sstables.
Shown as byte
scylla.commitlog.flush
(count)
Counts a number of times the flush() method was called for a file.
scylla.commitlog.flush_limit_exceeded
(count)
Counts the number of times a flush limit was exceeded. A non-zero value indicates that there are too many pending flush operations (see pending_flushes) and that some of them will be blocked until the total number of pending flush operations drops below 5.
scylla.commitlog.memory_buffer_bytes
(gauge)
Holds the total number of bytes in internal memory buffers.
Shown as byte
scylla.commitlog.pending_allocations
(gauge)
Holds a number of currently pending allocations. A non-zero value indicates that we have a bottleneck in the disk write flow.
scylla.commitlog.pending_flushes
(gauge)
Holds a number of currently pending flushes. See the related flush_limit_exceeded metric.
scylla.commitlog.requests_blocked_memory
(count)
Counts a number of requests blocked due to memory pressure. A non-zero value indicates that the commitlog memory quota is not enough to serve the required amount of requests.
Shown as request
scylla.commitlog.segments
(gauge)
Holds the current number of segments.
scylla.commitlog.slack
(count)
Counts a number of unused bytes written to the disk due to disk segment alignment.
Shown as byte
scylla.commitlog.unused_segments
(gauge)
Holds the current number of unused segments. A non-zero value indicates that the disk write path became temporarily slow.
scylla.compaction_manager.compactions
(gauge)
Holds the number of currently active compactions.
scylla.cql.authorized_prepared_statements_cache_evictions
(count)
Counts a number of authenticated prepared statements cache entries evictions.
scylla.cql.authorized_prepared_statements_cache_size
(gauge)
A number of entries in the authenticated prepared statements cache.
scylla.cql.batches
(count)
Counts a total number of CQL BATCH requests.
scylla.cql.batches_pure_logged
(count)
Counts a total number of LOGGED batches that were executed as LOGGED batches.
scylla.cql.batches_pure_unlogged
(count)
Counts a total number of UNLOGGED batches that were executed as UNLOGGED batches.
scylla.cql.batches_unlogged_from_logged
(count)
Counts a total number of LOGGED batches that were executed as UNLOGGED batches.
scylla.cql.deletes
(count)
Counts a total number of CQL DELETE requests.
scylla.cql.filtered_read_requests
(count)
Counts a total number of CQL read requests that required ALLOW FILTERING. See filtered_rows_read_total to compare how many rows needed to be filtered.
scylla.cql.filtered_rows_dropped_total
(count)
Counts the number of rows read during CQL requests that required ALLOW FILTERING and were dropped by the filter. A number similar to filtered_rows_read_total indicates that filtering is not accurate and might cause performance degradation.
Shown as row
scylla.cql.filtered_rows_matched_total
(count)
Counts the number of rows read during CQL requests that required ALLOW FILTERING and were accepted by the filter. A number similar to filtered_rows_read_total indicates that filtering is accurate.
Shown as row
scylla.cql.filtered_rows_read_total
(count)
Counts the total number of rows read during CQL requests that required ALLOW FILTERING. See filtered_rows_matched_total and filtered_rows_dropped_total for information on how accurate filtering queries are.
Shown as row
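
One way to read the three filtered_rows_* counters together is to express the matched rows as a fraction of the rows read. The following is a minimal Python sketch, assuming both inputs are increases over the same interval. The function name and the sample numbers are hypothetical.

```python
def filtering_accuracy(rows_matched_delta: float, rows_read_delta: float):
    """Fraction of rows read under ALLOW FILTERING that the filter accepted.

    A value near 1.0 suggests accurate filtering; a value near 0.0 suggests
    most rows were read only to be dropped. Returns None when nothing was read.
    """
    if rows_read_delta <= 0:
        return None
    return rows_matched_delta / rows_read_delta


# Hypothetical increases of scylla.cql.filtered_rows_matched_total and
# scylla.cql.filtered_rows_read_total over the same interval.
print(filtering_accuracy(25.0, 100.0))
```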
scylla.cql.inserts
(count)
Counts a total number of CQL INSERT requests.
scylla.cql.prepared_cache_evictions
(count)
Counts a number of prepared statements cache entries evictions.
Shown as eviction
scylla.cql.prepared_cache_memory_footprint
(gauge)
Size (in bytes) of the prepared statements cache.
Shown as byte
scylla.cql.prepared_cache_size
(gauge)
A number of entries in the prepared statements cache.
scylla.cql.reads
(count)
Counts a total number of CQL read requests.
Shown as read
scylla.cql.reverse_queries
(count)
Counts number of CQL SELECT requests with ORDER BY DESC.
Shown as query
scylla.cql.rows_read
(count)
Counts a total number of rows read during CQL requests.
Shown as row
scylla.cql.secondary_index_creates
(count)
Counts a total number of CQL CREATE INDEX requests.
Shown as query
scylla.cql.secondary_index_drops
(count)
Counts a total number of CQL DROP INDEX requests.
Shown as query
scylla.cql.secondary_index_reads
(count)
Counts a total number of CQL read requests performed using secondary indexes.
Shown as query
scylla.cql.secondary_index_rows_read
(count)
Counts a total number of rows read during CQL requests performed using secondary indexes.
Shown as row
scylla.cql.statements_in_batches
(count)
Counts a total number of sub-statements in CQL BATCH requests.
Shown as query
scylla.cql.unpaged_select_queries
(count)
Counts number of unpaged CQL SELECT requests.
Shown as query
scylla.cql.updates
(count)
Counts a total number of CQL UPDATE requests.
Shown as query
scylla.cql.user_prepared_auth_cache_footprint
(gauge)
Size (in bytes) of the authenticated prepared statements cache.
Shown as byte
scylla.database.active_reads
(gauge)
Holds the number of currently active read operations.
Shown as read
scylla.database.active_reads_memory_consumption
(gauge)
Holds the amount of memory consumed by currently active read operations. If this value gets close to 2474639 we are likely to start dropping new read requests. In that case sstable_read_queue_overloads is going to get a non-zero value.
Shown as byte
scylla.database.clustering_filter_count
(count)
Counts bloom filter invocations.
scylla.database.clustering_filter_fast_path_count
(count)
Counts the number of times bloom filtering took a shortcut to include all sstables when only one full range was specified.
scylla.database.clustering_filter_sstables_checked
(count)
Counts sstables checked after applying the bloom filter. High value indicates that bloom filter is not very efficient.
scylla.database.clustering_filter_surviving_sstables
(count)
Counts sstables that survived the clustering key filtering. A high value indicates that the bloom filter is not very efficient and that reads still have to access a lot of sstables to get data.
scylla.database.counter_cell_lock_acquisition
(count)
The number of acquired counter cell locks.
scylla.database.counter_cell_lock_pending
(gauge)
The number of counter updates waiting for a lock.
scylla.database.dropped_view_updates
(count)
Counts the number of view updates that have been dropped due to cluster overload.
scylla.database.large_partition_exceeding_threshold
(count)
Number of large partitions exceeding compaction_large_partition_warning_threshold_mb. Large partitions have performance impact and should be avoided; check the documentation for details.
scylla.database.multishard_query_failed_reader_saves
(count)
The number of times the saving of a shard reader failed.
scylla.database.multishard_query_failed_reader_stops
(count)
The number of times the stopping of a shard reader failed.
scylla.database.multishard_query_unpopped_bytes
(count)
The total number of bytes that were extracted from the shard reader but were unconsumed by the query and moved back into the reader.
Shown as byte
scylla.database.multishard_query_unpopped_fragments
(count)
The total number of fragments that were extracted from the shard reader but were unconsumed by the query and moved back into the reader.
scylla.database.paused_reads
(gauge)
The number of currently active reads that are temporarily paused.
Shown as read
scylla.database.paused_reads_permit_based_evictions
(count)
The number of paused reads evicted to free up permits. Permits are required for new reads to start, and the database will evict paused reads (if any) to be able to admit new ones if there is a shortage of permits.
Shown as eviction
scylla.database.querier_cache_drops
(count)
Counts querier cache lookups that found a cached querier but had to drop it due to position mismatch
Shown as eviction
scylla.database.querier_cache_lookups
(count)
Counts querier cache lookups (paging queries)
Shown as get
scylla.database.querier_cache_memory_based_evictions
(count)
Counts querier cache entries that were evicted because the memory usage of the cached queriers was above the limit.
Shown as eviction
scylla.database.querier_cache_misses
(count)
Counts querier cache lookups that failed to find a cached querier
Shown as miss
scylla.database.querier_cache_population
(gauge)
The number of entries currently in the querier cache.
scylla.database.querier_cache_resource_based_evictions
(count)
Counts querier cache entries that were evicted to free up resources (limited by reader concurrency limits) necessary to create new readers.
Shown as eviction
scylla.database.querier_cache_time_based_evictions
(count)
Counts querier cache entries that timed out and were evicted.
Shown as eviction
scylla.database.queued_reads
(gauge)
Holds the number of currently queued read operations.
Shown as read
scylla.database.requests_blocked_memory
(count)
Counts the number of requests blocked due to reaching the memory quota (27839692B). A non-zero value indicates that the bottleneck is memory, more specifically the memory quota allocated for the "database" component.
Shown as request
scylla.database.requests_blocked_memory_current
(gauge)
Holds the current number of requests blocked due to reaching the memory quota (27839692B). Non-zero value indicates that our bottleneck is memory and more specifically - the memory quota allocated for the "database" component.
Shown as request
scylla.database.short_data_queries
(count)
The rate of data queries (data or digest reads) that returned fewer rows than requested due to result size limiting.
Shown as query
scylla.database.short_mutation_queries
(count)
The rate of mutation queries that returned fewer rows than requested due to result size limiting.
Shown as query
scylla.database.sstable_read_queue_overloads
(count)
Counts the number of times the sstable read queue was overloaded. A non-zero value indicates that we have to drop read requests because they arrive faster than we can serve them.
scylla.database.total_reads
(count)
Counts the total number of successful reads on this shard.
Shown as read
scylla.database.total_reads_failed
(count)
Counts the total number of failed read operations. Add the total_reads to this value to get the total amount of reads issued on this shard.
Shown as read
scylla.database.total_result_bytes
(gauge)
Holds the current amount of memory used for results.
Shown as byte
scylla.database.total_view_updates_failed_local
(count)
Total number of view updates generated for tables and failed to be applied locally.
scylla.database.total_view_updates_failed_remote
(count)
Total number of view updates generated for tables and failed to be sent to remote replicas.
scylla.database.total_view_updates_pushed_local
(count)
Total number of view updates generated for tables and applied locally.
scylla.database.total_view_updates_pushed_remote
(count)
Total number of view updates generated for tables and sent to remote replicas.
scylla.database.total_writes
(count)
Counts the total number of successful write operations performed by this shard.
Shown as write
scylla.database.total_writes_failed
(count)
Counts the total number of failed write operations. A sum of this value plus total_writes represents a total amount of writes attempted on this shard.
Shown as write
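
The sum described for total_writes and total_writes_failed can be turned into a failure ratio. This is a minimal Python sketch, assuming both inputs are increases over the same interval on one shard. The function name and the sample numbers are hypothetical.

```python
def write_failure_ratio(writes_ok_delta: float, writes_failed_delta: float):
    """Return (attempted, failure_ratio) for one interval.

    attempted is successful plus failed writes; failure_ratio is None when
    no writes were attempted.
    """
    attempted = writes_ok_delta + writes_failed_delta
    if attempted <= 0:
        return attempted, None
    return attempted, writes_failed_delta / attempted


# Hypothetical increases of scylla.database.total_writes and
# scylla.database.total_writes_failed over the same interval.
attempted, ratio = write_failure_ratio(990.0, 10.0)
print(attempted, ratio)
```

The same arithmetic applies to total_reads and total_reads_failed.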
scylla.database.total_writes_timedout
(count)
Counts write operations failed due to a timeout. A positive value is a sign of storage being overloaded.
Shown as write
scylla.database.view_building_paused
(count)
Counts the number of times view building process was paused (e.g. due to node unavailability).
scylla.database.view_update_backlog
(count)
Holds the current size in bytes of the pending view updates for all tables
Shown as byte
scylla.execution_stages.function_calls_enqueued
(count)
Counts function calls added to execution stages queues
scylla.execution_stages.function_calls_executed
(count)
Counts function calls executed by execution stages
scylla.execution_stages.tasks_preempted
(count)
Counts tasks which were preempted before executing all queued operations
scylla.execution_stages.tasks_scheduled
(count)
Counts tasks scheduled by execution stages
scylla.gossip.heart_beat
(count)
Heartbeat of the current Node.
scylla.hints.for_views_manager_corrupted_files
(count)
Number of hints files that were discarded during sending because the file was corrupted.
Shown as file
scylla.hints.for_views_manager_discarded
(count)
Number of hints that were discarded during sending (too old, schema changed, etc.).
scylla.hints.for_views_manager_dropped
(count)
Number of dropped hints.
scylla.hints.for_views_manager_errors
(count)
Number of errors during hints writes.
Shown as error
scylla.hints.for_views_manager_sent
(count)
Number of sent hints.
scylla.hints.for_views_manager_size_of_hints_in_progress
(gauge)
Size of hinted mutations that are scheduled to be written.
scylla.hints.for_views_manager_written
(count)
Number of successfully written hints.
scylla.hints.manager_corrupted_files
(count)
Number of hints files that were discarded during sending because the file was corrupted.
scylla.hints.manager_discarded
(count)
Number of hints that were discarded during sending (too old, schema changed, etc.).
scylla.hints.manager_dropped
(count)
Number of dropped hints.
scylla.hints.manager_errors
(count)
Number of errors during hints writes.
Shown as error
scylla.hints.manager_sent
(count)
Number of sent hints.
scylla.hints.manager_size_of_hints_in_progress
(gauge)
Size of hinted mutations that are scheduled to be written.
scylla.hints.manager_written
(count)
Number of successfully written hints.
scylla.httpd.connections_current
(gauge)
The current number of open connections
Shown as connection
scylla.httpd.connections_total
(count)
The total number of connections opened
Shown as connection
scylla.httpd.read_errors
(count)
The total number of errors while reading http requests
Shown as error
scylla.httpd.reply_errors
(count)
The total number of errors while replying to http
Shown as error
scylla.httpd.requests_served
(count)
The total number of http requests served
Shown as request
scylla.io_queue.delay
(gauge)
total delay time in the queue
scylla.io_queue.queue_length
(gauge)
Number of requests in the queue
Shown as request
scylla.io_queue.shares
(gauge)
current amount of shares
scylla.io_queue.total_bytes
(count)
Total bytes passed in the queue
Shown as byte
scylla.io_queue.total_operations
(count)
Total operations passed in the queue
Shown as byte
scylla.lsa.free_space
(gauge)
Holds a current amount of free memory that is under lsa control.
Shown as byte
scylla.lsa.large_objects_total_space_bytes
(gauge)
Holds a current size of allocated non-LSA memory.
Shown as byte
scylla.lsa.memory_allocated
(count)
Counts number of bytes which were requested from LSA allocator.
Shown as byte
scylla.lsa.memory_compacted
(count)
Counts number of bytes which were copied as part of segment compaction.
Shown as byte
scylla.lsa.non_lsa_used_space_bytes
(gauge)
Holds a current amount of used non-LSA memory.
Shown as byte
scylla.lsa.occupancy
(gauge)
Holds the current portion (in percent) of the used memory.
Shown as percent
scylla.lsa.segments_compacted
(count)
Counts a number of compacted segments.
scylla.lsa.segments_migrated
(count)
Counts a number of migrated segments.
scylla.lsa.small_objects_total_space_bytes
(gauge)
Holds a current size of "small objects" memory region in bytes.
Shown as byte
scylla.lsa.small_objects_used_space_bytes
(gauge)
Holds a current amount of used "small objects" memory in bytes.
Shown as byte
scylla.lsa.total_space_bytes
(gauge)
Holds a current size of allocated memory in bytes.
Shown as byte
scylla.lsa.used_space_bytes
(gauge)
Holds a current amount of used memory in bytes.
Shown as byte
scylla.memory.allocated_memory
(count)
Allocated memory size in bytes
Shown as byte
scylla.memory.cross_cpu_free_operations
(count)
Total number of cross-CPU free operations
scylla.memory.dirty_bytes
(gauge)
Holds the current size of all ("regular", "system", and "streaming") non-free memory in bytes: used memory plus released memory that hasn't been returned to a free memory pool yet. Total memory size minus this value represents the amount of available memory. If this value minus virtual_dirty_bytes is too high, dirty memory eviction is lagging behind.
Shown as byte
scylla.memory.free_memory
(count)
Free memory size in bytes
Shown as byte
scylla.memory.free_operations
(count)
Total number of free operations
scylla.memory.malloc_live_objects
(gauge)
Number of live objects
scylla.memory.malloc_operations
(count)
Total number of malloc operations
scylla.memory.reclaims_operations
(count)
Total reclaims operations
scylla.memory.regular_dirty_bytes
(gauge)
Holds the current size of all non-free memory in bytes: used memory plus released memory that hasn't been returned to a free memory pool yet. Total memory size minus this value represents the amount of available memory. If this value minus virtual_dirty_bytes is too high, dirty memory eviction is lagging behind.
Shown as byte
scylla.memory.regular_virtual_dirty_bytes
(gauge)
Holds the size of used memory in bytes. Compare it to "dirty_bytes" to see how much memory is wasted (neither used nor available).
Shown as byte
scylla.memory.streaming_dirty_bytes
(gauge)
Holds the current size of all non-free memory in bytes: used memory plus released memory that hasn't been returned to a free memory pool yet. Total memory size minus this value represents the amount of available memory. If this value minus virtual_dirty_bytes is too high, dirty memory eviction is lagging behind.
Shown as byte
scylla.memory.streaming_virtual_dirty_bytes
(gauge)
Holds the size of used memory in bytes. Compare it to "dirty_bytes" to see how much memory is wasted (neither used nor available).
Shown as byte
scylla.memory.system_dirty_bytes
(gauge)
Holds the current size of all non-free memory in bytes: used memory plus released memory that hasn't been returned to a free memory pool yet. Total memory size minus this value represents the amount of available memory. If this value minus virtual_dirty_bytes is too high, dirty memory eviction is lagging behind.
Shown as byte
scylla.memory.system_virtual_dirty_bytes
(gauge)
Holds the size of used memory in bytes. Compare it to "dirty_bytes" to see how much memory is wasted (neither used nor available).
Shown as byte
scylla.memory.total_memory
(count)
Total memory size in bytes
Shown as byte
scylla.memory.virtual_dirty_bytes
(gauge)
Holds the size of all ("regular", "system", and "streaming") used memory in bytes. Compare it to "dirty_bytes" to see how much memory is wasted (neither used nor available).
Shown as byte
scylla.memtables.pending_flushes
(gauge)
Holds the current number of memtables that are currently being flushed to sstables. High value in this metric may be an indication of storage being a bottleneck.
scylla.memtables.pending_flushes_bytes
(gauge)
Holds the current number of bytes in memtables that are currently being flushed to sstables. High value in this metric may be an indication of storage being a bottleneck.
Shown as byte
scylla.node.operation_mode
(gauge)
The operation mode of the current node. UNKNOWN = 0; STARTING = 1; JOINING = 2; NORMAL = 3; LEAVING = 4; DECOMMISSIONED = 5; DRAINING = 6; DRAINED = 7; MOVING = 8
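
The numeric values reported by scylla.node.operation_mode can be mapped back to their names when building alerts or reports. The following is a minimal Python sketch using the value table from the description; the class and function names are hypothetical.

```python
from enum import IntEnum


class OperationMode(IntEnum):
    """Operation modes as listed in the metric description."""
    UNKNOWN = 0
    STARTING = 1
    JOINING = 2
    NORMAL = 3
    LEAVING = 4
    DECOMMISSIONED = 5
    DRAINING = 6
    DRAINED = 7
    MOVING = 8


def describe_operation_mode(value: float) -> str:
    """Translate a gauge value into its mode name."""
    try:
        return OperationMode(int(value)).name
    except ValueError:
        # Value outside the documented 0-8 range.
        return "UNRECOGNIZED"


print(describe_operation_mode(3.0))
```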
scylla.query_processor.queries
(count)
Counts queries by consistency level.
Shown as query
scylla.query_processor.statements_prepared
(count)
Counts a total number of parsed CQL requests.
scylla.reactor.aio_bytes_read
(count)
Total aio-reads bytes
Shown as byte
scylla.reactor.aio_bytes_write
(count)
Total aio-writes bytes
Shown as byte
scylla.reactor.aio_errors
(count)
Total aio errors
Shown as error
scylla.reactor.aio_reads
(count)
Total aio-reads operations
Shown as read
scylla.reactor.aio_writes
(count)
Total aio-writes operations
Shown as write
scylla.reactor.cpp_exceptions
(count)
Total number of C++ exceptions
scylla.reactor.cpu_busy_ms
(count)
Total cpu busy time in milliseconds
Shown as millisecond
scylla.reactor.cpu_steal_time_ms
(count)
Total steal time: the time in which some other process was running while Seastar was not trying to run (not sleeping). Because this is measured in userspace, some time that could legitimately be thought of as steal time is not accounted as such, for example, when we are sleeping and can wake up but the kernel hasn't woken us up yet.
Shown as millisecond
scylla.reactor.fstream_read_bytes
(count)
Counts bytes read from disk file streams. A high rate indicates high disk activity. Divide by fstream_reads to determine average read size.
Shown as byte
scylla.reactor.fstream_read_bytes_blocked
(count)
Counts the number of bytes read from disk that could not be satisfied from read-ahead buffers and had to block. Indicates short streams or incorrect read-ahead configuration.
Shown as byte
scylla.reactor.fstream_reads
(count)
Counts reads from disk file streams. A high rate indicates high disk activity. Contrast with other fstream_read* counters to locate bottlenecks.
Shown as read
scylla.reactor.fstream_reads_ahead_bytes_discarded
(count)
Counts the number of buffered bytes that were read ahead of time and were discarded because they were not needed, wasting disk bandwidth. Indicates over-eager read-ahead configuration.
Shown as byte
scylla.reactor.fstream_reads_aheads_discarded
(count)
Counts the number of times a buffer that was read ahead of time was discarded because it was not needed, wasting disk bandwidth. Indicates over-eager read-ahead configuration.
scylla.reactor.fstream_reads_blocked
(count)
Counts the number of times a disk read could not be satisfied from read-ahead buffers and had to block. Indicates short streams or incorrect read-ahead configuration.
scylla.reactor.fsyncs
(count)
Total number of fsync operations
scylla.reactor.io_queue_requests
(gauge)
Number of requests in the io queue
Shown as request
scylla.reactor.io_threaded_fallbacks
(count)
Total number of io-threaded-fallbacks operations
scylla.reactor.logging_failures
(count)
Total number of logging failures
scylla.reactor.polls
(count)
Number of times pollers were executed
scylla.reactor.tasks_pending
(gauge)
Number of pending tasks in the queue
Shown as task
scylla.reactor.tasks_processed
(count)
Total tasks processed
Shown as task
scylla.reactor.timers_pending
(count)
Number of tasks in the timer-pending queue
Shown as task
scylla.reactor.utilization
(gauge)
CPU utilization
Shown as percent
scylla.scheduler.queue_length
(gauge)
Size of backlog on this queue, in tasks; indicates whether the queue is busy and/or contended
scylla.scheduler.runtime_ms
(count)
Accumulated runtime of this task queue; an increment rate of 1000ms per second indicates full utilization
Shown as millisecond
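
The rule of thumb for runtime_ms (1000 ms of accumulated runtime per second means full utilization) can be written as a ratio. This is a minimal Python sketch, assuming the input is the increase of runtime_ms for a single task queue over a known interval. The function name and the sample numbers are hypothetical.

```python
def queue_utilization(runtime_ms_delta: float, interval_seconds: float) -> float:
    """Fraction of wall-clock time a task queue spent running.

    1.0 corresponds to 1000 ms of runtime accumulated per second.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return runtime_ms_delta / (interval_seconds * 1000.0)


# Hypothetical increase of scylla.scheduler.runtime_ms over a 15-second interval.
print(queue_utilization(7500.0, 15.0))
```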
scylla.scheduler.shares
(gauge)
Shares allocated to this queue
scylla.scheduler.tasks_processed
(count)
Count of tasks executed on this queue; together with runtime_ms, indicates the length of tasks
scylla.scheduler.time_spent_on_task_quota_violations_ms
(count)
Total amount in milliseconds we were in violation of the task quota
Shown as millisecond
scylla.sstables.capped_local_deletion_time
(count)
Was local deletion time capped at maximum allowed value in Statistics
Shown as time
scylla.sstables.capped_tombstone_deletion_time
(count)
Was partition tombstone deletion time capped at maximum allowed value
Shown as time
scylla.sstables.cell_tombstone_writes
(count)
Number of cell tombstones written
scylla.sstables.cell_writes
(count)
Number of cells written
scylla.sstables.index_page_blocks
(count)
Index page requests which needed to wait due to page not being loaded yet
Shown as request
scylla.sstables.index_page_hits
(count)
Index page requests which could be satisfied without waiting
Shown as request
scylla.sstables.index_page_misses
(count)
Index page requests which initiated a read from disk
Shown as request
scylla.sstables.partition_reads
(count)
Number of partitions read
Shown as read
scylla.sstables.partition_seeks
(count)
Number of partitions seeked
scylla.sstables.partition_writes
(count)
Number of partitions written
Shown as write
scylla.sstables.range_partition_reads
(count)
Number of partition range flat mutation reads
Shown as read
scylla.sstables.range_tombstone_writes
(count)
Number of range tombstones written
scylla.sstables.row_reads
(count)
Number of rows read
Shown as row
scylla.sstables.row_writes
(count)
Number of clustering rows written
Shown as row
scylla.sstables.single_partition_reads
(count)
Number of single partition flat mutation reads
Shown as read
scylla.sstables.sstable_partition_reads
(count)
Number of whole sstable flat mutation reads
Shown as read
scylla.sstables.static_row_writes
(count)
Number of static rows written
Shown as row
scylla.sstables.tombstone_writes
(count)
Number of tombstones written
scylla.storage.proxy.coordinator_background_read_repairs
(count)
number of background read repairs
Shown as read
scylla.storage.proxy.coordinator_background_reads
(gauge)
number of currently pending background read requests
Shown as read
scylla.storage.proxy.coordinator_background_replica_writes_failed_local_node
(count)
number of replica writes that timed out or failed after CL was reached on a local Node
scylla.storage.proxy.coordinator_background_write_bytes
(count)
number of bytes in pending background write requests
Shown as byte
scylla.storage.proxy.coordinator_background_writes
(gauge)
number of currently pending background write requests
Shown as write
scylla.storage.proxy.coordinator_background_writes_failed
(count)
number of write requests that failed after CL was reached
Shown as write
scylla.storage.proxy.coordinator_canceled_read_repairs
(count)
number of global read repairs canceled due to a concurrent write
Shown as read
scylla.storage.proxy.coordinator_completed_reads_local_node
(count)
number of data read requests that completed on a local Node
Shown as read
scylla.storage.proxy.coordinator_current_throttled_base_writes
(gauge)
number of currently throttled base replica write requests
Shown as write
scylla.storage.proxy.coordinator_current_throttled_writes
(gauge)
number of currently throttled write requests
Shown as write
scylla.storage.proxy.coordinator_foreground_read_repair
(count)
number of foreground read repairs
Shown as read
scylla.storage.proxy.coordinator_foreground_reads
(gauge)
number of currently pending foreground read requests
Shown as read
scylla.storage.proxy.coordinator_foreground_writes
(gauge)
number of currently pending foreground write requests
Shown as write
scylla.storage.proxy.coordinator_last_mv_flow_control_delay
(gauge)
delay (in seconds) added for MV flow control in the last request
Shown as second
scylla.storage.proxy.coordinator_queued_write_bytes
(count)
number of bytes in pending write requests
Shown as byte
scylla.storage.proxy.coordinator_range_timeouts
(count)
number of range read operations that failed due to a timeout
Shown as error
scylla.storage.proxy.coordinator_range_unavailable
(count)
number of range read operations that failed due to an "unavailable" error
Shown as error
scylla.storage.proxy.coordinator_read_errors_local_node
(count)
number of data read requests that failed on a local Node
Shown as error
scylla.storage.proxy.coordinator_read_latency.count
(count)
The general read latency histogram
Shown as read
scylla.storage.proxy.coordinator_read_latency.sum
(gauge)
The general read latency histogram
Shown as read
scylla.storage.proxy.coordinator_read_repair_write_attempts_local_node
(count)
number of write operations in a read repair context on a local Node
Shown as write
scylla.storage.proxy.coordinator_read_retries
(count)
number of read retry attempts
Shown as read
scylla.storage.proxy.coordinator_read_timeouts
(count)
number of read requests that failed due to a timeout
Shown as error
scylla.storage.proxy.coordinator_read_unavailable
(count)
number of read requests that failed due to an "unavailable" error
Shown as error
scylla.storage.proxy.coordinator_reads_local_node
(count)
number of data read requests on a local Node
Shown as read
scylla.storage.proxy.coordinator_speculative_data_reads
(count)
number of speculative data read requests that were sent
Shown as read
scylla.storage.proxy.coordinator_speculative_digest_reads
(count)
number of speculative digest read requests that were sent
Shown as read
scylla.storage.proxy.coordinator_throttled_writes
(count)
number of throttled write requests
Shown as write
scylla.storage.proxy.coordinator_total_write_attempts_local_node
(count)
total number of write requests on a local Node
Shown as write
scylla.storage.proxy.coordinator_write_errors_local_node
(count)
number of write requests that failed on a local Node
Shown as error
scylla.storage.proxy.coordinator_write_latency.count
(count)
The general write latency histogram
Shown as write
scylla.storage.proxy.coordinator_write_latency.sum
(gauge)
The general write latency histogram
Shown as write
scylla.storage.proxy.coordinator_write_timeouts
(count)
number of write requests that failed due to a timeout
Shown as error
scylla.storage.proxy.coordinator_write_unavailable
(count)
number of write requests that failed due to an "unavailable" error
Shown as error
scylla.storage.proxy.replica_cross_shard_ops
(count)
number of operations that crossed a shard boundary
scylla.storage.proxy.replica_forwarded_mutations
(count)
number of mutations forwarded to other replica Nodes
scylla.storage.proxy.replica_forwarding_errors
(count)
number of errors during forwarding mutations to other replica Nodes
Shown as error
scylla.storage.proxy.replica_reads
(count)
number of remote data read requests this Node received
Shown as read
scylla.storage.proxy.replica_received_counter_updates
(count)
number of counter updates received by this node acting as an update leader
scylla.storage.proxy.replica_received_mutations
(count)
number of mutations received by a replica Node
scylla.streaming.total_incoming_bytes
(count)
Total number of bytes received on this shard.
Shown as byte
scylla.streaming.total_outgoing_bytes
(count)
Total number of bytes sent on this shard.
Shown as byte
scylla.thrift.current_connections
(gauge)
Holds the current number of open Thrift connections.
Shown as connection
scylla.thrift.served
(count)
Rate of serving Thrift requests.
scylla.thrift.thrift_connections
(count)
Rate of creation of new Thrift connections.
Shown as connection
scylla.tracing.active_sessions
(gauge)
Holds the number of currently active tracing sessions.
scylla.tracing.cached_records
(gauge)
Holds the number of tracing records cached in the tracing sessions that are not going to be written in the next write event. If the sum of this metric, pending_for_write_records, and flushing_records is close to 11000, tracing records are likely to start being dropped.
scylla.tracing.dropped_records
(count)
Counts the number of records dropped due to too many pending records. A high value indicates that the backend is saturated by the rate at which new tracing records are created.
scylla.tracing.dropped_sessions
(count)
Counts the number of sessions dropped due to too many pending sessions or records. A high value indicates that the backend is saturated by the rate at which new tracing records are created.
scylla.tracing.flushing_records
(gauge)
Holds the number of tracing records that are currently being written to the I/O backend. If the sum of this metric, cached_records, and pending_for_write_records is close to 11000, tracing records are likely to start being dropped.
scylla.tracing.keyspace_helper_bad_column_family_errors
(count)
Counts the number of times a write failed because one of the tables in the system_traces keyspace has an incompatible schema. One error may result in one or more tracing records being lost. A non-zero value indicates that the administrator has to take immediate steps to fix the corresponding schema. The appropriate error message is printed in the syslog.
Shown as error
scylla.tracing.keyspace_helper_tracing_errors
(count)
Counts the number of errors while writing to the system_traces keyspace. One error may cause one or more tracing records to be lost.
Shown as error
scylla.tracing.pending_for_write_records
(gauge)
Holds the number of tracing records that are going to be written in the next write event. If the sum of this metric, cached_records, and flushing_records is close to 11000, tracing records are likely to start being dropped.
scylla.tracing.trace_errors
(count)
Counts the number of trace records dropped due to an error (e.g. OOM).
Shown as error
scylla.tracing.trace_records_count
(count)
Rate at which tracing records are generated.
scylla.transport.cql_connections
(count)
Counts the number of client connections.
Shown as connection
scylla.transport.current_connections
(gauge)
Holds the current number of client connections.
Shown as connection
scylla.transport.requests_blocked_memory
(count)
Holds an incrementing counter of the requests that have ever blocked due to reaching the memory quota limit (12373196B). The first derivative of this value shows how often requests block due to memory exhaustion in the "CQL transport" component.
Shown as request
scylla.transport.requests_blocked_memory_current
(gauge)
Holds the number of requests that are currently blocked due to reaching the memory quota limit (12373196B). A non-zero value indicates that memory is the bottleneck, specifically the memory quota allocated for the "CQL transport" component.
Shown as request
scylla.transport.requests_served
(count)
Counts the number of served requests.
Shown as request
scylla.transport.requests_serving
(gauge)
Holds the number of requests that are being processed right now.
Shown as request
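
The descriptions of scylla.tracing.cached_records, scylla.tracing.pending_for_write_records, and scylla.tracing.flushing_records all refer to the same condition: when their sum approaches 11000, tracing records are likely to be dropped. The following is a minimal sketch of that arithmetic, not part of the integration. The function names and the warn_ratio cutoff are hypothetical, since the descriptions do not say how close "close to" is.

```python
# Threshold quoted in the tracing metric descriptions above.
TRACING_RECORD_LIMIT = 11000


def tracing_backlog(cached: float, pending_for_write: float, flushing: float) -> float:
    """Sum the three gauges that the metric descriptions say to add together."""
    return cached + pending_for_write + flushing


def is_near_limit(backlog: float, warn_ratio: float = 0.9) -> bool:
    """Return True when the backlog is at or above warn_ratio of the limit.

    warn_ratio is an assumed cutoff for "close to 11000"; the source
    does not define one, so tune it for your environment.
    """
    return backlog >= warn_ratio * TRACING_RECORD_LIMIT


if __name__ == "__main__":
    # Example values standing in for the latest gauge readings.
    backlog = tracing_backlog(cached=4000, pending_for_write=3000, flushing=2000)
    print(backlog, is_near_limit(backlog))
```

The inputs would be the most recent values of the three gauges for a given node. A rising scylla.tracing.dropped_records count is the direct sign that the limit has already been exceeded.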

Service Checks

scylla.prometheus.health: Returns CRITICAL if the Agent cannot reach the metrics endpoints, OK otherwise.
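
The logic of this service check can be illustrated with a short sketch. This is not the Agent's implementation: prometheus_health and the fetch_metrics callable are hypothetical names, and the only facts taken as given are the rule stated above and Datadog's service check status codes (0 for OK, 2 for CRITICAL).

```python
from typing import Callable

# Datadog service check status codes.
OK = 0
CRITICAL = 2


def prometheus_health(fetch_metrics: Callable[[], str]) -> int:
    """Return CRITICAL if the metrics endpoint cannot be reached, OK otherwise.

    fetch_metrics is a hypothetical callable that performs the request
    and raises an exception on any connection or HTTP failure.
    """
    try:
        fetch_metrics()
    except Exception:
        return CRITICAL
    return OK
```

Passing the fetch step in as a callable keeps the status rule separate from the HTTP client, so the rule can be exercised without a running Scylla node.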

Events

The Scylla check does not include any events.

Troubleshooting

Need help? Contact Datadog support.