ClickHouse

Agent Check

Supported OS: Linux, Mac OS, Windows

Overview

This check monitors ClickHouse through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates documentation for guidance on applying these instructions.

Installation

The ClickHouse check is included in the Datadog Agent package, so you do not need to install anything else on your server.

Configuration

Host

Metric collection

  1. To start collecting your ClickHouse performance data, edit the clickhouse.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample clickhouse.d/conf.yaml for all available configuration options, and the minimal sketch after these steps.

  2. Restart the Agent.
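
As a reference, a minimal clickhouse.d/conf.yaml instance could look like the sketch below. It assumes a server reachable on ClickHouse's default native TCP port 9000; the option names (server, port, username, password) match those shown in the Autodiscovery template further below, and the connection values are placeholders to adapt to your environment.

    init_config:

    instances:
        ## Connection settings for the ClickHouse server to monitor.
        ## Replace these placeholder values with ones for your environment.
      - server: localhost
        port: 9000
        username: "<USERNAME>"
        password: "<PASSWORD>"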

Log collection
  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
  2. Add the log files you are interested in to your clickhouse.d/conf.yaml file to start collecting your ClickHouse logs:

     logs:
       - type: file
         path: /var/log/clickhouse-server/clickhouse-server.log
         source: clickhouse
         service: "<SERVICE_NAME>"

    Change the path and service parameter values and configure them for your environment. See the sample clickhouse.d/conf.yaml for all available configuration options; a sketch with an additional log file follows these steps.

  3. Restart the Agent.
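
If your deployment also writes ClickHouse's separate error log, you can list it as an additional entry in the same logs section. This is only a sketch: the error log path below assumes ClickHouse's default configuration and should be adjusted to your setup.

     logs:
       - type: file
         path: /var/log/clickhouse-server/clickhouse-server.log
         source: clickhouse
         service: "<SERVICE_NAME>"
       ## Assumed default location of the ClickHouse error log; adjust if needed.
       - type: file
         path: /var/log/clickhouse-server/clickhouse-server.err.log
         source: clickhouse
         service: "<SERVICE_NAME>"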

Containerized

See the Autodiscovery Integration Templates documentation for guidance on applying the parameters below to a containerized environment.

Metric collection

Parameter | Value
<INTEGRATION_NAME> | clickhouse
<INIT_CONFIG> | blank or {}
<INSTANCE_CONFIG> | {"server": "%%host%%", "port": "%%port%%", "username": "<USERNAME>", "password": "<PASSWORD>"}
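
As an illustration, on Kubernetes these parameters are typically supplied as Autodiscovery pod annotations. The sketch below assumes the container running ClickHouse is named clickhouse and uses an example image; adjust the names, image, and credentials for your deployment.

    apiVersion: v1
    kind: Pod
    metadata:
      name: clickhouse
      annotations:
        ## Autodiscovery template values from the table above, keyed by container name.
        ad.datadoghq.com/clickhouse.check_names: '["clickhouse"]'
        ad.datadoghq.com/clickhouse.init_configs: '[{}]'
        ad.datadoghq.com/clickhouse.instances: '[{"server": "%%host%%", "port": "%%port%%", "username": "<USERNAME>", "password": "<PASSWORD>"}]'
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server  # example image (assumption)
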
Log collection

Collecting logs is disabled by default in the Datadog Agent. To enable it, see the Kubernetes Log Collection documentation.

Parameter | Value
<LOG_CONFIG> | {"source": "clickhouse", "service": "<SERVICE_NAME>"}
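
Following the same pattern, the log configuration can be provided as a pod annotation, again assuming a container named clickhouse:

    metadata:
      annotations:
        ad.datadoghq.com/clickhouse.logs: '[{"source": "clickhouse", "service": "<SERVICE_NAME>"}]'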

Validation

Run the Agent's status subcommand (datadog-agent status) and look for clickhouse under the Checks section.

Data Collected

Metrics

clickhouse.background_pool.processing.task.active
(gauge)
The number of active tasks in BackgroundProcessingPool (merges, mutations, fetches, or replication queue bookkeeping)
Shown as task
clickhouse.background_pool.move.task.active
(gauge)
The number of active tasks in BackgroundProcessingPool for moves
Shown as task
clickhouse.background_pool.schedule.task.active
(gauge)
The number of active tasks in BackgroundSchedulePool. This pool is used for periodic ReplicatedMergeTree tasks, like cleaning old data parts, altering data parts, replica re-initialization, etc.
Shown as task
clickhouse.cache_dictionary.update_queue.batches
(gauge)
Number of 'batches' (a set of keys) in update queue in CacheDictionaries.
clickhouse.cache_dictionary.update_queue.keys
(gauge)
Exact number of keys in update queue in CacheDictionaries.
Shown as key
clickhouse.thread.lock.context.waiting
(gauge)
The number of threads waiting for a lock in Context. This is a global lock.
Shown as thread
clickhouse.query.insert.delayed
(gauge)
The number of INSERT queries that are throttled due to a high number of active data parts for partition in a MergeTree table.
Shown as query
clickhouse.dictionary.request.cache
(gauge)
The number of in-flight requests to data sources of dictionaries of cache type.
Shown as request
clickhouse.merge.disk.reserved
(gauge)
Disk space reserved for currently running background merges. It is slightly more than the total size of currently merging parts.
Shown as byte
clickhouse.table.distributed.file.insert.pending
(gauge)
The number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed.
Shown as file
clickhouse.table.distributed.connection.inserted
(gauge)
The number of connections to remote servers sending data that was INSERTed into Distributed tables, in both synchronous and asynchronous mode.
Shown as connection
clickhouse.zk.node.ephemeral
(gauge)
The number of ephemeral nodes held in ZooKeeper.
Shown as node
clickhouse.thread.global.total
(gauge)
The number of threads in global thread pool.
Shown as thread
clickhouse.thread.global.active
(gauge)
The number of threads in global thread pool running a task.
Shown as thread
clickhouse.connection.http
(gauge)
The number of connections to HTTP server
Shown as connection
clickhouse.connection.interserver
(gauge)
The number of connections from other replicas to fetch parts
Shown as connection
clickhouse.replica.leader.election
(gauge)
The number of replicas participating in leader election. Usually equals the total number of replicas.
Shown as shard
clickhouse.table.replicated.leader
(gauge)
The number of Replicated tables that are leaders. The leader replica is responsible for assigning merges, cleaning old blocks for deduplication, and a few other bookkeeping tasks. There may be no more than one leader across all replicas at any moment in time. If there is no leader, one will be elected soon, or its absence indicates an issue.
Shown as table
clickhouse.thread.local.total
(gauge)
The number of threads in local thread pools. Should be similar to GlobalThreadActive.
Shown as thread
clickhouse.thread.local.active
(gauge)
The number of threads in local thread pools running a task.
Shown as thread
clickhouse.query.memory
(gauge)
Total amount of memory allocated in currently executing queries. Note that some memory allocations may not be accounted.
Shown as byte
clickhouse.merge.memory
(gauge)
Total amount of memory allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks.
Shown as byte
clickhouse.background_pool.processing.memory
(gauge)
Total amount of memory allocated in the background processing pool (dedicated to background merges, mutations, and fetches). Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks.
Shown as byte
clickhouse.background_pool.move.memory
(gauge)
Total amount of memory (bytes) allocated in the background processing pool (dedicated to background moves). Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks.
Shown as byte
clickhouse.background_pool.schedule.memory
(gauge)
Total amount of memory allocated in background schedule pool (that is dedicated for bookkeeping tasks of Replicated tables).
Shown as byte
clickhouse.merge.active
(gauge)
The number of executing background merges
Shown as merge
clickhouse.file.open.read
(gauge)
The number of files open for reading
Shown as file
clickhouse.file.open.write
(gauge)
The number of files open for writing
Shown as file
clickhouse.query.mutation
(gauge)
The number of mutations (ALTER DELETE/UPDATE)
Shown as query
clickhouse.query.active
(gauge)
The number of executing queries
Shown as query
clickhouse.query.waiting
(gauge)
The number of queries that are stopped and waiting due to the 'priority' setting.
Shown as query
clickhouse.thread.query
(gauge)
The number of query processing threads
Shown as thread
clickhouse.thread.lock.rw.active.read
(gauge)
The number of threads holding read lock in a table RWLock.
Shown as thread
clickhouse.thread.lock.rw.active.write
(gauge)
The number of threads holding write lock in a table RWLock.
Shown as thread
clickhouse.thread.lock.rw.waiting.read
(gauge)
The number of threads waiting for read on a table RWLock.
Shown as thread
clickhouse.thread.lock.rw.waiting.write
(gauge)
The number of threads waiting for write on a table RWLock.
Shown as thread
clickhouse.syscall.read
(gauge)
The number of in-flight read (read, pread, io_getevents, etc.) syscalls
Shown as read
clickhouse.table.replicated.readonly
(gauge)
The number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured.
Shown as table
clickhouse.table.replicated.part.check
(gauge)
The number of data parts checking for consistency
Shown as item
clickhouse.table.replicated.part.fetch
(gauge)
The number of data parts being fetched from replica
Shown as item
clickhouse.table.replicated.part.send
(gauge)
The number of data parts being sent to replicas
Shown as item
clickhouse.connection.send.external
(gauge)
The number of connections that are sending data for external tables to remote servers. External tables are used to implement GLOBAL IN and GLOBAL JOIN operators with distributed subqueries.
Shown as connection
clickhouse.connection.send.scalar
(gauge)
The number of connections that are sending data for scalars to remote servers.
Shown as connection
clickhouse.table.buffer.size
(gauge)
Size of buffers of Buffer tables
Shown as byte
clickhouse.table.buffer.row
(gauge)
The number of rows in buffers of Buffer tables
Shown as row
clickhouse.connection.mysql
(gauge)
Number of client connections using MySQL protocol
Shown as connection
clickhouse.connection.tcp
(gauge)
The number of connections to TCP server (clients with native interface)
Shown as connection
clickhouse.syscall.write
(gauge)
The number of in-flight write (write, pwrite, io_getevents, etc.) syscalls
Shown as write
clickhouse.zk.request
(gauge)
The number of in-flight requests to ZooKeeper.
Shown as request
clickhouse.zk.connection
(gauge)
The number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to the lack of linearizability (stale reads) that ZooKeeper's consistency model allows.
Shown as connection
clickhouse.zk.watch
(gauge)
The number of watches (event subscriptions) in ZooKeeper.
Shown as event
clickhouse.lock.context.acquisition.count
(count)
The number of times the lock of Context was acquired or an acquisition was attempted during the last interval. This is a global lock.
Shown as event
clickhouse.lock.context.acquisition.total
(gauge)
The total number of times the lock of Context was acquired or an acquisition was attempted. This is a global lock.
Shown as event
clickhouse.syscall.write.wait
(gauge)
The percentage of time spent waiting for write syscalls during the last interval. This includes writes to the page cache.
Shown as percent
clickhouse.file.open.count
(count)
The number of files opened during the last interval.
Shown as file
clickhouse.file.open.total
(gauge)
The total number of files opened.
Shown as file
clickhouse.query.count
(count)
The number of queries to be interpreted and potentially executed during the last interval. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
Shown as query
clickhouse.query.total
(gauge)
The total number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
Shown as query
clickhouse.file.read.count
(count)
The number of reads (read/pread) from a file descriptor during the last interval. Does not include sockets.
Shown as read
clickhouse.file.read.total
(gauge)
The total number of reads (read/pread) from a file descriptor. Does not include sockets.
Shown as read
clickhouse.thread.process_time
(gauge)
The percentage of time spent in processing threads (for queries and other tasks) during the last interval.
Shown as percent
clickhouse.query.insert.count
(count)
The number of INSERT queries to be interpreted and potentially executed during the last interval. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
Shown as query
clickhouse.query.insert.total
(gauge)
The total number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
Shown as query
clickhouse.query.select.count
(count)
The number of SELECT queries to be interpreted and potentially executed during the last interval. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
Shown as query
clickhouse.query.select.total
(gauge)
The total number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
Shown as query
clickhouse.thread.system.process_time
(gauge)
The percentage of time processing threads (for queries and other tasks) spent executing CPU instructions in OS kernel space during the last interval. This includes time the CPU pipeline was stalled due to cache misses, branch mispredictions, hyper-threading, etc.
Shown as percent
clickhouse.thread.user.process_time
(gauge)
The percentage of time processing threads (for queries and other tasks) spent executing CPU instructions in user space during the last interval. This includes time the CPU pipeline was stalled due to cache misses, branch mispredictions, hyper-threading, etc.
Shown as percent
clickhouse.file.write.count
(count)
The number of writes (write/pwrite) to a file descriptor during the last interval. Does not include sockets.
Shown as write
clickhouse.file.write.total
(gauge)
The total number of writes (write/pwrite) to a file descriptor. Does not include sockets.
Shown as write
clickhouse.file.write.size.count
(count)
The number of bytes written to file descriptors during the last interval. If the file is compressed, this will show compressed data size.
Shown as byte
clickhouse.file.write.size.total
(gauge)
The total number of bytes written to file descriptors. If the file is compressed, this will show compressed data size.
Shown as byte
clickhouse.node.remove.count
(count)
The number of times an error happened while trying to remove an ephemeral node during the last interval. This is usually not an issue, because ClickHouse's implementation of the ZooKeeper library guarantees that the session will expire and the node will be removed.
Shown as error
clickhouse.node.remove.total
(gauge)
The total number of times an error happened while trying to remove an ephemeral node. This is usually not an issue, because ClickHouse's implementation of the ZooKeeper library guarantees that the session will expire and the node will be removed.
Shown as error
clickhouse.buffer.write.discard.count
(count)
The number of stack traces dropped by the query profiler or signal handler because the pipe was full or could not be written to during the last interval.
Shown as error
clickhouse.buffer.write.discard.total
(gauge)
The total number of stack traces dropped by the query profiler or signal handler because the pipe was full or could not be written to.
Shown as error
clickhouse.compilation.attempt.count
(count)
The number of times a compilation of generated C++ code was initiated during the last interval.
Shown as event
clickhouse.compilation.attempt.total
(gauge)
The total number of times a compilation of generated C++ code was initiated.
Shown as event
clickhouse.compilation.size.count
(count)
The number of bytes used for expressions compilation during the last interval.
Shown as byte
clickhouse.compilation.size.total
(gauge)
The total number of bytes used for expressions compilation.
Shown as byte
clickhouse.compilation.time
(gauge)
The percentage of time spent for compilation of expressions to LLVM code during the last interval.
Shown as percent
clickhouse.compilation.llvm.attempt.count
(count)
The number of times a compilation of generated LLVM code (to create fused function for complex expressions) was initiated during the last interval.
Shown as event
clickhouse.compilation.llvm.attempt.total
(gauge)
The total number of times a compilation of generated LLVM code (to create fused function for complex expressions) was initiated.
Shown as event
clickhouse.compilation.success.count
(count)
The number of times a compilation of generated C++ code was successful during the last interval.
Shown as event
clickhouse.compilation.success.total
(gauge)
The total number of times a compilation of generated C++ code was successful.
Shown as event
clickhouse.compilation.function.execute.count
(count)
The number of times a compiled function was executed during the last interval.
Shown as execution
clickhouse.compilation.function.execute.total
(gauge)
The total number of times a compiled function was executed.
Shown as execution
clickhouse.connection.http.create.count
(count)
The number of created HTTP connections (closed or opened) during the last interval.
Shown as connection
clickhouse.connection.http.create.total
(gauge)
The total number of created HTTP connections (closed or opened).
Shown as connection
clickhouse.table.mergetree.insert.delayed.count
(count)
The number of times the INSERT of a block to a MergeTree table was throttled due to a high number of active data parts for partition during the last interval.
Shown as throttle
clickhouse.table.mergetree.insert.delayed.total
(gauge)
The total number of times the INSERT of a block to a MergeTree table was throttled due to a high number of active data parts for partition.
Shown as throttle
clickhouse.table.mergetree.insert.delayed.time
(gauge)
The percentage of time spent while the INSERT of a block to a MergeTree table was throttled due to a high number of active data parts for partition during the last interval.
Shown as percent
clickhouse.syscall.read.wait
(gauge)
The percentage of time spent waiting for read syscalls during the last interval. This includes reads from the page cache.
Shown as percent
clickhouse.table.mergetree.replicated.insert.deduplicate.count
(count)
The number of times the INSERTed block to a ReplicatedMergeTree table was deduplicated during the last interval.
Shown as operation
clickhouse.table.mergetree.replicated.insert.deduplicate.total
(gauge)
The total number of times the INSERTed block to a ReplicatedMergeTree table was deduplicated.
Shown as operation
clickhouse.table.insert.size.count
(count)
The number of bytes (uncompressed; for columns as they are stored in memory) INSERTed to all tables during the last interval.
Shown as byte
clickhouse.table.insert.size.total
(gauge)
The total number of bytes (uncompressed; for columns as they are stored in memory) INSERTed to all tables.
Shown as byte
clickhouse.table.insert.row.count
(count)
The number of rows INSERTed to all tables during the last interval.
Shown as row
clickhouse.table.insert.row.total
(gauge)
The total number of rows INSERTed to all tables.
Shown as row
clickhouse.table.mergetree.replicated.leader.elected.count
(count)
The number of times a ReplicatedMergeTree table became a leader during the last interval. Leader replica is responsible for assigning merges, cleaning old blocks for deduplications and a few more bookkeeping tasks.
Shown as event
clickhouse.table.mergetree.replicated.leader.elected.total
(gauge)
The total number of times a ReplicatedMergeTree table became a leader. Leader replica is responsible for assigning merges, cleaning old blocks for deduplications and a few more bookkeeping tasks.
Shown as event
clickhouse.merge.count
(count)
The number of launched background merges during the last interval.
Shown as merge
clickhouse.merge.total
(gauge)
The total number of launched background merges.
Shown as merge
clickhouse.table.mergetree.insert.block.count
(count)
The number of blocks INSERTed to MergeTree tables during the last interval. Each block forms a data part of level zero.
Shown as block
clickhouse.table.mergetree.insert.block.total
(gauge)
The total number of blocks INSERTed to MergeTree tables. Each block forms a data part of level zero.
Shown as block
clickhouse.table.mergetree.insert.block.already_sorted.count
(count)
The number of blocks INSERTed to MergeTree tables that appeared to be already sorted during the last interval.
Shown as block
clickhouse.table.mergetree.insert.block.already_sorted.total
(gauge)
The total number of blocks INSERTed to MergeTree tables that appeared to be already sorted.
Shown as block
clickhouse.table.mergetree.insert.write.size.compressed.count
(count)
The number of bytes written to filesystem for data INSERTed to MergeTree tables during the last interval.
Shown as byte
clickhouse.table.mergetree.insert.write.size.compressed.total
(gauge)
The total number of bytes written to filesystem for data INSERTed to MergeTree tables.
Shown as byte
clickhouse.table.mergetree.insert.row.count
(count)
The number of rows INSERTed to MergeTree tables during the last interval.
Shown as row
clickhouse.table.mergetree.insert.row.total
(gauge)
The total number of rows INSERTed to MergeTree tables.
Shown as row
clickhouse.table.mergetree.insert.write.size.uncompressed.count
(count)
The number of uncompressed bytes (for columns as they are stored in memory) INSERTed to MergeTree tables during the last interval.
Shown as byte
clickhouse.table.mergetree.insert.write.size.uncompressed.total
(gauge)
The total number of uncompressed bytes (for columns as they are stored in memory) INSERTed to MergeTree tables.
Shown as byte
clickhouse.merge.row.read.count
(count)
The number of rows read for background merges during the last interval. This is the number of rows before merge.
Shown as row
clickhouse.merge.row.read.total
(gauge)
The total number of rows read for background merges. This is the number of rows before merge.
Shown as row
clickhouse.merge.read.size.uncompressed.count
(count)
The number of uncompressed bytes (for columns as they are stored in memory) that were read for background merges during the last interval. This is the number before the merge.
Shown as byte
clickhouse.merge.read.size.uncompressed.total
(gauge)
The total number of uncompressed bytes (for columns as they are stored in memory) that were read for background merges. This is the number before the merge.
Shown as byte
clickhouse.merge.time
(gauge)
The percentage of time spent for background merges during the last interval.
Shown as percent
clickhouse.cpu.time
(gauge)
The percentage of CPU time used, as seen by the OS, during the last interval. Does not include involuntary waits due to virtualization.
Shown as percent
clickhouse.thread.cpu.wait
(gauge)
The percentage of time a thread was ready for execution but waiting to be scheduled by OS (from the OS point of view) during the last interval.
Shown as percent
clickhouse.thread.io.wait
(gauge)
The percentage of time a thread spent waiting for a result of IO operation (from the OS point of view) during the last interval. This is real IO that doesn't include page cache.
Shown as percent
clickhouse.disk.read.size.count
(count)
The number of bytes read from disks or block devices during the last interval. Doesn't include bytes read from page cache. May include excessive data due to block size, readahead, etc.
Shown as byte
clickhouse.disk.read.size.total
(gauge)
The total number of bytes read from disks or block devices. Doesn't include bytes read from page cache. May include excessive data due to block size, readahead, etc.
Shown as byte
clickhouse.fs.read.size.count
(count)
The number of bytes read from filesystem (including page cache) during the last interval.
Shown as byte
clickhouse.fs.read.size.total
(gauge)
The total number of bytes read from filesystem (including page cache).
Shown as byte
clickhouse.disk.write.size.count
(count)
The number of bytes written to disks or block devices during the last interval. Doesn't include bytes that are in page cache dirty pages. May not include data that was written by OS asynchronously.
Shown as byte
clickhouse.disk.write.size.total
(gauge)
The total number of bytes written to disks or block devices. Doesn't include bytes that are in page cache dirty pages. May not include data that was written by OS asynchronously.
Shown as byte
clickhouse.fs.write.size.count
(count)
The number of bytes written to filesystem (including page cache) during the last interval.
Shown as byte
clickhouse.fs.write.size.total
(gauge)
The total number of bytes written to filesystem (including page cache).
Shown as byte
clickhouse.query.mask.match.count
(count)
The number of times query masking rules were successfully matched during the last interval.
Shown as occurrence
clickhouse.query.mask.match.total
(gauge)
The total number of times query masking rules were successfully matched.
Shown as occurrence
clickhouse.query.signal.dropped.count
(count)
The number of times the processing of a signal was dropped due to overrun plus the number of signals that the OS has not delivered due to overrun during the last interval.
Shown as occurrence
clickhouse.query.signal.dropped.total
(gauge)
The total number of times the processing of a signal was dropped due to overrun plus the number of signals that the OS has not delivered due to overrun.
Shown as occurrence
clickhouse.query.read.backoff.count
(count)
The number of times the number of query processing threads was lowered due to slow reads during the last interval.
Shown as occurrence
clickhouse.query.read.backoff.total
(gauge)
The total number of times the number of query processing threads was lowered due to slow reads.
Shown as occurrence
clickhouse.file.read.size.count
(count)
The number of bytes read from file descriptors during the last interval. If the file is compressed, this will show the compressed data size.
Shown as byte
clickhouse.file.read.size.total
(gauge)
The total number of bytes read from file descriptors. If the file is compressed, this will show the compressed data size.
Shown as byte
clickhouse.file.read.fail.count
(count)
The number of times a read (read/pread) from a file descriptor failed during the last interval.
Shown as read
clickhouse.file.read.fail.total
(gauge)
The total number of times a read (read/pread) from a file descriptor failed.
Shown as read
clickhouse.compilation.regex.count
(count)
The number of regular expressions compiled during the last interval. Identical regular expressions are compiled just once and cached forever.
Shown as event
clickhouse.compilation.regex.total
(gauge)
The total number of regular expressions compiled. Identical regular expressions are compiled just once and cached forever.
Shown as event
clickhouse.table.mergetree.insert.block.rejected.count
(count)
The number of times the INSERT of a block to a MergeTree table was rejected with a `Too many parts` exception due to a high number of active data parts for partition during the last interval.
Shown as block
clickhouse.table.mergetree.insert.block.rejected.total
(gauge)
The total number of times the INSERT of a block to a MergeTree table was rejected with a `Too many parts` exception due to a high number of active data parts for partition.
Shown as block
clickhouse.table.replicated.leader.yield.count
(count)
The number of times a Replicated table yielded its leadership due to large replication lag relative to other replicas during the last interval.
Shown as event
clickhouse.table.replicated.leader.yield.total
(gauge)
The total number of times a Replicated table yielded its leadership due to large replication lag relative to other replicas.
Shown as event
clickhouse.table.replicated.part.loss.count
(count)
The number of times a needed data part did not exist on any replica (even replicas that are currently offline) during the last interval. Those data parts are definitely lost. This is normal with asynchronous replication (if quorum inserts were not enabled): the replica on which the data part was written failed, and when it came back online it no longer contained that data part.
Shown as item
clickhouse.table.replicated.part.loss.total
(gauge)
The total number of times a needed data part did not exist on any replica (even replicas that are currently offline). Those data parts are definitely lost. This is normal with asynchronous replication (if quorum inserts were not enabled): the replica on which the data part was written failed, and when it came back online it no longer contained that data part.
Shown as item
clickhouse.table.mergetree.replicated.fetch.replica.count
(count)
The number of times a data part was downloaded from a replica of a ReplicatedMergeTree table during the last interval.
Shown as fetch
clickhouse.table.mergetree.replicated.fetch.replica.total
(gauge)
The total number of times a data part was downloaded from a replica of a ReplicatedMergeTree table.
Shown as fetch
clickhouse.table.mergetree.replicated.fetch.merged.count
(count)
The number of times ClickHouse preferred to download an already merged part from a replica of a ReplicatedMergeTree table instead of performing the merge itself (it usually prefers merging itself to save network traffic) during the last interval. This happens when ClickHouse does not have all source parts to perform a merge or when the data part is old enough.
Shown as fetch
clickhouse.table.mergetree.replicated.fetch.merged.total
(gauge)
The total number of times ClickHouse preferred to download an already merged part from a replica of a ReplicatedMergeTree table instead of performing the merge itself (it usually prefers merging itself to save network traffic). This happens when ClickHouse does not have all source parts to perform a merge or when the data part is old enough.
Shown as fetch
clickhouse.file.seek.count
(count)
The number of times the `lseek` function was called during the last interval.
Shown as operation
clickhouse.file.seek.total
(gauge)
The total number of times the `lseek` function was called.
Shown as operation
clickhouse.table.mergetree.mark.selected.count
(count)
The number of marks (index granules) selected to read from a MergeTree table during the last interval.
Shown as index
clickhouse.table.mergetree.mark.selected.total
(gauge)
The total number of marks (index granules) selected to read from a MergeTree table.
Shown as index
clickhouse.table.mergetree.part.selected.count
(count)
The number of data parts selected to read from a MergeTree table during the last interval.
Shown as item
clickhouse.table.mergetree.part.selected.total
(gauge)
The total number of data parts selected to read from a MergeTree table.
Shown as item
clickhouse.table.mergetree.range.selected.count
(count)
The number of non-adjacent ranges in all data parts selected to read from a MergeTree table during the last interval.
Shown as item
clickhouse.table.mergetree.range.selected.total
(gauge)
The total number of non-adjacent ranges in all data parts selected to read from a MergeTree table.
Shown as item
clickhouse.file.read.slow.count
(count)
The number of reads from a file that were slow during the last interval. This indicates system overload. Thresholds are controlled by read_backoff_* settings.
Shown as read
clickhouse.file.read.slow.total
(gauge)
The total number of reads from a file that were slow. This indicates system overload. Thresholds are controlled by read_backoff_* settings.
Shown as read
clickhouse.query.sleep.time
(gauge)
The percentage of time a query was sleeping to conform to the `max_network_bandwidth` setting during the last interval.
Shown as percent
clickhouse.file.write.fail.count
(count)
The number of times a write (write/pwrite) to a file descriptor failed during the last interval.
Shown as write
clickhouse.file.write.fail.total
(gauge)
The total number of times a write (write/pwrite) to a file descriptor failed.
Shown as write
clickhouse.CompiledExpressionCacheCount
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.table.mergetree.storage.mark.cache
(gauge)
The size of the cache of `marks` for StorageMergeTree.
Shown as byte
clickhouse.MarkCacheFiles
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.part.max
(gauge)
The maximum number of active parts in partitions.
Shown as item
clickhouse.database.total
(gauge)
The current number of databases.
Shown as instance
clickhouse.table.total
(gauge)
The current number of tables.
Shown as table
clickhouse.replica.delay.absolute
(gauge)
The maximum replica queue delay relative to current time.
Shown as millisecond
clickhouse.ReplicasMaxInsertsInQueue
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.ReplicasMaxMergesInQueue
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.ReplicasMaxQueueSize
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.replica.delay.relative
(gauge)
The maximum difference of absolute delay from any other replica.
Shown as millisecond
clickhouse.ReplicasSumInsertsInQueue
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.ReplicasSumMergesInQueue
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.replica.queue.size
(gauge)
The number of replication tasks in queue.
Shown as task
clickhouse.UncompressedCacheBytes
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as byte
clickhouse.UncompressedCacheCells
(gauge)
(EXPERIMENTAL) This metric will be renamed in a future minor release.
Shown as item
clickhouse.uptime
(gauge)
The amount of time ClickHouse has been active.
Shown as second
clickhouse.jemalloc.active
(gauge)
(EXPERIMENTAL)
Shown as byte
clickhouse.jemalloc.allocated
(gauge)
The amount of memory allocated by ClickHouse.
Shown as byte
clickhouse.jemalloc.background_thread.num_runs
(gauge)
(EXPERIMENTAL)
Shown as byte
clickhouse.jemalloc.background_thread.num_threads
(gauge)
(EXPERIMENTAL)
Shown as thread
clickhouse.jemalloc.background_thread.run_interval
(gauge)
(EXPERIMENTAL)
Shown as byte
clickhouse.jemalloc.mapped
(gauge)
The amount of memory in active extents mapped by the allocator.
Shown as byte
clickhouse.jemalloc.metadata
(gauge)
The amount of memory dedicated to metadata, which comprise base allocations used for bootstrap-sensitive allocator metadata structures and internal allocations.
Shown as byte
clickhouse.jemalloc.metadata_thp
(gauge)
(EXPERIMENTAL)
Shown as byte
clickhouse.jemalloc.resident
(gauge)
The amount of memory in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages.
Shown as byte
clickhouse.jemalloc.retained
(gauge)
The amount of memory in virtual memory mappings that were retained rather than being returned to the operating system.
Shown as byte
clickhouse.dictionary.memory.used
(gauge)
The total amount of memory used by a dictionary.
Shown as byte
clickhouse.dictionary.item.current
(gauge)
The number of items stored in a dictionary.
Shown as item
clickhouse.dictionary.load
(gauge)
The percentage filled in a dictionary (for a hashed dictionary, the percentage filled in the hash table).
Shown as percent
clickhouse.table.mergetree.size
(gauge)
The total size of all data part files of a MergeTree table.
Shown as byte
clickhouse.table.mergetree.part.current
(gauge)
The total number of data parts of a MergeTree table.
Shown as object
clickhouse.table.mergetree.row.current
(gauge)
The total number of rows in a MergeTree table.
Shown as row
clickhouse.table.replicated.part.future
(gauge)
The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
Shown as item
clickhouse.table.replicated.part.suspect
(gauge)
The number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged.
Shown as item
clickhouse.table.replicated.version
(gauge)
Version number of the table structure indicating how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
Shown as operation
clickhouse.table.replicated.queue.size
(gauge)
Size of the queue for operations waiting to be performed. Operations include inserting blocks of data, merges, and certain other actions. It usually coincides with `clickhouse.table.replicated.part.future`.
Shown as operation
clickhouse.table.replicated.queue.insert
(gauge)
The number of inserts of blocks of data that need to be made. Insertions are usually replicated fairly quickly. If this number is large, it means something is wrong.
Shown as operation
clickhouse.table.replicated.queue.merge
(gauge)
The number of merges waiting to be made. Sometimes merges are lengthy, so this value may be greater than zero for a long time.
Shown as merge
clickhouse.table.replicated.log.max
(gauge)
Maximum entry number in the log of general activity.
Shown as item
clickhouse.table.replicated.log.pointer
(gauge)
Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. If this is much smaller than `clickhouse.table.replicated.log.max`, something is wrong.
Shown as item
clickhouse.table.replicated.total
(gauge)
The total number of known replicas of this table.
Shown as table
clickhouse.table.replicated.active
(gauge)
The number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas).
Shown as table
clickhouse.read.compressed.block.count
(count)
The number of compressed blocks (blocks of data that are compressed independently of each other) read from compressed sources (files, network) during the last interval.
Shown as block
clickhouse.read.compressed.block.total
(gauge)
The total number of compressed blocks (blocks of data that are compressed independently of each other) read from compressed sources (files, network).
Shown as block
clickhouse.read.compressed.raw.size.count
(count)
The number of uncompressed bytes (the number of bytes after decompression) read from compressed sources (files, network) during the last interval.
Shown as byte
clickhouse.read.compressed.raw.size.total
(gauge)
The total number of uncompressed bytes (the number of bytes after decompression) read from compressed sources (files, network).
Shown as byte
clickhouse.read.compressed.size.count
(count)
The number of bytes (the number of bytes before decompression) read from compressed sources (files, network) during the last interval.
Shown as byte
clickhouse.read.compressed.size.total
(gauge)
The total number of bytes (the number of bytes before decompression) read from compressed sources (files, network).
Shown as byte
clickhouse.table.mergetree.replicated.fetch.replica.fail.count
(count)
The number of times a data part failed to download from a replica of a ReplicatedMergeTree table during the last interval.
Shown as byte
clickhouse.table.mergetree.replicated.fetch.replica.fail.total
(gauge)
The total number of times a data part failed to download from a replica of a ReplicatedMergeTree table.
Shown as byte
clickhouse.table.mergetree.replicated.merge.count
(count)
The number of times data parts of ReplicatedMergeTree tables were successfully merged during the last interval.
Shown as byte
clickhouse.table.mergetree.replicated.merge.total
(gauge)
The total number of times data parts of ReplicatedMergeTree tables were successfully merged.
Shown as byte

Service Checks

clickhouse.can_connect:
Returns CRITICAL if the Agent cannot connect to the monitored ClickHouse database. Otherwise, returns OK.

Events

The ClickHouse check does not include any events.

Troubleshooting

Need help? Contact Datadog support.