ArangoDB

Supported OS: Linux, Windows, macOS

Integration version: 2.2.2

Overview

This check monitors ArangoDB through the Datadog Agent. ArangoDB 3.8 and above are supported.

Enable the Datadog-ArangoDB integration to:

  • Identify slow queries based on user-defined thresholds.
  • Understand the impact of a long request and troubleshoot latency issues.
  • Monitor underlying RocksDB memory, disk, and cache limits.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates to apply these instructions.

Installation

The ArangoDB check is included in the Datadog Agent package.

Configuration

  1. Edit the arangodb.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your ArangoDB performance data. See the sample arangodb.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
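As a rough illustration of step 1, the Python sketch below renders a minimal instance configuration that could be saved as arangodb.d/conf.yaml. The `openmetrics_endpoint` key and the endpoint URL are assumptions that this page does not state, and `render_conf` is a hypothetical helper; confirm the option names against the sample arangodb.d/conf.yaml before using the output.

```python
# Sketch: render a minimal arangodb.d/conf.yaml as text.
# ASSUMPTIONS (not stated on this page): the `openmetrics_endpoint` option name
# and the default URL below. Verify both against the sample conf.yaml.

ASSUMED_ENDPOINT = "http://localhost:8529/_admin/metrics/v2"


def render_conf(endpoint: str = ASSUMED_ENDPOINT) -> str:
    """Return YAML text with a single instance pointing at `endpoint`."""
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - openmetrics_endpoint: {endpoint}\n"
    )


if __name__ == "__main__":
    # Print the rendered configuration so it can be reviewed before saving.
    print(render_conf())
```

Printing the text rather than writing it to disk keeps the sketch independent of where the Agent's configuration directory lives on a given host.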

Validation

Run the Agent’s status subcommand and look for arangodb under the Checks section.
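The validation step can also be scripted. The sketch below assumes the Agent CLI is available as `datadog-agent` on the PATH (it may need elevated privileges on some hosts) and that the status output lists each check name at the start of a line; `check_is_listed` and `agent_status` are hypothetical helper names, not part of the Agent.

```python
import subprocess


def check_is_listed(status_output: str, check_name: str = "arangodb") -> bool:
    """Return True if any line of the status text starts with the check name."""
    for line in status_output.splitlines():
        if line.strip().lower().startswith(check_name.lower()):
            return True
    return False


def agent_status(command=("datadog-agent", "status")) -> str:
    """Run the Agent's status subcommand and return its standard output.

    ASSUMPTION: the executable name and PATH setup; adjust for your platform.
    """
    result = subprocess.run(list(command), capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print("arangodb check found:", check_is_listed(agent_status()))
```

Keeping the text matching separate from the subprocess call makes it possible to test the matching logic against saved status output.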

Data Collected

Metrics

arangodb.agency.cache.callback
(gauge)
The current number of entries in Agency cache callbacks table.
arangodb.agency.callback
(gauge)
The current number of Agency callbacks registered.
arangodb.agency.callback.registered.count
(count)
The total number of Agency callbacks ever registered.
arangodb.agency.client.lookup.table_size
(gauge)
The current number of entries in Agency client id lookup table.
Shown as entry
arangodb.agency.commit.bucket
(count)
The distribution of commit times for all Agency write operations.
Shown as millisecond
arangodb.agency.commit.count
(count)
The distribution of commit times for all Agency write operations.
Shown as millisecond
arangodb.agency.commit.sum
(count)
The distribution of commit times for all Agency write operations.
Shown as millisecond
arangodb.agency.compaction.bucket
(count)
The distribution of Agency compaction run times.
Shown as millisecond
arangodb.agency.compaction.count
(count)
The distribution of Agency compaction run times.
Shown as millisecond
arangodb.agency.compaction.sum
(count)
The distribution of Agency compaction run times.
Shown as millisecond
arangodb.agency.log.size
(gauge)
Size of the Agency's in-memory part of replicated log in bytes.
Shown as byte
arangodb.agency.read.no_leader.count
(count)
The number of Agency read operations with no leader or on followers.
arangodb.agency.read.ok.count
(count)
Number of Agency read operations which were successful.
arangodb.agency.request.time.bucket
(count)
How long requests to the Agency took.
Shown as millisecond
arangodb.agency.request.time.count
(count)
How long requests to the Agency took.
Shown as millisecond
arangodb.agency.request.time.sum
(count)
How long requests to the Agency took.
Shown as millisecond
arangodb.agency.supervision.failed.server.count
(count)
This counter is increased whenever a supervision run encounters a failed server and starts a FailedServer job.
arangodb.agency.write.bucket
(count)
Agency write time histogram.
Shown as millisecond
arangodb.agency.write.count
(count)
Agency write time histogram.
Shown as millisecond
arangodb.agency.write.no_leader.count
(count)
The number of Agency write operations with no leader or on followers.
arangodb.agency.write.ok.count
(count)
The number of Agency write operations which were successful.
arangodb.agency.write.sum
(count)
Agency write time histogram.
Shown as millisecond
arangodb.aql.all.query.count
(count)
Total number of AQL queries finished.
Shown as query
arangodb.aql.current.query
(gauge)
Current number of AQL queries executing.
Shown as query
arangodb.aql.global.memory.limit
(gauge)
Total memory limit for all AQL queries combined.
Shown as byte
arangodb.aql.global.memory.usage
(gauge)
Total memory usage of all AQL queries executing; reported in steps of 32768 bytes.
Shown as byte
arangodb.aql.global.query.memory.limit.reached.count
(count)
Number of times the global query memory limit threshold was reached.
arangodb.aql.local.query.memory.limit.reached.count
(count)
Number of times a local query memory limit threshold was reached.
arangodb.aql.query.time.bucket
(count)
Execution time histogram for all AQL queries.
arangodb.aql.query.time.count
(count)
Execution time histogram for all AQL queries.
arangodb.aql.query.time.sum
(count)
Execution time histogram for all AQL queries.
arangodb.aql.slow.query.time.bucket
(count)
Execution time histogram for slow AQL queries.
arangodb.aql.slow.query.time.count
(count)
Execution time histogram for slow AQL queries.
arangodb.aql.slow.query.time.sum
(count)
Execution time histogram for slow AQL queries.
arangodb.client.connection.bytes.received.bucket
(count)
Bytes received for a request.
Shown as byte
arangodb.client.connection.bytes.received.count
(count)
Bytes received for a request.
Shown as byte
arangodb.client.connection.bytes.received.sum
(count)
Bytes received for a request.
Shown as byte
arangodb.client.connection.io.time.bucket
(count)
I/O time needed to answer a request.
arangodb.client.connection.io.time.count
(count)
I/O time needed to answer a request.
arangodb.client.connection.io.time.sum
(count)
I/O time needed to answer a request.
arangodb.client.connection.queue.time.bucket
(count)
Queueing time needed for requests.
Shown as second
arangodb.client.connection.queue.time.count
(count)
Queueing time needed for requests.
Shown as second
arangodb.client.connection.queue.time.sum
(count)
Queueing time needed for requests.
Shown as second
arangodb.client.connection.request.time.bucket
(count)
Request time needed to answer a request.
Shown as second
arangodb.client.connection.request.time.count
(count)
Request time needed to answer a request.
Shown as second
arangodb.client.connection.request.time.sum
(count)
Request time needed to answer a request.
Shown as second
arangodb.client.connection.time.bucket
(count)
Total connection time of a client.
Shown as second
arangodb.client.connection.time.count
(count)
Total connection time of a client.
Shown as second
arangodb.client.connection.time.sum
(count)
Total connection time of a client.
Shown as second
arangodb.client.connection.total.time.bucket
(count)
Total time needed to answer a request.
Shown as second
arangodb.client.connection.total.time.count
(count)
Total time needed to answer a request.
Shown as second
arangodb.client.connection.total.time.sum
(count)
Total time needed to answer a request.
Shown as second
arangodb.client.connections
(gauge)
The number of client connections that are currently open.
Shown as connection
arangodb.collection.lock.acquisition.count
(count)
Total amount of collection lock acquisition time.
Shown as microsecond
arangodb.collection.lock.sequential_mode.count
(count)
Number of transactions using sequential locking of collections to avoid deadlocking.
Shown as transaction
arangodb.collection.lock.timeouts_exclusive.count
(count)
Number of timeouts when trying to acquire collection exclusive locks.
Shown as timeout
arangodb.collection.lock.timeouts_write.count
(count)
Number of timeouts when trying to acquire collection write locks.
Shown as timeout
arangodb.connection_pool.connections.created.count
(count)
Total number of connections created for connection pool.
arangodb.connection_pool.connections.current
(gauge)
Current number of connections in pool.
arangodb.connection_pool.lease_time.bucket
(count)
Histogram of the time taken to lease a connection from the connection pool.
Shown as millisecond
arangodb.connection_pool.lease_time.count
(count)
Histogram of the time taken to lease a connection from the connection pool.
Shown as millisecond
arangodb.connection_pool.lease_time.sum
(count)
Histogram of the time taken to lease a connection from the connection pool.
Shown as millisecond
arangodb.connection_pool.leases.failed.count
(count)
Total number of failed connection leases.
arangodb.connection_pool.leases.successful.count
(count)
Total number of successful connection leases from connection pool.
arangodb.health.dropped_followers.count
(count)
Total number of drop-follower events.
Shown as event
arangodb.health.heartbeat.sent.time.bucket
(count)
Histogram of the time required to send heartbeats.
Shown as millisecond
arangodb.health.heartbeat.sent.time.count
(count)
Histogram of the time required to send heartbeats.
Shown as millisecond
arangodb.health.heartbeat.sent.time.sum
(count)
Histogram of the time required to send heartbeats.
Shown as millisecond
arangodb.health.heartbeat_failures.count
(count)
Total number of failed heartbeat transmissions.
arangodb.http.async.requests.count
(count)
Number of asynchronously executed HTTP requests.
Shown as request
arangodb.http.delete.requests.count
(count)
Number of HTTP DELETE requests.
Shown as request
arangodb.http.get.requests.count
(count)
Number of HTTP GET requests.
Shown as request
arangodb.http.head.requests.count
(count)
Number of HTTP HEAD requests.
Shown as request
arangodb.http.options.requests.count
(count)
Number of HTTP OPTIONS requests.
Shown as request
arangodb.http.other.requests.count
(count)
Number of other/illegal HTTP requests.
Shown as request
arangodb.http.patch.requests.count
(count)
Number of HTTP PATCH requests.
Shown as request
arangodb.http.post.requests.count
(count)
Number of HTTP POST requests.
Shown as request
arangodb.http.put.requests.count
(count)
Number of HTTP PUT requests.
Shown as request
arangodb.http.total.requests.count
(count)
Total number of HTTP requests.
Shown as request
arangodb.http.user.requests.count
(count)
Total number of HTTP requests executed by user clients.
Shown as request
arangodb.http2.connections.count
(count)
Total number of connections accepted for HTTP/2.
arangodb.network.forwarded.requests.count
(count)
Number of requests forwarded to another Coordinator.
Shown as request
arangodb.network.request.timeouts.count
(count)
Number of internal requests that have timed out.
Shown as request
arangodb.network.requests.in.flight
(gauge)
Number of outgoing internal requests in flight.
Shown as request
arangodb.process.page.faults.major.count
(count)
Number of major page faults.
Shown as fault
arangodb.process.page.faults.minor.count
(count)
Number of minor page faults.
Shown as fault
arangodb.process.resident_set_size
(gauge)
The total size of the pages the process has in real memory.
Shown as byte
arangodb.process.system_time
(gauge)
Amount of time that this process has been scheduled in kernel mode.
Shown as second
arangodb.process.threads
(gauge)
Number of threads.
Shown as thread
arangodb.process.user_time
(gauge)
Amount of time that this process has been scheduled in user mode.
Shown as second
arangodb.process.virtual_memory_size
(gauge)
The size of the virtual memory the process is using.
Shown as byte
arangodb.rocksdb.actual.delayed.write.rate
(gauge)
Actual delayed RocksDB write rate.
arangodb.rocksdb.archived.wal.files
(gauge)
Number of RocksDB WAL files in the archive.
Shown as file
arangodb.rocksdb.background.errors
(gauge)
Total number of RocksDB background errors.
Shown as error
arangodb.rocksdb.base.level
(gauge)
The number of the level to which L0 data will be compacted.
arangodb.rocksdb.block.cache.capacity
(gauge)
The block cache capacity in bytes.
Shown as byte
arangodb.rocksdb.block.cache.pinned.usage
(gauge)
The memory size for the RocksDB block cache for the entries which are pinned.
Shown as byte
arangodb.rocksdb.block.cache.usage
(gauge)
The total memory size for the entries residing in the block cache.
Shown as byte
arangodb.rocksdb.cache.allocated
(gauge)
The current global allocation for the ArangoDB cache which sits in front of RocksDB.
Shown as byte
arangodb.rocksdb.cache.hit.rate.lifetime
(gauge)
The lifetime hit rate of the ArangoDB in-memory cache which is sitting in front of RocksDB.
arangodb.rocksdb.cache.limit
(gauge)
The current global allocation limit for the ArangoDB caches which sit in front of RocksDB.
Shown as byte
arangodb.rocksdb.collection_lock.acquisition_time.bucket
(count)
Histogram of the collection/shard lock acquisition times.
Shown as second
arangodb.rocksdb.collection_lock.acquisition_time.count
(count)
Histogram of the collection/shard lock acquisition times.
Shown as second
arangodb.rocksdb.collection_lock.acquisition_time.sum
(count)
Histogram of the collection/shard lock acquisition times.
Shown as second
arangodb.rocksdb.compaction.pending
(gauge)
The number of column families for which at least one compaction is pending.
arangodb.rocksdb.cur.size.active.mem.table
(gauge)
The approximate size of the active memtable in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.cur.size.all.mem.tables
(gauge)
The approximate size of active and unflushed immutable memtables in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.engine.throttle.bps
(gauge)
The current write rate limit of the ArangoDB RocksDB throttle.
Shown as byte
arangodb.rocksdb.estimate.live.data.size
(gauge)
The estimate of the amount of live data in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.estimate.num.keys
(gauge)
The estimated number of total keys in the active and unflushed immutable memtables and storage, summed over all column families.
Shown as key
arangodb.rocksdb.estimate.pending.compaction.bytes
(gauge)
The estimated total number of bytes compaction needs to rewrite to get all levels down to under target size.
Shown as byte
arangodb.rocksdb.estimate.table.readers.mem
(gauge)
The estimated memory used for reading SST tables, excluding memory used in block cache (e.g. filter and index blocks), summed over all column families.
Shown as byte
arangodb.rocksdb.free.disk.space
(gauge)
The currently free disk space in bytes on the volume which is used by RocksDB.
Shown as byte
arangodb.rocksdb.free.inodes
(gauge)
The currently free number of inodes on the disk volume used by RocksDB.
Shown as inode
arangodb.rocksdb.live.sst.files.size
(gauge)
The total size in bytes of all SST files belonging to the latest LSM tree, summed over all column families.
Shown as byte
arangodb.rocksdb.mem.table.flush.pending
(gauge)
The number of column families for which a memtable flush is pending.
arangodb.rocksdb.min.log.number.to.keep
(gauge)
The minimum log number of the log files that should be kept.
arangodb.rocksdb.num.deletes.active.mem.table
(gauge)
The total number of delete entries in the active memtable, summed over all column families.
Shown as entry
arangodb.rocksdb.num.deletes.imm.mem.tables
(gauge)
The total number of delete entries in the unflushed immutable memtables, summed over all column families.
Shown as entry
arangodb.rocksdb.num.entries.active.mem.table
(gauge)
The total number of entries in the active memtable, summed over all column families.
Shown as entry
arangodb.rocksdb.num.entries.imm_mem.tables
(gauge)
The total number of entries in the unflushed immutable memtables, summed over all column families.
Shown as entry
arangodb.rocksdb.num.immutable.mem.table
(gauge)
The number of immutable memtables that have not yet been flushed.
arangodb.rocksdb.num.immutable.mem.table.flushed
(gauge)
The number of immutable memtables that have already been flushed.
arangodb.rocksdb.num.live.versions
(gauge)
The number of live versions.
arangodb.rocksdb.num.running.compactions
(gauge)
The number of currently running compactions.
arangodb.rocksdb.num.running.flushes
(gauge)
The number of currently running flushes.
Shown as flush
arangodb.rocksdb.num.snapshots
(gauge)
The number of unreleased snapshots of the database.
arangodb.rocksdb.prunable.wal.files
(gauge)
The total number of RocksDB WAL files in the archive subdirectory that can be pruned.
Shown as file
arangodb.rocksdb.size.all.mem.tables
(gauge)
The approximate size of all active, unflushed immutable, and pinned immutable memtables in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.total.disk.space
(gauge)
The total disk space in bytes on the volume which is used by RocksDB.
Shown as byte
arangodb.rocksdb.total.inodes
(gauge)
The total number of inodes on the disk volume used by RocksDB.
Shown as inode
arangodb.rocksdb.total.sst.files.size
(gauge)
The total size in bytes of all SST files, summed over all column families.
Shown as byte
arangodb.rocksdb.write.stalls.count
(count)
Number of times RocksDB has entered a stalled (slowed) write state.
arangodb.rocksdb.write.stop.count
(count)
Number of times RocksDB has entered a stopped write state.
arangodb.rocksdb.write.stops.count
(count)
The number of times RocksDB was observed by ArangoDB to have entered a stopped write state.
arangodb.server.cpu_cores
(gauge)
Number of CPU cores visible to the arangod process.
arangodb.server.idle_percent
(gauge)
Percentage of time that the system CPUs have been idle.
Shown as percent
arangodb.server.iowait_percent
(gauge)
Percentage of time that the system CPUs have been waiting for I/O.
Shown as percent
arangodb.server.kernel_mode.percent
(gauge)
Percentage of time that system CPUs have spent in kernel mode.
Shown as percent
arangodb.server.physical_memory
(gauge)
Physical memory of the system in bytes.
Shown as byte
arangodb.server.user_mode.percent
(gauge)
Percentage of time that system CPUs have spent in user mode.
Shown as percent
arangodb.transactions.aborted.count
(count)
Number of transactions aborted.
Shown as transaction
arangodb.transactions.committed.count
(count)
Number of transactions committed.
Shown as transaction
arangodb.transactions.expired.count
(count)
Number of expired transactions, i.e. transactions that were begun but were automatically garbage-collected due to inactivity within the transactions' time-to-live (TTL) period.
Shown as transaction
arangodb.transactions.read.count
(count)
Number of read-only transactions.
Shown as transaction
arangodb.transactions.started.count
(count)
Number of transactions started/begun.
Shown as transaction
arangodb.vst.connections.count
(count)
Total number of connections accepted for VST.
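Many of the metrics above are histograms reported as three series ending in .bucket, .count, and .sum. Assuming the usual OpenMetrics histogram convention, where .sum is the total of all observed values and .count is the number of observations over the same interval, their ratio gives the mean observation. The sketch below shows that arithmetic; `mean_observation` is a hypothetical helper and the numbers are illustrative, not measured values.

```python
from typing import Optional


def mean_observation(sum_value: float, count_value: float) -> Optional[float]:
    """Mean observed value over an interval, or None if nothing was observed."""
    if count_value <= 0:
        return None
    return sum_value / count_value


if __name__ == "__main__":
    # Illustrative values for arangodb.aql.query.time.sum and
    # arangodb.aql.query.time.count taken over the same interval.
    print(mean_observation(12.5, 50))  # 0.25
    # No queries finished in the interval, so there is no mean to report.
    print(mean_observation(0.0, 0))    # None
```

Returning None when the count is zero avoids a division error and distinguishes "no observations" from a mean of zero.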

Log collection

Available for Agent versions >6.0

To collect logs from your ArangoDB instance, first make sure that the instance is configured to write logs to a file. For example, if you use the arangod.conf file to configure your ArangoDB instance, include the following:

    # ArangoDB configuration file
    #
    # Documentation:
    # https://www.arangodb.com/docs/stable/administration-configuration.html
    #

    ...

    [log]
    file = /var/log/arangodb3/arangod.log

    ...

ArangoDB provides many options for log verbosity and output files. Datadog’s integration pipeline supports the default conversion pattern.

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Uncomment and edit the logs configuration block in your arangodb.d/conf.yaml file:

    logs:
      - type: file
        path: /var/log/arangodb3/arangod.log
        source: arangodb
    

Events

The ArangoDB integration does not include any events.

Service Checks

arangodb.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical
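The service check's behavior can be approximated outside the Agent for troubleshooting. The sketch below is not the Agent's implementation: `health_status` is a hypothetical helper that returns "critical" when the endpoint cannot be reached and "ok" otherwise, and the endpoint URL you pass in is an assumption that depends on your own configuration.

```python
import urllib.error
import urllib.request

OK = "ok"
CRITICAL = "critical"


def health_status(endpoint: str, timeout: float = 5.0) -> str:
    """Return "ok" if the endpoint answers, "critical" if it cannot be reached.

    Simplification: urlopen raises for HTTP error responses, so those are also
    reported as "critical" here.
    """
    try:
        with urllib.request.urlopen(endpoint, timeout=timeout):
            return OK
    except (urllib.error.URLError, OSError, ValueError):
        # URLError/OSError cover connection failures and timeouts;
        # ValueError covers malformed URLs.
        return CRITICAL


if __name__ == "__main__":
    # ASSUMPTION: replace with the endpoint configured for your instance.
    print(health_status("http://localhost:8529/_admin/metrics/v2"))
```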

Troubleshooting

Need help? Contact Datadog Support.