Supported OS: Linux, Windows, macOS

Integration version: 2.2.2

Overview

This check monitors ArangoDB through the Datadog Agent. ArangoDB 3.8 and later are supported.

Enable the Datadog-ArangoDB integration to:

  • Identify slow queries based on user-defined thresholds.
  • Understand the impact of long requests and troubleshoot latency issues.
  • Monitor RocksDB memory, disk, and cache limits.

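Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.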

Installation

The ArangoDB check is included in the Datadog Agent package.

Configuration

  1. To start collecting ArangoDB performance data, edit the arangodb.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample arangodb.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
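
Before pointing the Agent at an instance, it can help to confirm that the server actually exposes metrics. The following is a minimal sketch, not part of the integration: the URL (host, port 8529, and the /_admin/metrics/v2 path) is an assumption about a default local deployment, and the helper names are hypothetical. It also assumes the endpoint can be read without credentials; if authentication is enabled on your server, the request needs the appropriate headers.

```python
import urllib.request

# Assumed values: adjust host, port, and path for your deployment.
METRICS_URL = "http://localhost:8529/_admin/metrics/v2"


def metric_names(exposition_text: str) -> set:
    """Return the metric names found in Prometheus/OpenMetrics text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="x"} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw metrics text (needs a reachable ArangoDB server)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `metric_names(fetch_metrics())` against a running server returns the set of exposed metric names. An empty set or a connection error suggests the endpoint in conf.yaml would not work either.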

Validation

Run the Agent's status subcommand and look for arangodb under the Checks section.
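
The same verification can be scripted. This is a rough sketch that assumes the `datadog-agent` binary is on the PATH, that the current user may run it, and that each check appears on its own line as the check name followed by a version in parentheses. The function names are hypothetical, and the search covers the whole output rather than only the Checks section.

```python
import subprocess


def check_is_listed(status_output: str, check_name: str = "arangodb") -> bool:
    """Return True if any line of the status output starts with the check name."""
    for line in status_output.splitlines():
        stripped = line.strip()
        # Match "arangodb" alone or "arangodb (version)", but not "arangodbx".
        if stripped == check_name or stripped.startswith(check_name + " "):
            return True
    return False


def agent_status_output() -> str:
    """Run the Agent status subcommand and return its text output."""
    result = subprocess.run(
        ["datadog-agent", "status"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

`check_is_listed(agent_status_output())` returns False when the check never loaded, which usually points to a problem in arangodb.d/conf.yaml.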

Data Collected

Metrics

arangodb.agency.cache.callback
(gauge)
The current number of entries in Agency cache callbacks table.
arangodb.agency.callback
(gauge)
The current number of Agency callbacks registered.
arangodb.agency.callback.registered.count
(count)
The total number of Agency callbacks ever registered.
arangodb.agency.client.lookup.table_size
(gauge)
The current number of entries in Agency client id lookup table.
Shown as entry
arangodb.agency.commit.bucket
(count)
The distribution of commit times for all Agency write operations.
Shown as millisecond
arangodb.agency.commit.count
(count)
The distribution of commit times for all Agency write operations.
Shown as millisecond
arangodb.agency.commit.sum
(count)
The distribution of commit times for all Agency write operations.
Shown as millisecond
arangodb.agency.compaction.bucket
(count)
The distribution of Agency compaction run times.
Shown as millisecond
arangodb.agency.compaction.count
(count)
The distribution of Agency compaction run times.
Shown as millisecond
arangodb.agency.compaction.sum
(count)
The distribution of Agency compaction run times.
Shown as millisecond
arangodb.agency.log.size
(gauge)
Size of the Agency's in-memory part of replicated log in bytes.
Shown as byte
arangodb.agency.read.no_leader.count
(count)
The number of Agency read operations with no leader or on followers.
arangodb.agency.read.ok.count
(count)
Number of Agency read operations which were successful.
arangodb.agency.request.time.bucket
(count)
How long requests to the Agency took.
Shown as millisecond
arangodb.agency.request.time.count
(count)
How long requests to the Agency took.
Shown as millisecond
arangodb.agency.request.time.sum
(count)
How long requests to the Agency took.
Shown as millisecond
arangodb.agency.supervision.failed.server.count
(count)
This counter is increased whenever a supervision run encounters a failed server and starts a FailedServer job.
arangodb.agency.write.bucket
(count)
Agency write time histogram.
Shown as millisecond
arangodb.agency.write.count
(count)
Agency write time histogram.
Shown as millisecond
arangodb.agency.write.no_leader.count
(count)
The number of Agency write operations with no leader or on followers.
arangodb.agency.write.ok.count
(count)
The number of Agency write operations which were successful.
arangodb.agency.write.sum
(count)
Agency write time histogram.
Shown as millisecond
arangodb.aql.all.query.count
(count)
Total number of AQL queries finished.
Shown as query
arangodb.aql.current.query
(gauge)
Current number of AQL queries executing.
Shown as query
arangodb.aql.global.memory.limit
(gauge)
Total memory limit for all AQL queries combined.
Shown as byte
arangodb.aql.global.memory.usage
(gauge)
Total memory usage of all currently executing AQL queries, reported in steps of 32768 bytes.
Shown as byte
arangodb.aql.global.query.memory.limit.reached.count
(count)
Number of times the global query memory limit threshold was reached.
arangodb.aql.local.query.memory.limit.reached.count
(count)
Number of times a local query memory limit threshold was reached.
arangodb.aql.query.time.bucket
(count)
Execution time histogram for all AQL queries.
arangodb.aql.query.time.count
(count)
Execution time histogram for all AQL queries.
arangodb.aql.query.time.sum
(count)
Execution time histogram for all AQL queries.
arangodb.aql.slow.query.time.bucket
(count)
Execution time histogram for slow AQL queries.
arangodb.aql.slow.query.time.count
(count)
Execution time histogram for slow AQL queries.
arangodb.aql.slow.query.time.sum
(count)
Execution time histogram for slow AQL queries.
arangodb.client.connection.bytes.received.bucket
(count)
Bytes received for a request.
Shown as byte
arangodb.client.connection.bytes.received.count
(count)
Bytes received for a request.
Shown as byte
arangodb.client.connection.bytes.received.sum
(count)
Bytes received for a request.
Shown as byte
arangodb.client.connection.io.time.bucket
(count)
I/O time needed to answer a request.
arangodb.client.connection.io.time.count
(count)
I/O time needed to answer a request.
arangodb.client.connection.io.time.sum
(count)
I/O time needed to answer a request.
arangodb.client.connection.queue.time.bucket
(count)
Queueing time needed for requests.
Shown as second
arangodb.client.connection.queue.time.count
(count)
Queueing time needed for requests.
Shown as second
arangodb.client.connection.queue.time.sum
(count)
Queueing time needed for requests.
Shown as second
arangodb.client.connection.request.time.bucket
(count)
Request time needed to answer a request.
Shown as second
arangodb.client.connection.request.time.count
(count)
Request time needed to answer a request.
Shown as second
arangodb.client.connection.request.time.sum
(count)
Request time needed to answer a request.
Shown as second
arangodb.client.connection.time.bucket
(count)
Total connection time of a client.
Shown as second
arangodb.client.connection.time.count
(count)
Total connection time of a client.
Shown as second
arangodb.client.connection.time.sum
(count)
Total connection time of a client.
Shown as second
arangodb.client.connection.total.time.bucket
(count)
Total time needed to answer a request.
Shown as second
arangodb.client.connection.total.time.count
(count)
Total time needed to answer a request.
Shown as second
arangodb.client.connection.total.time.sum
(count)
Total time needed to answer a request.
Shown as second
arangodb.client.connections
(gauge)
The number of client connections that are currently open.
Shown as connection
arangodb.collection.lock.acquisition.count
(count)
Total amount of collection lock acquisition time.
Shown as microsecond
arangodb.collection.lock.sequential_mode.count
(count)
Number of transactions using sequential locking of collections to avoid deadlocking.
Shown as transaction
arangodb.collection.lock.timeouts_exclusive.count
(count)
Number of timeouts when trying to acquire collection exclusive locks.
Shown as timeout
arangodb.collection.lock.timeouts_write.count
(count)
Number of timeouts when trying to acquire collection write locks.
Shown as timeout
arangodb.connection_pool.connections.created.count
(count)
Total number of connections created for connection pool.
arangodb.connection_pool.connections.current
(gauge)
Current number of connections in pool.
arangodb.connection_pool.lease_time.bucket
(count)
Histogram of the time taken to lease a connection from the connection pool.
Shown as millisecond
arangodb.connection_pool.lease_time.count
(count)
Histogram of the time taken to lease a connection from the connection pool.
Shown as millisecond
arangodb.connection_pool.lease_time.sum
(count)
Histogram of the time taken to lease a connection from the connection pool.
Shown as millisecond
arangodb.connection_pool.leases.failed.count
(count)
Total number of failed connection leases.
arangodb.connection_pool.leases.successful.count
(count)
Total number of successful connection leases from connection pool.
arangodb.health.dropped_followers.count
(count)
Total number of drop-follower events.
Shown as event
arangodb.health.heartbeat.sent.time.bucket
(count)
Histogram of the time required to send heartbeats.
Shown as millisecond
arangodb.health.heartbeat.sent.time.count
(count)
Histogram of the time required to send heartbeats.
Shown as millisecond
arangodb.health.heartbeat.sent.time.sum
(count)
Histogram of the time required to send heartbeats.
Shown as millisecond
arangodb.health.heartbeat_failures.count
(count)
Total number of failed heartbeat transmissions.
arangodb.http.async.requests.count
(count)
Number of asynchronously executed HTTP requests.
Shown as request
arangodb.http.delete.requests.count
(count)
Number of HTTP DELETE requests.
Shown as request
arangodb.http.get.requests.count
(count)
Number of HTTP GET requests.
Shown as request
arangodb.http.head.requests.count
(count)
Number of HTTP HEAD requests.
Shown as request
arangodb.http.options.requests.count
(count)
Number of HTTP OPTIONS requests.
Shown as request
arangodb.http.other.requests.count
(count)
Number of other/illegal HTTP requests.
Shown as request
arangodb.http.patch.requests.count
(count)
Number of HTTP PATCH requests.
Shown as request
arangodb.http.post.requests.count
(count)
Number of HTTP POST requests.
Shown as request
arangodb.http.put.requests.count
(count)
Number of HTTP PUT requests.
Shown as request
arangodb.http.total.requests.count
(count)
Total number of HTTP requests.
Shown as request
arangodb.http.user.requests.count
(count)
Total number of HTTP requests executed by user clients.
Shown as request
arangodb.http2.connections.count
(count)
Total number of connections accepted for HTTP/2.
arangodb.network.forwarded.requests.count
(count)
Number of requests forwarded to another Coordinator.
Shown as request
arangodb.network.request.timeouts.count
(count)
Number of internal requests that have timed out.
Shown as request
arangodb.network.requests.in.flight
(gauge)
Number of outgoing internal requests in flight.
Shown as request
arangodb.process.page.faults.major.count
(count)
Number of major page faults.
Shown as fault
arangodb.process.page.faults.minor.count
(count)
Number of minor page faults.
Shown as fault
arangodb.process.resident_set_size
(gauge)
The total size of the pages the process has in real memory.
Shown as byte
arangodb.process.system_time
(gauge)
Amount of time that this process has been scheduled in kernel mode.
Shown as second
arangodb.process.threads
(gauge)
Number of threads.
Shown as thread
arangodb.process.user_time
(gauge)
Amount of time that this process has been scheduled in user mode.
Shown as second
arangodb.process.virtual_memory_size
(gauge)
The size of the virtual memory the process is using.
Shown as byte
arangodb.rocksdb.actual.delayed.write.rate
(gauge)
Actual delayed RocksDB write rate.
arangodb.rocksdb.archived.wal.files
(gauge)
Number of RocksDB WAL files in the archive.
Shown as file
arangodb.rocksdb.background.errors
(gauge)
Total number of RocksDB background errors.
Shown as error
arangodb.rocksdb.base.level
(gauge)
The number of the level to which L0 data will be compacted.
arangodb.rocksdb.block.cache.capacity
(gauge)
The block cache capacity in bytes.
Shown as byte
arangodb.rocksdb.block.cache.pinned.usage
(gauge)
The memory size for the RocksDB block cache for the entries which are pinned.
Shown as byte
arangodb.rocksdb.block.cache.usage
(gauge)
The total memory size for the entries residing in the block cache.
Shown as byte
arangodb.rocksdb.cache.allocated
(gauge)
The current global allocation for the ArangoDB cache which sits in front of RocksDB.
Shown as byte
arangodb.rocksdb.cache.hit.rate.lifetime
(gauge)
The lifetime hit rate of the ArangoDB in-memory cache which sits in front of RocksDB.
arangodb.rocksdb.cache.limit
(gauge)
The current global allocation limit for the ArangoDB caches which sit in front of RocksDB.
Shown as byte
arangodb.rocksdb.collection_lock.acquisition_time.bucket
(count)
Histogram of the collection/shard lock acquisition times.
Shown as second
arangodb.rocksdb.collection_lock.acquisition_time.count
(count)
Histogram of the collection/shard lock acquisition times.
Shown as second
arangodb.rocksdb.collection_lock.acquisition_time.sum
(count)
Histogram of the collection/shard lock acquisition times.
Shown as second
arangodb.rocksdb.compaction.pending
(gauge)
The number of column families for which at least one compaction is pending.
arangodb.rocksdb.cur.size.active.mem.table
(gauge)
The approximate size of the active memtable in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.cur.size.all.mem.tables
(gauge)
The approximate size of active and unflushed immutable memtables in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.engine.throttle.bps
(gauge)
The current write rate limit of the ArangoDB RocksDB throttle.
Shown as byte
arangodb.rocksdb.estimate.live.data.size
(gauge)
The estimate of the amount of live data in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.estimate.num.keys
(gauge)
The estimated number of total keys in the active and unflushed immutable memtables and storage, summed over all column families.
Shown as key
arangodb.rocksdb.estimate.pending.compaction.bytes
(gauge)
The estimated total number of bytes compaction needs to rewrite to get all levels down to under target size.
Shown as byte
arangodb.rocksdb.estimate.table.readers.mem
(gauge)
The estimated memory used for reading SST tables, excluding memory used in block cache (e.g. filter and index blocks), summed over all column families.
Shown as byte
arangodb.rocksdb.free.disk.space
(gauge)
The currently free disk space in bytes on the volume which is used by RocksDB.
Shown as byte
arangodb.rocksdb.free.inodes
(gauge)
The currently free number of inodes on the disk volume used by RocksDB.
Shown as inode
arangodb.rocksdb.live.sst.files.size
(gauge)
The total size in bytes of all SST files belonging to the latest LSM tree, summed over all column families.
Shown as byte
arangodb.rocksdb.mem.table.flush.pending
(gauge)
The number of column families for which a memtable flush is pending.
arangodb.rocksdb.min.log.number.to.keep
(gauge)
The minimum log number of the log files that should be kept.
arangodb.rocksdb.num.deletes.active.mem.table
(gauge)
The total number of delete entries in the active memtable, summed over all column families.
Shown as entry
arangodb.rocksdb.num.deletes.imm.mem.tables
(gauge)
The total number of delete entries in the unflushed immutable memtables, summed over all column families.
Shown as entry
arangodb.rocksdb.num.entries.active.mem.table
(gauge)
The total number of entries in the active memtable, summed over all column families.
Shown as entry
arangodb.rocksdb.num.entries.imm_mem.tables
(gauge)
The total number of entries in the unflushed immutable memtables, summed over all column families.
Shown as entry
arangodb.rocksdb.num.immutable.mem.table
(gauge)
The number of immutable memtables that have not yet been flushed.
arangodb.rocksdb.num.immutable.mem.table.flushed
(gauge)
The number of immutable memtables that have already been flushed.
arangodb.rocksdb.num.live.versions
(gauge)
The number of live versions.
arangodb.rocksdb.num.running.compactions
(gauge)
The number of currently running compactions.
arangodb.rocksdb.num.running.flushes
(gauge)
The number of currently running flushes.
Shown as flush
arangodb.rocksdb.num.snapshots
(gauge)
The number of unreleased snapshots of the database.
arangodb.rocksdb.prunable.wal.files
(gauge)
The total number of RocksDB WAL files in the archive subdirectory that can be pruned.
Shown as file
arangodb.rocksdb.size.all.mem.tables
(gauge)
The approximate size of all active, unflushed immutable, and pinned immutable memtables in bytes, summed over all column families.
Shown as byte
arangodb.rocksdb.total.disk.space
(gauge)
The total disk space in bytes on the volume which is used by RocksDB.
Shown as byte
arangodb.rocksdb.total.inodes
(gauge)
The total number of inodes on the disk volume used by RocksDB.
Shown as inode
arangodb.rocksdb.total.sst.files.size
(gauge)
The total size in bytes of all SST files, summed over all column families.
Shown as byte
arangodb.rocksdb.write.stalls.count
(count)
Number of times RocksDB has entered a stalled (slowed) write state.
arangodb.rocksdb.write.stop.count
(count)
Number of times RocksDB has entered a stopped write state.
arangodb.rocksdb.write.stops.count
(count)
The number of times RocksDB was observed by ArangoDB to have entered a stopped write state.
arangodb.server.cpu_cores
(gauge)
Number of CPU cores visible to the arangod process.
arangodb.server.idle_percent
(gauge)
Percentage of time that the system CPUs have been idle.
Shown as percent
arangodb.server.iowait_percent
(gauge)
Percentage of time that the system CPUs have been waiting for I/O.
Shown as percent
arangodb.server.kernel_mode.percent
(gauge)
Percentage of time that system CPUs have spent in kernel mode.
Shown as percent
arangodb.server.physical_memory
(gauge)
Physical memory of the system in bytes.
Shown as byte
arangodb.server.user_mode.percent
(gauge)
Percentage of time that system CPUs have spent in user mode.
Shown as percent
arangodb.transactions.aborted.count
(count)
Number of transactions aborted.
Shown as transaction
arangodb.transactions.committed.count
(count)
Number of transactions committed.
Shown as transaction
arangodb.transactions.expired.count
(count)
Number of expired transactions, i.e. transactions that were begun but then automatically garbage-collected due to inactivity within the transactions' time-to-live (TTL) period.
Shown as transaction
arangodb.transactions.read.count
(count)
Number of read-only transactions.
Shown as transaction
arangodb.transactions.started.count
(count)
Number of transactions started/begun.
Shown as transaction
arangodb.vst.connections.count
(count)
Total number of connections accepted for VST.
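
Many of the metrics above come in .bucket, .count, and .sum triples, which together describe one histogram. As an illustration of how they relate, the sketch below derives an average and a coarse quantile from such a triple. It assumes the bucket values are cumulative counts keyed by upper bound (for example, read from an upper_bound tag); the function names and sample numbers are hypothetical.

```python
from bisect import bisect_left


def mean_from_histogram(total_sum: float, total_count: float) -> float:
    """Average observation: the .sum series divided by the .count series."""
    if total_count == 0:
        return 0.0
    return total_sum / total_count


def quantile_upper_bound(buckets, q: float) -> float:
    """Smallest bucket upper bound whose cumulative count covers quantile q.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound; the last pair must cover every observation (the +Inf bucket).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram has no observations")
    target = q * total
    cumulative = [count for _, count in buckets]
    # First bucket whose cumulative count is at least the target rank.
    index = bisect_left(cumulative, target)
    return buckets[index][0]
```

With buckets `[(0.1, 50), (1.0, 90), (10.0, 99), (float("inf"), 100)]`, the 95th percentile falls in the bucket bounded by 10.0. The result is only as precise as the bucket boundaries, so it is an upper estimate rather than an exact value.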

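Log collection

Available for Agent versions 6.0 and later

To collect logs from your ArangoDB instance, first make sure that ArangoDB is configured to write its logs to a file. For example, if you configure your ArangoDB instance with the arangod.conf file, include the following: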

# ArangoDB configuration file
#
# Documentation:
# https://www.arangodb.com/docs/stable/administration-configuration.html
#

...

[log]
file = /var/log/arangodb3/arangod.log

...

ArangoDB provides many options for log verbosity and output files. Datadog's integration pipeline supports the default conversion pattern.

  1. Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Uncomment and edit the logs configuration block in your arangodb.d/conf.yaml file:

    logs:
       - type: file
         path: /var/log/arangodb3/arangod.log
         source: arangodb
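
The Agent can only tail a file it is able to read. As a quick sanity check, the following sketch reports why a path could not be tailed. The function name is hypothetical, and the result reflects the permissions of whichever user runs the script, so it is only meaningful when run as the same user as the Agent.

```python
import os
from typing import Optional


def log_file_problem(path: str) -> Optional[str]:
    """Return a short description of why `path` cannot be tailed, or None."""
    if not os.path.isfile(path):
        return "file does not exist"
    if not os.access(path, os.R_OK):
        return "file is not readable by the current user"
    return None
```

For the configuration above, `log_file_problem("/var/log/arangodb3/arangod.log")` returning None means the path exists and is readable by the current user.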
    

Events

The ArangoDB integration does not include any events.

Service Checks

arangodb.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical
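
The rule behind this service check is simple: a failed connection maps to critical, anything else to ok. The sketch below illustrates that mapping only; it is not the Agent's implementation, and the function names are hypothetical. The probe is passed in as a callable so the mapping can be exercised without a live server.

```python
import urllib.request
from typing import Callable


def health_status(probe: Callable[[], object]) -> str:
    """Map a connection attempt to the two statuses the check reports."""
    try:
        probe()
    except OSError:
        # urllib.error.URLError, timeouts, and refused connections
        # are all subclasses of OSError.
        return "critical"
    return "ok"


def probe_endpoint(url: str, timeout: float = 5.0) -> Callable[[], object]:
    """Build a probe that opens `url` and returns the HTTP status code."""
    def probe() -> int:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    return probe
```

For example, `health_status(probe_endpoint(url))` with the URL of your metrics endpoint returns "critical" while the server is down or unreachable.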

Troubleshooting

Need help? Contact Datadog support.