Supported OS Linux Windows Mac OS

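Integration version 3.2.0

Overview

This check collects TokuMX metrics, including:

  • Opcounters
  • Replication lag
  • Cache table utilization and storage size

Setup

Installation

The TokuMX check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

Prepare TokuMX

  1. Install the Python MongoDB module on your MongoDB server using the following command: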

    sudo pip install --upgrade "pymongo<3.0"
    
  2. Verify that the module is installed using the following command:

    python -c "import pymongo" 2>&1 | grep ImportError && \
    echo -e "\033[0;31mpymongo python module - Missing\033[0m" || \
    echo -e "\033[0;32mpymongo python module - OK\033[0m"
    
  3. Start the Mongo shell and create a read-only user for the Datadog Agent in the admin database:

    // Authenticate as the admin user.
    use admin
    db.auth("admin", "<YOUR_TOKUMX_ADMIN_PASSWORD>")
    // Add a user for the Datadog Agent.
    db.addUser("datadog", "<UNIQUEPASSWORD>", true)
    
  4. Verify that the user was created using the following command (run it from your system shell, not the Mongo shell):

    python -c 'from pymongo import Connection; print Connection().admin.authenticate("datadog", "<UNIQUEPASSWORD>")' | \
    grep True && \
    echo -e "\033[0;32mdatadog user - OK\033[0m" || \
    echo -e "\033[0;31mdatadog user - Missing\033[0m"
    

For more details about creating and managing users in MongoDB, see the MongoDB Security documentation.
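
The module check from step 2 can also be expressed in Python. The following is a minimal sketch and not part of the integration: the helper names parse_major and check_pymongo are hypothetical, and it assumes the pymongo package exposes its version as the string attribute pymongo.version.

```python
import importlib


def parse_major(version):
    """Return the major version number from a string such as '2.9.5'."""
    return int(version.split(".")[0])


def check_pymongo(module_name="pymongo"):
    """Return (installed, version, compatible) for the named module.

    'compatible' mirrors the "pymongo<3.0" pin used in step 1.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is missing: same outcome as the grep on ImportError.
        return (False, None, False)
    version = getattr(module, "version", None)  # assumed to be a string
    compatible = isinstance(version, str) and parse_major(version) < 3
    return (True, version, compatible)
```

Calling check_pymongo() on the server returns a tuple whose last element is False when the module is missing or when its major version is 3 or higher.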

Host

To configure this check for an Agent running on a host:

  1. Edit the tokumx.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample tokumx.d/conf.yaml for all available configuration options.

    init_config:
    
    instances:
      - server: "mongodb://<USER>:<PASSWORD>@localhost:27017"
    
  2. Restart the Agent to begin sending TokuMX metrics to Datadog.
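
If the password contains characters that are reserved in a URI (such as @, : or /), the server value generally needs those characters percent-encoded. The sketch below shows one way to build the string; build_server_uri is a hypothetical helper, and whether your Agent version decodes the credentials the same way is an assumption to verify.

```python
from urllib.parse import quote_plus


def build_server_uri(user, password, host="localhost", port=27017):
    """Build a mongodb:// URI with percent-encoded credentials."""
    return "mongodb://{}:{}@{}:{}".format(
        quote_plus(user),      # escape reserved characters in the user name
        quote_plus(password),  # escape reserved characters in the password
        host,
        port,
    )
```

For example, build_server_uri("datadog", "p@ss:word/1") yields a string that can be pasted into the server field of tokumx.d/conf.yaml.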

Containerized

For containerized environments, see the Autodiscovery Integration Templates guide and apply the following parameters:

Parameter           Value
<INTEGRATION_NAME>  tokumx
<INIT_CONFIG>       blank or {}
<INSTANCE_CONFIG>   {"server": "mongodb://<USER>:<PASSWORD>@%%host%%:27017"}
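
As an illustration, the three parameters above can be rendered as Docker container labels. This is a sketch under assumptions: the com.datadoghq.ad.* label names follow Datadog's Docker Autodiscovery convention, autodiscovery_labels is a hypothetical helper, and the placeholder credentials must be replaced with real values.

```python
import json


def autodiscovery_labels(user="<USER>", password="<PASSWORD>", port=27017):
    """Return Docker labels carrying the TokuMX Autodiscovery template."""
    # %%host%% is a template variable resolved by the Agent at runtime.
    instance = {
        "server": "mongodb://{}:{}@%%host%%:{}".format(user, password, port)
    }
    return {
        "com.datadoghq.ad.check_names": json.dumps(["tokumx"]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }
```

Each label value is a JSON-encoded list, so several checks can be attached to one container by adding entries at matching positions in all three lists.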

Validation

Run the Agent's status subcommand and look for tokumx under the Checks section.
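
This lookup can be scripted. The following sketch assumes the Agent binary is invoked as datadog-agent status (it may need sudo or a full path on your platform) and simply searches the output for the check name; both function names are hypothetical.

```python
import subprocess


def check_is_listed(status_text, check_name="tokumx"):
    """Return True if any line of the status output mentions the check."""
    return any(check_name in line for line in status_text.splitlines())


def agent_status_text(command=("datadog-agent", "status")):
    """Run the Agent status subcommand and return its output as text.

    The command is an assumption; adjust it for your installation.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

A typical use is check_is_listed(agent_status_text()), which is only a presence test and does not tell you whether the check is reporting errors.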

Data Collected

Metrics

tokumx.asserts.msgps
(gauge)
The number of message assertions raised per second.
Shown as assertion
tokumx.asserts.regularps
(gauge)
The number of regular assertions raised per second.
Shown as assertion
tokumx.asserts.rolloversps
(gauge)
The number of times that the rollover counters roll over per second. The counters roll over to zero every 2^30 assertions.
Shown as assertion
tokumx.asserts.userps
(gauge)
The number of user assertions raised per second.
Shown as assertion
tokumx.asserts.warningps
(gauge)
The number of warnings raised per second.
Shown as assertion
tokumx.connections.available
(gauge)
The number of unused available incoming connections the database can provide.
Shown as connection
tokumx.connections.current
(gauge)
The number of connections to the database server from clients.
Shown as connection
tokumx.cursors.timedOut
(gauge)
The total number of cursors that have timed out since the server process started.
Shown as cursor
tokumx.cursors.totalOpen
(gauge)
The number of cursors that TokuMX is maintaining for clients.
Shown as cursor
tokumx.ft.alerts.checkpointFailures
(gauge)
The number of checkpoints that have failed for any reason.
Shown as event
tokumx.ft.alerts.locktreeRequestsPending
(gauge)
The number of requests for Document-level Locks in the locktree that are waiting for other requests to release their locks.
Shown as request
tokumx.ft.alerts.longWaitEvents.cachePressure.countps
(gauge)
Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.
Shown as event
tokumx.ft.alerts.longWaitEvents.cachePressure.timeps
(gauge)
Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps
(gauge)
Rate at which the begin checkpoint phase of checkpoint has run (these should be fairly quick).
Shown as event
tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps
(gauge)
Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.fsync.countps
(gauge)
Rate at which fsync operations took more than 1 second.
Shown as event
tokumx.ft.alerts.longWaitEvents.fsync.timeps
(gauge)
Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.locktreeWait.countps
(gauge)
Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree.
Shown as event
tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps
(gauge)
Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps
(gauge)
Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation.
Shown as event
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps
(gauge)
Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.logBufferWaitps
(gauge)
Rate at which a writing client had to wait more than 100ms for access to the log buffer.
Shown as event
tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps
(gauge)
Rate of full evictions of leaf nodes.
Shown as byte
tokumx.ft.cachetable.evictions.full.leaf.clean.countps
(gauge)
Rate of full evictions of leaf nodes.
Shown as event
tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps
(gauge)
Rate of full evictions of leaf nodes that need to be written back to disk.
Shown as byte
tokumx.ft.cachetable.evictions.full.leaf.dirty.countps
(gauge)
Rate of full evictions of leaf nodes that need to be written back to disk.
Shown as event
tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps
(gauge)
Fraction of time (microseconds/second) spent performing full evictions of leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk.
Shown as fraction
tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps
(gauge)
Rate of full evictions of nonleaf nodes.
Shown as byte
tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps
(gauge)
Rate of full evictions of nonleaf nodes.
Shown as event
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps
(gauge)
Rate of full evictions of nonleaf nodes that need to be written back to disk.
Shown as byte
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps
(gauge)
Rate of full evictions of nonleaf nodes that need to be written back to disk.
Shown as event
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps
(gauge)
Fraction of time (microseconds/second) spent performing full evictions of nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk.
Shown as fraction
tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps
(gauge)
Rate of partial evictions of leaf nodes.
Shown as byte
tokumx.ft.cachetable.evictions.partial.leaf.clean.countps
(gauge)
Rate of partial evictions of leaf nodes.
Shown as event
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps
(gauge)
Rate of partial evictions of nonleaf nodes.
Shown as byte
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps
(gauge)
Rate of partial evictions of nonleaf nodes.
Shown as event
tokumx.ft.cachetable.miss.countps
(gauge)
Rate of internal cache misses. This metric is similar to MongoDB's btree misses and page faults.
Shown as miss
tokumx.ft.cachetable.miss.full.countps
(gauge)
Rate of full internal cache misses.
Shown as miss
tokumx.ft.cachetable.miss.full.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss.
Shown as fraction
tokumx.ft.cachetable.miss.partial.countps
(gauge)
Rate of partial internal cache misses.
Shown as miss
tokumx.ft.cachetable.miss.partial.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss.
Shown as fraction
tokumx.ft.cachetable.miss.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses.
Shown as fraction
tokumx.ft.cachetable.size.current
(gauge)
Total amount of uncompressed data currently in the database's internal cache.
Shown as byte
tokumx.ft.cachetable.size.limit
(gauge)
Total amount of uncompressed data that will fit in TokuMX's internal cache.
Shown as byte
tokumx.ft.cachetable.size.writing
(gauge)
Total size of nodes that are currently queued up to be written to disk for eviction.
Shown as byte
tokumx.ft.checkpoint.begin.timeps
(gauge)
Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads.
Shown as fraction
tokumx.ft.checkpoint.countps
(gauge)
Rate at which checkpoints are completed.
Shown as event
tokumx.ft.checkpoint.lastComplete.time
(gauge)
The time spent, in seconds, by the most recently completed checkpoint.
Shown as second
tokumx.ft.checkpoint.timeps
(gauge)
Fraction of time (seconds/second) spent doing checkpoints.
Shown as fraction
tokumx.ft.checkpoint.write.leaf.bytes.compressedps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints, after compression.
Shown as byte
tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints, before compression.
Shown as byte
tokumx.ft.checkpoint.write.leaf.countps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints.
Shown as write
tokumx.ft.checkpoint.write.leaf.timeps
(gauge)
The fraction of time spent writing leaf nodes to disk during checkpoints.
Shown as fraction
tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints, after compression.
Shown as byte
tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints, before compression.
Shown as byte
tokumx.ft.checkpoint.write.nonleaf.countps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints.
Shown as write
tokumx.ft.checkpoint.write.nonleaf.timeps
(gauge)
The fraction of time spent writing nonleaf nodes to disk during checkpoints.
Shown as fraction
tokumx.ft.compressionRatio.leaf
(gauge)
The size ratio of leaf nodes before and after compression.
Shown as fraction
tokumx.ft.compressionRatio.nonleaf
(gauge)
The size ratio of nonleaf nodes before and after compression.
Shown as fraction
tokumx.ft.compressionRatio.overall
(gauge)
The size ratio of nodes before and after compression.
Shown as fraction
tokumx.ft.fsync.countps
(gauge)
The rate at which the database flushed the operating system's file buffers to disk.
Shown as operation
tokumx.ft.fsync.timeps
(gauge)
The fraction of time (microseconds/second) used to fsync to disk.
Shown as fraction
tokumx.ft.locktree.size.current
(gauge)
Total memory the locktree is currently using.
Shown as byte
tokumx.ft.locktree.size.limit
(gauge)
Maximum number of bytes that the locktree is allowed to use.
Shown as byte
tokumx.ft.log.bytesps
(gauge)
The rate at which the logger writes to disk.
Shown as byte
tokumx.ft.log.countps
(gauge)
The rate of individual log writes.
Shown as write
tokumx.ft.log.timeps
(gauge)
The fraction of time spent performing log writes.
Shown as fraction
tokumx.ft.serializeTime.leaf.compressps
(gauge)
Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.ft.serializeTime.leaf.decompressps
(gauge)
Fraction of time spent decompressing leaf nodes after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.leaf.deserializeps
(gauge)
Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.leaf.serializeps
(gauge)
Fraction of time spent serializing leaf nodes and their partitions before writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.ft.serializeTime.nonleaf.compressps
(gauge)
Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.ft.serializeTime.nonleaf.decompressps
(gauge)
Fraction of time spent decompressing nonleaf nodes after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.nonleaf.deserializeps
(gauge)
Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.nonleaf.serializeps
(gauge)
Fraction of time spent serializing nonleaf nodes and their partitions before writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.mem.resident
(gauge)
The amount of memory currently used by the database process.
Shown as mebibyte
tokumx.mem.virtual
(gauge)
The amount of virtual memory used by the database process.
Shown as mebibyte
tokumx.metrics.document.deletedps
(gauge)
The number of documents deleted per second.
Shown as document
tokumx.metrics.document.insertedps
(gauge)
The number of documents inserted per second.
Shown as document
tokumx.metrics.document.returnedps
(gauge)
The number of documents returned by queries per second.
Shown as document
tokumx.metrics.document.updatedps
(gauge)
The number of documents updated per second.
Shown as document
tokumx.metrics.getLastError.wtime.numps
(gauge)
The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.
Shown as operation
tokumx.metrics.getLastError.wtime.totalMillisps
(gauge)
The fraction of time (ms/s) spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.
Shown as event
tokumx.metrics.getLastError.wtimeoutsps
(gauge)
The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError.
Shown as fraction
tokumx.metrics.operation.idhackps
(gauge)
The rate of queries that contain the _id field.
Shown as query
tokumx.metrics.operation.scanAndOrderps
(gauge)
The rate of queries that return sorted numbers that cannot perform the sort operation using an index.
Shown as query
tokumx.metrics.queryExecutor.scannedps
(gauge)
The rate of index items scanned during queries and query-plan evaluation.
Shown as operation
tokumx.metrics.repl.apply.batches.numps
(gauge)
The number of batches applied across all databases per second.
Shown as operation
tokumx.metrics.repl.apply.batches.totalMillisps
(gauge)
The fraction of time (ms/s) spent applying operations from the oplog.
Shown as fraction
tokumx.metrics.repl.apply.opsps
(gauge)
The rate of oplog operations.
Shown as operation
tokumx.metrics.repl.buffer.count
(gauge)
The number of operations in the oplog buffer.
Shown as operation
tokumx.metrics.repl.buffer.sizeBytes
(gauge)
The current size of the contents of the oplog buffer.
Shown as byte
tokumx.metrics.repl.network.bytesps
(gauge)
The rate at which data is read from the replication sync source.
Shown as byte
tokumx.metrics.repl.network.getmores.numps
(gauge)
The rate of getmore operations.
Shown as operation
tokumx.metrics.repl.network.getmores.totalMillisps
(gauge)
The fraction of time (ms/s) spent collecting data from getmore operations.
Shown as fraction
tokumx.metrics.repl.network.opsps
(gauge)
The rate of operations read from the replication source.
Shown as operation
tokumx.metrics.repl.network.readersCreatedps
(gauge)
The rate at which oplog query processes are created.
Shown as process
tokumx.metrics.repl.oplog.insert.numps
(gauge)
The rate at which operations are inserted into the oplog.
Shown as operation
tokumx.metrics.repl.oplog.insert.totalMillisps
(gauge)
The fraction of time (ms/s) spent inserting operations into the oplog.
Shown as fraction
tokumx.metrics.repl.oplog.insertBytesps
(gauge)
The rate (in bytes) at which data is inserted into the oplog.
Shown as byte
tokumx.metrics.ttl.deletedDocumentsps
(gauge)
The rate at which documents are deleted from collections with a TTL index.
Shown as document
tokumx.metrics.ttl.passesps
(gauge)
The number of times per second the background process removes documents from collections with a TTL index.
Shown as event
tokumx.opcounters.commandps
(gauge)
The total number of commands per second issued to the database.
Shown as command
tokumx.opcounters.deleteps
(gauge)
The number of delete operations per second.
Shown as operation
tokumx.opcounters.getmoreps
(gauge)
The number of getmore operations per second.
Shown as operation
tokumx.opcounters.insertps
(gauge)
The number of insert operations per second.
Shown as operation
tokumx.opcounters.queryps
(gauge)
The total number of queries per second.
Shown as query
tokumx.opcounters.updateps
(gauge)
The number of update operations per second.
Shown as operation
tokumx.opcountersRepl.commandps
(gauge)
The total number of replicated commands issued to the database per second.
Shown as command
tokumx.opcountersRepl.deleteps
(gauge)
The number of replicated delete operations per second.
Shown as operation
tokumx.opcountersRepl.getmoreps
(gauge)
The number of replicated getmore operations per second.
Shown as operation
tokumx.opcountersRepl.insertps
(gauge)
The number of replicated insert operations per second.
Shown as operation
tokumx.opcountersRepl.queryps
(gauge)
The total number of replicated queries per second.
Shown as query
tokumx.opcountersRepl.updateps
(gauge)
The number of replicated update operations per second.
Shown as operation
tokumx.stats.coll.count
(gauge)
The number of objects or documents in this collection.
Shown as document
tokumx.stats.coll.nindexes
(gauge)
The number of indexes on this collection.
Shown as index
tokumx.stats.coll.nindexesbeingbuilt
(gauge)
The number of indexes currently being built.
Shown as index
tokumx.stats.coll.size
(gauge)
The total size in memory of all records in a collection. Does not include the record header, but does include the record's padding. Does not include the size of any indexes associated with the collection.
Shown as byte
tokumx.stats.coll.storageSize
(gauge)
The total amount of storage allocated to this collection for document storage.
Shown as byte
tokumx.stats.coll.totalIndexSize
(gauge)
The total size of all indexes on this collection.
Shown as byte
tokumx.stats.coll.totalIndexStorageSize
(gauge)
The total size on disk of all indexes on this collection (after compression).
Shown as byte
tokumx.stats.dataSize
(gauge)
The total size of the data held in this database including the padding factor.
Shown as byte
tokumx.stats.db.avgObjSize
(gauge)
The average size of each document.
Shown as byte
tokumx.stats.db.collections
(gauge)
The number of collections in the database.
tokumx.stats.db.dataSize
(gauge)
The total size of the data held in this database including the padding factor.
Shown as byte
tokumx.stats.db.indexSize
(gauge)
The total size of all indexes created on this database.
Shown as byte
tokumx.stats.db.indexStorageSize
(gauge)
The total size on disk of all indexes created on this database (after compression).
Shown as byte
tokumx.stats.db.indexes
(gauge)
The total number of indexes across all collections in the database.
Shown as index
tokumx.stats.db.objects
(gauge)
The number of documents in the database across all collections.
Shown as document
tokumx.stats.db.storageSize
(gauge)
The total amount of space allocated to collections in this database for document storage.
Shown as byte
tokumx.stats.idx.avgObjSize
(gauge)
The average size of each index entry.
Shown as byte
tokumx.stats.idx.count
(gauge)
The number of documents in this index.
Shown as index
tokumx.stats.idx.deletes
(gauge)
The number of delete operations performed on this index.
Shown as operation
tokumx.stats.idx.inserts
(gauge)
The number of insert operations performed on this index.
Shown as operation
tokumx.stats.idx.nscanned
(gauge)
The number of index entries scanned for queries using this index.
Shown as index
tokumx.stats.idx.nscannedObjects
(gauge)
The number of collection objects examined after scanning an index entry for a query using this index.
Shown as object
tokumx.stats.idx.queries
(gauge)
The number of query operations performed using this index.
Shown as query
tokumx.stats.idx.size
(gauge)
The total size of this index.
Shown as byte
tokumx.stats.idx.storageSize
(gauge)
The total size on disk of this index (after compression).
Shown as byte
tokumx.stats.indexSize
(gauge)
The total size of all indexes created on this database.
Shown as byte
tokumx.stats.indexes
(gauge)
The total number of indexes across all collections in the database.
Shown as index
tokumx.stats.objects
(gauge)
The number of documents in the database across all collections.
Shown as document
tokumx.stats.storageSize
(gauge)
The total amount of space allocated to collections in this database for document storage.
Shown as byte
tokumx.uptime
(gauge)
The time that the tokumx process has been active.
Shown as second
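
Many of the metrics above end in ps and report per-second rates, and several are expressed as microseconds per second. The sketch below illustrates the arithmetic behind those units and behind cache table utilization (tokumx.ft.cachetable.size.current divided by tokumx.ft.cachetable.size.limit). It is not the Agent's implementation, and all three function names are hypothetical.

```python
def per_second_rate(prev_value, prev_time, curr_value, curr_time):
    """Rate of change of a cumulative counter between two samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0 or curr_value < prev_value:
        # No time elapsed, or the counter was reset (e.g. server restart).
        return None
    return (curr_value - prev_value) / float(elapsed)


def micros_per_second_to_percent(value):
    """Convert microseconds-per-second into a percentage of wall time."""
    return value / 1000000.0 * 100.0


def cachetable_utilization(size_current, size_limit):
    """Fraction of the cachetable limit currently in use."""
    if size_limit <= 0:
        return None
    return size_current / float(size_limit)
```

For instance, a counter that grows from 100 to 400 over 15 seconds gives a rate of 20 per second, and a timeps value of 250000 microseconds/second corresponds to 25% of wall-clock time.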

Events

Replication state changes:

This check emits an event each time a TokuMX node has a change in its replication state.
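
The idea of emitting an event on a state change can be sketched as a comparison between the state seen on the previous run and the current one. This is an illustration only: the state codes are the commonly documented MongoDB replica set member states, and replication_event, its arguments, and the shape of the returned dictionary are hypothetical rather than the check's actual event format.

```python
# Subset of MongoDB replica set member state codes (assumed to apply to TokuMX).
REPLSET_STATES = {
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    7: "ARBITER",
    8: "DOWN",
    9: "ROLLBACK",
}


def state_name(code):
    """Map a numeric state code to a readable name."""
    return REPLSET_STATES.get(code, "UNKNOWN({})".format(code))


def replication_event(node, previous_state, current_state):
    """Return an event dict when the state changed, otherwise None."""
    if previous_state is None or previous_state == current_state:
        # First observation or no change: nothing to report.
        return None
    return {
        "title": "TokuMX replication state changed on {}".format(node),
        "text": "{} -> {}".format(
            state_name(previous_state), state_name(current_state)
        ),
    }
```

Keeping the previous state per node between runs is what makes the comparison possible; the first run has nothing to compare against and therefore produces no event.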

Service Checks

tokumx.can_connect
Returns CRITICAL if the Agent is unable to connect to the monitored TokuMX instance. Returns OK otherwise.
Statuses: ok, critical
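
The OK/CRITICAL logic can be approximated as follows. This sketch substitutes a plain TCP probe for the driver-level connection the check would make, so it only tests reachability, not authentication. The numeric values 0 and 2 follow Datadog's service check status convention, while tcp_connect and can_connect are hypothetical names.

```python
import socket

OK = 0        # Datadog service check status: OK
CRITICAL = 2  # Datadog service check status: CRITICAL


def tcp_connect(host="localhost", port=27017, timeout=5.0):
    """Open and immediately close a TCP connection to the server."""
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.close()


def can_connect(connect=tcp_connect):
    """Return CRITICAL if the connection attempt raises, otherwise OK."""
    try:
        connect()
    except Exception:
        return CRITICAL
    return OK
```

Passing the connection function as an argument keeps the status logic separate from the transport, so a driver-based connection can be swapped in without changing can_connect.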

Troubleshooting

Need help? Contact Datadog support.

Further Reading