MapR

Supported OS Linux

Integration version 1.9.0

Overview

This check monitors MapR (version 6.1+) through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host.

Installation

The MapR check is included in the Datadog Agent package, but it requires additional installation steps.

Prerequisites

MapR monitoring is running correctly.
You have a MapR user (with a name, password, UID, and GID) that has the "consume" permission on the /var/mapr/mapr.monitoring/metricstreams stream. This can be an existing user or a newly created one.
On a non-secure cluster: follow the Configuring Impersonation without Cluster Security documentation so that the dd-agent user can impersonate this MapR user.
On a secure cluster: generate a long-lived service ticket for this user that is readable by the dd-agent user.
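
For the secure-cluster case, the ticket step might look like the sketch below. It only prints the command so that it can be reviewed before anyone runs it. The maprlogin generateticket flags shown are recalled from MapR's own documentation and should be checked against your MapR version. The user name, output path, and duration are placeholders, not values from this page.

```shell
#!/bin/sh
# Sketch: build the command that generates a long-lived service ticket.
# All three arguments are placeholders chosen for illustration.
#   $1 = MapR user with "consume" permission on the metrics stream
#   $2 = output path for the ticket file
#   $3 = ticket lifetime, written as days:hours:minutes
ticket_command() {
  printf 'maprlogin generateticket -type service -user %s -out %s -duration %s\n' \
    "$1" "$2" "$3"
}

# Print the command for a hypothetical user and path.
ticket_command mapr_monitor /opt/mapr/conf/dd_agent_ticket 365:0:0
```

Once the ticket exists, one way to confirm that dd-agent can read it is sudo -u dd-agent test -r <TICKET_PATH>.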

Follow these installation steps on each node:

Install the Agent.
Install the mapr-streams-library library with the following command:
sudo -u dd-agent /opt/datadog-agent/embedded/bin/pip install --global-option=build_ext --global-option="--library-dirs=/opt/mapr/lib" --global-option="--include-dirs=/opt/mapr/include/" mapr-streams-python
If you use Python 3 with Agent v7, replace pip with pip3.
Add /opt/mapr/lib/ to /etc/ld.so.conf (or to a file in /etc/ld.so.conf.d/). Otherwise, the mapr-streams-library used by the Agent cannot find the MapR shared libraries.
Reload the libraries by running sudo ldconfig.
Configure the integration by specifying the ticket location.
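
The linker configuration step above can be made repeatable with a small helper. This is a minimal sketch, not part of the official instructions: the function name add_lib_dir is hypothetical, and it appends the directory only when the file does not already contain it.

```shell
#!/bin/sh
# Sketch: register a library directory with the dynamic linker, idempotently.
#   $1 = linker config file (for example a file under /etc/ld.so.conf.d/)
#   $2 = directory to add (for MapR: /opt/mapr/lib/)
add_lib_dir() {
  # -x matches the whole line, -F treats the path as a literal string.
  if ! grep -qxF "$2" "$1" 2>/dev/null; then
    printf '%s\n' "$2" >> "$1"
  fi
}
```

Run as root, a call such as add_lib_dir /etc/ld.so.conf.d/mapr.conf /opt/mapr/lib/ followed by ldconfig would cover both steps. The file name mapr.conf is an assumption; any file in that directory should work.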

Additional notes

If security features are not enabled on the cluster, you can continue without creating a ticket.
If compilation tools such as gcc (required to build mapr-streams-library) are not allowed in your production environment, you can build a compiled wheel of the library on a development instance and distribute it to the production hosts. The development and production hosts must be similar enough for the compiled wheel to be compatible. To create the wheel on the development machine, run:
sudo -u dd-agent /opt/datadog-agent/embedded/bin/pip wheel --global-option=build_ext --global-option="--library-dirs=/opt/mapr/lib" --global-option="--include-dirs=/opt/mapr/include/" mapr-streams-python
Then, on the production machine, run:
sudo -u dd-agent /opt/datadog-agent/embedded/bin/pip install <THE_WHEEL_FILE>
If you use Python 3 with Agent v7, remember to replace pip with pip3 when installing mapr-streams-library.
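
Because the pip executable depends on the Agent version, a script that runs on mixed fleets may want to pick it automatically. The sketch below assumes that Agent v7 runs Python 3 and that other versions use Python 2; the function name agent_pip and the version strings are illustrative only.

```shell
#!/bin/sh
# Sketch: choose the embedded pip executable from the Agent major version.
# Assumption: Agent v7 runs Python 3, other versions run Python 2.
#   $1 = Agent version string, for example "7.52.0"
agent_pip() {
  case "$1" in
    7|7.*) echo "/opt/datadog-agent/embedded/bin/pip3" ;;
    *)     echo "/opt/datadog-agent/embedded/bin/pip" ;;
  esac
}

# Example version strings, used only to show both branches.
agent_pip 7.52.0
agent_pip 6.32.4
```

The returned path can then replace the hard-coded pip path in the install and wheel commands.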

Configuration

Metric collection

Edit the mapr.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to collect your MapR performance data. See the sample mapr.d/conf.yaml for all available configuration options.
Set the ticket_location parameter in the configuration to the path of the long-lived ticket you created.
Restart the Agent.
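
As an illustration of the steps above, the following sketch writes a configuration file that sets only ticket_location. It assumes the other options can stay at their defaults, which should be confirmed against the sample mapr.d/conf.yaml. The function name write_mapr_conf is hypothetical.

```shell
#!/bin/sh
# Sketch: write a minimal mapr.d/conf.yaml that sets only ticket_location.
#   $1 = Agent conf.d directory (often /etc/datadog-agent/conf.d on Linux)
#   $2 = path to the long-lived ticket
write_mapr_conf() {
  mkdir -p "$1/mapr.d" || return 1
  cat > "$1/mapr.d/conf.yaml" <<EOF
init_config:

instances:
  - ticket_location: $2
EOF
}
```

After calling write_mapr_conf with your own directory and ticket path, restart the Agent, for example with sudo systemctl restart datadog-agent on systemd hosts.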

Log collection

MapR uses FluentD for logs. Use the FluentD Datadog plugin to collect MapR logs. The following command downloads and installs the plugin into the right directory.

curl https://raw.githubusercontent.com/DataDog/fluent-plugin-datadog/master/lib/fluent/plugin/out_datadog.rb -o /opt/mapr/fluentd/fluentd-<VERSION>/lib/fluentd-<VERSION>-linux-x86_64/lib/app/lib/fluent/plugin/out_datadog.rb
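
The <VERSION> placeholder appears twice in the destination path, so it can help to build the path from a single value. The sketch below is one way to do that; the function name plugin_path is hypothetical, and the directory listing is only a convenience for finding the installed version.

```shell
#!/bin/sh
# Sketch: expand <VERSION> in the plugin destination path.
#   $1 = FluentD version, as it appears in the directory name under /opt/mapr/fluentd/
plugin_path() {
  base="/opt/mapr/fluentd/fluentd-$1"
  echo "$base/lib/fluentd-$1-linux-x86_64/lib/app/lib/fluent/plugin/out_datadog.rb"
}

# List installed versions, if any, so the right <VERSION> can be picked.
ls -d /opt/mapr/fluentd/fluentd-* 2>/dev/null \
  || echo "no FluentD install found under /opt/mapr/fluentd"
```

The result of plugin_path can be passed to the -o option of the curl command.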

Then update /opt/mapr/fluentd/fluentd-<VERSION>/etc/fluentd/fluentd.conf with the following section.

<match *>
  @type copy
  <store> # This section is present by default and sends the logs to ElasticCache for Kibana.
    @include /opt/mapr/fluentd/fluentd-<VERSION>/etc/fluentd/es_config.conf
    include_tag_key true
    tag_key service_name
  </store>
  <store> # This section forwards all logs to Datadog:
    @type datadog
    @id dd_agent
    include_tag_key true
    dd_source mapr  # Sets "source: mapr" on every log to allow automatic parsing on Datadog.
    dd_tags "<KEY>:<VALUE>"
    service <SERVICE_NAME>
    api_key <YOUR_API_KEY>
  </store>
</match>

See the fluent_datadog_plugin documentation for more details about the available options.

Validation

Run the Agent's status subcommand and look for mapr under the Checks section.
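
For scripted validation, the status output can be filtered for the check name. The sketch below is a simple text match on standard input. It does not restrict the search to the Checks section, so treat a positive result as a hint rather than proof. The function name mapr_check_listed is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether "mapr" appears in Agent status output read from stdin.
# This is a plain text match and is not limited to the Checks section.
mapr_check_listed() {
  if grep -q 'mapr'; then
    echo "mapr check found"
  else
    echo "mapr check not found"
    return 1
  fi
}
```

A typical invocation would be sudo datadog-agent status | mapr_check_listed.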

Data Collected

Metrics


mapr.alarms.alarm_raised (gauge)	The number of threads that are waiting to be executed. This can occur when a thread must wait for another thread to perform an action before proceeding. Shown as thread
mapr.cache.lookups_data (count)	The number of cache lookups in the block cache. Shown as operation
mapr.cache.lookups_dir (count)	The number of cache lookups in the table LRU cache. The table LRU is used for storing internal B-Tree leaf pages. Shown as operation
mapr.cache.lookups_inode (count)	The number of cache lookups in the inode cache.
mapr.cache.lookups_largefile (count)	The number of cache lookups in the large file LRU cache. The large file LRU is used for storing files with size greater than 64K and MapR database data pages. Shown as operation
mapr.cache.lookups_meta (count)	The number of cache lookups on the meta LRU cache. The meta LRU is used for storing internal B-Tree pages. Shown as operation
mapr.cache.lookups_smallfile (count)	The number of cache lookups on the small file LRU cache. This LRU is used for storing files with size less than 64K and MapR database index pages. Shown as operation
mapr.cache.lookups_table (count)	The number of cache lookups in the table LRU cache. The table LRU is used for storing internal B-Tree leaf pages. Shown as operation
mapr.cache.misses_data (count)	The number of cache misses in the block cache. Shown as miss
mapr.cache.misses_dir (count)	The number of cache misses on the table LRU cache. Shown as miss
mapr.cache.misses_inode (count)	The number of cache misses in the inode cache. Shown as miss
mapr.cache.misses_largefile (count)	The number of cache misses on the large file LRU cache. Shown as miss
mapr.cache.misses_meta (count)	The number of cache misses on the meta LRU cache. Shown as miss
mapr.cache.misses_smallfile (count)	The number of cache misses on the small file LRU cache. Shown as miss
mapr.cache.misses_table (count)	The number of cache misses on the table LRU cache. Shown as miss
mapr.cldb.cluster_cpu_total (gauge)	The number of physical CPUs in the cluster. Shown as cpu
mapr.cldb.cluster_cpubusy_percent (gauge)	The aggregate percentage of busy CPUs in the cluster. Shown as percent
mapr.cldb.cluster_disk_capacity (gauge)	The storage capacity for MapR disks in GB. Shown as gibibyte
mapr.cldb.cluster_diskspace_used (gauge)	The amount of MapR disks used in GB. Shown as gibibyte
mapr.cldb.cluster_memory_capacity (gauge)	The memory capacity in MB. Shown as mebibyte
mapr.cldb.cluster_memory_used (gauge)	The amount of used memory in MB. Shown as mebibyte
mapr.cldb.containers (gauge)	The number of containers currently in the cluster. Shown as container
mapr.cldb.containers_created (count)	The cumulative number of containers created in the cluster. This value includes containers that have been deleted. Shown as container
mapr.cldb.containers_unusable (gauge)	The number of containers that are no longer usable. The CLDB marks a container as unusable when the node that stores the container is offline for 1 hour or more. Shown as container
mapr.cldb.disk_space_available (gauge)	The amount of disk space available in GB. Shown as gibibyte
mapr.cldb.nodes_in_cluster (gauge)	The number of nodes in the cluster. Shown as node
mapr.cldb.nodes_offline (gauge)	The number of nodes in the cluster that are offline. Shown as node
mapr.cldb.rpc_received (count)	The number of RPCs received. Shown as operation
mapr.cldb.rpcs_failed (count)	The number of RPCs failed. Shown as operation
mapr.cldb.storage_pools_cluster (gauge)	The number of storage pools.
mapr.cldb.storage_pools_offline (gauge)	The number of offline storage pools.
mapr.cldb.volumes (gauge)	The number of volumes created, including system volumes. Shown as volume
mapr.db.append_bytes (count)	The number of bytes written by append RPCs. Shown as byte
mapr.db.append_rpcrows (count)	The number of rows written by append RPCs. Shown as object
mapr.db.append_rpcs (count)	The number of MapR Database append RPCs completed. Shown as operation
mapr.db.cdc.pending_bytes (gauge)	The number of bytes of CDC data remaining to be sent. Shown as byte
mapr.db.cdc.sent_bytes (count)	The number of bytes of CDC data sent. Shown as byte
mapr.db.checkandput_bytes (count)	The number of bytes written by check and put RPCs. Shown as byte
mapr.db.checkandput_rpcrows (count)	The number of rows written by check and put RPCs. Shown as object
mapr.db.checkandput_rpcs (count)	The number of MapR Database check and put RPCs completed. Shown as operation
mapr.db.flushes (count)	The number of flushes that reorganize data from bucket files (unsorted data) to spill files (sorted data) when the bucket size exceeds a threshold. Shown as flush
mapr.db.forceflushes (count)	The number of flushes that reorganize data from bucket files (unsorted data) to spill files (sorted data) when the in-memory bucket file cache fills up. Shown as flush
mapr.db.fullcompacts (count)	The number of compactions that combine multiple MapR Database data files containing sorted data (known as spills) into a single spill file. Shown as operation
mapr.db.get_bytes (count)	The number of bytes read by get RPCs. Shown as byte
mapr.db.get_currpcs (gauge)	The number of MapR Database get RPCs in progress. Shown as operation
mapr.db.get_readrows (count)	The number of rows read by get RPCs. Shown as object
mapr.db.get_resprows (count)	The number of rows returned from get RPCs. Shown as object
mapr.db.get_rpcs (count)	The number of MapR Database get RPCs completed. Shown as operation
mapr.db.increment_bytes (count)	The number of bytes written by increment RPCs. Shown as byte
mapr.db.increment_rpcrows (count)	The number of rows written by increment RPCs. Shown as object
mapr.db.increment_rpcs (count)	The number of MapR Database increment RPCs completed. Shown as operation
mapr.db.index.pending_bytes (gauge)	The number of bytes of secondary index data remaining to be sent. Shown as byte
mapr.db.minicompacts (count)	The number of compactions that combine multiple small data files containing sorted data (known as spills) into a single spill file. Shown as operation
mapr.db.put_bytes (count)	The number of bytes written by put RPCs. Shown as byte
mapr.db.put_currpcs (gauge)	The number of MapR Database put RPCs in progress. Shown as operation
mapr.db.put_readrows (count)	The number of rows read by put RPCs. Shown as object
mapr.db.put_rpcrows (count)	The number of rows written by put RPCs. Each MapR Database put RPC can include multiple put rows. Shown as object
mapr.db.put_rpcs (count)	The number of MapR Database put RPCs completed. Shown as operation
mapr.db.repl.pending_bytes (gauge)	The number of bytes of replication data remaining to be sent. Shown as byte
mapr.db.repl.sent_bytes (count)	The number of bytes sent to replicate data. Shown as byte
mapr.db.scan_bytes (count)	The number of bytes read by scan RPCs. Shown as byte
mapr.db.scan_currpcs (gauge)	The number of MapR Database scan RPCs in progress. Shown as operation
mapr.db.scan_readrows (count)	The number of rows read by scan RPCs. Shown as object
mapr.db.scan_resprows (count)	The number of rows returned from scan RPCs. Shown as object
mapr.db.scan_rpcs (count)	The number of MapR Database scan RPCs completed. Shown as operation
mapr.db.table.latency (gauge)	The latency of RPC operations on tables, represented as a histogram. Endpoints identify histogram bucket boundaries. Shown as millisecond
mapr.db.table.read_bytes (count)	The number of bytes read from tables. Shown as byte
mapr.db.table.read_rows (count)	The number of rows read from tables. Shown as object
mapr.db.table.resp_rows (count)	The number of rows returned from tables. Shown as object
mapr.db.table.rpcs (count)	The number of RPC calls completed on tables. Shown as operation
mapr.db.table.value_cache_hits (count)	The number of MapR Database operations on tables that utilized the MapR Database value cache. Shown as operation
mapr.db.table.value_cache_lookups (count)	The number of MapR Database operations on tables that performed a lookup on the MapR Database value cache. Shown as operation
mapr.db.table.write_bytes (count)	The number of bytes written to tables. Shown as byte
mapr.db.table.write_rows (count)	The number of rows written to tables. Shown as object
mapr.db.ttlcompacts (count)	The number of compactions that result in reclamation of disk space due to removal of stale data. Shown as operation
mapr.db.updateandget_bytes (count)	The number of bytes written by update and get RPCs. Shown as byte
mapr.db.updateandget_rpcrows (count)	The number of rows written by update and get RPCs. Shown as object
mapr.db.updateandget_rpcs (count)	The number of MapR Database update and get RPCs completed. Shown as operation
mapr.db.valuecache_hits (count)	The number of MapR Database operations that utilized the MapR Database value cache. Shown as operation
mapr.db.valuecache_lookups (count)	The number of MapR Database operations that performed a lookup on the MapR Database value cache. Shown as operation
mapr.db.valuecache_usedSize (gauge)	The MapR Database value cache size in MB. Shown as mebibyte
mapr.drill.allocator_root_peak (gauge)	The peak amount of memory used in bytes by the internal memory allocator. Shown as byte
mapr.drill.allocator_root_used (gauge)	The amount of memory used in bytes by the internal memory allocator. Shown as byte
mapr.drill.blocked_count (gauge)	The number of threads that are blocked because they are waiting for a monitor lock. Shown as thread
mapr.drill.count (gauge)	The number of live threads (including both daemon and non-daemon threads). Shown as thread
mapr.drill.fd_usage (gauge)	The ratio of used to total file descriptors.
mapr.drill.fragments_running (gauge)	The number of query fragments currently running in the drillbit. Shown as byte
mapr.drill.heap_used (gauge)	The amount of heap memory used in bytes by the JVM. Shown as byte
mapr.drill.non_heap_used (gauge)	The amount of non-heap memory used in bytes by the JVM. Shown as byte
mapr.drill.queries_completed (count)	The number of completed, canceled or failed queries for which this drillbit is the foreman. Shown as byte
mapr.drill.queries_running (gauge)	The number of running queries for which this drillbit is the foreman. Shown as byte
mapr.drill.runnable_count (gauge)	The number of threads executing in the JVM. Shown as thread
mapr.drill.waiting_count (gauge)	The number of threads that are waiting to be executed. This can occur when a thread must wait for another thread to perform an action before proceeding. Shown as thread
mapr.fs.bulk_writes (count)	The number of bulk-write operations. Bulk-write operations occur when the MapR filesystem container master aggregates multiple file writes from one or more clients into one RPC before replicating the writes. Shown as write
mapr.fs.bulk_writesbytes (count)	The number of bytes written by bulk-write operations. Bulk-write operations occur when the MapR filesystem container master aggregates multiple file writes from one or more clients into one RPC before replicating the writes. Shown as byte
mapr.fs.kvstore_delete (count)	The number of delete operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.kvstore_insert (count)	The number of insert operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.kvstore_lookup (count)	The number of lookup operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.kvstore_scan (count)	The number of scan operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.local_readbytes (count)	The number of bytes read by applications that are running on the MapR filesystem node. Shown as byte
mapr.fs.local_reads (count)	The number of file read operations by applications that are running on the MapR filesystem node. Shown as read
mapr.fs.local_writebytes (count)	The number of bytes written by applications that are running on the MapR filesystem node. Shown as byte
mapr.fs.local_writes (count)	The number of file write operations by applications that are running on the MapR filesystem node. Shown as operation
mapr.fs.read_bytes (count)	The amount of data read remotely in MB. Shown as mebibyte
mapr.fs.read_cachehits (count)	The number of cache hits for file reads. This value includes pages that the MapR filesystem populates using readahead mechanism. Shown as hit
mapr.fs.read_cachemisses (count)	The number of cache misses for file read operations. Shown as miss
mapr.fs.reads (count)	The number of remote reads. Shown as read
mapr.fs.statstype_create (count)	The number of file create operations. Shown as operation
mapr.fs.statstype_lookup (count)	The number of lookup operations. Shown as operation
mapr.fs.statstype_read (count)	The number of file read operations. Shown as read
mapr.fs.statstype_write (count)	The number of file write operations. Shown as write
mapr.fs.write_bytes (count)	The amount of data written remotely in MB. Shown as mebibyte
mapr.fs.writes (count)	The number of remote writes. Shown as write
mapr.io.read_bytes (gauge)	The number of MB read from disk. Shown as mebibyte
mapr.io.reads (gauge)	The number of MapR Filesystem disk read operations. Shown as read
mapr.io.write_bytes (count)	The number of MB written to disk. Shown as mebibyte
mapr.io.writes (count)	The number of MapR Filesystem disk write operations. Shown as write
mapr.metrics.submitted (gauge)	Number of metrics submitted every check run.
mapr.process.context_switch_involuntary (count)	The number of involuntary context switches for MapR processes. Shown as operation
mapr.process.context_switch_voluntary (count)	The number of voluntary context switches for MapR processes. Shown as process
mapr.process.cpu_percent (gauge)	The percentage of CPU used for MapR processes. Shown as percent
mapr.process.cpu_time.syst (count)	The amount of time measured in seconds that the process has been in kernel mode. Shown as second
mapr.process.cpu_time.user (count)	The amount of time measured in seconds that the process has been in user mode. Shown as second
mapr.process.data (gauge)	The amount of memory in MB used by the data segments of MapR processes. Shown as mebibyte
mapr.process.disk_octets.read (count)	The number of bytes read from disk for MapR processes. Shown as byte
mapr.process.disk_octets.write (count)	The number of bytes written to disk for MapR processes. Shown as byte
mapr.process.disk_ops.read (count)	The number of read operations for MapR processes. Shown as read
mapr.process.disk_ops.write (count)	The number of write operations for MapR processes. Shown as write
mapr.process.mem_percent (gauge)	The percentage of total system memory (not capped by MapR processes) used for MapR processes. Shown as percent
mapr.process.page_faults.majflt (count)	The number of major MapR process faults that required loading a memory page from disk. Shown as error
mapr.process.page_faults.minflt (count)	The number of minor MapR process faults that required loading a memory page from disk. Shown as error
mapr.process.rss (gauge)	The actual amount of memory in MB used by MapR processes. Shown as mebibyte
mapr.process.vm (gauge)	The amount of virtual memory in MB used by MapR processes. Shown as mebibyte
mapr.rpc.bytes_recd (count)	The number of bytes received by the MapR Filesystem over RPC. Shown as byte
mapr.rpc.bytes_sent (count)	The number of bytes sent by the MapR filesystem over RPC. Shown as byte
mapr.rpc.calls_recd (count)	The number of RPC calls received by the MapR filesystem. Shown as message
mapr.streams.listen_bytes (count)	The number of megabytes consumed by Streams messages. Shown as mebibyte
mapr.streams.listen_currpcs (gauge)	The number of concurrent Stream consumer RPCs. Shown as object
mapr.streams.listen_msgs (count)	The number of Streams messages read by the consumer. Shown as object
mapr.streams.listen_rpcs (count)	The number of Streams consumer RPCs. Shown as object
mapr.streams.produce_bytes (count)	The number of megabytes produced by Streams messages. Shown as mebibyte
mapr.streams.produce_msgs (count)	The number of Streams messages produced. Shown as object
mapr.streams.produce_rpcs (count)	The number of Streams producer RPCs. Shown as object
mapr.topology.disks_total_capacity (gauge)	The disk capacity in gigabytes. Shown as gibibyte
mapr.topology.disks_used_capacity (gauge)	The amount of disk space used in gigabytes. Shown as gibibyte
mapr.topology.utilization (gauge)	The aggregate percentage of CPU utilization. Shown as percent
mapr.volmetrics.read_latency (gauge)	The per-volume read latency in milliseconds. Shown as millisecond
mapr.volmetrics.read_ops (count)	A count of the read operations per volume. Shown as operation
mapr.volmetrics.read_throughput (gauge)	The per-volume read throughput in KB. Shown as kibibyte
mapr.volmetrics.write_latency (gauge)	The per-volume write latency in milliseconds. Shown as millisecond
mapr.volmetrics.write_ops (count)	A count of the write operations per volume. Shown as operation
mapr.volmetrics.write_throughput (gauge)	The per-volume write throughput in KB. Shown as kibibyte
mapr.volume.logical_used (gauge)	The number of MBs used for logical volumes before compression is applied to the files. Shown as mebibyte
mapr.volume.quota (gauge)	The number of megabytes (MB) used for volume quota. Shown as mebibyte
mapr.volume.snapshot_used (gauge)	The number of MBs used for snapshots. Shown as mebibyte
mapr.volume.total_used (gauge)	The number of MB used for volumes and snapshots. Shown as mebibyte
mapr.volume.used (gauge)	The number of MB used for volumes after compression is applied to the files. Shown as mebibyte

Events

The MapR check does not include any events.

Service Checks

mapr.can_connect

Returns CRITICAL if the Agent fails to connect and subscribe to the stream topic, OK otherwise.

Statuses: ok, critical

Troubleshooting

The Agent keeps crashing after the MapR integration is configured
The C library in mapr-streams-python can segfault because of permission issues. Make sure the dd-agent user has read permission on the ticket file, and that dd-agent can run maprcli commands when the MAPR_TICKETFILE_LOCATION environment variable points to the ticket.
The integration seems to work correctly but sends no metrics
Let the Agent run for a few minutes. Because the integration pulls data from a topic, MapR must first push data into that topic. If that does not help, but running the Agent manually with sudo shows data, the problem is with permissions. Double-check every configuration option. The dd-agent Linux user must be able to use a locally stored ticket, which allows it to run queries against MapR as user X (who may or may not be dd-agent itself). In addition, user X must have the consume permission on the /var/mapr/mapr.monitoring/metricstreams stream.
You see the message confluent_kafka was not imported correctly ...
The Agent's embedded environment could not run the command import confluent_kafka. This means either that mapr-streams-library was not installed in the embedded environment, or that it cannot find the mapr-core libraries. The error message should give more details.
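
Several of the problems above come down to whether dd-agent can read the ticket. The sketch below classifies a ticket path from the point of view of whichever user runs it. The function name ticket_state and the three labels are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a ticket file from the point of view of the current user.
#   $1 = path to the ticket file
ticket_state() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ ! -r "$1" ]; then
    echo "unreadable"
  else
    echo "readable"
  fi
}
```

To test as the Agent user, save the function in a script that ends with ticket_state "$1" and run it with sudo -u dd-agent sh <SCRIPT> <TICKET_PATH>. The import problem can be checked in a similar way by running import confluent_kafka with the Agent's embedded Python interpreter as dd-agent; the interpreter's exact path under /opt/datadog-agent/embedded/bin/ depends on the Agent version.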

Need more help? Contact Datadog support.
