MapR

Integration version 3.0.0

Overview

This check monitors MapR 6.1+ through the Datadog Agent.

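Setup

Follow the instructions below to install and configure this check for an Agent running on a host.

Installation

The MapR check is included in the Datadog Agent package, but it requires additional setup operations.

Prerequisites

MapR monitoring is running correctly.
You have an available MapR user (with name, password, UID, and GID) that has the 'consume' permission on the /var/mapr/mapr.monitoring/metricstreams stream. This can be an existing user or a newly created one.
On a non-secure cluster: follow Configuring Impersonation without Cluster Security so that the dd-agent user can impersonate this MapR user.
On a secure cluster: generate a long-lived service ticket for this user that is readable by the dd-agent user.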

Installation steps for each node:

Install the Agent.
Install the librdkafka library, which mapr-streams-library requires, by following its installation instructions.
Install the mapr-streams-library library with the following command:
sudo -u dd-agent /opt/datadog-agent/embedded/bin/pip install --global-option=build_ext --global-option="--library-dirs=/opt/mapr/lib" --global-option="--include-dirs=/opt/mapr/include/" mapr-streams-python
If you use Python 3 with Agent v7, replace pip with pip3.
Add /opt/mapr/lib/ to /etc/ld.so.conf (or to a file in /etc/ld.so.conf.d/). The mapr-streams-library used by the Agent needs this to find the MapR shared libraries.
Reload the libraries by running sudo ldconfig.
Configure the integration by specifying the ticket location.
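
As a quick sanity check for the ld.so.conf step above, the following is a minimal sketch (not part of the Agent or of MapR) that looks for /opt/mapr/lib/ in the two locations named in that step. The function names are hypothetical, and the sketch assumes one directory per line; it does not follow include directives or query the linker cache, so it only confirms that the entry was written.

```python
from pathlib import Path


def lib_dir_configured(lib_dir, conf_files):
    """Return True if lib_dir appears as an entry in any of the given ld.so config files."""
    wanted = lib_dir.rstrip("/")
    for conf in conf_files:
        path = Path(conf)
        if not path.is_file():
            continue  # skip files that do not exist on this host
        for line in path.read_text().splitlines():
            entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if entry and entry.rstrip("/") == wanted:
                return True
    return False


def default_conf_files():
    """Locations named in the steps above: /etc/ld.so.conf and files in /etc/ld.so.conf.d/."""
    return [Path("/etc/ld.so.conf"), *sorted(Path("/etc/ld.so.conf.d").glob("*.conf"))]


if __name__ == "__main__":
    found = lib_dir_configured("/opt/mapr/lib/", default_conf_files())
    print("/opt/mapr/lib/ listed for the dynamic linker:", found)
```

If the script prints False, add the directory as described above and run sudo ldconfig again.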

Additional notes

If 'security' is not enabled in the cluster, you can continue without a ticket.
If your production environment does not allow compilation tools such as gcc (required to build mapr-streams-library), you can build a compiled wheel of the library on a development instance and distribute it to the production hosts. The development and production hosts must be similar enough for the compiled wheel to be compatible. To create the wheel file on the development machine, run:
sudo -u dd-agent /opt/datadog-agent/embedded/bin/pip wheel --global-option=build_ext --global-option="--library-dirs=/opt/mapr/lib" --global-option="--include-dirs=/opt/mapr/include/" mapr-streams-python
Then, on the production machine, run:
sudo -u dd-agent /opt/datadog-agent/embedded/bin/pip install <THE_WHEEL_FILE>
If you use Python 3 with Agent v7, make sure to replace pip with pip3 when installing mapr-streams-library.

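Configuration

Metric collection

To collect MapR performance data, edit the mapr.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample mapr.d/conf.yaml for all available configuration options.
Set the ticket_location parameter in the configuration to the path of the long-lived ticket you created.
Restart the Agent.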
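
To illustrate the shape of the file described above, here is a minimal sketch that prints a mapr.d/conf.yaml containing only the ticket_location parameter. It assumes the usual Agent check layout (an init_config section plus a list of instances); the helper name and the example path are hypothetical, and the sample mapr.d/conf.yaml remains the reference for every other option.

```python
def render_mapr_conf(ticket_location):
    """Return minimal mapr.d/conf.yaml text with one instance and its ticket_location."""
    # Quote the path so that unusual characters cannot change the YAML meaning.
    escaped = ticket_location.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f'  - ticket_location: "{escaped}"\n'
    )


if __name__ == "__main__":
    # Hypothetical path: use the location of the long-lived ticket you generated.
    print(render_mapr_conf("/path/to/longlived_ticket"), end="")
```

On a cluster without security enabled, the ticket_location line can be left out, since no ticket is needed there.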

Log collection

MapR uses FluentD for logs. Use the FluentD Datadog plugin to collect MapR logs. The following command downloads the plugin and installs it into the right directory.

curl https://raw.githubusercontent.com/DataDog/fluent-plugin-datadog/master/lib/fluent/plugin/out_datadog.rb -o /opt/mapr/fluentd/fluentd-<VERSION>/lib/fluentd-<VERSION>-linux-x86_64/lib/app/lib/fluent/plugin/out_datadog.rb

Then update /opt/mapr/fluentd/fluentd-<VERSION>/etc/fluentd/fluentd.conf with the following section.

<match *>
  @type copy
  <store> # This section is here by default and sends the logs to Elasticsearch for Kibana.
    @include /opt/mapr/fluentd/fluentd-<VERSION>/etc/fluentd/es_config.conf
    include_tag_key true
    tag_key service_name
  </store>
  <store> # This section also forwards all the logs to Datadog:
    @type datadog
    @id dd_agent
    include_tag_key true
    dd_source mapr  # Sets "source: mapr" on every log to allow automatic parsing on Datadog.
    dd_tags "<KEY>:<VALUE>"
    service <YOUR_SERVICE_NAME>
    api_key <YOUR_API_KEY>
  </store>
</match>

See fluent_datadog_plugin for more details about the available options.

Validation

Run the Agent's status subcommand and look for mapr under the Checks section.

Data Collected

Metrics


mapr.alarms.alarm_raised (gauge)	The number of threads that are waiting to be executed. This can occur when a thread must wait for another thread to perform an action before proceeding. Shown as thread
mapr.cache.lookups_data (count)	The number of cache lookups in the block cache. Shown as operation
mapr.cache.lookups_dir (count)	The number of cache lookups in the table LRU cache. The table LRU is used for storing internal B-Tree leaf pages. Shown as operation
mapr.cache.lookups_inode (count)	The number of cache lookups in the inode cache.
mapr.cache.lookups_largefile (count)	The number of cache lookups in the large file LRU cache. The large file LRU is used for storing files with size greater than 64K and MapR database data pages. Shown as operation
mapr.cache.lookups_meta (count)	The number of cache lookups on the meta LRU cache. The meta LRU is used for storing internal B-Tree pages. Shown as operation
mapr.cache.lookups_smallfile (count)	The number of cache lookups on the small file LRU cache. This LRU is used for storing files with size less than 64K and MapR database index pages. Shown as operation
mapr.cache.lookups_table (count)	The number of cache lookups in the table LRU cache. The table LRU is used for storing internal B-Tree leaf pages. Shown as operation
mapr.cache.misses_data (count)	The number of cache misses in the block cache. Shown as miss
mapr.cache.misses_dir (count)	The number of cache misses on the table LRU cache. Shown as miss
mapr.cache.misses_inode (count)	The number of cache misses in the inode cache. Shown as miss
mapr.cache.misses_largefile (count)	The number of cache misses on the large file LRU cache. Shown as miss
mapr.cache.misses_meta (count)	The number of cache misses on the meta LRU cache. Shown as miss
mapr.cache.misses_smallfile (count)	The number of cache misses on the small file LRU cache. Shown as miss
mapr.cache.misses_table (count)	The number of cache misses on the table LRU cache. Shown as miss
mapr.cldb.cluster_cpu_total (gauge)	The number of physical CPUs in the cluster. Shown as cpu
mapr.cldb.cluster_cpubusy_percent (gauge)	The aggregate percentage of busy CPUs in the cluster. Shown as percent
mapr.cldb.cluster_disk_capacity (gauge)	The storage capacity for MapR disks in GB. Shown as gibibyte
mapr.cldb.cluster_diskspace_used (gauge)	The amount of MapR disks used in GB. Shown as gibibyte
mapr.cldb.cluster_memory_capacity (gauge)	The memory capacity in MB. Shown as mebibyte
mapr.cldb.cluster_memory_used (gauge)	The amount of used memory in MB. Shown as mebibyte
mapr.cldb.containers (gauge)	The number of containers currently in the cluster. Shown as container
mapr.cldb.containers_created (count)	The cumulative number of containers created in the cluster. This value includes containers that have been deleted. Shown as container
mapr.cldb.containers_unusable (gauge)	The number of containers that are no longer usable. The CLDB marks a container as unusable when the node that stores the container is offline for 1 hour or more. Shown as container
mapr.cldb.disk_space_available (gauge)	The amount of disk space available in GB. Shown as gibibyte
mapr.cldb.nodes_in_cluster (gauge)	The number of nodes in the cluster. Shown as node
mapr.cldb.nodes_offline (gauge)	The number of nodes in the cluster that are offline. Shown as node
mapr.cldb.rpc_received (count)	The number of RPCs received. Shown as operation
mapr.cldb.rpcs_failed (count)	The number of RPCs failed. Shown as operation
mapr.cldb.storage_pools_cluster (gauge)	The number of storage pools.
mapr.cldb.storage_pools_offline (gauge)	The number of offline storage pools.
mapr.cldb.volumes (gauge)	The number of volumes created, including system volumes. Shown as volume
mapr.db.append_bytes (count)	The number of bytes written by append RPCs. Shown as byte
mapr.db.append_rpcrows (count)	The number of rows written by append RPCs. Shown as object
mapr.db.append_rpcs (count)	The number of MapR Database append RPCs completed. Shown as operation
mapr.db.cdc.pending_bytes (gauge)	The number of bytes of CDC data remaining to be sent. Shown as byte
mapr.db.cdc.sent_bytes (count)	The number of bytes of CDC data sent. Shown as byte
mapr.db.checkandput_bytes (count)	The number of bytes written by check and put RPCs. Shown as byte
mapr.db.checkandput_rpcrows (count)	The number of rows written by check and put RPCs. Shown as object
mapr.db.checkandput_rpcs (count)	The number of MapR Database check and put RPCs completed. Shown as operation
mapr.db.flushes (count)	The number of flushes that reorganize data from bucket files (unsorted data) to spill files (sorted data) when the bucket size exceeds a threshold. Shown as flush
mapr.db.forceflushes (count)	The number of flushes that reorganize data from bucket files (unsorted data) to spill files (sorted data) when the in-memory bucket file cache fills up. Shown as flush
mapr.db.fullcompacts (count)	The number of compactions that combine multiple MapR Database data files containing sorted data (known as spills) into a single spill file. Shown as operation
mapr.db.get_bytes (count)	The number of bytes read by get RPCs. Shown as byte
mapr.db.get_currpcs (gauge)	The number of MapR Database get RPCs in progress. Shown as operation
mapr.db.get_readrows (count)	The number of rows read by get RPCs. Shown as object
mapr.db.get_resprows (count)	The number of rows returned from get RPCs. Shown as object
mapr.db.get_rpcs (count)	The number of MapR Database get RPCs completed. Shown as operation
mapr.db.increment_bytes (count)	The number of bytes written by increment RPCs. Shown as byte
mapr.db.increment_rpcrows (count)	The number of rows written by increment RPCs. Shown as object
mapr.db.increment_rpcs (count)	The number of MapR Database increment RPCs completed. Shown as operation
mapr.db.index.pending_bytes (gauge)	The number of bytes of secondary index data remaining to be sent. Shown as byte
mapr.db.minicompacts (count)	The number of compactions that combine multiple small data files containing sorted data (known as spills) into a single spill file. Shown as operation
mapr.db.put_bytes (count)	The number of bytes written by put RPCs. Shown as byte
mapr.db.put_currpcs (gauge)	The number of MapR Database put RPCs in progress. Shown as operation
mapr.db.put_readrows (count)	The number of rows read by put RPCs. Shown as object
mapr.db.put_rpcrows (count)	The number of rows written by put RPCs. Each MapR Database put RPC can include multiple put rows. Shown as object
mapr.db.put_rpcs (count)	The number of MapR Database put RPCs completed. Shown as operation
mapr.db.repl.pending_bytes (gauge)	The number of bytes of replication data remaining to be sent. Shown as byte
mapr.db.repl.sent_bytes (count)	The number of bytes sent to replicate data. Shown as byte
mapr.db.scan_bytes (count)	The number of bytes read by scan RPCs. Shown as byte
mapr.db.scan_currpcs (gauge)	The number of MapR Database scan RPCs in progress. Shown as operation
mapr.db.scan_readrows (count)	The number of rows read by scan RPCs. Shown as object
mapr.db.scan_resprows (count)	The number of rows returned from scan RPCs. Shown as object
mapr.db.scan_rpcs (count)	The number of MapR Database scan RPCs completed. Shown as operation
mapr.db.table.latency (gauge)	The latency of RPC operations on tables, represented as a histogram. Endpoints identify histogram bucket boundaries. Shown as millisecond
mapr.db.table.read_bytes (count)	The number of bytes read from tables. Shown as byte
mapr.db.table.read_rows (count)	The number of rows read from tables. Shown as object
mapr.db.table.resp_rows (count)	The number of rows returned from tables. Shown as object
mapr.db.table.rpcs (count)	The number of RPC calls completed on tables. Shown as operation
mapr.db.table.value_cache_hits (count)	The number of MapR Database operations on tables that utilized the MapR Database value cache. Shown as operation
mapr.db.table.value_cache_lookups (count)	The number of MapR Database operations on tables that performed a lookup on the MapR Database value cache. Shown as operation
mapr.db.table.write_bytes (count)	The number of bytes written to tables. Shown as byte
mapr.db.table.write_rows (count)	The number of rows written to tables. Shown as object
mapr.db.ttlcompacts (count)	The number of compactions that result in reclamation of disk space due to removal of stale data. Shown as operation
mapr.db.updateandget_bytes (count)	The number of bytes written by update and get RPCs. Shown as byte
mapr.db.updateandget_rpcrows (count)	The number of rows written by update and get RPCs. Shown as object
mapr.db.updateandget_rpcs (count)	The number of MapR Database update and get RPCs completed. Shown as operation
mapr.db.valuecache_hits (count)	The number of MapR Database operations that utilized the MapR Database value cache. Shown as operation
mapr.db.valuecache_lookups (count)	The number of MapR Database operations that performed a lookup on the MapR Database value cache. Shown as operation
mapr.db.valuecache_usedSize (gauge)	The MapR Database value cache size in MB. Shown as mebibyte
mapr.drill.allocator_root_peak (gauge)	The peak amount of memory used in bytes by the internal memory allocator. Shown as byte
mapr.drill.allocator_root_used (gauge)	The amount of memory used in bytes by the internal memory allocator. Shown as byte
mapr.drill.blocked_count (gauge)	The number of threads that are blocked because they are waiting for a monitor lock. Shown as thread
mapr.drill.count (gauge)	The number of live threads (including both daemon and non-daemon threads). Shown as thread
mapr.drill.fd_usage (gauge)	The ratio of used to total file descriptors.
mapr.drill.fragments_running (gauge)	The number of query fragments currently running in the drillbit. Shown as byte
mapr.drill.heap_used (gauge)	The amount of heap memory used in bytes by the JVM. Shown as byte
mapr.drill.non_heap_used (gauge)	The amount of non-heap memory used in bytes by the JVM. Shown as byte
mapr.drill.queries_completed (count)	The number of completed, canceled or failed queries for which this drillbit is the foreman. Shown as byte
mapr.drill.queries_running (gauge)	The number of running queries for which this drillbit is the foreman. Shown as byte
mapr.drill.runnable_count (gauge)	The number of threads executing in the JVM. Shown as thread
mapr.drill.waiting_count (gauge)	The number of threads that are waiting to be executed. This can occur when a thread must wait for another thread to perform an action before proceeding. Shown as thread
mapr.fs.bulk_writes (count)	The number of bulk-write operations. Bulk-write operations occur when the MapR filesystem container master aggregates multiple file writes from one or more clients into one RPC before replicating the writes. Shown as write
mapr.fs.bulk_writesbytes (count)	The number of bytes written by bulk-write operations. Bulk-write operations occur when the MapR filesystem container master aggregates multiple file writes from one or more clients into one RPC before replicating the writes. Shown as byte
mapr.fs.kvstore_delete (count)	The number of delete operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.kvstore_insert (count)	The number of insert operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.kvstore_lookup (count)	The number of lookup operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.kvstore_scan (count)	The number of scan operations on key-value store files which are used by the CLDB and MapR database. Shown as operation
mapr.fs.local_readbytes (count)	The number of bytes read by applications that are running on the MapR filesystem node. Shown as byte
mapr.fs.local_reads (count)	The number of file read operations by applications that are running on the MapR filesystem node. Shown as read
mapr.fs.local_writebytes (count)	The number of bytes written by applications that are running on the MapR filesystem node. Shown as byte
mapr.fs.local_writes (count)	The number of file write operations by applications that are running on the MapR filesystem node. Shown as operation
mapr.fs.read_bytes (count)	The amount of data read remotely in MB. Shown as mebibyte
mapr.fs.read_cachehits (count)	The number of cache hits for file reads. This value includes pages that the MapR filesystem populates using readahead mechanism. Shown as hit
mapr.fs.read_cachemisses (count)	The number of cache misses for file read operations. Shown as miss
mapr.fs.reads (count)	The number of remote reads. Shown as read
mapr.fs.statstype_create (count)	The number of file create operations. Shown as operation
mapr.fs.statstype_lookup (count)	The number of lookup operations. Shown as operation
mapr.fs.statstype_read (count)	The number of file read operations. Shown as read
mapr.fs.statstype_write (count)	The number of file write operations. Shown as write
mapr.fs.write_bytes (count)	The amount of data written remotely in MB. Shown as mebibyte
mapr.fs.writes (count)	The number of remote writes. Shown as write
mapr.io.read_bytes (gauge)	The number of MB read from disk. Shown as mebibyte
mapr.io.reads (gauge)	The number of MapR Filesystem disk read operations. Shown as read
mapr.io.write_bytes (count)	The number of MB written to disk. Shown as mebibyte
mapr.io.writes (count)	The number of MapR Filesystem disk write operations. Shown as write
mapr.metrics.submitted (gauge)	Number of metrics submitted every check run.
mapr.process.context_switch_involuntary (count)	The number of involuntary context switches for MapR processes. Shown as operation
mapr.process.context_switch_voluntary (count)	The number of voluntary context switches for MapR processes. Shown as process
mapr.process.cpu_percent (gauge)	The percentage of CPU used for MapR processes. Shown as percent
mapr.process.cpu_time.syst (count)	The amount of time measured in seconds that the process has been in kernel mode. Shown as second
mapr.process.cpu_time.user (count)	The amount of time measured in seconds that the process has been in user mode. Shown as second
mapr.process.data (gauge)	The amount of memory in MB used by the data segments of MapR processes. Shown as mebibyte
mapr.process.disk_octets.read (count)	The number of bytes read from disk for MapR processes. Shown as byte
mapr.process.disk_octets.write (count)	The number of bytes written to disk for MapR processes. Shown as byte
mapr.process.disk_ops.read (count)	The number of read operations for MapR processes. Shown as read
mapr.process.disk_ops.write (count)	The number of write operations for MapR processes. Shown as write
mapr.process.mem_percent (gauge)	The percentage of total system memory (not capped by MapR processes) used for MapR processes. Shown as percent
mapr.process.page_faults.majflt (count)	The number of major MapR process faults that required loading a memory page from disk. Shown as error
mapr.process.page_faults.minflt (count)	The number of minor MapR process faults that did not require loading a memory page from disk. Shown as error
mapr.process.rss (gauge)	The actual amount of memory in MB used by MapR processes. Shown as mebibyte
mapr.process.vm (gauge)	The amount of virtual memory in MB used by MapR processes. Shown as mebibyte
mapr.rpc.bytes_recd (count)	The number of bytes received by the MapR Filesystem over RPC. Shown as byte
mapr.rpc.bytes_sent (count)	The number of bytes sent by the MapR filesystem over RPC. Shown as byte
mapr.rpc.calls_recd (count)	The number of RPC calls received by the MapR filesystem. Shown as message
mapr.streams.listen_bytes (count)	The number of megabytes consumed by Streams messages. Shown as mebibyte
mapr.streams.listen_currpcs (gauge)	The number of concurrent Stream consumer RPCs. Shown as object
mapr.streams.listen_msgs (count)	The number of Streams messages read by the consumer. Shown as object
mapr.streams.listen_rpcs (count)	The number of Streams consumer RPCs. Shown as object
mapr.streams.produce_bytes (count)	The number of megabytes produced by Streams messages. Shown as mebibyte
mapr.streams.produce_msgs (count)	The number of Streams messages produced. Shown as object
mapr.streams.produce_rpcs (count)	The number of Streams producer RPCs. Shown as object
mapr.topology.disks_total_capacity (gauge)	The disk capacity in gigabytes. Shown as gibibyte
mapr.topology.disks_used_capacity (gauge)	The amount of disk space used in gigabytes. Shown as gibibyte
mapr.topology.utilization (gauge)	The aggregate percentage of CPU utilization. Shown as percent
mapr.volmetrics.read_latency (gauge)	The per volume read latency in milliseconds. Shown as millisecond
mapr.volmetrics.read_ops (count)	A count of the read operations per volume. Shown as operation
mapr.volmetrics.read_throughput (gauge)	The per volume read throughput in KB. Shown as kibibyte
mapr.volmetrics.write_latency (gauge)	The per volume write latency in milliseconds. Shown as millisecond
mapr.volmetrics.write_ops (count)	A count of the write operations per volume. Shown as operation
mapr.volmetrics.write_throughput (gauge)	The per volume write throughput in KB. Shown as kibibyte
mapr.volume.logical_used (gauge)	The number of MBs used for logical volumes before compression is applied to the files. Shown as mebibyte
mapr.volume.quota (gauge)	The number of megabytes (MB) used for volume quota. Shown as mebibyte
mapr.volume.snapshot_used (gauge)	The number of MBs used for snapshots. Shown as mebibyte
mapr.volume.total_used (gauge)	The number of MB used for volumes and snapshots. Shown as mebibyte
mapr.volume.used (gauge)	The number of MB used for volumes after compression is applied to the files. Shown as mebibyte

Events

The MapR check does not include any events.

Service Checks

mapr.can_connect

Returns CRITICAL if the Agent fails to connect and subscribe to the stream topic, OK otherwise.

Statuses: ok, critical

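Troubleshooting

The Agent is in a crash loop after configuring the MapR integration.
There have been a few cases where the C library within mapr-streams-python segfaults because of permission issues. Make sure the dd-agent user has read permission on the ticket file, and that the dd-agent user can run maprcli commands when the MAPR_TICKETFILE_LOCATION environment variable points to the ticket.
The integration seems to work correctly but does not send any metrics.
Let the Agent run for at least a couple of minutes, because the integration pulls data from a topic and MapR needs to push data into that topic. If that does not help, but running the Agent manually with sudo shows data, the problem is with permissions. Double-check everything. The dd-agent Linux user must be able to use a locally stored ticket that lets it run queries against MapR as user X (which may or may not be dd-agent itself). In addition, user X needs the consume permission on the /var/mapr/mapr.monitoring/metricstreams stream.
You see the message confluent_kafka was not imported correctly ...
The Agent embedded environment could not run the command import confluent_kafka. This means that either mapr-streams-library is not installed inside the embedded environment, or it cannot find the mapr-core libraries. The error message should give more details.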
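
The permission and import checks described above can be sketched as a small diagnostic script. This is an illustration under stated assumptions, not a Datadog tool: the function names are hypothetical, and the script is only meaningful when run as the dd-agent user with the Python interpreter of the Agent's embedded environment. It reads the ticket path from MAPR_TICKETFILE_LOCATION, the environment variable mentioned above.

```python
import importlib.util
import os


def ticket_is_readable(ticket_path):
    """Return True if the current user can read the ticket file."""
    return os.path.isfile(ticket_path) and os.access(ticket_path, os.R_OK)


def module_is_installed(module_name):
    """Return True if this interpreter can locate the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    ticket = os.environ.get("MAPR_TICKETFILE_LOCATION", "")
    print("ticket readable:", bool(ticket) and ticket_is_readable(ticket))
    print("confluent_kafka installed:", module_is_installed("confluent_kafka"))
```

Because find_spec only locates the package and does not load it, a True result rules out a missing installation but not a failure to find the mapr-core shared libraries; for that case, rely on the error message from the actual import.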

Need more help? Contact Datadog support.