ignite

Supported OS Linux Windows Mac OS

Integration version 2.4.0

Overview

This check monitors Ignite.

Setup

Installation

The Ignite check is included in the Datadog Agent package. No additional installation is needed on your server.

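Configuration

Ignite setup

The JMX metrics exporter is enabled by default, but you may need to choose the port it is exposed on, or enable authentication, depending on your network security. The official Docker image uses port 49112 by default.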
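To illustrate the port and authentication choices above, here is a minimal Docker Compose sketch, not a reference configuration. It assumes the official apacheignite/ignite image and that the image passes the JVM_OPTS environment variable to the Ignite JVM; the service name ignite is hypothetical. The -Dcom.sun.management.jmxremote.* flags are standard JVM system properties.

```yaml
services:
  ignite:                       # hypothetical service name
    image: apacheignite/ignite  # assumed: official Ignite image
    environment:
      # Assumption: the image forwards JVM_OPTS to the Ignite JVM.
      # Set authenticate=true (and supply a password file) if your
      # network security requires JMX authentication.
      JVM_OPTS: >-
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=49112
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
    ports:
      - "49112:49112"           # expose the JMX port to the Agent
```

Whichever port is chosen here must match the port the Agent check is configured to connect to.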

For logging, it is strongly recommended to enable log4j so that you benefit from a log format with full dates.

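Host

To configure this check for an Agent running on a host:

  1. Edit the ignite.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your Ignite performance data. See the sample ignite.d/conf.yaml for all available configuration options.

    This check has a limit of 350 metrics per instance. The number of returned metrics is indicated in the status output. You can specify the metrics you are interested in by editing the configuration. To learn how to customize the metrics to collect, see the JMX Checks documentation. If you need to monitor more metrics, contact Datadog support.

  2. Restart the Agent.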
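A minimal sketch of the ignite.d/conf.yaml edit described in step 1, assuming a single local Ignite node that exposes JMX on port 49112 (the default of the official Docker image). The host, port, and credential placeholders are assumptions for illustration; the sample ignite.d/conf.yaml remains the authoritative list of options.

```yaml
init_config:
  # Marks this integration as a JMX-based check.
  is_jmx: true
  # Collect the integration's default metric set.
  collect_default_metrics: true

instances:
  # Assumed address of the Ignite node's JMX endpoint.
  - host: localhost
    port: 49112
    # If JMX authentication is enabled, add credentials (placeholders):
    # user: <JMX_USER>
    # password: <JMX_PASSWORD>
```

Each entry under instances counts separately against the 350-metric limit mentioned above.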

Log collection

Available for Agent versions >6.0

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Add this configuration block to your ignite.d/conf.yaml file to start collecting your Ignite logs:

      logs:
        - type: file
          path: <IGNITE_HOME>/work/log/ignite-*.log
          source: ignite
          service: '<SERVICE_NAME>'
          log_processing_rules:
            - type: multi_line
              name: new_log_start_with_date
              pattern: \[\d{4}\-\d{2}\-\d{2}
    

    Change the path and service parameter values to match your environment. See the sample ignite.d/conf.yaml for all available configuration options.

  3. Restart the Agent.

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

Metric collection

To collect metrics with the Datadog-Ignite integration, see the Autodiscovery with JMX guide.

Log collection

Available for Agent versions >6.0

Collecting logs is disabled by default in the Datadog Agent. To enable it, see Docker Log Collection.

Parameter       Value
<LOG_CONFIG>    {"source": "ignite", "service": "<SERVICE_NAME>", "log_processing_rules":{"type":"multi_line","name":"new_log_start_with_date", "pattern":"\d{4}\-\d{2}\-\d{2}"}}
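
One way to supply <LOG_CONFIG> is through a Docker label. The sketch below assumes Docker Compose, the official apacheignite/ignite image, and a hypothetical service named ignite; com.datadoghq.ad.logs is the Autodiscovery label for log configuration. As an assumption, the rule is wrapped in a list here, matching the list form of log_processing_rules in the host configuration. Backslashes are doubled because the label value is parsed as JSON.

```yaml
services:
  ignite:                       # hypothetical service name
    image: apacheignite/ignite  # assumed: official Ignite image
    labels:
      # The folded scalar (>-) joins these lines into one JSON string.
      com.datadoghq.ad.logs: >-
        [{"source": "ignite", "service": "<SERVICE_NAME>",
        "log_processing_rules": [{"type": "multi_line",
        "name": "new_log_start_with_date",
        "pattern": "\\d{4}\\-\\d{2}\\-\\d{2}"}]}]
```

Replace <SERVICE_NAME> with the service name you want attached to the logs.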

Validation

Run the Agent's status subcommand and look for ignite under the Checks section.

Data Collected

Metrics

ignite.active_baseline_nodes
(gauge)
Active baseline nodes count.
Shown as node
ignite.allocation_rate
(gauge)
Allocation rate (pages per second) averaged across rateTimeInterval.
Shown as page
ignite.average_cpu_load
(gauge)
Average of CPU load values over all metrics kept in the history.
ignite.busy_time_percentage
(gauge)
Percentage of time this node is busy executing jobs vs. idling.
Shown as percent
ignite.cache.average_commit_time
(gauge)
Average time to commit transaction.
Shown as microsecond
ignite.cache.average_get_time
(gauge)
Average time to execute get.
Shown as microsecond
ignite.cache.average_put_time
(gauge)
Average time to execute put.
Shown as microsecond
ignite.cache.average_remove_time
(gauge)
Average time to execute remove.
Shown as microsecond
ignite.cache.average_rollback_time
(gauge)
Average time to rollback transaction.
Shown as microsecond
ignite.cache.backups
(gauge)
Count of backups configured for cache group.
ignite.cache.cluster_moving_partitions
(gauge)
Count of partitions for this cache group in the entire cluster with state MOVING.
ignite.cache.cluster_owning_partitions
(gauge)
Count of partitions for this cache group in the entire cluster with state OWNING.
ignite.cache.commit_queue_size
(gauge)
Transaction committed queue size.
Shown as transaction
ignite.cache.commits
(rate)
Number of transaction commits.
ignite.cache.committed_versions_size
(gauge)
Transaction committed ID map size.
Shown as transaction
ignite.cache.dht_commit_queue_size
(gauge)
Transaction DHT committed queue size.
Shown as transaction
ignite.cache.dht_committed_versions_size
(gauge)
Transaction DHT committed ID map size.
Shown as transaction
ignite.cache.dht_prepare_queue_size
(gauge)
Transaction DHT prepared queue size.
Shown as transaction
ignite.cache.dht_rolledback_versions_size
(gauge)
Transaction DHT rolled back ID map size.
Shown as transaction
ignite.cache.dht_start_version_counts_size
(gauge)
Transaction DHT start version counts map size.
Shown as transaction
ignite.cache.dht_thread_map_size
(gauge)
Transaction DHT per-thread map size.
Shown as transaction
ignite.cache.dht_xid_map_size
(gauge)
Transaction DHT per-Xid map size.
Shown as transaction
ignite.cache.entry_processor.average_invocation_time
(gauge)
The mean time to execute cache invokes.
Shown as microsecond
ignite.cache.entry_processor.hit_percentage
(gauge)
The percentage of invocations on keys, which exist in cache.
Shown as percent
ignite.cache.entry_processor.hits
(rate)
The total number of invocations on keys, which exist in cache.
ignite.cache.entry_processor.invocations
(rate)
The total number of cache invocations.
ignite.cache.entry_processor.maximum_invocation_time
(gauge)
So far, the maximum time to execute cache invokes.
Shown as microsecond
ignite.cache.entry_processor.minimum_invocation_time
(gauge)
So far, the minimum time to execute cache invokes.
Shown as microsecond
ignite.cache.entry_processor.miss_percentage
(gauge)
The percentage of invocations on keys, which don't exist in cache.
Shown as percent
ignite.cache.entry_processor.misses
(rate)
The total number of invocations on keys, which don't exist in cache.
ignite.cache.entry_processor.puts
(rate)
The total number of cache invocations that caused updates.
ignite.cache.entry_processor.read_only_invocations
(rate)
The total number of cache invocations that caused no updates.
ignite.cache.entry_processor.removals
(rate)
The total number of cache invocations that caused removals.
ignite.cache.estimated_rebalancing_keys
(gauge)
Estimated number of keys to rebalance.
Shown as key
ignite.cache.evict_queue_size
(gauge)
Current size of evict queue.
ignite.cache.evictions
(rate)
Number of eviction entries.
Shown as eviction
ignite.cache.gets
(rate)
The total number of gets to the cache.
Shown as request
ignite.cache.heap_entries
(gauge)
Number of entries in heap memory.
Shown as entry
ignite.cache.hit_percentage
(gauge)
Percentage of successful hits.
Shown as percent
ignite.cache.hits
(rate)
The number of get requests that were satisfied by the cache.
Shown as request
ignite.cache.keys_to_rebalance
(gauge)
Estimated number of keys to be rebalanced on current node.
Shown as key
ignite.cache.local_moving_partitions
(gauge)
Count of partitions with state MOVING for this cache group located on this node.
ignite.cache.local_owning_partitions
(gauge)
Count of partitions with state OWNING for this cache group located on this node.
ignite.cache.local_renting_entries
(gauge)
Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.
ignite.cache.local_renting_partitions
(gauge)
Count of partitions with state RENTING for this cache group located on this node.
ignite.cache.maximum_partition_copies
(gauge)
Maximum number of partition copies for all partitions of this cache group.
ignite.cache.minimum_partition_copies
(gauge)
Minimum number of partition copies for all partitions of this cache group.
ignite.cache.miss_percentage
(gauge)
Percentage of accesses that failed to find anything.
Shown as percent
ignite.cache.misses
(rate)
A miss is a get request that is not satisfied.
Shown as request
ignite.cache.offheap_allocated_size
(gauge)
Memory size allocated in off-heap.
Shown as byte
ignite.cache.offheap_backup_entries
(gauge)
Number of backup entries stored in off-heap memory.
ignite.cache.offheap_entries
(gauge)
Number of entries stored in off-heap memory.
Shown as entry
ignite.cache.offheap_evictions
(rate)
Number of evictions from off-heap memory.
Shown as eviction
ignite.cache.offheap_gets
(rate)
Number of gets from off-heap memory.
ignite.cache.offheap_hit_percentage
(gauge)
Percentage of hits on off-heap memory.
Shown as percent
ignite.cache.offheap_hits
(rate)
Number of hits on off-heap memory.
Shown as hit
ignite.cache.offheap_miss_percentage
(gauge)
Percentage of misses on off-heap memory.
Shown as percent
ignite.cache.offheap_misses
(rate)
Number of misses on off-heap memory.
Shown as miss
ignite.cache.offheap_primary_entries
(gauge)
Number of primary entries stored in off-heap memory.
Shown as entry
ignite.cache.offheap_puts
(rate)
Number of puts to off-heap memory.
ignite.cache.offheap_removals
(rate)
Number of removed entries from off-heap memory.
ignite.cache.partitions
(gauge)
Count of partitions for cache group.
ignite.cache.prepare_queue_size
(gauge)
Transaction prepared queue size.
Shown as transaction
ignite.cache.puts
(rate)
The total number of puts to the cache.
Shown as request
ignite.cache.rebalance_clearing_partitions
(gauge)
Number of partitions that need to be cleared before the actual rebalance starts.
ignite.cache.rebalanced_keys
(gauge)
Number of already rebalanced keys.
Shown as key
ignite.cache.rebalancing_bytes_rate
(gauge)
Estimated rebalancing speed in bytes.
Shown as byte
ignite.cache.rebalancing_keys_rate
(gauge)
Estimated rebalancing speed in keys.
Shown as operation
ignite.cache.rebalancing_partitions
(gauge)
Number of currently rebalancing partitions on current node.
ignite.cache.removals
(rate)
The total number of removals from the cache.
ignite.cache.rollbacks
(rate)
Number of transaction rollbacks.
ignite.cache.rolledback_versions_size
(gauge)
Transaction rolled back ID map size.
Shown as transaction
ignite.cache.size
(gauge)
Number of non-null values in the cache as a long value.
ignite.cache.start_version_counts_size
(gauge)
Transaction start version counts map size.
Shown as transaction
ignite.cache.thread_map_size
(gauge)
Transaction per-thread map size.
Shown as transaction
ignite.cache.total_partitions
(gauge)
Total number of partitions on current node.
ignite.cache.write_behind_buffer_size
(gauge)
Count of cache entries that are waiting to be flushed.
ignite.cache.write_behind_overflow
(gauge)
Count of write buffer overflow events in progress at the moment.
Shown as event
ignite.cache.write_behind_overflow_total
(rate)
Count of cache overflow events since write-behind cache has started.
Shown as event
ignite.cache.write_behind_retries
(gauge)
Count of cache entries that are currently in retry state.
ignite.cache.write_behind_store_batch_size
(gauge)
Maximum size of batch for similar operations.
ignite.cache.xid_map_size
(gauge)
Transaction per-Xid map size.
Shown as transaction
ignite.check_point_buffer_size
(gauge)
Total size in bytes for checkpoint buffer.
Shown as byte
ignite.checkpoint.last_copied_on_write_pages
(gauge)
Number of pages copied to a temporary checkpoint buffer during the last checkpoint.
Shown as page
ignite.checkpoint.last_data_pages
(gauge)
Total number of data pages written during the last checkpoint.
Shown as page
ignite.checkpoint.last_duration
(gauge)
Duration of the last checkpoint in milliseconds.
Shown as second
ignite.checkpoint.last_fsync_duration
(gauge)
Duration of the sync phase of the last checkpoint in milliseconds.
Shown as millisecond
ignite.checkpoint.last_lock_wait_duration
(gauge)
Duration of the checkpoint lock wait in milliseconds.
Shown as millisecond
ignite.checkpoint.last_mark_duration
(gauge)
Duration of the checkpoint mark in milliseconds.
Shown as millisecond
ignite.checkpoint.last_pages_write_duration
(gauge)
Duration of the checkpoint pages write in milliseconds.
Shown as millisecond
ignite.checkpoint.last_total_pages
(gauge)
Total number of pages written during the last checkpoint.
Shown as page
ignite.checkpoint.total_time
(gauge)
Total checkpoint time from last restart.
Shown as second
ignite.current_cpu_load
(gauge)
The system load average; or a negative value if not available.
Shown as byte
ignite.current_daemon_thread_count
(gauge)
Current number of live daemon threads.
Shown as thread
ignite.current_gc_load
(gauge)
Average time spent in GC since the last update.
Shown as time
ignite.current_idle_time
(gauge)
Time this node spent idling since executing the last job.
Shown as second
ignite.current_thread_count
(gauge)
Current number of live threads.
Shown as thread
ignite.dirty_pages
(gauge)
Number of pages in memory not yet synchronized with persistent storage.
Shown as page
ignite.discovery.average_message_processing_time
(gauge)
Avg message processing time.
Shown as second
ignite.discovery.max_message_processing_time
(gauge)
Max message processing time.
Shown as second
ignite.discovery.message_worker_queue_size
(gauge)
Message worker queue current size.
ignite.discovery.nodes_failed
(rate)
Nodes failed count.
Shown as node
ignite.discovery.nodes_joined
(rate)
Nodes joined count.
Shown as node
ignite.discovery.nodes_left
(rate)
Nodes left count.
Shown as node
ignite.discovery.pending_messages_discarded
(gauge)
Pending messages discarded.
Shown as message
ignite.discovery.pending_messages_registered
(gauge)
Pending messages registered.
Shown as message
ignite.discovery.total_processed_messages
(rate)
Total processed messages count.
Shown as message
ignite.discovery.total_received_messages
(rate)
Total received messages count.
Shown as message
ignite.eviction_rate
(gauge)
Eviction rate (pages per second).
Shown as page
ignite.heap_memory_committed
(gauge)
The amount of committed memory in bytes.
Shown as byte
ignite.heap_memory_initialized
(gauge)
The initial size of memory in bytes; -1 if undefined.
Shown as byte
ignite.heap_memory_maximum
(gauge)
The maximum amount of memory in bytes; -1 if undefined.
Shown as byte
ignite.heap_memory_total
(gauge)
The total amount of memory in bytes; -1 if undefined.
Shown as byte
ignite.heap_memory_used
(gauge)
Current heap size that is used for object allocation.
Shown as byte
ignite.idle_time_percentage
(gauge)
Percentage of time this node is idling vs. executing jobs.
Shown as percent
ignite.initial_memory_size
(gauge)
Initial memory region size defined by its data region.
Shown as byte
ignite.jobs.active.average
(gauge)
Average number of active jobs concurrently executing on the node.
Shown as job
ignite.jobs.active.current
(gauge)
Number of currently active jobs concurrently executing on the node.
Shown as job
ignite.jobs.active.maximum
(gauge)
Maximum number of jobs that ever ran concurrently on this node.
Shown as job
ignite.jobs.cancelled.average
(gauge)
Average number of cancelled jobs this node ever had running concurrently.
Shown as job
ignite.jobs.cancelled.current
(gauge)
Number of cancelled jobs that are still running.
Shown as job
ignite.jobs.cancelled.maximum
(gauge)
Maximum number of cancelled jobs this node ever had running concurrently.
Shown as job
ignite.jobs.cancelled.total
(rate)
Total number of cancelled jobs since node startup.
Shown as job
ignite.jobs.execute_time.average
(gauge)
Average time a job takes to execute on the node.
Shown as second
ignite.jobs.execute_time.current
(gauge)
Longest time a current job has been executing for.
Shown as second
ignite.jobs.execute_time.maximum
(gauge)
Time it took to execute the longest job on the node.
Shown as second
ignite.jobs.executed.total
(rate)
Total number of jobs handled by the node.
Shown as job
ignite.jobs.execution_time.total
(rate)
Total time all finished jobs took to execute on the node.
Shown as second
ignite.jobs.maximum_failover
(gauge)
Maximum number of attempts to execute a failed job on another node.
Shown as attempt
ignite.jobs.rejected.average
(gauge)
Average number of jobs this node rejects during collision resolution operations.
Shown as job
ignite.jobs.rejected.current
(gauge)
Number of jobs rejected after the most recent collision resolution operation.
Shown as job
ignite.jobs.rejected.maximum
(gauge)
Maximum number of jobs rejected at once during a single collision resolution operation.
Shown as job
ignite.jobs.rejected.total
(rate)
Total number of jobs this node rejects during collision resolution operations since node startup.
Shown as job
ignite.jobs.total_failover
(rate)
Total number of jobs that were failed over.
Shown as job
ignite.jobs.wait_time.average
(gauge)
Average time jobs spend waiting in the queue to be executed.
Shown as second
ignite.jobs.wait_time.current
(gauge)
Current wait time of oldest job.
Shown as second
ignite.jobs.wait_time.maximum
(gauge)
Maximum time a job ever spent waiting in a queue to be executed.
Shown as second
ignite.jobs.waiting.average
(gauge)
Average number of waiting jobs this node had queued.
Shown as job
ignite.jobs.waiting.current
(gauge)
Number of queued jobs currently waiting to be executed.
Shown as job
ignite.jobs.waiting.maximum
(gauge)
Maximum number of waiting jobs this node had.
Shown as job
ignite.large_entries_pages_percentage
(gauge)
Percentage of pages that are fully occupied by large entries that go beyond page size.
Shown as percent
ignite.max_memory_size
(gauge)
Maximum memory region size defined by its data region.
Shown as byte
ignite.maximum_thread_count
(gauge)
The peak live thread count.
Shown as thread
ignite.non_heap_memory_committed
(gauge)
Amount of non-heap memory in bytes that is committed for the JVM to use.
Shown as byte
ignite.non_heap_memory_initialized
(gauge)
The initial size of non-heap memory in bytes; -1 if undefined.
Shown as byte
ignite.non_heap_memory_maximum
(gauge)
Maximum amount of non-heap memory in bytes that can be used for memory management. -1 if undefined.
Shown as byte
ignite.non_heap_memory_total
(gauge)
Total amount of non-heap memory in bytes that can be used for memory management. -1 if undefined.
Shown as byte
ignite.non_heap_memory_used
(gauge)
Current non-heap memory size that is used by Java VM.
Shown as byte
ignite.offheap_size
(gauge)
Offheap size in bytes.
Shown as byte
ignite.offheap_used_size
(gauge)
Total used offheap size in bytes.
Shown as byte
ignite.oubound_messages_queue_size
(gauge)
Outbound messages queue size.
Shown as message
ignite.pages_fill_factor
(gauge)
The percentage of the used space.
Shown as percent
ignite.pages_read
(rate)
Number of pages read from last restart.
Shown as page
ignite.pages_replace_age
(gauge)
Average age at which pages in memory are replaced with pages from persistent storage (milliseconds).
Shown as page
ignite.pages_replace_rate
(gauge)
Rate at which pages in memory are replaced with pages from persistent storage (pages per second).
Shown as page
ignite.pages_replaced
(rate)
Number of pages replaced from last restart.
Shown as page
ignite.pages_written
(rate)
Number of pages written from last restart.
Shown as page
ignite.physical_memory_pages
(gauge)
Number of pages residing in physical RAM.
Shown as page
ignite.received_bytes
(rate)
Received bytes count.
Shown as byte
ignite.received_messages
(rate)
Received messages count.
Shown as message
ignite.sent_bytes
(rate)
Sent bytes count.
Shown as byte
ignite.sent_messages
(rate)
Sent messages count.
Shown as message
ignite.threads.active
(gauge)
Approximate number of threads that are actively executing tasks.
Shown as thread
ignite.threads.completed_tasks
(rate)
Approximate total number of tasks that have completed execution.
Shown as task
ignite.threads.core_pool_size
(gauge)
The core number of threads.
Shown as thread
ignite.threads.largest_size
(gauge)
Largest number of threads that have ever simultaneously been in the pool.
Shown as thread
ignite.threads.maximum_pool_size
(gauge)
The maximum allowed number of threads.
Shown as thread
ignite.threads.pool_size
(gauge)
Current number of threads in the pool.
Shown as thread
ignite.threads.queue_size
(gauge)
Current size of the execution queue.
Shown as thread
ignite.threads.tasks
(rate)
Approximate total number of tasks that have been scheduled for execution.
Shown as task
ignite.total_allocated_pages
(gauge)
Total number of allocated pages.
Shown as page
ignite.total_allocated_size
(gauge)
Total size of memory allocated in bytes.
Shown as byte
ignite.total_baseline_nodes
(gauge)
Total baseline nodes count.
Shown as node
ignite.total_busy_time
(gauge)
Total time this node spent executing jobs.
Shown as second
ignite.total_client_nodes
(gauge)
Client nodes count.
Shown as node
ignite.total_cpus
(gauge)
The number of CPUs available to the Java Virtual Machine.
Shown as core
ignite.total_executed_tasks
(rate)
Total number of tasks handled by the node.
Shown as task
ignite.total_idle_time
(gauge)
Total time this node spent idling (not executing any jobs).
Shown as second
ignite.total_nodes
(gauge)
Total number of nodes.
Shown as node
ignite.total_server_nodes
(gauge)
Server nodes count.
Shown as node
ignite.total_started_threads
(rate)
The total number of threads started.
Shown as thread
ignite.transaction.committed
(rate)
The number of transactions which were committed.
Shown as transaction
ignite.transaction.holding_lock
(gauge)
The number of active transactions holding at least one key lock.
Shown as transaction
ignite.transaction.locked_keys
(gauge)
The number of keys locked on the node.
Shown as key
ignite.transaction.owner
(gauge)
The number of active transactions for which this node is the initiator.
Shown as transaction
ignite.transaction.rolledback
(rate)
The number of transactions which were rolled back.
Shown as transaction
ignite.used_checkpoint_buffer_pages
(gauge)
Used checkpoint buffer size in pages.
Shown as page
ignite.used_checkpoint_buffer_size
(gauge)
Used checkpoint buffer size in bytes.
Shown as byte
ignite.wal.archive_segments
(gauge)
Current number of WAL segments in the WAL archive.
Shown as segment
ignite.wal.buffer_poll_spin
(gauge)
Number of WAL buffer poll spins over the last time interval.
ignite.wal.fsync_average
(gauge)
Average WAL fsync duration in microseconds over the last time interval.
Shown as microsecond
ignite.wal.last_rollover
(gauge)
Time of the last WAL segment rollover.
Shown as second
ignite.wal.logging_rate
(gauge)
Average number of WAL records per second written during the last time interval.
Shown as record
ignite.wal.total_size
(gauge)
Total size in bytes of the stored WAL files.
Shown as byte
ignite.wal.writing_rate
(gauge)
Average number of bytes per second written during the last time interval.
Shown as byte

Events

The Ignite integration does not include any events.

Service Checks

ignite.can_connect
Returns CRITICAL if the Agent is unable to connect to and collect metrics from the monitored Ignite instance, WARNING if no metrics are collected, and OK otherwise.
Statuses: ok, critical, warning

Troubleshooting

Need help? Contact Datadog support.