Hbase Master

Agent Check

Supported OS: Linux, Mac OS, Windows

Overview

Get metrics from Hbase_master service in real time to:

  • Visualize and monitor Hbase_master states.
  • Be notified about Hbase_master failovers and events.

Installation

To install the Hbase_master check on your host:

  1. Download the Datadog Agent.
  2. Create an hbase_master.d/ folder in the conf.d/ folder at the root of your Agent’s directory.
  3. Create a conf.yaml file in the new hbase_master.d/ folder.
  4. Consult the sample hbase_master.yaml file and copy its contents into the conf.yaml file.
  5. Restart the Agent.
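
Steps 2 to 4 above can be scripted. The following is a minimal sketch, not an official installer: the function name install_check_config is hypothetical, and it assumes you pass in the location of your Agent’s conf.d/ folder and a local copy of the sample hbase_master.yaml file.

```python
import shutil
import tempfile
from pathlib import Path


def install_check_config(conf_d: Path, sample_file: Path,
                         check_name: str = "hbase_master") -> Path:
    """Create <conf_d>/<check_name>.d/conf.yaml as a copy of sample_file."""
    check_dir = conf_d / f"{check_name}.d"
    # Step 2: create the check's folder inside conf.d/ (no error if it exists).
    check_dir.mkdir(parents=True, exist_ok=True)
    # Steps 3 and 4: create conf.yaml with the sample file's contents.
    target = check_dir / "conf.yaml"
    shutil.copyfile(sample_file, target)
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory so nothing on the host is modified.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sample = root / "hbase_master.yaml"
        sample.write_text("# placeholder for the real sample file\n")
        print(install_check_config(root / "conf.d", sample))
```

On a real host, point conf_d at the Agent’s actual conf.d/ folder, then restart the Agent as described in step 5.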

Configuration

To configure the Hbase_master check:

  1. Open the conf.yaml file created during installation.
  2. Edit the conf.yaml file to point to your server and port, and set the masters to monitor.
  3. Restart the Agent.
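
Before restarting the Agent, it can help to confirm that the server and port entered in conf.yaml are reachable from the Agent host. The sketch below is one way to do that with a plain TCP connection; the function name port_is_open is hypothetical, and the host and port in the demo are placeholders, not defaults of the check. A successful connection only shows that something is listening; it does not prove that the endpoint serves HBase metrics.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values: replace with the server and port from your conf.yaml.
    host, port = "localhost", 10101
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```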

Validation

Run the Agent’s status subcommand and look for hbase_master under the Checks section.
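
This validation step can be automated. The sketch below is illustrative only: the helper names are hypothetical, and the command ("datadog-agent", "status") is an assumption about how your Agent version exposes its status subcommand, so adjust it to match your installation. The search is deliberately simple and matches the check name anywhere in the output, not only under the Checks section.

```python
import subprocess


def check_is_listed(status_output: str, check_name: str) -> bool:
    """Return True if check_name appears on any line of the status output."""
    return any(check_name in line for line in status_output.splitlines())


def agent_status_output(command=("datadog-agent", "status")) -> str:
    """Run the Agent's status subcommand and return its standard output.

    The default command is an assumption; pass the invocation that
    matches your Agent version and platform.
    """
    result = subprocess.run(list(command), capture_output=True,
                            text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    try:
        output = agent_status_output()
    except FileNotFoundError:
        # The Agent binary is not on PATH for this user.
        output = ""
    print("hbase_master listed:", check_is_listed(output, "hbase_master"))
```

The same helper works for other checks by passing a different name, for example hbase_regionserver.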

Data Collected

Metrics

hbase.master.assignmentmanager.rit_oldest_age
(gauge)
The age of the longest region in transition, in milliseconds
shown as millisecond
hbase.master.assignmentmanager.rit_count_over_threshold
(gauge)
The number of regions that have been in transition longer than a threshold time
hbase.master.assignmentmanager.rit_count
(gauge)
The number of regions in transition
hbase.master.assignmentmanager.assign.min
(gauge)
hbase.master.assignmentmanager.assign.max
(gauge)
hbase.master.assignmentmanager.assign.mean
(gauge)
hbase.master.assignmentmanager.assign.median
(gauge)
hbase.master.assignmentmanager.assign.percentile.99
(gauge)
hbase.master.ipc.queue_size
(gauge)
Number of bytes in the call queues.
shown as byte
hbase.master.ipc.num_calls_in_general_queue
(gauge)
Number of calls in the general call queue.
hbase.master.ipc.num_calls_in_replication_queue
(gauge)
Number of calls in the replication call queue.
hbase.master.ipc.num_calls_in_priority_queue
(gauge)
Number of calls in the priority call queue.
hbase.master.ipc.num_open_connections
(gauge)
Number of open connections.
hbase.master.ipc.num_active_handler
(gauge)
Number of active RPC handlers.
hbase.master.ipc.total_call_time.max
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.master.ipc.total_call_time.mean
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.master.ipc.total_call_time.median
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.master.ipc.total_call_time.percentile.99
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.master.server.tag.is_active_master
(gauge)
Is Active Master
hbase.master.server.num_region_servers
(gauge)
Number of RegionServers
hbase.master.server.num_dead_region_servers
(gauge)
Number of dead RegionServers
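
As an illustration of how these gauges might be read together, the sketch below flags two conditions from a dictionary of latest metric values. It is a hypothetical helper, not part of the check: the function name, the input format, and the choice of "greater than zero" as the trigger are all assumptions made for the example.

```python
RIT_OVER_THRESHOLD = "hbase.master.assignmentmanager.rit_count_over_threshold"
DEAD_REGION_SERVERS = "hbase.master.server.num_dead_region_servers"


def master_warnings(latest: dict) -> list:
    """Return warnings for a dict mapping metric name -> latest value.

    Metrics missing from the dict are treated as 0 (no warning).
    """
    warnings = []
    if latest.get(RIT_OVER_THRESHOLD, 0) > 0:
        warnings.append("regions in transition longer than the threshold")
    if latest.get(DEAD_REGION_SERVERS, 0) > 0:
        warnings.append("dead RegionServers reported")
    return warnings


if __name__ == "__main__":
    # Made-up values, for demonstration only.
    sample = {RIT_OVER_THRESHOLD: 0, DEAD_REGION_SERVERS: 2}
    print(master_warnings(sample))
```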

Events

The Hbase_master check does not include any events at this time.

Service Checks

The Hbase_master check does not include any service checks at this time.

Troubleshooting

Need help? Contact Datadog Support.

Hbase_regionserver Integration

Overview

Get metrics from Hbase_regionserver service in real time to:

  • Visualize and monitor Hbase_regionserver states.
  • Be notified about Hbase_regionserver failovers and events.

Setup

The Hbase_regionserver check is NOT included in the Datadog Agent package.

Installation

To install the Hbase_regionserver check on your host:

  1. Download the Datadog Agent.
  2. Download the check.py file for Hbase_regionserver.
  3. Place it in the Agent’s checks.d directory.
  4. Rename it to hbase_regionserver.py.

Configuration

To configure the Hbase_regionserver check:

  1. Create an hbase_regionserver.d/ folder in the conf.d/ folder at the root of your Agent’s directory.
  2. Create a conf.yaml file in the new hbase_regionserver.d/ folder.
  3. Consult the sample hbase_regionserver.yaml file and copy its contents into the conf.yaml file.
  4. Edit the conf.yaml file to point to your server and port, and set the RegionServers to monitor.
  5. Restart the Agent.

Validation

Run the Agent’s status subcommand and look for hbase_regionserver under the Checks section.

Data Collected

Metrics

hbase.regionserver.ipc.queue_size
(gauge)
Number of bytes in the call queues.
shown as byte
hbase.regionserver.ipc.num_open_connections
(gauge)
Number of open connections.
hbase.regionserver.ipc.num_active_handler
(gauge)
Number of active RPC handlers.
hbase.regionserver.ipc.total_call_time.max
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.regionserver.ipc.total_call_time.mean
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.regionserver.ipc.total_call_time.median
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.regionserver.ipc.total_call_time.percentile.99
(gauge)
Total call time, including both queued and processing time.
shown as millisecond
hbase.regionserver.regions.num_regions
(gauge)
Number of regions in the metrics system
hbase.regionserver.replication.sink.applied_ops
(gauge)
Number of WAL entries applied on replication sink.
hbase.regionserver.replication.sink.age_of_last_applied_op
(gauge)
Replication time lag of last applied WAL entry between source and sink.
shown as millisecond
hbase.regionserver.replication.sink.applied_batches
(gauge)
Number of WAL applying operations processed on replication sink.
hbase.regionserver.server.region_count
(gauge)
Number of regions
hbase.regionserver.server.store_count
(gauge)
Number of Stores
hbase.regionserver.server.hlog_file_count
(gauge)
Number of WAL Files
hbase.regionserver.server.hlog_file_size
(gauge)
Size of all WAL Files
shown as byte
hbase.regionserver.server.store_file_count
(gauge)
Number of Store Files
hbase.regionserver.server.mem_store_size
(gauge)
Size of the memstore
shown as byte
hbase.regionserver.server.store_file_size
(gauge)
Size of storefiles being served.
shown as byte
hbase.regionserver.server.total_request_count
(gauge)
Total number of requests this RegionServer has answered.
hbase.regionserver.server.read_request_count
(gauge)
Number of read requests this RegionServer has answered.
hbase.regionserver.server.write_request_count
(gauge)
Number of mutation requests this RegionServer has answered.
hbase.regionserver.server.check_mutate_failed_count
(gauge)
Number of Check and Mutate calls that failed the checks.
hbase.regionserver.server.check_mutate_passed_count
(gauge)
Number of Check and Mutate calls that passed the checks.
hbase.regionserver.server.store_file_index_size
(gauge)
Size of indexes in storefiles on disk.
shown as byte
hbase.regionserver.server.static_index_size
(gauge)
Uncompressed size of the static indexes.
shown as byte
hbase.regionserver.server.static_bloom_size
(gauge)
Uncompressed size of the static bloom filters.
shown as byte
hbase.regionserver.server.mutations_without_wal_count
(count)
Number of mutations that have been sent by clients with the write ahead logging turned off.
hbase.regionserver.server.mutations_without_wal_size
(gauge)
Size of data that has been sent by clients with the write ahead logging turned off.
shown as byte
hbase.regionserver.server.percent_files_local
(gauge)
The percent of HFiles that are stored on the local HDFS DataNode.
shown as percent
hbase.regionserver.server.percent_files_local_secondary_regions
(gauge)
The percent of HFiles used by secondary regions that are stored on the local HDFS DataNode.
shown as percent
hbase.regionserver.server.split_queue_length
(gauge)
Length of the queue for splits.
hbase.regionserver.server.compaction_queue_length
(gauge)
Length of the queue for compactions.
hbase.regionserver.server.flush_queue_length
(gauge)
Length of the queue for region flushes
hbase.regionserver.server.block_cache_free_size
(gauge)
Size of the block cache that is not occupied.
shown as byte
hbase.regionserver.server.block_cache_count
(gauge)
Number of blocks in the block cache.
hbase.regionserver.server.block_cache_size
(gauge)
Size of the block cache.
shown as byte
hbase.regionserver.server.block_cache_hit_count
(gauge)
Count of hits on the block cache.
hbase.regionserver.server.block_cache_hit_count_primary
(gauge)
Count of hits on the primary replica in the block cache.
hbase.regionserver.server.block_cache_miss_count
(gauge)
Number of requests for a block that missed the block cache.
hbase.regionserver.server.block_cache_miss_count_primary
(gauge)
Number of requests for a block of primary replica that missed the block cache.
hbase.regionserver.server.block_cache_eviction_count
(gauge)
Count of the number of blocks evicted from the block cache.
hbase.regionserver.server.block_cache_eviction_count_primary
(gauge)
Count of the number of blocks evicted from primary replica in the block cache.
hbase.regionserver.server.block_cache_hit_percent
(gauge)
Percent of block cache requests that are hits
shown as percent
hbase.regionserver.server.block_cache_express_hit_percent
(gauge)
The percent of the time that requests with the cache turned on hit the cache.
shown as percent
hbase.regionserver.server.block_cache_failed_insertion_count
(gauge)
Number of times that a block cache insertion failed. Usually due to size restrictions.
shown as millisecond
hbase.regionserver.server.updates_blocked_time
(gauge)
Number of milliseconds that updates have been blocked so that the memstore can be flushed.
shown as millisecond
hbase.regionserver.server.flushed_cells_count
(gauge)
The number of cells flushed to disk
hbase.regionserver.server.compacted_cells_count
(gauge)
The number of cells processed during minor compactions
hbase.regionserver.server.major_compacted_cells_count
(gauge)
The number of cells processed during major compactions
hbase.regionserver.server.flushed_cells_size
(gauge)
The total amount of data flushed to disk, in bytes
shown as byte
hbase.regionserver.server.compacted_cells_size
(gauge)
The total amount of data processed during minor compactions, in bytes
shown as byte
hbase.regionserver.server.major_compacted_cells_size
(gauge)
The total amount of data processed during major compactions, in bytes
shown as byte
hbase.regionserver.server.blocked_request_count
(gauge)
The number of requests blocked because the memstore size is larger than blockingMemStoreSize
hbase.regionserver.server.hedged_read
(gauge)
hbase.regionserver.server.hedged_read_wins
(gauge)
hbase.regionserver.server.pause_time_with_gc_num_ops
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_with_gc.min
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_with_gc.max
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_with_gc.mean
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_with_gc.median
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_with_gc.percentile.99
(gauge)
shown as millisecond
hbase.regionserver.server.mutate.num_ops
(gauge)
hbase.regionserver.server.mutate.min
(gauge)
hbase.regionserver.server.mutate.max
(gauge)
hbase.regionserver.server.mutate.mean
(gauge)
hbase.regionserver.server.mutate.median
(gauge)
hbase.regionserver.server.mutate.percentile.99
(gauge)
hbase.regionserver.server.slow_append_count
(gauge)
The number of Appends that took over 1000ms to complete
hbase.regionserver.server.pause_warn_threshold_exceeded
(gauge)
hbase.regionserver.server.slow_delete_count
(gauge)
The number of Deletes that took over 1000ms to complete
hbase.regionserver.server.increment.num_ops
(gauge)
hbase.regionserver.server.increment.min
(gauge)
hbase.regionserver.server.increment.max
(gauge)
hbase.regionserver.server.increment.mean
(gauge)
hbase.regionserver.server.increment.median
(gauge)
hbase.regionserver.server.increment.percentile.99
(gauge)
hbase.regionserver.server.replay.num_ops
(gauge)
hbase.regionserver.server.replay.min
(gauge)
hbase.regionserver.server.replay.max
(gauge)
hbase.regionserver.server.replay.mean
(gauge)
hbase.regionserver.server.replay.median
(gauge)
hbase.regionserver.server.replay.percentile.99
(gauge)
hbase.regionserver.server.flush_time.num_ops
(gauge)
shown as millisecond
hbase.regionserver.server.flush_time.min
(gauge)
shown as millisecond
hbase.regionserver.server.flush_time.max
(gauge)
shown as millisecond
hbase.regionserver.server.flush_time.mean
(gauge)
shown as millisecond
hbase.regionserver.server.flush_time.median
(gauge)
shown as millisecond
hbase.regionserver.server.flush_time.percentile.99
(gauge)
shown as millisecond
hbase.regionserver.server.pause_info_threshold_exceeded
(gauge)
hbase.regionserver.server.delete.num_ops
(gauge)
hbase.regionserver.server.delete.min
(gauge)
hbase.regionserver.server.delete.max
(gauge)
hbase.regionserver.server.delete.mean
(gauge)
hbase.regionserver.server.delete.median
(gauge)
hbase.regionserver.server.delete.percentile.99
(gauge)
hbase.regionserver.server.split_request_count
(gauge)
Number of splits requested
hbase.regionserver.server.split_success_count
(gauge)
Number of successfully executed splits
hbase.regionserver.server.slow_get_count
(gauge)
The number of Gets that took over 1000ms to complete
hbase.regionserver.server.get.num_ops
(gauge)
hbase.regionserver.server.get.min
(gauge)
hbase.regionserver.server.get.max
(gauge)
hbase.regionserver.server.get.mean
(gauge)
hbase.regionserver.server.get.median
(gauge)
hbase.regionserver.server.get.percentile.99
(gauge)
hbase.regionserver.server.scan_next.num_ops
(gauge)
hbase.regionserver.server.scan_next.min
(gauge)
hbase.regionserver.server.scan_next.max
(gauge)
hbase.regionserver.server.scan_next.mean
(gauge)
hbase.regionserver.server.scan_next.median
(gauge)
hbase.regionserver.server.scan_next.percentile.99
(gauge)
hbase.regionserver.server.pause_time_without_gc.num_ops
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_without_gc.min
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_without_gc.max
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_without_gc.mean
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_without_gc.median
(gauge)
shown as millisecond
hbase.regionserver.server.pause_time_without_gc.percentile.99
(gauge)
shown as millisecond
hbase.regionserver.server.slow_put_count
(gauge)
The number of Multis that took over 1000ms to complete
hbase.regionserver.server.slow_increment_count
(gauge)
The number of Increments that took over 1000ms to complete
hbase.regionserver.server.split_time.num_ops
(gauge)
shown as millisecond
hbase.regionserver.server.split_time.min
(gauge)
shown as millisecond
hbase.regionserver.server.split_time.max
(gauge)
shown as millisecond
hbase.regionserver.server.split_time.mean
(gauge)
shown as millisecond
hbase.regionserver.server.split_time.median
(gauge)
shown as millisecond
hbase.regionserver.server.split_time.percentile.99
(gauge)
shown as millisecond
hbase.regionserver.wal.append_size.num_ops
(gauge)
Size (in bytes) of the data appended to the WAL.
shown as byte
hbase.regionserver.wal.append_size.min
(gauge)
Size (in bytes) of the data appended to the WAL.
shown as byte
hbase.regionserver.wal.append_size.max
(gauge)
Size (in bytes) of the data appended to the WAL.
shown as byte
hbase.regionserver.wal.append_size.mean
(gauge)
Size (in bytes) of the data appended to the WAL.
shown as byte
hbase.regionserver.wal.append_size.median
(gauge)
Size (in bytes) of the data appended to the WAL.
shown as byte
hbase.regionserver.wal.append_size.percentile.99
(gauge)
Size (in bytes) of the data appended to the WAL.
shown as byte
hbase.regionserver.wal.sync_time.num_ops
(gauge)
The time it took to sync the WAL to HDFS.
shown as millisecond
hbase.regionserver.wal.sync_time.min
(gauge)
The time it took to sync the WAL to HDFS.
shown as millisecond
hbase.regionserver.wal.sync_time.max
(gauge)
The time it took to sync the WAL to HDFS.
shown as millisecond
hbase.regionserver.wal.sync_time.mean
(gauge)
The time it took to sync the WAL to HDFS.
shown as millisecond
hbase.regionserver.wal.sync_time.median
(gauge)
The time it took to sync the WAL to HDFS.
shown as millisecond
hbase.regionserver.wal.sync_time.percentile.99
(gauge)
The time it took to sync the WAL to HDFS.
shown as millisecond
hbase.regionserver.wal.slow_append_count
(gauge)
Number of appends that were slow.
hbase.regionserver.wal.roll_request
(gauge)
Total number of times a log roll has been requested.
shown as millisecond
hbase.regionserver.wal.append_count
(gauge)
Number of appends to the write ahead log.
hbase.regionserver.wal.low_replica_roll_request
(gauge)
How many times a log roll was requested due to too few DataNodes in the write pipeline.
shown as millisecond
hbase.regionserver.wal.append_time.num_ops
(gauge)
Time an append to the log took.
shown as millisecond
hbase.regionserver.wal.append_time.min
(gauge)
Time an append to the log took.
shown as millisecond
hbase.regionserver.wal.append_time.max
(gauge)
Time an append to the log took.
shown as millisecond
hbase.regionserver.wal.append_time.mean
(gauge)
Time an append to the log took.
shown as millisecond
hbase.regionserver.wal.append_time.median
(gauge)
Time an append to the log took.
shown as millisecond
hbase.regionserver.wal.append_time.percentile.99
(gauge)
Time an append to the log took.
shown as millisecond
hbase.jvm_metrics.mem_non_heap_used_in_mb
(gauge)
Non-heap memory used in MB
hbase.jvm_metrics.mem_non_heap_committed_in_mb
(gauge)
Non-heap memory committed in MB
hbase.jvm_metrics.mem_non_heap_max_in_mb
(gauge)
Non-heap memory max in MB
hbase.jvm_metrics.mem_heap_used_in_mb
(gauge)
Heap memory used in MB
hbase.jvm_metrics.mem_heap_committed_in_mb
(gauge)
Heap memory committed in MB
hbase.jvm_metrics.mem_heap_max_in_mb
(gauge)
Heap memory max in MB
hbase.jvm_metrics.mem_max_in_mb
(gauge)
Max memory size in MB
hbase.jvm_metrics.gc_count_par_new
(gauge)
GC Count for ParNew
hbase.jvm_metrics.gc_time_millis_par_new
(gauge)
GC Time for ParNew
shown as millisecond
hbase.jvm_metrics.gc_count_concurrent_mark_sweep
(gauge)
GC Count for ConcurrentMarkSweep
hbase.jvm_metrics.gc_time_millis_concurrent_mark_sweep
(gauge)
GC Time for ConcurrentMarkSweep
shown as millisecond
hbase.jvm_metrics.gc_count
(gauge)
Total GC count
hbase.jvm_metrics.gc_time_millis
(gauge)
Total GC time in milliseconds
shown as millisecond
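
Several of these gauges are related by simple arithmetic. As a worked example, the sketch below derives a hit percentage from hbase.regionserver.server.block_cache_hit_count and hbase.regionserver.server.block_cache_miss_count. It assumes the percentage is hits divided by hits plus misses; that formula and the function name are assumptions for illustration, and the value HBase reports as block_cache_hit_percent may be computed differently.

```python
def block_cache_hit_percent(hit_count: float, miss_count: float) -> float:
    """Return hits as a percentage of all block cache requests.

    Assumes requests = hits + misses. Returns 0.0 when there were no
    requests, to avoid dividing by zero.
    """
    total = hit_count + miss_count
    if total == 0:
        return 0.0
    return 100.0 * hit_count / total


if __name__ == "__main__":
    # Made-up counter values, for demonstration only.
    print(block_cache_hit_percent(900, 100))
```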

Events

The Hbase_regionserver check does not include any events at this time.

Service Checks

The Hbase_regionserver check does not include any service checks at this time.

Troubleshooting

Need help? Contact Datadog Support.