Datadog-Hadoop MapReduce Integration

Overview

Get metrics from the MapReduce service in real time to:

  • Visualize and monitor MapReduce states
  • Be notified about MapReduce failovers and events

Setup

Installation

Install the dd-check-mapreduce package manually or with your favorite configuration manager.

Configuration

Edit the mapreduce.yaml file to point to your server and port, and set the masters to monitor. See the sample mapreduce.yaml for all available configuration options.

Validation

Run the Agent’s info subcommand and look for mapreduce under the Checks section:

Checks
======

    mapreduce
    -----------
      - instance #0 [OK]
      - Collected 39 metrics, 0 events & 7 service checks
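
If you want to automate this validation step, the output above can be parsed with a short script. The following is a minimal sketch in Python, assuming the info output has the layout shown above. The function name check_status and the SAMPLE text are hypothetical and are not part of the Agent.

```python
import re

# Hypothetical sample text, copied from the layout shown in this section.
SAMPLE = """Checks
======

    mapreduce
    -----------
      - instance #0 [OK]
      - Collected 39 metrics, 0 events & 7 service checks
"""


def check_status(info_output, check_name):
    """Return (status, metrics, events, service_checks) for the first
    instance of check_name, or None if the check is not listed."""
    lines = info_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != check_name:
            continue
        status = None
        # Read the lines that follow the check name, up to the next blank line.
        for follow in lines[i + 1:]:
            text = follow.strip()
            if not text:
                break
            inst = re.match(r"- instance #\d+ \[(\w+)\]", text)
            if inst and status is None:
                status = inst.group(1)
            counts = re.match(
                r"- Collected (\d+) metrics?, (\d+) events? & (\d+) service checks?",
                text,
            )
            if counts and status is not None:
                return (status,) + tuple(int(g) for g in counts.groups())
    return None


print(check_status(SAMPLE, "mapreduce"))
```

In practice you would pass the captured output of the info subcommand instead of SAMPLE, and treat a None result or a status other than OK as a failed validation.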

Compatibility

The mapreduce check is compatible with all major platforms.

Data Collected

Metrics

mapreduce.job.elapsed_time.max
(gauge)
Max elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.avg
(gauge)
Average elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.median
(gauge)
Median elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.95percentile
(gauge)
95th percentile elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.count
(rate)
Number of times the elapsed time was sampled
mapreduce.job.maps_total
(rate)
Total number of maps
shown as task
mapreduce.job.maps_completed
(rate)
Number of completed maps
shown as task
mapreduce.job.reduces_total
(rate)
Total number of reduces
shown as task
mapreduce.job.reduces_completed
(rate)
Number of completed reduces
shown as task
mapreduce.job.maps_pending
(rate)
Number of pending maps
shown as task
mapreduce.job.maps_running
(rate)
Number of running maps
shown as task
mapreduce.job.reduces_pending
(rate)
Number of pending reduces
shown as task
mapreduce.job.reduces_running
(rate)
Number of running reduces
shown as task
mapreduce.job.new_reduce_attempts
(rate)
Number of new reduce attempts
shown as task
mapreduce.job.running_reduce_attempts
(rate)
Number of running reduce attempts
shown as task
mapreduce.job.failed_reduce_attempts
(rate)
Number of failed reduce attempts
shown as task
mapreduce.job.killed_reduce_attempts
(rate)
Number of killed reduce attempts
shown as task
mapreduce.job.successful_reduce_attempts
(rate)
Number of successful reduce attempts
shown as task
mapreduce.job.new_map_attempts
(rate)
Number of new map attempts
shown as task
mapreduce.job.running_map_attempts
(rate)
Number of running map attempts
shown as task
mapreduce.job.failed_map_attempts
(rate)
Number of failed map attempts
shown as task
mapreduce.job.killed_map_attempts
(rate)
Number of killed map attempts
shown as task
mapreduce.job.successful_map_attempts
(rate)
Number of successful map attempts
shown as task
mapreduce.job.counter.reduce_counter_value
(rate)
Counter value of reduce tasks
shown as task
mapreduce.job.counter.map_counter_value
(rate)
Counter value of map tasks
shown as task
mapreduce.job.counter.total_counter_value
(rate)
Counter value of all tasks
shown as task
mapreduce.job.map.task.elapsed_time.max
(gauge)
Maximum elapsed time across all map tasks
shown as millisecond
mapreduce.job.map.task.elapsed_time.avg
(gauge)
Average elapsed time across all map tasks
shown as millisecond
mapreduce.job.map.task.elapsed_time.median
(gauge)
Median elapsed time across all map tasks
shown as millisecond
mapreduce.job.map.task.elapsed_time.95percentile
(gauge)
95th percentile of elapsed time across all map tasks
shown as millisecond
mapreduce.job.map.task.elapsed_time.count
(rate)
Number of times the map task elapsed time was sampled
mapreduce.job.reduce.task.elapsed_time.max
(gauge)
Maximum elapsed time across all reduce tasks
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.avg
(gauge)
Average elapsed time across all reduce tasks
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.median
(gauge)
Median elapsed time across all reduce tasks
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.95percentile
(gauge)
95th percentile of elapsed time across all reduce tasks
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.count
(rate)
Number of times the reduce task elapsed time was sampled
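
The elapsed_time metrics above share five suffixes: max, avg, median, 95percentile, and count. The following is a minimal sketch in Python of how such aggregates can be derived from a set of elapsed-time samples in milliseconds. The function name summarize is hypothetical, and the nearest-rank percentile used here is an assumption; this page does not state which method the Agent uses, so values may differ slightly from what you see in Datadog.

```python
import math
from statistics import mean, median


def summarize(samples_ms):
    """Aggregate elapsed-time samples (milliseconds) into the five
    suffixes used by the elapsed_time metrics."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile (an assumption, see the note above):
    # the smallest sample with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "max": ordered[-1],
        "avg": mean(ordered),
        "median": median(ordered),
        "95percentile": ordered[rank - 1],
        "count": len(ordered),
    }


print(summarize([100, 200, 300, 400, 1000]))
```

Note that the sketch returns count as a raw number of samples, whereas the list above reports the count metrics with type rate.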

Events

The MapReduce check does not include any events at this time.

Service Checks

The MapReduce check does not include any service checks at this time.

Troubleshooting

Need help? Contact Datadog Support.

Further Reading