Map Reduce

Agent Check

Supported OS: Linux, Mac OS, Windows

Overview

Get metrics from the MapReduce service in real time to:

  • Visualize and monitor MapReduce states
  • Be notified about MapReduce failovers and events

Setup

Installation

The Mapreduce check is included in the Datadog Agent package, so you don’t need to install anything else on your servers.

Configuration

Edit the mapreduce.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to point to your server and port and to set the masters to monitor. See the sample mapreduce.d/conf.yaml for all available configuration options.
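
As a rough illustration of what a minimal configuration might contain, the sketch below renders the text of a single-instance mapreduce.d/conf.yaml. The key names resourcemanager_uri and cluster_name, the address http://localhost:8088, and the helper render_instance_config are assumptions made for this example; check them against the sample mapreduce.d/conf.yaml shipped with your Agent before using them.

```python
def render_instance_config(resourcemanager_uri: str, cluster_name: str) -> str:
    """Return YAML text for a one-instance mapreduce.d/conf.yaml.

    The key names below are assumptions for illustration; the sample
    conf.yaml shipped with the Agent is the authoritative reference.
    """
    lines = [
        "init_config:",
        "",
        "instances:",
        # Assumed key: server and port of the master to monitor.
        f"  - resourcemanager_uri: {resourcemanager_uri}",
        # Assumed key: a label used to tag the collected metrics.
        f"    cluster_name: {cluster_name}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; replace with your own server and port.
    print(render_instance_config("http://localhost:8088", "default_cluster"))
```

After changing the file, the Agent typically needs a restart before the new settings take effect.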

Validation

Run the Agent’s status subcommand and look for mapreduce under the Checks section.
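
The validation step can be scripted. The sketch below assumes an Agent whose status subcommand is invoked as datadog-agent status (elevated privileges may be required on some systems) and simply looks for the check name anywhere in the output. It does not parse the Checks section, so treat a match as a hint rather than proof. The function names are hypothetical.

```python
import subprocess


def check_is_listed(status_output: str, check_name: str = "mapreduce") -> bool:
    """Return True if any line of the status output mentions the check name."""
    return any(check_name in line for line in status_output.splitlines())


def agent_status_output(command=("datadog-agent", "status")) -> str:
    """Run the Agent's status subcommand and return its standard output.

    The command name is an assumption; adjust it for your Agent version
    and platform.
    """
    result = subprocess.run(
        list(command), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    listed = check_is_listed(agent_status_output())
    print("mapreduce check found" if listed else "mapreduce check not found")
```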

Data Collected

Metrics

mapreduce.job.elapsed_time.max
(gauge)
Max elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.avg
(gauge)
Average elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.median
(gauge)
Median elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.95percentile
(gauge)
95th percentile elapsed time since the application started
shown as millisecond
mapreduce.job.elapsed_time.count
(rate)
Number of times the elapsed time was sampled
mapreduce.job.maps_total
(rate)
Total number of maps
shown as task
mapreduce.job.maps_completed
(rate)
Number of completed maps
shown as task
mapreduce.job.reduces_total
(rate)
Total number of reduces
shown as task
mapreduce.job.reduces_completed
(rate)
Number of completed reduces
shown as task
mapreduce.job.maps_pending
(rate)
Number of pending maps
shown as task
mapreduce.job.maps_running
(rate)
Number of running maps
shown as task
mapreduce.job.reduces_pending
(rate)
Number of pending reduces
shown as task
mapreduce.job.reduces_running
(rate)
Number of running reduces
shown as task
mapreduce.job.new_reduce_attempts
(rate)
Number of new reduce attempts
shown as task
mapreduce.job.running_reduce_attempts
(rate)
Number of running reduce attempts
shown as task
mapreduce.job.failed_reduce_attempts
(rate)
Number of failed reduce attempts
shown as task
mapreduce.job.killed_reduce_attempts
(rate)
Number of killed reduce attempts
shown as task
mapreduce.job.successful_reduce_attempts
(rate)
Number of successful reduce attempts
shown as task
mapreduce.job.new_map_attempts
(rate)
Number of new map attempts
shown as task
mapreduce.job.running_map_attempts
(rate)
Number of running map attempts
shown as task
mapreduce.job.failed_map_attempts
(rate)
Number of failed map attempts
shown as task
mapreduce.job.killed_map_attempts
(rate)
Number of killed map attempts
shown as task
mapreduce.job.successful_map_attempts
(rate)
Number of successful map attempts
shown as task
mapreduce.job.counter.reduce_counter_value
(rate)
Counter value of reduce tasks
shown as task
mapreduce.job.counter.map_counter_value
(rate)
Counter value of map tasks
shown as task
mapreduce.job.counter.total_counter_value
(rate)
Counter value of all tasks
shown as task
mapreduce.job.map.task.elapsed_time.max
(gauge)
Max of all map tasks elapsed time
shown as millisecond
mapreduce.job.map.task.elapsed_time.avg
(gauge)
Average of all map tasks elapsed time
shown as millisecond
mapreduce.job.map.task.elapsed_time.median
(gauge)
Median of all map tasks elapsed time
shown as millisecond
mapreduce.job.map.task.elapsed_time.95percentile
(gauge)
95th percentile of all map tasks elapsed time
shown as millisecond
mapreduce.job.map.task.elapsed_time.count
(rate)
Number of times the map tasks elapsed time was sampled
mapreduce.job.reduce.task.elapsed_time.max
(gauge)
Max of all reduce tasks elapsed time
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.avg
(gauge)
Average of all reduce tasks elapsed time
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.median
(gauge)
Median of all reduce tasks elapsed time
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.95percentile
(gauge)
95th percentile of all reduce tasks elapsed time
shown as millisecond
mapreduce.job.reduce.task.elapsed_time.count
(rate)
Number of times the reduce tasks elapsed time was sampled
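
To make the elapsed_time suffixes concrete, the sketch below computes max, avg, median, 95percentile, and count from a list of sampled elapsed times in milliseconds. It is an illustration only: the nearest-rank percentile method and the summarize helper are assumptions, and the Agent may compute its percentile differently. The count here is the raw number of samples, whereas the metric above is reported as a rate.

```python
import math
import statistics


def summarize(samples_ms):
    """Aggregate elapsed-time samples (milliseconds) into summary values."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: smallest value with at least 95% of the
    # samples at or below it (an assumed method, see the note above).
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "max": ordered[-1],
        "avg": sum(ordered) / len(ordered),
        "median": statistics.median(ordered),
        "95percentile": ordered[rank - 1],
        "count": len(ordered),
    }


if __name__ == "__main__":
    # Hypothetical elapsed times for four jobs.
    print(summarize([100, 200, 300, 400]))
```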

Events

The Mapreduce check does not include any events at this time.

Service Checks

mapreduce.resource_manager.can_connect

Returns CRITICAL if the Agent is unable to connect to the Resource Manager. Returns OK otherwise.

mapreduce.application_master.can_connect

Returns CRITICAL if the Agent is unable to connect to the Application Master. Returns OK otherwise.
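
The CRITICAL/OK behaviour described for both service checks can be sketched as a simple connectivity probe. This is not the check's actual implementation: the can_connect function, the string status values, and the probed URL in the usage example (the YARN ResourceManager REST endpoint /ws/v1/cluster/info on a hypothetical localhost:8088) are assumptions for illustration.

```python
import urllib.error
import urllib.request

OK = "OK"
CRITICAL = "CRITICAL"


def can_connect(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> str:
    """Return CRITICAL if the URL cannot be reached, OK otherwise.

    `opener` is injectable so the logic can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout):
            return OK
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return CRITICAL


if __name__ == "__main__":
    # Hypothetical address; point this at your own Resource Manager.
    print(can_connect("http://localhost:8088/ws/v1/cluster/info"))
```

Passing the opener as a parameter keeps the status decision separate from the network call, which makes the OK and CRITICAL paths easy to test in isolation.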

Troubleshooting

Need help? Contact Datadog Support.
