Storm

Agent Check

Supported OS: Linux Mac OS Windows

Overview

Get metrics from the Storm service in real time to:

  • Visualize and monitor Storm cluster and topology metrics.
  • Be notified about Storm failovers and events.

Setup

The Storm check is NOT included in the Datadog Agent package.

Installation

To install the Storm check on your host:

  1. Download the Datadog Agent.
  2. Download the check.py file for Storm.
  3. Place it in the Agent’s checks.d directory.
  4. Rename it to storm.py.
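
Steps 3 and 4 can be scripted. The following is a minimal sketch, not an official installer: the function name install_storm_check and both of its parameters are hypothetical, and the location of the Agent's root directory depends on your platform and Agent version, so it is passed in rather than assumed.

```python
import shutil
from pathlib import Path


def install_storm_check(downloaded_check: Path, agent_root: Path) -> Path:
    """Copy a downloaded check.py into checks.d/ under the name storm.py.

    Both arguments are assumptions of this sketch: `downloaded_check` is
    wherever you saved check.py, `agent_root` is your Agent's directory.
    """
    checks_dir = agent_root / "checks.d"
    checks_dir.mkdir(parents=True, exist_ok=True)
    target = checks_dir / "storm.py"
    # Copying straight to the new name covers "place it" and "rename it".
    shutil.copy2(downloaded_check, target)
    return target
```

Calling install_storm_check(Path("check.py"), Path("<agent root>")) returns the path of the installed storm.py; replace the placeholder with your own Agent directory.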

Configuration

To configure the Storm check:

  1. Create a storm.d/ folder in the conf.d/ folder at the root of your Agent’s directory.
  2. Create a conf.yaml file in the storm.d/ folder you just created.
  3. Consult the sample storm.yaml file and copy its content into the conf.yaml file.
  4. Edit the conf.yaml file to point to your server and port, and set the masters to monitor.
  5. Restart the Agent.
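
Steps 1 to 3 amount to creating one folder and one file. As a rough sketch under stated assumptions (create_storm_config and its parameters are hypothetical names, and the sample storm.yaml is assumed to be already downloaded), they could look like this. Editing the file and restarting the Agent remain manual steps.

```python
import shutil
from pathlib import Path


def create_storm_config(sample_yaml: Path, agent_root: Path) -> Path:
    """Create conf.d/storm.d/conf.yaml from the sample storm.yaml.

    `sample_yaml` and `agent_root` are assumptions of this sketch; pass the
    paths that apply to your own host.
    """
    conf_dir = agent_root / "conf.d" / "storm.d"
    conf_dir.mkdir(parents=True, exist_ok=True)
    conf_file = conf_dir / "conf.yaml"
    # Leave an existing conf.yaml alone so local edits are not overwritten.
    if not conf_file.exists():
        shutil.copyfile(sample_yaml, conf_file)
    return conf_file
```

The existence check is a deliberate choice: rerunning the function after you have edited conf.yaml does not reset your server, port, or masters settings.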

Validation

Run the Agent’s status subcommand and look for storm under the Checks section.
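
If you want to automate this lookup, one option is to capture the status output as text and scan it. The sketch below is illustrative only: storm_check_listed is a hypothetical helper, and it assumes the output contains a heading line with the word "Checks" followed by lines that begin with the check name. The exact layout varies between Agent versions, so adjust the matching to what your Agent prints.

```python
def storm_check_listed(status_output: str) -> bool:
    """Return True if a line starting with "storm" follows a "Checks" heading.

    The heading text and line layout are assumptions of this sketch, not a
    documented output format.
    """
    in_checks_section = False
    for line in status_output.splitlines():
        text = line.strip()
        if not in_checks_section:
            # Skip everything until a heading that mentions "Checks".
            in_checks_section = "Checks" in text
            continue
        if text.lower().startswith("storm"):
            return True
    return False
```

Pass the function the text you captured from the status subcommand; a False result means either the check is not loaded or the output layout differs from the assumption above.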

Data Collected

Metrics

storm.bolt.last_<interval>.acked
(gauge)
Number of Acked Tuples
shown as sample
storm.bolt.last_<interval>.capacity
(gauge)
Bolt Capacity
shown as fraction
storm.bolt.last_<interval>.emitted
(gauge)
Number of Emitted Tuples
shown as sample
storm.bolt.last_<interval>.errorLapsedSecs
(gauge)
Number of Seconds Since Last Error
shown as second
storm.bolt.last_<interval>.executed
(gauge)
Number of Tuples Executed
shown as sample
storm.bolt.last_<interval>.executeLatency
(gauge)
Bolt Execute Latency
shown as millisecond
storm.bolt.last_<interval>.executors
(gauge)
Number of Bolt Executors
shown as thread
storm.bolt.last_<interval>.failed
(gauge)
Number of Failed Tuples
shown as sample
storm.bolt.last_<interval>.processLatency
(gauge)
Bolt Process Latency
shown as millisecond
storm.bolt.last_<interval>.requestedCpu
(gauge)
Bolt Requested CPU
shown as percent
storm.bolt.last_<interval>.requestedMemOffHeap
(gauge)
Bolt Requested Memory Off Heap
shown as mebibyte
storm.bolt.last_<interval>.requestedMemOnHeap
(gauge)
Bolt Requested Memory On Heap
shown as mebibyte
storm.bolt.last_<interval>.tasks
(gauge)
Bolt Tasks
shown as task
storm.bolt.last_<interval>.transferred
(gauge)
Number of Transferred Tuples
shown as sample
storm.cluster.availCpu
(gauge)
Available Storm Cluster CPU
shown as core
storm.cluster.availMem
(gauge)
Available Storm Cluster Memory
shown as mebibyte
storm.cluster.cpuAssignedPercentUtil
(gauge)
Storm Cluster CPU Assigned Percent
shown as percent
storm.cluster.executorsTotal
(gauge)
Total Storm Cluster Executors
shown as thread
storm.cluster.memAssignedPercentUtil
(gauge)
Storm Cluster Memory Assigned Percent
shown as percent
storm.cluster.slotsFree
(gauge)
Total Cluster Slots Available
shown as process
storm.cluster.slotsTotal
(gauge)
Total Cluster Slots
shown as process
storm.cluster.slotsUsed
(gauge)
Total Storm Cluster Slots Used
shown as process
storm.cluster.supervisors
(gauge)
Total Storm Cluster Supervisors
shown as worker
storm.cluster.tasksTotal
(gauge)
Total Storm Cluster Tasks
shown as task
storm.cluster.topologies
(gauge)
Number of Storm Topologies
shown as service
storm.cluster.totalCpu
(gauge)
Total Storm Cluster CPU
shown as core
storm.cluster.totalMem
(gauge)
Total Storm Cluster Memory
shown as mebibyte
storm.nimbus.numDead
(gauge)
Number of Dead Nimbus Nodes
shown as node
storm.nimbus.numFollowers
(gauge)
Number of Follower Nimbus Nodes
shown as node
storm.nimbus.numLeaders
(gauge)
Number of Leader Nimbus Nodes
shown as node
storm.nimbus.numOffline
(gauge)
Number of Offline Nimbus Nodes
shown as node
storm.nimbus.upTimeSeconds
(gauge)
Nimbus Uptime Seconds
shown as second
storm.spout.last_<interval>.acked
(gauge)
Number of Acked Tuples
shown as sample
storm.spout.last_<interval>.completeLatency
(gauge)
Spout Complete Latency
shown as millisecond
storm.spout.last_<interval>.emitted
(gauge)
Number of Emitted Tuples
shown as sample
storm.spout.last_<interval>.errorLapsedSecs
(gauge)
Number of Seconds Since Last Error
shown as second
storm.spout.last_<interval>.executors
(gauge)
Number of Spout Executors
shown as thread
storm.spout.last_<interval>.failed
(gauge)
Number of Failed Tuples
shown as sample
storm.spout.last_<interval>.requestedCpu
(gauge)
Spout Requested CPU
shown as percent
storm.spout.last_<interval>.requestedMemOffHeap
(gauge)
Spout Requested Memory Off Heap
shown as mebibyte
storm.spout.last_<interval>.requestedMemOnHeap
(gauge)
Spout Requested Memory On Heap
shown as mebibyte
storm.spout.last_<interval>.tasks
(gauge)
Spout Tasks
shown as task
storm.spout.last_<interval>.transferred
(gauge)
Number of Transferred Tuples
shown as sample
storm.supervisor.slotsTotal
(gauge)
Total Supervisor Slots
shown as process
storm.supervisor.slotsUsed
(gauge)
Used Supervisor Slots
shown as process
storm.supervisor.totalCpu
(gauge)
Total Supervisor CPU
shown as core
storm.supervisor.totalMem
(gauge)
Total Supervisor Memory
shown as mebibyte
storm.supervisor.uptimeSeconds
(gauge)
Supervisor Uptime
shown as second
storm.supervisor.usedCpu
(gauge)
Used Supervisor CPU
shown as core
storm.supervisor.usedMem
(gauge)
Used Supervisor Memory
shown as mebibyte
storm.topologyStats.last_<interval>.acked
(gauge)
All Time Acked Tuples
shown as sample
storm.topologyStats.last_<interval>.assignedCpu
(gauge)
Assigned CPU Percentage
shown as percent
storm.topologyStats.last_<interval>.assignedMemOffHeap
(gauge)
Off Heap Memory Assigned
shown as mebibyte
storm.topologyStats.last_<interval>.assignedMemOnHeap
(gauge)
On Heap Memory Assigned
shown as mebibyte
storm.topologyStats.last_<interval>.assignedTotalMem
(gauge)
Total Memory Assigned
shown as mebibyte
storm.topologyStats.last_<interval>.completeLatency
(gauge)
All Time Complete Latency
shown as millisecond
storm.topologyStats.last_<interval>.debug
(gauge)
Boolean indicating if debug mode is enabled.
shown as sample
storm.topologyStats.last_<interval>.emitted
(gauge)
All Time Emitted Tuples
shown as sample
storm.topologyStats.last_<interval>.executorsTotal
(gauge)
Total Storm Topology Executors
shown as thread
storm.topologyStats.last_<interval>.failed
(gauge)
All Time Failed Tuples
shown as sample
storm.topologyStats.last_<interval>.msgTimeout
(gauge)
Spout Tuple Timeout in Seconds
shown as second
storm.topologyStats.last_<interval>.numBolts
(gauge)
Total Number of Bolts
shown as task
storm.topologyStats.last_<interval>.numSpouts
(gauge)
Total Number of Spouts
shown as task
storm.topologyStats.last_<interval>.replicationCount
(gauge)
Number of Replications
shown as occurrence
storm.topologyStats.last_<interval>.requestedCpu
(gauge)
Requested Topology CPU resources
shown as percent
storm.topologyStats.last_<interval>.requestedMemOffHeap
(gauge)
Requested Topology Off Heap Memory resources
shown as mebibyte
storm.topologyStats.last_<interval>.requestedMemOnHeap
(gauge)
Requested Topology On Heap Memory Resources
shown as mebibyte
storm.topologyStats.last_<interval>.samplingPct
(gauge)
Metric Sampling Percentage by Storm
shown as percent
storm.topologyStats.last_<interval>.tasksTotal
(gauge)
Total Number of Tasks
shown as task
storm.topologyStats.last_<interval>.transferred
(gauge)
All Time Transferred Tuples
shown as sample
storm.topologyStats.last_<interval>.uptimeSeconds
(gauge)
Total Topology Uptime
shown as second
storm.topologyStats.last_<interval>.workersTotal
(gauge)
Total Number of Workers
shown as worker
storm.topologyStats.metrics.bolts.last_<interval>.acked
(gauge)
Number of Tuples Acked by Spout & Stream
shown as sample
storm.topologyStats.metrics.bolts.last_<interval>.complete_ms_avg
(gauge)
Complete Tuple Latency by Spout & Stream
shown as millisecond
storm.topologyStats.metrics.bolts.last_<interval>.emitted
(gauge)
Number of Tuples Emitted by Spout & Stream
shown as sample
storm.topologyStats.metrics.bolts.last_<interval>.executed
(gauge)
Number of Tuples Executed by Spout & Stream
shown as sample
storm.topologyStats.metrics.bolts.last_<interval>.executed_ms_avg
(gauge)
Execute Tuple Latency by Spout & Stream
shown as millisecond
storm.topologyStats.metrics.bolts.last_<interval>.failed
(gauge)
Number of Tuples Failed by Spout & Stream
shown as sample
storm.topologyStats.metrics.bolts.last_<interval>.process_ms_avg
(gauge)
Process Tuple Latency by Spout & Stream
shown as millisecond
storm.topologyStats.metrics.bolts.last_<interval>.transferred
(gauge)
Number of Tuples Transferred by Spout & Stream
shown as sample
storm.topologyStats.metrics.spouts.last_<interval>.acked
(gauge)
Number of Tuples Acked by Spout & Stream
shown as sample
storm.topologyStats.metrics.spouts.last_<interval>.complete_ms_avg
(gauge)
Complete Tuple Latency by Spout & Stream
shown as millisecond
storm.topologyStats.metrics.spouts.last_<interval>.emitted
(gauge)
Number of Tuples Emitted by Spout & Stream
shown as sample
storm.topologyStats.metrics.spouts.last_<interval>.executed
(gauge)
Number of Tuples Executed by Spout & Stream
shown as sample
storm.topologyStats.metrics.spouts.last_<interval>.executed_ms_avg
(gauge)
Execute Tuple Latency by Spout & Stream
shown as millisecond
storm.topologyStats.metrics.spouts.last_<interval>.failed
(gauge)
Number of Tuples Failed by Spout & Stream
shown as sample
storm.topologyStats.metrics.spouts.last_<interval>.process_ms_avg
(gauge)
Process Tuple Latency by Spout & Stream
shown as millisecond
storm.topologyStats.metrics.spouts.last_<interval>.transferred
(gauge)
Number of Tuples Transferred by Spout & Stream
shown as sample
storm.worker.last_<interval>.assignedCpu
(gauge)
Assigned Worker CPU Percentage
shown as percent
storm.worker.last_<interval>.assignedMemOffHeap
(gauge)
Off Heap Memory Assigned for a Worker
shown as mebibyte
storm.worker.last_<interval>.assignedMemOnHeap
(gauge)
On Heap Memory Assigned for a Worker
shown as mebibyte
storm.worker.last_<interval>.componentNumTasks
(histogram)
Total Number of Component Tasks for a Worker
shown as task
storm.worker.last_<interval>.executorsTotal
(gauge)
Total Number of Executors for a Worker
shown as thread
storm.worker.last_<interval>.uptimeSeconds
(gauge)
Worker Uptime
shown as second
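
Many of the metric names above contain an <interval> placeholder. When building dashboards or monitors it can help to expand those templates into concrete names. The sketch below is a hypothetical helper, and it assumes the interval appears in the name as a plain number (for example 60); check the names your Agent actually reports before relying on that.

```python
from typing import Iterable, List


def expand_metric_names(templates: Iterable[str], intervals: Iterable[int]) -> List[str]:
    """Replace the <interval> placeholder with each configured interval.

    Names without a placeholder (such as storm.cluster.slotsFree) are kept
    unchanged. Rendering the interval as a bare number is an assumption.
    """
    interval_list = list(intervals)
    names: List[str] = []
    for template in templates:
        if "<interval>" not in template:
            names.append(template)
            continue
        for interval in interval_list:
            names.append(template.replace("<interval>", str(interval)))
    return names
```

For instance, expanding storm.bolt.last_<interval>.acked with the intervals 60 and 600 yields one name per interval, while cluster-level names pass through untouched.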

Events

The Storm check does not include any events at this time.

Service Checks

topology_check.{TOPOLOGY NAME}

The check returns:

  • OK if the topology is active.
  • CRITICAL if the topology is not active.
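
The rule above can be expressed as a small mapping. This is a sketch of the described behavior, not the check's actual source: topology_service_check is a hypothetical name, and it assumes Storm reports an active topology with the status string "ACTIVE". The numeric values are the standard Datadog service check status codes.

```python
from typing import Tuple

# Standard Datadog service check status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def topology_service_check(topology_name: str, topology_status: str) -> Tuple[str, int]:
    """Return the service check name and status for one topology.

    Assumes an active topology is reported as "ACTIVE"; any other status
    string maps to CRITICAL.
    """
    check_name = f"topology_check.{topology_name}"
    is_active = topology_status.strip().upper() == "ACTIVE"
    return check_name, OK if is_active else CRITICAL
```

With this mapping, a topology named wordcount in any non-active state produces topology_check.wordcount with a CRITICAL status, which is the condition to alert on.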

Troubleshooting

Need help? Contact Datadog Support.

