Datadog-RabbitMQ Integration

Overview

The RabbitMQ check lets you:

  • Track queue-based stats: queue size, consumer count, unacknowledged messages, redelivered messages, etc.
  • Track node-based stats: waiting processes, used sockets, used file descriptors, etc.
  • Monitor vhosts for aliveness and number of connections

And more.

Setup

Installation

The RabbitMQ check is packaged with the Agent, so simply install the Agent on your RabbitMQ servers. If you need the newest version of the check, install the dd-check-rabbitmq package.

Configuration

Prepare RabbitMQ

You must enable the RabbitMQ management plugin, typically by running rabbitmq-plugins enable rabbitmq_management on the RabbitMQ server. See RabbitMQ’s documentation for details.

Connect the Agent

Create a file rabbitmq.yaml in the Agent’s conf.d directory. See the sample rabbitmq.yaml for all available configuration options:

init_config:

instances:
  - rabbitmq_api_url: http://localhost:15672/api/
#   rabbitmq_user: <RABBIT_USER> # if your rabbitmq API requires auth; default is guest
#   rabbitmq_pass: <RABBIT_PASS> # default is guest
#   tag_families: true           # default is false
#   vhosts:
#     - <THE_ONE_VHOST_YOU_CARE_ABOUT>

If you don’t set vhosts, the Agent sends the following for EVERY vhost:

  1. the rabbitmq.aliveness service check
  2. the rabbitmq.connections metric

If you do set vhosts, the Agent sends this check and metric only for the vhosts you list.
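As a minimal sketch of the vhosts behavior described above, the following instance limits the aliveness service check and the connections metric to two vhosts. The vhost names production and staging are hypothetical placeholders, not values from your RabbitMQ setup:

```yaml
init_config:

instances:
  - rabbitmq_api_url: http://localhost:15672/api/
    # Hypothetical vhost names; replace with the vhosts you want checked.
    # Any vhost not listed here gets no rabbitmq.aliveness check
    # and no rabbitmq.connections metric.
    vhosts:
      - production
      - staging
```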

There are options for queues and nodes that work similarly—the Agent checks all queues and nodes by default, but you can provide lists or regexes to limit this. See the example check configuration for details on these configuration options (and all others).

Configuration Options

  • rabbitmq_api_url - required - Points to the API URL of the RabbitMQ Management Plugin
  • rabbitmq_user - optional - Defaults to ‘guest’
  • rabbitmq_pass - optional - Defaults to ‘guest’
  • tag_families - optional - Defaults to false - Tag queue “families” based off of regex matching
  • nodes or nodes_regexes - optional - Use the nodes or nodes_regexes parameters to specify the nodes you’d like to collect metrics on (up to 100 nodes). If you have fewer than 100 nodes, you don’t have to set this parameter; metrics are collected on all nodes by default. See the link to the example YAML below for more.
  • queues or queues_regexes - optional - Use the queues or queues_regexes parameters to specify the queues you’d like to collect metrics on (up to 200 queues). If you have fewer than 200 queues, you don’t have to set this parameter; metrics are collected on all queues by default. If you have set up vhosts, set the queue names as vhost_name/queue_name. If you have tag_families enabled, the first captured group in the regex is used as the queue_family tag. See the link to the example YAML below for more.
  • vhosts - optional - By default, a list of all vhosts is fetched and each one is checked using the aliveness API. If you want service checks for only certain vhosts, list the vhosts you care about.
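The node and queue options above might be combined as in the following sketch. All node names, queue names, and regexes are hypothetical, and the sketch assumes that exact names and regexes can be set together in one instance; check the example check configuration before relying on it:

```yaml
init_config:

instances:
  - rabbitmq_api_url: http://localhost:15672/api/
    tag_families: true
    # Hypothetical node name; only listed nodes are collected.
    nodes:
      - rabbit@localhost
    # Hypothetical queue names. The second entry uses the
    # vhost_name/queue_name form for a queue in the "billing" vhost.
    queues:
      - orders
      - billing/invoices
    # Hypothetical regex. With tag_families enabled, the first captured
    # group ("payments") is used as the queue_family tag.
    queues_regexes:
      - '(payments)-\d+'
```

The regex is single-quoted so that YAML passes the backslash in \d through unchanged.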

Restart the Agent to begin sending RabbitMQ metrics, events, and service checks to Datadog.

Validation

Run the Agent’s info subcommand and look for rabbitmq under the Checks section:

  Checks
  ======
    [...]

    rabbitmq
    -------
      - instance #0 [OK]
      - Collected 26 metrics, 0 events & 2 service checks

    [...]

Compatibility

The rabbitmq check is compatible with all major platforms.

Data Collected

Metrics

rabbitmq.node.fd_used
(gauge)
Used file descriptors
rabbitmq.node.mem_used
(gauge)
Memory used in bytes
shown as byte
rabbitmq.node.run_queue
(gauge)
Average number of Erlang processes waiting to run
shown as process
rabbitmq.node.sockets_used
(gauge)
Number of file descriptors used as sockets
rabbitmq.node.partitions
(gauge)
Number of network partitions this node is seeing
rabbitmq.queue.active_consumers
(gauge)
Number of active consumers, consumers that can immediately receive any messages sent to the queue
rabbitmq.queue.bindings.count
(gauge)
Number of bindings for a specific queue
rabbitmq.queue.consumers
(gauge)
Number of consumers
rabbitmq.queue.consumer_utilisation
(gauge)
The ratio of time that a queue's consumers can take new messages
shown as fraction
rabbitmq.queue.memory
(gauge)
Bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures
shown as byte
rabbitmq.queue.messages
(gauge)
Count of the total messages in the queue
shown as message
rabbitmq.queue.messages.rate
(gauge)
Count per second of the total messages in the queue
shown as message
rabbitmq.queue.messages_ready
(gauge)
Number of messages ready to be delivered to clients
shown as message
rabbitmq.queue.messages_ready.rate
(gauge)
Number per second of messages ready to be delivered to clients
shown as message
rabbitmq.queue.messages_unacknowledged
(gauge)
Number of messages delivered to clients but not yet acknowledged
shown as message
rabbitmq.queue.messages_unacknowledged.rate
(gauge)
Number per second of messages delivered to clients but not yet acknowledged
shown as message
rabbitmq.queue.messages.ack.count
(gauge)
Number of messages delivered to clients and acknowledged
shown as message
rabbitmq.queue.messages.ack.rate
(gauge)
Number per second of messages delivered to clients and acknowledged
shown as message
rabbitmq.queue.messages.deliver.count
(gauge)
Count of messages delivered in acknowledgement mode to consumers
shown as message
rabbitmq.queue.messages.deliver.rate
(gauge)
Rate per second of messages delivered in acknowledgement mode to consumers
shown as message
rabbitmq.queue.messages.deliver_get.count
(gauge)
Sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.
shown as message
rabbitmq.queue.messages.deliver_get.rate
(gauge)
Rate per second of the sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.
shown as message
rabbitmq.queue.messages.publish.count
(gauge)
Count of messages published
shown as message
rabbitmq.queue.messages.publish.rate
(gauge)
Rate per second of messages published
shown as message
rabbitmq.queue.messages.redeliver.count
(gauge)
Count of subset of messages in deliver_get which had the redelivered flag set
shown as message
rabbitmq.queue.messages.redeliver.rate
(gauge)
Rate per second of subset of messages in deliver_get which had the redelivered flag set
shown as message
rabbitmq.connections
(gauge)
Number of current connections to a given rabbitmq vhost, tagged 'rabbitmq_vhost:<vhost_name>'
shown as connection
rabbitmq.connections.state
(gauge)
Number of connections in the specified connection state
shown as connection_state

The Agent tags rabbitmq.queue.* metrics by queue name, and rabbitmq.node.* metrics by node name.

Events

For performance reasons, the RabbitMQ check self-limits the number of queues and nodes it will collect metrics for. If and when the check nears this limit, it emits a warning-level event to your event stream.

See the example check configuration for details about these limits.

Service Checks

rabbitmq.aliveness:

The Agent submits this service check for all vhosts (if vhosts is not configured) OR a subset of vhosts (those configured in vhosts), tagging each service check vhost:<vhost_name>. Returns CRITICAL if the aliveness check failed, otherwise OK.

rabbitmq.status:

Returns CRITICAL if the Agent cannot connect to RabbitMQ to collect metrics, otherwise OK.

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

Datadog Blog

Learn more about infrastructure monitoring and all our integrations on our blog.

Knowledge Base

  • Tagging RabbitMQ queues by tag Family
  • By default, queue metrics are tagged by queue and node metrics are tagged by node. If you have a Datadog account, you can see the integration installation instructions here.