
RabbitMQ

Agent Check

Supported OS: Linux Mac OS Windows

RabbitMQ Dashboard

Overview

The RabbitMQ check lets you:

  • Track queue-based stats: queue size, consumer count, unacknowledged messages, redelivered messages, etc.
  • Track node-based stats: waiting processes, used sockets, used file descriptors, etc.
  • Monitor vhosts for aliveness and number of connections

And more.

Setup

Installation

The RabbitMQ check is included in the Datadog Agent package, so you don’t need to install anything else on your RabbitMQ servers.

Configuration

To start collecting your RabbitMQ metrics and logs, edit the rabbitmq.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory. See the sample rabbitmq.yaml for all available configuration options.

Prepare RabbitMQ

Enable the RabbitMQ management plugin. See RabbitMQ’s documentation to enable it.

Metric Collection

  • Add this configuration block to your rabbitmq.d/conf.yaml file to start gathering your RabbitMQ metrics:
init_config:

instances:
  - rabbitmq_api_url: http://localhost:15672/api/
  #  rabbitmq_user: <RABBIT_USER> # if your rabbitmq API requires auth; default is guest
  #  rabbitmq_pass: <RABBIT_PASS> # default is guest
  #  tag_families: true           # default is false
  #  vhosts:
  #    - <THE_ONE_VHOST_YOU_CARE_ABOUT>

If you don’t set vhosts, the Agent sends the following for EVERY vhost:

  1. the rabbitmq.aliveness service check
  2. the rabbitmq.connections metric

If you do set vhosts, the Agent sends this check and metric only for the vhosts you list.
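
To illustrate what checking each vhost involves, the sketch below builds one aliveness URL per vhost from rabbitmq_api_url. It assumes the management plugin's aliveness-test endpoint and that vhost names are percent-encoded in the path (so the default vhost / becomes %2F). The helper name aliveness_urls is hypothetical, and this is not the Agent's actual code.

```python
from urllib.parse import quote


def aliveness_urls(api_url, vhosts):
    """Return {vhost: URL} for the management plugin's aliveness test."""
    # Normalize the base so it always ends with exactly one slash.
    base = api_url.rstrip("/") + "/"
    urls = {}
    for vhost in vhosts:
        # safe="" forces "/" to be encoded, so the default vhost becomes %2F.
        urls[vhost] = base + "aliveness-test/" + quote(vhost, safe="")
    return urls


if __name__ == "__main__":
    # "orders" is an example vhost name, not one from this page.
    found = aliveness_urls("http://localhost:15672/api/", ["/", "orders"])
    for vhost, url in found.items():
        print(vhost, "->", url)
```

With vhosts left unset, the list passed in would be every vhost known to the broker. With vhosts set, it would be only the ones you listed.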

There are options for queues and nodes that work similarly. The Agent checks all queues and nodes by default, but you can provide lists or regexes to limit this. See the example check configuration for details on these configuration options (and all others).

Configuration Options:

  • rabbitmq_api_url - required - Points to the API URL of the RabbitMQ Management Plugin
  • rabbitmq_user - optional - Defaults to ‘guest’
  • rabbitmq_pass - optional - Defaults to ‘guest’
  • tag_families - optional - Defaults to false - Tag queue “families” based off of regex matching
  • nodes or nodes_regexes - optional - Use the nodes or nodes_regexes parameters to specify the nodes you’d like to collect metrics on (up to 100 nodes). If you have fewer than 100 nodes, you don’t have to set this parameter; metrics are collected on all nodes by default. See the link to the example YAML below for more.
  • queues or queues_regexes - optional - Use the queues or queues_regexes parameters to specify the queues you’d like to collect metrics on (up to 200 queues). If you have fewer than 200 queues, you don’t have to set this parameter; metrics are collected on all queues by default. If you have set up vhosts, set the queue names as vhost_name/queue_name. If you have tag_families enabled, the first captured group in the regex is used as the queue_family tag. See the link to the example YAML below for more.
  • vhosts - optional - By default a list of all vhosts is fetched and each one will be checked using the aliveness API. If you prefer only certain vhosts to be monitored, list the vhosts you care about.
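
The interaction between queues, queues_regexes, and tag_families can be sketched in Python as follows. This is a minimal illustration, not the Agent's implementation: the function name select_queues and the sample queue names are hypothetical, and it assumes regexes are applied with a search anywhere in the queue name.

```python
import re


def select_queues(all_queues, queues=(), queues_regexes=(), tag_families=False):
    """Return {queue_name: queue_family or None} for the queues that are kept.

    With no filters configured, every queue is kept (the documented default).
    """
    selected = {}
    no_filters = not queues and not queues_regexes
    for name in all_queues:
        if no_filters or name in queues:
            selected[name] = None
            continue
        for pattern in queues_regexes:
            match = re.search(pattern, name)
            if match:
                # The first captured group, if any, becomes the family.
                has_group = match.re.groups > 0
                selected[name] = match.group(1) if tag_families and has_group else None
                break
    return selected


if __name__ == "__main__":
    names = ["billing-1", "billing-2", "audit", "tmp"]
    print(select_queues(names, queues=["audit"],
                        queues_regexes=[r"(billing)-\d+"], tag_families=True))
```

In this example both billing queues share the family billing, audit is kept by name without a family, and tmp is dropped because nothing matches it.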

See the sample rabbitmq.d/conf.yaml for all available configuration options.

Restart the Agent to begin sending RabbitMQ metrics, events, and service checks to Datadog.

Log Collection

Available for Agent >6.0

  1. To modify the default log file location, either set the RABBITMQ_LOGS environment variable or add the following to your RabbitMQ configuration file (/etc/rabbitmq/rabbitmq.conf):

    log.dir = /var/log/rabbit
    log.file = rabbit.log
    
  2. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  3. Add this configuration block to your rabbitmq.d/conf.yaml file to start collecting your RabbitMQ logs:

    logs:
        - type: file
          path: /var/log/rabbit/*.log
          source: rabbitmq
          service: myservice
          log_processing_rules:
            - type: multi_line
              name: logs_starts_with_equal_sign
              pattern: "="
    

    See the sample rabbitmq.yaml for all available configuration options.

  4. Restart the Agent.
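
The multi_line rule in step 3 treats each line that matches the pattern as the start of a new log entry, so continuation lines are folded into the entry before them. A minimal Python sketch of that grouping, assuming the pattern is matched at the start of each line; the function name group_multiline and the sample lines are illustrative and are not taken from the Agent:

```python
import re


def group_multiline(lines, pattern="="):
    """Group raw lines into entries; a line matching `pattern` at its start opens a new entry."""
    start = re.compile(pattern)
    entries = []
    for line in lines:
        if start.match(line) or not entries:
            # A matching line (or the very first line) starts a new entry.
            entries.append(line)
        else:
            # Any other line continues the current entry.
            entries[-1] += "\n" + line
    return entries


if __name__ == "__main__":
    # Made-up sample in the style of RabbitMQ report headers.
    sample = [
        "=INFO REPORT==== 1-Jan-2018::00:00:00 ===",
        "accepting AMQP connection",
        "=ERROR REPORT==== 1-Jan-2018::00:00:01 ===",
        "closing AMQP connection",
        "  missed heartbeats from client",
    ]
    for entry in group_multiline(sample):
        print(repr(entry))
```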

Validation

Run the Agent’s status subcommand and look for rabbitmq under the Checks section.

Data Collected

Metrics

rabbitmq.node.fd_used
(gauge)
Used file descriptors
rabbitmq.node.disk_free
(gauge)
Current free disk space
shown as byte
rabbitmq.node.mem_used
(gauge)
Memory used in bytes
shown as byte
rabbitmq.node.run_queue
(gauge)
Average number of Erlang processes waiting to run
shown as process
rabbitmq.node.sockets_used
(gauge)
Number of file descriptors used as sockets
rabbitmq.node.partitions
(gauge)
Number of network partitions this node is seeing
rabbitmq.node.running
(gauge)
Whether the node is running
rabbitmq.node.mem_alarm
(gauge)
Whether the host has a memory alarm
rabbitmq.node.disk_alarm
(gauge)
Whether the node has a disk alarm
rabbitmq.exchange.messages.ack.count
(gauge)
Number of messages delivered to clients and acknowledged
shown as message
rabbitmq.exchange.messages.ack.rate
(gauge)
Rate of messages delivered to clients and acknowledged per second
shown as message
rabbitmq.exchange.messages.confirm.count
(gauge)
Count of messages confirmed
shown as message
rabbitmq.exchange.messages.confirm.rate
(gauge)
Rate of messages confirmed per second
shown as message
rabbitmq.exchange.messages.deliver_get.count
(gauge)
Sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get
shown as message
rabbitmq.exchange.messages.deliver_get.rate
(gauge)
Rate per second of the sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get
shown as message
rabbitmq.exchange.messages.redeliver.count
(gauge)
Count of subset of messages in deliver_get which had the redelivered flag set
shown as message
rabbitmq.exchange.messages.redeliver.rate
(gauge)
Rate of subset of messages in deliver_get which had the redelivered flag set per second
shown as message
rabbitmq.exchange.messages.return_unroutable.count
(gauge)
Count of messages returned to publisher as unroutable
shown as message
rabbitmq.exchange.messages.return_unroutable.rate
(gauge)
Rate of messages returned to publisher as unroutable per second
shown as message
rabbitmq.exchange.messages.publish.count
(gauge)
Count of messages published
shown as message
rabbitmq.exchange.messages.publish.rate
(gauge)
Rate of messages published per second
shown as message
rabbitmq.exchange.messages.publish_in.count
(gauge)
Count of messages published from channels into this exchange
shown as message
rabbitmq.exchange.messages.publish_in.rate
(gauge)
Rate of messages published from channels into this exchange per second
shown as message
rabbitmq.exchange.messages.publish_out.count
(gauge)
Count of messages published from this exchange into queues
shown as message
rabbitmq.exchange.messages.publish_out.rate
(gauge)
Rate of messages published from this exchange into queues per second
shown as message
rabbitmq.queue.active_consumers
(gauge)
Number of active consumers, consumers that can immediately receive any messages sent to the queue
rabbitmq.queue.bindings.count
(gauge)
Number of bindings for a specific queue
rabbitmq.queue.consumers
(gauge)
Number of consumers
rabbitmq.queue.consumer_utilisation
(gauge)
The ratio of time that a queue's consumers can take new messages
shown as fraction
rabbitmq.queue.memory
(gauge)
Bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures
shown as byte
rabbitmq.queue.messages
(gauge)
Count of the total messages in the queue
shown as message
rabbitmq.queue.messages.rate
(gauge)
Count per second of the total messages in the queue
shown as message
rabbitmq.queue.messages_ready
(gauge)
Number of messages ready to be delivered to clients
shown as message
rabbitmq.queue.messages_ready.rate
(gauge)
Number per second of messages ready to be delivered to clients
shown as message
rabbitmq.queue.messages_unacknowledged
(gauge)
Number of messages delivered to clients but not yet acknowledged
shown as message
rabbitmq.queue.messages_unacknowledged.rate
(gauge)
Number per second of messages delivered to clients but not yet acknowledged
shown as message
rabbitmq.queue.messages.ack.count
(gauge)
Number of messages delivered to clients and acknowledged
shown as message
rabbitmq.queue.messages.ack.rate
(gauge)
Number per second of messages delivered to clients and acknowledged
shown as message
rabbitmq.queue.messages.deliver.count
(gauge)
Count of messages delivered in acknowledgement mode to consumers
shown as message
rabbitmq.queue.messages.deliver.rate
(gauge)
Rate per second of messages delivered in acknowledgement mode to consumers
shown as message
rabbitmq.queue.messages.deliver_get.count
(gauge)
Sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.
shown as message
rabbitmq.queue.messages.deliver_get.rate
(gauge)
Rate per second of the sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.
shown as message
rabbitmq.queue.messages.publish.count
(gauge)
Count of messages published
shown as message
rabbitmq.queue.messages.publish.rate
(gauge)
Rate per second of messages published
shown as message
rabbitmq.queue.messages.redeliver.count
(gauge)
Count of subset of messages in deliver_get which had the redelivered flag set
shown as message
rabbitmq.queue.messages.redeliver.rate
(gauge)
Rate per second of subset of messages in deliver_get which had the redelivered flag set
shown as message
rabbitmq.connections
(gauge)
Number of current connections to a given RabbitMQ vhost, tagged 'rabbitmq_vhost:<vhost_name>'
shown as connection
rabbitmq.connections.state
(gauge)
Number of connections in the specified connection state
shown as connection

The Agent tags rabbitmq.queue.* metrics by queue name, and rabbitmq.node.* metrics by node name.

Events

For performance reasons, the RabbitMQ check self-limits the number of queues and nodes it will collect metrics for. If and when the check nears this limit, it emits a warning-level event to your event stream.

See the example check configuration for details about these limits.
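
As a rough illustration of the self-limiting behavior, the sketch below caps a list of object names at a limit and produces a warning when the count nears it. The limits of 200 queues and 100 nodes come from the configuration options on this page. The 90% warning threshold, the function name apply_limit, and the message text are assumptions made for the example, not the check's documented behavior.

```python
MAX_QUEUES = 200  # queue limit quoted in the configuration options
MAX_NODES = 100   # node limit quoted in the configuration options


def apply_limit(names, limit, warn_ratio=0.9):
    """Return (kept, warning): at most `limit` names plus an optional warning text."""
    warning = None
    # warn_ratio is an assumed threshold for "nearing" the limit.
    if len(names) > limit * warn_ratio:
        warning = "%d objects found; the check collects at most %d" % (len(names), limit)
    return list(names)[:limit], warning


if __name__ == "__main__":
    queues = ["queue-%d" % i for i in range(185)]
    kept, warning = apply_limit(queues, MAX_QUEUES)
    print(len(kept), warning)
```

If you see such a warning, narrowing the selection with queues, queues_regexes, nodes, or nodes_regexes keeps the check under the limit.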

Service Checks

rabbitmq.aliveness:

The Agent submits this service check for all vhosts (if vhosts is not configured) OR a subset of vhosts (those configured in vhosts), tagging each service check vhost:<vhost_name>. Returns CRITICAL if the aliveness check failed, otherwise OK.

rabbitmq.status:

Returns CRITICAL if the Agent cannot connect to RabbitMQ to collect metrics, otherwise OK.

Troubleshooting

Further Reading

Datadog Blog

Knowledge Base

  • By default, queue metrics are tagged by queue and node metrics are tagged by node.

