ZooKeeper

Agent Check

Supported OS: Linux, Mac OS

Zookeeper Dashboard

Overview

The Zookeeper check tracks client connections and latencies, monitors the number of unprocessed requests, and more.

Setup

Installation

The Zookeeper check is included in the Datadog Agent package, so you don’t need to install anything else on your Zookeeper servers.

Configuration

Zookeeper whitelist

As of version 3.5, Zookeeper has a 4lw.commands.whitelist parameter (see the Zookeeper documentation) that whitelists four-letter-word commands. By default, only srvr is whitelisted. Because the integration relies on the stat and mntr commands, add both to the whitelist.
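
To check which commands a server accepts, you can send a four-letter-word command over a plain TCP connection to the client port and inspect the reply. The following is a minimal sketch and not part of the integration: the function names are hypothetical, and the rejection text it looks for is an assumption about how Zookeeper 3.5+ reports a non-whitelisted command, so verify it against your server version.

```python
import socket

# Assumption: substring of the reply Zookeeper 3.5+ sends for a command
# that is missing from 4lw.commands.whitelist.
NOT_WHITELISTED = "is not executed because it is not in the whitelist"


def send_four_letter_word(host, port, command, timeout=3.0):
    """Send a four-letter-word command and return the decoded reply."""
    if len(command) != 4:
        raise ValueError("expected a four-letter command such as 'stat'")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once the reply is complete.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_whitelisted(reply):
    """Return True unless the reply looks like a whitelist rejection."""
    return NOT_WHITELISTED not in reply
```

For example, assuming a server listening on localhost:2181, `is_whitelisted(send_four_letter_word("localhost", 2181, "mntr"))` should stay false until mntr is added to 4lw.commands.whitelist.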

Host

Follow the instructions below to configure this check for an Agent running on a host. For containerized environments, see the Containerized section.

  1. Edit the zk.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Zookeeper metrics and logs. See the sample zk.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Log collection

Available for Agent >6.0

  1. Zookeeper uses the log4j logger by default. To enable logging to a file and customize the format, edit the log4j.properties file:

      # Set root logger level to INFO and its only appender to R
      log4j.rootLogger=INFO, R
      # Appender R writes to a file (log4j needs the appender class to be declared)
      log4j.appender.R=org.apache.log4j.FileAppender
      log4j.appender.R.File=/var/log/zookeeper.log
      log4j.appender.R.layout=org.apache.log4j.PatternLayout
      log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p [%t] %c{1}:%L - %m%n
    
  2. By default, the integration pipeline supports the following conversion patterns:

      %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
      %d [%t] %-5p %c - %m%n
      %r [%t] %p %c %x - %m%n
    

    If you use a different format, clone and edit the integration pipeline.

  3. Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

      logs_enabled: true

  4. Uncomment and edit this configuration block at the bottom of your zk.d/conf.yaml:

      logs:
        - type: file
          path: /var/log/zookeeper.log
          source: zookeeper
          service: myapp
          # To handle multi-line logs that start with yyyy-mm-dd, use the following pattern
          #log_processing_rules:
          #  - type: multi_line
          #    name: log_start_with_date
          #    pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])

    Change the path and service parameter values and configure them for your environment. See the sample zk.d/conf.yaml for all available configuration options.

  5. Restart the Agent.
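
The commented-out multi_line rule in step 4 treats any line that begins with a yyyy-mm-dd date as the start of a new log entry, so continuation lines such as Java stack traces stay attached to the entry before them. The sketch below illustrates that grouping with the same pattern. It is an approximation for illustration, not the Agent's actual implementation, and the function name and sample lines are hypothetical.

```python
import re

# Same pattern as the multi_line rule in the zk.d/conf.yaml block above.
LOG_START = re.compile(r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")


def group_multiline(lines):
    """Group raw lines into entries; a line starting with a date opens a new entry."""
    entries = []
    for line in lines:
        if LOG_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation line: attach it to the entry that is currently open.
            entries[-1] += "\n" + line
    return entries


# Hypothetical sample: one error with a two-line stack trace, then one info line.
sample = [
    "2020-01-15 10:00:00 ERROR [main] Foo:12 - boom",
    "java.lang.RuntimeException: boom",
    "\tat Foo.bar(Foo.java:12)",
    "2020-01-15 10:00:01 INFO  [main] Foo:20 - ok",
]
entries = group_multiline(sample)
```

With this sample, the stack trace lines are folded into the first entry, and the second dated line starts a new one.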

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

Metric collection

Parameter            Value
<INTEGRATION_NAME>   zk
<INIT_CONFIG>        blank or {}
<INSTANCE_CONFIG>    {"host": "%%host%%", "port": "2181"}

Log collection

Available for Agent v6.5+

Collecting logs is disabled by default in the Datadog Agent. To enable it, see Docker log collection.

Parameter      Value
<LOG_CONFIG>   {"source": "zk", "service": "<SERVICE_NAME>"}
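
As one illustration of how the parameters in the two tables above might be applied, the sketch below renders them as Datadog Autodiscovery Docker labels. It assumes Docker labels are how you supply the template (Kubernetes annotations and other methods use different keys), the function name is hypothetical, and the service name you pass in is your own choice. Treat the Autodiscovery Integration Templates page as the authoritative reference.

```python
import json


def autodiscovery_labels(service_name):
    """Build Docker labels carrying the zk check and log configuration."""
    instance = {"host": "%%host%%", "port": "2181"}         # <INSTANCE_CONFIG>
    log_config = {"source": "zk", "service": service_name}  # <LOG_CONFIG>
    return {
        "com.datadoghq.ad.check_names": json.dumps(["zk"]),  # <INTEGRATION_NAME>
        "com.datadoghq.ad.init_configs": json.dumps([{}]),   # <INIT_CONFIG>
        "com.datadoghq.ad.instances": json.dumps([instance]),
        "com.datadoghq.ad.logs": json.dumps([log_config]),
    }


# "my-zookeeper" is a placeholder service name.
labels = autodiscovery_labels("my-zookeeper")
```

Each label value is a JSON list, which leaves room for more than one check or log configuration on the same container.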

Validation

Run the Agent’s status subcommand and look for zk under the Checks section.

Data Collected

Metrics

zookeeper.approximate_data_size (gauge)
zookeeper.avg_latency (gauge)
    The amount of time it takes for the server to respond to a client request.
    Shown as millisecond
zookeeper.bytes_received (gauge)
    Number of bytes received
zookeeper.bytes_sent (gauge)
    Number of bytes sent
zookeeper.connections (gauge)
    The total count of client connections.
    Shown as connection
zookeeper.datadog_client_exception (rate)
    The exception rate seen by the Datadog Agent when trying to collect stats.
    Shown as error
zookeeper.ephemerals_count (gauge)
zookeeper.instances (gauge)
zookeeper.latency.avg (gauge)
    The amount of time it takes for the server to respond to a client request.
    Shown as millisecond
zookeeper.latency.max (gauge)
    The amount of time it takes for the server to respond to a client request.
    Shown as millisecond
zookeeper.latency.min (gauge)
    The amount of time it takes for the server to respond to a client request.
    Shown as millisecond
zookeeper.max_file_descriptor_count (gauge)
zookeeper.max_latency (gauge)
    The amount of time it takes for the server to respond to a client request.
    Shown as millisecond
zookeeper.min_latency (gauge)
    The amount of time it takes for the server to respond to a client request.
    Shown as millisecond
zookeeper.nodes (gauge)
    The number of znodes in the ZooKeeper namespace (the data).
    Shown as node
zookeeper.num_alive_connections (gauge)
    The total count of client connections.
    Shown as connection
zookeeper.open_file_descriptor_count (gauge)
zookeeper.outstanding_requests (gauge)
    The number of queued requests when the server is under load and is receiving more sustained requests than it can process.
    Shown as request
zookeeper.packets.received (gauge)
    The number of packets received.
    Shown as packet
zookeeper.packets.sent (gauge)
    The number of packets sent.
    Shown as packet
zookeeper.packets_received (gauge)
    The number of packets received.
    Shown as packet
zookeeper.packets_sent (gauge)
    The number of packets sent.
    Shown as packet
zookeeper.server_state (gauge)
zookeeper.timeouts (rate)
    The rate of timeouts the Datadog Agent received when trying to collect stats.
    Shown as occurrence
zookeeper.watch_count (gauge)
zookeeper.znode_count (gauge)
    The number of znodes in the ZooKeeper namespace (the data).
    Shown as node
zookeeper.zxid.count (gauge)
zookeeper.zxid.epoch (gauge)
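
Several of the names above (for example zookeeper.avg_latency and zookeeper.znode_count) mirror keys in the output of Zookeeper's mntr command, which prints one tab-separated key and value per line. The sketch below shows one way such output could be turned into metric names and values. The zk_ to zookeeper. renaming is an assumption inferred from the metric list, not the check's confirmed implementation, and the function name and sample values are hypothetical.

```python
def parse_mntr(reply):
    """Map numeric mntr lines such as 'zk_avg_latency<TAB>2' to metric values."""
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if not sep or not key.startswith("zk_"):
            continue  # not a key/value line
        try:
            number = float(value)
        except ValueError:
            continue  # non-numeric values, e.g. zk_version or zk_server_state
        # Assumed naming rule: swap the zk_ prefix for zookeeper.
        metrics["zookeeper." + key[len("zk_"):]] = number
    return metrics


# Hypothetical mntr reply with made-up values.
sample_reply = (
    "zk_version\t3.5.5\n"
    "zk_avg_latency\t2\n"
    "zk_znode_count\t17\n"
    "zk_server_state\tleader\n"
)
parsed = parse_mntr(sample_reply)
```

In this sketch, non-numeric lines are skipped rather than converted, so only the latency and znode count survive from the sample.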

Deprecated metrics

The following metrics are still sent but will eventually be removed:

  • zookeeper.bytes_received
  • zookeeper.bytes_sent

Events

The Zookeeper check does not include any events.

Service Checks

zookeeper.ruok:
Sends ruok to the monitored node. Returns OK for an imok response, WARN for any other response, and CRITICAL if no response is received.

zookeeper.mode:
The Agent submits this service check if expected_mode is configured in zk.d/conf.yaml. The check returns OK when Zookeeper’s actual mode matches expected_mode, and CRITICAL otherwise.
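
The decision rules for both service checks can be sketched as two small functions. This is an illustrative reading of the descriptions above, not the Agent's code: the function names are hypothetical, the numeric values follow the usual Datadog service check convention (0 for OK, 1 for WARNING, 2 for CRITICAL), and the mode is assumed to come from the "Mode:" line of a stat reply.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def ruok_status(reply):
    """Status for zookeeper.ruok; reply is None or empty when nothing came back."""
    if not reply:
        return CRITICAL
    if reply.strip() == "imok":
        return OK
    return WARNING


def mode_status(stat_reply, expected_mode):
    """Status for zookeeper.mode, comparing the reported mode with expected_mode."""
    actual = None
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            actual = line.split(":", 1)[1].strip().lower()
    return OK if actual == expected_mode.lower() else CRITICAL
```

For example, a node that reports "Mode: follower" while expected_mode is set to leader would yield CRITICAL under this sketch.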

Troubleshooting

Need help? Contact Datadog support.

