Zookeeper

Agent Check

Supported OS: Linux, Mac OS

Zookeeper Dashboard

Overview

The Zookeeper check tracks client connections and latencies, monitors the number of unprocessed requests, and more.

Setup

Installation

The Zookeeper check is included in the Datadog Agent package, so you don’t need to install anything else on your Zookeeper servers.

Configuration

  1. Edit the zk.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Zookeeper metrics and logs. See the sample zk.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Metric Collection

  • Add this configuration block to your zk.d/conf.yaml file to start gathering your Zookeeper metrics:
init_config:

instances:
  - host: localhost
    port: 2181
    timeout: 3

Log Collection

Available for Agent >6.0

Zookeeper uses the log4j logger by default. To enable logging to a file and customize the format, edit the log4j.properties file:

# Set root logger level to INFO and its only appender to R
log4j.rootLogger=INFO, R
# Declare R as a file appender (log4j needs an appender class for R)
log4j.appender.R=org.apache.log4j.FileAppender
log4j.appender.R.File=/var/log/zookeeper.log
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p [%t] %c{1}:%L - %m%n

By default, our integration pipeline supports the following conversion patterns:

  %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
  %d [%t] %-5p %c - %m%n
  %r [%t] %p %c %x - %m%n

If your logs use a different format, clone and edit the integration pipeline to match it.
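To check whether a log line fits the first supported pattern before you ship it, a regular expression can mirror the conversion pattern. This is a minimal sketch, not the pipeline's actual parser: the regex, the `parse_line` helper, and the sample line are illustrative assumptions.

```python
import re

# Regex mirroring the log4j pattern "%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n".
# %-5p left-justifies the level and pads it to five characters, hence the \s+ after it.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>[^:\s]+):(?P<line>\d+|\?) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# Invented sample line, for illustration only.
sample = "2018-03-12 10:15:42 INFO  NIOServerCnxnFactory:192 - Accepted socket connection"
print(parse_line(sample))
```

A line that returns None here probably needs a customized pipeline.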

  • Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file with:
  logs_enabled: true
  • Add this configuration block to your zk.d/conf.yaml file to start collecting your Zookeeper logs:
  logs:
    - type: file
      path: /var/log/zookeeper.log
      source: zookeeper
      service: myapp
      # To handle multi-line logs whose first line starts with yyyy-mm-dd, use the following rule:
      #log_processing_rules:
      #  - type: multi_line
      #    name: log_start_with_date
      #    pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
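
The effect of the multi_line rule can be approximated outside the Agent. The sketch below reuses the same pattern and attaches every line that does not start with a date to the previous event. It illustrates the idea and is not the Agent's implementation; `group_events` and the sample lines are hypothetical.

```python
import re

# Same pattern as the commented-out log_processing_rules entry.
START_RE = re.compile(r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")


def group_events(lines):
    """Merge continuation lines into the preceding line that starts with a date."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if START_RE.match(line) or not events:
            # A new event begins (or this is the very first line seen).
            events.append(line)
        else:
            # Continuation line, such as a stack trace frame.
            events[-1] += "\n" + line
    return events


# Invented sample lines, for illustration only.
raw = [
    "2018-03-12 10:15:42 ERROR NIOServerCnxn:362 - caught end of stream exception",
    "EndOfStreamException: Unable to read additional data from client",
    "\tat org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)",
    "2018-03-12 10:15:43 INFO  NIOServerCnxn:1007 - Closed socket connection",
]
for event in group_events(raw):
    print(repr(event))
```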

Validation

Run the Agent’s status subcommand and look for zk under the Checks section.

Data Collected

Metrics

zookeeper.bytes_received
(gauge)
zookeeper.bytes_sent
(gauge)
zookeeper.bytes_outstanding
(gauge)
zookeeper.packets_received
(gauge)
The number of packets received.
shown as packet
zookeeper.packets_sent
(gauge)
The number of packets sent.
shown as packet
zookeeper.packets.received
(gauge)
The number of packets received.
shown as packet
zookeeper.packets.sent
(gauge)
The number of packets sent.
shown as packet
zookeeper.num_alive_connections
(gauge)
The total count of client connections.
shown as connection
zookeeper.connections
(gauge)
The total count of client connections.
shown as connection
zookeeper.datadog_client_exception
(rate)
The exception rate seen by the Datadog Agent when trying to collect stats.
shown as error
zookeeper.avg_latency
(gauge)
The average amount of time it takes for the server to respond to a client request.
shown as millisecond
zookeeper.latency.avg
(gauge)
The average amount of time it takes for the server to respond to a client request.
shown as millisecond
zookeeper.max_latency
(gauge)
The maximum amount of time it takes for the server to respond to a client request.
shown as millisecond
zookeeper.latency.max
(gauge)
The maximum amount of time it takes for the server to respond to a client request.
shown as millisecond
zookeeper.min_latency
(gauge)
The minimum amount of time it takes for the server to respond to a client request.
shown as millisecond
zookeeper.latency.min
(gauge)
The minimum amount of time it takes for the server to respond to a client request.
shown as millisecond
zookeeper.znode_count
(gauge)
The number of znodes in the ZooKeeper namespace (the data).
shown as node
zookeeper.nodes
(gauge)
The number of znodes in the ZooKeeper namespace (the data).
shown as node
zookeeper.outstanding_requests
(gauge)
The number of queued requests when the server is under load and is receiving more sustained requests than it can process.
shown as request
zookeeper.timeouts
(rate)
The rate of timeouts the Datadog Agent received when trying to collect stats.
shown as occurrence
zookeeper.zxid.count
(gauge)
zookeeper.zxid.epoch
(gauge)
zookeeper.instances
(gauge)
zookeeper.server_state
(gauge)
zookeeper.watch_count
(gauge)
zookeeper.ephemerals_count
(gauge)
zookeeper.approximate_data_size
(gauge)
zookeeper.open_file_descriptor_count
(gauge)
zookeeper.max_file_descriptor_count
(gauge)
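
Many of the underscore-style names above resemble the keys printed by ZooKeeper's `mntr` four-letter command, which prefixes each key with `zk_` and separates key and value with a tab. Assuming that correspondence (this page does not state it), the sketch below shows how such output could be turned into metric names. `parse_mntr` and the sample text are hypothetical, and this is not the check's actual code.

```python
def parse_mntr(text):
    """Turn tab-separated `mntr` output into {metric_name: value} for numeric values."""
    metrics = {}
    for raw_line in text.splitlines():
        key, sep, value = raw_line.partition("\t")
        if not sep or not key.startswith("zk_"):
            continue
        try:
            number = float(value)
        except ValueError:
            # Non-numeric values such as zk_version or zk_server_state are skipped here.
            continue
        # Assumed naming: replace the "zk_" prefix with "zookeeper.".
        metrics["zookeeper." + key[len("zk_"):]] = number
    return metrics


# Invented excerpt of mntr output, for illustration only.
sample = (
    "zk_version\t3.4.5\n"
    "zk_avg_latency\t0\n"
    "zk_znode_count\t4\n"
    "zk_server_state\tstandalone\n"
)
print(parse_mntr(sample))
```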

Deprecated metrics

The following metrics are still sent but will be removed eventually:

  • zookeeper.bytes_received
  • zookeeper.bytes_sent
  • zookeeper.bytes_outstanding

Events

The Zookeeper check does not include any events at this time.

Service Checks

zookeeper.ruok:

Sends ruok to the monitored node. Returns OK with an imok response, WARN in the case of a different response, and CRITICAL if no response is received.
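
The three outcomes can be reproduced by hand with a plain TCP socket. This is a minimal sketch of the behavior described above, not the Agent's code: `send_four_letter` and `classify_ruok` are hypothetical helpers, and the host, port, and timeout values are taken from the sample configuration.

```python
import socket


def send_four_letter(host, port, command, timeout):
    """Send a four-letter command; return the raw reply, or None if none arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(command)
            chunks = []
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # the server closes the connection after replying
                    break
                chunks.append(chunk)
            return b"".join(chunks) or None
    except OSError:  # refused connections and timeouts both land here
        return None


def classify_ruok(reply):
    """Map a ruok reply to the status described for zookeeper.ruok."""
    if reply is None:
        return "CRITICAL"
    if reply.strip() == b"imok":
        return "OK"
    return "WARN"


if __name__ == "__main__":
    print(classify_ruok(send_four_letter("localhost", 2181, b"ruok", 3)))
```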

zookeeper.mode:

The Agent submits this service check if expected_mode is configured in zk.d/conf.yaml. The check returns OK when Zookeeper’s actual mode matches expected_mode, and CRITICAL otherwise.
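
The comparison itself is simple. The sketch below assumes the actual mode is read from the `Mode:` line that ZooKeeper's `stat` command prints; this page does not say where the check gets the mode, so treat that as an assumption. `extract_mode`, `mode_status`, and the sample output are hypothetical.

```python
def extract_mode(stat_output):
    """Return the lowercase value of the 'Mode:' line, or None if it is absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip().lower()
    return None


def mode_status(stat_output, expected_mode):
    """OK when the reported mode equals expected_mode, CRITICAL otherwise."""
    actual = extract_mode(stat_output)
    return "OK" if actual == expected_mode.strip().lower() else "CRITICAL"


# Invented, trimmed stat output, for illustration only.
sample = "Latency min/avg/max: 0/0/0\nOutstanding: 0\nMode: follower\nNode count: 4\n"
print(mode_status(sample, "follower"))
print(mode_status(sample, "leader"))
```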

Troubleshooting

Need help? Contact Datadog Support.

