etcd

Agent Check

Supported OS: Linux, Mac OS, Windows

Etcd Dashboard

Overview

Collect etcd metrics to:

  • Monitor the health of your etcd cluster.
  • Know when host configurations may be out of sync.
  • Correlate the performance of etcd with the rest of your applications.

Setup

Installation

The etcd check is included in the Datadog Agent package, so you don’t need to install anything else on your etcd instance(s).

Configuration

  1. To start collecting your etcd performance data, edit the etcd.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory. See the sample etcd.d/conf.yaml for all available configuration options.

    init_config:
    
    instances:
        - url: "https://server:port" # API endpoint of your etcd instance
  2. Restart the Agent.

Validation

Run the Agent’s status subcommand and look for etcd under the Checks section.

Data Collected

Metrics

etcd.store.gets.success (gauge): Rate of successful get requests. Shown as request.
etcd.store.gets.fail (gauge): Rate of failed get requests. Shown as request.
etcd.store.sets.success (gauge): Rate of successful set requests. Shown as request.
etcd.store.sets.fail (gauge): Rate of failed set requests. Shown as request.
etcd.store.delete.success (gauge): Rate of successful delete requests. Shown as request.
etcd.store.delete.fail (gauge): Rate of failed delete requests. Shown as request.
etcd.store.update.success (gauge): Rate of successful update requests. Shown as request.
etcd.store.update.fail (gauge): Rate of failed update requests. Shown as request.
etcd.store.create.success (gauge): Rate of successful create requests. Shown as request.
etcd.store.create.fail (gauge): Rate of failed create requests. Shown as request.
etcd.store.compareandswap.success (gauge): Rate of successful compare-and-swap requests. Shown as request.
etcd.store.compareandswap.fail (gauge): Rate of failed compare-and-swap requests. Shown as request.
etcd.store.compareanddelete.success (gauge): Rate of successful compare-and-delete requests. Shown as request.
etcd.store.compareanddelete.fail (gauge): Rate of failed compare-and-delete requests. Shown as request.
etcd.store.expire.count (gauge): Rate of expired keys. Shown as eviction.
etcd.store.watchers (gauge): Rate of watchers.
etcd.self.send.pkgrate (gauge): Rate of packets sent. Shown as packet.
etcd.self.send.bandwidthrate (gauge): Rate of bytes sent. Shown as byte.
etcd.self.recv.pkgrate (gauge): Rate of packets received. Shown as packet.
etcd.self.recv.bandwidthrate (gauge): Rate of bytes received. Shown as byte.
etcd.self.recv.appendrequest.count (gauge): Rate of append requests this node has processed. Shown as request.
etcd.self.send.appendrequest.count (gauge): Rate of append requests this node has sent. Shown as request.
etcd.leader.counts.fail (gauge): Rate of failed Raft RPC requests. Shown as request.
etcd.leader.counts.success (gauge): Rate of successful Raft RPC requests. Shown as request.
etcd.leader.latency.current (gauge): Current latency to each peer in the cluster. Shown as millisecond.
etcd.leader.latency.avg (gauge): Average latency to each peer in the cluster. Shown as millisecond.
etcd.leader.latency.min (gauge): Minimum latency to each peer in the cluster. Shown as millisecond.
etcd.leader.latency.max (gauge): Maximum latency to each peer in the cluster. Shown as millisecond.
etcd.leader.latency.stddev (gauge): Standard deviation of the latency to each peer in the cluster. Shown as millisecond.
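
Most of the store metrics above are described as rates. As a rough illustration of how a per-second rate can be derived from a cumulative counter (such as the `getsSuccess` field that etcd's v2 `/v2/stats/store` endpoint reports), the sketch below divides the change in the counter by the elapsed time between two samples. The function name `per_second_rate` and the sample values are hypothetical, and the Agent's own calculation may differ.

```python
def per_second_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate between two counter samples, or None.

    Times are in seconds. None is returned when no time has elapsed or
    when the counter went backwards (for example after a restart).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed


# Hypothetical samples: 75 successful gets at t=100 s, 105 at t=115 s.
print(per_second_rate(75, 100.0, 105, 115.0))  # 30 requests over 15 s -> 2.0
```

Returning None on a counter reset avoids reporting a misleading negative rate.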

etcd metrics are tagged with etcd_state:leader or etcd_state:follower, depending on the node status, so you can easily aggregate metrics by status.
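
To illustrate where such a tag could come from: etcd's v2 `/v2/stats/self` endpoint reports the member's Raft role in a `state` field (for example `StateLeader` or `StateFollower`). The sketch below maps a payload of that shape to a tag. The function name `etcd_state_tag` and the sample payload are hypothetical, and this is not necessarily how the Agent implements it.

```python
import json


def etcd_state_tag(self_stats):
    """Return an etcd_state tag for a /v2/stats/self-style payload, or None."""
    state = self_stats.get("state")
    if state == "StateLeader":
        return "etcd_state:leader"
    if state == "StateFollower":
        return "etcd_state:follower"
    # Any other value (or a missing field) yields no tag in this sketch.
    return None


# Hypothetical, trimmed-down payload for a follower node.
sample = json.loads('{"name": "node1", "state": "StateFollower"}')
print(etcd_state_tag(sample))
```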

Events

The etcd check does not include any events at this time.

Service Checks

etcd.can_connect:

Returns ‘Critical’ if the Agent cannot collect metrics from your etcd API endpoint.

etcd.healthy:

Returns ‘Critical’ if a member node is not healthy. Returns ‘Unknown’ if the Agent can’t reach the /health endpoint, or if the health status is missing.
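
The rules for etcd.healthy can be sketched as a small decision function. This is an illustration only: it assumes the /health endpoint returns JSON with a `health` field (for example `{"health": "true"}`), that a healthy node maps to an OK status, and that an unreachable endpoint is represented as `None`. The function name `healthy_status` is hypothetical.

```python
OK, CRITICAL, UNKNOWN = "OK", "CRITICAL", "UNKNOWN"


def healthy_status(health_payload):
    """Map a parsed /health response (or None if unreachable) to a status."""
    # Endpoint unreachable, or the health status is missing: Unknown.
    if health_payload is None or "health" not in health_payload:
        return UNKNOWN
    # Accept both the string "true" and a JSON boolean true (assumption).
    is_healthy = str(health_payload["health"]).lower() == "true"
    return OK if is_healthy else CRITICAL


print(healthy_status({"health": "true"}))
print(healthy_status({"health": "false"}))
print(healthy_status(None))
```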

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

To get a better idea of how (or why) to integrate etcd with Datadog, check out our blog post about it.

