Datadog-etcd Integration

Overview

Collect etcd metrics to:

  • Monitor the health of your etcd cluster.
  • Know when host configurations may be out of sync.
  • Correlate the performance of etcd with the rest of your applications.

Setup

Installation

The etcd check is packaged with the Agent, so install the Agent on each host that runs etcd.

Configuration

Create a file etcd.yaml in the Agent’s conf.d directory. See the sample etcd.yaml for all available configuration options:

init_config:

instances:
  - url: "https://server:port" # API endpoint of your etcd instance

Restart the Agent to begin sending etcd metrics to Datadog.
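
As a rough illustration of what the url value points at, the sketch below builds the statistics URLs that an etcd v2 API is assumed to expose under that endpoint. The paths, the helper name stats_urls, and the example address are assumptions made for illustration; this is not the check's actual code.

```python
# Assumed etcd v2 statistics paths; confirm them against your etcd version.
STATS_PATHS = ("v2/stats/self", "v2/stats/store", "v2/stats/leader")


def stats_urls(base_url):
    """Return the statistics URLs derived from an instance's `url` value."""
    base = base_url.rstrip("/")  # tolerate a trailing slash in the config
    return [f"{base}/{path}" for path in STATS_PATHS]


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with the url from your etcd.yaml.
    for url in stats_urls("https://localhost:2379"):
        print(url)
```

Requesting one of these URLs by hand from the Agent host is a quick way to confirm that the configured endpoint is reachable before restarting the Agent.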

Validation

Run the Agent’s info subcommand and look for etcd under the Checks section:

  Checks
  ======
    [...]

    etcd
    -------
      - instance #0 [OK]
      - Collected 26 metrics, 0 events & 0 service checks

    [...]

Compatibility

The etcd check is compatible with all major platforms.

Data Collected

Metrics

etcd.store.gets.success
(gauge)
Rate of successful get requests
shown as request
etcd.store.gets.fail
(gauge)
Rate of failed get requests
shown as request
etcd.store.sets.success
(gauge)
Rate of successful set requests
shown as request
etcd.store.sets.fail
(gauge)
Rate of failed set requests
shown as request
etcd.store.delete.success
(gauge)
Rate of successful delete requests
shown as request
etcd.store.delete.fail
(gauge)
Rate of failed delete requests
shown as request
etcd.store.update.success
(gauge)
Rate of successful update requests
shown as request
etcd.store.update.fail
(gauge)
Rate of failed update requests
shown as request
etcd.store.create.success
(gauge)
Rate of successful create requests
shown as request
etcd.store.create.fail
(gauge)
Rate of failed create requests
shown as request
etcd.store.compareandswap.success
(gauge)
Rate of successful compare-and-swap requests
shown as request
etcd.store.compareandswap.fail
(gauge)
Rate of failed compare-and-swap requests
shown as request
etcd.store.compareanddelete.success
(gauge)
Rate of successful compare-and-delete requests
shown as request
etcd.store.compareanddelete.fail
(gauge)
Rate of failed compare-and-delete requests
shown as request
etcd.store.expire.count
(gauge)
Rate of expired keys
shown as eviction
etcd.store.watchers
(gauge)
Number of watchers
etcd.self.send.pkgrate
(gauge)
Rate of packets sent
shown as packet
etcd.self.send.bandwidthrate
(gauge)
Rate of bytes sent
shown as byte
etcd.self.recv.pkgrate
(gauge)
Rate of packets received
shown as packet
etcd.self.recv.bandwidthrate
(gauge)
Rate of bytes received
shown as byte
etcd.self.recv.appendrequest.count
(gauge)
Rate of append requests this node has processed
shown as request
etcd.self.send.appendrequest.count
(gauge)
Rate of append requests this node has sent
shown as request
etcd.leader.counts.fail
(gauge)
Rate of failed Raft RPC requests
shown as request
etcd.leader.counts.success
(gauge)
Rate of successful Raft RPC requests
shown as request
etcd.leader.latency.current
(gauge)
Current latency to each peer in the cluster
shown as millisecond
etcd.leader.latency.avg
(gauge)
Average latency to each peer in the cluster
shown as millisecond
etcd.leader.latency.min
(gauge)
Minimum latency to each peer in the cluster
shown as millisecond
etcd.leader.latency.max
(gauge)
Maximum latency to each peer in the cluster
shown as millisecond
etcd.leader.latency.stddev
(gauge)
Standard deviation of latency to each peer in the cluster
shown as millisecond
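
Many of the metrics above are described as rates. As a minimal sketch of how a per-second rate can be derived, assuming the underlying value is a cumulative counter sampled at two points in time, consider the following. The function name per_second_rate and the handling of counter resets are illustrative choices, not a description of the Agent's internals.

```python
def per_second_rate(previous, current, elapsed_seconds):
    """Convert two readings of a cumulative counter into a per-second rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # The counter went backwards, for example after a restart,
        # so no meaningful rate exists for this interval.
        return None
    return delta / elapsed_seconds


if __name__ == "__main__":
    # 300 additional successful gets over a 15-second interval.
    print(per_second_rate(1200, 1500, 15))  # prints 20.0
```

Returning None on a negative delta avoids reporting a large spurious negative rate when a counter resets.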

etcd metrics are tagged with etcd_state:leader or etcd_state:follower, depending on the node status, so you can easily aggregate metrics by status.
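
The mapping from node status to tag can be sketched as follows. This assumes the node's self statistics report a state field with the values StateLeader or StateFollower, as the etcd v2 API is understood to do; the helper name state_tag and the sample payloads are hypothetical.

```python
# Assumed values of the "state" field in a node's self statistics.
STATE_TAGS = {
    "StateLeader": "etcd_state:leader",
    "StateFollower": "etcd_state:follower",
}


def state_tag(self_stats):
    """Return the etcd_state tag for a node, or None if the state is unknown."""
    return STATE_TAGS.get(self_stats.get("state"))


if __name__ == "__main__":
    print(state_tag({"name": "node-a", "state": "StateLeader"}))
    print(state_tag({"name": "node-b", "state": "StateFollower"}))
```

Because every metric from a node carries one of these two tags, a query grouped by etcd_state separates leader traffic from follower traffic.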

Events

The etcd check does not include any events at this time.

Service Checks

etcd.can_connect:

Returns ‘Critical’ if the Agent cannot collect metrics from your etcd API endpoint.
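
The logic behind such a service check can be sketched roughly as below: attempt to fetch and parse a statistics document, and report a critical status on any failure. The function name can_connect, the injectable fetch parameter, and the numeric status values are assumptions for illustration only.

```python
import json
from urllib.request import urlopen

# Status values used by this sketch (assumed convention: 0 = OK, 2 = Critical).
OK, CRITICAL = 0, 2


def can_connect(url, fetch=None, timeout=5):
    """Return OK if `url` yields parseable JSON, CRITICAL otherwise."""
    if fetch is None:
        # Default fetcher: a plain HTTP(S) GET with a timeout.
        def fetch(target):
            with urlopen(target, timeout=timeout) as response:
                return response.read()
    try:
        json.loads(fetch(url))
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts;
        # ValueError covers malformed JSON.
        return CRITICAL
    return OK
```

Passing fetch as a parameter lets the logic be exercised without a live etcd endpoint, for example with a stub that raises OSError to simulate an unreachable node.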

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

To get a better idea of how (or why) to integrate etcd with Datadog, check out our blog post about it.