---
title: etcd
description: Track writes, updates, deletes, inter-node latencies, and more Etcd metrics.
---

# etcd
**Integration version:** 9.4.0


## Overview{% #overview %}

Collect Etcd metrics to:

- Monitor the health of your Etcd cluster.
- Know when host configurations may be out of sync.
- Correlate the performance of Etcd with the rest of your applications.

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

### Installation{% #installation %}

The Etcd check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package, so you don't need to install anything else on your Etcd instance(s).

### Configuration{% #configuration %}

{% tab title="Host" %}
#### Host{% #host %}

To configure this check for an Agent running on a host:

##### Metric collection{% #metric-collection %}

1. To start collecting your Etcd performance data, edit the `etcd.d/conf.yaml` file in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/guide/agent-configuration-files.md#agent-configuration-directory). See the [sample etcd.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/etcd/datadog_checks/etcd/data/conf.yaml.example) for all available configuration options.
1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
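
As a minimal sketch of the file edited in step 1, a single-instance `etcd.d/conf.yaml` might look like the following. It assumes a local etcd member that serves its Prometheus metrics over plain HTTP on the default client port (`localhost:2379`). Adjust the URL to match your deployment, and see the sample file for TLS and other options.

```yaml
init_config:

instances:
    # Assumed endpoint: a local etcd member exposing /metrics on port 2379 without TLS.
  - prometheus_url: http://localhost:2379/metrics
```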

##### Log collection{% #log-collection %}

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit this configuration block at the bottom of your `etcd.d/conf.yaml`:

   ```yaml
   logs:
     - type: file
       path: "<LOG_FILE_PATH>"
       source: etcd
       service: "<SERVICE_NAME>"
   ```

   Change the `path` and `service` parameter values based on your environment. See the [sample etcd.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/etcd/datadog_checks/etcd/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).

{% /tab %}

{% tab title="Containerized" %}
#### Containerized{% #containerized %}

For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on applying the parameters below.

##### Metric collection{% #metric-collection %}

| Parameter            | Value                                                |
| -------------------- | ---------------------------------------------------- |
| `<INTEGRATION_NAME>` | `etcd`                                               |
| `<INIT_CONFIG>`      | blank or `{}`                                        |
| `<INSTANCE_CONFIG>`  | `{"prometheus_url": "http://%%host%%:2379/metrics"}` |
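
On Kubernetes, one way to apply these parameters is through pod annotations. The sketch below uses the Autodiscovery annotation format `ad.datadoghq.com/<CONTAINER_IDENTIFIER>.<KEY>`. The pod name, the container name `etcd`, and the `<ETCD_IMAGE>` placeholder are hypothetical, and the identifier in each annotation key must match the name of your own container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd # hypothetical pod name
  annotations:
    # "etcd" before the dot is the container name declared under spec.containers.
    ad.datadoghq.com/etcd.check_names: '["etcd"]'
    ad.datadoghq.com/etcd.init_configs: '[{}]'
    ad.datadoghq.com/etcd.instances: '[{"prometheus_url": "http://%%host%%:2379/metrics"}]'
spec:
  containers:
    - name: etcd
      image: "<ETCD_IMAGE>" # placeholder, replace with your etcd image
```

The `%%host%%` template variable is resolved by the Agent to the container's IP address at runtime.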

##### Log collection{% #log-collection %}

Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Kubernetes log collection](https://docs.datadoghq.com/agent/kubernetes/log.md).

| Parameter      | Value                                             |
| -------------- | ------------------------------------------------- |
| `<LOG_CONFIG>` | `{"source": "etcd", "service": "<SERVICE_NAME>"}` |
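
As with metric collection, the log configuration can be supplied as a pod annotation on Kubernetes. This fragment is a sketch that assumes a container named `etcd`; replace that part of the annotation key with your container's name.

```yaml
metadata:
  annotations:
    # Assumes the container is named "etcd"; replace <SERVICE_NAME> with your service.
    ad.datadoghq.com/etcd.logs: '[{"source": "etcd", "service": "<SERVICE_NAME>"}]'
```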

{% /tab %}

### Validation{% #validation %}

[Run the Agent's `status` subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `etcd` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| ------ | ----------- |
| **etcd.debugging.mvcc.db.compaction.keys.total**(count)                     | Total number of db keys compacted.*Shown as key*                                                                  |
| **etcd.debugging.mvcc.db.compaction.pause.duration.milliseconds**(gauge)    | Bucketed histogram of db compaction pause duration.*Shown as millisecond*                                         |
| **etcd.debugging.mvcc.db.compaction.total.duration.milliseconds**(gauge)    | Bucketed histogram of db compaction total duration.*Shown as millisecond*                                         |
| **etcd.debugging.mvcc.db.total.size.in\_bytes**(gauge)                      | Total size of the underlying database in bytes.*Shown as byte*                                                    |
| **etcd.debugging.mvcc.delete.total**(count)                                 | Total number of deletes seen by this member.*Shown as query*                                                      |
| **etcd.debugging.mvcc.events.total**(count)                                 | Total number of events sent by this member.*Shown as event*                                                       |
| **etcd.debugging.mvcc.index.compaction.pause.duration.milliseconds**(gauge) | Bucketed histogram of index compaction pause duration.*Shown as millisecond*                                      |
| **etcd.debugging.mvcc.keys.total**(gauge)                                   | Total number of keys.*Shown as key*                                                                               |
| **etcd.debugging.mvcc.pending.events.total**(gauge)                         | Total number of pending events to be sent.*Shown as event*                                                        |
| **etcd.debugging.mvcc.put.total**(count)                                    | Total number of puts seen by this member.*Shown as query*                                                         |
| **etcd.debugging.mvcc.range.total**(count)                                  | Total number of ranges seen by this member.*Shown as query*                                                       |
| **etcd.debugging.mvcc.slow\_watcher.total**(gauge)                          | Total number of unsynced slow watchers.*Shown as connection*                                                      |
| **etcd.debugging.mvcc.txn.total**(count)                                    | Total number of txns seen by this member.*Shown as transaction*                                                   |
| **etcd.debugging.mvcc.watch\_stream.total**(gauge)                          | Total number of watch streams.*Shown as connection*                                                               |
| **etcd.debugging.mvcc.watcher.total**(gauge)                                | Total number of watchers.*Shown as connection*                                                                    |
| **etcd.debugging.server.lease.expired.total**(count)                        | The total number of expired leases.*Shown as item*                                                                |
| **etcd.debugging.snap.save.marshalling.duration.seconds**(gauge)            | The marshalling cost distributions of save called by snapshot.*Shown as second*                                   |
| **etcd.debugging.snap.save.total.duration.seconds**(gauge)                  | The total latency distributions of save called by snapshot.*Shown as second*                                      |
| **etcd.debugging.store.expires.total**(count)                               | Total number of expired keys.*Shown as key*                                                                       |
| **etcd.debugging.store.reads.total**(count)                                 | Total number of reads action by (get/getRecursive), local to this member.*Shown as read*                          |
| **etcd.debugging.store.watch.requests.total**(count)                        | Total number of incoming watch requests (new or reestablished).*Shown as request*                                 |
| **etcd.debugging.store.watchers**(gauge)                                    | Count of currently active watchers.*Shown as connection*                                                          |
| **etcd.debugging.store.writes.total**(count)                                | Total number of writes (e.g. set/compareAndDelete) seen by this member.*Shown as write*                           |
| **etcd.disk.backend.commit.duration.seconds**(gauge)                        | The latency distributions of commit called by backend.*Shown as second*                                           |
| **etcd.disk.backend.snapshot.duration.seconds**(gauge)                      | The latency distribution of backend snapshots.*Shown as second*                                                   |
| **etcd.disk.wal.fsync.duration.seconds.count**(count)                       | The count of latency distributions of fsync called by wal.*Shown as second*                                       |
| **etcd.disk.wal.fsync.duration.seconds.sum**(gauge)                         | The sum of latency distributions of fsync called by wal.*Shown as second*                                         |
| **etcd.disk.wal.write.bytes.total**(gauge)                                  | Total number of bytes written in WAL*Shown as byte*                                                               |
| **etcd.etcd.server.client.requests.total**(count)                           | The total number of client requests per client version*Shown as request*                                          |
| **etcd.go.gc.duration.seconds**(gauge)                                      | A summary of the GC invocation durations.*Shown as second*                                                        |
| **etcd.go.goroutines**(gauge)                                               | Number of goroutines that currently exist.*Shown as thread*                                                       |
| **etcd.go.info**(gauge)                                                     | Information about the Go environment.*Shown as item*                                                              |
| **etcd.go.memstats.alloc.bytes**(gauge)                                     | Number of bytes allocated and still in use.*Shown as byte*                                                        |
| **etcd.go.memstats.alloc.bytes.total**(count)                               | Total number of bytes allocated, even if freed.*Shown as byte*                                                    |
| **etcd.go.memstats.buck.hash.sys.bytes**(gauge)                             | Number of bytes used by the profiling bucket hash table.*Shown as byte*                                           |
| **etcd.go.memstats.frees.total**(count)                                     | Total number of frees.*Shown as occurrence*                                                                       |
| **etcd.go.memstats.gc.cpu.fraction**(gauge)                                 | The fraction of this program's available CPU time used by the GC since the program started.*Shown as cpu*         |
| **etcd.go.memstats.gc.sys.bytes**(gauge)                                    | Number of bytes used for garbage collection system metadata.*Shown as byte*                                       |
| **etcd.go.memstats.heap.alloc.bytes**(gauge)                                | Number of heap bytes allocated and still in use.*Shown as byte*                                                   |
| **etcd.go.memstats.heap.idle.bytes**(gauge)                                 | Number of heap bytes waiting to be used.*Shown as byte*                                                           |
| **etcd.go.memstats.heap.inuse.bytes**(gauge)                                | Number of heap bytes that are in use.*Shown as byte*                                                              |
| **etcd.go.memstats.heap.objects**(gauge)                                    | Number of allocated objects.*Shown as item*                                                                       |
| **etcd.go.memstats.heap.released.bytes**(gauge)                             | Number of heap bytes released to OS.*Shown as byte*                                                               |
| **etcd.go.memstats.heap.sys.bytes**(gauge)                                  | Number of heap bytes obtained from system.*Shown as byte*                                                         |
| **etcd.go.memstats.last.gc.time.seconds**(gauge)                            | Number of seconds since 1970 of last garbage collection.*Shown as second*                                         |
| **etcd.go.memstats.lookups.total**(count)                                   | Total number of pointer lookups.*Shown as occurrence*                                                             |
| **etcd.go.memstats.mallocs.total**(count)                                   | Total number of mallocs.*Shown as occurrence*                                                                     |
| **etcd.go.memstats.mcache.inuse.bytes**(gauge)                              | Number of bytes in use by mcache structures.*Shown as byte*                                                       |
| **etcd.go.memstats.mcache.sys.bytes**(gauge)                                | Number of bytes used for mcache structures obtained from system.*Shown as byte*                                   |
| **etcd.go.memstats.mspan.inuse.bytes**(gauge)                               | Number of bytes in use by mspan structures.*Shown as byte*                                                        |
| **etcd.go.memstats.mspan.sys.bytes**(gauge)                                 | Number of bytes used for mspan structures obtained from system.*Shown as byte*                                    |
| **etcd.go.memstats.next.gc.bytes**(gauge)                                   | Number of heap bytes when next garbage collection will take place.*Shown as byte*                                 |
| **etcd.go.memstats.other.sys.bytes**(gauge)                                 | Number of bytes used for other system allocations.*Shown as byte*                                                 |
| **etcd.go.memstats.stack.inuse.bytes**(gauge)                               | Number of bytes in use by the stack allocator.*Shown as byte*                                                     |
| **etcd.go.memstats.stack.sys.bytes**(gauge)                                 | Number of bytes obtained from system for stack allocator.*Shown as byte*                                          |
| **etcd.go.memstats.sys.bytes**(gauge)                                       | Number of bytes obtained from system.*Shown as byte*                                                              |
| **etcd.go.threads**(gauge)                                                  | Number of OS threads created.*Shown as thread*                                                                    |
| **etcd.grpc.proxy.cache.hits.total**(gauge)                                 | Total number of cache hits*Shown as occurrence*                                                                   |
| **etcd.grpc.proxy.cache.keys.total**(gauge)                                 | Total number of keys/ranges cached*Shown as item*                                                                 |
| **etcd.grpc.proxy.cache.misses.total**(gauge)                               | Total number of cache misses*Shown as occurrence*                                                                 |
| **etcd.grpc.proxy.events.coalescing.total**(count)                          | Total number of events coalescing*Shown as event*                                                                 |
| **etcd.grpc.proxy.watchers.coalescing.total**(gauge)                        | Total number of current watchers coalescing*Shown as connection*                                                  |
| **etcd.grpc.server.handled.total**(count)                                   | Total number of RPCs completed on the server, regardless of success or failure.*Shown as operation*               |
| **etcd.grpc.server.msg.received.total**(count)                              | Total number of RPC stream messages received on the server.*Shown as operation*                                   |
| **etcd.grpc.server.msg.sent.total**(count)                                  | Total number of gRPC stream messages sent by the server.*Shown as operation*                                      |
| **etcd.grpc.server.started.total**(count)                                   | Total number of RPCs started on the server.*Shown as operation*                                                   |
| **etcd.leader.counts.fail**(gauge)                                          | Rate of failed Raft RPC requests (ETCD API V2 only)*Shown as request*                                             |
| **etcd.leader.counts.success**(gauge)                                       | Rate of successful Raft RPC requests (ETCD API V2 only)*Shown as request*                                         |
| **etcd.leader.latency.avg**(gauge)                                          | Average latency to each peer in the cluster (ETCD API V2 only)*Shown as millisecond*                              |
| **etcd.leader.latency.current**(gauge)                                      | Current latency to each peer in the cluster (ETCD API V2 only)*Shown as millisecond*                              |
| **etcd.leader.latency.max**(gauge)                                          | Maximum latency to each peer in the cluster (ETCD API V2 only)*Shown as millisecond*                              |
| **etcd.leader.latency.min**(gauge)                                          | Minimum latency to each peer in the cluster (ETCD API V2 only)*Shown as millisecond*                              |
| **etcd.leader.latency.stddev**(gauge)                                       | Standard deviation latency to each peer in the cluster (ETCD API V2 only)*Shown as millisecond*                   |
| **etcd.mvcc.db.total.size.in\_use.bytes**(gauge)                            | Total size of the underlying database logically in use*Shown as byte*                                             |
| **etcd.network.active\_peers**(gauge)                                       | The current number of active peer connections*Shown as connection*                                                |
| **etcd.network.client.grpc.received.bytes.total**(count)                    | The total number of bytes received from grpc clients.*Shown as byte*                                              |
| **etcd.network.client.grpc.sent.bytes.total**(count)                        | The total number of bytes sent to grpc clients.*Shown as byte*                                                    |
| **etcd.network.disconnected\_peers.total**(count)                           | The total number of disconnected peers*Shown as connection*                                                       |
| **etcd.network.peer.received.bytes.total**(count)                           | The total number of bytes received from peers.*Shown as byte*                                                     |
| **etcd.network.peer.received.failures.total**(count)                        | The total number of receive failures from peers*Shown as event*                                                   |
| **etcd.network.peer.round\_trip\_time.seconds**(gauge)                      | Round-Trip-Time histogram between peers.*Shown as second*                                                         |
| **etcd.network.peer.sent.bytes.total**(count)                               | The total number of bytes sent to peers.*Shown as byte*                                                           |
| **etcd.network.peer.sent.failures.total**(count)                            | The total number of send failures from peers*Shown as event*                                                      |
| **etcd.network.snapshot.receive.failures.total**(count)                     | Total number of snapshot receive failures*Shown as event*                                                         |
| **etcd.network.snapshot.receive.inflights.total**(gauge)                    | Total number of inflight snapshot receives*Shown as event*                                                        |
| **etcd.network.snapshot.receive.success.total**(count)                      | Total number of successful snapshot receives*Shown as event*                                                      |
| **etcd.network.snapshot.receive.total.duration.seconds.count**(gauge)       | Total latency distributions of v3 snapshot receives*Shown as second*                                              |
| **etcd.network.snapshot.receive.total.duration.seconds.sum**(gauge)         | Total latency distributions of v3 snapshot receives*Shown as second*                                              |
| **etcd.network.snapshot.send.failures.total**(count)                        | Total number of snapshot send failures*Shown as event*                                                            |
| **etcd.network.snapshot.send.inflights.total**(gauge)                       | Total number of inflight snapshot sends*Shown as event*                                                           |
| **etcd.network.snapshot.send.sucess.total**(count)                          | Total number of successful snapshot sends*Shown as event*                                                         |
| **etcd.network.snapshot.send.total.duration.seconds.count**(gauge)          | Total latency distributions of v3 snapshot sends*Shown as second*                                                 |
| **etcd.network.snapshot.send.total.duration.seconds.sum**(gauge)            | Total latency distributions of v3 snapshot sends*Shown as second*                                                 |
| **etcd.os.fd.limit**(gauge)                                                 | The file descriptor limit*Shown as object*                                                                        |
| **etcd.os.fd.used**(gauge)                                                  | The number of used file descriptors*Shown as object*                                                              |
| **etcd.process.cpu.seconds.total**(count)                                   | Total user and system CPU time spent in seconds.*Shown as cpu*                                                    |
| **etcd.process.max.fds**(gauge)                                             | Maximum number of open file descriptors.*Shown as item*                                                           |
| **etcd.process.open.fds**(gauge)                                            | Number of open file descriptors.*Shown as item*                                                                   |
| **etcd.process.resident.memory.bytes**(gauge)                               | Resident memory size in bytes.*Shown as byte*                                                                     |
| **etcd.process.start.time.seconds**(gauge)                                  | Start time of the process since unix epoch in seconds.*Shown as second*                                           |
| **etcd.process.virtual.memory.bytes**(gauge)                                | Virtual memory size in bytes.*Shown as byte*                                                                      |
| **etcd.self.recv.appendrequest.count**(gauge)                               | Rate of append requests this node has processed (ETCD API V2 only)*Shown as request*                              |
| **etcd.self.recv.bandwidthrate**(gauge)                                     | Rate of bytes received (ETCD API V2 only)*Shown as byte*                                                          |
| **etcd.self.recv.pkgrate**(gauge)                                           | Rate of packets received (ETCD API V2 only)*Shown as packet*                                                      |
| **etcd.self.send.appendrequest.count**(gauge)                               | Rate of append requests this node has sent (ETCD API V2 only)*Shown as request*                                   |
| **etcd.self.send.bandwidthrate**(gauge)                                     | Rate of bytes sent (ETCD API V2 only)*Shown as byte*                                                              |
| **etcd.self.send.pkgrate**(gauge)                                           | Rate of packets sent (ETCD API V2 only)*Shown as packet*                                                          |
| **etcd.server.apply.slow.total**(count)                                     | The total number of slow apply requests (likely overloaded from slow disk)*Shown as request*                      |
| **etcd.server.go\_version**(gauge)                                          | Which Go version the server is running with. 1 for 'server_go_version' label with current version.*Shown as unit* |
| **etcd.server.has\_leader**(gauge)                                          | Whether or not a leader exists. 1 if a leader exists, 0 otherwise.*Shown as check*                                |
| **etcd.server.health.failures.total**(count)                                | The total number of failed health checks*Shown as event*                                                          |
| **etcd.server.health.success.total**(count)                                 | The total number of successful health checks*Shown as event*                                                      |
| **etcd.server.heartbeat.send.failures.total**(count)                        | The total number of leader heartbeat send failures (likely overloaded from slow disk)*Shown as event*             |
| **etcd.server.is\_leader**(gauge)                                           | Whether or not this member is a leader. 1 if is, 0 otherwise.*Shown as check*                                     |
| **etcd.server.leader.changes.seen.total**(count)                            | The number of leader changes seen.*Shown as event*                                                                |
| **etcd.server.lease.expired.total**(count)                                  | The total number of expired leases*Shown as occurrence*                                                           |
| **etcd.server.proposals.applied.total**(gauge)                              | The total number of consensus proposals applied.*Shown as occurrence*                                             |
| **etcd.server.proposals.committed.total**(gauge)                            | The total number of consensus proposals committed.*Shown as occurrence*                                           |
| **etcd.server.proposals.failed.total**(count)                               | The total number of failed proposals seen.*Shown as occurrence*                                                   |
| **etcd.server.proposals.pending**(gauge)                                    | The current number of pending proposals to commit.*Shown as occurrence*                                           |
| **etcd.server.quota.backend.bytes**(gauge)                                  | Current backend storage quota size in bytes*Shown as byte*                                                        |
| **etcd.server.read\_indexes.failed.total**(count)                           | The total number of failed read indexes seen*Shown as event*                                                      |
| **etcd.server.read\_indexes.slow.total**(count)                             | The total number of pending read indexes not in sync with leader or timed out read index requests*Shown as event* |
| **etcd.server.version**(gauge)                                              | Which version is running. 1 for 'server_version' label with current version.*Shown as item*                       |
| **etcd.snap.db.fsync.duration.seconds.count**(gauge)                        | The latency distributions of fsyncing .snap.db file*Shown as second*                                              |
| **etcd.snap.db.fsync.duration.seconds.sum**(gauge)                          | The latency distributions of fsyncing .snap.db file*Shown as second*                                              |
| **etcd.snap.db.save.total.duration.seconds.count**(gauge)                   | The total latency distributions of v3 snapshot save*Shown as second*                                              |
| **etcd.snap.db.save.total.duration.seconds.sum**(gauge)                     | The total latency distributions of v3 snapshot save*Shown as second*                                              |
| **etcd.snap.fsync.duration.seconds.count**(gauge)                           | The latency distributions of fsync called by snap*Shown as second*                                                |
| **etcd.snap.fsync.duration.seconds.sum**(gauge)                             | The latency distributions of fsync called by snap*Shown as second*                                                |
| **etcd.store.compareanddelete.fail**(gauge)                                 | Rate of compare and delete requests failure (ETCD API V2 only)*Shown as request*                                  |
| **etcd.store.compareanddelete.success**(gauge)                              | Rate of compare and delete requests success (ETCD API V2 only)*Shown as request*                                  |
| **etcd.store.compareandswap.fail**(gauge)                                   | Rate of compare and swap requests failure (ETCD API V2 only)*Shown as request*                                    |
| **etcd.store.compareandswap.success**(gauge)                                | Rate of compare and swap requests success (ETCD API V2 only)*Shown as request*                                    |
| **etcd.store.create.fail**(gauge)                                           | Rate of failed create requests (ETCD API V2 only)*Shown as request*                                               |
| **etcd.store.create.success**(gauge)                                        | Rate of successful create requests (ETCD API V2 only)*Shown as request*                                           |
| **etcd.store.delete.fail**(gauge)                                           | Rate of failed delete requests (ETCD API V2 only)*Shown as request*                                               |
| **etcd.store.delete.success**(gauge)                                        | Rate of successful delete requests (ETCD API V2 only)*Shown as request*                                           |
| **etcd.store.expire.count**(gauge)                                          | Rate of expired keys (ETCD API V2 only)*Shown as eviction*                                                        |
| **etcd.store.gets.fail**(gauge)                                             | Rate of failed get requests (ETCD API V2 only)*Shown as request*                                                  |
| **etcd.store.gets.success**(gauge)                                          | Rate of successful get requests (ETCD API V2 only)*Shown as request*                                              |
| **etcd.store.sets.fail**(gauge)                                             | Rate of failed set requests (ETCD API V2 only)*Shown as request*                                                  |
| **etcd.store.sets.success**(gauge)                                          | Rate of successful set requests (ETCD API V2 only)*Shown as request*                                              |
| **etcd.store.update.fail**(gauge)                                           | Rate of failed update requests (ETCD API V2 only)*Shown as request*                                               |
| **etcd.store.update.success**(gauge)                                        | Rate of successful update requests (ETCD API V2 only)*Shown as request*                                           |
| **etcd.store.watchers**(gauge)                                              | Rate of watchers (ETCD API V2 only)                                                                               |

Etcd metrics are tagged with `etcd_state:leader` or `etcd_state:follower`, depending on the node status, so you can easily aggregate metrics by status.

### Events{% #events %}

The Etcd check does not include any events.

### Service Checks{% #service-checks %}

**etcd.can\_connect**

Returns `CRITICAL` if unable to get metrics from etcd (timeout or non-200 HTTP code). This service check is only available on the legacy version of the etcd check.

*Statuses: ok, critical*

**etcd.healthy**

Returns `CRITICAL` when a member is unhealthy. This service check is only available on the legacy version of the etcd check.

*Statuses: ok, critical, unknown*

**etcd.prometheus.health**

Returns `CRITICAL` if the check cannot access a metrics endpoint. Otherwise, returns `OK`. This service check is only available when `use_preview` is enabled.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Monitor etcd performance](https://www.datadoghq.com/blog/monitor-etcd-performance)
- [How to monitor etcd with Datadog](https://www.datadoghq.com/blog/monitor-etcd-with-datadog/)
- [Tools for collecting etcd metrics and logs](https://www.datadoghq.com/blog/etcd-monitoring-tools/)
- [Key metrics for monitoring etcd](https://www.datadoghq.com/blog/etcd-key-metrics/)
