---
title: High Availability support of the Datadog Agent
description: Datadog, the leading service for cloud-scale monitoring.
---

# High Availability support of the Datadog Agent

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site.md).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

High Availability (HA) support of the Datadog Agent enables seamless failover between a designated active Agent and a standby Agent. If the active Agent becomes unavailable, due to unexpected issues or planned events like OS patches or Agent upgrades, the standby Agent automatically takes over. This configuration eliminates the Agent as a single point of failure, ensuring uninterrupted monitoring and increased resilience across your infrastructure.

You can configure Agents as active-standby pairs for several supported integrations. If the active Agent becomes unavailable, the standby Agent automatically takes over within 90 seconds. You can also designate a preferred active Agent, which automatically resumes the active role when it becomes available again. This lets you switch Agents proactively ahead of scheduled maintenance.

## Supported integrations{% #supported-integrations %}

The following integrations are supported for High Availability:

| Category                | Supported Integrations                                                                                                                                                                                                                                                                                                                                           |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Database Monitoring** | [PostgreSQL](https://docs.datadoghq.com/database_monitoring.md#postgres), [MySQL](https://docs.datadoghq.com/database_monitoring.md#mysql), [MongoDB](https://docs.datadoghq.com/database_monitoring.md#mongodb), [Oracle](https://docs.datadoghq.com/database_monitoring.md#oracle), [SQL Server](https://docs.datadoghq.com/database_monitoring.md#sql-server) |
| **Network Monitoring**  | [SNMP](https://docs.datadoghq.com/network_monitoring/devices/snmp_metrics.md), [Network Path](https://docs.datadoghq.com/network_monitoring/network_path.md), [HTTP Check](https://docs.datadoghq.com/integrations/http_check.md)                                                                                                                                |
| **Vendor-Specific**     | [Cisco ACI](https://docs.datadoghq.com/integrations/cisco_aci.md), [Cisco SD-WAN](https://docs.datadoghq.com/integrations/cisco_sdwan.md), [Versa](https://docs.datadoghq.com/integrations/versa.md)                                                                                                                                                             |
| **Virtualization**      | [Proxmox](https://docs.datadoghq.com/integrations/proxmox.md), [vSphere](https://docs.datadoghq.com/integrations/vsphere.md)                                                                                                                                                                                                                                     |
| **Cloud platforms**     | [OpenStack Controller](https://docs.datadoghq.com/integrations/openstack-controller.md)                                                                                                                                                                                                                                                                          |

## Prerequisites{% #prerequisites %}

- Agent 7.64+
- [Remote Configuration](https://docs.datadoghq.com/agent/remote_config.md) enabled for your organization.
- [fleet_policies_write](https://docs.datadoghq.com/account_management/rbac/permissions.md#fleet-automation) permission to configure the preferred active Agent.

**Supported Operating Systems**:

- Linux
- Windows
- macOS

## Setup{% #setup %}

### Installation{% #installation %}

1. Install the Datadog Agent on two hosts, one Agent per host. This setup assumes that the hosts have similar capabilities (CPU, RAM, and networking) and configurations (including `datadog.yaml` and integration settings).

1. On each host, add the following settings to `datadog.yaml`:

   ```yaml
   ha_agent:
     enabled: true
   config_id: <CONFIG-NAME>  # example: "my-ndm-agents"
                             # use only lowercase alphanumerics, hyphens, and underscores
   ```

1. Configure one of the supported integrations for High Availability:

   For example, to set up the SNMP integration, install it on both Agents using the [SNMP Metrics](https://docs.datadoghq.com/network_monitoring/devices/snmp_metrics.md) setup guide.

   **Note**: Both [individual device monitoring](https://docs.datadoghq.com/network_monitoring/devices/snmp_metrics.md?tab=snmpv2#monitoring-individual-devices) and [Autodiscovery](https://docs.datadoghq.com/network_monitoring/devices/snmp_metrics.md?tab=snmpv2#autodiscovery) methods are supported for the SNMP integration.

After the Agents are configured, they function as an HA pair:

   - The installed integration runs only on the *active* Agent.
   - If the active Agent or host fails (due to a crash or shutdown), the standby Agent automatically takes over, maintaining uninterrupted monitoring.
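
Before rolling the `datadog.yaml` settings out to both hosts, it can help to check the `config_id` value against the naming rule stated in the configuration comment. The following is a minimal sketch, not part of the Agent: the function name `is_valid_config_id` is hypothetical, and the pattern is an assumption derived only from that comment (lowercase alphanumerics, hyphens, and underscores).

```python
import re

# Assumption: the allowed character set comes from the comment in the
# datadog.yaml snippet (lowercase alphanumerics, hyphen, underscore).
# The Agent itself may apply additional rules not covered here.
CONFIG_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_config_id(config_id: str) -> bool:
    """Return True if config_id uses only the characters the guide allows."""
    return CONFIG_ID_PATTERN.fullmatch(config_id) is not None


if __name__ == "__main__":
    for candidate in ("my-ndm-agents", "My-NDM-Agents", "ndm agents"):
        print(candidate, is_valid_config_id(candidate))
```

Running the same check against the value on each host is a quick way to catch a typo before the Agents are started.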

### Define a preferred active Agent{% #define-a-preferred-active-agent %}

1. Go to [**Integrations > Fleet Automation > View Agents**](https://app.datadoghq.com/fleet).

1. Search for your previously configured Agents using tags or hostname, for example, `config_id:<CONFIG-NAME>`.

   {% image
      source="https://docs.dd-static.net/images/integrations/guide/high_availability/fleet-view-agents_3.36d34538dc47c26615ff96ce7aafcfca.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/integrations/guide/high_availability/fleet-view-agents_3.36d34538dc47c26615ff96ce7aafcfca.png?auto=format&fit=max&w=850&dpr=2 2x"
      alt="Fleet Automation View Agents" /%}

1. Select the Agent you want to assign as the preferred active Agent and click **View Agent details** to open the side panel.

   {% image
      source="https://docs.dd-static.net/images/integrations/guide/high_availability/view_agent_details.e44171a811a13e910e46ff44e2f5c065.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/integrations/guide/high_availability/view_agent_details.e44171a811a13e910e46ff44e2f5c065.png?auto=format&fit=max&w=850&dpr=2 2x"
      alt="Selecting a host from the Fleet Automation tab and highlighting View Agent details" /%}

1. Navigate to the **High Availability** tab and click the three dots next to the Agent you wish to designate as the preferred active Agent.

   {% image
      source="https://docs.dd-static.net/images/integrations/guide/high_availability/set_preferred.3a2358f4416ebd9c9a25636a3e894932.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/integrations/guide/high_availability/set_preferred.3a2358f4416ebd9c9a25636a3e894932.png?auto=format&fit=max&w=850&dpr=2 2x"
      alt="Fleet Automation High Availability tab, highlighting the drop-down to select the preferred Active Agent" /%}

1. On the same screen, review the health status of the preferred active Agent, standby Agent, and configured integrations:

   {% image
      source="https://docs.dd-static.net/images/integrations/guide/high_availability/high_availability_tab_fleet.c42c2266b602a9933d83872178e52572.png?auto=format&fit=max&w=850 1x, https://docs.dd-static.net/images/integrations/guide/high_availability/high_availability_tab_fleet.c42c2266b602a9933d83872178e52572.png?auto=format&fit=max&w=850&dpr=2 2x"
      alt="Fleet Automation High Availability tab, highlighting HA Preferred Active Agent" /%}

## Testing and validation{% #testing-and-validation %}

1. Test failover by shutting down the active Agent or its host.
1. Verify that the standby Agent starts monitoring the configured integrations within 1-3 minutes.
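
To time a failover test, you can poll for the takeover and record how long it took. The sketch below is illustrative only: `standby_is_active` is a hypothetical callable that you supply (this guide does not define a programmatic way to query the HA state), and the 180-second default timeout is an assumption based on the upper end of the 1-3 minute window.

```python
import time
from typing import Callable, Optional


def wait_for_takeover(
    standby_is_active: Callable[[], bool],
    timeout_s: float = 180.0,   # assumption: upper end of the 1-3 minute window
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the standby reports active.

    Returns the elapsed seconds at the moment the probe first succeeds,
    or None if the timeout is reached first.
    """
    start = clock()
    while True:
        if standby_is_active():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

The `clock` and `sleep` parameters exist so the helper can be exercised with a fake clock instead of waiting in real time.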

## FAQ{% #faq %}

### How is the active Agent determined?{% #how-is-the-active-agent-determined %}

**Without a preferred active Agent**:

- The active Agent is initially selected at random.
- Failover occurs only when the current active Agent shuts down or crashes.
- When the previously active Agent recovers, it does not automatically reclaim the active role.

**With a preferred active Agent**:

- The preferred Agent always takes priority when available.
- If it fails, the standby Agent becomes active.
- When the preferred Agent recovers, it automatically resumes the active role, and the standby Agent returns to standby.
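
The two sets of rules above can be modeled as a small decision function. This is a simplified sketch for reasoning about expected behavior, not Datadog's implementation: the function name, the Agent identifiers, and the use of `sorted()` as a deterministic stand-in for the initial random selection are all assumptions.

```python
from typing import Optional, Set


def next_active(
    current_active: Optional[str],
    available: Set[str],
    preferred: Optional[str] = None,
) -> Optional[str]:
    """Return which Agent should hold the active role.

    current_active: the Agent that was active before this evaluation, if any.
    available: the Agents that are currently up.
    preferred: the preferred active Agent, if one has been designated.
    """
    if not available:
        return None  # no Agent is up, so nothing can be active
    # With a preferred Agent: it always takes priority when available.
    if preferred is not None and preferred in available:
        return preferred
    # Otherwise the current active Agent keeps its role while it is up,
    # so a recovered Agent does not reclaim the role on its own.
    if current_active in available:
        return current_active
    # Failover or initial selection. The guide says the initial choice is
    # random; sorted() is used here only to keep the sketch deterministic.
    return sorted(available)[0]
```

For example, with no preferred Agent, `next_active("agent-b", {"agent-a", "agent-b"})` keeps `agent-b` active after `agent-a` recovers, while passing `preferred="agent-a"` hands the role back to `agent-a`.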

### Why is it not possible to configure the preferred active Agent?{% #why-is-it-not-possible-to-configure-the-preferred-active-agent %}

- You may not have the necessary permissions. Review the prerequisites and the [fleet_policies_write](https://docs.datadoghq.com/account_management/rbac/permissions.md#fleet-automation) documentation.

### Why does my Agent have an `unknown` HA Agent state?{% #why-does-my-agent-have-an-unknown-ha-agent-state %}

- Remote Configuration may not be set up correctly. For more information, review the prerequisites and the [Remote Configuration setup](https://docs.datadoghq.com/agent/remote_config.md?tab=configurationyamlfile#setup) documentation.

## Further reading{% #further-reading %}

- [Learn more about Fleet Automation](https://docs.datadoghq.com/agent/fleet_automation.md)
- [Network Device Monitoring Terms and Concepts](https://docs.datadoghq.com/network_monitoring/devices/glossary.md)
