---
isPrivate: true
title: Datadog Disaster Recovery
description: Datadog, the leading service for cloud-scale monitoring.
breadcrumbs: Docs > Agent > Agent Guides > Datadog Disaster Recovery
---

# Datadog Disaster Recovery

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}

{% callout %}
##### Join the Preview!

Datadog Disaster Recovery is in Limited Availability. Use this form to request access!

[Request Access](https://www.datadoghq.com/product-preview/datadog-disaster-recovery/)
{% /callout %}

## Overview{% #overview %}

Datadog Disaster Recovery (DDR) provides you with observability continuity during events that may impact a cloud service provider region or Datadog services running within a cloud provider region. Using DDR, you can recover live observability at an alternate, functional Datadog site, enabling you to meet your critical observability availability goals.

DDR also allows you to periodically conduct disaster recovery drills to not only test your ability to recover from outage events, but to also meet your business and regulatory compliance needs.

## Prerequisites{% #prerequisites %}

The minimum version of the Datadog Agent you need depends on the types of telemetry you need to use:

| Supported telemetry | Supported products        | Agent version required |
| ------------------- | ------------------------- | ---------------------- |
| Logs                | Logs                      | v7.54+                 |
| Metrics             | Infrastructure Monitoring | v7.54+                 |
| Traces              | APM                       | v7.68+                 |

{% alert level="info" %}
Datadog is continuously evaluating customer requests to support DDR for additional products. Contact the [Disaster Recovery team](mailto:disaster-recovery@datadoghq.com) to learn about upcoming capabilities and your specific needs if they are not covered above.
{% /alert %}
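
The version requirements in the table can be checked mechanically before a rollout. The following is a minimal sketch: `min_agent_version` and `version_at_least` are hypothetical helper names, and the Agent version is passed in by hand rather than read from a running Agent.

```shell
# Hypothetical helper: minimum Agent version per telemetry type,
# taken from the prerequisites table.
min_agent_version() {
  case "$1" in
    logs|metrics) echo "7.54" ;;
    traces) echo "7.68" ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed when version $1 (for example 7.60.1 or v7.60.1)
# is at least the major.minor version $2. Only major and minor are compared.
version_at_least() {
  have="${1#v}"
  need="${2#v}"
  have_major="${have%%.*}"
  have_rest="${have#*.}"
  have_minor="${have_rest%%.*}"
  need_major="${need%%.*}"
  need_rest="${need#*.}"
  need_minor="${need_rest%%.*}"
  if [ "$have_major" -ne "$need_major" ]; then
    [ "$have_major" -gt "$need_major" ]
  else
    [ "$have_minor" -ge "$need_minor" ]
  fi
}
```

For example, `version_at_least 7.60.1 "$(min_agent_version traces)" || echo "Upgrade the Agent before failing over traces"` prints the warning, because traces require v7.68 or later.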

## Setup{% #setup %}

To enable Datadog Disaster Recovery, follow these steps. If you have any questions about any of the steps, contact your [Customer Success Manager](mailto:success@datadoghq.com) or [Datadog Support](https://www.datadoghq.com/support/).

### 1. Create a DDR org and link it to your primary org

{% collapsible-section %}
##### Create and share your DDR org

{% alert level="info" %}
If required, Datadog can set this up for you.
{% /alert %}

#### Create your DDR org{% #create-your-ddr-org %}

1. Go to [Get Started with Datadog](https://app.datadoghq.com/signup). You may need to log out of your current session, or use incognito mode to access this page.
1. Choose a different Datadog site than your primary (for example, if you're on `US1`, choose `EU` or `US5`).
1. Follow the prompts to create an account.

All Datadog sites are geographically separated. Reference the [Datadog Site List](https://docs.datadoghq.com/getting_started/site#access-the-datadog-site) for options.

If you are also sending telemetry to Datadog using cloud provider integrations, you must add your cloud provider accounts in the DDR org. Datadog does not use cloud providers to receive telemetry data while the DDR site is passive (not in failover).

#### Share the DDR org information with Datadog{% #share-the-ddr-org-information-with-datadog %}

Email your new org name to your [Customer Success Manager](mailto:success@datadoghq.com). Then, your Customer Success Manager sets this new org as your DDR org.

**Note:** Although this org appears in your Datadog billing hierarchy, its associated usage and costs are *not* billed during the Preview period.
{% /collapsible-section %}

{% collapsible-section %}
##### Retrieve the public IDs and link your DDR and primary orgs

{% alert level="danger" %}
For security reasons, Datadog is unable to link the orgs on your behalf.
{% /alert %}

After the Datadog team has set your DDR org, use the Datadog [public API endpoint](https://docs.datadoghq.com/api/latest/organizations/#list-your-managed-organizations) to retrieve the public IDs of the primary and DDR org.
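
As a rough sketch, the lookup could be scripted as follows. It assumes the `GET /api/v1/org` path documented for the linked endpoint and reuses the variable names from the linking commands in this section. `list_orgs` is a hypothetical helper name.

```shell
# Hypothetical helper: list the organizations visible to the given keys.
# Assumes PRIMARY_DD_API_URL, PRIMARY_DD_API_KEY, and PRIMARY_DD_APP_KEY
# are exported, as in the linking commands in this section.
list_orgs() {
  curl -sS --request GET \
    -H "dd-api-key: ${PRIMARY_DD_API_KEY}" \
    -H "dd-application-key: ${PRIMARY_DD_APP_KEY}" \
    "${PRIMARY_DD_API_URL%/}/api/v1/org"
}
```

If `jq` is installed, `list_orgs | jq '.orgs[] | {name, public_id}'` prints each org name next to its public ID.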

To link your DDR org to your primary org:

- Add the `disaster_recovery_status_write` scope to your application key in the primary org.
- Run the following commands, replacing the placeholders with the appropriate values.

```shell
export PRIMARY_DD_API_KEY=<PRIMARY_ORG_API_KEY>
export PRIMARY_DD_APP_KEY=<PRIMARY_ORG_APP_KEY>
export PRIMARY_DD_API_URL=<PRIMARY_ORG_API_SITE>

export DDR_ORG_ID=<DDR_ORG_PUBLIC_ID>
export PRIMARY_ORG_ID=<PRIMARY_ORG_PUBLIC_ID>
export USER_EMAIL=<USER_EMAIL>
export CONNECTION='{"data":{"id":"'"${PRIMARY_ORG_ID}"'","type":"hamr_org_connections","attributes":{"TargetOrgUuid":"'"${DDR_ORG_ID}"'","HamrStatus":1,"ModifiedBy":"'"${USER_EMAIL}"'","IsPrimary":true}}}'

# Send the connection payload to the primary org's API
curl -v --request POST \
  -H "Content-Type: application/json" \
  -H "dd-api-key: ${PRIMARY_DD_API_KEY}" \
  -H "dd-application-key: ${PRIMARY_DD_APP_KEY}" \
  --data "${CONNECTION}" \
  "${PRIMARY_DD_API_URL}/api/v2/hamr"
```

After linking your orgs, only the failover org displays this banner:

{% image
   source="https://datadog-docs.imgix.net/images/agent/guide/ddr/ddr-banner.e472f1a091274ed415e35dbe7d22062f.png?auto=format"
   alt="The DDR banner in the DDR org" /%}

{% /collapsible-section %}

### 2. Set up access, integrations, syncing, and agents

{% collapsible-section %}
##### Configure Single Sign On for the DDR org

**Datadog recommends using Single Sign On (SSO)** so that all your users can log in to your Disaster Recovery org seamlessly during an outage.

Go to the [Organization Settings](https://app.datadoghq.com/organization-settings/users) in your DDR org to configure [SAML](https://docs.datadoghq.com/account_management/saml/#overview) or Google Login for your users.

You must invite each of your users to your Disaster Recovery org and give them appropriate roles and permissions. Alternatively, to streamline this operation, you can use [Just-in-Time provisioning with SAML](https://docs.datadoghq.com/account_management/saml/#just-in-time-jit-provisioning).
{% /collapsible-section %}

{% collapsible-section %}
##### Set up your cloud integrations (AWS, Azure, Google Cloud)

See the [AWS](https://docs.datadoghq.com/integrations/amazon-web-services/), [Azure](https://docs.datadoghq.com/integrations/azure/), and [Google Cloud](https://docs.datadoghq.com/integrations/google-cloud-platform/?tab=organdfolderlevelprojectdiscovery#overview) integrations for setup steps.

Your cloud integrations must be configured in both primary and DDR orgs. However, the integrations must only run in one org at a time:

- By default, the integrations must run only in the primary org.
- When in failover, the integrations must only run in the DDR org.

For more information, see the [Cloud integrations failover](#id-for-cloud) section.
{% /collapsible-section %}

{% collapsible-section %}
##### Create your Datadog API and App key for syncing

In your DDR Datadog org, create an [API key](https://app.datadoghq.com/organization-settings/api-keys) **and** [App key](https://app.datadoghq.com/organization-settings/application-keys) set. These are useful for copying dashboards and monitors between Datadog sites.

{% alert level="info" %}
Datadog can help copy the API key signatures for your Agents to the DDR backup account. This ensures there is no need to create new API keys when operating in the DDR region. By using the existing keys, you can avoid the complexity of managing multiple sets of keys, reduce operational overhead and simplify key management. For any questions, please contact your [Customer Success Manager](mailto:success@datadoghq.com).
{% /alert %}

{% /collapsible-section %}

{% collapsible-section #syncing-data %}
##### Set up resource syncing on a schedule

#### Using the datadog-sync-cli tool{% #using-the-datadog-sync-cli-tool %}

Use the [datadog-sync-cli](https://github.com/DataDog/datadog-sync-cli) tool to copy your dashboards, monitors, and other configurations from your primary org to your DDR org.

The `datadog-sync-cli` tool is primarily intended for unidirectional copying and updating resources from your primary org to your DDR org. Resources copied to the DDR org can be edited, but any new syncing overrides changes that differ from the source in the primary org.

Regular syncing is essential to ensure that your DDR org is up-to-date in the event of a disaster. Datadog recommends performing this operation on a daily basis; you can determine the frequency and timing of syncing based on your business requirements. For information on setting up and running the backup process, see the [datadog-sync-cli README](https://github.com/DataDog/datadog-sync-cli/blob/main/README.md).

Use the `datadog-sync-cli` configuration available in the documentation to add each item to the sync scope. Here's an example of a configuration file for syncing specific dashboards and monitors using name and tag filtering from an `EU` site to a `US5` site:

```shell
destination_api_url="https://api.us5.datadoghq.com"
destination_api_key="<US5_API_KEY>"
destination_app_key="<US5_APP_KEY>"
source_api_key="<EU_API_KEY>"
source_app_key="<EU_APP_KEY>"
source_api_url="https://api.datadoghq.eu"
filter=["Type=Dashboards;Name=title","Type=Monitors;Name=tags;Value=sync:true"]

# Make sure to increase the retry timeout to cope with the rate limit
http_client_retry_timeout=600
```

Here's an example of a datadog-sync-cli command for syncing log configurations:

```shell
datadog-sync migrate --config config --resources="users,roles,logs_pipelines,logs_pipelines_order,logs_indexes,logs_indexes_order,logs_metrics,logs_restriction_queries" --cleanup=Force
```
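
Because daily syncing is recommended, the sync command can be wrapped in a small script and scheduled. The sketch below shows one way to do that with cron. The directory `/opt/datadog-sync`, the log file name, the script name, and the `run_daily_sync` function are placeholders chosen for illustration, not part of the tool.

```shell
# Hypothetical wrapper around the sync command shown above.
# SYNC_DIR is an assumed directory that holds the `config` file.
SYNC_DIR="${SYNC_DIR:-/opt/datadog-sync}"

# The function body runs in a subshell so the `cd` does not leak.
run_daily_sync() (
  cd "${SYNC_DIR}" || exit 1
  # Append output to a log so failed runs can be reviewed later.
  datadog-sync migrate --config config \
    --resources="dashboards,monitors" >> sync.log 2>&1
)

# Example crontab entry (every day at 02:00), assuming this file is saved
# as /opt/datadog-sync/daily-sync.sh:
#   0 2 * * * . /opt/datadog-sync/daily-sync.sh && run_daily_sync
```

Adjust the `--resources` list to match the resources you sync, and pick a schedule that fits your business requirements.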

{% alert level="danger" %}
**datadog-sync-cli limitation for log standard attributes:** The datadog-sync-cli is regularly updated with new resources. At this time, syncing log standard attributes is not supported for private beta. If you use standard attributes with your log pipelines and are remapping your logs, attributes are a dependency that you need to manually reconfigure in your DDR org. See the Datadog [standard attribute documentation](https://docs.datadoghq.com/logs/log_configuration/attributes_naming_convention/#overview) for support.
{% /alert %}

#### Verify availability at the DDR site{% #verify-availability-at-the-ddr-site %}

Verify that your DDR org is accessible and that your dashboards and monitors are copied from your primary org to your DDR org.
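
A basic reachability check can be scripted against the DDR org's API. The sketch below only confirms that the keys are accepted and that the dashboard and monitor list endpoints respond. Comparing the returned resources against your primary org is left to you. The variable names `DDR_DD_API_URL`, `DDR_DD_API_KEY`, and `DDR_DD_APP_KEY`, and both function names, are assumptions made for this example.

```shell
# Hypothetical helper: send an authenticated GET request to the DDR org.
# DDR_DD_API_URL (for example https://api.us5.datadoghq.com), DDR_DD_API_KEY,
# and DDR_DD_APP_KEY are assumed to hold the DDR org's values.
ddr_get() {
  curl -sS --fail --request GET \
    -H "dd-api-key: ${DDR_DD_API_KEY}" \
    -H "dd-application-key: ${DDR_DD_APP_KEY}" \
    "${DDR_DD_API_URL%/}$1"
}

# Report whether key validation and the dashboard and monitor
# list endpoints respond successfully.
check_ddr_org() {
  for path in /api/v1/validate /api/v1/dashboard /api/v1/monitor; do
    if ddr_get "$path" > /dev/null; then
      echo "ok   $path"
    else
      echo "FAIL $path"
    fi
  done
}
```

Running `check_ddr_org` prints one `ok` or `FAIL` line per endpoint.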

Contact your [Customer Success Manager](mailto:success@datadoghq.com) or [Datadog Support](https://www.datadoghq.com/support/) if you need assistance.
{% /collapsible-section %}

{% collapsible-section %}
##### Enable Remote Configuration (recommended)

[Remote configuration (RC)](https://docs.datadoghq.com/agent/remote_config/?tab=configurationyamlfile) allows you to remotely configure and change the behavior of Datadog Agents deployed in your infrastructure.

Remote Configuration is enabled by default for new orgs, including your DDR org. Any new API keys you create are RC-enabled for use with your Agent. For more details, see the [Remote Configuration documentation](https://docs.datadoghq.com/agent/remote_config/?tab=configurationyamlfile).

Datadog strongly recommends using Remote Configuration for a more seamless failover control. As an alternative to RC, you can manually configure your Agents or use configuration management tools such as Puppet, Ansible, or Chef.
{% /collapsible-section %}

{% collapsible-section %}
##### Dual Ship telemetry to DDR org during failover or drills

[Dual Shipping](https://docs.datadoghq.com/agent/configuration/dual-shipping/?tab=helm) allows you to simultaneously route the same data to two different orgs, such as a primary and a failover org. Starting with Agent **v7.54**, a DDR configuration enables Datadog Agents to send telemetry (such as logs, metrics, and traces) to the designated failover org when failover is triggered.

**Dual Shipping is disabled by default**, but you can enable it to support your periodic disaster recovery exercises and drills.

To enable Dual Shipping, Datadog recommends using [Fleet Automation](https://docs.datadoghq.com/agent/fleet_automation/#overview) for easier management and scalability. Alternatively, you can configure it manually by editing your `datadog.yaml` file.

Contact your Datadog Customer Success Manager to schedule dedicated time windows for failover testing to measure performance and Recovery Time Objective (RTO).

{% tab title="Using Fleet Automation (recommended)" %}
From the [Fleet Automation](https://app.datadoghq.com/fleet) page in your failover org, on the **Configure Agents** tab, you can create a new failover policy or reuse an existing one, and apply it to your fleet of Agents. Soon after the policy is enabled, Agents begin dual-shipping telemetry to both the primary and DDR (failover) observability sites.

To create a failover policy, click on **Create Failover Policy**.

{% image
   source="https://datadog-docs.imgix.net/images/agent/guide/ddr/ddr-fa-policy.4c932994e6282dbf021e10a67ab4f910.png?auto=format"
   alt="Manage DDR policies" /%}

Then, follow the prompt to scope the hosts and telemetry (metrics, logs, traces) that you need to fail over.

{% image
   source="https://datadog-docs.imgix.net/images/agent/guide/ddr/ddr-fa-policy-scope.6219fdab9760a8ee0c9d918311b94916.png?auto=format"
   alt="Scope the hosts and telemetry required to failover" /%}

{% alert level="danger" %}
**Note**: Cloud Integrations can run in either your primary or DDR Datadog site, but not both at the same time, so failing them over stops Cloud Integration data collection in your primary site. **During an integration failover, integrations run only in the DDR data center**. When no longer in failover, disable the failover policy to return integration data collection to the primary org.
{% /alert %}

{% /tab %}

{% tab title="Manually" %}
During a failover or failover exercises, update your Datadog Agent's `datadog.yaml` configuration file as shown in the example below and restart the Agent.

- `enabled: true` allows the Agent to send metadata (such as host name, host tags, and Agent version) to the DDR Datadog site, so you can view your Agents and infrastructure hosts in the DDR org.
- `failover_metrics`, `failover_logs`, and `failover_apm` are `false` by default. Setting them to `true` causes the Agent to start sending the corresponding telemetry (metrics, logs, or traces) to the DDR org.

```shell
multi_region_failover:
  enabled: true
  failover_metrics: false
  failover_logs: false
  failover_apm: false
  site: <DDR_SITE>  # For example "site: us5.datadoghq.com" for a US5 site
  api_key: <DDR_SITE_API_KEY>
```

{% /tab %}

{% /collapsible-section %}

### 3. Run failover tests in various environments

{% collapsible-section %}
##### Activate and test DDR failover in Agent-based environments

To trigger a failover of your Agents, you can click on one of the policies in [Fleet Automation](https://app.datadoghq.com/fleet) in your DDR org, and then click **Enable**. The status of each host updates as the failover occurs.

{% image
   source="https://datadog-docs.imgix.net/images/agent/guide/ddr/ddr-fa-policy-enable3.675dbd5b8c33b201fba461e3ff252adb.png?auto=format"
   alt="Enable the failover policy in the DDR org" /%}

Use the steps appropriate for your environment to activate or test the DDR failover.

{% tab title="Agent in non-containerized environments" %}
For Agent deployments in non-containerized environments, use the following Agent CLI commands:

```shell
agent config set multi_region_failover.failover_metrics true
agent config set multi_region_failover.failover_logs true
agent config set multi_region_failover.failover_apm true
```
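
For drills, it can help to toggle all three flags together and to switch them back afterwards. The following is a minimal sketch built on the commands above. `set_failover` is a hypothetical helper name, and the sketch assumes the same `agent config set` command accepts `false` to return the flags to their default.

```shell
# Hypothetical helper: set all three failover flags to "true" or "false".
set_failover() {
  case "$1" in
    true|false) ;;
    *) echo "usage: set_failover true|false" >&2; return 2 ;;
  esac
  for flag in failover_metrics failover_logs failover_apm; do
    agent config set "multi_region_failover.${flag}" "$1" || return 1
  done
}
```

Run `set_failover true` to start the drill and `set_failover false` when it ends.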

{% /tab %}

{% tab title="Agent in containerized environments" %}
If you are running the Agent in a containerized environment like Kubernetes, you can still use the Agent command-line tool, but you need to invoke it on the container running the Agent. You can make changes using one of the following, depending on your needs:

- kubectl
- Agent configuration file (`datadog.yaml`)
- Helm chart or Datadog Operator

##### Using kubectl{% #using-kubectl %}

The following example uses `kubectl` to fail over metrics, logs, and APM traces for a Datadog Agent pod deployed with either the official Helm chart or the Datadog Operator. Replace `<POD_NAME>` with the name of the Agent pod:

```shell
kubectl exec <POD_NAME> -c agent -- agent config set multi_region_failover.failover_metrics true
kubectl exec <POD_NAME> -c agent -- agent config set multi_region_failover.failover_logs true
kubectl exec <POD_NAME> -c agent -- agent config set multi_region_failover.failover_apm true
```
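
When many Agent pods are running, the same command can be applied to each of them in a loop. The sketch below is one possible approach. `failover_all_pods` is a hypothetical helper, and the namespace and label selector are assumptions that depend on how your Agents were deployed.

```shell
# Hypothetical helper: apply one failover flag to every Agent pod that
# matches a label selector.
# Arguments: <namespace> <label-selector> <flag> <true|false>
failover_all_pods() {
  namespace="$1"
  selector="$2"
  flag="$3"
  value="$4"
  # `-o name` prints one "pod/<name>" entry per line.
  kubectl get pods -n "$namespace" -l "$selector" -o name |
    while read -r pod; do
      # Redirect stdin so the exec call cannot consume the pod list.
      kubectl exec -n "$namespace" "$pod" -c agent -- \
        agent config set "multi_region_failover.${flag}" "$value" < /dev/null
    done
}
```

For example, `failover_all_pods datadog app=datadog-agent failover_logs true` fails over logs for every matching pod. Both `datadog` and `app=datadog-agent` are placeholder values here.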

##### Using the Agent configuration file{% #using-the-agent-configuration-file %}

Alternatively, you can specify the following settings in the main Agent configuration file (`datadog.yaml`) and restart the Datadog Agent for the changes to apply:

```shell
multi_region_failover:
  enabled: true
  failover_metrics: true
  failover_logs: true
  failover_apm: true
  site: <DDR_SITE>  # For example "site: us5.datadoghq.com" for a US5 site
  api_key: <DDR_SITE_API_KEY>
```

##### Using the Helm chart or Datadog Operator{% #using-the-helm-chart-or-datadog-operator %}

You can make similar changes with either the official Helm chart or Datadog Operator if you need to specify a custom configuration. Otherwise, you can pass the settings as environment variables:

```shell
DD_MULTI_REGION_FAILOVER_ENABLED=true
DD_MULTI_REGION_FAILOVER_FAILOVER_METRICS=true
DD_MULTI_REGION_FAILOVER_FAILOVER_LOGS=true
DD_MULTI_REGION_FAILOVER_FAILOVER_APM=true
DD_MULTI_REGION_FAILOVER_SITE=<DDR_SITE>
DD_MULTI_REGION_FAILOVER_API_KEY=<DDR_SITE_API_KEY>
```

{% /tab %}

{% /collapsible-section %}

{% collapsible-section #id-for-cloud %}
##### Activate and test DDR failover in cloud integrations

You can test failover for your cloud integrations from your DDR organization's landing page.

{% image
   source="https://datadog-docs.imgix.net/images/agent/guide/ddr/ddr-failover-main-page.62441b2f7a7da4c39a5c8665a69663f7.png?auto=format"
   alt="The failover landing page in the DDR org" /%}

On the failover landing page, you can check the status of your DDR org, or click **Fail over your integrations** to test your cloud integration failover.

{% alert level="danger" %}
When no longer in failover, **disable the failover policy** in the DDR org to return integration data collection to the primary org.
{% /alert %}

During testing, integration telemetry is spread over both organizations. If you cancel a failover test, the integrations return to running in the primary data center.
{% /collapsible-section %}

## Further reading{% #further-reading %}

- [Remote Configuration](https://docs.datadoghq.com/agent/remote_config/?tab=configurationyamlfile)
- [Getting Started with Datadog Sites](https://docs.datadoghq.com/getting_started/site/)
- [Datadog Disaster Recovery mitigates cloud provider outages](https://www.datadoghq.com/blog/ddr-mitigates-cloud-provider-outages/)
