Datadog Disaster Recovery is in Preview, but you can request access! Use this form to submit your request.
Datadog Disaster Recovery (DDR) provides observability continuity during events that may impact a cloud service provider region or the Datadog services running within one. Using DDR, you can recover live observability at an alternate, functional Datadog site, enabling you to meet your critical observability availability goals.
DDR also allows you to conduct periodic disaster recovery drills, both to test your ability to recover from outage events and to meet your business and regulatory compliance needs.
The minimum Datadog Agent version you need depends on the types of telemetry you use:

| Supported telemetry | Supported products | Agent version required |
|---|---|---|
| Logs | Logs | v7.54+ |
| Metrics | Infrastructure Monitoring | v7.54+ |
| Traces | APM | v7.68+ |
To enable Datadog Disaster Recovery, follow these steps. If you have any questions about any of the steps, contact your Customer Success Manager or Datadog Support.
Create a new Datadog org on a Datadog site different from your primary org's site (for example, if your primary org is on US1, choose EU or US5). All Datadog sites are geographically separated. Reference the Datadog Site List for options.
If you are also sending telemetry to Datadog through cloud provider integrations, you must add your cloud provider accounts in the DDR org. Datadog does not collect telemetry data through these integrations while the DDR site is passive (not in failover).
Email your new org name to your Customer Success Manager, who then sets this new org as your DDR org.
Note: Although this org appears in your Datadog billing hierarchy, its usage and associated costs are not billed during the Preview period.
After the Datadog team has set your DDR org, use the Datadog public API endpoint to retrieve the public IDs of the primary and DDR org.
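For example, here is a minimal sketch that lists the orgs visible to your credentials (it assumes the v1 organizations endpoint and the environment variables exported in the next step; the public_id field of each org in the response is the value you need):

# Sketch: list your orgs and note each public_id in the response
curl -s -H "dd-api-key:${PRIMARY_DD_API_KEY}" \
  -H "dd-application-key:${PRIMARY_DD_APP_KEY}" \
  "${PRIMARY_DD_API_URL}/api/v1/org"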
To link your DDR and primary orgs, run these commands, replacing the <PLACEHOLDERS> with your orgs' values:
export PRIMARY_DD_API_KEY=<PRIMARY_ORG_API_KEY>
export PRIMARY_DD_APP_KEY=<PRIMARY_ORG_APP_KEY>
export PRIMARY_DD_API_URL=<PRIMARY_ORG_API_SITE>
export DDR_ORG_ID=<DDR_ORG_PUBLIC_ID>
export PRIMARY_ORG_ID=<PRIMARY_ORG_PUBLIC_ID>
export USER_EMAIL=<USER_EMAIL>
export CONNECTION='{"data":{"id":"'${PRIMARY_ORG_ID}'","type":"hamr_org_connections","attributes":{"TargetOrgUuid":"'${DDR_ORG_ID}'","HamrStatus":1,"ModifiedBy":"'${USER_EMAIL}'", "IsPrimary":true}}}'
curl -v --request POST "${PRIMARY_DD_API_URL}/api/v2/hamr" \
  -H "Content-Type: application/json" \
  -H "dd-api-key:${PRIMARY_DD_API_KEY}" \
  -H "dd-application-key:${PRIMARY_DD_APP_KEY}" \
  --data "${CONNECTION}"
After linking your orgs, the failover org displays a banner identifying it as the failover org.
Datadog recommends using single sign-on (SSO) so that all your users can log in seamlessly to your Disaster Recovery org during an outage.
Go to the Organization Settings in your DDR org to configure SAML or Google Login for your users.
You must invite each of your users to your Disaster Recovery org and give them appropriate roles and permissions. Alternatively, to streamline this operation, you can use Just-in-Time provisioning with SAML.
See the AWS, Azure, and Google Cloud integrations for setup steps.
Your cloud integrations must be configured in both the primary and DDR orgs. However, the integrations must run in only one org at a time.
For more information, see the Cloud integrations failover section.
In your DDR Datadog org, create an API key and application key pair. These are used to copy dashboards and monitors between Datadog sites.
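If you prefer to script this step, here is a minimal sketch using the public key-management endpoints (it assumes your DDR org is on US5 and that admin-scoped keys for that org are already exported as DDR_ADMIN_API_KEY and DDR_ADMIN_APP_KEY; the key name ddr-sync is illustrative):

# Create a new API key in the DDR org
curl -s --request POST "https://api.us5.datadoghq.com/api/v2/api_keys" \
  -H "Content-Type: application/json" \
  -H "dd-api-key:${DDR_ADMIN_API_KEY}" \
  -H "dd-application-key:${DDR_ADMIN_APP_KEY}" \
  --data '{"data":{"type":"api_keys","attributes":{"name":"ddr-sync"}}}'

# Create a new application key owned by the current user
curl -s --request POST "https://api.us5.datadoghq.com/api/v2/current_user/application_keys" \
  -H "Content-Type: application/json" \
  -H "dd-api-key:${DDR_ADMIN_API_KEY}" \
  -H "dd-application-key:${DDR_ADMIN_APP_KEY}" \
  --data '{"data":{"type":"application_keys","attributes":{"name":"ddr-sync"}}}'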
Use the datadog-sync-cli tool to copy your dashboards, monitors, and other configurations from your primary org to your DDR org.
The datadog-sync-cli tool is intended for unidirectional copying and updating of resources from your primary org to your DDR org. Resources copied to the DDR org can be edited, but each new sync overwrites any changes that differ from the source in the primary org.
Regular syncing is essential to ensure that your DDR org is up-to-date in the event of a disaster. Datadog recommends performing this operation on a daily basis; you can determine the frequency and timing of syncing based on your business requirements. For information on setting up and running the backup process, see the datadog-sync-cli README.
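For example, here is a minimal sketch of a daily cron entry (the schedule, config path, and log path are illustrative; adapt them to wherever you run the sync):

# Illustrative crontab entry: sync primary -> DDR every day at 02:00
# Assumes datadog-sync is on PATH and the config lives at /etc/datadog-sync/config
0 2 * * * datadog-sync sync --config /etc/datadog-sync/config >> /var/log/datadog-sync.log 2>&1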
Use the datadog-sync-cli configuration available in the documentation to add each item to the sync scope. Here's an example of a configuration file for syncing specific dashboards and monitors, using name and tag filtering, from an EU site to a US5 site:
destination_api_url="https://api.us5.datadoghq.com"
destination_api_key="<US5_API_KEY>"
destination_app_key="<US5_APP_KEY>"
source_api_key="<EU_API_KEY>"
source_app_key="<EU_APP_KEY>"
source_api_url="https://api.datadoghq.eu"
filter=["Type=Dashboards;Name=title","Type=Monitors;Name=tags;Value=sync:true"]
# Make sure to increase the retry timeout to cope with the rate limit
http_client_retry_timeout=600
Here’s an example of a datadog-sync-cli command for syncing log configurations:
datadog-sync migrate --config config --resources="users,roles,logs_pipelines,logs_pipelines_order,logs_indexes,logs_indexes_order,logs_metrics,logs_restriction_queries" --cleanup=Force
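Before running a migration, you can preview what would change; here is a sketch using the tool's diffs command (verify the flags against the datadog-sync-cli README):

# Preview the changes a sync would apply, without applying them
datadog-sync diffs --config config --resources="dashboards,monitors"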
Verify that your DDR org is accessible and that your dashboards and monitors are copied from your primary org to your DDR org.
Contact your Customer Success Manager or Datadog Support if you need assistance.
Remote configuration (RC) allows you to remotely configure and change the behavior of Datadog Agents deployed in your infrastructure.
Remote Configuration is enabled by default for new orgs, including your DDR org. Any new API keys you create are RC-enabled for use with your Agent. For more details, see the Remote Configuration documentation.
Datadog strongly recommends using Remote Configuration for more seamless failover control. As an alternative to RC, you can manually configure your Agents or use configuration management tools such as Puppet, Ansible, or Chef.
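If you manage the Agent configuration file directly, Remote Configuration can also be toggled there; a minimal datadog.yaml sketch:

# Remote Configuration toggle in datadog.yaml (on by default in recent Agents)
remote_configuration:
  enabled: true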
Dual Shipping allows you to simultaneously route the same data to two different orgs, such as a primary and a failover org. Starting with Agent v7.54+, a new DDR configuration enables Datadog Agents to send the data they normally send to the Datadog platform (for example, logs, metrics, and traces) to the designated failover org when failover is triggered.
Dual Shipping is disabled by default, but you can enable it to support your periodic disaster recovery exercises and drills.
To enable Dual Shipping, Datadog recommends using Fleet Automation for easier management and scalability. Alternatively, you can configure it manually by editing your datadog.yaml file.
Contact your Datadog Customer Success Manager to schedule dedicated time windows for failover testing to measure performance and Recovery Time Objective (RTO).
From the Fleet Automation page in your failover org, on the Configure Agents tab, you can create a new failover policy or reuse an existing one, and apply it to your fleet of Agents. Soon after the policy is enabled, Agents begin dual-shipping telemetry to both the primary and DDR (failover) observability sites.
To create a failover policy, click Create Failover Policy. Then, follow the prompts to scope the hosts and telemetry (metrics, logs, traces) that you need to fail over.
During a failover or a failover exercise, update your Datadog Agent's datadog.yaml configuration file as shown in the example below and restart the Agent.
multi_region_failover:
  enabled: true
  failover_metrics: false
  failover_logs: false
  failover_apm: false
  site: <DDR_SITE> # For example "site: us5.datadoghq.com" for a US5 site
  api_key: <DDR_SITE_API_KEY>
In this configuration:
- enabled: true allows the Agent to send data about the Agent and the infrastructure host (for example, host name, host tags, and Agent version) to the DDR Datadog site, so you can see your Agents and infrastructure hosts in the failover org.
- failover_metrics, failover_logs, and failover_apm are false by default. Setting any of these to true causes the Agent to start sending the corresponding data (metrics, logs, or traces) to the DDR org.
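After updating the file, restart the Agent for the settings to take effect; for example, on a systemd-based Linux package install:

# Restart the Agent so the new multi_region_failover settings load
sudo systemctl restart datadog-agent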
To trigger a failover of your Agents, click one of the policies in Fleet Automation in your DDR org, then click Enable. The status of each host updates as the failover occurs.
Use the steps appropriate for your environment to activate or test the DDR failover.
For Agent deployments in non-containerized environments, use the following Agent CLI commands:
datadog-agent config set multi_region_failover.failover_metrics true
datadog-agent config set multi_region_failover.failover_logs true
datadog-agent config set multi_region_failover.failover_apm true
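To confirm a setting was applied, you can read it back; a sketch assuming the matching config get subcommand (verify with agent config --help on your Agent version):

# Verify the runtime setting took effect
datadog-agent config get multi_region_failover.failover_metrics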
If you are running the Agent in a containerized environment like Kubernetes, you can still use the Agent command-line tool, but you need to invoke it on the container running the Agent, for example with kubectl exec.
Below is an example of using kubectl to fail over metrics, logs, and traces for a Datadog Agent pod deployed with either the official Helm chart or the Datadog Operator. Replace <POD_NAME> with the name of the Agent pod:
kubectl exec <POD_NAME> -c agent -- agent config set multi_region_failover.failover_metrics true
kubectl exec <POD_NAME> -c agent -- agent config set multi_region_failover.failover_logs true
kubectl exec <POD_NAME> -c agent -- agent config set multi_region_failover.failover_apm true
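To apply the change across a whole fleet, you might loop over every Agent pod; in this sketch, <AGENT_POD_SELECTOR> is a placeholder for whatever labels your Helm chart or Operator install sets on Agent pods:

# Illustrative sketch: enable failover on every Agent pod matched by the selector
for pod in $(kubectl get pods -l "<AGENT_POD_SELECTOR>" -o name); do
  kubectl exec "${pod}" -c agent -- agent config set multi_region_failover.failover_metrics true
  kubectl exec "${pod}" -c agent -- agent config set multi_region_failover.failover_logs true
  kubectl exec "${pod}" -c agent -- agent config set multi_region_failover.failover_apm true
done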
Alternatively, you can specify the following settings in the main Agent configuration file (datadog.yaml) and restart the Datadog Agent for the changes to apply:
multi_region_failover:
  enabled: true
  failover_metrics: true
  failover_logs: true
  failover_apm: true
  site: <NEW_ORG_SITE>
  api_key: <NEW_SITE_API_KEY>
You can make similar changes with either the official Helm chart or the Datadog Operator if you need to specify a custom configuration. Otherwise, you can pass the settings as environment variables:
DD_MULTI_REGION_FAILOVER_ENABLED=true
DD_MULTI_REGION_FAILOVER_FAILOVER_METRICS=true
DD_MULTI_REGION_FAILOVER_FAILOVER_LOGS=true
DD_MULTI_REGION_FAILOVER_FAILOVER_APM=true
DD_MULTI_REGION_FAILOVER_SITE=<NEW_ORG_SITE>
DD_MULTI_REGION_FAILOVER_API_KEY=<NEW_SITE_API_KEY>
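For example, with the official Helm chart you might inject these through the chart's environment-variable passthrough (a sketch assuming the datadog.env values field; verify against your chart version):

# values.yaml sketch: inject the failover settings as Agent environment variables
datadog:
  env:
    - name: DD_MULTI_REGION_FAILOVER_ENABLED
      value: "true"
    - name: DD_MULTI_REGION_FAILOVER_FAILOVER_METRICS
      value: "true"
    - name: DD_MULTI_REGION_FAILOVER_FAILOVER_LOGS
      value: "true"
    - name: DD_MULTI_REGION_FAILOVER_FAILOVER_APM
      value: "true"
    - name: DD_MULTI_REGION_FAILOVER_SITE
      value: "<NEW_ORG_SITE>"
    - name: DD_MULTI_REGION_FAILOVER_API_KEY
      value: "<NEW_SITE_API_KEY>"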
You can test failover for your cloud integrations from your DDR organization’s landing page.
On the failover landing page, you can check the status of your DDR org, or click Fail over your integrations to test your cloud integration failover.
During testing, integration telemetry is spread across both organizations. If you cancel a failover test, the integrations return to running in the primary data center.