Deployment Gates are in Preview. If you're interested in this feature, complete the form to request access.

The Deployment Gates product consists of two main components:

- Gates, which define the service and environment that the evaluation applies to.
- Rules, which define the criteria that each gate evaluates.
Setting up Deployment Gates involves two steps:

1. Create a gate for a service and environment.
2. Add one or more rules to the gate.

When creating a gate, specify:

- Service: The service the gate applies to (for example, `transaction-backend`).
- Env: The environment the gate applies to (for example, `dev`).
- Identifier (default: `default`): Unique name for multiple gates on the same service/environment. This can be used to distinguish, for example, different deployment workflows (`fast-deploy` vs `default`), different deployment phases (`pre-deploy` vs `post-deploy`), or different rollout strategies (`pre-deploy` vs `canary-20pct`).
- Dry Run: Optionally, set the gate to Dry Run to test gate behavior without impacting deployments. The evaluation of a dry run gate always responds with a pass status, but the in-app result is the real status based on rules evaluation. This is particularly useful when performing an initial evaluation of the gate behavior without impacting the deployment pipeline.

Each gate requires one or more rules to evaluate. All rules must pass for the gate to succeed. For each rule, specify:

- Name (for example, `Check all P0 monitors`).
- Type: `Monitor` or `Faulty Deployment Detection`.
- Dry Run: Optionally, set the rule to Dry Run. If a rule is in Dry Run, its result is not taken into account when computing the overall gate result.

The Monitors rule allows you to evaluate the state of a set of monitors over a configurable period of time. It fails if, at any time during the evaluation period, a matching monitor is in `ALERT` or `NO_DATA` state.

Monitors are selected using a monitor search query, for example:

- `service:transaction-backend`
- `scope:"service:transaction-backend"`
- `group:"service:transaction-backend"`
- `env:prod service:transaction-backend`
- `env:prod (service:transaction-backend OR group:"service:transaction-backend" OR scope:"service:transaction-backend")`
- `tag:"use_deployment_gates" team:payment`
- `tag:"use_deployment_gates" AND (NOT group:("team:frontend"))`

Note that `group` filters evaluate only matching groups, and muted monitors are excluded from the evaluation (`muted:false`).
This rule type uses Watchdog's APM Faulty Deployment Detection analysis to compare the deployed version against previous versions of the same service. The analysis detects issues such as an increase in error rate or new types of errors introduced by the deployed version.

The analysis is automatically done for all APM-instrumented services, and no prior setup is required. Note that this rule type cannot be used if the service is a `database` or an inferred service.
.Once you have configured the gates and rules, you can request a gate evaluation when deploying the related service, and decide whether to block or continue the deployment based on the result.
A gate evaluation can be requested in several ways:
The `datadog-ci deployment gate` command includes all the required logic to evaluate Deployment Gates in a single command:

datadog-ci deployment gate --service transaction-backend --env staging

If the Deployment Gate being evaluated contains APM Faulty Deployment Detection rules, you must also specify the version (for example, `--version 1.0.1`).

The command has the following behavior:

- It requests a gate evaluation, waits for the result, and exits with status code 0 if the gate passes or 1 if it fails.
- If an unexpected error occurs while evaluating the gate, the command exits successfully by default. You can change this behavior with the `--fail-on-error` parameter.

Note that the `deployment gate` command is available in datadog-ci versions v3.17.0 and above.
Required environment variables:
- `DD_API_KEY`: Your Datadog API key, used to authenticate the requests.
- `DD_APP_KEY`: Your Datadog application key, used to authenticate the requests.
- `DD_BETA_COMMANDS_ENABLED=1`: The `deployment gate` command is a beta command, so datadog-ci needs to be run with beta commands enabled.

For complete configuration options and detailed usage examples, refer to the `deployment gate` command documentation.
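For illustration, a deployment pipeline step could wrap the command like this (a minimal sketch; the placeholder keys and the example service, environment, and version are assumptions for this example):

export DD_API_KEY="<YOUR_API_KEY>"      # authenticates requests to Datadog
export DD_APP_KEY="<YOUR_APP_KEY>"      # application key for the requests
export DD_BETA_COMMANDS_ENABLED=1       # deployment gate is a beta command
export DD_SITE="<YOUR_DD_SITE>"         # for example, datadoghq.com

# --version is only needed if the gate contains APM Faulty Deployment Detection
# rules; --fail-on-error makes the step fail on unexpected errors as well.
datadog-ci deployment gate \
  --service transaction-backend \
  --env staging \
  --version 1.0.1 \
  --fail-on-error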
You can call Deployment Gates from an Argo Rollouts Kubernetes Resource by creating an AnalysisTemplate or a ClusterAnalysisTemplate. The template should contain a Kubernetes job that executes the datadog-ci deployment gate command to interact with the Deployment Gates API.
Use the template below as a starting point:
Before applying the template, note the following:

- Replace `<YOUR_DD_SITE>` below with your Datadog site name (for example, `datadoghq.com`).
- The template expects a Kubernetes secret named `datadog` holding two data values: `api-key` and `app-key`. Alternatively, you can also pass the values in plain text using `value` instead of `valueFrom` in the script below.

apiVersion: argoproj.io/v1alpha1
kind: ClusterAnalysisTemplate
metadata:
name: datadog-job-analysis
spec:
args:
- name: service
- name: env
metrics:
- name: datadog-job
provider:
job:
spec:
ttlSecondsAfterFinished: 300
backoffLimit: 0
template:
spec:
restartPolicy: Never
containers:
- name: datadog-check
image: datadog/ci:v3.17.0
env:
- name: DD_BETA_COMMANDS_ENABLED
value: "1"
- name: DD_SITE
value: "<YOUR_DD_SITE>"
- name: DD_API_KEY
valueFrom:
secretKeyRef:
name: datadog
key: api-key
- name: DD_APP_KEY
valueFrom:
secretKeyRef:
name: datadog
key: app-key
command: ["/bin/sh", "-c"]
args:
- datadog-ci deployment gate --service {{ args.service }} --env {{ args.env }}
Note the following:

- The template accepts two arguments: `service` and `env`. Add any other optional fields if needed (such as `version`). For more information, see the official Argo Rollouts docs.
- The `ttlSecondsAfterFinished` field removes the finished jobs after 5 minutes.
- The `backoffLimit` field is set to 0 because the job might fail if the gate evaluation fails, and it should not be retried in that case.

After you have created the analysis template, reference it from the Argo Rollouts strategy:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: rollouts-demo
labels:
tags.datadoghq.com/service: transaction-backend
tags.datadoghq.com/env: dev
spec:
replicas: 5
strategy:
canary:
steps:
...
- analysis:
templates:
- templateName: datadog-job-analysis
clusterScope: true # Only needed for cluster analysis
args:
- name: env
valueFrom:
fieldRef:
fieldPath: metadata.labels['tags.datadoghq.com/env']
- name: service
valueFrom:
fieldRef:
fieldPath: metadata.labels['tags.datadoghq.com/service']
- name: version # Only required if one or more APM Faulty Deployment Detection rules are evaluated
valueFrom:
fieldRef:
fieldPath: metadata.labels['tags.datadoghq.com/version']
- ...
Use this script as a starting point. Be sure to replace the following:

- `<YOUR_DD_SITE>`: Your Datadog site name (for example, `datadoghq.com`)
- `<YOUR_API_KEY>`: Your API key
- `<YOUR_APP_KEY>`: Your application key

#!/bin/sh
# Configuration
MAX_RETRIES=3
DELAY_SECONDS=5
POLL_INTERVAL_SECONDS=15
MAX_POLL_TIME_SECONDS=10800 # 3 hours
API_URL="https://api.<YOUR_DD_SITE>/api/unstable/deployments/gates/evaluation"
API_KEY="<YOUR_API_KEY>"
APP_KEY="<YOUR_APP_KEY>"
PAYLOAD=$(cat <<EOF
{
"data": {
"type": "deployment_gates_evaluation_request",
"attributes": {
"service": "$1",
"env": "$2",
"version": "$3"
}
}
}
EOF
)
# Step 1: Request evaluation
echo "Requesting evaluation..."
current_attempt=0
while [ $current_attempt -lt $MAX_RETRIES ]; do
current_attempt=$((current_attempt + 1))
RESPONSE=$(curl -s -w "%{http_code}" -o response.txt -X POST "$API_URL" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: $API_KEY" \
-H "DD-APPLICATION-KEY: $APP_KEY" \
-d "$PAYLOAD")
# Extracts the last 3 digits of the status code
HTTP_CODE=$(echo "$RESPONSE" | tail -c 4)
RESPONSE_BODY=$(cat response.txt)
if [ ${HTTP_CODE} -ge 500 ] && [ ${HTTP_CODE} -le 599 ]; then
# Status code 5xx indicates a server error, so the call is retried
echo "Attempt $current_attempt: 5xx Error ($HTTP_CODE). Retrying in $DELAY_SECONDS seconds..."
sleep $DELAY_SECONDS
continue
elif [ ${HTTP_CODE} -ge 400 ] && [ ${HTTP_CODE} -le 499 ]; then
# 4xx errors are client errors and not retriable
echo "Client error ($HTTP_CODE): $RESPONSE_BODY"
exit 1
fi
# Successfully started evaluation, extract evaluation_id
EVALUATION_ID=$(echo "$RESPONSE_BODY" | jq -r '.data.attributes.evaluation_id')
if [ "$EVALUATION_ID" = "null" ] || [ -z "$EVALUATION_ID" ]; then
echo "Failed to extract evaluation_id from response: $RESPONSE_BODY"
exit 1
fi
echo "Evaluation started with ID: $EVALUATION_ID"
break
done
# If no evaluation ID was extracted, every attempt failed with a 5xx error
if [ -z "$EVALUATION_ID" ]; then
echo "All retries exhausted for evaluation request, but treating 5xx errors as success."
exit 0
fi
# Step 2: Poll for results
echo "Polling for results..."
start_time=$(date +%s)
poll_count=0
while true; do
poll_count=$((poll_count + 1))
current_time=$(date +%s)
elapsed_time=$((current_time - start_time))
# Check if we've exceeded the maximum polling time
if [ $elapsed_time -ge $MAX_POLL_TIME_SECONDS ]; then
echo "Evaluation polling timeout after ${MAX_POLL_TIME_SECONDS} seconds"
exit 1
fi
RESPONSE=$(curl -s -w "%{http_code}" -o response.txt -X GET "$API_URL/$EVALUATION_ID" \
-H "DD-API-KEY: $API_KEY" \
-H "DD-APPLICATION-KEY: $APP_KEY")
HTTP_CODE=$(echo "$RESPONSE" | tail -c 4)
RESPONSE_BODY=$(cat response.txt)
if [ ${HTTP_CODE} -eq 404 ]; then
# Evaluation might not have started yet, retry after a short delay
echo "Evaluation not ready yet (404), retrying in $POLL_INTERVAL_SECONDS seconds... (attempt $poll_count, elapsed: ${elapsed_time}s)"
sleep $POLL_INTERVAL_SECONDS
continue
elif [ ${HTTP_CODE} -ge 500 ] && [ ${HTTP_CODE} -le 599 ]; then
echo "Server error ($HTTP_CODE) while polling, retrying in $POLL_INTERVAL_SECONDS seconds... (attempt $poll_count, elapsed: ${elapsed_time}s)"
sleep $POLL_INTERVAL_SECONDS
continue
elif [ ${HTTP_CODE} -ge 400 ] && [ ${HTTP_CODE} -le 499 ]; then
# 4xx errors (except 404) are client errors and not retriable
echo "Client error ($HTTP_CODE) while polling: $RESPONSE_BODY"
exit 1
fi
# Check gate status
GATE_STATUS=$(echo "$RESPONSE_BODY" | jq -r '.data.attributes.gate_status')
if [ "$GATE_STATUS" = "pass" ]; then
echo "Gate evaluation PASSED"
exit 0
elif [ "$GATE_STATUS" = "fail" ]; then
echo "Gate evaluation FAILED"
exit 1
else
# Treat any other status (in_progress, unexpected, etc.) as still in progress
echo "Evaluation still in progress (status: $GATE_STATUS), retrying in $POLL_INTERVAL_SECONDS seconds... (attempt $poll_count, elapsed: ${elapsed_time}s)"
sleep $POLL_INTERVAL_SECONDS
continue
fi
done
The script has the following characteristics:

- It accepts the gate evaluation parameters as positional arguments: `service`, `env`, and `version` (optionally add `identifier` and `primary_tag` to the payload if needed). The `version` is only required if one or more APM Faulty Deployment Detection rules are evaluated.
- It retries requests that fail with 5xx errors and, if all retries are exhausted, treats the evaluation as passed. This is a general behavior, and you should change it based on your own use case and preferences.
- The script uses `curl` (to perform the requests) and `jq` (to process the returned JSON). If those commands are not available, install them at the beginning of the script (for example, by adding `apk add --no-cache curl jq`).
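For example, assuming the script above is saved as `deployment-gate.sh` (a hypothetical file name), a deployment job could invoke it as follows:

# $1 = service, $2 = env, $3 = version (only needed for APM Faulty
# Deployment Detection rules). Exit code 0 means the gate passed.
./deployment-gate.sh transaction-backend staging 1.0.1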
Deployment Gate evaluations are asynchronous, as the evaluation process can take some time to complete. When you trigger an evaluation, it's started in the background, and the API returns an evaluation ID that can be used to track its progress. The high-level interaction with the Deployment Gates API is the following:

1. Request a gate evaluation; the response contains an evaluation ID.
2. Poll the evaluation endpoint with that ID until the gate status is final.

A Deployment Gate evaluation can be requested with an API call.
Be sure to replace the following:

- `<YOUR_DD_SITE>`: Your Datadog site name (for example, `datadoghq.com`)
- `<YOUR_API_KEY>`: Your API key
- `<YOUR_APP_KEY>`: Your application key

curl -X POST "https://api.<YOUR_DD_SITE>/api/unstable/deployments/gates/evaluation" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: <YOUR_API_KEY>" \
  -H "DD-APPLICATION-KEY: <YOUR_APP_KEY>" \
  -d @- << EOF
{
  "data": {
    "type": "deployment_gates_evaluation_request",
    "attributes": {
      "service": "transaction-backend",
      "env": "staging",
      "identifier": "my-custom-identifier",
      "version": "v123-456",
      "primary_tag": "region:us-central-1"
    }
  }
}
EOF

In this payload, `identifier` is optional and defaults to `default`; `version` is required only if the gate contains APM Faulty Deployment Detection rules; `primary_tag` is optional and scopes down the APM Faulty Deployment Detection analysis to the selected primary tag.
Note: A 404 HTTP response can be returned either because the gate was not found, or because the gate was found but has no rules.
If the gate evaluation was successfully started, a 202 HTTP status code is returned. The response is in the following format:
{
"data": {
"id": "<random_response_uuid>",
"type": "deployment_gates_evaluation_response",
"attributes": {
"evaluation_id": "e9d2f04f-4f4b-494b-86e5-52f03e10c8e9"
}
}
}
The field data.attributes.evaluation_id
contains the unique identifier for this gate evaluation.
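For example, you can capture this identifier directly when requesting the evaluation (a sketch that assumes `jq` is available and reuses the placeholders above):

EVALUATION_ID=$(curl -s -X POST "https://api.<YOUR_DD_SITE>/api/unstable/deployments/gates/evaluation" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: <YOUR_API_KEY>" \
  -H "DD-APPLICATION-KEY: <YOUR_APP_KEY>" \
  -d '{"data":{"type":"deployment_gates_evaluation_request","attributes":{"service":"transaction-backend","env":"staging"}}}' \
  | jq -r '.data.attributes.evaluation_id')
echo "Evaluation ID: $EVALUATION_ID"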
You can fetch the status of a gate evaluation by polling an additional API endpoint using the gate evaluation ID:
curl -X GET "https://api.<YOUR_DD_SITE>/api/unstable/deployments/gates/evaluation/<evaluation_id>" \
-H "DD-API-KEY: <YOUR_API_KEY>" \
-H "DD-APPLICATION-KEY: <YOUR_APP_KEY>"
Note: If you call this endpoint too quickly after requesting the evaluation, a 404 HTTP response may be returned because the evaluation did not start yet. If this is the case, retry a few seconds later.
When a 200 HTTP response is returned, it has the following format:
{
"data": {
"id": "<random_response_uuid>",
"type": "deployment_gates_evaluation_result_response",
"attributes": {
"dry_run": false,
"evaluation_id": "e9d2f04f-4f4b-494b-86e5-52f03e10c8e9",
"evaluation_url": "https://app.datadoghq.com/ci/deployment-gates/evaluations?index=cdgates&query=level%3Agate+%40evaluation_id%3Ae9d2f14f-4f4b-494b-86e5-52f03e10c8e9",
"gate_id": "e140302e-0cba-40d2-978c-6780647f8f1c",
"gate_status": "pass",
"rules": [
{
"name": "Check service monitors",
"status": "fail",
"reason": "One or more monitors in ALERT state: https://app.datadoghq.com/monitors/34330981",
"dry_run": true
}
]
}
}
}
The field `data.attributes.gate_status` contains the result of the evaluation. It can contain one of these values:

- `in_progress`: The Deployment Gate evaluation is still in progress; you should continue polling.
- `pass`: The Deployment Gate evaluation passed.
- `fail`: The Deployment Gate evaluation failed.

Note: If the field `data.attributes.dry_run` is `true`, the field `data.attributes.gate_status` is always `pass`.
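Putting the two calls together, a minimal polling loop could look like the following (a sketch that assumes `EVALUATION_ID` was captured from the POST response and `jq` is installed; adjust the interval and add a timeout to fit your pipeline):

# Poll the evaluation endpoint until the gate reaches a final status.
while :; do
  GATE_STATUS=$(curl -s "https://api.<YOUR_DD_SITE>/api/unstable/deployments/gates/evaluation/$EVALUATION_ID" \
    -H "DD-API-KEY: <YOUR_API_KEY>" \
    -H "DD-APPLICATION-KEY: <YOUR_APP_KEY>" \
    | jq -r '.data.attributes.gate_status')
  case "$GATE_STATUS" in
    pass) echo "Gate passed"; exit 0 ;;
    fail) echo "Gate failed"; exit 1 ;;
    *)    sleep 15 ;;  # in_progress, evaluation not ready yet, or transient error
  esac
done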
When integrating Deployment Gates into your Continuous Delivery workflow, an evaluation phase is recommended to confirm the product is working as expected before it impacts deployments. You can do this using the Dry Run evaluation mode and the Deployment Gates Evaluations page:

1. Set the gate to Dry Run.
2. While the gate is in Dry Run, the API always responds with a `pass` status and the deployments are not impacted by the gate result.
3. Use the Deployment Gates Evaluations page to review the real evaluation results, regardless of the evaluation mode (Dry Run or Active). This lets you understand when the gate would have failed and what the reason behind it was.
4. Once you are confident in the gate behavior, switch the gate from Dry Run to Active. Afterwards, the API starts returning the "real" status and deployments start getting promoted or rolled back based on the gate result.