AWS

Overview

Connect to Amazon Web Services (AWS) to:

  • See automatic AWS status updates in your event stream
  • Get CloudWatch metrics for EC2 hosts without installing the Agent
  • Tag your EC2 hosts with EC2-specific information
  • See EC2 scheduled maintenance events in your stream
  • Collect CloudWatch metrics and events from many other AWS products
  • See CloudWatch alarms in your event stream

To quickly get started using the AWS integration, check out the AWS getting started guide.

Datadog's Amazon Web Services integration is built to collect ALL metrics from CloudWatch. Datadog strives to continually update the docs to show every sub-integration, but cloud services rapidly release new metrics and services, so the list of integrations sometimes lags.

Integration | Description
API Gateway | Create, publish, maintain, and secure APIs
App Runner | A service that provides a fast, simple, and cost-effective way to deploy from source code or a container image
AppStream | Fully managed application streaming on AWS
AppSync | A GraphQL service with real-time data synchronization and offline programming features
Athena | Serverless interactive query service
Auto Scaling | Scale EC2 capacity
Billing | Billing and budgets
CloudFront | Local content delivery network
CloudHSM | Managed hardware security module (HSM)
CloudSearch | Managed search service for websites and applications
CloudTrail | Access to log files and AWS API calls
CodeBuild | Fully managed build service
CodeDeploy | Automate code deployments
Cognito | Secure user sign-up and sign-in
Connect | A self-service, cloud-based contact center service
Direct Connect | Dedicated network connection to AWS
DMS | Database Migration Service
DocumentDB | MongoDB-compatible database
DynamoDB | NoSQL database
EBS (Elastic Block Store) | Persistent block level storage volumes
EC2 (Elastic Compute Cloud) | Resizable compute capacity in the cloud
EC2 Spot | Take advantage of unused EC2 capacity
ECS (Elastic Container Service) | Container management service that supports Docker containers
EFS (Elastic File System) | Shared file storage
EKS | Elastic Container Service for Kubernetes
Elastic Transcoder | Media and video transcoding in the cloud
ElastiCache | In-memory cache in the cloud
Elastic Beanstalk | Service for deploying and scaling web applications and services
ELB (Elastic Load Balancing) | Distributes incoming application traffic across multiple Amazon EC2 instances
EMR (Elastic MapReduce) | Data processing using Hadoop
ES (Elasticsearch) | Deploy, operate, and scale Elasticsearch clusters
Firehose | Capture and load streaming data
FSx | Managed service providing scalable storage for Windows File Server or Lustre
GameLift | Dedicated game server hosting
Glue | Extract, transform, and load data for analytics
GuardDuty | Intelligent threat detection
Health | Visibility into the state of your AWS resources, services, and accounts
Inspector | Automated security assessment
IoT (Internet of Things) | Connect IoT devices with cloud services
Keyspaces | Managed Apache Cassandra–compatible database service
Kinesis | Service for real-time processing of large, distributed data streams
KMS (Key Management Service) | Create and control encryption keys
Lambda | Serverless computing
Lex | Build conversation bots
Machine Learning | Create machine learning models
MediaConnect | Transport for live video
MediaConvert | Video processing for broadcast and multiscreen delivery
MediaPackage | Prepare and protect video for delivery over the internet
MediaTailor | Scalable server-side ad insertion
MQ | Managed message broker for ActiveMQ
Managed Streaming for Kafka | Build and run applications that use Apache Kafka to process streaming data
NAT Gateway | Enable instances in a private subnet to connect to the internet or other AWS services
Neptune | Fast, reliable graph database built for the cloud
Network Firewall | Filter traffic at the perimeter of a VPC
OpsWorks | Configuration management
Polly | Text-to-speech service
RDS (Relational Database Service) | Relational database in the cloud
Redshift | Data warehouse solution
Rekognition | Image and video analysis for applications
Route 53 | DNS and traffic management with availability monitoring
S3 (Simple Storage Service) | Highly available and scalable cloud storage service
SageMaker | Machine learning models and algorithms
SES (Simple Email Service) | Cost-effective, outbound-only email-sending service
SNS (Simple Notification Service) | Alerts and notifications
SQS (Simple Queue Service) | Messaging queue service
Storage Gateway | Hybrid cloud storage
SWF (Simple Workflow Service) | Cloud workflow management
VPC (Virtual Private Cloud) | Launch AWS resources into a virtual network
Web Application Firewall (WAF) | Protect web applications from common web exploits
WorkSpaces | Secure desktop computing service
X-Ray | Tracing for distributed applications

Setup

AWS role delegation is not supported on the Datadog for Government site. Access keys must be used.

Use one of the following methods to integrate your AWS accounts into Datadog for metric, event, tag, and log collection.

Automatic

Manual

  • Role delegation
    To set up the AWS integration manually with role delegation, see the manual setup guide.

  • Access keys (GovCloud or China Only)
    To set up the AWS integration with access keys, see the manual setup guide.

AWS IAM Permissions

AWS IAM permissions enable Datadog to collect metrics, tags, CloudWatch events, and other data necessary to monitor your AWS environment.

To correctly set up the AWS Integration, you must attach the relevant IAM policies to the Datadog AWS Integration IAM Role in your AWS account.

AWS Integration IAM Policy

The policy below contains the set of permissions necessary to use all the integrations for individual AWS services.

Some permissions in the policy document use wildcards such as List* and Get*. If you require strict policies, replace each wildcard with the complete action names and reference the Amazon API documentation for your respective services.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "apigateway:GET",
                "autoscaling:Describe*",
                "backup:List*",
                "budgets:ViewBudget",
                "cloudfront:GetDistributionConfig",
                "cloudfront:ListDistributions",
                "cloudtrail:DescribeTrails",
                "cloudtrail:GetTrailStatus",
                "cloudtrail:LookupEvents",
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "codedeploy:List*",
                "codedeploy:BatchGet*",
                "directconnect:Describe*",
                "dynamodb:List*",
                "dynamodb:Describe*",
                "ec2:Describe*",
                "ecs:Describe*",
                "ecs:List*",
                "elasticache:Describe*",
                "elasticache:List*",
                "elasticfilesystem:DescribeFileSystems",
                "elasticfilesystem:DescribeTags",
                "elasticfilesystem:DescribeAccessPoints",
                "elasticloadbalancing:Describe*",
                "elasticmapreduce:List*",
                "elasticmapreduce:Describe*",
                "es:ListTags",
                "es:ListDomainNames",
                "es:DescribeElasticsearchDomains",
                "events:CreateEventBus",
                "fsx:DescribeFileSystems",
                "fsx:ListTagsForResource",
                "health:DescribeEvents",
                "health:DescribeEventDetails",
                "health:DescribeAffectedEntities",
                "kinesis:List*",
                "kinesis:Describe*",
                "lambda:GetPolicy",
                "lambda:List*",
                "logs:DeleteSubscriptionFilter",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:DescribeSubscriptionFilters",
                "logs:FilterLogEvents",
                "logs:PutSubscriptionFilter",
                "logs:TestMetricFilter",
                "organizations:Describe*",
                "organizations:List*",
                "rds:Describe*",
                "rds:List*",
                "redshift:DescribeClusters",
                "redshift:DescribeLoggingStatus",
                "route53:List*",
                "s3:GetBucketLogging",
                "s3:GetBucketLocation",
                "s3:GetBucketNotification",
                "s3:GetBucketTagging",
                "s3:ListAllMyBuckets",
                "s3:PutBucketNotification",
                "ses:Get*",
                "sns:List*",
                "sns:Publish",
                "sqs:ListQueues",
                "states:ListStateMachines",
                "states:DescribeStateMachine",
                "support:DescribeTrustedAdvisor*",
                "support:RefreshTrustedAdvisorCheck",
                "tag:GetResources",
                "tag:GetTagKeys",
                "tag:GetTagValues",
                "xray:BatchGetTraces",
                "xray:GetTraceSummaries"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
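
If you need a strict policy, a useful first step is to list which actions in the document still contain wildcards. The following is a minimal sketch using only the Python standard library; the function name wildcard_actions and the abbreviated EXCERPT are illustrative assumptions, not part of the integration.

```python
import json


def wildcard_actions(policy_json: str) -> list[str]:
    """Return the actions in an IAM policy document that contain a wildcard."""
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    # IAM allows a single statement object instead of a list of statements.
    if isinstance(statements, dict):
        statements = [statements]
    found = []
    for statement in statements:
        actions = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        found.extend(a for a in actions if "*" in a or "?" in a)
    return sorted(set(found))


# Abbreviated excerpt of the policy above, for illustration only.
EXCERPT = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "apigateway:GET",
                "autoscaling:Describe*",
                "backup:List*",
                "budgets:ViewBudget",
                "cloudwatch:Get*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
"""

print(wildcard_actions(EXCERPT))
# ['autoscaling:Describe*', 'backup:List*', 'cloudwatch:Get*']
```

Each action this reports is one you would replace with the explicit action names from the relevant service's API documentation.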

AWS Security Audit Policy

To use Cloud Security Posture Management, attach AWS’s managed SecurityAudit Policy to your Datadog IAM role.

Log collection

There are two ways of sending AWS service logs to Datadog:

  • Kinesis Firehose destination: Use the Datadog destination in your Kinesis Firehose delivery stream to forward logs to Datadog. Datadog recommends this approach when sending a very high volume of logs from CloudWatch.
  • Forwarder Lambda function: Deploy the Datadog Forwarder Lambda function, which subscribes to S3 buckets or your CloudWatch log groups and forwards logs to Datadog. You must use this approach to send traces, enhanced metrics, or custom metrics from Lambda functions asynchronously through logs. Datadog also recommends this approach for sending logs from S3 or other resources that cannot directly stream data to Kinesis.
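
To illustrate what a Lambda function subscribed to a CloudWatch log group receives, the sketch below decodes the subscription payload, which CloudWatch Logs delivers as base64-encoded, gzip-compressed JSON under awslogs.data. This is not the Datadog Forwarder's code; the function names and the sample data are assumptions made for the example, and it only shows the decoding step.

```python
import base64
import gzip
import json


def decode_cloudwatch_logs_event(event: dict) -> dict:
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def extract_messages(event: dict) -> list[str]:
    """Return the raw log lines carried by one subscription event."""
    payload = decode_cloudwatch_logs_event(event)
    return [log_event["message"] for log_event in payload.get("logEvents", [])]


# Build a sample event the same way CloudWatch Logs encodes one (made-up data).
_sample_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/aws/lambda/example-function",
    "logStream": "example-stream",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "first line"},
        {"id": "2", "timestamp": 1700000001000, "message": "second line"},
    ],
}
SAMPLE_EVENT = {
    "awslogs": {
        "data": base64.b64encode(
            gzip.compress(json.dumps(_sample_payload).encode("utf-8"))
        ).decode("ascii")
    }
}

print(extract_messages(SAMPLE_EVENT))
# ['first line', 'second line']
```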

Metric collection

There are two ways to send AWS metrics to Datadog:

  • Metric polling: API polling comes out of the box with the AWS integration. A metric-by-metric crawl of the CloudWatch API pulls data and sends it to Datadog. New metrics are pulled every ten minutes, on average.
  • Metric streams with Kinesis Firehose: You can use Amazon CloudWatch Metric Streams and Amazon Kinesis Data Firehose to see your metrics. Note: This method has a two to three minute latency, and requires a separate setup.
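
For a sense of what polling the CloudWatch API involves, the sketch below builds the parameters for a single CloudWatch GetMetricData request covering a ten-minute window. It does not describe how Datadog's crawler is implemented; the helper name, the default window, and the example instance ID are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_get_metric_data_params(
    namespace: str,
    metric_name: str,
    dimensions: dict,
    end_time: datetime,
    window_minutes: int = 10,
    period_seconds: int = 60,
    stat: str = "Average",
) -> dict:
    """Build keyword arguments for one CloudWatch GetMetricData call."""
    start_time = end_time - timedelta(minutes=window_minutes)
    return {
        "MetricDataQueries": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": name, "Value": value}
                            for name, value in sorted(dimensions.items())
                        ],
                    },
                    "Period": period_seconds,
                    "Stat": stat,
                },
                "ReturnData": True,
            }
        ],
        "StartTime": start_time,
        "EndTime": end_time,
    }


params = build_get_metric_data_params(
    namespace="AWS/EC2",
    metric_name="CPUUtilization",
    dimensions={"InstanceId": "i-0123456789abcdef0"},  # hypothetical instance ID
    end_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(params["StartTime"].isoformat(), params["EndTime"].isoformat())
```

With boto3 installed and credentials configured, these parameters could be passed as boto3.client("cloudwatch").get_metric_data(**params). Repeating such a request for every metric and dimension combination is what makes polling slower than streaming.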

Resource collection

Some Datadog products leverage information about how your AWS resources (such as S3 buckets, RDS snapshots, and CloudFront distributions) are configured. Datadog collects this information by making read-only API calls into your AWS account.

Cloud Security Posture Management

Setup

If you do not have the AWS integration set up for your AWS account, complete the setup process above. Make sure to enable resource collection when prompted.

Note: The AWS integration must be set up with Role delegation to use this feature.

To add Cloud Security Posture Management to an existing AWS integration, follow the steps below to enable resource collection.

  1. Provide the necessary permissions to the Datadog IAM role with the automatic or manual steps:

    Automatic - Update your CloudFormation template.
    a. In the CloudFormation console, find the main stack you used to install the Datadog integration and select Update.
    b. Select Replace current template.
    c. Select Amazon S3 URL, enter https://datadog-cloudformation-template.s3.amazonaws.com/aws/main.yaml and click Next.
    d. Set CloudSecurityPostureManagementPermissions to true and click Next without modifying other existing parameters until you reach the Review page. Here you can verify the change set preview.
    e. Check the two acknowledgment boxes at the bottom and click Update stack.

    Manual - Attach the AWS managed SecurityAudit policy to your Datadog AWS IAM role. You can find this policy in the AWS console.

  2. Complete the setup in the Datadog AWS integration tile with the steps below. Alternatively, you can use the Update an AWS Integration API endpoint.

    1. Click on the AWS account where you wish to enable resource collection.
    2. Go to the Resource collection section for that account and check the box Expanded collection required for Cloud Security Posture Management.
    3. At the bottom left of the tile, click Update Configuration.
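
The automatic CloudFormation steps (a through e) above can also be scripted. The sketch below builds the arguments for a CloudFormation UpdateStack call that replaces the template, sets CloudSecurityPostureManagementPermissions to true, and keeps every other existing parameter at its previous value. The stack name and the parameter keys in the usage example are hypothetical, and the two capabilities are an assumption about what the two acknowledgment boxes correspond to; check them against your own stack before use.

```python
TEMPLATE_URL = "https://datadog-cloudformation-template.s3.amazonaws.com/aws/main.yaml"
CSPM_PARAMETER = "CloudSecurityPostureManagementPermissions"


def build_update_stack_kwargs(stack_name: str, existing_parameter_keys: list[str]) -> dict:
    """Build keyword arguments for a CloudFormation UpdateStack call."""
    # Keep every existing parameter unchanged except the CSPM flag.
    parameters = [
        {"ParameterKey": key, "UsePreviousValue": True}
        for key in existing_parameter_keys
        if key != CSPM_PARAMETER
    ]
    parameters.append({"ParameterKey": CSPM_PARAMETER, "ParameterValue": "true"})
    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL,
        "Parameters": parameters,
        # Assumption: the two acknowledgment boxes map to these capabilities.
        "Capabilities": ["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
    }


# Hypothetical stack name and parameter keys, for illustration only.
kwargs = build_update_stack_kwargs(
    "datadog-integration-example",
    ["ExampleParameterA", "ExampleParameterB", CSPM_PARAMETER],
)
for parameter in kwargs["Parameters"]:
    print(parameter)
```

With boto3 installed and credentials configured, the result could be passed as boto3.client("cloudformation").update_stack(**kwargs). Unlike the console flow, this skips the change set preview, so review the parameters carefully first.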

Alarm collection

There are two ways to send AWS CloudWatch alarms to the Datadog Event Stream:

  • Alarm polling: Alarm polling comes out of the box with the AWS integration and fetches metric alarms through the DescribeAlarmHistory API. If you follow this method, your alarms are categorized under the event source Amazon Web Services. Note: The crawler does not collect composite alarms.
  • SNS topic: You can see all AWS CloudWatch alarms in your event stream by subscribing the alarms to an SNS topic, then forwarding the SNS messages to Datadog. To learn how to receive SNS messages as events in Datadog, see Receive SNS messages. If you follow this method, your alarms are categorized under the event source Amazon SNS.
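
With the SNS topic method, each alarm state change arrives as an SNS message whose body is a JSON document written by CloudWatch. The sketch below parses such a message into a short summary. The alarm fields it reads (AlarmName, NewStateValue, OldStateValue, NewStateReason, Region) come from the CloudWatch alarm notification format; the shape of the returned summary and the sample values are assumptions for illustration and do not describe how Datadog builds its events.

```python
import json


def summarize_alarm_message(sns_message: str) -> dict:
    """Summarize the JSON body of a CloudWatch alarm notification sent via SNS."""
    alarm = json.loads(sns_message)
    return {
        "title": f'{alarm["NewStateValue"]}: {alarm["AlarmName"]}',
        "text": alarm.get("NewStateReason", ""),
        "previous_state": alarm.get("OldStateValue"),
        "region": alarm.get("Region"),
    }


# Made-up alarm notification body, for illustration only.
SAMPLE_MESSAGE = json.dumps(
    {
        "AlarmName": "example-high-cpu",
        "NewStateValue": "ALARM",
        "OldStateValue": "OK",
        "NewStateReason": "Threshold Crossed: 1 datapoint was greater than the threshold.",
        "Region": "US East (N. Virginia)",
    }
)

summary = summarize_alarm_message(SAMPLE_MESSAGE)
print(summary["title"])
# ALARM: example-high-cpu
```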

Data Collected

Metrics

aws.logs.incoming_bytes
(gauge)
The volume of log events in uncompressed bytes uploaded to CloudWatch Logs.
Shown as byte
aws.logs.incoming_log_events
(count)
The number of log events uploaded to CloudWatch Logs.
Shown as event
aws.logs.forwarded_bytes
(gauge)
The volume of log events in compressed bytes forwarded to the subscription destination.
Shown as byte
aws.logs.forwarded_log_events
(count)
The number of log events forwarded to the subscription destination.
Shown as event
aws.logs.delivery_errors
(count)
The number of log events for which CloudWatch Logs received an error when forwarding data to the subscription destination.
Shown as event
aws.logs.delivery_throttling
(count)
The number of log events for which CloudWatch Logs was throttled when forwarding data to the subscription destination.
Shown as event
aws.events.invocations
(count)
Measures the number of times a target is invoked for a rule in response to an event. This includes successful and failed invocations but does not include throttled or retried attempts until they fail permanently.
aws.events.failed_invocations
(count)
Measures the number of invocations that failed permanently. This does not include invocations that are retried or that succeeded after a retry attempt.
aws.events.triggered_rules
(count)
Measures the number of triggered rules that matched with any event.
aws.events.matched_events
(count)
Measures the number of events that matched with any rule.
aws.events.throttled_rules
(count)
Measures the number of triggered rules that are being throttled.
aws.usage.call_count
(count)
The number of specified operations performed in your account.
Shown as operation
aws.usage.resource_count
(count)
The number of specified resources in your account.
Shown as resource

Events

Events from AWS are collected on a per-AWS-service basis. See your AWS service's documentation to learn more about collected events.

Tags

The following tags are collected with the AWS integration. Note: Some tags only display on specific metrics.

Integration | Datadog Tag Keys
All | region
API Gateway | apiid, apiname, method, resource, stage
App Runner | instance, serviceid, servicename
Auto Scaling | autoscalinggroupname, autoscaling_group
Billing | account_id, budget_name, budget_type, currency, servicename, time_unit
CloudFront | distributionid
CodeBuild | project_name
CodeDeploy | application, creator, deployment_config, deployment_group, deployment_option, deployment_type, status
DirectConnect | connectionid
DynamoDB | globalsecondaryindexname, operation, streamlabel, tablename
EBS | volumeid, volume-name, volume-type
EC2 | autoscaling_group, availability-zone, image, instance-id, instance-type, kernel, name, security_group_name
ECS | clustername, servicename, instance_id
EFS | filesystemid
ElastiCache | cachenodeid, cache_node_type, cacheclusterid, cluster_name, engine, engine_version, preferred_availability-zone, replication_group
ElasticBeanstalk | environmentname, enviromentid
ELB | availability-zone, hostname, loadbalancername, name, targetgroup
EMR | cluster_name, jobflowid
ES | dedicated_master_enabled, ebs_enabled, elasticsearch_version, instance_type, zone_awareness_enabled
Firehose | deliverystreamname
FSx | filesystemid, filesystemtype
Health | event_category, status, service
IoT | actiontype, protocol, rulename
Kinesis | streamname, name, state
KMS | keyid
Lambda | functionname, resource, executedversion, memorysize, runtime
Machine Learning | mlmodelid, requestmode
MQ | broker, queue, topic
OpsWorks | stackid, layerid, instanceid
Polly | operation
RDS | auto_minor_version_upgrade, dbinstanceclass, dbclusteridentifier, dbinstanceidentifier, dbname, engine, engineversion, hostname, name, publicly_accessible, secondary_availability-zone
RDS Proxy | proxyname, target, targetgroup, targetrole
Redshift | clusteridentifier, latency, nodeid, service_class, stage, wlmid
Route 53 | healthcheckid
S3 | bucketname, filterid, storagetype
SES | Tag keys are custom set in AWS.
SNS | topicname
SQS | queuename
VPC | nategatewayid, vpnid, tunnelipaddress
WorkSpaces | directoryid, workspaceid

Service Checks

aws.status
Returns CRITICAL if one or more AWS regions are experiencing issues. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

See the AWS Integration Troubleshooting guide to resolve issues related to the AWS integration.
