Connect to Amazon Web Services (AWS) to:
Integration | Description |
---|---|
API Gateway | Create, publish, maintain, and secure APIs |
Appstream | Fully managed application streaming on AWS |
AppSync | A GraphQL service with real-time data synchronization and offline programming features |
Athena | Serverless interactive query service |
Autoscaling | Scale EC2 capacity |
Billing | Billing and budgets |
CloudFront | Local content delivery network |
CloudHSM | Managed hardware security module (HSM) |
CloudSearch | Managed search service |
CloudTrail | Access to log files and AWS API calls |
CodeBuild | Fully managed build service |
CodeDeploy | Automate code deployments |
Cognito | Secure user sign-up and sign-in |
Connect | A self-service, cloud-based contact center service |
Direct Connect | Dedicated network connection to AWS |
DMS | Database Migration Service |
DocumentDB | MongoDB-compatible database |
DynamoDB | NoSQL database |
EBS (Elastic Block Store) | Persistent block level storage volumes |
EC2 (Elastic Compute Cloud) | Resizable compute capacity in the cloud |
EC2 Spot | Take advantage of unused EC2 capacity |
ECS (Elastic Container Service) | Container management service that supports Docker containers |
EFS (Elastic File System) | Shared file storage |
EKS | Elastic Container Service for Kubernetes |
Elastic Transcoder | Media and video transcoding in the cloud |
ElastiCache | In-memory cache in the cloud |
Elastic Beanstalk | Service for deploying and scaling web applications and services |
ELB (Elastic Load Balancing) | Distributes incoming application traffic across multiple Amazon EC2 instances |
EMR (Elastic Map Reduce) | Data processing using Hadoop |
ES (Elasticsearch) | Deploy, operate, and scale Elasticsearch clusters |
Firehose | Capture and load streaming data |
GameLift | Dedicated game server hosting |
Glue | Extract, transform, and load data for analytics |
GuardDuty | Intelligent threat detection |
Health | Visibility into the state of your AWS resources, services, and accounts |
Inspector | Automated security assessment |
IoT (Internet of Things) | Connect IoT devices with cloud services |
Kinesis | Service for real-time processing of large, distributed data streams |
KMS (Key Management Service) | Create and control encryption keys |
Lambda | Serverless computing |
Lex | Build conversation bots |
Machine Learning | Create machine learning models |
MediaConnect | Transport for live video |
MediaConvert | Video processing for broadcast and multiscreen delivery |
MediaPackage | Prepare and protect video for delivery over the internet |
MediaTailor | Scalable server-side ad insertion |
MQ | Managed message broker for ActiveMQ |
Managed Streaming for Kafka | Build and run applications that use Apache Kafka to process streaming data |
NAT Gateway | Enable instances in a private subnet to connect to the internet or other AWS services |
Neptune | Fast, reliable graph database built for the cloud |
OpsWorks | Configuration management |
Polly | Text-to-speech service |
RDS (Relational Database Service) | Relational database in the cloud |
Redshift | Data warehouse solution |
Rekognition | Image and video analysis for applications |
Route 53 | DNS and traffic management with availability monitoring |
S3 (Simple Storage Service) | Highly available and scalable cloud storage service |
SageMaker | Machine learning models and algorithms |
SES (Simple Email Service) | Cost-effective, outbound-only email-sending service |
SNS (Simple Notification Service) | Alerts and notifications |
SQS (Simple Queue Service) | Messaging queue service |
Storage Gateway | Hybrid cloud storage |
SWF (Simple Workflow Service) | Cloud workflow management |
VPC (Virtual Private Cloud) | Launch AWS resources into a virtual network |
Web Application Firewall (WAF) | Protect web applications from common web exploits |
WorkSpaces | Secure desktop computing service |
X-Ray | Tracing for distributed applications |
Setting up the Datadog integration with Amazon Web Services requires configuring role delegation using AWS IAM. To get a better understanding of role delegation, refer to the AWS IAM Best Practices guide.
Choose a method for setting up the necessary AWS role. CloudFormation is recommended.
1. Select Another AWS account for the Role Type.
2. For the Account ID, enter 464622532012 (Datadog’s account ID). This means that you are granting Datadog read-only access to your AWS data.
3. Select Require external ID and enter the one generated in the AWS integration tile. Make sure you leave Require MFA disabled. For more information about the External ID, refer to this document in the IAM User Guide.
4. Click Next: Permissions.
5. Click Create Policy, which opens in a new window.
6. Select the JSON tab. To take advantage of every AWS integration offered by Datadog, use the policy snippet below in the textbox. As other components are added to an integration, these permissions may change.
7. Click Review policy.
8. Name the policy DatadogAWSIntegrationPolicy or one of your own choosing, and provide an apt description.
9. Click Create policy. You can now close this window.
10. Click Next: Review.
11. Give the role a name such as DatadogAWSIntegrationRole, as well as an apt description. Click Create Role.

Bonus: If you use Terraform, set up your Datadog IAM policy using the AWS Integration with Terraform.
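The role described in the steps above trusts Datadog’s account and requires an external ID. As a rough sketch, the trust relationship for such a role usually follows the standard IAM cross-account layout built below. The helper name build_trust_policy and the placeholder external ID are assumptions for illustration; the real external ID is the one generated in the AWS integration tile.

```python
import json

DATADOG_ACCOUNT_ID = "464622532012"  # Datadog's account ID, from the steps above


def build_trust_policy(external_id: str) -> dict:
    """Return a trust policy letting Datadog's account assume the role.

    build_trust_policy is a hypothetical helper; the layout is the standard
    IAM cross-account trust policy with an external ID condition.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATADOG_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # "EXAMPLE_EXTERNAL_ID" is a placeholder, not a real value.
    print(json.dumps(build_trust_policy("EXAMPLE_EXTERNAL_ID"), indent=2))
```

Comparing this output with the Trust relationships tab of the created role is one way to confirm that the account ID and external ID were entered as intended.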
Enter your AWS Account ID, for example 123456789012. Your Account ID can be found in the ARN of the role created during the installation of the AWS integration.

Optionally, limit collection to a subset of EC2 instances by entering AWS tags in the to hosts with tag textbox. Note: This also applies to an instance’s attached EBS volumes.

Optionally, limit collection to a subset of Lambdas by entering AWS tags in the to Lambdas with tag textbox.

The permissions listed below are included in the Policy Document using wildcards such as List* and Get*. If you require strict policies, use the complete action names as listed and reference the Amazon API documentation for the services you require.
If you are not comfortable with granting all permissions, at the very least use the existing policies named AmazonEC2ReadOnlyAccess and CloudWatchReadOnlyAccess. For more detailed information regarding permissions, see the Core Permissions section.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "apigateway:GET",
            "autoscaling:Describe*",
            "budgets:ViewBudget",
            "cloudfront:GetDistributionConfig",
            "cloudfront:ListDistributions",
            "cloudtrail:DescribeTrails",
            "cloudtrail:GetTrailStatus",
            "cloudtrail:LookupEvents",
            "cloudwatch:Describe*",
            "cloudwatch:Get*",
            "cloudwatch:List*",
            "codedeploy:List*",
            "codedeploy:BatchGet*",
            "directconnect:Describe*",
            "dynamodb:List*",
            "dynamodb:Describe*",
            "ec2:Describe*",
            "ecs:Describe*",
            "ecs:List*",
            "elasticache:Describe*",
            "elasticache:List*",
            "elasticfilesystem:DescribeFileSystems",
            "elasticfilesystem:DescribeTags",
            "elasticfilesystem:DescribeAccessPoints",
            "elasticloadbalancing:Describe*",
            "elasticmapreduce:List*",
            "elasticmapreduce:Describe*",
            "es:ListTags",
            "es:ListDomainNames",
            "es:DescribeElasticsearchDomains",
            "health:DescribeEvents",
            "health:DescribeEventDetails",
            "health:DescribeAffectedEntities",
            "kinesis:List*",
            "kinesis:Describe*",
            "lambda:AddPermission",
            "lambda:GetPolicy",
            "lambda:List*",
            "lambda:RemovePermission",
            "logs:DeleteSubscriptionFilter",
            "logs:DescribeLogGroups",
            "logs:DescribeLogStreams",
            "logs:DescribeSubscriptionFilters",
            "logs:FilterLogEvents",
            "logs:PutSubscriptionFilter",
            "logs:TestMetricFilter",
            "rds:Describe*",
            "rds:List*",
            "redshift:DescribeClusters",
            "redshift:DescribeLoggingStatus",
            "route53:List*",
            "s3:GetBucketLogging",
            "s3:GetBucketLocation",
            "s3:GetBucketNotification",
            "s3:GetBucketTagging",
            "s3:ListAllMyBuckets",
            "s3:PutBucketNotification",
            "ses:Get*",
            "sns:List*",
            "sns:Publish",
            "sqs:ListQueues",
            "states:ListStateMachines",
            "states:DescribeStateMachine",
            "support:*",
            "tag:GetResources",
            "tag:GetTagKeys",
            "tag:GetTagValues",
            "xray:BatchGetTraces",
            "xray:GetTraceSummaries"
          ],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
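For readers who need strict policies, the following minimal sketch shows how wildcard entries such as cloudwatch:Get* could be expanded into complete action names. It assumes you supply the catalog of action names yourself; KNOWN_ACTIONS below is a hypothetical, deliberately incomplete list, and expand_wildcards is an illustrative helper, not part of any AWS or Datadog tooling. Matching is done case-insensitively, on the understanding that IAM treats action names that way.

```python
from fnmatch import fnmatchcase


def expand_wildcards(patterns, known_actions):
    """Return the known actions matched by at least one pattern, sorted.

    Both sides are lowercased so the comparison ignores case.
    """
    matched = set()
    for action in known_actions:
        for pattern in patterns:
            if fnmatchcase(action.lower(), pattern.lower()):
                matched.add(action)
                break
    return sorted(matched)


# Hypothetical, incomplete catalog for illustration only; consult the AWS
# API documentation for the full list of actions per service.
KNOWN_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:ListMetrics",
    "cloudwatch:PutMetricData",
    "sqs:ListQueues",
]

if __name__ == "__main__":
    print(expand_wildcards(["cloudwatch:Get*", "cloudwatch:List*"], KNOWN_ACTIONS))
    # prints ['cloudwatch:GetMetricData', 'cloudwatch:ListMetrics']
```

The expanded list can then replace the wildcard entries in the Action array of a stricter policy.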
The core Datadog AWS integration pulls data from AWS CloudWatch. At a minimum, your Policy Document needs to allow the following actions:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "cloudwatch:Get*",
            "cloudwatch:List*",
            "ec2:Describe*",
            "support:*",
            "tag:GetResources",
            "tag:GetTagKeys",
            "tag:GetTagValues"
          ],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
AWS Permission | Description |
---|---|
cloudwatch:ListMetrics | List the available CloudWatch metrics. |
cloudwatch:GetMetricData | Fetch data points for a given metric. |
support:* | Add metrics about service limits. It requires full access because of AWS limitations. |
tag:GetResources | Get custom tags by resource type. |
tag:GetTagKeys | Get tag keys by region within an AWS account. |
tag:GetTagValues | Get tag values by region within an AWS account. |
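When trimming a policy, it can help to confirm that the core actions from the table above are still allowed. The sketch below is a simplified check under stated assumptions: it only looks at the Action patterns of Allow statements and ignores Deny statements, Resource, and Condition elements. The names CORE_ACTIONS, allowed_patterns, and missing_core_actions are hypothetical.

```python
import json
from fnmatch import fnmatchcase

# Concrete action names taken from the permission table above.
CORE_ACTIONS = [
    "cloudwatch:ListMetrics",
    "cloudwatch:GetMetricData",
    "tag:GetResources",
    "tag:GetTagKeys",
    "tag:GetTagValues",
]


def allowed_patterns(policy: dict) -> list:
    """Collect the Action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_core_actions(policy: dict) -> list:
    """Return the core actions that no Allow pattern in the policy matches."""
    patterns = [p.lower() for p in allowed_patterns(policy)]
    return [
        action
        for action in CORE_ACTIONS
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


if __name__ == "__main__":
    policy = json.loads("""
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Action": ["cloudwatch:Get*", "tag:GetResources"],
         "Effect": "Allow", "Resource": "*"}
      ]
    }
    """)
    print(missing_core_actions(policy))
    # prints ['cloudwatch:ListMetrics', 'tag:GetTagKeys', 'tag:GetTagValues']
```

An empty result means every core action in the list is matched by at least one Allow pattern.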
The main use of the Resource Groups Tagging API is to reduce the number of API calls needed to collect custom tags. For more information, review the Tag policies documentation on the AWS website.
There are two ways of sending AWS service logs to Datadog:
Metric | Description |
---|---|
aws.logs.incoming_bytes (gauge) | The volume of log events in uncompressed bytes uploaded to CloudWatch Logs. Shown as byte. |
aws.logs.incoming_log_events (count) | The number of log events uploaded to CloudWatch Logs. Shown as event. |
aws.logs.forwarded_bytes (gauge) | The volume of log events in compressed bytes forwarded to the subscription destination. Shown as byte. |
aws.logs.forwarded_log_events (count) | The number of log events forwarded to the subscription destination. Shown as event. |
aws.logs.delivery_errors (count) | The number of log events for which CloudWatch Logs received an error when forwarding data to the subscription destination. Shown as event. |
aws.logs.delivery_throttling (count) | The number of log events for which CloudWatch Logs was throttled when forwarding data to the subscription destination. Shown as event. |
aws.events.invocations (count) | Measures the number of times a target is invoked for a rule in response to an event. This includes successful and failed invocations but does not include throttled or retried attempts until they fail permanently. |
aws.events.failed_invocations (count) | Measures the number of invocations that failed permanently. This does not include invocations that are retried or that succeeded after a retry attempt. |
aws.events.triggered_rules (count) | Measures the number of triggered rules that matched with any event. |
aws.events.matched_events (count) | Measures the number of events that matched with any rule. |
aws.events.throttled_rules (count) | Measures the number of triggered rules that are being throttled. |
aws.usage.call_count (count) | The number of specified operations performed in your account. Shown as operation. |
aws.usage.resource_count (count) | The number of specified resources in your account. Shown as resource. |
Events from AWS are collected on a per AWS-service basis. Please refer to the documentation of specific AWS services to learn more about the events collected.
The following tags are collected from AWS integrations. Note: Some tags only display on specific metrics.
Integration | Datadog Tag Keys |
---|---|
All | region |
API Gateway | apiid, apiname, method, resource, stage |
Auto Scaling | autoscalinggroupname, autoscaling_group |
Billing | account_id, budget_name, budget_type, currency, servicename, time_unit |
CloudFront | distributionid |
CodeBuild | project_name |
CodeDeploy | application, creator, deployment_config, deployment_group, deployment_option, deployment_type, status |
DirectConnect | connectionid |
DynamoDB | globalsecondaryindexname, operation, streamlabel, tablename |
EBS | volumeid, volume-name, volume-type |
EC2 | autoscaling_group, availability-zone, image, instance-id, instance-type, kernel, name, security_group_name |
ECS | clustername, servicename, instance_id |
EFS | filesystemid |
ElastiCache | cachenodeid, cache_node_type, cacheclusterid, cluster_name, engine, engine_version, prefered_availability-zone, replication_group |
ElasticBeanstalk | environmentname, enviromentid |
ELB | availability-zone, hostname, loadbalancername, name, targetgroup |
EMR | cluster_name, jobflowid |
ES | dedicated_master_enabled, ebs_enabled, elasticsearch_version, instance_type, zone_awareness_enabled |
Firehose | deliverystreamname |
Health | event_category, status, service |
IoT | actiontype, protocol, rulename |
Kinesis | streamname, name, state |
KMS | keyid |
Lambda | functionname, resource, executedversion, memorysize, runtime |
Machine Learning | mlmodelid, requestmode |
MQ | broker, queue, topic |
OpsWorks | stackid, layerid, instanceid |
Polly | operation |
RDS | auto_minor_version_upgrade, dbinstanceclass, dbclusteridentifier, dbinstanceidentifier, dbname, engine, engineversion, hostname, name, publicly_accessible, secondary_availability-zone |
Redshift | clusteridentifier, latency, nodeid, service_class, stage, wlmid |
Route 53 | healthcheckid |
S3 | bucketname, filterid, storagetype |
SES | Tag keys are custom set in AWS. |
SNS | topicname |
SQS | queuename |
VPC | nategatewayid, vpnid, tunnelipaddress |
WorkSpaces | directoryid, workspaceid |
There are two important distinctions to be aware of:
Requesting system.cpu.idle without any filter would return one series for each host that reports that metric, and those series need to be combined to be graphed. On the other hand, if you requested system.cpu.idle from a single host, no aggregation would be necessary and switching between average and max would yield the same result.

When using the AWS integration, Datadog pulls in your metrics via the CloudWatch API. You may see a slight delay in metrics from AWS due to some constraints that exist for their API.
To begin, the CloudWatch API only offers a metric-by-metric crawl to pull data. The CloudWatch APIs have a rate limit that varies based on the combination of authentication credentials, region, and service. Metrics are made available by AWS dependent on the account level. For example, if you are paying for “detailed metrics” within AWS, they are available more quickly. This level of service for detailed metrics also applies to granularity, with some metrics being available per minute and others per five minutes.
Datadog has the ability to prioritize certain metrics within an account to pull them in faster, depending on the circumstances. Please contact Datadog support for more info.
To obtain metrics with virtually zero delay, install the Datadog Agent on the host. For more information, see Datadog’s blog post Don’t fear the Agent: Agent-based monitoring.
CloudWatch’s API returns only metrics with data points, so if, for instance, an ELB has no attached instances, you should expect to see no metrics for this ELB in Datadog.
When the cross-zone load balancing option is enabled on an ELB, all the instances attached to this ELB are considered part of all availability zones (on CloudWatch’s side), so if you have 2 instances in 1a and 3 in 1b, the metric displays 5 instances per availability zone. As this can be counterintuitive, we’ve added new metrics, aws.elb.healthy_host_count_deduped and aws.elb.un_healthy_host_count_deduped, that display the count of healthy and unhealthy instances per availability zone, regardless of whether the cross-zone load balancing option is enabled.
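The arithmetic behind the two kinds of host counts can be illustrated with a small simulation. This is only a sketch of the behavior described above, not how CloudWatch or Datadog compute the metrics; the function names and the zone labels 1a and 1b are illustrative assumptions, and every instance is assumed to be healthy.

```python
def reported_host_count(instances_per_zone: dict, cross_zone: bool) -> dict:
    """Per-zone healthy host count as the paragraph above describes it.

    With cross-zone load balancing enabled, every attached instance is
    counted in every availability zone.
    """
    if cross_zone:
        total = sum(instances_per_zone.values())
        return {zone: total for zone in instances_per_zone}
    return dict(instances_per_zone)


def deduped_host_count(instances_per_zone: dict) -> dict:
    """Per-zone count that does not depend on the cross-zone setting."""
    return dict(instances_per_zone)


if __name__ == "__main__":
    zones = {"1a": 2, "1b": 3}
    print(reported_host_count(zones, cross_zone=True))  # {'1a': 5, '1b': 5}
    print(deduped_host_count(zones))                    # {'1a': 2, '1b': 3}
```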
When installing the Agent on an AWS host, you might see duplicated hosts on the infra page for a few hours if you manually set the hostname in the Agent’s configuration. This second host disappears a few hours later, and won’t affect your billing.