
AWS

Overview

Connect to Amazon Web Services (AWS) to:

  • See automatic AWS status updates in your stream
  • Get CloudWatch metrics for EC2 hosts without installing the Agent
  • Tag your EC2 hosts with EC2-specific information (e.g. availability zone)
  • See EC2 scheduled maintenance events in your stream
  • Collect CloudWatch metrics and events from many other AWS products

Related integrations include:

API Gateway: Create, publish, maintain, and secure APIs
Appstream: Fully managed application streaming on AWS
AppSync: A GraphQL service with real-time data synchronization and offline programming features
Athena: Serverless interactive query service
Autoscaling: Scale EC2 capacity
Billing: Billing and budgets
CloudFront: Global content delivery network
Cloudhsm: Managed hardware security module (HSM)
CloudSearch: Managed search service
CloudTrail: Access to log files and AWS API calls
CodeBuild: Fully managed build service
CodeDeploy: Automate code deployments
Cognito: Secure user sign-up and sign-in
Connect: A self-service, cloud-based contact center service
Direct Connect: Dedicated network connection to AWS
DMS: Database Migration Service
DocumentDB: MongoDB-compatible database
DynamoDB: NoSQL database
EBS (Elastic Block Store): Persistent block-level storage volumes
EC2 (Elastic Compute Cloud): Resizable compute capacity in the cloud
EC2 Spot: Take advantage of unused EC2 capacity
ECS (Elastic Container Service): Container management service that supports Docker containers
EFS (Elastic File System): Shared file storage
EKS: Elastic Container Service for Kubernetes
Elastic Transcoder: Media and video transcoding in the cloud
ElastiCache: In-memory cache in the cloud
Elastic Beanstalk: Service for deploying and scaling web applications and services
ELB (Elastic Load Balancing): Distributes incoming application traffic across multiple Amazon EC2 instances
EMR (Elastic Map Reduce): Data processing using Hadoop
ES (Elasticsearch): Deploy, operate, and scale Elasticsearch clusters
Firehose: Capture and load streaming data
Gamelift: Dedicated game server hosting
Glue: Extract, transform, and load data for analytics
GuardDuty: Intelligent threat detection
Health: Visibility into the state of your AWS resources, services, and accounts
Inspector: Automated security assessment
IoT (Internet of Things): Connect IoT devices with cloud services
Kinesis: Service for real-time processing of large, distributed data streams
KMS (Key Management Service): Create and control encryption keys
Lambda: Serverless computing
Lex: Build conversation bots
Machine Learning: Create machine learning models
MediaConnect: Transport for live video
MediaConvert: Video processing for broadcast and multiscreen delivery
MediaPackage: Prepare and protect video for delivery over the internet
MediaTailor: Scalable server-side ad insertion
MQ: Managed message broker for ActiveMQ
Managed Streaming for Kafka: Build and run applications that use Apache Kafka to process streaming data
NAT Gateway: Enable instances in a private subnet to connect to the internet or other AWS services
Neptune: Fast, reliable graph database built for the cloud
OpsWorks: Configuration management
Polly: Text-to-speech service
RDS (Relational Database Service): Relational database in the cloud
Redshift: Data warehouse solution
Rekognition: Image and video analysis for applications
Route 53: DNS and traffic management with availability monitoring
S3 (Simple Storage Service): Highly available and scalable cloud storage service
SageMaker: Machine learning models and algorithms
SES (Simple Email Service): Cost-effective, outbound-only email-sending service
SNS (Simple Notification Service): Alerts and notifications
SQS (Simple Queue Service): Messaging queue service
Storage Gateway: Hybrid cloud storage
SWF (Simple Workflow Service): Cloud workflow management
VPC (Virtual Private Cloud): Launch AWS resources into a virtual network
Web Application Firewall (WAF): Protect web applications from common web exploits
Workspaces: Secure desktop computing service
X-Ray: Tracing for distributed applications

Setup

Installation

Setting up the Datadog integration with Amazon Web Services requires configuring role delegation using AWS IAM. To get a better understanding of role delegation, refer to the AWS IAM Best Practices guide.

The GovCloud and China regions do not currently support IAM role delegation. If you are deploying in these regions, skip to the Configuration section below.

  1. Create a new role in the AWS IAM Console.
  2. Select Another AWS account for the Role Type.
  3. For Account ID, enter 464622532012 (Datadog’s account ID). This means that you are granting Datadog read-only access to your AWS data.
  4. Check off Require external ID and enter the one generated in the Datadog app. Make sure you leave Require MFA disabled. For more information about the External ID, refer to this document in the IAM User Guide.
  5. Click Next: Permissions.
  6. If you’ve already created the policy, search for it on this page and select it, then skip to step 12. Otherwise, click Create Policy, which opens in a new window.
  7. Select the JSON tab. To take advantage of every AWS integration offered by Datadog, use the policy snippet below in the textbox. As other components are added to an integration, these permissions may change.
  8. Click Review policy.
  9. Name the policy DatadogAWSIntegrationPolicy or one of your own choosing, and provide an apt description.
  10. Click Create policy. You can now close this window.
  11. Back in the “Create role” window, refresh the list of policies and select the policy you just created.
  12. Click Next: Review.
  13. Give the role a name such as DatadogAWSIntegrationRole, as well as an apt description. Click Create Role.

Bonus: If you use Terraform, set up your Datadog IAM policy using The AWS Integration with Terraform.
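
Similarly, if you prefer to script the console steps above with the AWS SDK, the following is a minimal boto3 sketch of the same role delegation setup. The role and policy names follow the examples above; the external ID placeholder and the local policy file path are assumptions, so substitute the external ID generated in the Datadog tile and save the policy document from the next section to a file first.

import json
import boto3

iam = boto3.client("iam")

DATADOG_ACCOUNT_ID = "464622532012"
EXTERNAL_ID = "<EXTERNAL_ID_FROM_DATADOG_TILE>"  # generated in the Datadog app

# Trust policy letting Datadog's account assume the role only with the external ID
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::%s:root" % DATADOG_ACCOUNT_ID},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

iam.create_role(
    RoleName="DatadogAWSIntegrationRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Role assumed by Datadog to collect AWS data",
)

# Create the permission policy from the JSON document in the next section
# (saved locally as datadog_aws_integration_policy.json) and attach it.
with open("datadog_aws_integration_policy.json") as f:
    policy = iam.create_policy(
        PolicyName="DatadogAWSIntegrationPolicy",
        PolicyDocument=f.read(),
    )

iam.attach_role_policy(
    RoleName="DatadogAWSIntegrationRole",
    PolicyArn=policy["Policy"]["Arn"],
)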

Datadog AWS IAM Policy

The permissions listed below are included in the policy document using wildcards such as List* and Get*. If you require strict policies, use the complete action names as listed and reference the Amazon API documentation for the services you require.

If you are not comfortable with granting all permissions, at the very least use the existing policies named AmazonEC2ReadOnlyAccess and CloudWatchReadOnlyAccess. For more detailed information regarding permissions, see the core permissions listed below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "apigateway:GET",
        "autoscaling:Describe*",
        "budgets:ViewBudget",
        "cloudfront:GetDistributionConfig",
        "cloudfront:ListDistributions",
        "cloudtrail:DescribeTrails",
        "cloudtrail:GetTrailStatus",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "codedeploy:List*",
        "codedeploy:BatchGet*",
        "directconnect:Describe*",
        "dynamodb:List*",
        "dynamodb:Describe*",
        "ec2:Describe*",
        "ecs:Describe*",
        "ecs:List*",
        "elasticache:Describe*",
        "elasticache:List*",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeTags",
        "elasticloadbalancing:Describe*",
        "elasticmapreduce:List*",
        "elasticmapreduce:Describe*",
        "es:ListTags",
        "es:ListDomainNames",
        "es:DescribeElasticsearchDomains",
        "health:DescribeEvents",
        "health:DescribeEventDetails",
        "health:DescribeAffectedEntities",
        "kinesis:List*",
        "kinesis:Describe*",
        "lambda:AddPermission",
        "lambda:GetPolicy",
        "lambda:List*",
        "lambda:RemovePermission",
        "logs:Get*",
        "logs:Describe*",
        "logs:FilterLogEvents",
        "logs:TestMetricFilter",
        "logs:PutSubscriptionFilter",
        "logs:DeleteSubscriptionFilter",
        "logs:DescribeSubscriptionFilters",
        "rds:Describe*",
        "rds:List*",
        "redshift:DescribeClusters",
        "redshift:DescribeLoggingStatus",
        "route53:List*",
        "s3:GetBucketLogging",
        "s3:GetBucketLocation",
        "s3:GetBucketNotification",
        "s3:GetBucketTagging",
        "s3:ListAllMyBuckets",
        "s3:PutBucketNotification",
        "ses:Get*",
        "sns:List*",
        "sns:Publish",
        "sqs:ListQueues",
        "support:*",
        "tag:GetResources",
        "tag:GetTagKeys",
        "tag:GetTagValues",
        "xray:BatchGetTraces",
        "xray:GetTraceSummaries"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

The core Datadog AWS integration pulls data from AWS CloudWatch. At a minimum, your Policy Document needs to allow the following actions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "ec2:Describe*",
        "support:*",
        "tag:GetResources",
        "tag:GetTagKeys",
        "tag:GetTagValues"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

AWS Permission | Description
cloudwatch:ListMetrics | List the available CloudWatch metrics.
cloudwatch:GetMetricData | Fetch data points for a given metric.
support:* | Add metrics about service limits. Full access is required because of AWS limitations.
tag:GetResources | Get custom tags by resource type.
tag:GetTagKeys | Get tag keys by region within an AWS account.
tag:GetTagValues | Get tag values by region within an AWS account.

The main use of the Resource Group Tagging API is to reduce the number of API calls needed to collect custom tags. For more information, review the Tag policies documentation on the AWS website.
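
To make these core permissions concrete, here is a hedged sketch (not Datadog’s crawler code) of the kinds of calls they allow: a cloudwatch:GetMetricData request and a tag:GetResources request through the Resource Groups Tagging API. The metric, resource filter, and region are arbitrary examples.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# cloudwatch:GetMetricData - fetch data points for a given metric
response = cloudwatch.get_metric_data(
    MetricDataQueries=[{
        "Id": "cpu",
        "MetricStat": {
            "Metric": {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization"},
            "Period": 300,
            "Stat": "Average",
        },
    }],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
)
for result in response["MetricDataResults"]:
    print(result["Label"], result["Values"])

# tag:GetResources - collect custom tags via the Resource Groups Tagging API
tagging = boto3.client("resourcegroupstaggingapi", region_name="us-east-1")
for mapping in tagging.get_resources(ResourceTypeFilters=["ec2:instance"])["ResourceTagMappingList"]:
    print(mapping["ResourceARN"], mapping["Tags"])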

Configuration

  1. Open the AWS integration tile.
  2. Select the Role Delegation tab.
  3. Enter your AWS Account ID without dashes, e.g. 123456789012, not 1234-5678-9012. Your Account ID can be found in the ARN of the role created during the installation of the AWS integration. Then enter the name of the created role. Note: The role name you enter in the integration tile is case-sensitive and must exactly match the role name created on the AWS side.
  4. Choose the services you want to collect metrics for on the left side of the dialog. You can optionally add tags to all hosts and metrics. If you want to monitor only a subset of EC2 instances on AWS, tag them and specify the tag in the limit textbox here.
  5. Click Install Integration.

For the GovCloud and China regions, use access keys instead of role delegation:

  1. Open the AWS integration tile.
  2. Select the Access Keys (GovCloud or China Only) tab.
  3. Enter your AWS Access Key and AWS Secret Key. Only access and secret keys for GovCloud and China are accepted.
  4. Choose the services you want to collect metrics for on the left side of the dialog. You can optionally add tags to all hosts and metrics. If you want to monitor only a subset of EC2 instances on AWS, tag them and specify the tag in the limit textbox here.
  5. Click Install Integration.

Log collection

AWS service logs are collected via the Datadog Lambda function. This Lambda, which can be triggered by S3 buckets, CloudWatch Log Groups, and CloudWatch Events, forwards logs to Datadog.

To start collecting logs from your AWS services:

  1. Set up the Datadog Lambda function.
  2. Enable logging for your AWS service (most AWS services can log to an S3 bucket or CloudWatch Log Group).
  3. Configure the triggers that cause the Lambda to execute. There are two ways to configure the triggers:

    • automatically: Datadog automatically retrieves the log locations for the selected AWS services and adds them as triggers on the Datadog Lambda function. Datadog also keeps the list up to date.
    • manually: Set up each trigger yourself via the AWS console.

Set up the Datadog Lambda function

To add the Datadog log-forwarder Lambda to your AWS account, you can either use the AWS Serverless Repository or manually create a new Lambda.

AWS Serverless Repository

Use the AWS Serverless Repository to deploy the Lambda in your AWS account.

Manually create a new Lambda

Create a new Lambda Function
  1. Navigate to the Lambda console and create a new function:

  2. Select Author from scratch and give the function a unique name.

  3. Change the Runtime to Python 2.7, Python 3.6, or Python 3.7.

  4. For Role, select Create new role from template(s) and give the role a unique name.

  5. If you are pulling logs from an S3 bucket, under Policy templates search for and select s3 object read-only permissions.

  6. Select Create Function.

Provide the code and configure the Lambda
  1. Copy and paste the code from this repo into the function code area.
  2. Ensure the Handler reads lambda_function.lambda_handler.
  3. At the top of the script you’ll find a section called #DD_API_KEY: Datadog API Key. You have two options for providing the API Key that the Lambda function requires:

    • Set up an environment variable (preferred)
    • Edit the code directly with your Datadog API Key
  4. If you are using the Datadog EU site, set DD_SITE to datadoghq.eu, either as an environment variable or directly in the code. A scripted alternative to several of these configuration steps is sketched after this list.

  5. Scroll down beyond the inline code area to Basic Settings.

  6. Set the memory to around 1GB.

  7. Set the timeout limit (120 seconds recommended).

  8. Set the Execution Role to the role created earlier.

  9. Add the Datadog Lambda layer using the following ARN. Replace us-east-1 with the region where your function is deployed and replace Python27 with the Python runtime your function uses (Python27, Python36, or Python37).

    arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Python27:5
    
  10. Scroll back to the top of the page and hit Save.
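
As noted above, several of these steps can also be scripted. Below is a minimal boto3 sketch, assuming a forwarder function named datadog-log-forwarder in us-east-1; the function name, region, and API key are placeholders.

import boto3

aws_lambda = boto3.client("lambda", region_name="us-east-1")

aws_lambda.update_function_configuration(
    FunctionName="datadog-log-forwarder",  # placeholder: use your function's name
    MemorySize=1024,                       # around 1 GB
    Timeout=120,                           # recommended timeout in seconds
    Environment={
        "Variables": {
            "DD_API_KEY": "<YOUR_DATADOG_API_KEY>",
            "DD_SITE": "datadoghq.eu",     # only needed for the Datadog EU site
        }
    },
    # Datadog Lambda layer: match the region and Python runtime of your function
    Layers=["arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Python27:5"],
)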

Test your Lambda

  1. Press Test.
  2. Search for and select CloudWatch Logs as the sample event.
  3. Give the event a unique name and press Create.
  4. Press Test and ensure the test passes with no errors (test logs won’t appear in your Datadog platform).

Advanced settings (optional filtering and redaction)

Supply redaction or filtering rules using environment variables.

Redaction rules

Multiple scrubbing options are available: REDACT_IP and REDACT_EMAIL match against hard-coded patterns, while DD_SCRUBBING_RULE allows users to supply a regular expression. The behavior is illustrated in the sketch after the list below.

  • To use REDACT_IP, add it as an environment variable and set the value to true.
    • Text matching \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} is replaced with xxx.xxx.xxx.xxx.
  • To use REDACT_EMAIL, add it as an environment variable and set the value to true.
    • Text matching [a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+ is replaced with xxxxx@xxxxx.com.
  • To use DD_SCRUBBING_RULE, add it as an environment variable, and supply a regular expression as the value.
    • Text matching the user-supplied regular expression is replaced with xxxxx, by default.
    • Use the DD_SCRUBBING_RULE_REPLACEMENT environment variable to supply a replacement value instead of xxxxx.
  • Scrubbing rules are applied to the full JSON-formatted log, including any metadata that is automatically added by the Lambda function.
  • Each instance of a pattern match is replaced until no more matches are found in each log.
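
As an illustration of the behavior described above (this is not the forwarder’s implementation), the documented patterns behave like the following Python snippet:

import re

IP_PATTERN = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
EMAIL_PATTERN = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

log_line = '{"message": "login from 10.1.2.3 by jane.doe@example.com"}'

# With REDACT_IP=true and REDACT_EMAIL=true, every match in each log is replaced
redacted = re.sub(IP_PATTERN, "xxx.xxx.xxx.xxx", log_line)
redacted = re.sub(EMAIL_PATTERN, "xxxxx@xxxxx.com", redacted)
print(redacted)
# {"message": "login from xxx.xxx.xxx.xxx by xxxxx@xxxxx.com"}

# DD_SCRUBBING_RULE works the same way with a user-supplied pattern, replacing
# matches with xxxxx (or with DD_SCRUBBING_RULE_REPLACEMENT if set)
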
Filtering rules

Use the EXCLUDE_AT_MATCH or INCLUDE_AT_MATCH environment variables to filter logs based on a regular expression match. Their precedence is illustrated in the sketch after the list below.

  • To use EXCLUDE_AT_MATCH, add it as an environment variable and set the value to a regular expression. Logs matching the regular expression are excluded.
  • To use INCLUDE_AT_MATCH, add it as an environment variable and set the value to a regular expression. If not excluded by EXCLUDE_AT_MATCH, logs matching the regular expression are included.
  • If a log matches both the inclusion and exclusion criteria, it is excluded.
  • Filtering rules are applied to the full JSON-formatted log, including any metadata that is automatically added by the function.
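
The precedence described above can be summarized in a short Python sketch (an illustration of the documented behavior, not the forwarder’s implementation); the example patterns are arbitrary:

import re

EXCLUDE_AT_MATCH = r"DEBUG"    # example value
INCLUDE_AT_MATCH = r"payment"  # example value

def should_forward(log):
    # Exclusion wins: a log matching both rules is dropped
    if EXCLUDE_AT_MATCH and re.search(EXCLUDE_AT_MATCH, log):
        return False
    if INCLUDE_AT_MATCH:
        return bool(re.search(INCLUDE_AT_MATCH, log))
    return True

print(should_forward("DEBUG payment retried"))  # False: exclusion wins
print(should_forward("INFO payment accepted"))  # True
print(should_forward("INFO user logged in"))    # False: no include match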

Remember to test your Lambda after configuring advanced settings to ensure the test passes with no errors.

Enable logging for your AWS service

Any AWS service that generates logs into an S3 bucket or a CloudWatch Log Group is supported. Find specific setup instructions for the most commonly used services in the table below:

AWS service | Activate AWS service logging | Send AWS logs to Datadog
API Gateway | Enable AWS API Gateway logs | Manual log collection
CloudFront | Enable AWS CloudFront logs | Manual and automatic log collection
CloudTrail | Enable AWS CloudTrail logs | Manual log collection
DynamoDB | Enable AWS DynamoDB logs | Manual log collection
EC2 | - | Use the Datadog Agent to send your logs to Datadog
ECS | - | Use the Docker Agent to gather your logs
Elastic Load Balancing (ELB) | Enable AWS ELB logs | Manual and automatic log collection
Lambda | - | Manual and automatic log collection
RDS | Enable AWS RDS logs | Manual log collection
Route 53 | Enable AWS Route 53 logs | Manual log collection
S3 | Enable AWS S3 logs | Manual and automatic log collection
SNS | SNS does not generate logs of its own; process logs and events that transit through the SNS service. | Manual log collection
Redshift | Enable AWS Redshift logs | Manual and automatic log collection
VPC | Enable AWS VPC logs | Manual log collection

Send AWS service logs to Datadog

There are two options when configuring triggers on the Datadog Lambda function:

  • Manually set up triggers on S3 buckets, CloudWatch Log Groups, or CloudWatch Events.
  • Let Datadog automatically set and manage the list of triggers.

Automatically set up triggers

If you are storing logs in many S3 buckets or CloudWatch Log Groups, Datadog can automatically manage triggers for you.

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. The AWS Serverless Repository deploys the Lambda function in your account and creates a role for it, usually with a name such as ‘serverlessrepo-Datadog-Log-loglambdaddfunctionRole-xyz’. By default, the policy attached to this role does not include all the permissions the log-forwarding Lambda needs. Using the IAM console, add the permissions listed below to the policy that is part of the Datadog Lambda function’s role; these are the same permissions already used by the Datadog-AWS integration. Alternatively, attach the policy used by the Datadog-AWS integration itself as a second policy on the role. Each permission is described in the table that follows, and a scripted way to attach them is sketched after the table.

    "cloudfront:GetDistributionConfig",
    "cloudfront:ListDistributions",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeLoadBalancerAttributes",
    "lambda:AddPermission",
    "lambda:GetPolicy",
    "lambda:RemovePermission",
    "redshift:DescribeClusters",
    "redshift:DescribeLoggingStatus",
    "s3:GetBucketLogging",
    "s3:GetBucketLocation",
    "s3:GetBucketNotification",
    "s3:ListAllMyBuckets",
    "s3:PutBucketNotification",
    "logs:PutSubscriptionFilter",
    "logs:DeleteSubscriptionFilter",
    "logs:DescribeSubscriptionFilters"
    
AWS Permission | Description
cloudfront:GetDistributionConfig | Get the name of the S3 bucket containing CloudFront access logs.
cloudfront:ListDistributions | List all CloudFront distributions.
elasticloadbalancing:DescribeLoadBalancers | List all load balancers.
elasticloadbalancing:DescribeLoadBalancerAttributes | Get the name of the S3 bucket containing ELB access logs.
lambda:AddPermission | Add permission allowing a particular S3 bucket to trigger a Lambda function.
lambda:GetPolicy | Get the Lambda policy when triggers are to be removed.
lambda:RemovePermission | Remove permissions from a Lambda policy.
redshift:DescribeClusters | List all Redshift clusters.
redshift:DescribeLoggingStatus | Get the name of the S3 bucket containing Redshift logs.
s3:GetBucketLogging | Get the name of the S3 bucket containing S3 access logs.
s3:GetBucketLocation | Get the region of the S3 bucket containing S3 access logs.
s3:GetBucketNotification | Get existing Lambda trigger configurations.
s3:ListAllMyBuckets | List all S3 buckets.
s3:PutBucketNotification | Add or remove a Lambda trigger based on S3 bucket events.
logs:PutSubscriptionFilter | Add a Lambda trigger based on CloudWatch Log events.
logs:DeleteSubscriptionFilter | Remove a Lambda trigger based on CloudWatch Log events.
logs:DescribeSubscriptionFilters | List the subscription filters for the specified log group.
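
If you would rather script this step than use the IAM console, the following boto3 sketch attaches the permissions above as an inline policy; the role and policy names are examples, so copy the exact role name from your account.

import json
import boto3

iam = boto3.client("iam")

permissions = [
    "cloudfront:GetDistributionConfig",
    "cloudfront:ListDistributions",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeLoadBalancerAttributes",
    "lambda:AddPermission",
    "lambda:GetPolicy",
    "lambda:RemovePermission",
    "redshift:DescribeClusters",
    "redshift:DescribeLoggingStatus",
    "s3:GetBucketLogging",
    "s3:GetBucketLocation",
    "s3:GetBucketNotification",
    "s3:ListAllMyBuckets",
    "s3:PutBucketNotification",
    "logs:PutSubscriptionFilter",
    "logs:DeleteSubscriptionFilter",
    "logs:DescribeSubscriptionFilters",
]

# Attach the permissions as an inline policy on the role created by the
# Serverless Repository deployment (example role name below).
iam.put_role_policy(
    RoleName="serverlessrepo-Datadog-Log-loglambdaddfunctionRole-xyz",
    PolicyName="DatadogLogCollectionPermissions",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": permissions, "Resource": "*"}],
    }),
)
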
  1. Navigate to the Collect Logs tab in the AWS integration tile.
  2. Select the AWS Account from where you want to collect logs, and enter the ARN of the Lambda created in the previous section.
  3. Check off the services from which you’d like to collect logs and hit save. To stop collecting logs from a particular service, uncheck it.
  4. If you have logs across multiple regions, you must create additional Lambda functions in those regions and enter them in this tile.
  5. To stop collecting all AWS logs, press the x next to each Lambda ARN. All triggers for that function are removed.
  6. Within a few minutes of this initial setup, your AWS Logs appear in your Datadog log explorer page in near real time.

Manually set up triggers

Collecting logs from a CloudWatch Log Group

If you are storing logs in a CloudWatch Log Group, send them to Datadog as follows:

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger on the CloudWatch Log Group that contains your logs in the AWS console (a scripted alternative is sketched below):

Select the corresponding CloudWatch Log Group, add a filter name (but feel free to leave the filter empty) and add the trigger:
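
The same trigger can be added with boto3. This is a sketch: the log group name, account ID, region, and forwarder function name are placeholders.

import boto3

region = "us-east-1"
account_id = "123456789012"
log_group = "/aws/apigateway/my-api-logs"
forwarder_arn = "arn:aws:lambda:%s:%s:function:datadog-log-forwarder" % (region, account_id)

# Allow CloudWatch Logs to invoke the forwarder
boto3.client("lambda", region_name=region).add_permission(
    FunctionName=forwarder_arn,
    StatementId="datadog-cloudwatch-logs-trigger",
    Action="lambda:InvokeFunction",
    Principal="logs.%s.amazonaws.com" % region,
    SourceArn="arn:aws:logs:%s:%s:log-group:%s:*" % (region, account_id, log_group),
)

# Subscribe the forwarder to the log group (an empty filter pattern forwards everything)
boto3.client("logs", region_name=region).put_subscription_filter(
    logGroupName=log_group,
    filterName="datadog-forwarder",
    filterPattern="",
    destinationArn=forwarder_arn,
)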

Once done, go into your Datadog Log section to start exploring your logs!

Collecting logs from S3 buckets

If you are storing logs in an S3 bucket, send them to Datadog as follows:

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger on the S3 bucket that contains your logs in the AWS console (a scripted alternative is sketched after these steps):

  3. Select the bucket and then follow the AWS instructions:

  4. Set the correct event type on S3 buckets:
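
As a scripted alternative, the following boto3 sketch configures the bucket notification and the invoke permission; the bucket name, account ID, region, and forwarder function name are placeholders.

import boto3

region = "us-east-1"
account_id = "123456789012"
bucket = "my-log-bucket"
forwarder_arn = "arn:aws:lambda:%s:%s:function:datadog-log-forwarder" % (region, account_id)

# Allow S3 to invoke the forwarder for this bucket
boto3.client("lambda", region_name=region).add_permission(
    FunctionName=forwarder_arn,
    StatementId="datadog-s3-trigger",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::%s" % bucket,
)

# Trigger the forwarder on every new object in the bucket
boto3.client("s3").put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": forwarder_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)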

Once done, go into your Datadog Log section to start exploring your logs!

Data Collected

Metrics

aws.logs.incoming_bytes (gauge): The volume of log events in uncompressed bytes uploaded to CloudWatch Logs. Shown as byte.
aws.logs.incoming_log_events (count): The number of log events uploaded to CloudWatch Logs. Shown as event.
aws.logs.forwarded_bytes (gauge): The volume of log events in compressed bytes forwarded to the subscription destination. Shown as byte.
aws.logs.forwarded_log_events (count): The number of log events forwarded to the subscription destination. Shown as event.
aws.logs.delivery_errors (count): The number of log events for which CloudWatch Logs received an error when forwarding data to the subscription destination. Shown as event.
aws.logs.delivery_throttling (count): The number of log events for which CloudWatch Logs was throttled when forwarding data to the subscription destination. Shown as event.
aws.events.invocations (count): Measures the number of times a target is invoked for a rule in response to an event. This includes successful and failed invocations but does not include throttled or retried attempts until they fail permanently.
aws.events.failed_invocations (count): Measures the number of invocations that failed permanently. This does not include invocations that are retried or that succeeded after a retry attempt.
aws.events.triggered_rules (count): Measures the number of triggered rules that matched with any event.
aws.events.matched_events (count): Measures the number of events that matched with any rule.
aws.events.throttled_rules (count): Measures the number of triggered rules that are being throttled.
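
Once collected, these metrics can be queried like any other Datadog metric, for example with the datadog Python library. This is a sketch: the API and application keys and the query are placeholders.

import time
from datadog import initialize, api

initialize(api_key="<YOUR_API_KEY>", app_key="<YOUR_APP_KEY>")

# Query the last hour of forwarded log events, summed across all log groups
now = int(time.time())
results = api.Metric.query(
    start=now - 3600,
    end=now,
    query="sum:aws.logs.forwarded_log_events{*}",
)
for series in results.get("series", []):
    print(series["metric"], series["pointlist"][:3])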

Events

Events from AWS are collected on a per AWS-service basis. Please refer to the documentation of specific AWS services to learn more about the events collected.

Troubleshooting

Do you believe you’re seeing a discrepancy between your data in CloudWatch and Datadog?

There are two important distinctions to be aware of:

  1. For counters in AWS, a graph set to ‘sum’ over ‘1 minute’ shows the total number of occurrences during that minute, i.e. the rate per minute. Datadog displays the raw data from AWS normalized to per-second values, regardless of the timeframe selected in AWS, which is why Datadog’s value can appear lower. For example, a counter that sums to 300 over one minute in AWS graphs as 5 per second in Datadog.
  2. Overall, min/max/avg have a different meaning within AWS than in Datadog. In AWS, average latency, minimum latency, and maximum latency are three distinct metrics that AWS collects. When Datadog pulls metrics from AWS CloudWatch, the average latency is received as a single time series per ELB. Within Datadog, when you are selecting ‘min’, ‘max’, or ‘avg’, you are controlling how multiple time series are combined. For example, requesting system.cpu.idle without any filter would return one series for each host that reports that metric and those series need to be combined to be graphed. On the other hand, if you requested system.cpu.idle from a single host, no aggregation would be necessary and switching between average and max would yield the same result.

Metrics delayed?

When using the AWS integration, Datadog pulls in your metrics via the CloudWatch API. You may see a slight delay in metrics from AWS due to some constraints that exist for their API.

To begin, the CloudWatch API only offers a metric-by-metric crawl to pull data. The CloudWatch APIs have a rate limit that varies based on the combination of authentication credentials, region, and service. Metrics are made available by AWS depending on the account level. For example, if you are paying for “detailed metrics” within AWS, they are available more quickly. This level of service for detailed metrics also applies to granularity, with some metrics being available per minute and others per five minutes.

Datadog has the ability to prioritize certain metrics within an account to pull them in faster, depending on the circumstances. Please contact Datadog support for more info.

To obtain metrics with virtually zero delay, install the Datadog Agent on the host. For more information, see Datadog’s blog post Don’t fear the Agent: Agent-based monitoring.

Missing metrics?

CloudWatch’s API returns only metrics with data points. So if, for instance, an ELB has no attached instances, no metrics related to this ELB appear in Datadog.

Wrong count of aws.elb.healthy_host_count?

When the cross-zone load balancing option is enabled on an ELB, all the instances attached to this ELB are considered part of all availability zones (on CloudWatch’s side), so if you have 2 instances in 1a and 3 in 1b, the metric displays 5 instances per availability zone. As this can be counterintuitive, we’ve added new metrics, aws.elb.healthy_host_count_deduped and aws.elb.un_healthy_host_count_deduped, that display the count of healthy and unhealthy instances per availability zone, regardless of whether the cross-zone load balancing option is enabled.

Duplicated hosts when installing the Agent?

When installing the Agent on an AWS host, you might see duplicated hosts on the infra page for a few hours if you manually set the hostname in the Agent’s configuration. This second host disappears a few hours later, and won’t affect your billing.