AWS

Overview

Connect to Amazon Web Services (AWS) to:

  • See automatic AWS status updates in your stream
  • Get CloudWatch metrics for EC2 hosts without installing the Agent
  • Tag your EC2 hosts with EC2-specific information (e.g. availability zone)
  • See EC2 scheduled maintenance events in your stream
  • Collect CloudWatch metrics and events from many other AWS products

Related integrations include:

| Integration | Description |
|---|---|
| API Gateway | Create, publish, maintain, and secure APIs |
| Appstream | Fully managed application streaming on AWS |
| AppSync | A GraphQL service with real-time data synchronization and offline programming features |
| Athena | Serverless interactive query service |
| Autoscaling | Scale EC2 capacity |
| Billing | Billing and budgets |
| CloudFront | Global content delivery network |
| CloudHSM | Managed hardware security module (HSM) |
| CloudSearch | Managed search service |
| CloudTrail | Access to log files and AWS API calls |
| CodeBuild | Fully managed build service |
| CodeDeploy | Automate code deployments |
| Cognito | Secure user sign-up and sign-in |
| Connect | A self-service, cloud-based contact center service |
| Direct Connect | Dedicated network connection to AWS |
| DMS | Database Migration Service |
| DocumentDB | MongoDB-compatible database |
| DynamoDB | NoSQL database |
| EBS (Elastic Block Store) | Persistent block-level storage volumes |
| EC2 (Elastic Compute Cloud) | Resizable compute capacity in the cloud |
| EC2 Spot | Take advantage of unused EC2 capacity |
| ECS (Elastic Container Service) | Container management service that supports Docker containers |
| EFS (Elastic File System) | Shared file storage |
| EKS | Elastic Container Service for Kubernetes |
| Elastic Transcoder | Media and video transcoding in the cloud |
| ElastiCache | In-memory cache in the cloud |
| Elastic Beanstalk | Service for deploying and scaling web applications and services |
| ELB (Elastic Load Balancing) | Distributes incoming application traffic across multiple Amazon EC2 instances |
| EMR (Elastic MapReduce) | Data processing using Hadoop |
| ES (Elasticsearch Service) | Deploy, operate, and scale Elasticsearch clusters |
| Firehose | Capture and load streaming data |
| GameLift | Dedicated game server hosting |
| Glue | Extract, transform, and load data for analytics |
| GuardDuty | Intelligent threat detection |
| Health | Visibility into the state of your AWS resources, services, and accounts |
| Inspector | Automated security assessment |
| IoT (Internet of Things) | Connect IoT devices with cloud services |
| Kinesis | Service for real-time processing of large, distributed data streams |
| KMS (Key Management Service) | Create and control encryption keys |
| Lambda | Serverless computing |
| Lex | Build conversation bots |
| Machine Learning | Create machine learning models |
| MediaConnect | Transport for live video |
| MediaConvert | Video processing for broadcast and multiscreen delivery |
| MediaPackage | Prepare and protect video for delivery over the internet |
| MediaTailor | Scalable server-side ad insertion |
| MQ | Managed message broker for ActiveMQ |
| Managed Streaming for Kafka | Build and run applications that use Apache Kafka to process streaming data |
| NAT Gateway | Enable instances in a private subnet to connect to the internet or other AWS services |
| Neptune | Fast, reliable graph database built for the cloud |
| OpsWorks | Configuration management |
| Polly | Text-to-speech service |
| RDS (Relational Database Service) | Relational database in the cloud |
| Redshift | Data warehouse solution |
| Rekognition | Image and video analysis for applications |
| Route 53 | DNS and traffic management with availability monitoring |
| S3 (Simple Storage Service) | Highly available and scalable cloud storage service |
| SageMaker | Machine learning models and algorithms |
| SES (Simple Email Service) | Cost-effective, outbound-only email-sending service |
| SNS (Simple Notification Service) | Alerts and notifications |
| SQS (Simple Queue Service) | Messaging queue service |
| Storage Gateway | Hybrid cloud storage |
| SWF (Simple Workflow Service) | Cloud workflow management |
| VPC (Virtual Private Cloud) | Launch AWS resources into a virtual network |
| WAF (Web Application Firewall) | Protect web applications from common web exploits |
| Workspaces | Secure desktop computing service |
| X-Ray | Tracing for distributed applications |

Setup

Installation

Setting up the Datadog integration with Amazon Web Services requires configuring role delegation using AWS IAM. To get a better understanding of role delegation, refer to the AWS IAM Best Practices guide.

The GovCloud and China regions do not currently support IAM role delegation. If you are deploying in these regions, skip to the configuration section below.
  1. Create a new role in the AWS IAM Console.
  2. Select Another AWS account for the Role Type.
  3. For Account ID, enter 464622532012 (Datadog’s account ID). This grants Datadog read-only access to your AWS data.
  4. Select Require external ID and enter the one generated in the Datadog app. Make sure you leave Require MFA disabled. For more information about the External ID, refer to this document in the IAM User Guide.
  5. Click Next: Permissions.
  6. If you’ve already created the policy, search for it on this page and select it, then skip to step 12. Otherwise, click Create Policy, which opens in a new window.
  7. Select the JSON tab. To take advantage of every AWS integration offered by Datadog, paste the policy snippet below into the textbox. As other components are added to an integration, these permissions may change.
  8. Click Review policy.
  9. Name the policy DatadogAWSIntegrationPolicy or one of your own choosing, and provide an apt description.
  10. Click Create policy. You can now close this window.
  11. Back in the “Create role” window, refresh the list of policies and select the policy you just created.
  12. Click Next: Review.
  13. Give the role a name such as DatadogAWSIntegrationRole, as well as an apt description. Click Create Role.
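Steps 2 through 4 above configure the role's trust relationship. As a sketch, the resulting trust policy typically looks like the JSON built below; the external ID here is a placeholder for the value generated in the Datadog app, and the Datadog account ID is the one entered in step 3:

```python
import json

DATADOG_ACCOUNT_ID = "464622532012"  # Datadog's account ID, from step 3

def build_trust_policy(external_id):
    """Sketch the assume-role trust policy that steps 2-4 configure in the console."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATADOG_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # External ID generated in the Datadog app (step 4); placeholder here
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }

policy = build_trust_policy("YOUR_EXTERNAL_ID")
print(json.dumps(policy, indent=4))
```

The external ID condition is what prevents the confused-deputy problem described in the IAM User Guide: Datadog can only assume the role when it presents the ID generated for your account.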

Bonus: If you use Terraform, set up your Datadog IAM policy with The AWS Integration with Terraform.

Datadog AWS IAM Policy

The permissions listed below are included in the Policy Document using wildcards such as List* and Get*. If you require strict policies, use the complete action names as listed and reference the Amazon API documentation for the services you require.

If you are not comfortable with granting all permissions, at the very least use the existing policies named AmazonEC2ReadOnlyAccess and CloudWatchReadOnlyAccess. For more detailed information regarding permissions, see the Core Permissions tab.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "apigateway:GET",
                "autoscaling:Describe*",
                "budgets:ViewBudget",
                "cloudfront:GetDistributionConfig",
                "cloudfront:ListDistributions",
                "cloudtrail:DescribeTrails",
                "cloudtrail:GetTrailStatus",
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "codedeploy:List*",
                "codedeploy:BatchGet*",
                "directconnect:Describe*",
                "dynamodb:List*",
                "dynamodb:Describe*",
                "ec2:Describe*",
                "ecs:Describe*",
                "ecs:List*",
                "elasticache:Describe*",
                "elasticache:List*",
                "elasticfilesystem:DescribeFileSystems",
                "elasticfilesystem:DescribeTags",
                "elasticloadbalancing:Describe*",
                "elasticmapreduce:List*",
                "elasticmapreduce:Describe*",
                "es:ListTags",
                "es:ListDomainNames",
                "es:DescribeElasticsearchDomains",
                "health:DescribeEvents",
                "health:DescribeEventDetails",
                "health:DescribeAffectedEntities",
                "kinesis:List*",
                "kinesis:Describe*",
                "lambda:AddPermission",
                "lambda:GetPolicy",
                "lambda:List*",
                "lambda:RemovePermission",
                "logs:TestMetricFilter",
                "logs:PutSubscriptionFilter",
                "logs:DeleteSubscriptionFilter",
                "logs:DescribeSubscriptionFilters",
                "rds:Describe*",
                "rds:List*",
                "redshift:DescribeClusters",
                "redshift:DescribeLoggingStatus",
                "route53:List*",
                "s3:GetBucketLogging",
                "s3:GetBucketLocation",
                "s3:GetBucketNotification",
                "s3:GetBucketTagging",
                "s3:ListAllMyBuckets",
                "s3:PutBucketNotification",
                "ses:Get*",
                "sns:List*",
                "sns:Publish",
                "sqs:ListQueues",
                "states:ListStateMachines",
                "states:DescribeStateMachine",
                "support:*",
                "tag:GetResources",
                "tag:GetTagKeys",
                "tag:GetTagValues",
                "xray:BatchGetTraces",
                "xray:GetTraceSummaries"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

The core Datadog AWS integration pulls data from AWS CloudWatch. At a minimum, your Policy Document needs to allow the following actions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ec2:Describe*",
                "support:*",
                "tag:GetResources",
                "tag:GetTagKeys",
                "tag:GetTagValues"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
| AWS Permission | Description |
|---|---|
| cloudwatch:ListMetrics | List the available CloudWatch metrics. |
| cloudwatch:GetMetricData | Fetch data points for a given metric. |
| support:* | Add metrics about service limits. Full access is required because of AWS limitations. |
| tag:GetResources | Get custom tags by resource type. |
| tag:GetTagKeys | Get tag keys by region within an AWS account. |
| tag:GetTagValues | Get tag values by region within an AWS account. |

The main use of the Resource Group Tagging API is to reduce the number of API calls needed to collect custom tags. For more information, review the Tag policies documentation on the AWS website.
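As a quick sanity check, you can match a concrete action name against the wildcard patterns a policy grants. A sketch using Python's fnmatchcase, with the Action list from the minimal policy above (real IAM evaluation also considers Effect, Resource, and Condition, so this is an approximation):

```python
from fnmatch import fnmatchcase

# Action list from the minimal ("core") policy above.
CORE_ACTIONS = [
    "cloudwatch:Get*",
    "cloudwatch:List*",
    "ec2:Describe*",
    "support:*",
    "tag:GetResources",
    "tag:GetTagKeys",
    "tag:GetTagValues",
]

def is_allowed(action, patterns=CORE_ACTIONS):
    """Return True if `action` matches any wildcard pattern in the Action list."""
    return any(fnmatchcase(action, pattern) for pattern in patterns)

print(is_allowed("cloudwatch:GetMetricData"))  # covered by cloudwatch:Get*
print(is_allowed("ec2:TerminateInstances"))    # not covered: only ec2:Describe* is granted
```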

Configuration

If you are using role delegation:

  1. Open the AWS integration tile.
  2. Select the Role Delegation tab.
  3. Enter your AWS Account ID without dashes, e.g. 123456789012, not 1234-5678-9012. Your Account ID can be found in the ARN of the role created during the installation of the AWS integration. Then enter the name of the created role. Note: The role name you enter in the integration tile is case sensitive and must exactly match the role name created on the AWS side.
  4. Choose the services you want to collect metrics for on the left side of the dialog. You can optionally add tags to all hosts and metrics. If you want to monitor only a subset of EC2 instances on AWS, tag them and specify the tag in the limit textbox.
  5. Click Install Integration.

If you are using access keys (GovCloud or China only):

  1. Open the AWS integration tile.
  2. Select the Access Keys (GovCloud or China Only) tab.
  3. Enter your AWS Access Key and AWS Secret Key. Only access and secret keys for GovCloud and China are accepted.
  4. Choose the services you want to collect metrics for on the left side of the dialog. You can optionally add tags to all hosts and metrics. If you want to monitor only a subset of EC2 instances on AWS, tag them and specify the tag in the limit textbox.
  5. Click Install Integration.
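The Account ID field expects the twelve digits without dashes. A small helper to normalize and validate the value before pasting it into the tile (a sketch, not part of the Datadog UI):

```python
import re

def normalize_account_id(raw):
    """Strip dashes and whitespace, then validate a 12-digit AWS account ID."""
    digits = re.sub(r"[-\s]", "", raw)
    if not re.fullmatch(r"\d{12}", digits):
        raise ValueError(f"not a 12-digit AWS account ID: {raw!r}")
    return digits

print(normalize_account_id("1234-5678-9012"))  # → 123456789012
```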

Log collection

AWS service logs are collected with the Datadog Forwarder Lambda function. This Lambda, which triggers on S3 buckets, CloudWatch Log Groups, and CloudWatch Events, forwards logs to Datadog.

To start collecting logs from your AWS services:

  1. Set up the Datadog Forwarder Lambda function.
  2. Enable logging for your AWS service (most AWS services can log to an S3 bucket or CloudWatch Log Group).
  3. Configure the triggers that cause the Lambda to execute. There are two ways to configure the triggers:

    • automatically: Datadog automatically retrieves the logs for the selected AWS services and adds them as triggers on the Datadog Lambda function. Datadog also keeps the list up to date.
    • manually: Set up each trigger yourself via the AWS console.

Set up the Datadog Lambda function

Deploy the Datadog Forwarder Lambda function to your AWS account by following the instructions in the DataDog/datadog-serverless-functions Github repository.

Enable logging for your AWS service

Any AWS service that generates logs into a S3 bucket or a CloudWatch Log Group is supported. Find specific setup instructions for the most used services in the table below:

| AWS service | Activate AWS service logging | Send AWS logs to Datadog |
|---|---|---|
| API Gateway | Enable AWS API Gateway logs | Manual log collection |
| CloudFront | Enable AWS CloudFront logs | Manual and automatic log collection |
| CloudTrail | Enable AWS CloudTrail logs | Manual log collection |
| DynamoDB | Enable AWS DynamoDB logs | Manual log collection |
| EC2 | - | Use the Datadog Agent to send your logs to Datadog |
| ECS | - | Use the Docker Agent to gather your logs |
| Elastic Load Balancing (ELB) | Enable AWS ELB logs | Manual and automatic log collection |
| Lambda | - | Manual and automatic log collection |
| RDS | Enable AWS RDS logs | Manual log collection |
| Route 53 | Enable AWS Route 53 logs | Manual log collection |
| S3 | Enable AWS S3 logs | Manual and automatic log collection |
| SNS | SNS does not emit service logs; process the logs and events transiting through SNS instead. | Manual log collection |
| Redshift | Enable AWS Redshift logs | Manual and automatic log collection |
| VPC | Enable AWS VPC logs | Manual log collection |
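The support levels in the table above can be encoded for quick lookup, for example when deciding whether a source needs a manually configured trigger. A sketch (service names and modes transcribed from the table; not a Datadog API):

```python
# Trigger-configuration modes per AWS service, as listed in the table above.
AUTOMATIC_AND_MANUAL = {"CloudFront", "ELB", "Lambda", "S3", "Redshift"}
MANUAL_ONLY = {"API Gateway", "CloudTrail", "DynamoDB", "RDS", "Route 53", "SNS", "VPC"}

def collection_modes(service):
    """Return the log-collection modes available for a service via the Forwarder."""
    if service in AUTOMATIC_AND_MANUAL:
        return ["manual", "automatic"]
    if service in MANUAL_ONLY:
        return ["manual"]
    return []  # e.g. EC2/ECS ship logs through the Agent, not the Forwarder

print(collection_modes("S3"))   # ['manual', 'automatic']
print(collection_modes("VPC"))  # ['manual']
```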

Send AWS service logs to Datadog

There are two options when configuring triggers on the Datadog Lambda function:

  • Manually set up triggers on S3 buckets, Cloudwatch Log Groups, or Cloudwatch Events.
  • Let Datadog automatically set and manage the list of triggers.

Automatically set up triggers

If you are storing logs in many S3 buckets or Cloudwatch Log groups, Datadog can automatically manage triggers for you.

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Ensure the policy of the IAM role used for the Datadog-AWS integration has the following permissions. Information on how these permissions are used can be found in the descriptions below:

    "cloudfront:GetDistributionConfig",
    "cloudfront:ListDistributions",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeLoadBalancerAttributes",
    "lambda:AddPermission",
    "lambda:GetPolicy",
    "lambda:RemovePermission",
    "redshift:DescribeClusters",
    "redshift:DescribeLoggingStatus",
    "s3:GetBucketLogging",
    "s3:GetBucketLocation",
    "s3:GetBucketNotification",
    "s3:ListAllMyBuckets",
    "s3:PutBucketNotification",
    "logs:PutSubscriptionFilter",
    "logs:DeleteSubscriptionFilter",
    "logs:DescribeSubscriptionFilters"
    | AWS Permission | Description |
    |---|---|
    | cloudfront:GetDistributionConfig | Get the name of the S3 bucket containing CloudFront access logs. |
    | cloudfront:ListDistributions | List all CloudFront distributions. |
    | elasticloadbalancing:DescribeLoadBalancers | List all load balancers. |
    | elasticloadbalancing:DescribeLoadBalancerAttributes | Get the name of the S3 bucket containing ELB access logs. |
    | lambda:AddPermission | Add permission allowing a particular S3 bucket to trigger a Lambda function. |
    | lambda:GetPolicy | Get the Lambda policy when triggers are to be removed. |
    | lambda:RemovePermission | Remove permissions from a Lambda policy. |
    | redshift:DescribeClusters | List all Redshift clusters. |
    | redshift:DescribeLoggingStatus | Get the name of the S3 bucket containing Redshift logs. |
    | s3:GetBucketLogging | Get the name of the S3 bucket containing S3 access logs. |
    | s3:GetBucketLocation | Get the region of the S3 bucket containing S3 access logs. |
    | s3:GetBucketNotification | Get existing Lambda trigger configurations. |
    | s3:ListAllMyBuckets | List all S3 buckets. |
    | s3:PutBucketNotification | Add or remove a Lambda trigger based on S3 bucket events. |
    | logs:PutSubscriptionFilter | Add a Lambda trigger based on CloudWatch Log events. |
    | logs:DeleteSubscriptionFilter | Remove a Lambda trigger based on CloudWatch Log events. |
    | logs:DescribeSubscriptionFilters | List the subscription filters for the specified log group. |
  3. Navigate to the Collect Logs tab in the AWS Integration tile.

  4. Select the AWS Account from where you want to collect logs, and enter the ARN of the Lambda created in the previous section.

  5. Select the services from which you’d like to collect logs and hit save. To stop collecting logs from a particular service, uncheck it.

  6. If you have logs across multiple regions, you must create additional Lambda functions in those regions and enter them in this tile.

  7. To stop collecting all AWS logs, press the x next to each Lambda ARN. All triggers for that function are removed.

  8. Within a few minutes of this initial setup, your AWS Logs appear in your Datadog log explorer page in near real time.

Manually set up triggers

Collecting logs from CloudWatch Log Groups

If you are storing logs in a CloudWatch Log Group, send them to Datadog as follows:

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger on the CloudWatch Log Group that contains your logs in the AWS console.

Select the corresponding CloudWatch Log Group, add a filter name (but feel free to leave the filter empty) and add the trigger:

Once done, go into your Datadog Log section to start exploring your logs!

Collecting logs from S3 buckets

If you are storing logs in an S3 bucket, send them to Datadog as follows:

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger on the S3 bucket that contains your logs in the AWS console.

  3. Select the bucket and then follow the AWS instructions:

  4. Set the correct event type on S3 buckets:

Once done, go into your Datadog Log section to start exploring your logs!

Data Collected

Metrics

| Metric | Type | Description | Shown as |
|---|---|---|---|
| aws.logs.incoming_bytes | gauge | The volume of log events in uncompressed bytes uploaded to CloudWatch Logs. | byte |
| aws.logs.incoming_log_events | count | The number of log events uploaded to CloudWatch Logs. | event |
| aws.logs.forwarded_bytes | gauge | The volume of log events in compressed bytes forwarded to the subscription destination. | byte |
| aws.logs.forwarded_log_events | count | The number of log events forwarded to the subscription destination. | event |
| aws.logs.delivery_errors | count | The number of log events for which CloudWatch Logs received an error when forwarding data to the subscription destination. | event |
| aws.logs.delivery_throttling | count | The number of log events for which CloudWatch Logs was throttled when forwarding data to the subscription destination. | event |
| aws.events.invocations | count | Measures the number of times a target is invoked for a rule in response to an event. This includes successful and failed invocations but does not include throttled or retried attempts until they fail permanently. | - |
| aws.events.failed_invocations | count | Measures the number of invocations that failed permanently. This does not include invocations that are retried or that succeeded after a retry attempt. | - |
| aws.events.triggered_rules | count | Measures the number of triggered rules that matched with any event. | - |
| aws.events.matched_events | count | Measures the number of events that matched with any rule. | - |
| aws.events.throttled_rules | count | Measures the number of triggered rules that are being throttled. | - |
| aws.usage.call_count | count | The number of specified operations performed in your account. | operation |
| aws.usage.resource_count | count | The number of specified resources in your account. | resource |

Events

Events from AWS are collected on a per AWS-service basis. Please refer to the documentation of specific AWS services to learn more about the events collected.

Troubleshooting

Discrepancy between your data in CloudWatch and Datadog

There are two important distinctions to be aware of:

  1. For counters in AWS, a graph set to ‘sum’ over ‘1 minute’ shows the total number of occurrences during the minute leading up to that point, i.e. the rate per 1 minute. Datadog displays the raw data from AWS normalized to per-second values, regardless of the time frame selected in AWS. This is why Datadog’s value can appear lower.
  2. Overall, min/max/avg have a different meaning within AWS than in Datadog. In AWS, average latency, minimum latency, and maximum latency are three distinct metrics that AWS collects. When Datadog pulls metrics from AWS CloudWatch, average latency is received as a single time series per ELB. Within Datadog, selecting ‘min’, ‘max’, or ‘avg’ controls how multiple time series are combined. For example, requesting system.cpu.idle without any filter returns one series for each host that reports that metric, and those series must be combined to be graphed. On the other hand, requesting system.cpu.idle from a single host requires no aggregation, and switching between ‘avg’ and ‘max’ yields the same result.
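Point 1 can be made concrete: a CloudWatch counter graphed as a 1-minute sum is a per-minute total, which Datadog displays as a per-second rate by dividing by 60. A sketch of the arithmetic (not Datadog's actual pipeline):

```python
def cloudwatch_sum_to_datadog_rate(sum_per_minute):
    """Convert a CloudWatch 1-minute sum into the per-second rate Datadog displays."""
    return sum_per_minute / 60.0

# 300 requests counted over one minute in CloudWatch...
aws_sum = 300
# ...appear as a 5/s rate in Datadog, which looks "lower" than the CloudWatch graph.
print(cloudwatch_sum_to_datadog_rate(aws_sum))  # → 5.0
```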

Metrics delayed

When using the AWS integration, Datadog pulls in your metrics via the CloudWatch API. You may see a slight delay in metrics from AWS due to some constraints that exist for their API.

To begin, the CloudWatch API only offers a metric-by-metric crawl to pull data. The CloudWatch APIs have a rate limit that varies based on the combination of authentication credentials, region, and service. Metrics are made available by AWS depending on the account level. For example, if you are paying for “detailed metrics” within AWS, they are available sooner. This level of service for detailed metrics also applies to granularity, with some metrics being available per minute and others per five minutes.

Datadog has the ability to prioritize certain metrics within an account to pull them in faster, depending on the circumstances. Please contact Datadog support for more info.

To obtain metrics with virtually zero delay, install the Datadog Agent on the host. For more information, see Datadog’s blog post Don’t fear the Agent: Agent-based monitoring.

Missing metrics

CloudWatch’s API returns only metrics with data points. If, for instance, an ELB has no attached instances, you should not expect to see metrics for that ELB in Datadog.

Wrong count of aws.elb.healthy_host_count

When the cross-zone load balancing option is enabled on an ELB, all the instances attached to this ELB are considered part of all availability zones (on CloudWatch’s side), so if you have 2 instances in 1a and 3 in 1b, the metric displays 5 instances per availability zone. As this can be counterintuitive, we’ve added new metrics, aws.elb.healthy_host_count_deduped and aws.elb.un_healthy_host_count_deduped, that display the count of healthy and unhealthy instances per availability zone, regardless of whether the cross-zone load balancing option is enabled.
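To make the cross-zone case concrete: with 2 instances in 1a and 3 in 1b, CloudWatch reports 5 per zone, while the deduped view recovers the real per-zone counts. A sketch of the arithmetic only (not Datadog's implementation; the AZ names are illustrative):

```python
def healthy_host_counts(instances_per_az, cross_zone=True):
    """Per-AZ healthy host counts as CloudWatch reports them for an ELB.

    With cross-zone load balancing enabled, every attached instance is
    counted in every availability zone; otherwise each zone reports only
    its own instances (the "deduped" view).
    """
    total = sum(instances_per_az.values())
    if cross_zone:
        return {az: total for az in instances_per_az}
    return dict(instances_per_az)

instances = {"us-east-1a": 2, "us-east-1b": 3}
print(healthy_host_counts(instances))                    # 5 reported in each zone
print(healthy_host_counts(instances, cross_zone=False))  # the deduped per-zone counts
```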

Duplicated hosts when installing the Agent

When installing the Agent on an AWS host, you might see duplicated hosts on the infrastructure page for a few hours if you manually set the hostname in the Agent’s configuration. The duplicate disappears a few hours later and does not affect your billing.