AWS ECS

Overview

Amazon EC2 Container Service (ECS) is a highly scalable, high-performance container management service for Docker containers running on EC2 instances.

Setup

This documentation page covers AWS ECS setup with Datadog Agent v6. To set it up with Datadog Agent v5, refer to the dedicated AWS ECS with Agent v5 setup documentation page.

Installation

To monitor your ECS containers and tasks with Datadog, run the Agent as a container on every EC2 instance in your ECS cluster. As detailed below, there are a few setup steps:

  1. Create an ECS Task
  2. Create or Modify your IAM Policy
  3. Create a new Instance with a User Script

This guide assumes you already have a working EC2 Container Service cluster. If not, review the Getting Started section of the ECS documentation.

Create an ECS Task

This task launches the Datadog container. When you need to modify the configuration, update this Task Definition as described further down in this guide.

You can configure the task using either the AWS CLI tools or the AWS web console.

AWS CLI
  1. Download datadog-agent-ecs.json.
  2. Edit datadog-agent-ecs.json and set DD_API_KEY to your Datadog API key.
  3. Execute the following command: aws ecs register-task-definition --cli-input-json file://path/to/datadog-agent-ecs.json
Web UI
  1. Log in to your AWS Console and navigate to the EC2 Container Service section.
  2. Click on the cluster you wish to add Datadog to.
  3. Click on Task Definitions on the left side and click the button Create new Task Definition.
  4. Enter a Task Definition Name, such as datadog-agent-task.
  5. Click on the Add volume link.
  6. For Name enter docker_sock. For Source Path, enter /var/run/docker.sock. Click Add.
  7. Add another volume with the name proc and source path of /proc/.
  8. Add another volume with the name cgroup and source path of /cgroup/.
  9. Click the large Add container button.
  10. For Container name enter datadog-agent.
  11. For Image enter datadog/agent:latest.
  12. For Maximum memory enter 256.
  13. Scroll down to the Advanced container configuration section and enter 10 in CPU units.
  14. For Env Variables, add a Key of DD_API_KEY and enter your Datadog API key as the value. If you prefer to store secrets like this in S3, see the ECS Configuration guide.
  15. Optionally, add another environment variable with the key DD_TAGS for any tags you want to add.
  16. Scroll down to the Storage and Logging section.
  17. In Mount points select the docker_sock source volume and enter /var/run/docker.sock in the Container path. Check the Read only checkbox.
  18. Add another mount point for proc and enter /host/proc/ in the Container path. Check the Read only checkbox.
  19. Add a third mount point for cgroup and enter /host/sys/fs/cgroup in the Container path. Check the Read only checkbox.
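For reference, the Web UI steps above should yield a task definition roughly equivalent to the JSON printed by the sketch below, which could then be registered with the aws ecs register-task-definition command from the AWS CLI option. This is a hedged sketch, not the contents of datadog-agent-ecs.json: the function name dd_task_definition is hypothetical, "essential": true mirrors the console default, and the field names are assumed to follow the ECS task definition format (containerDefinitions, mountPoints, volumes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ECS task definition mirroring the Web UI steps.
# Usage: dd_task_definition <api_key> [tags]
# Assumes the API key and tags contain no double quotes or backslashes,
# because no JSON escaping is performed.
dd_task_definition() {
    local api_key="$1" tags="${2:-}"
    cat <<EOF
{
  "family": "datadog-agent-task",
  "containerDefinitions": [
    {
      "name": "datadog-agent",
      "image": "datadog/agent:latest",
      "memory": 256,
      "cpu": 10,
      "essential": true,
      "environment": [
        { "name": "DD_API_KEY", "value": "${api_key}" },
        { "name": "DD_TAGS", "value": "${tags}" }
      ],
      "mountPoints": [
        { "sourceVolume": "docker_sock", "containerPath": "/var/run/docker.sock", "readOnly": true },
        { "sourceVolume": "proc", "containerPath": "/host/proc/", "readOnly": true },
        { "sourceVolume": "cgroup", "containerPath": "/host/sys/fs/cgroup", "readOnly": true }
      ]
    }
  ],
  "volumes": [
    { "name": "docker_sock", "host": { "sourcePath": "/var/run/docker.sock" } },
    { "name": "proc", "host": { "sourcePath": "/proc/" } },
    { "name": "cgroup", "host": { "sourcePath": "/cgroup/" } }
  ]
}
EOF
}
```

For example, dd_task_definition "$DD_API_KEY" "env:prod" > datadog-agent-task.json writes the file, which you would then pass to aws ecs register-task-definition --cli-input-json file://datadog-agent-task.json.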

Create or Modify your IAM Policy

  1. Add these permissions to your Datadog IAM policy to collect Amazon ECS metrics:

    • ecs:ListClusters: List available clusters.
    • ecs:ListContainerInstances: List instances of a cluster.
    • ecs:DescribeContainerInstances: Describe instances to add metrics on resources and running tasks; adds a cluster tag to EC2 instances.

For more information on ECS policies, review the documentation on the AWS website.
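As a sketch, the three permissions above could be granted with a policy statement like the one this helper prints, merged into the policy attached to your existing Datadog integration role. The function name is hypothetical, and the wildcard "Resource": "*" is an assumption you may want to narrow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an IAM policy document granting the ECS read
# permissions that the Datadog AWS integration uses to collect ECS metrics.
dd_ecs_integration_policy() {
    cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:ListClusters",
        "ecs:ListContainerInstances",
        "ecs:DescribeContainerInstances"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```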

  1. Using the Identity and Access Management (IAM) console, create a new role called datadog-agent-ecs.
  2. Select Amazon EC2 Role for EC2 Container Service. On the next screen do not check any checkboxes and click Next Step.
  3. Click Create Role.
  4. Click on the newly created role.
  5. Expand the Inline Policies section. Click the link to create a new inline policy.
  6. Choose Custom Policy and press the button.
  7. For Policy Name enter datadog-agent-policy. Copy the following text into the Policy Document:
   {
     "Version": "2012-10-17",
     "Statement": [
         {
             "Effect": "Allow",
             "Action": [
                 "ecs:RegisterContainerInstance",
                 "ecs:DeregisterContainerInstance",
                 "ecs:DiscoverPollEndpoint",
                 "ecs:Submit*",
                 "ecs:Poll",
                 "ecs:StartTask",
                 "ecs:StartTelemetrySession"
             ],
             "Resource": [
                 "*"
             ]
         }
     ]
   }
  8. Click Create Policy.
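If you prefer the AWS CLI to the IAM console, the steps above could be sketched as follows. This assumes the policy document above has been saved as datadog-agent-policy.json (an illustrative file name) and that your CLI credentials may manage IAM; the function names are hypothetical. Unlike the console, the CLI does not create an instance profile for you, so the sketch creates one with the same name as the role.

```shell
#!/usr/bin/env bash
# Trust policy that allows EC2 instances to assume the role.
ec2_trust_policy() {
    cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Hypothetical CLI equivalent of the console steps: create the role, attach
# the inline policy, and expose the role to EC2 through an instance profile.
create_agent_role() {
    local role="datadog-agent-ecs"
    aws iam create-role --role-name "$role" \
        --assume-role-policy-document "$(ec2_trust_policy)"
    aws iam put-role-policy --role-name "$role" \
        --policy-name datadog-agent-policy \
        --policy-document file://datadog-agent-policy.json
    aws iam create-instance-profile --instance-profile-name "$role"
    aws iam add-role-to-instance-profile \
        --instance-profile-name "$role" --role-name "$role"
}
```

Calling create_agent_role once performs all four steps; nothing runs when the functions are merely defined.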

Create a new instance including a startup script

Ideally, the Datadog Agent runs in one container on each EC2 instance. The easiest way to achieve this is to have a startup script on each instance. Because there is no way to add a startup script to an existing instance, you need to create a new instance and add it to your ECS cluster.

Create a new Amazon Linux instance
  1. Log in to the AWS console and navigate to the EC2 section.
  2. Create a new instance by clicking the Launch Instance button.
  3. Click Community AMIs. Look up the current ECS-optimized AMI for your region in the AWS documentation, copy its ID into the search box, and choose the AMI that comes up as a result of the search.
  4. Follow the prompts as you normally would when setting up an instance.
  5. On the third dialog, select the IAM role you created above.
  6. Expand the Advanced Details section and copy the following script into the User Data section. Set cluster to your cluster’s name and task_def to the name you gave your task definition.
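#!/bin/bash
set -o pipefail

cluster="MY_CLUSTER"          # Enter your cluster name here
task_def="datadog-agent-task" # Enter your task definition name here

# Abort if /etc/ecs does not exist, i.e. this is not an ECS-optimized instance
touch /etc/ecs/ecs.config || {
    echo "Error: it seems like we are not running on an ECS-optimized instance" >&2
    exit 2
}
set -ex

# Join the cluster and start the ECS agent
echo "ECS_CLUSTER=$cluster" >> /etc/ecs/ecs.config
start ecs
yum install -y aws-cli jq

# Wait (up to ~2.5 minutes) for the ECS agent to report this instance's ARN
for _ in $(seq 1 30); do
    if instance_arn=$(curl -sf http://localhost:51678/v1/metadata | jq -re .ContainerInstanceArn | awk -F/ '{print $NF}'); then break; fi
    instance_arn=""
    sleep 5
done
[ -n "$instance_arn" ] || { echo "Error: the ECS agent did not report a container instance ARN" >&2; exit 3; }

# Derive the region by dropping the trailing zone letter from the availability zone
az=$(curl -sf http://169.254.169.254/latest/meta-data/placement/availability-zone)
region=${az:0:${#az} - 1}

# Have rc.local start the Datadog Agent task at boot and on every reboot
echo "aws ecs start-task --cluster $cluster --task-definition $task_def \
--container-instances $instance_arn --region $region" >> /etc/rc.local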

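The user script above:

  • Registers the instance with your cluster and starts the ECS agent.
  • Adds an aws ecs start-task command with the right parameters to /etc/rc.local, so that the Datadog Agent task is started when the instance boots and again after every reboot.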
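If you prefer the AWS CLI to the console, the launch steps above could be sketched as follows. This is a hedged sketch: it assumes the user script above is saved as user-data.sh, that the instance profile is named datadog-agent-ecs, that the public SSM parameter for the ECS-optimized Amazon Linux AMI is still published in your region, and that t2.micro is an acceptable size. The function names are hypothetical, and a real launch usually also needs options such as --subnet-id, --security-group-ids, or --key-name.

```shell
#!/usr/bin/env bash
# Look up the recommended ECS-optimized Amazon Linux AMI ID for a region.
# The user script relies on upstart's "start ecs", which matches the original
# Amazon Linux ECS-optimized AMI rather than Amazon Linux 2 (systemd).
ecs_ami_id() {
    aws ssm get-parameters --region "$1" \
        --names /aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id \
        --query 'Parameters[0].Value' --output text
}

# Hypothetical launch helper: one instance with the agent role and user data.
# The AWS CLI base64-encodes the file passed to --user-data automatically.
launch_agent_instance() {
    local region="$1"
    aws ec2 run-instances --region "$region" \
        --image-id "$(ecs_ami_id "$region")" \
        --instance-type t2.micro \
        --iam-instance-profile Name=datadog-agent-ecs \
        --user-data file://user-data.sh
}
```

For example, launch_agent_instance us-east-1 would start one instance that joins the cluster named in user-data.sh.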

Dynamic detection and monitoring of running services

Datadog’s Autodiscovery can be used in conjunction with ECS and Docker to automatically discover and monitor running tasks in your environment.
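For example, Autodiscovery can read check templates from Docker labels, which in ECS are set through the dockerLabels field of a container definition. The hypothetical helper below prints a container definition for an illustrative Redis service whose com.datadoghq.ad.* labels ask the Agent to run its redisdb check; %%host%% is an Autodiscovery template variable resolved at runtime. Treat the image, memory, and port as assumptions for your own service.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ECS container definition for Redis carrying
# Autodiscovery labels. Label values are strings, so the JSON inside them
# is escaped.
redis_container_with_ad_labels() {
    cat <<'EOF'
{
  "name": "redis",
  "image": "redis:latest",
  "memory": 256,
  "dockerLabels": {
    "com.datadoghq.ad.check_names": "[\"redisdb\"]",
    "com.datadoghq.ad.init_configs": "[{}]",
    "com.datadoghq.ad.instances": "[{\"host\": \"%%host%%\", \"port\": \"6379\"}]"
  }
}
EOF
}
```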

Log collection

ECS logs are standard Docker container logs. They are not directly related to the ECS service itself; they correspond to the logs written by the applications running in your Docker containers.

Option 1: AWS CLI

Follow the AWS CLI instructions above to install the Datadog Agent, but use the datadog-agent-ecs-logs.json file instead of datadog-agent-ecs.json.

Option 2: Web UI

Collecting logs with the Datadog Agent requires a few steps in addition to the Web UI instructions above:

  1. Add the two Environment Variables DD_LOGS_ENABLED and DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL, both with a true value.
  2. Add another volume with the name pointdir and source path of /opt/datadog-agent/run.
  3. Add a new mount point for pointdir and enter /opt/datadog-agent/run in the Container path. Do not check the Read only checkbox.
  4. Restart the Datadog Agent.
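In task definition terms, the extra steps above roughly amount to the transformation sketched below, which uses jq (the same tool the user script installs) to add the two environment variables, the pointdir volume, and its writable mount point to an existing task definition. The function name is hypothetical, the Agent container is assumed to be named datadog-agent, and this is not a copy of datadog-agent-ecs-logs.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an ECS task definition (JSON) on stdin and prints
# it with log collection enabled. Requires jq.
#  - DD_LOGS_* variables turn on log collection for all containers.
#  - pointdir is a writable host directory where, as we understand it, the
#    Agent keeps pointers to the container logs it tails, so it can resume
#    after a restart.
dd_add_log_collection() {
    jq '
      (.containerDefinitions[] | select(.name == "datadog-agent")) |= (
        .environment += [
          {"name": "DD_LOGS_ENABLED", "value": "true"},
          {"name": "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL", "value": "true"}
        ]
        | .mountPoints += [
          {"sourceVolume": "pointdir", "containerPath": "/opt/datadog-agent/run", "readOnly": false}
        ]
      )
      | .volumes += [{"name": "pointdir", "host": {"sourcePath": "/opt/datadog-agent/run"}}]
    '
}
```

For example, dd_add_log_collection < datadog-agent-task.json > datadog-agent-task-logs.json produces a file you could register as a new revision of the task definition.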

Activate log integrations

The source attribute identifies which integration to use for each container. Override it directly in your containers' labels to start using log integrations. Read the Autodiscovery guide for logs to learn more about this process.
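As an illustration, the label that sets the source is com.datadoghq.ad.logs, placed in the dockerLabels of the application's container definition. The helper below is hypothetical, and the nginx source and webapp service names are example values only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ECS container definition whose Docker label
# tells the Agent which log integration (source) and service to apply.
# The label value is a string, so the JSON inside it is escaped.
nginx_container_with_log_labels() {
    cat <<'EOF'
{
  "name": "webapp",
  "image": "nginx:latest",
  "memory": 128,
  "dockerLabels": {
    "com.datadoghq.ad.logs": "[{\"source\": \"nginx\", \"service\": \"webapp\"}]"
  }
}
EOF
}
```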

Data Collected

Metrics

aws.ecs.cpuutilization
(gauge)
Average percentage of CPU units that are used in the cluster or service.
shown as percent
aws.ecs.cpuutilization.minimum
(gauge)
Minimum percentage of CPU units that are used in the cluster or service.
shown as percent
aws.ecs.cpuutilization.maximum
(gauge)
Maximum percentage of CPU units that are used in the cluster or service.
shown as percent
aws.ecs.memory_utilization
(gauge)
Average percentage of memory that is used in the cluster or service.
shown as fraction
aws.ecs.memory_utilization.minimum
(gauge)
Minimum percentage of memory that is used in the cluster or service.
shown as fraction
aws.ecs.memory_utilization.maximum
(gauge)
Maximum percentage of memory that is used in the cluster or service.
shown as fraction
aws.ecs.service.cpuutilization
(gauge)
Average percentage of CPU units that are used in the service.
shown as percent
aws.ecs.service.cpuutilization.minimum
(gauge)
Minimum percentage of CPU units that are used in the service.
shown as percent
aws.ecs.service.cpuutilization.maximum
(gauge)
Maximum percentage of CPU units that are used in the service.
shown as percent
aws.ecs.service.memory_utilization
(gauge)
Average percentage of memory that is used in the service.
shown as fraction
aws.ecs.service.memory_utilization.minimum
(gauge)
Minimum percentage of memory that is used in the service.
shown as fraction
aws.ecs.service.memory_utilization.maximum
(gauge)
Maximum percentage of memory that is used in the service.
shown as fraction
aws.ecs.cluster.cpuutilization
(gauge)
Average percentage of CPU units that are used in the cluster.
shown as percent
aws.ecs.cluster.cpuutilization.minimum
(gauge)
Minimum percentage of CPU units that are used in the cluster.
shown as percent
aws.ecs.cluster.cpuutilization.maximum
(gauge)
Maximum percentage of CPU units that are used in the cluster.
shown as percent
aws.ecs.cluster.memory_utilization
(gauge)
Average percentage of memory that is used in the cluster.
shown as fraction
aws.ecs.cluster.memory_utilization.minimum
(gauge)
Minimum percentage of memory that is used in the cluster.
shown as fraction
aws.ecs.cluster.memory_utilization.maximum
(gauge)
Maximum percentage of memory that is used in the cluster.
shown as fraction
aws.ecs.cpureservation
(gauge)
Average percentage of CPU units that are reserved by running tasks in the cluster.
shown as percent
aws.ecs.cpureservation.maximum
(gauge)
Maximum percentage of CPU units that are reserved by running tasks in the cluster.
shown as percent
aws.ecs.cpureservation.minimum
(gauge)
Minimum percentage of CPU units that are reserved by running tasks in the cluster.
shown as percent
aws.ecs.memory_reservation
(gauge)
Average percentage of memory that is reserved by running tasks in the cluster.
shown as percent
aws.ecs.memory_reservation.minimum
(gauge)
Minimum percentage of memory that is reserved by running tasks in the cluster.
shown as percent
aws.ecs.memory_reservation.maximum
(gauge)
Maximum percentage of memory that is reserved by running tasks in the cluster.
shown as percent
aws.ecs.running_tasks_count
(gauge)
The number of tasks on the container instance that are in the RUNNING status.
aws.ecs.pending_tasks_count
(gauge)
The number of tasks on the container instance that are in the PENDING status.
aws.ecs.registered_cpu
(gauge)
The number of CPU units registered on the container instance.
aws.ecs.remaining_cpu
(gauge)
The number of CPU units remaining on the container instance.
aws.ecs.registered_memory
(gauge)
The number of memory units registered on the container instance.
aws.ecs.remaining_memory
(gauge)
The number of memory units remaining on the container instance.
aws.ecs.services
(gauge)
The number of services running per cluster.
aws.ecs.service.pending
(gauge)
The number of containers pending per service.
aws.ecs.service.desired
(gauge)
The number of containers desired per service.
aws.ecs.service.running
(gauge)
The number of containers running per service.

Each metric retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to hostname and security groups.

Events

The AWS ECS integration collects the following events:

  • Drain
  • Error
  • Fail
  • Out of memory
  • Pending
  • Reboot
  • Terminate

Service Checks

aws.ecs.agent_connected
Whether the ECS Agent is connected.

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

Learn more about infrastructure monitoring and all our integrations on our blog