Amazon Elastic Container Service

Overview

Amazon Elastic Container Service (ECS) is a highly scalable, high performance container management service for Docker containers running on EC2 instances.

Setup

This documentation page covers AWS ECS setup with Datadog Agent v6. To set it up with Datadog Agent v5, refer to the dedicated documentation page, AWS ECS with Agent v5 setup.

To monitor your ECS containers and tasks with Datadog, run the Agent as a container on every EC2 instance in your ECS cluster. As detailed below, there are a few setup steps:

  1. Create an ECS Task
  2. Create or Modify your IAM Policy
  3. Schedule the Datadog Agent as a Daemon Service

This documentation assumes you already have a working EC2 Container Service cluster configured. If not, review the Getting Started section in the ECS documentation.

Metric collection

Create an ECS Task

This task launches the Datadog container. When you need to modify the configuration, update this Task Definition as described further down in this guide. If you’re using APM, DogStatsD, or Logs, you must set the appropriate flags in the Task Definition:

  • If you are using APM, set portMappings so your downstream containers can ship traces to the Agent service. APM receives traces on port 8126 over TCP, so set this as a hostPort in the Task’s definition. To enable trace collection from other containers, you must also set the DD_APM_NON_LOCAL_TRAFFIC environment variable to true. Learn more about APM with containers.

  • If you are using DogStatsD, set a hostPort of 8125 as UDP in the Task’s definition. To enable DogStatsD metric collection from other containers, you must also set the DD_DOGSTATSD_NON_LOCAL_TRAFFIC environment variable to true.

  • If you are using logs, refer to the dedicated Log collection section.

Double check your Security Group settings on your EC2 instances. Make sure these ports aren’t open to the public. Instead, we’ll be using the private IP to route to the Agent from the containers.
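As a rough illustration of the flags described above, the sketch below builds the portMappings and environment entries for the Agent container as they might appear in an ECS task definition. The helper name agentNetworkSettings and its option names are hypothetical; the ports, protocols, and environment variable names come from the text above.

```javascript
// Hypothetical helper: builds the task-definition fragments for the Agent
// container, depending on which features other containers will use.
function agentNetworkSettings({ apm = false, dogstatsd = false } = {}) {
  const portMappings = [];
  const environment = [];

  if (apm) {
    // APM receives traces on TCP port 8126.
    portMappings.push({ containerPort: 8126, hostPort: 8126, protocol: 'tcp' });
    environment.push({ name: 'DD_APM_NON_LOCAL_TRAFFIC', value: 'true' });
  }
  if (dogstatsd) {
    // DogStatsD receives metrics on UDP port 8125.
    portMappings.push({ containerPort: 8125, hostPort: 8125, protocol: 'udp' });
    environment.push({ name: 'DD_DOGSTATSD_NON_LOCAL_TRAFFIC', value: 'true' });
  }
  return { portMappings, environment };
}

const agentSettings = agentNetworkSettings({ apm: true, dogstatsd: true });
console.log(JSON.stringify(agentSettings, null, 2));
```

The returned arrays can be merged into the Agent's entry under containerDefinitions before the task definition is registered.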

You can configure the task using either the AWS CLI tools or the Amazon Web Console.

AWS CLI

  1. Download datadog-agent-ecs.json (datadog-agent-ecs1.json if you are using an original Amazon Linux AMI).
  2. Edit datadog-agent-ecs.json and update it with the DD_API_KEY for your account.
  3. Execute the following command: aws ecs register-task-definition --cli-input-json file://path/to/datadog-agent-ecs.json
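
Step 2 can also be scripted. The sketch below assumes the downloaded file follows the standard ECS task-definition layout, with the API key stored as a DD_API_KEY entry in a container's environment array; check your copy of the file before relying on it. The setApiKey helper and the sample object are hypothetical.

```javascript
// Hypothetical helper: returns a copy of a parsed task definition with
// DD_API_KEY set on every container that declares that variable.
function setApiKey(taskDefinition, apiKey) {
  const copy = JSON.parse(JSON.stringify(taskDefinition));
  let updated = 0;
  for (const container of copy.containerDefinitions || []) {
    for (const variable of container.environment || []) {
      if (variable.name === 'DD_API_KEY') {
        variable.value = apiKey;
        updated += 1;
      }
    }
  }
  if (updated === 0) {
    throw new Error('No DD_API_KEY entry found in the task definition');
  }
  return copy;
}

// Trimmed-down stand-in for the downloaded file (assumed layout).
const sampleTaskDefinition = {
  family: 'datadog-agent-task',
  containerDefinitions: [
    { name: 'datadog-agent', environment: [{ name: 'DD_API_KEY', value: '' }] },
  ],
};
const taskWithKey = setApiKey(sampleTaskDefinition, 'my-api-key');
console.log(taskWithKey.containerDefinitions[0].environment[0].value);
```

With the real file, you would read it with fs.readFileSync and JSON.parse, apply the helper, and write the result back before running the register-task-definition command.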

Web UI

  1. Log in to your AWS Console and navigate to the EC2 Container Service section.
  2. Click on the cluster you wish to add Datadog to.
  3. Click on Task Definitions on the left side and click the button Create new Task Definition.
  4. Enter a Task Definition Name, such as datadog-agent-task.
  5. Click on the Add volume link.
  6. For Name enter docker_sock. For Source Path, enter /var/run/docker.sock. Click Add.
  7. Add another volume with the name proc and source path of /proc/.
  8. Add another volume with the name cgroup and source path of /cgroup/.
  9. Click the large Add container button.
  10. For Container name enter datadog-agent.
  11. For Image enter datadog/agent:latest.
  12. For Maximum memory enter 256.
  13. Scroll down to the Advanced container configuration section and enter 10 in CPU units.
  14. For Env Variables, add a Key of DD_API_KEY and enter your Datadog API Key in the value. If you prefer to store secrets like this in S3, take a look at the ECS Configuration guide.
  15. Add another Environment Variable for any tags you want to add using the key DD_TAGS.
  16. Scroll down to the Storage and Logging section.
  17. In Mount points select the docker_sock source volume and enter /var/run/docker.sock in the Container path. Check the Read only checkbox.
  18. Add another mount point for proc and enter /host/proc/ in the Container path. Check the Read only checkbox.
  19. Add a third mount point for cgroup and enter /host/sys/fs/cgroup in the Container path. Check the Read only checkbox.
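
For reference, the Web UI steps above correspond roughly to the task definition sketched below. The field names follow the ECS task-definition JSON format; the DD_API_KEY value is a placeholder and the DD_TAGS value is a hypothetical example. Treat this as an approximation of what the console produces, not an exact export.

```javascript
// Sketch of the task definition described by the Web UI steps.
const datadogAgentTask = {
  family: 'datadog-agent-task',
  volumes: [
    { name: 'docker_sock', host: { sourcePath: '/var/run/docker.sock' } },
    { name: 'proc', host: { sourcePath: '/proc/' } },
    { name: 'cgroup', host: { sourcePath: '/cgroup/' } },
  ],
  containerDefinitions: [
    {
      name: 'datadog-agent',
      image: 'datadog/agent:latest',
      memory: 256, // "Maximum memory" in the console
      cpu: 10, // "CPU units" in the console
      environment: [
        { name: 'DD_API_KEY', value: '<YOUR_DATADOG_API_KEY>' }, // placeholder
        { name: 'DD_TAGS', value: 'env:staging' }, // hypothetical example tag
      ],
      mountPoints: [
        { sourceVolume: 'docker_sock', containerPath: '/var/run/docker.sock', readOnly: true },
        { sourceVolume: 'proc', containerPath: '/host/proc/', readOnly: true },
        { sourceVolume: 'cgroup', containerPath: '/host/sys/fs/cgroup', readOnly: true },
      ],
    },
  ],
};

// Sanity check: every mount point must refer to a declared volume.
const declaredVolumes = new Set(datadogAgentTask.volumes.map((v) => v.name));
const danglingMounts = datadogAgentTask.containerDefinitions[0].mountPoints
  .filter((m) => !declaredVolumes.has(m.sourceVolume));
console.log(danglingMounts.length === 0 ? 'mount points OK' : danglingMounts);
```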

Create or Modify your IAM Policy

Add the following permissions to your Datadog IAM policy in order to collect Amazon ECS metrics. For more information on ECS policies, review the documentation on the AWS website.

    AWS Permission                    Description
    ecs:ListClusters                  List available clusters.
    ecs:ListContainerInstances        List instances of a cluster.
    ecs:DescribeContainerInstances    Describe instances to add metrics on resources and running tasks; adds the cluster tag to EC2 instances.
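
As an illustration, the three permissions above could be expressed as the following IAM policy statement. The use of "*" for Resource is an assumption made for this sketch; in practice these actions would usually be added to the existing Datadog integration policy.

```javascript
// Sketch of an IAM policy statement granting the three ECS permissions.
// Resource "*" is an assumption made for illustration.
const ecsMetricsPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: [
        'ecs:ListClusters',
        'ecs:ListContainerInstances',
        'ecs:DescribeContainerInstances',
      ],
      Resource: '*',
    },
  ],
};
console.log(JSON.stringify(ecsMetricsPolicy, null, 2));
```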

Run the Agent as a Daemon Service

Ideally, the Datadog Agent runs as a single container on each EC2 instance. The easiest way to achieve this is to run the Datadog Agent as a Daemon Service.

Schedule a Daemon Service in AWS using our ECS Task

  1. Log in to the AWS console and navigate to the ECS Clusters section. Click the cluster you want to run the Agent on.
  2. Create a new service by clicking the Create button under Services.
  3. For launch type, select EC2. Then select the Task Definition we created before.
  4. For service type, select DAEMON, and enter a Service name. Click Next.
  5. Since the Service runs once on each instance, you won’t need a load balancer. Select None. Click Next.
  6. Daemon services don’t need Auto Scaling, so click Next Step, and then Create Service.
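
The same service can likely be created from the command line with aws ecs create-service. The sketch below only assembles the argument list and does not run anything; the helper name daemonServiceArgs and the cluster and service names are hypothetical placeholders.

```javascript
// Hypothetical helper: assembles the arguments for `aws ecs create-service`
// so that the Agent task runs as a daemon on every EC2 instance.
function daemonServiceArgs({ cluster, serviceName, taskDefinition }) {
  return [
    'ecs', 'create-service',
    '--cluster', cluster,
    '--service-name', serviceName,
    '--task-definition', taskDefinition,
    '--launch-type', 'EC2',
    '--scheduling-strategy', 'DAEMON',
  ];
}

const daemonArgs = daemonServiceArgs({
  cluster: 'my-cluster', // placeholder cluster name
  serviceName: 'datadog-agent', // placeholder service name
  taskDefinition: 'datadog-agent-task',
});
console.log(['aws', ...daemonArgs].join(' '));
```

To actually create the service, the array could be passed to child_process.execFile('aws', daemonArgs, callback) on a machine with the AWS CLI configured.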

Dynamic detection and monitoring of running services

Datadog’s Autodiscovery can be used in conjunction with ECS and Docker to automatically discover and monitor running tasks in your environment.
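
As a minimal sketch of how this might look in practice, Autodiscovery can read check configuration from Docker labels set on the monitored container, which in a task definition go under dockerLabels. The helper name autodiscoveryLabels and the Redis example are illustrative assumptions, not part of the setup above.

```javascript
// Hypothetical helper: builds Docker labels that Autodiscovery reads to
// configure a check for the container carrying them.
function autodiscoveryLabels(checkName, instance, initConfig = {}) {
  return {
    'com.datadoghq.ad.check_names': JSON.stringify([checkName]),
    'com.datadoghq.ad.init_configs': JSON.stringify([initConfig]),
    'com.datadoghq.ad.instances': JSON.stringify([instance]),
  };
}

// Example: a Redis container monitored with the redisdb check.
// %%host%% is an Autodiscovery template variable resolved by the Agent.
const redisContainer = {
  name: 'redis',
  image: 'redis:latest',
  dockerLabels: autodiscoveryLabels('redisdb', { host: '%%host%%', port: '6379' }),
};
console.log(redisContainer.dockerLabels);
```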

Log collection

This section explains how to collect logs written by running applications in your ECS containers.

Option 1: AWS CLI

Follow the AWS CLI instructions above to install the Datadog Agent, but use the datadog-agent-ecs-logs.json file instead.

Option 2: Web UI

In addition to the Web UI instructions above, a few extra steps are needed to collect logs with the Datadog Agent:

  1. Add the two Environment Variables DD_LOGS_ENABLED and DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL, both with a true value.
  2. Add another volume with the name pointdir and source path of /opt/datadog-agent/run.
  3. Add a new mount point for pointdir and enter /opt/datadog-agent/run in the Container path. Do not check the Read only checkbox.
  4. Restart the Datadog Agent.
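
Expressed against a task definition object, the extra steps above might look like the sketch below. The helper name enableLogCollection and the minimal base task are hypothetical; the variable names, volume name, and paths are taken from the steps.

```javascript
// Hypothetical helper: returns a copy of a task definition with the log
// collection settings from the steps above applied to the Agent container.
function enableLogCollection(taskDefinition, agentContainerName = 'datadog-agent') {
  const copy = JSON.parse(JSON.stringify(taskDefinition));
  const agent = copy.containerDefinitions.find((c) => c.name === agentContainerName);
  if (!agent) throw new Error(`Container ${agentContainerName} not found`);

  agent.environment = (agent.environment || []).concat([
    { name: 'DD_LOGS_ENABLED', value: 'true' },
    { name: 'DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL', value: 'true' },
  ]);
  copy.volumes = (copy.volumes || []).concat([
    { name: 'pointdir', host: { sourcePath: '/opt/datadog-agent/run' } },
  ]);
  // Per the steps above, this mount point is not read-only.
  agent.mountPoints = (agent.mountPoints || []).concat([
    { sourceVolume: 'pointdir', containerPath: '/opt/datadog-agent/run', readOnly: false },
  ]);
  return copy;
}

const baseAgentTask = {
  family: 'datadog-agent-task',
  containerDefinitions: [{ name: 'datadog-agent' }],
};
const logsAgentTask = enableLogCollection(baseAgentTask);
console.log(JSON.stringify(logsAgentTask, null, 2));
```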

Activate Log integrations

The source attribute is used to identify the integration to use for each container. Override it directly in your container labels to start using log integrations. Read our Autodiscovery guide for logs to learn more about this process.
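
A minimal sketch of such a label override is shown below, assuming the com.datadoghq.ad.logs Docker label is used to carry the log configuration. The helper name logSourceLabel and the nginx and web-frontend values are hypothetical examples.

```javascript
// Hypothetical helper: builds the Docker label that overrides the log
// source (and service) for the container carrying it.
function logSourceLabel(source, service) {
  return { 'com.datadoghq.ad.logs': JSON.stringify([{ source, service }]) };
}

// Example values; in a task definition this object goes under dockerLabels.
const nginxLogLabels = logSourceLabel('nginx', 'web-frontend');
console.log(nginxLogLabels);
```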

Trace collection

After installing the Datadog Agent, enable the datadog-trace-agent by setting the following parameters in the task definition for the datadog/agent container:

  • Port mapping: Host / Container port 8126, Protocol tcp
  • Env Variables: DD_APM_ENABLED=true, DD_APM_NON_LOCAL_TRAFFIC=true (enable trace collection from other containers)

Application container

Amazon’s EC2 metadata endpoint allows discovery of the private IP address for each underlying instance your containers are running on. Setting this IP address as your Trace Agent Hostname in your application container allows traces to be shipped to the Agent.

To get the private IP address for each host, curl the following URL, and set the result as your Trace Agent Hostname environment variable for each application container shipping to APM:

curl http://169.254.169.254/latest/meta-data/local-ipv4

Alternatively, you can set the hostname in your application’s source code. For example:

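const tracer = require('dd-trace')
const request = require('request')

// Query the EC2 metadata endpoint for this instance's private IP,
// then point the tracer at the Agent running on that host.
request('http://169.254.169.254/latest/meta-data/local-ipv4', function (error, resp, body) {
  if (error) {
    console.error('Could not reach the EC2 metadata endpoint:', error)
    return
  }
  tracer.init({hostname: body.trim()})
})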
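
If you would rather avoid the third-party request module, the lookup can be sketched with Node's built-in http module, validating the response before it is used. This assumes the metadata endpoint answers a plain GET, as the curl command above does; the function names parsePrivateIp and fetchPrivateIp and the 2-second timeout are hypothetical choices.

```javascript
const http = require('http');
const net = require('net');

// Returns the trimmed IPv4 address, or throws if the body is not one.
function parsePrivateIp(body) {
  const ip = String(body).trim();
  if (!net.isIPv4(ip)) {
    throw new Error(`Unexpected metadata response: ${JSON.stringify(ip)}`);
  }
  return ip;
}

// Fetches the instance's private IP from the EC2 metadata endpoint.
// Only works from inside an EC2 instance; it is defined but not called here.
function fetchPrivateIp(callback) {
  const url = 'http://169.254.169.254/latest/meta-data/local-ipv4';
  const req = http.get(url, { timeout: 2000 }, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      let ip;
      try {
        ip = parsePrivateIp(body);
      } catch (err) {
        callback(err);
        return;
      }
      callback(null, ip);
    });
  });
  // A timeout only emits an event; destroy the request to surface an error.
  req.on('timeout', () => req.destroy(new Error('Metadata request timed out')));
  req.on('error', callback);
}

console.log(parsePrivateIp('10.0.1.23\n'));
```

In an application, fetchPrivateIp would be called at startup and the resulting address passed to tracer.init({hostname: ip}), as in the example above.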

Data Collected

Metrics

aws.ecs.cpuutilization (gauge)
    Average percentage of CPU units that are used in the cluster or service. Shown as percent.
aws.ecs.cpuutilization.minimum (gauge)
    Minimum percentage of CPU units that are used in the cluster or service. Shown as percent.
aws.ecs.cpuutilization.maximum (gauge)
    Maximum percentage of CPU units that are used in the cluster or service. Shown as percent.
aws.ecs.memory_utilization (gauge)
    Average percentage of memory that is used in the cluster or service. Shown as fraction.
aws.ecs.memory_utilization.minimum (gauge)
    Minimum percentage of memory that is used in the cluster or service. Shown as fraction.
aws.ecs.memory_utilization.maximum (gauge)
    Maximum percentage of memory that is used in the cluster or service. Shown as fraction.
aws.ecs.service.cpuutilization (gauge)
    Average percentage of CPU units that are used in the service. Shown as percent.
aws.ecs.service.cpuutilization.minimum (gauge)
    Minimum percentage of CPU units that are used in the service. Shown as percent.
aws.ecs.service.cpuutilization.maximum (gauge)
    Maximum percentage of CPU units that are used in the service. Shown as percent.
aws.ecs.service.memory_utilization (gauge)
    Average percentage of memory that is used in the service. Shown as fraction.
aws.ecs.service.memory_utilization.minimum (gauge)
    Minimum percentage of memory that is used in the service. Shown as fraction.
aws.ecs.service.memory_utilization.maximum (gauge)
    Maximum percentage of memory that is used in the service. Shown as fraction.
aws.ecs.cluster.cpuutilization (gauge)
    Average percentage of CPU units that are used in the cluster. Shown as percent.
aws.ecs.cluster.cpuutilization.minimum (gauge)
    Minimum percentage of CPU units that are used in the cluster. Shown as percent.
aws.ecs.cluster.cpuutilization.maximum (gauge)
    Maximum percentage of CPU units that are used in the cluster. Shown as percent.
aws.ecs.cluster.memory_utilization (gauge)
    Average percentage of memory that is used in the cluster. Shown as fraction.
aws.ecs.cluster.memory_utilization.minimum (gauge)
    Minimum percentage of memory that is used in the cluster. Shown as fraction.
aws.ecs.cluster.memory_utilization.maximum (gauge)
    Maximum percentage of memory that is used in the cluster. Shown as fraction.
aws.ecs.cpureservation (gauge)
    Average percentage of CPU units that are reserved by running tasks in the cluster. Shown as percent.
aws.ecs.cpureservation.maximum (gauge)
    Maximum percentage of CPU units that are reserved by running tasks in the cluster. Shown as percent.
aws.ecs.cpureservation.minimum (gauge)
    Minimum percentage of CPU units that are reserved by running tasks in the cluster. Shown as percent.
aws.ecs.memory_reservation (gauge)
    Average percentage of memory that is reserved by running tasks in the cluster. Shown as percent.
aws.ecs.memory_reservation.minimum (gauge)
    Minimum percentage of memory that is reserved by running tasks in the cluster. Shown as percent.
aws.ecs.memory_reservation.maximum (gauge)
    Maximum percentage of memory that is reserved by running tasks in the cluster. Shown as percent.
aws.ecs.running_tasks_count (gauge)
    The number of tasks on the container instance that are in the RUNNING status.
aws.ecs.pending_tasks_count (gauge)
    The number of tasks on the container instance that are in the PENDING status.
aws.ecs.registered_cpu (gauge)
    The number of CPU units registered on the container instance.
aws.ecs.remaining_cpu (gauge)
    The number of CPU units remaining on the container instance.
aws.ecs.registered_memory (gauge)
    The number of memory units registered on the container instance.
aws.ecs.remaining_memory (gauge)
    The number of memory units remaining on the container instance.
aws.ecs.services (gauge)
    The number of services running per cluster.
aws.ecs.service.pending (gauge)
    The number of containers pending per service.
aws.ecs.service.desired (gauge)
    The number of containers desired per service.
aws.ecs.service.running (gauge)
    The number of containers running per service.

Each of the metrics retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to host name and security groups.

Events

The AWS ECS integration includes events for deployment, drain, error, fail, and insufficient memory. See example events below:

[Image: AWS ECS Events]

Service Checks

aws.ecs.agent_connected
Whether the ECS Agent is connected.

Troubleshooting

Need help? Contact Datadog Support.
