
Amazon Elastic Container Service (ECS)

Crawler

Overview

Amazon Elastic Container Service (ECS) is a highly scalable, high-performance container management service for Docker containers running on EC2 instances.

This page covers AWS ECS setup with Datadog Container Agent v6. For other setups, see:

Setup

To monitor your ECS containers and tasks with Datadog, run the Agent as a container on every EC2 instance in your ECS cluster. Setup involves the following steps, each detailed below:

  1. Add an ECS Task
  2. Create or Modify your IAM Policy
  3. Schedule the Datadog Agent as a Daemon Service

If you don’t have a working EC2 Container Service cluster configured, review the Getting Started section in the ECS documentation.

Metric collection

Create an ECS Task

This task launches the Datadog container. When you need to modify the configuration, update this task definition as described further down in this guide. If you’re using APM, DogStatsD, or logs, set the appropriate flags in the task definition:

  • If you are using APM, set portMappings so your downstream containers can ship traces to the Agent service. APM receives traces over TCP on port 8126, so set 8126 as a hostPort in the task definition. To enable trace collection from other containers, you must also set the DD_APM_NON_LOCAL_TRAFFIC environment variable to true. Learn more about APM with containers.

  • If you are using DogStatsD, set 8125 as a UDP hostPort in the task definition. To enable DogStatsD metric collection from other containers, you must also set the DD_DOGSTATSD_NON_LOCAL_TRAFFIC environment variable to true.

  • If you are using logs, refer to the dedicated Log collection documentation.
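As a sketch of how the APM and DogStatsD settings above map onto the Agent's container definition, the following Python builds the relevant portMappings and environment entries. The helper name agent_network_settings is hypothetical; the keys (containerPort, hostPort, protocol, name, value) follow the ECS container definition format. Merge the result into the datadog-agent container in your own task definition JSON.

```python
import json


def agent_network_settings(apm: bool = True, dogstatsd: bool = True) -> dict:
    """Return portMappings and environment entries for the datadog-agent container (hypothetical helper)."""
    port_mappings = []
    environment = []
    if apm:
        # APM receives traces over TCP on port 8126
        port_mappings.append({"containerPort": 8126, "hostPort": 8126, "protocol": "tcp"})
        # Accept traces from other containers, not only from inside the Agent container
        environment.append({"name": "DD_APM_NON_LOCAL_TRAFFIC", "value": "true"})
    if dogstatsd:
        # DogStatsD receives metrics over UDP on port 8125
        port_mappings.append({"containerPort": 8125, "hostPort": 8125, "protocol": "udp"})
        # Accept DogStatsD metrics from other containers
        environment.append({"name": "DD_DOGSTATSD_NON_LOCAL_TRAFFIC", "value": "true"})
    return {"portMappings": port_mappings, "environment": environment}


if __name__ == "__main__":
    print(json.dumps(agent_network_settings(), indent=2))
```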

Double-check the security group settings on your EC2 instances and make sure these ports are not open to the public. Application containers reach the Agent through the instance’s private IP address.
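Application containers send DogStatsD metrics to the Agent over UDP on port 8125, addressed to the instance's private IP. The minimal sketch below uses only the Python standard library and the DogStatsD datagram format (`metric:value|type|#tags`). The function names and the metric and tag in the usage line are hypothetical; in practice you would normally use an official DogStatsD client library rather than raw sockets.

```python
import socket
from typing import Optional, Sequence


def format_count(metric: str, value: int, tags: Optional[Sequence[str]] = None) -> bytes:
    """Build a DogStatsD count datagram, e.g. b"myapp.requests:1|c|#env:dev"."""
    datagram = f"{metric}:{value}|c"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram.encode("utf-8")


def send_count(host: str, metric: str, value: int = 1,
               tags: Optional[Sequence[str]] = None, port: int = 8125) -> None:
    """Send one count metric to the Agent's DogStatsD port over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_count(metric, value, tags), (host, port))
```

For example, `send_count("<EC2_PRIVATE_IP>", "myapp.requests", tags=["env:dev"])`. UDP gives no delivery confirmation, which is why the security group and hostPort settings matter: a misconfigured port fails silently.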

You can configure the task using either the AWS CLI or the AWS web console.

AWS CLI
  1. Download datadog-agent-ecs.json (datadog-agent-ecs1.json if you are using an original Amazon Linux AMI).
  2. Edit datadog-agent-ecs.json and replace <YOUR_DATADOG_API_KEY> with the Datadog API key for your account.
  3. Optional: If you use the Datadog EU site, edit datadog-agent-ecs.json and set DD_SITE to datadoghq.eu.
  4. Optional: To enable log collection, see Log collection.
  5. Optional: To enable process collection, see Process collection.
  6. Run the following command:

    aws ecs register-task-definition --cli-input-json file://path/to/datadog-agent-ecs.json
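Steps 2 and 3 can also be scripted before running the command. The sketch below assumes the downloaded file stores the key as a `{"name": "DD_API_KEY", "value": ...}` entry in each container's environment list and defines only the Agent container; the script and function names are hypothetical, so check the structure of your file first.

```python
import json
import sys
from typing import Optional


def set_env(container: dict, name: str, value: str) -> None:
    """Set an environment variable on an ECS container definition, adding it if missing."""
    env = container.setdefault("environment", [])
    for entry in env:
        if entry.get("name") == name:
            entry["value"] = value
            return
    env.append({"name": name, "value": value})


def configure_agent_task(task_def: dict, api_key: str, site: Optional[str] = None) -> dict:
    """Fill in DD_API_KEY (and optionally DD_SITE) on every container in the task definition."""
    for container in task_def.get("containerDefinitions", []):
        set_env(container, "DD_API_KEY", api_key)
        if site:
            set_env(container, "DD_SITE", site)  # e.g. "datadoghq.eu" for the EU site
    return task_def


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python configure_agent_task.py datadog-agent-ecs.json <API_KEY> [SITE]
    path, key = sys.argv[1], sys.argv[2]
    site = sys.argv[3] if len(sys.argv) > 3 else None
    with open(path) as f:
        task_def = json.load(f)
    with open(path, "w") as f:
        json.dump(configure_agent_task(task_def, key, site), f, indent=2)
```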
    
Web UI
  1. Log in to your AWS Console and navigate to the EC2 Container Service section.
  2. Click on the cluster you wish to add Datadog to.
  3. Click Task Definitions on the left side, then click Create new Task Definition.
  4. Enter a Task Definition Name, such as datadog-agent-task.
  5. Click the Add volume link.
  6. For Name enter docker_sock. For Source Path, enter /var/run/docker.sock. Click Add.
  7. Add another volume with the name proc and source path of /proc/.
  8. Add another volume with the name cgroup and source path of /sys/fs/cgroup/ (or /cgroup/ if you are using an original Amazon Linux AMI).
  9. Click the large Add container button.
  10. For Container name enter datadog-agent.
  11. For Image enter datadog/agent:latest.
  12. For Maximum memory enter 256. Note: For high resource usage, you may need a higher memory limit.
  13. Scroll down to the Advanced container configuration section and enter 10 in CPU units.
  14. For Env Variables, add a Key of DD_API_KEY and enter your Datadog API key as the value. If you prefer to store secrets like this in S3, see the ECS Configuration guide.
  15. Add another Environment Variable for any tags you want to add using the key DD_TAGS.
  16. Scroll down to the Storage and Logging section.
  17. In Mount points select the docker_sock source volume and enter /var/run/docker.sock in the Container path. Check the Read only checkbox.
  18. Add another mount point for proc and enter /host/proc/ in the Container path. Check the Read only checkbox.
  19. Add a third mount point for cgroup and enter /host/sys/fs/cgroup (or /host/cgroup/ if you are using an original Amazon Linux AMI) in the Container path. Check the Read only checkbox.
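For reference, the console steps above produce a task definition roughly equivalent to the JSON built by this sketch. The helper name is hypothetical, and the field names follow the ECS task definition format (volumes with host.sourcePath, mountPoints with sourceVolume/containerPath/readOnly). Compare the output against your console-generated JSON rather than treating it as authoritative.

```python
import json


def web_ui_task_definition(api_key: str, tags: str = "", original_amazon_linux: bool = False) -> dict:
    """Build a task definition matching the console steps above (hypothetical helper)."""
    # Original Amazon Linux AMIs expose cgroups under /cgroup/ instead of /sys/fs/cgroup/
    cgroup_host = "/cgroup/" if original_amazon_linux else "/sys/fs/cgroup/"
    cgroup_container = "/host/cgroup/" if original_amazon_linux else "/host/sys/fs/cgroup"
    environment = [{"name": "DD_API_KEY", "value": api_key}]
    if tags:
        environment.append({"name": "DD_TAGS", "value": tags})  # e.g. "env:prod team:infra"
    return {
        "family": "datadog-agent-task",
        "containerDefinitions": [
            {
                "name": "datadog-agent",
                "image": "datadog/agent:latest",
                "memory": 256,  # MiB; raise it for high resource usage
                "cpu": 10,      # CPU units
                "environment": environment,
                "mountPoints": [
                    {"containerPath": "/var/run/docker.sock", "sourceVolume": "docker_sock", "readOnly": True},
                    {"containerPath": "/host/proc/", "sourceVolume": "proc", "readOnly": True},
                    {"containerPath": cgroup_container, "sourceVolume": "cgroup", "readOnly": True},
                ],
            }
        ],
        "volumes": [
            {"name": "docker_sock", "host": {"sourcePath": "/var/run/docker.sock"}},
            {"name": "proc", "host": {"sourcePath": "/proc/"}},
            {"name": "cgroup", "host": {"sourcePath": cgroup_host}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(web_ui_task_definition("<YOUR_DATADOG_API_KEY>"), indent=2))
```

Saving the printed JSON to a file lets you register the same definition with `aws ecs register-task-definition --cli-input-json file://...`, which keeps the console and CLI setups in sync.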

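Note: Setting the Datadog task definition to use 10 CPU units can cause aws.ecs.cpuutilization for service:datadog-agent to display as 1000%. This is a peculiarity of how AWS displays CPU utilization: it is reported relative to the CPU units reserved in the task definition, so a container that uses more than its reservation shows more than 100%. You can reserve more CPU units to avoid skewing your graph.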
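The 1000% figure follows from simple arithmetic. As a sketch, assuming the AWS definition of service CPU utilization (CPU units used by the service's tasks divided by the CPU units reserved for them), an Agent that reserves 10 units but uses 100 (roughly 0.1 vCPU, since 1024 units equal one vCPU) reports 1000%. The usage numbers are illustrative, not measurements.

```python
def service_cpu_utilization(units_used: float, units_reserved_per_task: float, task_count: int = 1) -> float:
    """Percent CPU utilization relative to the CPU units reserved in the task definition.

    units_used is the total used by all tasks in the service.
    """
    reserved = units_reserved_per_task * task_count
    if reserved <= 0:
        raise ValueError("reserved CPU units must be positive")
    return units_used * 100 / reserved


# Illustrative: an Agent using 100 CPU units against a 10-unit reservation
print(service_cpu_utilization(100, 10))    # 1000.0
# The same usage against a 200-unit reservation
print(service_cpu_utilization(100, 200))   # 50.0
```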

Create or Modify your IAM Policy

  1. Add the following permissions to your Datadog IAM policy in order to collect Amazon ECS metrics. For more information on ECS policies, review the documentation on the AWS website.

    AWS Permission                   Description
    ecs:ListClusters                 List available clusters.
    ecs:ListContainerInstances       List instances of a cluster.
    ecs:DescribeContainerInstances   Describe instances to add metrics on resources and running tasks; adds the cluster tag to EC2 instances.
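If you manage the Datadog IAM policy as JSON, the three permissions in the table above become one Allow statement. This sketch assumes `"Resource": "*"` (the list and describe actions here are read-only) and uses hypothetical helper names; adapt it to however your existing policy document is stored.

```python
import json

# The three ECS actions listed in the table above
ECS_ACTIONS = (
    "ecs:ListClusters",
    "ecs:ListContainerInstances",
    "ecs:DescribeContainerInstances",
)


def ecs_policy_statement(actions=ECS_ACTIONS) -> dict:
    """Return an IAM policy statement allowing the given read-only ECS actions."""
    return {"Effect": "Allow", "Action": list(actions), "Resource": "*"}


def add_ecs_permissions(policy: dict) -> dict:
    """Append the ECS statement to an IAM policy document (hypothetical helper)."""
    policy.setdefault("Version", "2012-10-17")
    policy.setdefault("Statement", []).append(ecs_policy_statement())
    return policy


if __name__ == "__main__":
    print(json.dumps(add_ecs_permissions({}), indent=2))
```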

Run the Agent as a Daemon Service

Ideally, the Datadog Agent runs in exactly one container on each EC2 instance. The easiest way to achieve this is to run the Datadog Agent as a Daemon Service.

Schedule a Daemon Service in AWS using Datadog’s ECS Task
  1. Log in to the AWS console and navigate to the ECS Clusters section. Click the cluster you run the Agent on.
  2. Create a new service by clicking the Create button under Services.
  3. For launch type, select EC2, then select the task definition you created previously.
  4. For service type, select DAEMON, and enter a Service name. Click Next.
  5. Since the Service runs once on each instance, you won’t need a load balancer. Select None. Click Next.
  6. Daemon services don’t need Auto Scaling, so click Next Step, and then Create Service.
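The same service can be created programmatically. This sketch uses boto3's ECS `create_service` call with `schedulingStrategy="DAEMON"`; the helper names and the cluster, service, and task definition names you pass in are hypothetical, and running it requires AWS credentials allowed to call ecs:CreateService.

```python
def daemon_service_request(cluster: str, service_name: str, task_definition: str) -> dict:
    """Keyword arguments for ECS CreateService mirroring the console steps above."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,  # a bare family name uses the latest ACTIVE revision
        "launchType": "EC2",
        "schedulingStrategy": "DAEMON",     # one task on each container instance
        # No desiredCount or loadBalancers: DAEMON placement decides the task count,
        # and the Agent does not sit behind a load balancer.
    }


def create_daemon_service(cluster: str, service_name: str, task_definition: str) -> dict:
    """Create the service with boto3."""
    import boto3  # imported lazily so the request builder can be used without boto3

    ecs = boto3.client("ecs")
    return ecs.create_service(**daemon_service_request(cluster, service_name, task_definition))
```

For example, `create_daemon_service("my-cluster", "datadog-agent", "datadog-agent-task")`, where all three names are placeholders for your own.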

Dynamic detection and monitoring of running services

Datadog’s Autodiscovery can be used with ECS and Docker to automatically discover and monitor running tasks in your environment.
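For example, with Docker-label-based Autodiscovery, an application container's task definition entry can carry labels that name the check to run and its configuration. The sketch below builds those dockerLabels. The Redis container, image, and port are hypothetical, and `%%host%%` is an Autodiscovery template variable that the Agent resolves at runtime. See the Autodiscovery documentation for the label format your Agent version expects.

```python
import json
from typing import Optional


def autodiscovery_labels(check_name: str, instance: dict, init_config: Optional[dict] = None) -> dict:
    """Autodiscovery Docker labels; each value is a JSON-encoded list with one entry."""
    return {
        "com.datadoghq.ad.check_names": json.dumps([check_name]),
        "com.datadoghq.ad.init_configs": json.dumps([init_config or {}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }


# Hypothetical application container definition running Redis
redis_container = {
    "name": "redis",
    "image": "redis:latest",
    "dockerLabels": autodiscovery_labels("redisdb", {"host": "%%host%%", "port": "6379"}),
}

if __name__ == "__main__":
    print(json.dumps(redis_container, indent=2))
```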

AWSVPC Mode

For Agent v6.10+, awsvpc mode is supported for both application containers and the Agent container, provided that security groups are configured as follows:

  1. If both the apps and the Agent run in awsvpc mode, security groups must allow:
    • The Agent’s security group to reach the application containers on the relevant ports.
    • The Agent’s security group to reach the host instances on TCP port 51678. The ECS Agent container must either run in host network mode (the default) or have a port binding on the host.

  2. If the apps run in awsvpc mode and the Agent runs in bridge mode, security groups must allow the host instances’ security group to reach the application containers on the relevant ports.
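The two cases above translate into security group ingress rules. This sketch expresses them as IpPermissions entries for EC2's AuthorizeSecurityGroupIngress (via boto3's `authorize_security_group_ingress`). It assumes the relevant application ports are TCP, and the security group IDs and helper names are hypothetical.

```python
from typing import List, Tuple

ECS_AGENT_PORT = 51678  # ECS Agent port on the host instances (TCP)


def awsvpc_ingress_rules(agent_in_awsvpc: bool, agent_sg: str, host_sg: str,
                         app_sg: str, app_ports: List[int]) -> List[Tuple[str, list]]:
    """Return (target security group, IpPermissions) pairs for the two cases above."""
    # Case 1: Agent in awsvpc mode, so traffic to apps comes from the Agent's security group.
    # Case 2: Agent in bridge mode, so it comes from the host instances' security group.
    source_sg = agent_sg if agent_in_awsvpc else host_sg
    app_perms = [
        {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
         "UserIdGroupPairs": [{"GroupId": source_sg}]}
        for port in app_ports
    ]
    rules = [(app_sg, app_perms)]
    if agent_in_awsvpc:
        # The Agent's security group must also reach the host instances on TCP 51678.
        rules.append((host_sg, [{"IpProtocol": "tcp", "FromPort": ECS_AGENT_PORT,
                                 "ToPort": ECS_AGENT_PORT,
                                 "UserIdGroupPairs": [{"GroupId": agent_sg}]}]))
    return rules


def apply_rules(rules: List[Tuple[str, list]]) -> None:
    """Apply the rules with boto3 (requires ec2:AuthorizeSecurityGroupIngress)."""
    import boto3  # imported lazily so the rule builder can be used without boto3

    ec2 = boto3.client("ec2")
    for group_id, permissions in rules:
        ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=permissions)
```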

Log collection

To collect all logs written by applications running in your ECS containers and send them to your Datadog application:

  1. Follow the above instructions to install the Datadog Agent.
  2. Update your datadog-agent-ecs.json file (datadog-agent-ecs1.json if you are using an original Amazon Linux AMI) with the following configuration:
{
  "containerDefinitions": [
    (...)
      "mountPoints": [
        (...)
        {
          "containerPath": "/opt/datadog-agent/run",
          "sourceVolume": "pointdir",
          "readOnly": false
        },
        (...)
      ],
      "environment": [
        (...)
        {
          "name": "DD_LOGS_ENABLED",
          "value": "true"
        },
        {
          "name": "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
          "value": "true"
        },
        (...)
      ]
    }
  ],
  "volumes": [
    (...)
    {
      "host": {
        "sourcePath": "/opt/datadog-agent/run"
      },
      "name": "pointdir"
    },
    (...)
  ],
  "family": "datadog-agent-task"
}
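If you prefer to apply the log collection settings above to a loaded datadog-agent-ecs.json programmatically, this sketch adds the same mount point, environment variables, and volume. It assumes the Agent container is named datadog-agent (a hypothetical default; pass your container's name otherwise). The function is safe to run more than once.

```python
LOG_ENV = ("DD_LOGS_ENABLED", "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL")
POINTDIR = "/opt/datadog-agent/run"


def enable_log_collection(task_def: dict, container_name: str = "datadog-agent") -> dict:
    """Add the pointdir mount, log environment variables, and pointdir volume (idempotent)."""
    for container in task_def.get("containerDefinitions", []):
        if container.get("name") != container_name:
            continue
        mounts = container.setdefault("mountPoints", [])
        if not any(m.get("sourceVolume") == "pointdir" for m in mounts):
            # Writable mount, as in the JSON above
            mounts.append({"containerPath": POINTDIR, "sourceVolume": "pointdir", "readOnly": False})
        env = container.setdefault("environment", [])
        # Drop any existing values for these variables, then set them to "true"
        env[:] = [e for e in env if e.get("name") not in LOG_ENV]
        env.extend({"name": name, "value": "true"} for name in LOG_ENV)
    volumes = task_def.setdefault("volumes", [])
    if not any(v.get("name") == "pointdir" for v in volumes):
        volumes.append({"host": {"sourcePath": POINTDIR}, "name": "pointdir"})
    return task_def
```

Load the file with `json.load`, pass the result through `enable_log_collection`, and write it back with `json.dump` before registering the task definition again.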

Activate Log integrations

The source attribute identifies the integration to use for each container. Override it directly in your container labels to start using log integrations. Read Datadog’s Autodiscovery guide for logs to learn more about this process.
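As a sketch, assuming Docker-label-based Autodiscovery for logs, the source override is a `com.datadoghq.ad.logs` label on the application container, whose value is a JSON list of log configurations. The nginx container, image, and service name below are hypothetical; check the Autodiscovery guide for logs for the full set of supported keys.

```python
import json


def log_labels(source: str, service: str) -> dict:
    """Docker label setting the log source (and service) for one container."""
    return {"com.datadoghq.ad.logs": json.dumps([{"source": source, "service": service}])}


# Hypothetical application container definition running nginx
web_container = {
    "name": "web",
    "image": "nginx:latest",
    "dockerLabels": log_labels("nginx", "webapp"),
}

if __name__ == "__main__":
    print(json.dumps(web_container, indent=2))
```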

Process collection

To collect process information for all your containers and send it to Datadog:

  1. Follow the above instructions to install the Datadog Agent.
  2. Update your datadog-agent-ecs.json file (datadog-agent-ecs1.json if you are using an original Amazon Linux AMI) with the following configuration:
{
  "containerDefinitions": [
    (...)
      "mountPoints": [
        (...)
        {
          "containerPath": "/etc/passwd",
          "sourceVolume": "passwd",
          "readOnly": true
        },
        (...)
      ],
      "environment": [
        (...)
        {
          "name": "DD_PROCESS_AGENT_ENABLED",
          "value": "true"
        }
      ]
    }
  ],
  "volumes": [
    (...)
    {
      "host": {
        "sourcePath": "/etc/passwd"
      },
      "name": "passwd"
    },
    (...)
  ],
  "family": "datadog-agent-task"
}

Trace collection

After installing the Datadog Agent, enable the datadog-trace-agent by setting the following parameters in the task definition for the datadog/agent container:

  • Port mapping: Host / Container port 8126, Protocol tcp
  • Env Variables: DD_APM_ENABLED=true, DD_APM_NON_LOCAL_TRAFFIC=true (enable trace collection from other containers)
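After these settings are in place, a quick way to confirm that an application container can reach the trace agent is a TCP connection test against port 8126 on the host's private IP. This is a minimal standard-library sketch with a hypothetical function name; a successful connection shows only that the port is open, not that traces are accepted.

```python
import socket


def trace_agent_reachable(host: str, port: int = 8126, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the trace agent port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

Run it from inside an application container, for example `trace_agent_reachable("<EC2_PRIVATE_IP>")`. A False result usually points to a missing hostPort mapping or a security group rule.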

Application container

Amazon’s EC2 metadata endpoint allows discovery of the private IP address for each underlying instance your containers are running on. Setting this IP address as your Trace Agent Hostname in your application container allows traces to be shipped to the Agent.

To get the private IP address for each host, curl the following URL, and set the result as your Trace Agent Hostname environment variable for each application container shipping to APM:

curl http://169.254.169.254/latest/meta-data/local-ipv4

import os
os.environ['DATADOG_TRACE_AGENT_HOSTNAME'] = '<EC2_PRIVATE_IP>'  # the address returned by the curl command above

If the variables of your ECS application are set at launch time, you must set the hostname as an environment variable.
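One way to do this at launch time is a small entrypoint wrapper. It looks up the private IP, sets the environment variable, and then replaces itself with your application process. This is a hypothetical sketch, not a Datadog-provided tool. Like the curl command above, it assumes the instance metadata endpoint answers plain GET requests; instances that enforce IMDSv2 tokens would need an extra token request.

```python
import os
import sys
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def fetch_private_ip(timeout: float = 2.0) -> str:
    """Read this instance's private IPv4 address from the EC2 metadata endpoint."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode().strip()


def env_with_agent_host(ip: str, base_env=None) -> dict:
    """Copy the environment and point the tracer at the Agent on this host."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATADOG_TRACE_AGENT_HOSTNAME"] = ip
    return env


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python entrypoint.py <your app command and arguments>
    ip = fetch_private_ip()
    os.execvpe(sys.argv[1], sys.argv[1:], env_with_agent_host(ip))
```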

Otherwise, you can set the hostname in your application’s source code. For example:

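# Python
import requests
from ddtrace import tracer


def get_aws_ip():
    # Query the EC2 metadata endpoint for this instance's private IPv4 address
    r = requests.get('http://169.254.169.254/latest/meta-data/local-ipv4', timeout=2)
    r.raise_for_status()
    return r.text


tracer.configure(hostname=get_aws_ip())

// Node.js
const tracer = require('dd-trace')
const request = require('request')

// Query the EC2 metadata endpoint, then point the tracer at this host's private IP
request('http://169.254.169.254/latest/meta-data/local-ipv4', function (error, resp, body) {
  if (error) {
    console.error('Could not reach the EC2 metadata endpoint:', error)
    return
  }
  tracer.init({ hostname: body })
})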

For more examples of how to set the Agent hostname in other languages, refer to the Change Agent Hostname documentation.

Data Collected

Metrics

aws.ecs.cpuutilization
(gauge)
Average percentage of CPU units that are used in the cluster or service.
Shown as percent
aws.ecs.cpuutilization.minimum
(gauge)
Minimum percentage of CPU units that are used in the cluster or service.
Shown as percent
aws.ecs.cpuutilization.maximum
(gauge)
Maximum percentage of CPU units that are used in the cluster or service.
Shown as percent
aws.ecs.memory_utilization
(gauge)
Average percentage of memory that is used in the cluster or service.
Shown as fraction
aws.ecs.memory_utilization.minimum
(gauge)
Minimum percentage of memory that is used in the cluster or service.
Shown as fraction
aws.ecs.memory_utilization.maximum
(gauge)
Maximum percentage of memory that is used in the cluster or service.
Shown as fraction
aws.ecs.service.cpuutilization
(gauge)
Average percentage of CPU units that are used in the service.
Shown as percent
aws.ecs.service.cpuutilization.minimum
(gauge)
Minimum percentage of CPU units that are used in the service.
Shown as percent
aws.ecs.service.cpuutilization.maximum
(gauge)
Maximum percentage of CPU units that are used in the service.
Shown as percent
aws.ecs.service.memory_utilization
(gauge)
Average percentage of memory that is used in the service.
Shown as fraction
aws.ecs.service.memory_utilization.minimum
(gauge)
Minimum percentage of memory that is used in the service.
Shown as fraction
aws.ecs.service.memory_utilization.maximum
(gauge)
Maximum percentage of memory that is used in the service.
Shown as fraction
aws.ecs.cluster.cpuutilization
(gauge)
Average percentage of CPU units that are used in the cluster.
Shown as percent
aws.ecs.cluster.cpuutilization.minimum
(gauge)
Minimum percentage of CPU units that are used in the cluster.
Shown as percent
aws.ecs.cluster.cpuutilization.maximum
(gauge)
Maximum percentage of CPU units that are used in the cluster.
Shown as percent
aws.ecs.cluster.memory_utilization
(gauge)
Average percentage of memory that is used in the cluster.
Shown as fraction
aws.ecs.cluster.memory_utilization.minimum
(gauge)
Minimum percentage of memory that is used in the cluster.
Shown as fraction
aws.ecs.cluster.memory_utilization.maximum
(gauge)
Maximum percentage of memory that is used in the cluster.
Shown as fraction
aws.ecs.cpureservation
(gauge)
Average percentage of CPU units that are reserved by running tasks in the cluster.
Shown as percent
aws.ecs.cpureservation.maximum
(gauge)
Maximum percentage of CPU units that are reserved by running tasks in the cluster.
Shown as percent
aws.ecs.cpureservation.minimum
(gauge)
Minimum percentage of CPU units that are reserved by running tasks in the cluster.
Shown as percent
aws.ecs.memory_reservation
(gauge)
Average percentage of memory that is reserved by running tasks in the cluster.
Shown as percent
aws.ecs.memory_reservation.minimum
(gauge)
Minimum percentage of memory that is reserved by running tasks in the cluster.
Shown as percent
aws.ecs.memory_reservation.maximum
(gauge)
Maximum percentage of memory that is reserved by running tasks in the cluster.
Shown as percent
aws.ecs.running_tasks_count
(gauge)
The number of tasks on the container instance that are in the RUNNING status.
aws.ecs.pending_tasks_count
(gauge)
The number of tasks on the container instance that are in the PENDING status.
aws.ecs.registered_cpu
(gauge)
The number of CPU units registered on the container instance.
aws.ecs.remaining_cpu
(gauge)
The number of CPU units remaining on the container instance.
aws.ecs.registered_memory
(gauge)
The number of memory units registered on the container instance.
aws.ecs.remaining_memory
(gauge)
The number of memory units remaining on the container instance.
aws.ecs.services
(gauge)
The number of services running per cluster.
aws.ecs.service.pending
(gauge)
The number of containers pending per service.
aws.ecs.service.desired
(gauge)
The number of containers desired per service.
aws.ecs.service.running
(gauge)
The number of containers running per service.

Each metric retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to host name and security groups.

Events

To reduce noise, the AWS ECS integration is automatically whitelisted to include only events that contain the following words: drain, error, fail, insufficient memory, pending, reboot, terminate.
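To illustrate the documented behavior, this sketch keeps an event only if its text contains one of the whitelisted words. It assumes case-insensitive substring matching, and the sample event texts are hypothetical. It is not Datadog's implementation.

```python
WHITELIST = ("drain", "error", "fail", "insufficient memory", "pending", "reboot", "terminate")


def passes_whitelist(event_text: str, words=WHITELIST) -> bool:
    """True if the event text contains any whitelisted word (case-insensitive substring match)."""
    text = event_text.lower()
    return any(word in text for word in words)


events = [
    "(service web) failed to place a task",       # matches "fail"
    "Container instance is DRAINING",             # matches "drain"
    "(service web) has reached a steady state.",  # no whitelisted word
]
kept = [e for e in events if passes_whitelist(e)]
```

Under this assumption, substring matching means "fail" also matches "failed" and "drain" matches "DRAINING". This explains why routine steady-state events are filtered out.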

To remove the whitelist and receive all events from your Datadog AWS ECS integration, reach out to Datadog support.

Service Checks

  • aws.ecs.agent_connected: Returns CRITICAL if the Agent cannot connect, otherwise OK.

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Additional helpful documentation, links, and articles: