Note: This page describes the ECS Fargate integration. For EKS Fargate, see the documentation for Datadog’s EKS Fargate integration.
Get metrics from all your containers running in ECS Fargate:
CPU/Memory usage & limit metrics
Monitor your applications running on Fargate using Datadog integrations or custom metrics.
The Datadog Agent retrieves metrics for the task definition’s containers with the ECS task metadata endpoint. According to the ECS Documentation on that endpoint:
This endpoint returns Docker stats JSON for all of the containers associated with the task. For more information about each of the returned stats, see ContainerStats in the Docker API documentation.
The Task Metadata endpoint is only available from within the task definition itself, which is why the Datadog Agent needs to be run as an additional container within each task definition to be monitored.
The only configuration required to enable this metrics collection is to set an environment variable ECS_FARGATE to "true" in the task definition.
The following steps cover setup of the Datadog Container Agent within AWS ECS Fargate. Note: Datadog Agent version 6.1.1 or higher is needed to take full advantage of the Fargate integration.
Tasks that do not have the Datadog Agent still report metrics with CloudWatch. However, the Agent is needed for Autodiscovery, detailed container metrics, tracing, and more. Additionally, CloudWatch metrics are less granular and have more reporting latency than metrics shipped directly through the Datadog Agent.
To monitor your ECS Fargate tasks with Datadog, run the Agent as a container in the same task definition as your application container. Each task definition that you want to collect metrics from must include a Datadog Agent container in addition to the application containers. Follow these setup steps:
Create an ECS Fargate task
Create or Modify your IAM Policy
Run the task as a replica service
Create an ECS Fargate task
The primary unit of work in Fargate is the task, which is configured in the task definition. A task definition is comparable to a pod in Kubernetes. A task definition must contain one or more containers. In order to run the Datadog Agent, create your task definition to run your application container(s), as well as the Datadog Agent container.
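As a rough illustration of such a task definition, the Python sketch below builds one that pairs an application container with the Datadog Agent container and sets ECS_FARGATE to "true". The function name, the application image, the Agent image tag, and the CPU and memory sizes are assumptions chosen for illustration, not values prescribed by this page.

```python
import json


def build_task_definition(app_name: str, app_image: str, api_key: str) -> dict:
    """Return a Fargate task definition with an app container and the Agent."""
    agent = {
        "name": "datadog-agent",
        # Assumed image location and tag; pin a specific version in practice.
        "image": "public.ecr.aws/datadog/agent:latest",
        "essential": True,
        "environment": [
            {"name": "DD_API_KEY", "value": api_key},
            # Enables the ECS Fargate metrics collection described above.
            {"name": "ECS_FARGATE", "value": "true"},
        ],
    }
    app = {"name": app_name, "image": app_image, "essential": True}
    return {
        "family": f"{app_name}-with-datadog",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use awsvpc networking
        "cpu": "256",             # illustrative task-level sizes
        "memory": "512",
        "containerDefinitions": [app, agent],
    }


if __name__ == "__main__":
    # Hypothetical application name and image.
    definition = build_task_definition("my-app", "my-org/my-app:1.0", "<DATADOG_API_KEY>")
    print(json.dumps(definition, indent=2))
```

In a real deployment the API key would more likely come from a secret store than from a plain environment value.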
You can use AWS CloudFormation templating to configure your Fargate containers. Use the AWS::ECS::TaskDefinition resource within your CloudFormation template to set the Amazon ECS task and specify FARGATE as the required launch type for that task.
Update the CloudFormation template below with your Datadog API key. If necessary, also include the appropriate DD_SITE environment variable, which defaults to datadoghq.com if you don’t set it.
In the CloudFormation template, you can reference the ECSTaskDefinition resource created in the previous example from the AWS::ECS::Service resource being created. Then specify your Cluster, DesiredCount, and any other parameters necessary for your application in your replica service.
After the Datadog Agent is set up as described above, the ecs_fargate check collects metrics with Autodiscovery enabled. Add Docker labels to your other containers in the same task to collect additional metrics.
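The service resource described here might look roughly like the following, expressed as a Python dictionary that serializes to a CloudFormation JSON fragment. The logical name ECSService, the helper name, and the subnet handling are assumptions; only ECSTaskDefinition, Cluster, and DesiredCount come from the text above.

```python
import json
from typing import List


def build_service_resource(cluster: str, desired_count: int, subnets: List[str]) -> dict:
    """Return a CloudFormation resource for a replica service on Fargate."""
    return {
        "ECSService": {  # hypothetical logical name
            "Type": "AWS::ECS::Service",
            "Properties": {
                "Cluster": cluster,
                "LaunchType": "FARGATE",
                "SchedulingStrategy": "REPLICA",
                "DesiredCount": desired_count,
                # Ref resolves to the task definition declared in the same template.
                "TaskDefinition": {"Ref": "ECSTaskDefinition"},
                "NetworkConfiguration": {
                    "AwsvpcConfiguration": {"Subnets": subnets}
                },
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical cluster name and subnet ID.
    resource = build_service_resource("my-cluster", 2, ["subnet-0123456789abcdef0"])
    print(json.dumps(resource, indent=2))
```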
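As a sketch of what those Docker labels can look like, the helper below builds the three Autodiscovery labels (check names, init configs, instances) as JSON strings, ready to place under dockerLabels in a container definition. The Redis check and its port are only an illustrative choice, and the helper name is hypothetical.

```python
import json


def autodiscovery_labels(check_name: str, instance: dict) -> dict:
    """Build Autodiscovery Docker labels for one check on one container."""
    return {
        "com.datadoghq.ad.check_names": json.dumps([check_name]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }


# Example: a Redis container in the same task (illustrative values).
redis_labels = autodiscovery_labels("redisdb", {"host": "%%host%%", "port": 6379})

if __name__ == "__main__":
    container = {"name": "redis", "image": "redis:latest", "dockerLabels": redis_labels}
    print(json.dumps(container, indent=2))
```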
Although the integration works on Linux and Windows, some metrics are OS dependent. All metrics exposed when running on Windows are also exposed on Linux, but there are some metrics that are only available on Linux. See Data Collected for the list of metrics provided by this integration. The list also specifies which metrics are Linux-only.
Metrics are collected with DogStatsD through UDP port 8125.
To send custom metrics by listening to DogStatsD packets from other containers, set the environment variable DD_DOGSTATSD_NON_LOCAL_TRAFFIC to true within the Datadog Agent container.
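For illustration, a custom metric can be submitted from an application container with nothing more than a UDP socket. The sketch below formats a DogStatsD datagram and sends it to port 8125; the metric name and tags are made up, and the loopback address assumes the application and the Agent run in the same task. An official DogStatsD client library would normally be used instead.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Format a DogStatsD datagram: name:value|type|#tag1,tag2."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; delivery is fire-and-forget."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical gauge metric with one tag.
    send_metric(format_metric("example.queue.depth", 42, "g", ["env:dev"]))
```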
Other environment variables
For environment variables available with the Docker Agent container, see the Docker Agent page. Note: Some variables are not available for Fargate.
Extract docker container labels
Add tags to check metrics
Add tags to custom metrics
For global tagging, it is recommended to use DD_DOCKER_LABELS_AS_TAGS. With this method, the Agent pulls in tags from your container labels. This requires you to add the appropriate labels to your other containers. Labels can be added directly in the task definition.
Note: You should not use DD_HOSTNAME since there is no concept of a host to the user in Fargate. DD_TAGS is traditionally used to assign host tags, but as of Datadog Agent version 6.13.0 you can also use the environment variable to set global tags on your integration metrics.
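A minimal sketch of how these two tagging variables could be assembled for the Agent container follows. It assumes that DD_DOCKER_LABELS_AS_TAGS takes a JSON map of label name to tag name and that DD_TAGS takes space-separated key:value pairs; the label and tag names are hypothetical.

```python
import json


def tagging_environment(label_to_tag: dict, global_tags: dict) -> list:
    """Return environment entries for the Datadog Agent container."""
    return [
        # Maps container label names to the tag names the Agent should emit.
        {"name": "DD_DOCKER_LABELS_AS_TAGS", "value": json.dumps(label_to_tag)},
        # Global tags, written as space-separated key:value pairs.
        {"name": "DD_TAGS",
         "value": " ".join(f"{key}:{value}" for key, value in global_tags.items())},
    ]


if __name__ == "__main__":
    env = tagging_environment({"com.example.team": "team"}, {"env": "prod"})
    print(json.dumps(env, indent=2))
```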
In addition to the metrics collected by the Datadog Agent, Datadog has a CloudWatch based ECS integration. This integration collects the Amazon ECS CloudWatch Metrics.
As noted there, Fargate tasks also report metrics in this way:
The metrics made available will depend on the launch type of the tasks and services in your clusters. If you are using the Fargate launch type for your services then CPU and memory utilization metrics are provided to assist in the monitoring of your services.
Since this method does not use the Datadog Agent, you need to configure the AWS integration by checking ECS on the integration tile. Then, Datadog pulls these CloudWatch metrics (namespaced aws.ecs.* in Datadog) on your behalf. See the Data Collected section of the documentation.
If these are the only metrics you need, you could rely on this integration for collection using CloudWatch metrics. Note: CloudWatch data is less granular (1-5 min depending on the type of monitoring you have enabled) and delayed in reporting to Datadog. This is because the data collection from CloudWatch must adhere to AWS API limits, instead of pushing it to Datadog with the Agent.
Datadog’s default CloudWatch crawler polls metrics once every 10 minutes. If you need a faster crawl schedule, contact Datadog support for availability. Note: There are cost increases involved on the AWS side as CloudWatch bills for API calls.
You can monitor Fargate logs by using either:
The AWS FireLens integration built on Datadog’s Fluent Bit output plugin to send logs directly to Datadog
The awslogs log driver to store the logs in a CloudWatch Log Group, and then a Lambda function to route logs to Datadog
Datadog recommends using AWS FireLens because you can configure Fluent Bit directly in your Fargate tasks.
Add the Fluent Bit FireLens log router container in your existing Fargate task. For more information about enabling FireLens, see the dedicated AWS FireLens docs. For more information about Fargate container definitions, see the AWS docs on Container Definitions. AWS recommends that you use the regional Docker image. Here is an example snippet of a task definition where the Fluent Bit image is configured:
Next, in the same Fargate task, define a log configuration for the containers whose logs you want to ship. This log configuration should use AWS FireLens as the log driver, with data being output to Fluent Bit. Here is an example snippet of a task definition where FireLens is the log driver and outputs data to Fluent Bit:
Note: Set your apikey as well as the Host relative to your respective site http-intake.logs.. The full list of available parameters is described in the Datadog Fluent Bit documentation.
The dd_service, dd_source, and dd_tags can be adjusted for your desired tags.
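The two pieces described above, the log router container and the per-container log configuration, can be sketched as Python dictionaries as follows. The image reference, the TLS and provider options, and the sample service, source, and tag values are assumptions. The Host shown is the intake for the default datadoghq.com site only, so adjust it for your site.

```python
import json


def firelens_log_router() -> dict:
    """Container definition for the Fluent Bit log router."""
    return {
        "name": "log_router",
        # Assumed image reference; AWS recommends the regional image.
        "image": "amazon/aws-for-fluent-bit:stable",
        "essential": True,
        "firelensConfiguration": {
            "type": "fluentbit",
            "options": {"enable-ecs-log-metadata": "true"},
        },
    }


def firelens_log_configuration(api_key, host, service, source, tags) -> dict:
    """logConfiguration for an application container shipping logs to Datadog."""
    return {
        "logDriver": "awsfirelens",
        "options": {
            "Name": "datadog",
            "apikey": api_key,
            "Host": host,
            "dd_service": service,
            "dd_source": source,
            "dd_tags": tags,
            "TLS": "on",
            "provider": "ecs",
        },
    }


if __name__ == "__main__":
    log_config = firelens_log_configuration(
        "<DATADOG_API_KEY>",
        "http-intake.logs.datadoghq.com",  # assumes the default site
        "my-service", "my-source", "env:dev",
    )
    print(json.dumps([firelens_log_router(), log_config], indent=2))
```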
Whenever a Fargate task runs, Fluent Bit sends the container logs to Datadog with information about all of the containers managed by your Fargate tasks. You can see the raw logs on the Log Explorer page, build monitors for the logs, and use the Live Container view.
To add the Fluent Bit container to your existing Task Definition, check the Enable FireLens integration checkbox under Log router integration to automatically create the log_router container for you. This pulls the regional image; however, Datadog recommends using the stable image tag instead of latest. Clicking Apply creates the base container. To further customize the firelensConfiguration, click the Configure via JSON button at the bottom to edit it manually.
After this has been added, edit the application container in your Task Definition that you want to submit logs from, and change the Log driver to awsfirelens, filling in the Log options with the keys shown in the above example.
Edit the existing task definition JSON file that you have to contain the log_router container and the updated logConfiguration for your application container, as described in the previous section. Once this is done you can create a new revision of your task definition with:
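One way to do this, assuming the AWS CLI is installed and configured, is the aws ecs register-task-definition command with the --cli-input-json option. The Python sketch below only assembles that command line for a given file; the file name is hypothetical and nothing is executed.

```python
import shlex


def register_task_definition_command(json_path: str) -> list:
    """Build the AWS CLI argument list that registers a new revision."""
    return [
        "aws", "ecs", "register-task-definition",
        "--cli-input-json", f"file://{json_path}",
    ]


if __name__ == "__main__":
    # Hypothetical file name; pass the list to subprocess.run() to execute it.
    argv = register_task_definition_command("task-definition.json")
    print(shlex.join(argv))
```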
For more information about using the awslogs log driver in your task definitions to send container logs to CloudWatch Logs, see Using the awslogs Log Driver. This driver collects logs generated by the container and sends them to CloudWatch directly.
Follow the instructions above to add the Datadog Agent container to your task definition with the additional environment variable DD_APM_ENABLED set to true, and set up a container port that uses 8126 with the TCP protocol under port mappings. Set the DD_SITE variable to your Datadog site. It defaults to datadoghq.com if you don’t set it.
Instrument your application based on your setup. With Fargate APM applications, do not set DD_AGENT_HOST. The default of localhost works.
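The APM changes to the Agent container can be sketched as a small transformation of an existing container definition. The helper name and the sample container below are hypothetical; the environment variable, port, and protocol are the ones named above.

```python
import json


def enable_apm(agent_container: dict) -> dict:
    """Return a copy of the Agent container definition with APM enabled."""
    updated = dict(agent_container)

    # Replace any existing DD_APM_ENABLED entry, then set it to "true".
    env = [e for e in agent_container.get("environment", [])
           if e["name"] != "DD_APM_ENABLED"]
    env.append({"name": "DD_APM_ENABLED", "value": "true"})
    updated["environment"] = env

    # Expose the trace intake port once.
    ports = list(agent_container.get("portMappings", []))
    mapping = {"containerPort": 8126, "protocol": "tcp"}
    if mapping not in ports:
        ports.append(mapping)
    updated["portMappings"] = ports
    return updated


if __name__ == "__main__":
    agent = {"name": "datadog-agent",
             "environment": [{"name": "ECS_FARGATE", "value": "true"}]}
    print(json.dumps(enable_apm(agent), indent=2))
```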
Ensure your application is running in the same task definition as the Datadog Agent container.
The Agent can autodiscover and attach tags to all data emitted by the entire task or an individual container within this task. The list of tags automatically attached depends on the Agent’s cardinality configuration.
Number of write operations to the disk (Linux only).
Number of bytes written to the disk (Linux only). Shown as byte
Number of read operations on the disk (Linux only).
Number of bytes read on the disk (Linux only). Shown as byte
User CPU time. Shown as nanocore
System CPU time. Shown as nanocore
Total CPU Usage. Shown as nanocore
Soft limit (CPU Shares) in CPU Units.
Task CPU Limit (shared by all containers). Shown as nanocore
Percentage of CPU used per container (Linux only). Shown as percent
Number of bytes of page cache memory (Linux only). Shown as byte
Number of bytes of file-backed memory on active LRU list (Linux only). Shown as byte
Number of bytes of file-backed memory on inactive LRU list (Linux only). Shown as byte
Number of bytes memory limit (Linux only). Shown as byte
Number of bytes of anonymous and swap cache memory on active LRU list (Linux only). Shown as byte
Number of bytes of memory used. Shown as byte
Number of bytes of anonymous and swap cache memory (includes transparent hugepages) (Linux only). Shown as byte
Number of uncharging events to the memory cgroup. The uncharging event happens each time a page is unaccounted from the cgroup (Linux only).
Number of charging events to the memory cgroup. The charging event happens each time a page is accounted as either mapped anon page(RSS) or cache page(Page Cache) to the cgroup (Linux only).
Number of page faults per second (Linux only).
Number of major page faults per second (Linux only).
Number of bytes of mapped file (includes tmpfs/shmem) (Linux only). Shown as byte
Maximum memory usage recorded. Shown as byte
Number of bytes of memory limit with regard to hierarchy under which the memory cgroup is (Linux only). Shown as byte
Number of bytes of memory+swap limit with regard to hierarchy under which memory cgroup is (Linux only). Shown as byte
Number of received errors (Fargate 1.4.0+ required). Shown as error
Number of sent errors (Fargate 1.4.0+ required). Shown as error
Number of incoming packets dropped (Fargate 1.4.0+ required). Shown as packet
Number of outgoing packets dropped (Fargate 1.4.0+ required). Shown as packet
Number of bytes received (Fargate 1.4.0+ required). Shown as byte
Number of bytes sent (Fargate 1.4.0+ required). Shown as byte
The ECS Fargate check does not include any events.
fargate_check Returns CRITICAL if the Agent is unable to connect to Fargate, otherwise returns OK. Statuses: ok, critical