Docker

Agent Check

Supported OS: Linux, Mac OS

Overview

Configure this Agent check to get metrics from the docker_daemon service in real time to:

  • Visualize and monitor docker_daemon states
  • Be notified about docker_daemon failovers and events.

NOTE: The Docker check has been rewritten in Go for Agent v6 to take advantage of the new internal architecture. The docker_daemon check described on this page is still maintained, but it only works with Agents prior to major version 6.

To learn how to use the Docker integration with Agent major version 6, consult the dedicated Agent v6 setup.

Setup

Installation

To collect Docker metrics about all your containers, run one Datadog Agent on every host. There are two ways to run the Agent: directly on each host, or within a docker-dd-agent container. We recommend the latter.

Whichever you choose, your hosts need to have cgroup memory management enabled for the Docker check to succeed. See the docker-dd-agent repository for how to enable it.

Host Installation

  1. Ensure Docker is running on the host.
  2. Install the Agent as described in the Agent installation instructions for your host OS.
  3. Enable the Docker integration tile in the application.
  4. Add the Agent user to the docker group: usermod -a -G docker dd-agent
  5. Create a docker_daemon.yaml file by copying the example file in the agent conf.d directory. If you have a standard install of Docker on your host, there shouldn’t be anything you need to change to get the integration to work.
  6. To enable other integrations, use docker ps to identify the ports used by the corresponding applications.
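
After step 4, it can be worth confirming that the group change took effect. The following is a minimal sketch rather than part of the official instructions: has_group is a hypothetical helper that checks whether a space-separated group list (such as the output of id -nG) contains a given group.

```shell
# has_group GROUP "GROUP LIST"
# Hypothetical helper: succeeds if the space-separated list in $2
# contains the group named in $1 as a whole word.
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a real host, feed it the groups of the Agent user, for example:
#   has_group docker "$(id -nG dd-agent)" && echo "dd-agent is in the docker group"
```

Matching on whole words avoids a false positive from a group whose name merely contains "docker".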

Note: docker_daemon has replaced the older docker integration.

Container Installation

  1. Ensure Docker is running on the host.
  2. As per the docker container installation instructions, run:

    docker run -d --name dd-agent \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -e API_KEY={YOUR API KEY} \
      datadog/docker-dd-agent:latest
    

The command above passes your API key to the Datadog Agent using Docker’s -e environment variable flag. Other variables you can pass include:

| Variable | Description |
|---|---|
| API_KEY | Sets your Datadog API key. |
| DD_HOSTNAME | Sets the hostname in the Agent container’s datadog.conf file. If this variable is not set, the Agent container will default to using the Name field (as reported by the docker info command) as the Agent container hostname. |
| DD_URL | Sets the Datadog intake server URL where the Agent will send data. This is useful when using an agent as a proxy. |
| LOG_LEVEL | Sets logging verbosity (CRITICAL, ERROR, WARNING, INFO, DEBUG). For example, -e LOG_LEVEL=DEBUG will set logging to debug mode. |
| TAGS | Sets host tags as a comma-delimited string. You can pass both simple tags and key-value tags. For example, -e TAGS="simple-tag, tag-key:tag-value". |
| EC2_TAGS | Enabling this feature allows the agent to query and capture custom tags set using the EC2 API during startup. To enable, set the value to “yes”, for example, -e EC2_TAGS=yes. Note that this feature requires an IAM role associated with the instance. |
| NON_LOCAL_TRAFFIC | Enabling this feature will allow statsd reporting from any external IP. For example, -e NON_LOCAL_TRAFFIC=yes. This can be used to report metrics from other containers or systems. See network configuration for more details. |
| PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASSWORD | Sets proxy configuration details. For more information, see the Agent proxy documentation. |
| SD_BACKEND, SD_CONFIG_BACKEND, SD_BACKEND_HOST, SD_BACKEND_PORT, SD_TEMPLATE_DIR, SD_CONSUL_TOKEN | Enables and configures Autodiscovery. For more information, please see the Autodiscovery guide. |
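
Because TAGS takes a single comma-delimited string, it can be convenient to assemble that string from separate values. The following is a small sketch; join_tags is a hypothetical helper and is not part of docker-dd-agent.

```shell
# join_tags TAG...
# Hypothetical helper: joins its arguments with commas so the result
# can be passed as the value of the TAGS variable.
join_tags() {
  out=""
  for tag in "$@"; do
    if [ -z "$out" ]; then
      out="$tag"
    else
      out="$out,$tag"
    fi
  done
  printf '%s\n' "$out"
}

# Example: prints simple-tag,tag-key:tag-value
join_tags simple-tag tag-key:tag-value

# The result could then be passed to the container, for example:
#   docker run ... -e TAGS="$(join_tags simple-tag tag-key:tag-value)" ...
```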

Note: Add --restart=unless-stopped if you want the Agent container to be restarted automatically (for example, after the Docker daemon restarts) unless it has been explicitly stopped.

Running the agent container on Amazon Linux

To run the Datadog Agent container on Amazon Linux, you’ll need to make one small change to the cgroup volume mount location:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR API KEY} \
  datadog/docker-dd-agent:latest
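
The only difference from the standard command is where the host's cgroup hierarchy is mounted. As a hedged sketch, a wrapper script could pick the mount source by looking for the memory controller directory. find_cgroup_root is a hypothetical helper, and the check assumes a cgroup v1 layout in which the memory controller appears as a memory subdirectory. The optional argument is a path prefix that exists only so the function can be tried against a fake directory tree.

```shell
# find_cgroup_root [PREFIX]
# Hypothetical helper: prints the first candidate cgroup root that has a
# "memory" subdirectory (assumed cgroup v1 layout). PREFIX is prepended
# when probing the filesystem, which makes the function easy to try out.
find_cgroup_root() {
  prefix=${1:-}
  for candidate in /sys/fs/cgroup /cgroup; do
    if [ -d "$prefix$candidate/memory" ]; then
      printf '%s/\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Possible use when starting the container:
#   root=$(find_cgroup_root) && docker run ... -v "$root":/host/sys/fs/cgroup:ro ...
```

If neither location has a memory subdirectory, the function fails, which is consistent with the requirement that cgroup memory management be enabled on the host.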

Alpine Linux based container

Our standard Docker image is based on Debian Linux, but as of version 5.7 of the Datadog Agent, we also offer an Alpine Linux based image. The Alpine Linux image is considerably smaller in size than the traditional Debian-based image. It also inherits Alpine’s security-oriented design.

To use the Alpine Linux image, simply append -alpine to the version tag. For example:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR API KEY} \
  datadog/docker-dd-agent:latest-alpine

Image versioning

Starting with version 5.5.0 of the Datadog Agent, the Docker image follows a new versioning pattern. This allows changes to the Docker image to be released without changing the bundled Agent version.

The Docker image version has the pattern X.Y.Z, where X is the major version of the Docker image, Y is the minor version, and Z represents the Agent version.

For example, the first version of the Docker image that will bundle the Datadog Agent 5.5.0 will be: 10.0.550
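
To make the pattern concrete, here is a minimal sketch that recovers the Agent version from an image tag. agent_version_from_tag is a hypothetical helper, and it rests on an assumption not stated above: that Z is the Agent version with its dots removed and that the Agent major and patch numbers are single digits.

```shell
# agent_version_from_tag TAG
# Hypothetical helper: prints the Agent version encoded in the Z field
# of an X.Y.Z image tag. Assumes the Agent major and patch numbers are
# single digits, so 550 is read as 5.5.0.
agent_version_from_tag() {
  tag=${1%-alpine}          # drop an optional -alpine suffix
  z=${tag##*.}              # keep the last dot-separated field
  major=${z%"${z#?}"}       # first character
  patch=${z#"${z%?}"}       # last character
  middle=${z#?}             # everything after the first character...
  minor=${middle%?}         # ...minus the last character
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

# Example: prints 5.5.0
agent_version_from_tag 10.0.550
```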

Custom containers and additional information

For more information about building custom Docker containers with the Datadog Agent, the Alpine Linux based image, versioning, and more, see the docker-dd-agent project on GitHub.

Validation

Run the Agent’s status subcommand and look for docker_daemon under the Checks section.

Agent v6

The new docker check is named docker. Starting from version 6.0, the Agent no longer loads the docker_daemon check, although that check remains available and maintained for Agent version 5.x. All features have been ported to version 6.0 and later, except for the following deprecations:

  • The url, api_version and tags* options are deprecated; use the standard Docker environment variables directly instead.
  • The ecs_tags, performance_tags and container_tags options are deprecated. Every relevant tag is now collected by default.
  • The collect_container_count option, which enabled the docker.container.count metric, is not supported. Use docker.containers.running and docker.containers.stopped instead.

Some options have moved from docker_daemon.yaml to the main datadog.yaml:

  • collect_labels_as_tags has been renamed docker_labels_as_tags and now supports high-cardinality tags; see the details in datadog.yaml.example.
  • The exclude and include lists have been renamed ac_exclude and ac_include. To make filtering consistent across all components of the Agent, filtering on arbitrary tags has been dropped. The only supported filtering tags are image (image name) and name (container name). Regexp filtering is still available; see datadog.yaml.example for examples.
  • The docker_root option has been split into two options: container_cgroup_root and container_proc_root.
  • exclude_pause_container has been added to exclude pause containers on Kubernetes and OpenShift (defaults to true). This prevents them from being removed from the exclude list by mistake.
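
The include and exclude filtering described above can be pictured with a short sketch. This is an illustration under stated assumptions, not the Agent's implementation: the rule syntax image:<regexp> and name:<regexp>, the precedence of include rules over exclude rules, and the shell variables AC_INCLUDE and AC_EXCLUDE are all assumptions made for the example. Check datadog.yaml.example for the authoritative format.

```shell
# matches_rule RULE IMAGE NAME
# Succeeds if RULE ("image:<regexp>" or "name:<regexp>") matches.
matches_rule() {
  case "$1" in
    image:*) printf '%s\n' "$2" | grep -Eq -- "${1#image:}" ;;
    name:*)  printf '%s\n' "$3" | grep -Eq -- "${1#name:}" ;;
    *) return 1 ;;
  esac
}

# is_excluded IMAGE NAME
# Reads space-separated rules from AC_INCLUDE and AC_EXCLUDE (hypothetical
# shell variables standing in for the ac_include and ac_exclude lists).
# The body runs in a subshell so that "set -f" (no glob expansion of the
# regexps) does not leak into the caller.
is_excluded() (
  set -f
  for rule in ${AC_INCLUDE:-}; do
    matches_rule "$rule" "$1" "$2" && exit 1   # included: never excluded
  done
  for rule in ${AC_EXCLUDE:-}; do
    matches_rule "$rule" "$1" "$2" && exit 0   # matched an exclude rule
  done
  exit 1
)

AC_EXCLUDE='image:.*'
AC_INCLUDE='name:^web-'
is_excluded nginx:latest web-1 || echo "web-1 is monitored"
is_excluded redis:4 cache-1 && echo "cache-1 is excluded"
```

Only the image and name keys are handled, matching the two filtering tags that remain supported; any other rule prefix is ignored.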

The import command converts the old docker_daemon.yaml to the new docker.yaml. The command also moves needed settings from docker_daemon.yaml to datadog.yaml.

Data Collected

Metrics

docker.cpu.system
(gauge)
The percent of time the CPU is executing system calls on behalf of processes of this container, unnormalized
shown as percent
docker.cpu.system.95percentile
(gauge)
95th percentile of docker.cpu.system
shown as percent
docker.cpu.system.avg
(gauge)
Average value of docker.cpu.system
shown as percent
docker.cpu.system.count
(rate)
The rate that the value of docker.cpu.system was sampled
shown as sample
docker.cpu.system.max
(gauge)
Max value of docker.cpu.system
shown as percent
docker.cpu.system.median
(gauge)
Median value of docker.cpu.system
shown as percent
docker.cpu.user
(gauge)
The percent of time the CPU is under direct control of processes of this container, unnormalized
shown as percent
docker.cpu.user.95percentile
(gauge)
95th percentile of docker.cpu.user
shown as percent
docker.cpu.user.avg
(gauge)
Average value of docker.cpu.user
shown as percent
docker.cpu.user.count
(rate)
The rate that the value of docker.cpu.user was sampled
shown as sample
docker.cpu.user.max
(gauge)
Max value of docker.cpu.user
shown as percent
docker.cpu.user.median
(gauge)
Median value of docker.cpu.user
shown as percent
docker.cpu.usage
(gauge)
The percent of CPU time obtained by this container
shown as percent
docker.cpu.throttled
(gauge)
Number of times the cgroup has been throttled
docker.cpu.shares
(gauge)
Shares of CPU usage allocated to the container
docker.mem.cache
(gauge)
The amount of memory that is being used to cache data from disk (e.g. memory contents that can be associated precisely with a block on a block device)
shown as byte
docker.mem.cache.95percentile
(gauge)
95th percentile value of docker.mem.cache
shown as byte
docker.mem.cache.avg
(gauge)
Average value of docker.mem.cache
shown as byte
docker.mem.cache.count
(rate)
The rate that the value of docker.mem.cache was sampled
shown as sample
docker.mem.cache.max
(gauge)
Max value of docker.mem.cache
shown as byte
docker.mem.cache.median
(gauge)
Median value of docker.mem.cache
shown as byte
docker.mem.rss
(gauge)
The amount of non-cache memory that belongs to the container's processes. Used for stacks, heaps, etc.
shown as byte
docker.mem.rss.95percentile
(gauge)
95th percentile value of docker.mem.rss
shown as byte
docker.mem.rss.avg
(gauge)
Average value of docker.mem.rss
shown as byte
docker.mem.rss.count
(rate)
The rate that the value of docker.mem.rss was sampled
shown as sample
docker.mem.rss.max
(gauge)
Max value of docker.mem.rss
shown as byte
docker.mem.rss.median
(gauge)
Median value of docker.mem.rss
shown as byte
docker.mem.swap
(gauge)
The amount of swap currently used by the container
shown as byte
docker.mem.swap.95percentile
(gauge)
95th percentile value of docker.mem.swap
shown as byte
docker.mem.swap.avg
(gauge)
Average value of docker.mem.swap
shown as byte
docker.mem.swap.count
(rate)
The rate that the value of docker.mem.swap was sampled
shown as sample
docker.mem.swap.max
(gauge)
Max value of docker.mem.swap
shown as byte
docker.mem.swap.median
(gauge)
Median value of docker.mem.swap
shown as byte
docker.container.size_rw
(gauge)
Total size of all the files in the container which have been created or changed by processes running in the container
shown as byte
docker.container.size_rw.95percentile
(gauge)
95th percentile of docker.container.size_rw
shown as byte
docker.container.size_rw.avg
(gauge)
Average value of docker.container.size_rw
shown as byte
docker.container.size_rw.count
(rate)
The rate that the value of docker.container.size_rw was sampled
shown as sample
docker.container.size_rw.max
(gauge)
Max value of docker.container.size_rw
shown as byte
docker.container.size_rw.median
(gauge)
Median value of docker.container.size_rw
shown as byte
docker.container.size_rootfs
(gauge)
Total size of all the files in the container
shown as byte
docker.container.size_rootfs.95percentile
(gauge)
95th percentile of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.avg
(gauge)
Average value of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.count
(rate)
The rate that the value of docker.container.size_rootfs was sampled
shown as sample
docker.container.size_rootfs.max
(gauge)
Max value of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.median
(gauge)
Median value of docker.container.size_rootfs
shown as byte
docker.containers.running
(gauge)
The number of containers running on this host tagged by image
docker.containers.stopped
(gauge)
The number of containers stopped on this host tagged by image
docker.containers.running.total
(gauge)
The total number of containers running on this host
docker.containers.stopped.total
(gauge)
The total number of containers stopped on this host
docker.images.available
(gauge)
The number of top-level images
docker.images.intermediate
(gauge)
The number of intermediate images, which are intermediate layers that make up other images
docker.mem.limit
(gauge)
The memory limit for the container, if set
shown as byte
docker.mem.limit.95percentile
(gauge)
95th percentile of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.avg
(gauge)
Average value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.count
(rate)
The rate that the value of docker.mem.limit was sampled
shown as sample
docker.mem.limit.max
(gauge)
Max value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.median
(gauge)
Median value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit
(gauge)
The swap + memory limit for the container, if set
shown as byte
docker.mem.sw_limit.95percentile
(gauge)
95th percentile of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.avg
(gauge)
Average value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.count
(rate)
The rate that the value of docker.mem.sw_limit was sampled
shown as sample
docker.mem.sw_limit.max
(gauge)
Max value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.median
(gauge)
Median value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit
(gauge)
The memory reservation limit for the container, if set
shown as byte
docker.mem.soft_limit.95percentile
(gauge)
95th percentile of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.avg
(gauge)
Average value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.count
(rate)
The rate that the value of docker.mem.soft_limit was sampled
shown as sample
docker.mem.soft_limit.max
(gauge)
Max value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.median
(gauge)
Median value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.in_use
(gauge)
The fraction of used memory to available memory, if the limit is set
shown as fraction
docker.mem.in_use.95percentile
(gauge)
95th percentile of docker.mem.in_use
shown as fraction
docker.mem.in_use.avg
(gauge)
Average value of docker.mem.in_use
shown as fraction
docker.mem.in_use.count
(rate)
The rate that the value of docker.mem.in_use was sampled
shown as sample
docker.mem.in_use.max
(gauge)
Max value of docker.mem.in_use
shown as fraction
docker.mem.in_use.median
(gauge)
Median value of docker.mem.in_use
shown as fraction
docker.mem.sw_in_use
(gauge)
The fraction of used swap + memory to available swap + memory, if the limit is set
shown as fraction
docker.mem.sw_in_use.95percentile
(gauge)
95th percentile of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.avg
(gauge)
Average value of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.count
(rate)
The rate that the value of docker.mem.sw_in_use was sampled
shown as sample
docker.mem.sw_in_use.max
(gauge)
Max value of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.median
(gauge)
Median value of docker.mem.sw_in_use
shown as fraction
docker.io.read_bytes
(gauge)
Bytes read per second from disk by the processes of the container
shown as byte
docker.io.read_bytes.95percentile
(gauge)
95th percentile of docker.io.read_bytes
shown as byte
docker.io.read_bytes.avg
(gauge)
Average value of docker.io.read_bytes
shown as byte
docker.io.read_bytes.count
(rate)
The rate that the value of docker.io.read_bytes was sampled
shown as sample
docker.io.read_bytes.max
(gauge)
Max value of docker.io.read_bytes
shown as byte
docker.io.read_bytes.median
(gauge)
Median value of docker.io.read_bytes
shown as byte
docker.io.write_bytes
(gauge)
Bytes written per second to disk by the processes of the container
shown as byte
docker.io.write_bytes.95percentile
(gauge)
95th percentile of docker.io.write_bytes
shown as byte
docker.io.write_bytes.avg
(gauge)
Average value of docker.io.write_bytes
shown as byte
docker.io.write_bytes.count
(rate)
The rate that the value of docker.io.write_bytes was sampled
shown as sample
docker.io.write_bytes.max
(gauge)
Max value of docker.io.write_bytes
shown as byte
docker.io.write_bytes.median
(gauge)
Median value of docker.io.write_bytes
shown as byte
docker.image.virtual_size
(gauge)
Size of all layers of the image on disk
shown as byte
docker.image.size
(gauge)
Size of all layers of the image on disk
shown as byte
docker.net.bytes_rcvd
(gauge)
Bytes received per second from the network
shown as byte
docker.net.bytes_rcvd.95percentile
(gauge)
95th percentile of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.avg
(gauge)
Average value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.count
(rate)
The rate that the value of docker.net.bytes_rcvd was sampled
shown as sample
docker.net.bytes_rcvd.max
(gauge)
Max value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.median
(gauge)
Median value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_sent
(gauge)
Bytes sent per second to the network
shown as byte
docker.net.bytes_sent.95percentile
(gauge)
95th percentile of docker.net.bytes_sent
shown as byte
docker.net.bytes_sent.avg
(gauge)
Average value of docker.net.bytes_sent
shown as byte
docker.net.bytes_sent.count
(rate)
The rate that the value of docker.net.bytes_sent was sampled
shown as sample
docker.net.bytes_sent.max
(gauge)
Max value of docker.net.bytes_sent
shown as byte
docker.net.bytes_sent.median
(gauge)
Median value of docker.net.bytes_sent
shown as byte
docker.data.used
(gauge)
Storage pool disk space used
shown as byte
docker.data.free
(gauge)
Storage pool disk space free
shown as byte
docker.data.total
(gauge)
Storage pool disk space total
shown as byte
docker.data.percent
(gauge)
The percent of storage pool used
shown as percent
docker.metadata.used
(gauge)
Storage pool metadata space used
shown as byte
docker.metadata.free
(gauge)
Storage pool metadata space free
shown as byte
docker.metadata.total
(gauge)
Storage pool metadata space total
shown as byte
docker.metadata.percent
(gauge)
The percent of storage pool metadata used
shown as percent
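
As a worked illustration of the fraction metrics such as docker.mem.in_use, the sketch below divides a usage figure by a limit. mem_in_use is a hypothetical helper; which memory counter is treated as "used" is not specified here and is left to the caller as an assumption.

```shell
# mem_in_use USED_BYTES LIMIT_BYTES
# Hypothetical helper: prints used/limit as a fraction with two decimals.
# Prints nothing and fails when no limit is set (empty or 0), mirroring
# the "if the limit is set" condition.
mem_in_use() {
  if [ -z "${2:-}" ] || [ "$2" -eq 0 ]; then
    return 1
  fi
  awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.2f\n", used / limit }'
}

# Example: 256 MiB used out of a 1 GiB limit prints 0.25
mem_in_use 268435456 1073741824
```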

Events

The events below will be available:

  • Delete Image
  • Die
  • Error
  • Fail
  • Kill
  • Out of memory (oom)
  • Pause
  • Restart container
  • Restart Daemon
  • Update

Service Checks

docker.service_up:

Returns CRITICAL if the Agent is unable to collect the list of containers from the Docker daemon. Returns OK otherwise.

docker.container_health:

This Service Check is only available for Agent v5. It returns CRITICAL if a container is unhealthy, UNKNOWN if the health is unknown, and OK otherwise.

docker.exit:

Returns CRITICAL if a container exited with a non-zero exit code. Returns OK otherwise.
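
The docker.exit rule is simple enough to express directly. The sketch below only illustrates the mapping described above and is not the Agent's code; exit_check_status is a hypothetical helper name.

```shell
# exit_check_status EXIT_CODE
# Hypothetical helper: maps a container exit code to the status that
# docker.exit is described as reporting.
exit_check_status() {
  if [ "$1" -eq 0 ]; then
    echo OK
  else
    echo CRITICAL
  fi
}

# Example: prints CRITICAL
exit_check_status 137

# With the Docker CLI, the exit code of a stopped container can be read with:
#   docker inspect --format '{{.State.ExitCode}}' <container>
```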

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

Datadog Blog

Learn more about how to monitor Docker performance metrics in our series of posts. We detail the challenges of monitoring Docker, its key performance metrics, how to collect them, and how the largest TV and radio outlet in the U.S. monitors Docker using Datadog.

We’ve also written several other in-depth blog posts to help you get the most out of Datadog and Docker:

