
Docker Daemon

Agent Check

Supported OS: Linux, Mac OS

Note: The Docker Daemon check is still maintained but only works with Agent v5.

To use the Docker integration with Agent v6, see the Agent v6 section below.

Docker default dashboard

Overview

Configure this Agent check to get metrics from the Docker_daemon service in real time to:

  • Visualize and monitor Docker_daemon states.
  • Be notified about Docker_daemon failovers and events.

Setup

Installation

To collect Docker metrics about all your containers, run one Datadog Agent on every host. There are two ways to run the Agent: directly on each host, or within a docker-dd-agent container (recommended).

For either option, your hosts need cgroup memory management enabled for the Docker check to succeed. See the docker-dd-agent repository for how to enable it.
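As a quick pre-flight check, the sketch below tests whether the kernel reports the memory cgroup controller as enabled. It assumes a Linux host that exposes /proc/cgroups with its usual four columns (subsystem name, hierarchy, number of cgroups, enabled). The function name is illustrative and is not part of the Agent.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the memory cgroup controller is enabled.
# Reads /proc/cgroups by default; another file can be passed for testing.
cgroup_memory_enabled() {
  cgroups_file="${1:-/proc/cgroups}"
  # Columns: subsys_name hierarchy num_cgroups enabled
  awk '$1 == "memory" && $4 == 1 { found = 1 } END { exit (found ? 0 : 1) }' "$cgroups_file"
}

if cgroup_memory_enabled; then
  echo "memory cgroup: enabled"
else
  echo "memory cgroup: not enabled (see the docker-dd-agent repository)"
fi
```

If the check reports that the controller is not enabled, follow the docker-dd-agent repository instructions before installing the Agent.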

Host Installation

  1. Ensure Docker is running on the host.
  2. Install the Agent as described in the Agent installation instructions for your host OS.
  3. Enable the Docker integration tile in the application.
  4. Add the Agent user to the Docker group: usermod -a -G docker dd-agent
  5. Create a docker_daemon.yaml file by copying the example file in the Agent conf.d directory. With a standard Docker install on your host, the integration should work without any changes to this file.
  6. To enable other integrations, use docker ps to identify the ports used by the corresponding applications.
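Steps 4 and 5 can be sketched as the following shell snippet. The conf.d path /etc/dd-agent/conf.d is an assumption (a typical Agent v5 Linux install) and the function name is hypothetical; adjust both to your host.

```shell
#!/bin/sh
# Step 4 (run once, as root): add the Agent user to the Docker group.
#   usermod -a -G docker dd-agent

# Step 5: copy the example config unless docker_daemon.yaml already exists.
# The default conf.d path is an assumption, not a guaranteed location.
create_docker_daemon_conf() {
  conf_dir="${1:-/etc/dd-agent/conf.d}"
  if [ -e "$conf_dir/docker_daemon.yaml" ]; then
    echo "docker_daemon.yaml already present, leaving it untouched"
  else
    cp "$conf_dir/docker_daemon.yaml.example" "$conf_dir/docker_daemon.yaml"
  fi
}

# Usage (with write access to conf.d), then restart the Agent:
#   create_docker_daemon_conf
```

The existence check keeps the helper from overwriting a configuration you have already customized.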

Container Installation

  1. Ensure Docker is running on the host.
  2. As per the Docker container installation instructions, run:

    docker run -d --name dd-agent \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -e API_KEY={YOUR_DD_API_KEY} \
      datadog/docker-dd-agent:latest
    

The command above passes your API key to the Datadog Agent through Docker’s -e environment variable flag. Other supported variables include:

  • API_KEY: Sets your Datadog API key.
  • DD_HOSTNAME: Sets the hostname in the Agent container’s datadog.conf file. If this variable is not set, the Agent container defaults to using the Name field (as reported by the docker info command) as the Agent container hostname.
  • DD_URL: Sets the Datadog intake server URL where the Agent sends data. This is useful when using the Agent as a proxy.
  • LOG_LEVEL: Sets logging verbosity (CRITICAL, ERROR, WARNING, INFO, DEBUG). For example, -e LOG_LEVEL=DEBUG sets logging to debug mode.
  • TAGS: Sets host tags as a comma-delimited string. Both simple tags and key-value tags are available, for example: -e TAGS="simple-tag, tag-key:tag-value".
  • EC2_TAGS: Enabling this feature allows the Agent to query and capture custom tags set using the EC2 API during startup. To enable, use -e EC2_TAGS=yes. Note that this feature requires an IAM role associated with the instance.
  • NON_LOCAL_TRAFFIC: Enabling this feature allows StatsD reporting from any external IP. To enable, use -e NON_LOCAL_TRAFFIC=yes. This is used to report metrics from other containers or systems. See network configuration for more details.
  • PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASSWORD: Set proxy configuration details. For more information, see the Agent proxy documentation.
  • SD_BACKEND, SD_CONFIG_BACKEND, SD_BACKEND_HOST, SD_BACKEND_PORT, SD_TEMPLATE_DIR, SD_CONSUL_TOKEN: Enable and configure Autodiscovery. For more information, see the Autodiscovery guide.

Note: Add --restart=unless-stopped to the docker run command if you want Docker to restart the Agent container automatically, for example after the Docker daemon restarts.
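To see how several of these variables combine, the sketch below prints (but does not run) a docker run command with one -e flag per NAME=value argument. The helper name is hypothetical, and the single-quoting it applies is only safe for values that contain no single quotes. Review the printed command before running it.

```shell
#!/bin/sh
# Hypothetical helper: prints a docker run command for the Agent container,
# adding one -e flag per NAME=value argument. It does not start anything.
dd_agent_run_cmd() {
  printf 'docker run -d --name dd-agent --restart=unless-stopped'
  printf ' -v /var/run/docker.sock:/var/run/docker.sock:ro'
  printf ' -v /proc/:/host/proc/:ro'
  printf ' -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro'
  for kv in "$@"; do
    # Quote each pair so values with spaces or commas stay in one argument.
    printf " -e '%s'" "$kv"
  done
  printf ' datadog/docker-dd-agent:latest\n'
}

dd_agent_run_cmd "API_KEY={YOUR_DD_API_KEY}" "LOG_LEVEL=DEBUG" "TAGS=simple-tag, tag-key:tag-value"
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without Docker and makes the final flags easy to inspect.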

Running the agent container on Amazon Linux

To run the Datadog Agent container on Amazon Linux, make this change to the cgroup volume mount location:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR_DD_API_KEY} \
  datadog/docker-dd-agent:latest

Alpine Linux based container

The standard Docker image is based on Debian Linux, but as of Datadog Agent v5.7, there is an Alpine Linux based image. The Alpine Linux image is considerably smaller in size than the traditional Debian-based image. It also inherits Alpine’s security-oriented design.

To use the Alpine Linux image, append -alpine to the version tag. For example:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR_DD_API_KEY} \
  datadog/docker-dd-agent:latest-alpine

Image versioning

Starting with version 5.5.0 of the Datadog Agent, the Docker image follows a new versioning pattern. This makes it possible to release changes to the Docker image without changing the version of the Agent it bundles.

The Docker image version has the following pattern: X.Y.Z, where X is the major version of the Docker image, Y is the minor version, and Z represents the Agent version.

For example, the first version of the Docker image that bundles the Datadog Agent 5.5.0 is 10.0.550.
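The X.Y.Z pattern can be taken apart with plain shell parameter expansion. This is a minimal sketch; the helper name is hypothetical, and it leaves the Agent component as the raw digits (550) rather than guessing how to turn it back into a dotted version.

```shell
#!/bin/sh
# Hypothetical helper: splits a Docker image version of the form X.Y.Z.
parse_image_version() {
  version="$1"
  image_major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"           # text after the first dot
  image_minor="${rest%%.*}"
  agent_version="${rest#*.}"     # Agent version digits, for example 550
  echo "image_major=$image_major image_minor=$image_minor agent_version=$agent_version"
}

parse_image_version 10.0.550
# prints: image_major=10 image_minor=0 agent_version=550
```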

Custom containers and additional information

For more information about building custom Docker containers with the Datadog Agent, the Alpine Linux based image, versioning, and more, see the docker-dd-agent project on GitHub.

Validation

Run the Agent’s status subcommand and look for docker_daemon under the Checks section.
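The validation step can be scripted by piping the Agent's status output through a small filter. The helper below is hypothetical and only looks for the string docker_daemon anywhere in the text. The two commands in the usage comments are assumptions about an Agent v5 host install and the dd-agent container, where the status output comes from the info subcommand.

```shell
#!/bin/sh
# Hypothetical helper: reads Agent status text on stdin and reports whether
# the docker_daemon check is mentioned anywhere in it.
docker_daemon_listed() {
  if grep -q 'docker_daemon'; then
    echo "docker_daemon check found"
  else
    echo "docker_daemon check not found"
    return 1
  fi
}

# Usage, assuming an Agent v5 host install or the dd-agent container:
#   sudo /etc/init.d/datadog-agent info | docker_daemon_listed
#   docker exec dd-agent service datadog-agent info | docker_daemon_listed
```

A match only shows that the check is listed; read the full output to confirm that it reports no errors.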

Agent v6

The latest Docker check is named docker and is written in Go to take advantage of the new internal architecture. Starting with version 6.0, the Agent no longer loads the docker_daemon check, even though it is still available and maintained for Agent v5. All features have been ported to version 6.0 and later, except for the following deprecations:

  • The url, api_version and tags* options are deprecated. Use the standard Docker environment variables directly instead.
  • The ecs_tags, performance_tags and container_tags options are deprecated. Every relevant tag is now collected by default.
  • The collect_container_count option to enable the docker.container.count metric is not supported. Use docker.containers.running and docker.containers.stopped instead.

Some options have moved from docker_daemon.yaml to the main datadog.yaml:

  • collect_labels_as_tags has been renamed docker_labels_as_tags and now supports high cardinality tags. See the details in datadog.yaml.example.
  • The exclude and include lists have been renamed ac_exclude and ac_include. To make filtering consistent across all components of the Agent, filtering on arbitrary tags has been dropped. The only supported filtering tags are image (image name) and name (container name). Regexp filtering is still available; see datadog.yaml.example for examples.
  • The docker_root option has been split into two options: container_cgroup_root and container_proc_root.
  • exclude_pause_container has been added to exclude pause containers on Kubernetes and OpenShift (defaults to true). This avoids removing them from the exclude list by mistake.

Additional changes:

The import command converts the old docker_daemon.yaml to the new docker.yaml. The command also moves needed settings from docker_daemon.yaml to datadog.yaml.
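A guarded invocation of the import command might look like the sketch below. The two directories are assumptions (the usual Agent v5 and v6 configuration locations on Linux) and can be overridden through the environment variables shown. The wrapper function itself is hypothetical.

```shell
#!/bin/sh
# Assumed default configuration directories for Agent v5 and v6 on Linux.
OLD_CONF_DIR="${OLD_CONF_DIR:-/etc/dd-agent}"
NEW_CONF_DIR="${NEW_CONF_DIR:-/etc/datadog-agent}"

# Hypothetical wrapper: runs the import only if the v6 binary is on PATH.
import_v5_config() {
  if ! command -v datadog-agent >/dev/null 2>&1; then
    echo "datadog-agent (v6) binary not found on PATH" >&2
    return 1
  fi
  datadog-agent import "$OLD_CONF_DIR" "$NEW_CONF_DIR"
}

# Usage (typically as the dd-agent user, once Agent v6 is installed):
#   import_v5_config
```

After the import, review the generated docker.yaml and datadog.yaml against the renamed options listed above.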

Data Collected

Metrics

docker.cpu.system
(gauge)
The percent of time the CPU is executing system calls on behalf of processes of this container, unnormalized
shown as percent
docker.cpu.system.95percentile
(gauge)
95th percentile of docker.cpu.system
shown as percent
docker.cpu.system.avg
(gauge)
Average value of docker.cpu.system
shown as percent
docker.cpu.system.count
(rate)
The rate that the value of docker.cpu.system was sampled
shown as sample
docker.cpu.system.max
(gauge)
Max value of docker.cpu.system
shown as percent
docker.cpu.system.median
(gauge)
Median value of docker.cpu.system
shown as percent
docker.cpu.user
(gauge)
The percent of time the CPU is under direct control of processes of this container, unnormalized
shown as percent
docker.cpu.user.95percentile
(gauge)
95th percentile of docker.cpu.user
shown as percent
docker.cpu.user.avg
(gauge)
Average value of docker.cpu.user
shown as percent
docker.cpu.user.count
(rate)
The rate that the value of docker.cpu.user was sampled
shown as sample
docker.cpu.user.max
(gauge)
Max value of docker.cpu.user
shown as percent
docker.cpu.user.median
(gauge)
Median value of docker.cpu.user
shown as percent
docker.cpu.usage
(gauge)
The percent of CPU time obtained by this container
shown as percent
docker.cpu.throttled
(gauge)
Number of times the cgroup has been throttled
docker.cpu.shares
(gauge)
Shares of CPU usage allocated to the container
docker.kmem.usage
(gauge)
The amount of kernel memory that belongs to the container's processes.
shown as byte
docker.mem.cache
(gauge)
The amount of memory that is being used to cache data from disk (e.g. memory contents that can be associated precisely with a block on a block device)
shown as byte
docker.mem.cache.95percentile
(gauge)
95th percentile value of docker.mem.cache
shown as byte
docker.mem.cache.avg
(gauge)
Average value of docker.mem.cache
shown as byte
docker.mem.cache.count
(rate)
The rate that the value of docker.mem.cache was sampled
shown as sample
docker.mem.cache.max
(gauge)
Max value of docker.mem.cache
shown as byte
docker.mem.cache.median
(gauge)
Median value of docker.mem.cache
shown as byte
docker.mem.rss
(gauge)
The amount of non-cache memory that belongs to the container's processes. Used for stacks, heaps, etc.
shown as byte
docker.mem.rss.95percentile
(gauge)
95th percentile value of docker.mem.rss
shown as byte
docker.mem.rss.avg
(gauge)
Average value of docker.mem.rss
shown as byte
docker.mem.rss.count
(rate)
The rate that the value of docker.mem.rss was sampled
shown as sample
docker.mem.rss.max
(gauge)
Max value of docker.mem.rss
shown as byte
docker.mem.rss.median
(gauge)
Median value of docker.mem.rss
shown as byte
docker.mem.swap
(gauge)
The amount of swap currently used by the container
shown as byte
docker.mem.swap.95percentile
(gauge)
95th percentile value of docker.mem.swap
shown as byte
docker.mem.swap.avg
(gauge)
Average value of docker.mem.swap
shown as byte
docker.mem.swap.count
(rate)
The rate that the value of docker.mem.swap was sampled
shown as sample
docker.mem.swap.max
(gauge)
Max value of docker.mem.swap
shown as byte
docker.mem.swap.median
(gauge)
Median value of docker.mem.swap
shown as byte
docker.container.size_rw
(gauge)
Total size of all the files in the container which have been created or changed by processes running in the container
shown as byte
docker.container.size_rw.95percentile
(gauge)
95th percentile of docker.container.size_rw
shown as byte
docker.container.size_rw.avg
(gauge)
Average value of docker.container.size_rw
shown as byte
docker.container.size_rw.count
(rate)
The rate that the value of docker.container.size_rw was sampled
shown as sample
docker.container.size_rw.max
(gauge)
Max value of docker.container.size_rw
shown as byte
docker.container.size_rw.median
(gauge)
Median value of docker.container.size_rw
shown as byte
docker.container.size_rootfs
(gauge)
Total size of all the files in the container
shown as byte
docker.container.size_rootfs.95percentile
(gauge)
95th percentile of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.avg
(gauge)
Average value of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.count
(rate)
The rate that the value of docker.container.size_rootfs was sampled
shown as sample
docker.container.size_rootfs.max
(gauge)
Max value of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.median
(gauge)
Median value of docker.container.size_rootfs
shown as byte
docker.containers.running
(gauge)
The number of containers running on this host tagged by image
docker.containers.stopped
(gauge)
The number of containers stopped on this host tagged by image
docker.containers.running.total
(gauge)
The total number of containers running on this host
docker.containers.stopped.total
(gauge)
The total number of containers stopped on this host
docker.images.available
(gauge)
The number of top-level images
docker.images.intermediate
(gauge)
The number of intermediate images, which are intermediate layers that make up other images
docker.mem.limit
(gauge)
The memory limit for the container, if set
shown as byte
docker.mem.limit.95percentile
(gauge)
95th percentile of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.avg
(gauge)
Average value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.count
(rate)
The rate that the value of docker.mem.limit was sampled
shown as sample
docker.mem.limit.max
(gauge)
Max value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.median
(gauge)
Median value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit
(gauge)
The swap + memory limit for the container, if set
shown as byte
docker.mem.sw_limit.95percentile
(gauge)
95th percentile of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.avg
(gauge)
Average value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.count
(rate)
The rate that the value of docker.mem.sw_limit was sampled
shown as sample
docker.mem.sw_limit.max
(gauge)
Max value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.median
(gauge)
Median value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit
(gauge)
The memory reservation limit for the container, if set
shown as byte
docker.mem.soft_limit.95percentile
(gauge)
95th percentile of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.avg
(gauge)
Average value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.count
(rate)
The rate that the value of docker.mem.soft_limit was sampled
shown as sample
docker.mem.soft_limit.max
(gauge)
Max value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.median
(gauge)
Median value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.in_use
(gauge)
The fraction of used memory to available memory, if the limit is set
shown as fraction
docker.mem.in_use.95percentile
(gauge)
95th percentile of docker.mem.in_use
shown as fraction
docker.mem.in_use.avg
(gauge)
Average value of docker.mem.in_use
shown as fraction
docker.mem.in_use.count
(rate)
The rate that the value of docker.mem.in_use was sampled
shown as sample
docker.mem.in_use.max
(gauge)
Max value of docker.mem.in_use
shown as fraction
docker.mem.in_use.median
(gauge)
Median value of docker.mem.in_use
shown as fraction
docker.mem.sw_in_use
(gauge)
The fraction of used swap + memory to available swap + memory, if the limit is set
shown as fraction
docker.mem.sw_in_use.95percentile
(gauge)
95th percentile of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.avg
(gauge)
Average value of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.count
(rate)
The rate that the value of docker.mem.sw_in_use was sampled
shown as sample
docker.mem.sw_in_use.max
(gauge)
Max value of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.median
(gauge)
Median value of docker.mem.sw_in_use
shown as fraction
docker.io.read_bytes
(gauge)
Bytes read per second from disk by the processes of the container
shown as byte
docker.io.read_bytes.95percentile
(gauge)
95th percentile of docker.io.read_bytes
shown as byte
docker.io.read_bytes.avg
(gauge)
Average value of docker.io.read_bytes
shown as byte
docker.io.read_bytes.count
(rate)
The rate that the value of docker.io.read_bytes was sampled
shown as sample
docker.io.read_bytes.max
(gauge)
Max value of docker.io.read_bytes
shown as byte
docker.io.read_bytes.median
(gauge)
Median value of docker.io.read_bytes
shown as byte
docker.io.write_bytes
(gauge)
Bytes written per second to disk by the processes of the container
shown as byte
docker.io.write_bytes.95percentile
(gauge)
95th percentile of docker.io.write_bytes
shown as byte
docker.io.write_bytes.avg
(gauge)
Average value of docker.io.write_bytes
shown as byte
docker.io.write_bytes.count
(rate)
The rate that the value of docker.io.write_bytes was sampled
shown as sample
docker.io.write_bytes.max
(gauge)
Max value of docker.io.write_bytes
shown as byte
docker.io.write_bytes.median
(gauge)
Median value of docker.io.write_bytes
shown as byte
docker.image.virtual_size
(gauge)
Size of all layers of the image on disk
shown as byte
docker.image.size
(gauge)
Size of all layers of the image on disk
shown as byte
docker.net.bytes_rcvd
(gauge)
Bytes received per second from the network
shown as byte
docker.net.bytes_rcvd.95percentile
(gauge)
95th percentile of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.avg
(gauge)
Average value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.count
(rate)
The rate that the value of docker.net.bytes_rcvd was sampled
shown as sample
docker.net.bytes_rcvd.max
(gauge)
Max value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.median
(gauge)
Median value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_sent
(gauge)
Bytes sent per second to the network
shown as byte
docker.net.bytes_sent_bytes.95percentile
(gauge)
95th percentile of docker.net.bytes_sent_bytes
shown as byte
docker.net.bytes_sent_bytes.avg
(gauge)
Average value of docker.net.bytes_sent_bytes
shown as byte
docker.net.bytes_sent_bytes.count
(rate)
The rate that the value of docker.net.bytes_sent_bytes was sampled
shown as sample
docker.net.bytes_sent_bytes.max
(gauge)
Max value of docker.net.bytes_sent_bytes
shown as byte
docker.net.bytes_sent_bytes.median
(gauge)
Median value of docker.net.bytes_sent_bytes
shown as byte
docker.data.used
(gauge)
Storage pool disk space used
shown as byte
docker.data.free
(gauge)
Storage pool disk space free
shown as byte
docker.data.total
(gauge)
Storage pool disk space total
shown as byte
docker.data.percent
(gauge)
The percent of storage pool used
shown as percent
docker.metadata.used
(gauge)
Storage pool metadata space used
shown as byte
docker.metadata.free
(gauge)
Storage pool metadata space free
shown as byte
docker.metadata.total
(gauge)
Storage pool metadata space total
shown as byte
docker.metadata.percent
(gauge)
The percent of storage pool metadata used
shown as percent
docker.thread.count
(gauge)
Current thread count for the container
shown as thread
docker.thread.limit
(gauge)
Thread count limit for the container, if set
shown as thread

Events

The Docker integration produces the following events:

  • Delete Image
  • Die
  • Error
  • Fail
  • Kill
  • Out of memory (oom)
  • Pause
  • Restart container
  • Restart Daemon
  • Update

Service Checks

docker.service_up: Returns CRITICAL if the Agent is unable to collect the list of containers from the Docker daemon, otherwise returns OK.

docker.container_health: This Service Check is only available for Agent v5. It returns CRITICAL if a container is unhealthy, UNKNOWN if the health is unknown, and OK otherwise.

docker.exit: Returns CRITICAL if a container exited with a non-zero exit code, otherwise returns OK.

Note: To use docker.exit, add collect_exit_code: true in your Docker YAML file and restart the Agent.

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Knowledge Base

Datadog Blog

Learn more about how to monitor Docker performance metrics with our series of posts. We detail the challenges when monitoring Docker, its key performance metrics, how to collect them, and lastly how the largest TV and radio outlet in the U.S. monitors Docker using Datadog.


