Datadog-Docker Integration

Overview

Get metrics from the docker_daemon service in real time to:

  • Visualize and monitor docker_daemon states.
  • Be notified about docker_daemon failovers and events.

Setup

Installation

To collect Docker metrics for all your containers, run one Datadog Agent on every host. There are two ways to run the Agent: directly on each host, or within a docker-dd-agent container. We recommend the latter.

Whichever you choose, your hosts need to have cgroup memory management enabled for the Docker check to succeed. See the docker-dd-agent repository for how to enable it.
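
As a quick pre-flight check, the following Python sketch reports whether the memory cgroup controller is listed as enabled. It assumes a Linux host that exposes the usual /proc/cgroups listing (columns: subsys_name, hierarchy, num_cgroups, enabled); the function name memory_cgroup_enabled is ours for illustration and is not part of the Agent.

```python
from pathlib import Path


def memory_cgroup_enabled(cgroups_text: str) -> bool:
    """Return True if the 'memory' controller is listed as enabled.

    Expects text in the /proc/cgroups format:
    subsys_name, hierarchy, num_cgroups, enabled (whitespace separated).
    """
    for line in cgroups_text.splitlines():
        # Skip the header comment and any blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    # No 'memory' row at all: treat as not enabled.
    return False


if __name__ == "__main__":
    path = Path("/proc/cgroups")
    if path.exists():
        print("memory cgroup enabled:", memory_cgroup_enabled(path.read_text()))
    else:
        print("/proc/cgroups not found; this check only applies to Linux hosts")
```

If this prints False, follow the docker-dd-agent repository instructions to enable cgroup memory management before running the Docker check.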

Host Installation

  1. Ensure Docker is running on the host.
  2. Install the Agent as described in the Agent installation instructions for your host OS.
  3. Enable the Docker integration tile in the application.
  4. Add the Agent user to the docker group: usermod -a -G docker dd-agent
  5. Create a docker_daemon.yaml file by copying the example file in the Agent conf.d directory. If you have a standard install of Docker on your host, you should not need to change anything to get the integration to work.
  6. To enable other integrations, use docker ps to identify the ports used by the corresponding applications.

Note: docker_daemon has replaced the older docker integration.

Container Installation

  1. Ensure Docker is running on the host.
  2. As per the docker container installation instructions, run:

    docker run -d --name dd-agent \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -e API_KEY={YOUR API KEY} \
      datadog/docker-dd-agent:latest
    

Note that in the command above, you are able to pass your API key to the Datadog Agent using Docker’s -e environment variable flag. Some other variables you can pass include:

  • API_KEY: Sets your Datadog API key.
  • DD_HOSTNAME: Sets the hostname in the Agent container’s datadog.conf file. If this variable is not set, the Agent container will default to using the Name field (as reported by the docker info command) as the Agent container hostname.
  • DD_URL: Sets the Datadog intake server URL where the Agent will send data. This is useful when using an agent as a proxy.
  • LOG_LEVEL: Sets logging verbosity (CRITICAL, ERROR, WARNING, INFO, DEBUG). For example, -e LOG_LEVEL=DEBUG will set logging to debug mode.
  • TAGS: Sets host tags as a comma delimited string. You can pass both simple tags and key-value tags. For example, -e TAGS="simple-tag, tag-key:tag-value".
  • EC2_TAGS: Enabling this feature allows the agent to query and capture custom tags set using the EC2 API during startup. To enable, set the value to “yes”, for example, -e EC2_TAGS=yes. Note that this feature requires an IAM role associated with the instance.
  • NON_LOCAL_TRAFFIC: Enabling this feature will allow statsd reporting from any external IP. For example, -e NON_LOCAL_TRAFFIC=yes. This can be used to report metrics from other containers or systems. See network configuration for more details.
  • PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASSWORD: Sets proxy configuration details. For more information, see the Agent proxy documentation.
  • SD_BACKEND, SD_CONFIG_BACKEND, SD_BACKEND_HOST, SD_BACKEND_PORT, SD_TEMPLATE_DIR, SD_CONSUL_TOKEN: Enables and configures Autodiscovery. For more information, please see the Autodiscovery guide.
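
If you launch the Agent container from a script, the environment variables above map directly onto repeated -e flags. The Python sketch below assembles the same docker run command shown earlier; the helper name build_agent_run_command and the placeholder API key are assumptions for illustration only, and the command is printed rather than executed.

```python
import shlex


def build_agent_run_command(api_key, extra_env=None,
                            image="datadog/docker-dd-agent:latest"):
    """Assemble the `docker run` argument list for the Agent container."""
    args = [
        "docker", "run", "-d", "--name", "dd-agent",
        "-v", "/var/run/docker.sock:/var/run/docker.sock:ro",
        "-v", "/proc/:/host/proc/:ro",
        "-v", "/sys/fs/cgroup/:/host/sys/fs/cgroup:ro",
        "-e", f"API_KEY={api_key}",
    ]
    # Each additional variable becomes its own -e NAME=value pair.
    for name, value in (extra_env or {}).items():
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    command = build_agent_run_command(
        "YOUR_API_KEY",  # placeholder, replace with your own key
        {"LOG_LEVEL": "DEBUG", "TAGS": "simple-tag, tag-key:tag-value"},
    )
    # shlex.join quotes values containing spaces so the line can be pasted
    # into a shell.
    print(shlex.join(command))
```

Building the command as a list keeps values such as the TAGS string intact even when they contain spaces or commas.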

Running the agent container on Amazon Linux

To run the Datadog Agent container on Amazon Linux, you’ll need to make one small change to the cgroup volume mount location:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR API KEY} \
  datadog/docker-dd-agent:latest

Alpine Linux based container

Our standard Docker image is based on Debian Linux, but as of version 5.7 of the Datadog Agent, we also offer an Alpine Linux based image. The Alpine Linux image is considerably smaller in size than the traditional Debian-based image. It also inherits Alpine’s security-oriented design.

To use the Alpine Linux image, simply append -alpine to the version tag. For example:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR API KEY} \
  datadog/docker-dd-agent:latest-alpine

Image versioning

Starting with version 5.5.0 of the Datadog Agent, the Docker image follows a new versioning pattern. This allows us to release changes to the Docker image without changing the version of the Agent it bundles.

The Docker image version follows the pattern X.Y.Z, where X is the major version of the Docker image, Y is the minor version, and Z represents the Agent version.

For example, the first version of the Docker image that bundles the Datadog Agent 5.5.0 is 10.0.550.
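
To make the pattern concrete, here is a small Python sketch that splits an image version into its three parts. The names ImageVersion, parse_image_version, and agent_version_guess are hypothetical. Mapping Z back to a dotted Agent version is an assumption based only on the 10.0.550 example, so the sketch refuses anything other than three digits instead of guessing.

```python
from typing import NamedTuple


class ImageVersion(NamedTuple):
    image_major: int   # X: major version of the Docker image
    image_minor: int   # Y: minor version of the Docker image
    agent_digits: str  # Z: digits representing the Agent version, e.g. "550"


def parse_image_version(tag: str) -> ImageVersion:
    """Split an X.Y.Z image version into its components."""
    major, minor, agent_digits = tag.split(".")
    if not agent_digits.isdigit():
        raise ValueError(f"unexpected Agent component: {agent_digits!r}")
    return ImageVersion(int(major), int(minor), agent_digits)


def agent_version_guess(agent_digits: str) -> str:
    """Assumption: one digit per Agent version component ("550" -> "5.5.0")."""
    if len(agent_digits) != 3:
        raise ValueError("ambiguous Agent version; not guessing")
    return ".".join(agent_digits)


if __name__ == "__main__":
    version = parse_image_version("10.0.550")
    print(version, agent_version_guess(version.agent_digits))
```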

Custom containers and additional information

For more information about building custom Docker containers with the Datadog Agent, the Alpine Linux based image, versioning, and more, see our docker-dd-agent project on GitHub.

Validation

Run the Agent’s info subcommand and look for docker_daemon under the Checks section:

Checks
======

    docker_daemon
    -----------
      - instance #0 [OK]
      - Collected 39 metrics, 0 events & 7 service checks
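
If you want to automate this validation, the sketch below scans captured info output for a check's instance statuses. It assumes the output has the layout shown above (a check name, a dashed underline, then "- instance #N [STATUS]" lines); the function name check_statuses is ours, and real output may differ between Agent versions.

```python
import re


def check_statuses(info_output: str, check_name: str):
    """Return the instance statuses reported for check_name, e.g. ["OK"]."""
    lines = info_output.splitlines()
    statuses = []
    in_check = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A check section starts with its name followed by a dashed underline.
        if stripped and set(next_line) == {"-"}:
            in_check = stripped == check_name
            continue
        if in_check:
            match = re.match(r"- instance #\d+ \[(\w+)\]", stripped)
            if match:
                statuses.append(match.group(1))
    return statuses


if __name__ == "__main__":
    sample = """Checks
======

    docker_daemon
    -----------
      - instance #0 [OK]
      - Collected 39 metrics, 0 events & 7 service checks
"""
    print(check_statuses(sample, "docker_daemon"))
```

An empty list means the check was not found in the output, which usually points to a missing or misnamed docker_daemon.yaml.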

Compatibility

The docker_daemon check is compatible with all major platforms.

Data Collected

Metrics

docker.cpu.system
(gauge)
The percent of time the CPU is executing system calls on behalf of processes of this container
shown as percent
docker.cpu.system.95percentile
(gauge)
95th percentile of docker.cpu.system
shown as percent
docker.cpu.system.avg
(gauge)
Average value of docker.cpu.system
shown as percent
docker.cpu.system.count
(rate)
The rate that the value of docker.cpu.system was sampled
shown as sample
docker.cpu.system.max
(gauge)
Max value of docker.cpu.system
shown as percent
docker.cpu.system.median
(gauge)
Median value of docker.cpu.system
shown as percent
docker.cpu.user
(gauge)
The percent of time the CPU is under direct control of processes of this container
shown as percent
docker.cpu.user.95percentile
(gauge)
95th percentile of docker.cpu.user
shown as percent
docker.cpu.user.avg
(gauge)
Average value of docker.cpu.user
shown as percent
docker.cpu.user.count
(rate)
The rate that the value of docker.cpu.user was sampled
shown as sample
docker.cpu.user.max
(gauge)
Max value of docker.cpu.user
shown as percent
docker.cpu.user.median
(gauge)
Median value of docker.cpu.user
shown as percent
docker.cpu.usage
(gauge)
The percent of CPU time obtained by this container
shown as percent
docker.cpu.throttled
(gauge)
Number of times the cgroup has been throttled
docker.mem.cache
(gauge)
The amount of memory that is being used to cache data from disk (e.g. memory contents that can be associated precisely with a block on a block device)
shown as byte
docker.mem.cache.95percentile
(gauge)
95th percentile value of docker.mem.cache
shown as byte
docker.mem.cache.avg
(gauge)
Average value of docker.mem.cache
shown as byte
docker.mem.cache.count
(rate)
The rate that the value of docker.mem.cache was sampled
shown as sample
docker.mem.cache.max
(gauge)
Max value of docker.mem.cache
shown as byte
docker.mem.cache.median
(gauge)
Median value of docker.mem.cache
shown as byte
docker.mem.rss
(gauge)
The amount of non-cache memory that belongs to the container's processes. Used for stacks, heaps, etc.
shown as byte
docker.mem.rss.95percentile
(gauge)
95th percentile value of docker.mem.rss
shown as byte
docker.mem.rss.avg
(gauge)
Average value of docker.mem.rss
shown as byte
docker.mem.rss.count
(rate)
The rate that the value of docker.mem.rss was sampled
shown as sample
docker.mem.rss.max
(gauge)
Max value of docker.mem.rss
shown as byte
docker.mem.rss.median
(gauge)
Median value of docker.mem.rss
shown as byte
docker.mem.swap
(gauge)
The amount of swap currently used by the container
shown as byte
docker.mem.swap.95percentile
(gauge)
95th percentile value of docker.mem.swap
shown as byte
docker.mem.swap.avg
(gauge)
Average value of docker.mem.swap
shown as byte
docker.mem.swap.count
(rate)
The rate that the value of docker.mem.swap was sampled
shown as sample
docker.mem.swap.max
(gauge)
Max value of docker.mem.swap
shown as byte
docker.mem.swap.median
(gauge)
Median value of docker.mem.swap
shown as byte
docker.container.size_rw
(gauge)
Total size of all the files in the container which have been created or changed by processes running in the container
shown as byte
docker.container.size_rw.95percentile
(gauge)
95th percentile of docker.container.size_rw
shown as byte
docker.container.size_rw.avg
(gauge)
Average value of docker.container.size_rw
shown as byte
docker.container.size_rw.count
(rate)
The rate that the value of docker.container.size_rw was sampled
shown as sample
docker.container.size_rw.max
(gauge)
Max value of docker.container.size_rw
shown as byte
docker.container.size_rw.median
(gauge)
Median value of docker.container.size_rw
shown as byte
docker.container.size_rootfs
(gauge)
Total size of all the files in the container
shown as byte
docker.container.size_rootfs.95percentile
(gauge)
95th percentile of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.avg
(gauge)
Average value of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.count
(rate)
The rate that the value of docker.container.size_rootfs was sampled
shown as sample
docker.container.size_rootfs.max
(gauge)
Max value of docker.container.size_rootfs
shown as byte
docker.container.size_rootfs.median
(gauge)
Median value of docker.container.size_rootfs
shown as byte
docker.containers.running
(gauge)
The number of containers running on this host tagged by image
docker.containers.stopped
(gauge)
The number of containers stopped on this host tagged by image
docker.containers.running.total
(gauge)
The total number of containers running on this host
docker.containers.stopped.total
(gauge)
The total number of containers stopped on this host
docker.images.available
(gauge)
The number of top-level images
docker.images.intermediate
(gauge)
The number of intermediate images, which are intermediate layers that make up other images
docker.mem.limit
(gauge)
The memory limit for the container, if set
shown as byte
docker.mem.limit.95percentile
(gauge)
95th percentile of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.avg
(gauge)
Average value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.count
(rate)
The rate that the value of docker.mem.limit was sampled
shown as sample
docker.mem.limit.max
(gauge)
Max value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.limit.median
(gauge)
Median value of docker.mem.limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit
(gauge)
The swap + memory limit for the container, if set
shown as byte
docker.mem.sw_limit.95percentile
(gauge)
95th percentile of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.avg
(gauge)
Average value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.count
(rate)
The rate that the value of docker.mem.sw_limit was sampled
shown as sample
docker.mem.sw_limit.max
(gauge)
Max value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.sw_limit.median
(gauge)
Median value of docker.mem.sw_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit
(gauge)
The memory reservation limit for the container, if set
shown as byte
docker.mem.soft_limit.95percentile
(gauge)
95th percentile of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.avg
(gauge)
Average value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.count
(rate)
The rate that the value of docker.mem.soft_limit was sampled
shown as sample
docker.mem.soft_limit.max
(gauge)
Max value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.soft_limit.median
(gauge)
Median value of docker.mem.soft_limit. Ordinarily this value will not change
shown as byte
docker.mem.in_use
(gauge)
The fraction of used memory to available memory, if the limit is set
shown as fraction
docker.mem.in_use.95percentile
(gauge)
95th percentile of docker.mem.in_use
shown as fraction
docker.mem.in_use.avg
(gauge)
Average value of docker.mem.in_use
shown as fraction
docker.mem.in_use.count
(rate)
The rate that the value of docker.mem.in_use was sampled
shown as sample
docker.mem.in_use.max
(gauge)
Max value of docker.mem.in_use
shown as fraction
docker.mem.in_use.median
(gauge)
Median value of docker.mem.in_use
shown as fraction
docker.mem.sw_in_use
(gauge)
The fraction of used swap + memory to available swap + memory, if the limit is set
shown as fraction
docker.mem.sw_in_use.95percentile
(gauge)
95th percentile of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.avg
(gauge)
Average value of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.count
(rate)
The rate that the value of docker.mem.sw_in_use was sampled
shown as sample
docker.mem.sw_in_use.max
(gauge)
Max value of docker.mem.sw_in_use
shown as fraction
docker.mem.sw_in_use.median
(gauge)
Median value of docker.mem.sw_in_use
shown as fraction
docker.io.read_bytes
(gauge)
Bytes read per second from disk by the processes of the container
shown as byte
docker.io.read_bytes.95percentile
(gauge)
95th percentile of docker.io.read_bytes
shown as byte
docker.io.read_bytes.avg
(gauge)
Average value of docker.io.read_bytes
shown as byte
docker.io.read_bytes.count
(rate)
The rate that the value of docker.io.read_bytes was sampled
shown as sample
docker.io.read_bytes.max
(gauge)
Max value of docker.io.read_bytes
shown as byte
docker.io.read_bytes.median
(gauge)
Median value of docker.io.read_bytes
shown as byte
docker.io.write_bytes
(gauge)
Bytes written per second to disk by the processes of the container
shown as byte
docker.io.write_bytes.95percentile
(gauge)
95th percentile of docker.io.write_bytes
shown as byte
docker.io.write_bytes.avg
(gauge)
Average value of docker.io.write_bytes
shown as byte
docker.io.write_bytes.count
(rate)
The rate that the value of docker.io.write_bytes was sampled
shown as sample
docker.io.write_bytes.max
(gauge)
Max value of docker.io.write_bytes
shown as byte
docker.io.write_bytes.median
(gauge)
Median value of docker.io.write_bytes
shown as byte
docker.image.virtual_size
(gauge)
Size of all layers of the image on disk
shown as byte
docker.image.size
(gauge)
Size of all layers of the image on disk
shown as byte
docker.net.bytes_rcvd
(gauge)
Bytes received per second from the network
shown as byte
docker.net.bytes_rcvd.95percentile
(gauge)
95th percentile of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.avg
(gauge)
Average value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.count
(rate)
The rate that the value of docker.net.bytes_rcvd was sampled
shown as sample
docker.net.bytes_rcvd.max
(gauge)
Max value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_rcvd.median
(gauge)
Median value of docker.net.bytes_rcvd
shown as byte
docker.net.bytes_sent
(gauge)
Bytes sent per second to the network
shown as byte
docker.net.bytes_sent_bytes.95percentile
(gauge)
95th percentile of docker.net.bytes_sent_bytes
shown as byte
docker.net.bytes_sent_bytes.avg
(gauge)
Average value of docker.net.bytes_sent_bytes
shown as byte
docker.net.bytes_sent_bytes.count
(rate)
The rate that the value of docker.net.bytes_sent_bytes was sampled
shown as sample
docker.net.bytes_sent_bytes.max
(gauge)
Max value of docker.net.bytes_sent_bytes
shown as byte
docker.net.bytes_sent_bytes.median
(gauge)
Median value of docker.net.bytes_sent_bytes
shown as byte
docker.data.used
(gauge)
Storage pool disk space used
shown as byte
docker.data.free
(gauge)
Storage pool disk space free
shown as byte
docker.data.total
(gauge)
Storage pool disk space total
shown as byte
docker.data.percent
(gauge)
The percent of storage pool used
shown as percent
docker.metadata.used
(gauge)
Storage pool metadata space used
shown as byte
docker.metadata.free
(gauge)
Storage pool metadata space free
shown as byte
docker.metadata.total
(gauge)
Storage pool metadata space total
shown as byte
docker.metadata.percent
(gauge)
The percent of storage pool metadata used
shown as percent
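
Many of the metrics above come with .avg, .max, .median, .95percentile, and .count variants. The Python sketch below illustrates how such aggregates relate to a set of raw samples. It is not the Agent's implementation: the nearest-rank percentile method is an assumption, and count here is the raw number of samples rather than a rate over the reporting interval.

```python
import math
import statistics


def aggregate(samples):
    """Summarize a non-empty list of samples into histogram-style aggregates."""
    ordered = sorted(samples)
    # Nearest-rank 95th percentile (assumed method, chosen for simplicity).
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "avg": statistics.mean(ordered),
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "95percentile": ordered[rank - 1],
        "count": len(ordered),
    }


def derived_metrics(base_name, samples):
    """Attach each aggregate to the base metric name as a suffix."""
    return {f"{base_name}.{suffix}": value
            for suffix, value in aggregate(samples).items()}


if __name__ == "__main__":
    # Hypothetical docker.mem.rss samples, in bytes.
    for name, value in derived_metrics("docker.mem.rss", [10, 20, 30, 40, 50]).items():
        print(name, value)
```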

Events

The events below will be available:

  • Delete Image
  • Die
  • Error
  • Fail
  • Kill
  • Out of memory (oom)
  • Pause
  • Restart container
  • Restart Daemon
  • Update

Service Checks

The Docker Daemon check does not include any service check at this time.

Troubleshooting

Need help? Contact Datadog Support.

Further Reading

Knowledge Base

Compose and the Datadog Agent

Compose is a Docker tool that simplifies building applications on Docker by allowing you to define, build and run multiple containers as a single application.

While the Container Installation instructions above will get the stock Datadog Agent container running, you will most likely want to enable integrations for other containerized services that are part of your Compose application. To do this, you’ll need to combine integration YAML files with the base Datadog Agent image to create your Datadog Agent container. Then you’ll need to add your container to the Compose YAML.

Example: Monitoring Redis

Let’s look at how you would monitor a Redis container using Compose. Our example file structure is:

|- docker-compose.yml
|- datadog
    |- Dockerfile
    |- conf.d
       |-redisdb.yaml

First we’ll take a look at the docker-compose.yml that describes how our containers work together and sets some of the configuration details for the containers.

version: "2"
services:
  # Redis container
  redis:
    image: redis
  # Agent container
  datadog:
    build: datadog
    links:
     - redis # Ensures datadog container can connect to redis container
    environment:
     - API_KEY=__your_datadog_api_key_here__
    volumes:
     - /var/run/docker.sock:/var/run/docker.sock
     - /proc/mounts:/host/proc/mounts:ro
     - /sys/fs/cgroup:/host/sys/fs/cgroup:ro

In this file, we can see that Compose will run the Docker image redis and will also build and run a datadog container. Because the build key is set to datadog, Compose looks for a directory called datadog and builds the image from the Dockerfile in that directory.

Our Dockerfile simply takes the standard Datadog docker image and places a copy of the redisdb.yaml integration configuration into the appropriate directory:

FROM datadog/docker-dd-agent
ADD conf.d/redisdb.yaml /etc/dd-agent/conf.d/redisdb.yaml

Finally, our redisdb.yaml is patterned after the redisdb.yaml.example file and tells the Datadog Agent to look for Redis on the host named redis (defined in our docker-compose.yml above) and the standard Redis port 6379:

init_config:

instances:
  - host: redis
    port: 6379
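
If the Redis check reports connection errors, it can help to confirm that the redis hostname resolves and the port accepts connections from inside the Agent container. The following Python sketch is a generic TCP reachability test, not a Datadog tool; the function name can_connect is ours.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example, run from a container linked to the redis service:
#     can_connect("redis", 6379)
```

A False result from inside the Agent container suggests a problem with the link or service name rather than with redisdb.yaml itself.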

For a more complete example, please see our Docker Compose example project on GitHub.

DogStatsD and Docker

Datadog has a huge number of integrations with common applications, but it can also be used to instrument your custom applications. This is typically done using one of the many Datadog libraries.

Libraries that communicate over HTTP using the Datadog API don’t require any special configuration with regard to Docker. However, applications using libraries that integrate with DogStatsD or StatsD will need to configure the library to connect to the Agent. Note that each library will handle this configuration differently, so please refer to the individual library’s documentation for more details.

After your code is configured you can run your custom application container using the --link option to create a network connection between your application container and the Datadog Agent container.

Example: Monitoring a basic Python application

To start monitoring our application, we first need to run the Datadog container using the Container Installation instructions above. Note that the docker run command sets the name of the container to dd-agent.

Next, we’ll need to instrument our code. Here’s a basic Flask-based web application:

from flask import Flask
from datadog import initialize, statsd

# Initialize DogStatsD and set the host.
initialize(statsd_host = 'dd-agent')

app = Flask(__name__)

@app.route('/')
def hello():
    # Increment a Datadog counter.
    statsd.increment('my_webapp.page.views')

    return "Hello World!"

if __name__ == "__main__":
    app.run()

In our example code above, we set the DogStatsD host to match the Datadog Agent container name, dd-agent.
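
To show what travels over the link, here is a minimal Python sketch that sends the same counter as a raw StatsD datagram over UDP, without the datadog library. It assumes the Agent listens for DogStatsD traffic on the default port 8125 and is reachable under the hostname dd-agent; the helper names format_counter and send_counter are hypothetical.

```python
import socket


def format_counter(metric: str, value: int = 1, tags=None) -> bytes:
    """Build a StatsD counter datagram: metric:value|c with optional |#tags."""
    datagram = f"{metric}:{value}|c"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram.encode("utf-8")


def send_counter(metric, host="dd-agent", port=8125, value=1, tags=None):
    """Send one counter increment to the Agent over UDP (fire and forget)."""
    payload = format_counter(metric, value, tags)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Example, equivalent in intent to statsd.increment('my_webapp.page.views'):
#     send_counter("my_webapp.page.views")
```

Because UDP is connectionless, a wrong host or port fails silently, so check the hostname first if metrics do not appear.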

After we build our web application container, we can run it and use the --link argument to set up a network connection to the Datadog Agent container:

docker run -d --name my-web-app \
  --link dd-agent:dd-agent \
  my-web-app

For another example using DogStatsD, see our Docker Compose example project on GitHub.

Datadog Blog

Learn more about how to monitor Docker performance metrics in our series of posts. We detail the challenges of monitoring Docker, its key performance metrics, how to collect them, and how the largest TV and radio outlet in the U.S. monitors Docker using Datadog.

We’ve also written several other in-depth blog posts to help you get the most out of Datadog and Docker: