
Agent

Agent v6 is available. Upgrade to the newest version to benefit from all new functionality.

What is the Agent?

The Datadog Agent is software that runs on your hosts. It collects events and metrics from hosts and sends them to Datadog, where you can analyze your monitoring and performance data. The Datadog Agent is open-source, and its source code is available on GitHub at DataDog/datadog-agent.

To get started using the Agent, select your platform.

Agent Architecture

Agent v6 is a complete rewrite of Agent v5 in Go. It offers better performance, a smaller footprint, and more features. It is the default Datadog Agent (v5 is no longer in active development).

Agent v6 is composed of a main process responsible for collecting infrastructure metrics and logs and for receiving DogStatsD metrics. The main components of this process are:

  • The Collector is in charge of running checks and collecting metrics.
  • The Forwarder sends payloads to Datadog.

Two optional processes are spawned by the Agent if enabled in the datadog.yaml configuration file:

  • The APM Agent is a process to collect traces (enabled by default).
  • The Process Agent is a process to collect live process information. By default, it collects only available containers; otherwise it is disabled.

On Windows the services are listed as:

Service                  Description
DatadogAgent             “Datadog Agent”
datadog-trace-agent      “Datadog Trace Agent”
datadog-process-agent    “Datadog Process Agent”

By default, the Agent binds three ports on Linux and four on Windows and macOS:

Port    Description
5000    Exposes runtime metrics about the Agent.
5001    Used by the Agent CLI and GUI to send commands and pull information from the running Agent.
5002    Serves the GUI server on Windows and macOS.
8125    Used for the DogStatsD server to receive external metrics.
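
To see which of the TCP ports listed above are in use on a host, a minimal sketch follows. The port numbers come from the table; the helper is_tcp_port_open and the AGENT_TCP_PORTS mapping are illustrative names, not part of the Agent. Port 8125 is left out because DogStatsD receives metrics over UDP, so a TCP connect test does not apply to it.

```python
import socket

# Default TCP ports from the table above (illustrative mapping, not an Agent API).
AGENT_TCP_PORTS = {
    5000: "runtime metrics",
    5001: "CLI and GUI commands",
    5002: "GUI server (Windows and macOS)",
}


def is_tcp_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, purpose in AGENT_TCP_PORTS.items():
        state = "open" if is_tcp_port_open(port) else "closed"
        print(f"{port} ({purpose}): {state}")
```

An open port only shows that some process is listening there; it does not prove that the listener is the Agent.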

The Collector

The collector gathers all standard metrics every 15 seconds. Agent v6 embeds a Python 2.7 interpreter to run integrations and custom checks.

The Forwarder

The Agent forwarder sends metrics over HTTPS to Datadog. Buffering prevents network splits from affecting metric reporting. Metrics are buffered in memory until a limit on size or on the number of outstanding send requests is reached. Afterwards, the oldest metrics are discarded to keep the forwarder’s memory footprint manageable. Logs are sent over an SSL-encrypted TCP connection to Datadog.
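
The buffering behavior described above (hold payloads in memory, then discard the oldest once a size or count limit is exceeded) can be illustrated with a small sketch. This is not the Agent's implementation: the class name BoundedPayloadBuffer, its parameters, and the limits used are all hypothetical.

```python
from collections import deque


class BoundedPayloadBuffer:
    """Keep payloads in memory, discarding the oldest when a limit is hit."""

    def __init__(self, max_payloads, max_bytes):
        self.max_payloads = max_payloads
        self.max_bytes = max_bytes
        self._queue = deque()
        self._size = 0
        self.dropped = 0  # how many payloads were discarded so far

    def add(self, payload: bytes):
        self._queue.append(payload)
        self._size += len(payload)
        # Evict from the left (oldest first) until both limits hold again.
        while self._queue and (
            len(self._queue) > self.max_payloads or self._size > self.max_bytes
        ):
            oldest = self._queue.popleft()
            self._size -= len(oldest)
            self.dropped += 1

    def drain(self):
        """Return all buffered payloads, oldest first, and empty the buffer."""
        items = list(self._queue)
        self._queue.clear()
        self._size = 0
        return items
```

In this sketch a sender would call add() while the network is unavailable and drain() once it recovers, so memory use stays bounded and only the most recent payloads survive a long outage.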

DogStatsD

In v6, DogStatsD is a Go implementation of Etsy’s StatsD metric aggregation daemon. It receives and rolls up arbitrary metrics over UDP or a Unix socket, which allows custom code to be instrumented without adding latency. Learn more about DogStatsD.
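
As an illustration of why UDP submission adds no latency, the sketch below builds a DogStatsD datagram by hand and sends it without waiting for a reply. It assumes the Agent is listening on the default port 8125 on the local host; the function names and the metric and tag names are hypothetical. In practice an official DogStatsD client library would normally be used instead.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a DogStatsD datagram such as 'page.views:1|c|#env:dev'."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(name, value, metric_type="c", tags=None,
                host="127.0.0.1", port=8125):
    """Send one metric over UDP; fire-and-forget, no reply is awaited."""
    payload = format_metric(name, value, metric_type, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, send_metric("page.views", 1, tags=["env:dev"]) increments a counter. If no server is listening, the datagram is dropped and the calling code is not blocked.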

Agent v5 is composed of four major components, each written in Python and running as a separate process:

  • Collector (agent.py): The collector runs checks on the current machine for configured integrations, and captures system metrics, such as memory and CPU.
  • DogStatsD (dogstatsd.py): This is a StatsD-compatible backend server that you can send custom metrics to from your applications.
  • Forwarder (ddagent.py): The forwarder retrieves data from both DogStatsD and the collector, queues it up, and then sends it to Datadog.
  • SupervisorD: This is all controlled by a single supervisor process. It is kept separate to limit the overhead of each application if you aren’t running all parts. However, it is generally recommended to run all parts.

Note: For Windows users, all four Agent processes appear as instances of ddagent.exe with the description DevOps’ best friend.

Supervision, Privileges, and Network Ports

A SupervisorD master process runs as the dd-agent user, and all forked subprocesses run as the same user. This also applies to any system call (iostat/netstat) initiated by the Datadog Agent. The Agent configuration resides at /etc/dd-agent/datadog.conf and /etc/dd-agent/conf.d. All configuration must be readable by dd-agent. The recommended permissions are 0600 since configuration files contain your API key and other credentials needed to access metrics.
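
To verify that a configuration file carries the recommended 0600 permissions, a small check along these lines could be used. The helper name config_mode_ok is hypothetical, and the demo deliberately runs against a throwaway file rather than a real Agent configuration.

```python
import os
import stat
import tempfile


def config_mode_ok(path, expected=0o600):
    """Return True if the file's permission bits equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real Agent config.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        demo_path = tmp.name
    os.chmod(demo_path, 0o600)
    print(demo_path, "ok" if config_mode_ok(demo_path) else "unexpected mode")
    os.remove(demo_path)
```

On a host running Agent v5, the same function could be pointed at /etc/dd-agent/datadog.conf, provided the calling user is allowed to stat that file.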

The following ports are open for operations:

Port         Description
tcp/17123    The forwarder for normal operations
tcp/17124    The forwarder for graphite support
udp/8125     DogStatsD

All listening processes are bound by default to 127.0.0.1 and/or ::1 on v3.4.1+ of the Agent. In earlier versions, they were bound to 0.0.0.0 (all interfaces). For information on running the Agent through a proxy see Agent proxy configuration. For information on IP ranges to allow, see Network Traffic.
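
The difference between the bind addresses mentioned above can be made concrete with Python's standard ipaddress module. The function name describe_bind_address is illustrative only.

```python
import ipaddress


def describe_bind_address(address):
    """Classify a listen address as loopback-only, all-interfaces, or specific."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback only"   # e.g. 127.0.0.1 or ::1
    if ip.is_unspecified:
        return "all interfaces"  # e.g. 0.0.0.0 or ::
    return "specific interface"
```

A process bound to a loopback address is reachable only from the same host, while one bound to 0.0.0.0 accepts connections on every network interface.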

The recommended number of open file descriptors is 1024. You can see the current value with the command ulimit -a. If you have a hard limit below the recommended value, for example because of Shell Fork Bomb Protection, one solution is to add the following to supervisord.conf:

[supervisord]
minfds = 100  # Your hard limit
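
The open file descriptor limit can also be read programmatically. The sketch below uses Python's standard resource module, which is available on Unix-like systems only; the constant and function names are illustrative.

```python
import resource

RECOMMENDED_OPEN_FILES = 1024  # value recommended in the text above


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_recommendation(soft_limit, recommended=RECOMMENDED_OPEN_FILES):
    """True if the soft limit is unlimited or at least the recommended value."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= recommended


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard} ok={meets_recommendation(soft)}")
```

The values reported are those of the process that runs the script, so the script should be run as the same user and in the same environment as the Agent for the result to be meaningful.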

The Collector

The collector gathers all standard metrics every 15 seconds. It also supports the execution of Python-based, user-provided checks stored in /etc/dd-agent/checks.d. User-provided checks must inherit from the AgentCheck abstract class defined in checks/__init__.py. See Writing a custom Agent check for more details.
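
A user-provided check of the kind described above might look like the following sketch. It assumes AgentCheck can be imported from the checks package inside Agent v5. So that the example also runs outside the Agent, it falls back to a hypothetical stand-in class that only records submitted metrics. The check name HelloCheck and the metric name hello.world are made up for illustration.

```python
try:
    # Available when the file runs inside Agent v5 (assumed import path).
    from checks import AgentCheck
except ImportError:
    class AgentCheck(object):
        """Hypothetical stand-in so the sketch runs outside the Agent."""

        def __init__(self, name, init_config, agentConfig, instances=None):
            self.name = name
            self.submitted = []

        def gauge(self, metric, value, tags=None):
            # The real class forwards the value to Datadog; this one records it.
            self.submitted.append((metric, value, tags))


class HelloCheck(AgentCheck):
    """Report a constant gauge each time the collector runs the check."""

    def check(self, instance):
        self.gauge("hello.world", 1, tags=instance.get("tags"))
```

The collector calls check() once per configured instance on every collection run, so this check would report hello.world at the collection interval.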

The Forwarder

The Agent forwarder listens for incoming requests over HTTP and sends metrics over HTTPS to Datadog. Buffering prevents network splits from affecting metric reporting. Metrics are buffered in memory until a limit on size or on the number of outstanding send requests is reached. Afterwards, the oldest metrics are discarded to keep the forwarder’s memory footprint manageable.

DogStatsD

DogStatsD is a Python implementation of Etsy’s StatsD metric aggregation daemon. It receives and rolls up arbitrary metrics over UDP, which allows custom code to be instrumented without adding latency. Learn more about DogStatsD.

CLI

The new command line interface for the Agent v6 is sub-command based:

Command            Notes
check              Run the specified check
configcheck        Print all configurations loaded & resolved of a running Agent
diagnose           Execute some connectivity diagnosis on your system
flare              Collect a flare and send it to Datadog
health             Print the current Agent health
help               Help about any command
hostname           Print the hostname used by the Agent
import             Import and convert configuration files from previous versions of the Agent
installservice     Installs the Agent within the service control manager
launch-gui         Starts the Datadog Agent GUI
regimport          Import the registry settings into datadog.yaml
remove-service     Removes the Agent from the service control manager
restart-service    Restarts the Agent within the service control manager
start              Start the Agent
start-service      Starts the Agent within the service control manager
status             Print the current status
stopservice        Stops the Agent within the service control manager
version            Print the version info

To run a sub-command, first invoke the Agent binary.

<path_to_agent_bin> <sub_command> <options>

Some sub-commands have their own flags and options, detailed in a help message. For example, to see how to use the check sub-command, run:

<agent_binary> check --help
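
When sub-commands need to be run from a script, the same pattern of binary, sub-command, and options can be wrapped as in the sketch below. The function names are hypothetical, and the path to the Agent binary is whatever applies on your platform.

```python
import subprocess


def build_agent_command(agent_bin, sub_command, *options):
    """Assemble '<path_to_agent_bin> <sub_command> <options>' as an argv list."""
    return [agent_bin, sub_command, *options]


def run_agent_command(agent_bin, sub_command, *options):
    """Run the sub-command and return (exit code, captured stdout)."""
    result = subprocess.run(
        build_agent_command(agent_bin, sub_command, *options),
        capture_output=True,
        text=True,
        check=False,  # leave exit-code handling to the caller
    )
    return result.returncode, result.stdout
```

For example, run_agent_command("datadog-agent", "status") would run the status sub-command, assuming datadog-agent is on the PATH. Passing arguments as a list avoids shell quoting problems.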

GUI

You can configure the port on which the GUI runs in the datadog.yaml file. To disable the GUI, set the port’s value to -1. For Windows and macOS, the GUI is enabled by default and runs on port 5002. For Linux, the GUI is disabled by default.

When the Agent is running, use the datadog-agent launch-gui command to open the GUI in your default web browser.

Note: The Agent GUI isn’t supported on 32-bit Windows platforms.

Requirements

  1. Cookies must be enabled in your browser. The GUI generates and saves a token in your browser which is used for authenticating all communications with the GUI server.

  2. To start the GUI, the user must have the required permissions. If you are able to open datadog.yaml, you are able to use the GUI.

  3. For security reasons, the GUI can only be accessed from the local network interface (localhost/127.0.0.1), so you must be on the same host where the Agent is running. That is, you can’t run the Agent on a VM or a container and access it from the host machine.

Supported OS versions

OS                              Supported versions
Amazon                          Amazon Linux 2
Debian x86_64                   Debian 7 (wheezy)+ (and SysVinit in Agent 6.6.0+)
Ubuntu x86_64                   Ubuntu 14.04+
RedHat/CentOS x86_64            RedHat/CentOS 6+
Docker                          Version 1.12+
Kubernetes                      Version 1.3+
SUSE Enterprise Linux x86_64    SUSE 11 SP4+ (not SysVinit)
Fedora x86_64                   Fedora 26+
macOS                           macOS 10.12+
Windows Server 64-bit           Windows Server 2008r2+ and Server Core (not Nano)
Windows 64-bit                  Windows 7+

Note: Source install may work on operating systems not listed here and is supported on a best effort basis.

OS                              Supported versions
Amazon                          Amazon Linux 2
Debian x86_64                   Debian 7 (wheezy)+
Ubuntu x86_64                   Ubuntu 12.04+
RedHat/CentOS x86_64            RedHat/CentOS 5+
Docker                          Version 1.12+
Kubernetes                      Version 1.3+
SUSE Enterprise Linux x86_64    SUSE 11 SP4+
Fedora x86_64                   Fedora 26+
macOS                           macOS 10.10+
Windows Server 64-bit           Windows Server 2008r2+
Windows 64-bit                  Windows 7+

Note: Source install may work on operating systems not listed here and is supported on a best effort basis.

OS     Supported versions
AIX    AIX 6.1 TL9 SP6, 7.1 TL5 SP3, 7.2 TL3 SP0

Agent Overhead

An example of Datadog Agent resource consumption is below. Tests were run on an AWS EC2 c5.xlarge instance (4 vCPU / 8 GB RAM). The vanilla datadog-agent was running with a process check to monitor the Agent itself. Enabling more integrations may increase Agent resource consumption. Enabling JMX checks makes the Agent use more memory, depending on the number of beans exposed by the monitored JVMs. Enabling the trace and process Agents increases resource consumption as well.

  • Agent Test version: 6.7.0
  • CPU: ~ 0.12% of the CPU used on average
  • Memory: ~ 60MB of RAM used (RSS memory)
  • Network bandwidth: ~ 86 B/s ▼ | 260 B/s ▲
  • Disk:
    • Linux: 350MB to 400MB depending on the distribution
    • Windows: 260MB

  • Agent Test version: 5.24.0
  • CPU: ~ 0.35% of the CPU used on average
  • Memory: ~ 115MB of RAM used.
  • Network bandwidth: ~ 1900 B/s ▼ | 800 B/s ▲
  • Disk:
    • Linux: 312MB
    • Windows: 295MB
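
To put the figures above for Agent 5.24.0 and 6.7.0 side by side, the relative reduction can be computed as in the short sketch below. The numbers are copied from the lists above; the dictionary keys and the function name are illustrative. These are results from one test setup and should not be read as guarantees.

```python
def percent_reduction(old, new):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100


# Figures copied from the text above: (Agent 5.24.0, Agent 6.7.0).
MEASUREMENTS = {
    "cpu_percent": (0.35, 0.12),
    "memory_mb": (115, 60),
    "network_down_bps": (1900, 86),
    "network_up_bps": (800, 260),
}

for name, (v5, v6) in MEASUREMENTS.items():
    print(f"{name}: {percent_reduction(v5, v6):.1f}% lower in 6.7.0")
```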

Note: Since v5.15 of the container Agent, it is recommended to set container resources to at least 256MB due to an added memory cache. Raising the limit is not meant to account for baseline usage but to accommodate temporary spikes. Agent 6 has a much smaller memory footprint.

Further Reading