Network Performance Monitoring Installation

Network Performance Monitoring requires Datadog Agent v6.14+. Because this product is built on eBPF, it also requires a platform with an underlying Linux kernel version of 4.4.0+.

Supported platforms include:

  • Ubuntu 16.04+
  • Debian 9+
  • Fedora 26+
  • SUSE 15+
  • Amazon AMI 2016.03+
  • Amazon Linux 2

There is an exception to the 4.4.0+ kernel requirement for CentOS/RHEL 7.6+. The DNS Resolution feature is not supported on CentOS/RHEL 7.6.
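As a rough preflight check for the kernel requirement above, the following sketch compares the running kernel against 4.4.0. The helper name version_at_least is an illustrative assumption and not part of the Datadog Agent, and the comparison relies on the -V (version sort) option of sort from GNU coreutils. It does not account for the CentOS/RHEL 7.6+ exception, which runs an older kernel.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if version $1 is greater than or equal to $2.
# Assumes a sort that supports -V (version sort), as GNU coreutils does.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Strip the distribution suffix, for example "5.4.0-91-generic" -> "5.4.0".
kernel_version=$(uname -r | cut -d- -f1)

if version_at_least "$kernel_version" "4.4.0"; then
    echo "Kernel $kernel_version meets the 4.4.0+ requirement"
else
    echo "Kernel $kernel_version is older than 4.4.0"
fi
```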

Network Performance Monitoring is compatible with Cilium installations, provided the following requirements are met:

  1. Cilium version 1.6 and above, and
  2. Kernel version 5.1.16 and above, or 4.19.57 and above for 4.19.x kernels
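The kernel rule for Cilium has two branches, which can be easy to misread. The sketch below encodes it as a shell function; the function names are illustrative assumptions, and the comparison again assumes a sort that supports -V. It checks only the kernel version, not the Cilium version.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if version $1 is greater than or equal to $2.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Return 0 if the kernel version satisfies the Cilium requirement:
# 4.19.x kernels need 4.19.57 or later, all others need 5.1.16 or later.
cilium_kernel_ok() {
    case "$1" in
        4.19.*) version_at_least "$1" "4.19.57" ;;
        *)      version_at_least "$1" "5.1.16" ;;
    esac
}

kernel_version=$(uname -r | cut -d- -f1)
if cilium_kernel_ok "$kernel_version"; then
    echo "Kernel $kernel_version satisfies the Cilium requirement"
else
    echo "Kernel $kernel_version does not satisfy the Cilium requirement"
fi
```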

Note: Datadog does not currently support Windows and macOS platforms for Network Performance Monitoring.

The following provisioning systems are supported:

Setup

To enable Network Performance Monitoring, configure it in your Agent’s main configuration file based on your system setup.

Because this tool's strength is analyzing traffic between network endpoints and mapping network dependencies, install it on a meaningful subset of your infrastructure, and on a minimum of 2 hosts, to maximize value.

To enable network performance monitoring with the Datadog Agent, use the following configurations:

  1. If you are not using Agent v6.14+, enable live process collection first. Otherwise, skip this step.

  2. Copy the system-probe example configuration:

    sudo -u dd-agent cp /etc/datadog-agent/system-probe.yaml.example /etc/datadog-agent/system-probe.yaml
    
  3. Edit /etc/datadog-agent/system-probe.yaml to set the enabled flag to true:

    system_probe_config:
        ## @param enabled - boolean - optional - default: false
        ## Set to true to enable the System Probe.
        #
        enabled: true
    
  4. If you are running an Agent older than v6.18 or 7.18, manually start the system-probe and enable it to start on boot (since v6.18 and v7.18 the system-probe starts automatically when the Agent is started):

    sudo systemctl start datadog-agent-sysprobe
    sudo systemctl enable datadog-agent-sysprobe
    

    Note: If the systemctl command is not available on your system, start it with the following command instead: sudo service datadog-agent-sysprobe start and then set it up to start on boot before datadog-agent starts.

  5. Restart the Agent:

    sudo systemctl restart datadog-agent
    

    Note: If the systemctl command is not available on your system, run the following command instead: sudo service datadog-agent restart
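Whether step 4 applies depends on the Agent version. The sketch below prints, without running them, the systemctl commands from steps 4 and 5 for a given version. The function names are illustrative assumptions, and the check looks only at the minor version because the threshold is 18 for both Agent 6 and Agent 7.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the given Agent version predates
# v6.18 / v7.18, meaning the system-probe must be started manually.
needs_manual_sysprobe_start() {
    version=${1#v}                                  # tolerate a leading "v"
    minor=$(printf '%s' "$version" | cut -d. -f2)   # "6.17.0" -> "17"
    [ "$minor" -lt 18 ]
}

# Print (do not run) the commands from steps 4 and 5 for an Agent version.
plan_restart() {
    if needs_manual_sysprobe_start "$1"; then
        echo "sudo systemctl start datadog-agent-sysprobe"
        echo "sudo systemctl enable datadog-agent-sysprobe"
    fi
    echo "sudo systemctl restart datadog-agent"
}

plan_restart "6.17.0"
```

On systems without systemctl, substitute the service commands from the notes above.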

SELinux-enabled systems

On systems with SELinux enabled, the system-probe binary needs special permissions to use eBPF features.

The Datadog Agent RPM package for CentOS-based systems bundles an SELinux policy to grant these permissions to the system-probe binary.

If you need to use Network Performance Monitoring on other systems with SELinux enabled, do the following:

  1. Modify the base SELinux policy to match your SELinux configuration. Depending on your system, some types or attributes may not exist (or have different names).

  2. Compile the policy into a module; assuming your policy file is named system_probe_policy.te:

    checkmodule -M -m -o system_probe_policy.mod system_probe_policy.te
    semodule_package -o system_probe_policy.pp -m system_probe_policy.mod
    
  3. Apply the module to your SELinux system:

    semodule -v -i system_probe_policy.pp
    
  4. Change the system-probe binary type to use the one defined in the policy; assuming your Agent installation directory is /opt/datadog-agent:

    semanage fcontext -a -t system_probe_t /opt/datadog-agent/embedded/bin/system-probe
    restorecon -v /opt/datadog-agent/embedded/bin/system-probe
    
  5. Restart the Agent.

Note: These instructions require the following SELinux utilities to be installed on the system: checkmodule, semodule, semodule_package, semanage, and restorecon. They are available on most standard distributions (Ubuntu, Debian, RHEL, CentOS, SUSE). Check your distribution for details on how to install them.

If these utilities do not exist in your distribution, follow the same procedure using the equivalent utilities that your distribution provides.
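Before starting the SELinux procedure, it can help to confirm that the utilities are present. The following is a minimal sketch that reports any that cannot be found; the function name missing_tools is an illustrative assumption. Some of these tools are installed under /usr/sbin, so run the check as the user who will apply the policy.

```shell
#!/bin/sh
# Hypothetical helper: print every named tool that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools checkmodule semodule semodule_package semanage restorecon)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing SELinux utilities:" $missing
else
    echo "All required SELinux utilities are installed"
fi
```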

To enable network performance monitoring with Kubernetes from scratch:

  1. Download the datadog-agent.yaml manifest template.

  2. Replace <DATADOG_API_KEY> with your Datadog API key.

  3. Optional - Set your Datadog site. If you are using the Datadog EU site, set the DD_SITE environment variable to datadoghq.eu in the datadog-agent.yaml manifest.

  4. Deploy the DaemonSet with the command:

    kubectl apply -f datadog-agent.yaml
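Steps 2 and 4 can be combined so that the API key is never written into the manifest file on disk. The sketch below substitutes the placeholder with sed; the function name render_manifest is an illustrative assumption, and it assumes the key contains no characters that are special to sed, such as a slash or an ampersand.

```shell
#!/bin/sh
# Hypothetical helper: replace the <DATADOG_API_KEY> placeholder in a manifest
# and write the result to standard output. The original file is not modified.
render_manifest() {
    manifest=$1
    api_key=$2
    sed "s/<DATADOG_API_KEY>/$api_key/g" "$manifest"
}

# Example usage, assuming DD_API_KEY is set in the environment:
#   render_manifest datadog-agent.yaml "$DD_API_KEY" | kubectl apply -f -
```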
    

If you already have the Agent running with a manifest:

  1. Add the annotation container.apparmor.security.beta.kubernetes.io/system-probe: unconfined on the datadog-agent template:

    spec:
        selector:
            matchLabels:
                app: datadog-agent
        template:
            metadata:
                labels:
                    app: datadog-agent
                name: datadog-agent
                annotations:
                    container.apparmor.security.beta.kubernetes.io/system-probe: unconfined
    
  2. Enable process collection and the system probe with the following environment variables in the Agent DaemonSet. If you are running a container per Agent process, add the following environment variables to the Process Agent container; otherwise, add them to the Agent container.

      # (...)
                      env:
                      # (...)
                          - name: DD_PROCESS_AGENT_ENABLED
                            value: 'true'
                          - name: DD_SYSTEM_PROBE_ENABLED
                            value: 'true'
                          - name: DD_SYSTEM_PROBE_EXTERNAL
                            value: 'true'
                          - name: DD_SYSPROBE_SOCKET
                            value: /var/run/s6/sysprobe.sock
    
  3. Mount the following extra volumes into the datadog-agent container:

     # (...)
            spec:
                serviceAccountName: datadog-agent
                containers:
                    - name: datadog-agent
                      image: 'gcr.io/datadoghq/agent:latest'
                      # (...)
                      volumeMounts:
                          - name: procdir
                            mountPath: /host/proc
                            readOnly: true
                          - name: cgroups
                            mountPath: /host/sys/fs/cgroup
                            readOnly: true
                          - name: debugfs
                            mountPath: /sys/kernel/debug
                          - name: s6-run
                            mountPath: /var/run/s6
    
  4. Add a new system-probe as a sidecar to the Agent:

     # (...)
            spec:
                serviceAccountName: datadog-agent
                containers:
                    - name: datadog-agent
                      image: 'datadog/agent:latest'
                    # (...)
                    - name: system-probe
                      image: 'datadog/agent:latest'
                      imagePullPolicy: Always
                      securityContext:
                          capabilities:
                              add:
                                  - SYS_ADMIN
                                  - SYS_RESOURCE
                                  - SYS_PTRACE
                                  - NET_ADMIN
                                  - IPC_LOCK
                      command:
                          - /opt/datadog-agent/embedded/bin/system-probe
                      env:
                          - name: DD_SYSPROBE_SOCKET
                            value: /var/run/s6/sysprobe.sock
                      resources:
                          requests:
                              memory: 150Mi
                              cpu: 200m
                          limits:
                              memory: 150Mi
                              cpu: 200m
                      volumeMounts:
                          - name: procdir
                            mountPath: /host/proc
                            readOnly: true
                          - name: cgroups
                            mountPath: /host/sys/fs/cgroup
                            readOnly: true
                          - name: debugfs
                            mountPath: /sys/kernel/debug
                          - name: s6-run
                            mountPath: /var/run/s6
    
  5. Finally, add the following volumes to your manifest:

                volumes:
                    - name: s6-run
                      emptyDir: {}
                    - name: debugfs
                      hostPath:
                          path: /sys/kernel/debug
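Because these changes are spread across several parts of the manifest, a quick check for omissions can be useful before applying it. The following sketch searches a manifest for the settings named in the steps above and prints any that are absent. The names required_settings and missing_settings are illustrative assumptions, and this is a plain text search rather than a YAML-aware validation, so it cannot tell whether a setting sits in the right container.

```shell
#!/bin/sh
# Settings that the steps above add to the manifest, one per line.
required_settings="container.apparmor.security.beta.kubernetes.io/system-probe
DD_PROCESS_AGENT_ENABLED
DD_SYSTEM_PROBE_ENABLED
DD_SYSTEM_PROBE_EXTERNAL
DD_SYSPROBE_SOCKET
/sys/kernel/debug
/var/run/s6"

# Hypothetical helper: print each required setting that does not appear
# anywhere in the given manifest file (fixed-string match, not YAML parsing).
missing_settings() {
    manifest=$1
    printf '%s\n' "$required_settings" | while IFS= read -r setting; do
        grep -qF -- "$setting" "$manifest" || echo "$setting"
    done
}

# Example usage:
#   missing_settings datadog-agent.yaml
```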
    

To enable network performance monitoring in Docker, use the following configuration when starting the container Agent:

    docker run -e DD_API_KEY="<DATADOG_API_KEY>" \
        -e DD_SYSTEM_PROBE_ENABLED=true \
        -e DD_PROCESS_AGENT_ENABLED=true \
        -v /var/run/docker.sock:/var/run/docker.sock:ro \
        -v /proc/:/host/proc/:ro \
        -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
        -v /sys/kernel/debug:/sys/kernel/debug \
        --security-opt apparmor:unconfined \
        --cap-add=SYS_ADMIN \
        --cap-add=SYS_RESOURCE \
        --cap-add=SYS_PTRACE \
        --cap-add=NET_ADMIN \
        --cap-add=IPC_LOCK \
        datadog/agent:latest

Replace <DATADOG_API_KEY> with your Datadog API key.
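The docker run command bind-mounts several host paths, and Docker fails or silently creates a directory when a bind-mount source is absent. As a small preflight sketch, the following reports any of those host paths that do not exist. The function name missing_paths is an illustrative assumption, and the check confirms only that each path exists, not that debugfs is actually mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Hypothetical helper: print every given path that does not exist on the host.
missing_paths() {
    for path in "$@"; do
        [ -e "$path" ] || echo "$path"
    done
}

# Host paths mounted into the Agent container by the command above.
missing_paths /var/run/docker.sock /proc /sys/fs/cgroup /sys/kernel/debug
```

No output means that every path is present.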

If using docker-compose, make the following additions to the Datadog Agent service.

    version: '3'
    services:
      # (...)
      datadog:
        image: "datadog/agent:latest"
        environment:
          DD_SYSTEM_PROBE_ENABLED: 'true'
          DD_PROCESS_AGENT_ENABLED: 'true'
          DD_API_KEY: '<DATADOG_API_KEY>'
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - /proc/:/host/proc/:ro
          - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
          - /sys/kernel/debug:/sys/kernel/debug
        cap_add:
          - SYS_ADMIN
          - SYS_RESOURCE
          - SYS_PTRACE
          - NET_ADMIN
          - IPC_LOCK
        security_opt:
          - apparmor:unconfined

To set up on AWS ECS, see the AWS ECS documentation page.
