Cloud Network Monitoring Setup
Datadog Cloud Network Monitoring (CNM) gives you visibility into your network traffic between services, containers, availability zones, and any other tag in Datadog so you can:
- Pinpoint unexpected or latent service dependencies.
- Optimize costly cross-regional or multi-cloud communication.
- Identify outages of cloud provider regions and third-party tools.
- Troubleshoot faulty service discovery with DNS server metrics.
Cloud Network Monitoring requires Datadog Agent v6.14+. Later versions of the Agent collect metrics automatically; to configure DNS Monitoring, see the metrics setup section.
Operating systems
Linux OS
Data collection is done using eBPF, so CNM requires, at minimum, a platform whose underlying Linux kernel is version 4.4.0+ or has eBPF features backported. CNM supports the following Linux distributions:
- Ubuntu 16.04+
- Debian 9+
- Fedora 26+
- SUSE 15+
- Amazon AMI 2016.03+
- Amazon Linux 2
- CentOS/RHEL 7.6+
Note: There is an exception to the 4.4.0+ kernel requirement for CentOS/RHEL 7.6+. The DNS Resolution feature is not supported on CentOS/RHEL 7.6.
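The kernel requirement above can be checked on a host before installing. The following is a minimal sketch, not an official Datadog tool: the function names `version_ge` and `kernel_meets_minimum` are hypothetical, and the comparison assumes GNU `sort -V` is available. It only compares version numbers, so it cannot detect backported eBPF features such as those on CentOS/RHEL 7.6+.

```shell
#!/bin/sh
# Sketch: compare a kernel release against the 4.4.0 minimum stated above.
# Assumption: GNU sort with -V (version sort) is installed.

MIN_KERNEL="4.4.0"

version_ge() {
  # Succeeds when version $1 is greater than or equal to version $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

kernel_meets_minimum() {
  # $1: kernel release string, for example the output of `uname -r`.
  # Strips any distribution suffix that starts with "-" or "+".
  version_ge "${1%%[-+]*}" "$MIN_KERNEL"
}

release="$(uname -r)"
if kernel_meets_minimum "$release"; then
  echo "kernel $release meets the $MIN_KERNEL minimum"
else
  echo "kernel $release is below $MIN_KERNEL; check whether eBPF features are backported"
fi
```

A result below the minimum is not conclusive on its own, because a distribution may backport eBPF to an older kernel.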
Windows OS
Data collection is done using a network kernel device driver. Support is available as of Datadog Agent version 7.27.1, for Windows versions 2012 R2 (and equivalent desktop OSs, including Windows 10) and up.
macOS
Datadog Cloud Network Monitoring does not support macOS platforms.
Containers
CNM helps you visualize the architecture and performance of your containerized and orchestrated environments, with support for Docker, Kubernetes, ECS, and other container technologies. Datadog’s container integrations enable you to aggregate traffic by meaningful entities (such as containers, tasks, pods, clusters, and deployments) with out-of-the-box tags such as container_name, task_name, and kube_service.
CNM is not supported for Google Kubernetes Engine (GKE) Autopilot.
Istio
With CNM, you can map network communication between containers, pods, and services over the Istio service mesh.
Datadog monitors every aspect of your Istio environment, so you can also:
- Assess the health of Envoy and the Istio control plane with logs.
- Break down the performance of your service mesh with request, bandwidth, and resource consumption metrics.
- Examine distributed traces for applications transacting over the mesh with APM.
CNM supports Istio v1.6.4+ with Datadog Agent v7.24.1+.
To learn more about monitoring your Istio environment with Datadog, see the Istio blog.
Cilium
Cloud Network Monitoring is compatible with Cilium installations, provided the following requirements are met:
- Cilium version 1.6 and above, and
- Kernel version 5.1.16 and above, or 4.19.57 and above for 4.19.x kernels
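The Cilium kernel rule has two branches, which is easy to misread. The sketch below encodes it as stated in the list above: 4.19.x kernels need 4.19.57 or later, and every other kernel needs 5.1.16 or later. The function names are hypothetical, GNU `sort -V` is assumed, and the Cilium version itself is not checked.

```shell
#!/bin/sh
# Sketch: apply the Cilium kernel requirement listed above.
# Assumption: GNU sort with -V (version sort) is installed.

version_ge() {
  # Succeeds when version $1 is greater than or equal to version $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

cilium_kernel_ok() {
  # $1: kernel release string, for example the output of `uname -r`.
  v="${1%%[-+]*}"   # drop any distribution suffix
  case "$v" in
    4.19.*) version_ge "$v" "4.19.57" ;;  # 4.19.x branch
    *)      version_ge "$v" "5.1.16" ;;   # all other kernels
  esac
}

if cilium_kernel_ok "$(uname -r)"; then
  echo "kernel satisfies the Cilium requirement"
else
  echo "kernel does not satisfy the Cilium requirement"
fi
```

Note that a 4.20.x kernel fails this check: it is outside the 4.19.x branch and older than 5.1.16.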
Provisioning systems
Cloud Network Monitoring supports use of the following provisioning systems:
Setup
Because this tool’s strength is analyzing traffic between network endpoints and mapping network dependencies, install it on a meaningful subset of your infrastructure, and on at least two hosts, to maximize its value.
To enable Cloud Network Monitoring with the Datadog Agent, use the following configurations:
If you are using an Agent older than v6.14, enable live process collection first; otherwise, skip this step.
Copy the system-probe example configuration:
sudo -u dd-agent install -m 0640 /etc/datadog-agent/system-probe.yaml.example /etc/datadog-agent/system-probe.yaml
Edit /etc/datadog-agent/system-probe.yaml to set the enabled flag to true:
network_config: # use system_probe_config for Agents older than 7.24.1
  ## @param enabled - boolean - optional - default: false
  ## Set to true to enable Cloud Network Monitoring.
  #
  enabled: true
If you are running an Agent older than v6.18 or 7.18, manually start the system-probe and enable it to start on boot (since v6.18 and v7.18 the system-probe starts automatically when the Agent is started):
sudo systemctl start datadog-agent-sysprobe
sudo systemctl enable datadog-agent-sysprobe
Note: If the systemctl command is not available on your system, start the system-probe with the following command instead: sudo service datadog-agent-sysprobe start. Then set it up to start on boot before datadog-agent starts.
Restart the Agent.
sudo systemctl restart datadog-agent
Note: If the systemctl command is not available on your system, run the following command instead: sudo service datadog-agent restart
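If the Agent restarts but no network data appears, one thing to rule out is a misplaced or mis-indented enabled flag. The following is a rough sketch, not a Datadog utility: `cnm_enabled` is a hypothetical helper that uses awk to look for an indented `enabled: true` line under a top-level `network_config:` key. It does not parse YAML in general, and it does not recognize the older `system_probe_config` key.

```shell
#!/bin/sh
# Sketch: report whether system-probe.yaml has `enabled: true`
# under the top-level `network_config:` key.

cnm_enabled() {
  # $1: path to system-probe.yaml
  awk '
    # A line starting in column one (not a comment) opens a new top-level key.
    /^[^ \t#]/ { in_block = ($1 == "network_config:") }
    # Inside network_config, look for an indented, uncommented enabled: true.
    in_block && /^[ \t]+enabled:[ \t]*true[ \t]*(#.*)?$/ { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

CONFIG="/etc/datadog-agent/system-probe.yaml"
if [ -r "$CONFIG" ] && cnm_enabled "$CONFIG"; then
  echo "network_config.enabled is true in $CONFIG"
else
  echo "network_config.enabled is not set to true in $CONFIG, or the file is not readable"
fi
```

Because the example configuration is installed with mode 0640 for the dd-agent user, the check may need to run as root or as dd-agent to read the file.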
SELinux-enabled systems
On systems with SELinux enabled, the system-probe binary needs special permissions to use eBPF features.
The Datadog Agent RPM package for CentOS-based systems bundles an SELinux policy to grant these permissions to the system-probe binary.
If you need to use Cloud Network Monitoring on other systems with SELinux enabled, do the following:
Modify the base SELinux policy to match your SELinux configuration.
Depending on your system, some types or attributes may not exist (or have different names).
Compile the policy into a module. Assuming your policy file is named system_probe_policy.te:
checkmodule -M -m -o system_probe_policy.mod system_probe_policy.te
semodule_package -o system_probe_policy.pp -m system_probe_policy.mod
Apply the module to your SELinux system:
semodule -v -i system_probe_policy.pp
Change the system-probe binary type to use the one defined in the policy. Assuming your Agent installation directory is /opt/datadog-agent:
semanage fcontext -a -t system_probe_t /opt/datadog-agent/embedded/bin/system-probe
restorecon -v /opt/datadog-agent/embedded/bin/system-probe
Restart the Agent.
Note: These instructions require some SELinux utilities to be installed on the system (checkmodule, semodule, semodule_package, semanage, and restorecon). They are available on most standard distributions (Ubuntu, Debian, RHEL, CentOS, SUSE). Check your distribution for details on how to install them.
If these utilities do not exist in your distribution, follow the same procedure using the equivalent utilities that your distribution provides.
Data collection on Windows relies on a filter driver to collect network data.
To enable Cloud Network Monitoring for Windows hosts:
Install the Datadog Agent (version 7.27.1 or above) with the network driver component enabled.
[DEPRECATED] (version 7.44 or below) During installation, pass ADDLOCAL="MainApplication,NPM" to the msiexec command, or select “Cloud Network Monitoring” when running the Agent installation through the GUI.
Edit C:\ProgramData\Datadog\system-probe.yaml to set the enabled flag to true:
network_config:
  enabled: true
Restart the Agent.
For PowerShell (powershell.exe):
restart-service -f datadogagent
For Command Prompt (cmd.exe):
net /y stop datadogagent && net start datadogagent
Note: Cloud Network Monitoring monitors Windows hosts only, and not Windows containers.
To enable Cloud Network Monitoring with Kubernetes using Helm, add the following to your values.yaml file. Helm chart v2.4.39+ is required. For more information, see the Datadog Helm Chart documentation.
datadog:
  networkMonitoring:
    enabled: true
Note: If you receive the following permissions error when configuring CNM on your Kubernetes environment: Error: error enabling protocol classifier: permission denied, add the following to your values.yaml (see this section in the Helm chart):
agents:
  podSecurity:
    apparmor:
      enabled: true
If you are not using Helm, you can enable Cloud Network Monitoring with Kubernetes from scratch:
Download the datadog-agent.yaml manifest template.
Replace <DATADOG_API_KEY> with your Datadog API key.
Optional - Set your Datadog site. If you are using the Datadog EU site, set the DD_SITE environment variable to datadoghq.eu in the datadog-agent.yaml manifest.
Deploy the DaemonSet with the command:
kubectl apply -f datadog-agent.yaml
If you already have the Agent running with a manifest:
For Kubernetes versions below 1.30, add the annotation container.apparmor.security.beta.kubernetes.io/system-probe: unconfined on the datadog-agent template:
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
      name: datadog-agent
      annotations:
        container.apparmor.security.beta.kubernetes.io/system-probe: unconfined
For Kubernetes versions 1.30+, add the following securityContext on the datadog-agent template:
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
      name: datadog-agent
    spec:
      serviceAccountName: datadog-agent
      securityContext:
        appArmorProfile:
          type: Unconfined
      containers:
      # (...)
Enable process collection and the system probe with the following environment variables in the Agent DaemonSet. If you are running a container per Agent process, add the following environment variables to the Process Agent container; otherwise, add them to the Agent container.
# (...)
env:
  # (...)
  - name: DD_PROCESS_AGENT_ENABLED
    value: 'true'
  - name: DD_SYSTEM_PROBE_ENABLED
    value: 'true'
  - name: DD_SYSTEM_PROBE_EXTERNAL
    value: 'true'
  - name: DD_SYSPROBE_SOCKET
    value: /var/run/sysprobe/sysprobe.sock
  - name: DD_AUTH_TOKEN_FILE_PATH
    value: /etc/datadog-agent/auth/token
Mount the following extra volumes into the datadog-agent container:
# (...)
spec:
  serviceAccountName: datadog-agent
  containers:
    - name: datadog-agent
      image: 'gcr.io/datadoghq/agent:latest'
      # (...)
      volumeMounts:
        - name: procdir
          mountPath: /host/proc
          readOnly: true
        - name: cgroups
          mountPath: /host/sys/fs/cgroup
          readOnly: true
        - name: debugfs
          mountPath: /sys/kernel/debug
        - name: sysprobe-socket-dir
          mountPath: /var/run/sysprobe
        - name: auth-token
          mountPath: /etc/datadog-agent/auth
          readOnly: false # needs RW to write auth token
Add a new system-probe container as a sidecar to the Agent:
# (...)
spec:
  serviceAccountName: datadog-agent
  containers:
    - name: datadog-agent
      image: 'gcr.io/datadoghq/agent:latest'
      # (...)
    - name: system-probe
      image: 'gcr.io/datadoghq/agent:latest'
      imagePullPolicy: Always
      securityContext:
        capabilities:
          add:
            - SYS_ADMIN
            - SYS_RESOURCE
            - SYS_PTRACE
            - NET_ADMIN
            - NET_BROADCAST
            - NET_RAW
            - IPC_LOCK
            - CHOWN
      command:
        - /opt/datadog-agent/embedded/bin/system-probe
      env:
        - name: DD_SYSTEM_PROBE_ENABLED
          value: 'true'
        - name: DD_SYSPROBE_SOCKET
          value: /var/run/sysprobe/sysprobe.sock
        - name: DD_AUTH_TOKEN_FILE_PATH
          value: /etc/datadog-agent/auth/token
      resources:
        requests:
          memory: 150Mi
          cpu: 200m
        limits:
          memory: 300Mi
          cpu: 400m
      volumeMounts:
        - name: procdir
          mountPath: /host/proc
          readOnly: true
        - name: cgroups
          mountPath: /host/sys/fs/cgroup
          readOnly: true
        - name: debugfs
          mountPath: /sys/kernel/debug
        - name: sysprobe-socket-dir
          mountPath: /var/run/sysprobe
        - name: auth-token
          mountPath: /etc/datadog-agent/auth
          readOnly: true
Finally, add the following volumes to your manifest:
volumes:
  - name: debugfs
    hostPath:
      path: /sys/kernel/debug
  - name: sysprobe-socket-dir
    emptyDir: { }
  - name: auth-token
    emptyDir: { }
The Datadog Operator is Generally Available with the `1.0.0` version, and it reconciles the version `v2alpha1` of the DatadogAgent Custom Resource.
The Datadog Operator is a way to deploy the Datadog Agent on Kubernetes and OpenShift. It reports deployment status, health, and errors in its Custom Resource status, and it limits the risk of misconfiguration thanks to higher-level configuration options.
To enable Cloud Network Monitoring with the Operator, use the following configuration:
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: placeholder
  namespace: placeholder
spec:
  features:
    npm:
      enabled: true
To enable Cloud Network Monitoring in Docker, use the following configuration when starting the container Agent:
docker run --cgroupns host \
--pid host \
-e DD_API_KEY="<DATADOG_API_KEY>" \
-e DD_SYSTEM_PROBE_NETWORK_ENABLED=true \
-e DD_PROCESS_AGENT_ENABLED=true \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-v /sys/kernel/debug:/sys/kernel/debug \
--security-opt apparmor:unconfined \
--cap-add=SYS_ADMIN \
--cap-add=SYS_RESOURCE \
--cap-add=SYS_PTRACE \
--cap-add=NET_ADMIN \
--cap-add=NET_BROADCAST \
--cap-add=NET_RAW \
--cap-add=IPC_LOCK \
--cap-add=CHOWN \
gcr.io/datadoghq/agent:latest
Replace <DATADOG_API_KEY> with your Datadog API key.
If using docker-compose, make the following additions to the Datadog Agent service.
version: '3'
services:
  datadog:
    image: "gcr.io/datadoghq/agent:latest"
    environment:
      - DD_SYSTEM_PROBE_NETWORK_ENABLED=true
      - DD_PROCESS_AGENT_ENABLED=true
      - DD_API_KEY=<DATADOG_API_KEY>
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
      - /sys/kernel/debug:/sys/kernel/debug
    cap_add:
      - SYS_ADMIN
      - SYS_RESOURCE
      - SYS_PTRACE
      - NET_ADMIN
      - NET_BROADCAST
      - NET_RAW
      - IPC_LOCK
      - CHOWN
    security_opt:
      - apparmor:unconfined
To set up on Amazon ECS, see the Amazon ECS documentation page.
Enhanced resolution
Optionally, enable resource collection for cloud integrations to allow Cloud Network Monitoring to discover cloud-managed entities.
- Install the Azure integration for visibility into Azure load balancers and application gateways.
- Install the AWS integration for visibility into AWS Load Balancer. You must enable ENI and EC2 metric collection.
For additional information about these capabilities, see Cloud service enhanced resolution.
Failed connections
To enable the Agent to start collecting data about failed connections, add the following setting to your /etc/datadog-agent/system-probe.yaml file (or C:\ProgramData\Datadog\system-probe.yaml for Windows).
network_config: # use system_probe_config for Agent versions older than 7.24.1
  ## @param enabled - boolean - optional - default: false
  ## Set to true to enable Cloud Network Monitoring.
  #
  enabled: true
  enable_tcp_failed_connections: true ## enabled by default
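The comment on the first line of this configuration means the top-level key depends on the Agent version. As a small sketch of that rule only, the hypothetical helper `probe_config_key` below prints the key to use for a given version. It assumes GNU `sort -V` and takes the version as an argument rather than reading it from the Agent.

```shell
#!/bin/sh
# Sketch: choose the top-level system-probe.yaml key for an Agent version,
# following the comment in the configuration above.
# Assumption: GNU sort with -V (version sort) is installed.

version_ge() {
  # Succeeds when version $1 is greater than or equal to version $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

probe_config_key() {
  # $1: Agent version, for example 7.24.1
  if version_ge "$1" "7.24.1"; then
    echo "network_config"
  else
    echo "system_probe_config"
  fi
}

# Example: print the key for an Agent version supplied by the caller.
probe_config_key "7.24.1"
```

The settings nested under the key are the same as shown above; only the top-level name changes.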