Network Analytics


Overview

The Network Analytics page provides insights into your overall network health and shows recommended queries at the top of the page. These recommended queries enable you to run common queries and see snapshots of relevant metrics, so that you can see changes in throughput, latency, DNS errors, and more. Clicking on a recommended query automatically populates the search bar, group bys, and summary graphs to provide you with relevant insights into your network.

Network Analytics landing page under Cloud Network Monitoring

Queries

To refine your search to traffic between particular endpoints, aggregate and filter your network connections with tags. Tags from Datadog integrations or Unified Service Tagging are automatically available for aggregating and filtering. With tagging in Network Monitoring, you can see how network traffic flows across availability zones for a particular service or for your entire infrastructure. Grouping by client and server tags visualizes the network flow between those two sets of tags.

Additionally, Datadog provides a list of default out-of-the-box tags that you can use to efficiently query and analyze the network traffic most relevant to your needs.

network diagram showing how requests are seen when grouping by tags

For example, if you want to see network traffic between your ordering service called orders-app and all of your availability zones, use client_service:orders-app in the search bar, add the service tag in the View clients as drop-down, then use the availability-zone tag in the View servers as drop-down to visualize the traffic flow between these two sets of tags:

Network Analytics page showing how requests are seen when filtering on service and grouping by availability zone

For information on NA/Untagged traffic paths, see Unresolved traffic.

Additionally, the following diagram illustrates inbound and outbound requests when grouping by client and server tags. The client is where the connection originated, and the server is where the connection terminated.

network diagram showing inbound and outbound requests

The following screenshot shows the default view, which aggregates the client and server by the service tag. Accordingly, each row in the table represents service-to-service aggregate connections over a one-hour time period. Select “Auto-grouped traffic” to see traffic bucketed into several commonly used tags such as service, kube_service, short_image, and container_name.

CNM default view with drop downs showing view clients and servers as auto grouped traffic

The next example shows all aggregate connections from IP addresses representing services in region us-east-1 to availability zones:

You can further isolate traffic where the client or server matches a CIDR using CIDR(network.client.ip, 10.0.0.0/8) or CIDR(network.server.ip, 10.0.0.0/8).
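
To make the CIDR filter concrete, the following minimal Python sketch uses the standard-library ipaddress module to show which addresses a block such as 10.0.0.0/8 covers. The helper names cidr_filter and ip_in_cidr are hypothetical and only illustrate the matching semantics locally; they are not part of any Datadog API.

```python
import ipaddress


def cidr_filter(field: str, cidr: str) -> str:
    """Build the search-bar filter string for a CIDR match (hypothetical helper)."""
    # Validate the CIDR locally first; raises ValueError if it is malformed.
    ipaddress.ip_network(cidr, strict=True)
    return f"CIDR({field}, {cidr})"


def ip_in_cidr(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


print(cidr_filter("network.client.ip", "10.0.0.0/8"))
print(ip_in_cidr("10.42.7.1", "10.0.0.0/8"))    # inside 10.0.0.0/8
print(ip_in_cidr("192.168.1.5", "10.0.0.0/8"))  # outside 10.0.0.0/8
```

Validating the block before building the string catches typos such as host bits set in the network address before the query is pasted into the search bar.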

Understanding client and server roles in relation to traffic direction

The Network Analytics page shows directional traffic flows from clients in one zone to servers in another. These flows are not symmetrical and may not show equal “bytes sent” and “bytes received” when reversed.

In this context:

Client refers to the side that initiates the connection.
Server is the side that responds to that connection.

Datadog monitors traffic based on who opened the connection. The reverse direction (server to client) is shown as a separate flow and may have different volume metrics, or no data at all if no connections are initiated in that direction.

For example, if a client in us-east-1d talks to a server in us-east-1c, you may see significant traffic. However, if there is no server in us-east-1d, the reverse row (us-east-1c → us-east-1d) may show little or no data.

Note: Asymmetries in traffic can also result from application behavior or infrastructure elements (for example, proxies or NATs), or lack of connection initiation in one direction.
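
One way to picture this directionality is a table keyed by (client, server) pairs, where the reverse pair is a separate entry that may not exist. The sketch below uses made-up byte counts and a hypothetical bytes_sent helper; it is an illustration of the concept, not how Datadog stores flows.

```python
from typing import Dict, Tuple

# Hypothetical sample data: bytes sent, keyed by (client zone, server zone).
# A key exists only if some client in the first zone opened a connection
# to a server in the second zone.
flows: Dict[Tuple[str, str], int] = {
    ("us-east-1d", "us-east-1c"): 48_000_000,
}


def bytes_sent(client_zone: str, server_zone: str) -> int:
    """Look up one directional flow; a missing direction reports zero."""
    return flows.get((client_zone, server_zone), 0)


forward = bytes_sent("us-east-1d", "us-east-1c")
reverse = bytes_sent("us-east-1c", "us-east-1d")
print(forward, reverse)
```

Because no client in us-east-1c initiated a connection to us-east-1d in this sample, the reverse lookup finds no entry, which corresponds to a row with little or no data.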

Recommended queries

Recommended queries allow you to begin investigating your network, whether you’re troubleshooting a specific issue or gaining a better overall understanding of your network. The recommended queries help you quickly find relevant network information without needing to search for or group the traffic. For example, the recommended query Find dependencies of service: web-store populates the search bar with the query client_service: web-store and displays the top services that the service web-store is sending traffic to within the network, and therefore its downstream dependencies.

Any available recommended queries are provided at the top of the Analytics page, and there are three recommended queries at the top of the DNS page. Use these queries to access commonly used data, and see any changes in that data in the last hour.

To run a recommended query, click on the tile. Hovering over the tile displays a description and summary of the data the query returns.

The detail view of a recommended query displaying a description and query information, with four query dimensions displayed: Search for, View clients as, View servers as, and Visualize as

You can use the facet panels to browse through all of the tags available on your flows, or filter traffic when you don’t remember the exact tags you were looking for. Facet panels mirror the tags in your search bar query. Switch between the facet panels with the Client and Server tabs on top:

Aggregate and filter your traffic data by any tags on the network analytics page. A list of included tags is located on the left side of the screen under the Client and Server tags, and in the View clients as and View servers as dropdown menus.

Dropdown menu from network analytics page showing the facet list

Included tags are service, availability zone, env, environment, pod, host, ip, and port, among others. If you want to aggregate or filter traffic by a tag that is not already in the menu, add it as a custom Facet:

Select the + Add button on the top right of the facet panels.
Enter the tag you want to create a custom facet for.
Click Add.

Once the custom facet is created, use this tag to filter and aggregate traffic on the network analytics page and network map. All custom facets can be viewed in the bottom Custom section of the facet panels.

Wildcard search

To perform a multi-character wildcard search, use the * symbol as follows:

client_service:web* matches all client services that start with web.
client_service:*web matches all client services that end with web.
client_service:*web* matches all client services that contain the string web.

Wildcard searches work within facets with this syntax. This query returns all the client services that end with the string “mongo”:

client_service:*mongo
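
The three wildcard positions can be approximated locally with Python's fnmatch module, which also treats * as a multi-character wildcard. This is only a sketch of the matching semantics: the service names are hypothetical, and details such as case sensitivity in the actual search bar may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical client service names.
services = ["web-store", "web-api", "internal-web", "orders-mongo", "mongo-proxy"]


def match_services(pattern: str, names: list[str]) -> list[str]:
    """Return the names matched by a pattern that uses * as a multi-character wildcard."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(match_services("web*", services))    # names that start with web
print(match_services("*web", services))    # names that end with web
print(match_services("*web*", services))   # names that contain web
print(match_services("*mongo", services))  # names that end with mongo
```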

To learn more, see the search syntax documentation.

Group by

Groups allow you to group your data by a given tag’s value. For example, if you select a grouping such as host, results are grouped by individual hosts. You can also choose to view all your data in a single group using the Ungrouped traffic option. Additionally, you may have large chunks of data that are not tagged by the grouping you’re interested in. In these situations, you can use Auto-grouped traffic to group data by whichever tags are available.
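
As a rough illustration of grouping, the sketch below sums a metric per tag value, places flows that lack the tag under N/A, and collapses everything into one group when no tag is chosen. The sample flows and the group_by_client_tag helper are hypothetical, and the sketch does not model Auto-grouped traffic.

```python
from collections import defaultdict

# Hypothetical aggregate connections; a flow may lack some tags.
flows = [
    {"client_tags": {"host": "i-01", "service": "orders-app"}, "bytes_sent": 1200},
    {"client_tags": {"host": "i-02", "service": "orders-app"}, "bytes_sent": 800},
    {"client_tags": {"host": "i-03"}, "bytes_sent": 500},
]


def group_by_client_tag(rows, tag=None):
    """Sum bytes_sent per value of a client tag; tag=None means ungrouped."""
    totals = defaultdict(int)
    for row in rows:
        if tag is None:
            key = "Ungrouped traffic"
        else:
            # Flows without the tag land in a single N/A group.
            key = row["client_tags"].get(tag, "N/A")
        totals[key] += row["bytes_sent"]
    return dict(totals)


print(group_by_client_tag(flows, "host"))
print(group_by_client_tag(flows, "service"))
print(group_by_client_tag(flows))
```

Grouping by service leaves the third flow under N/A, which is the kind of untagged data that Auto-grouped traffic is meant to surface under whichever tags are available.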

If you want to investigate connections from all of your hosts in a single grouping, add the host tag in the View clients as dropdown, and add Ungrouped traffic in the View servers as dropdown.

NPM analytics page sorting by host and grouped by Ungrouped traffic

If you have traffic that is not tagged by a specific group, you can select Auto-grouped traffic to group data by any available tags. For example, to see which tags are available for a specific service, use the service tag in the View clients as dropdown, and add Auto-grouped traffic in the View servers as dropdown:

NPM analytics page sorting by service tags

The Auto-grouped traffic option can help you identify the source of your tags. For example, hover over the individual icons to display a tooltip that indicates the tag’s origin:

Using the search bar and the group by feature together is helpful to further isolate your network traffic. For example, to find all traffic from your auth-dotnet service across all data centers, enter service:auth-dotnet in the search bar and select datacenter in the View clients as dropdown:

Neutral tags

Neutral tags are tags that are not specific to a client or server, and instead apply to an entire flow. You can search for and filter on traffic with these neutral tags. For example, you can use these tags to filter for traffic that is TLS encrypted.

The following is the list of neutral tags available for use:

Tag	Description
`is_agent_traffic`	Indicates if the traffic was generated by the Datadog Agent.
`tls_encrypted`	Specifies if the connection is encrypted using TLS.
`tls_cipher_suite`	Identifies the TLS cipher suite used (for example, `tls_ecdhe_rsa_with_aes_128_gcm_sha256`).
`tls_cipher_insecure`	Indicates if the cipher used is considered insecure.
`tls_version`	The TLS version used (`tls_1.2` or `tls_1.3`).
`tls_client_version`	The TLS versions supported by the client (`tls_1.2` or `tls_1.3`).
`gateway_id`	Unique identifier for the AWS gateway resource.
`gateway_type`	Type of AWS gateway (Internet, NAT, or Transit).
`gateway_region`	AWS region of the gateway (for example, `us-east-1`).
`gateway_availability-zone`	Availability zone hosting the gateway (for example, `us-east-1a`).
`gateway_public_ip`	Public IP address assigned to the NAT gateway.
`tgw_attachment_id`	Unique identifier for the AWS Transit Gateway attachment.
`tgw_attachment_type`	Type of Transit Gateway attachment (for example, VPC, VPN, Direct Connect).
`vpc_endpoint_id`	Unique identifier for the VPC endpoint.

Summary graphs

The summary graphs are a condensed view of your network, which you can modify to display volume, throughput, connections, or latency as needed. Display up to three summary graphs at a time, and change the data and visualization type to suit your organization. To update a graph’s data source, click on the graph’s title and make a selection from the dropdown menu.

To change the visualization type, click on the pencil icon in the top right corner of the graph. Select from the options available, as shown in the screenshot below.

The summary graph visualization options, displaying options to adjust Y-Axis Scale with Linear, Log, Pow, and Sqrt, and to adjust Graph Type with Area, Line, Bars, Toplist, Change, and Piechart

To hide a specific graph, click on the hide icon next to the pencil icon. You can display as few as one graph or as many as three graphs. To add graphs, click on the plus icon + on the right side of the summary graph and select the graph to add. You can also reset the graphs to the default graphs when adding a new graph.

The summary graphs section displaying the options to Add graph and Reset Graphs

Network data

Your network metrics are displayed through the graphs and the associated table. All sent and received metrics are displayed from the perspective of the source:

Sent metrics: measure the value of something from the source to the destination from the source’s perspective.
Received metrics: measure the value of something from the destination to the source from the source’s perspective.

Values displayed might be different for sent_metric(source to destination) and received_metric(destination to source) if there is a large number of packet drops. In this case, if the destination sends a lot of bytes to the source, the aggregate connections that originate at destination include those bytes, but the aggregate connections that originate at source do not see them as received.

Note: Data is collected every 30 seconds, aggregated in five-minute buckets, and retained for 14 days.
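
The relationship between 30-second collection and five-minute buckets can be sketched as follows. The bucket_samples helper and the sample values are hypothetical, and summing within a bucket is an assumption made for illustration; the aggregation Datadog applies may differ by metric.

```python
from collections import defaultdict

BUCKET_SECONDS = 5 * 60  # five-minute buckets


def bucket_samples(samples):
    """Sum (unix_timestamp, value) samples into buckets keyed by bucket start time."""
    buckets = defaultdict(int)
    for timestamp, value in samples:
        # Round the timestamp down to the start of its five-minute bucket.
        bucket_start = timestamp - (timestamp % BUCKET_SECONDS)
        buckets[bucket_start] += value
    return dict(buckets)


# Hypothetical samples collected every 30 seconds for 10 minutes, 100 bytes each.
samples = [(t, 100) for t in range(0, 600, 30)]
buckets = bucket_samples(samples)
print(buckets)
```

Each bucket here absorbs ten 30-second samples, so short spikes within a five-minute window are averaged into the bucket rather than shown individually.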

Metrics

Network load

The following network load metrics are available:

Metric	Description
Volume	The number of bytes sent or received over a period. Measured in bytes (or orders of magnitude thereof), bidirectional.
Throughput	The rate of bytes sent or received over a period. Measured in bytes per second, bidirectional.
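
Volume and throughput describe the same traffic in different units: throughput is volume divided by the length of the period. A minimal sketch, with a hypothetical helper name and made-up numbers:

```python
def throughput_bytes_per_second(volume_bytes: int, period_seconds: float) -> float:
    """Convert a volume observed over a period into an average rate."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return volume_bytes / period_seconds


# Hypothetical example: 1.5 GB transferred during one five-minute window.
rate = throughput_bytes_per_second(1_500_000_000, 300)
print(rate)
```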

TCP

TCP is a connection-oriented protocol that guarantees in-order delivery of packets.

The following TCP metrics are available:

Metric	Description
TCP Retransmits	TCP Retransmits represent detected failures that are retransmitted to ensure delivery. Measured in count of retransmits from the client.
TCP Latency	Measured as TCP smoothed round-trip time, that is, the time between a TCP frame being sent and acknowledged.
TCP Jitter	Measured as TCP smoothed round-trip time variance.
TCP Timeouts	The number of TCP connections that timed out from the perspective of the operating system. This can indicate general connectivity and latency issues.
TCP Refusals	The number of TCP connections that were refused by the server. Typically this indicates an attempt to connect to an IP/port that isn’t receiving connections, or a firewall/security misconfiguration.
TCP Resets	The number of TCP connections that were reset by the server.
Established Connections	The number of TCP connections in an established state. Measured in connections per second from the client.
Closed Connections	The number of TCP connections in a closed state. Measured in connections per second from the client.

All metrics are instrumented from the perspective of the client side of the connection when available, or the server if not.
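
For intuition about what "smoothed round-trip time" and its variance mean, the sketch below implements the standard estimator from RFC 6298, which TCP stacks use to derive a smoothed RTT and an RTT variation from individual measurements. It is an assumption that the values reported here come from an estimator of this form; the function name and sample values are hypothetical.

```python
from typing import Iterable, Optional, Tuple

ALPHA = 1 / 8  # smoothing gain for SRTT (RFC 6298)
BETA = 1 / 4   # smoothing gain for RTTVAR (RFC 6298)


def smooth_rtt(samples_ms: Iterable[float]) -> Tuple[Optional[float], Optional[float]]:
    """Return (SRTT, RTTVAR) after processing RTT samples in order."""
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    for rtt in samples_ms:
        if srtt is None:
            # First measurement initializes both estimates.
            srtt = rtt
            rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt
    return srtt, rttvar


print(smooth_rtt([100.0, 180.0]))
```

Because each new sample contributes only one eighth of its value to SRTT, a single slow round trip moves the latency metric modestly, while the variance term reacts more strongly and shows up as jitter.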

Cloud service autodetection

If you’re relying on managed cloud services like S3 or Kinesis, you can monitor the performance of traffic to those services from your internal applications. Scope your view to a particular AWS, Google Cloud, or Azure dependency to pinpoint latency, assess database performance, and visualize your network more completely.

For instance, you can:

Visualize data flow from your internal Kubernetes cluster to server_service:aws.s3 in the Network Map.
Pivot to the Network Page to isolate which pods are establishing the most connections to that service, and
Validate that their request is successful by analyzing S3 performance metrics, which are correlated with traffic performance directly in the side panel for a given dependency, under the Integration Metrics tab.

CNM automatically maps:

Network calls to S3 (which can be broken down by s3_bucket), RDS (which can be broken down by rds_instance_type), Kinesis, ELB, Elasticache, and other AWS services.
API calls to AppEngine, Google DNS, Gmail, and other Google Cloud services.

To monitor other endpoints where an Agent cannot be installed (such as public APIs), group the destination in the Network Overview by the domain tag. Or, see the section below for cloud service resolution.

Cloud service enhanced resolution

If you have set up enhanced resolution for AWS or Azure, CNM can filter and group network traffic with several resources collected from these cloud providers. Depending on the cloud provider and resource, you have different sets of tags available to query with. Datadog applies the tags defined below in addition to the user-defined tags.

Amazon Web Services

Loadbalancers

name
loadbalancer
load_balancer_arn
dns_name (format loadbalancer/dns:)
region
account_id
scheme
custom (user-defined) tags applied to AWS Loadbalancers

NAT Gateways

gateway_id
gateway_type
aws_nat_gateway_id
aws_nat_gateway_public_ip
aws_account
availability-zone
region
custom (user) tags applied to AWS NAT Gateways

VPC Internet Gateways

gateway_id
gateway_type
aws_internet_gateway_id
aws_account
region
custom (user) tags applied to VPC Internet Gateways

VPC Endpoints

gateway_id
gateway_type
aws_vpc_endpoint_id
custom (user) tags applied to VPC Endpoints

Azure

Loadbalancers and Application Gateways

name
loadbalancer
cloud_provider
region
type
resource_group
tenant_name
subscription_name
subscription_id
sku_name
custom (user-defined) tags applied to Azure Loadbalancers and Application Gateways

Domain resolution

Starting with Agent 7.17, the Agent resolves IPs to human-readable domain names for external and internal traffic. The domain tag allows you to monitor cloud provider endpoints where a Datadog Agent cannot be installed, such as S3 buckets, application load balancers, and APIs. Unrecognizable domain names, such as DGA domains from C&C servers, may point to network security threats. Because domain is encoded as a tag in Datadog, you can use it in search bar queries and the facet panel to aggregate and filter traffic.

Note: DNS resolution is supported only for hosts where the system probe runs in the root network namespace. The system probe usually ends up outside the root network namespace when it runs in a container without using the host network.

Network Address Translation (NAT)

NAT is a tool used by Kubernetes and other systems to route traffic between containers. When investigating a specific dependency (for example, service to service), you can use the presence or absence of pre-NAT IPs to distinguish between Kubernetes-native services, which do their own routing, and services that rely on external clients for routing. This feature does not currently include resolution of NAT gateways.

To view pre-NAT and post-NAT IPs, use the Show pre-NAT IPs toggle in the table settings. When this setting is toggled off, IPs shown in the Client IP and Server IP columns are by default post-NAT IPs. In cases where you have multiple pre-NAT IPs for one post-NAT IP, the top 5 most common pre-NAT IPs are displayed. pre_nat.ip is a tag like any other in the product, so you can use it to aggregate and filter traffic.
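
Selecting the most common pre-NAT IPs for one post-NAT IP amounts to a frequency count followed by a top-N cut. The sketch below shows that idea with the standard-library Counter; the top_pre_nat_ips helper and the sample addresses are hypothetical, and the way ties are broken here is not a statement about the product.

```python
from collections import Counter


def top_pre_nat_ips(connections, post_nat_ip, limit=5):
    """Return the most frequent pre-NAT IPs seen for one post-NAT IP.

    connections is an iterable of (pre_nat_ip, post_nat_ip) pairs.
    """
    counts = Counter(pre for pre, post in connections if post == post_nat_ip)
    return [ip for ip, _ in counts.most_common(limit)]


# Hypothetical sample: 10.0.0.1 appears six times, 10.0.0.2 five times, and so on.
connections = [
    (f"10.0.0.{i}", "172.16.0.1") for i in range(1, 7) for _ in range(7 - i)
]
connections.append(("10.0.0.9", "172.16.0.2"))  # belongs to a different post-NAT IP

top = top_pre_nat_ips(connections, "172.16.0.1")
print(top)
```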

Network ID

CNM users may configure their networks to have overlapping IP spaces. For instance, you may want to deploy in multiple VPCs (virtual private clouds) which have overlapping address ranges and communicate only through load balancers or cloud gateways.

To correctly classify traffic destinations, CNM uses the concept of a network ID, which is represented as a tag. A network ID is an alphanumeric identifier for a set of IP addresses that can communicate with one another. When an IP address mapping to several hosts with different network IDs is detected, this identifier is used to determine the particular host network traffic is going to or coming from.
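
Conceptually, the network ID acts as a second lookup key next to the IP address. The sketch below shows that disambiguation with a hypothetical inventory in which the same IP exists in two VPCs; the host names, VPC IDs, and resolve_host helper are made up for illustration.

```python
# Hypothetical inventory: the same IP address exists in two VPCs.
hosts_by_network_and_ip = {
    ("vpc-aaa", "10.0.0.5"): "orders-host",
    ("vpc-bbb", "10.0.0.5"): "billing-host",
}


def resolve_host(network_id: str, ip: str) -> str:
    """Resolve an IP to a host, using the network ID to break ties between networks."""
    return hosts_by_network_and_ip.get((network_id, ip), "N/A")


print(resolve_host("vpc-aaa", "10.0.0.5"))
print(resolve_host("vpc-bbb", "10.0.0.5"))
print(resolve_host("vpc-ccc", "10.0.0.5"))
```

Without the network ID in the key, both entries would collide on 10.0.0.5 and traffic could be attributed to the wrong host.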

In AWS and Google Cloud, the network ID is automatically set to the VPC ID. For other environments, the network ID may be set manually, either in datadog.yaml as shown below, or by adding the DD_NETWORK_ID environment variable to the process and core Agent containers.

network:
  id: <your-network-id>

Saved views

Organize and share views of traffic data. Saved Views make debugging faster and empower collaboration. For instance, you can create a view, save it for the future for common queries, and copy its link to share network data with your teammates.

To save a view: click the + Save button and name the view to record your current query, table configuration, and graph metric selections.
To load a view: click Views at the top left to see your Saved Views and select a view from the list.
To rename a view: hover over a view in the Saved Views list and click the gear icon to Edit name.
To share a view: hover over a view in the Saved Views list and click the link icon to Copy permalink.

To learn more, see the Saved Views documentation.

Table

The network table breaks down the Volume, Throughput, TCP Retransmits, Round-trip Time (RTT), and RTT variance metrics between each source and destination defined by your query.

You can configure the columns in your table using the Customize button at the top right of the table.

Configure the traffic shown with the Filter Traffic button.

External traffic (to public IPs) and Datadog Agent traffic are shown by default. To narrow down your view, turn off the Show Datadog Traffic and Show External Traffic toggles.

Unresolved traffic

Unresolved client and server tags are marked as N/A. A traffic client or server endpoint may be unresolved because it lacks identifiable metadata, such as source or destination information. This can occur when Datadog cannot resolve the traffic to known entities like load balancers, cloud services, or specific IP addresses within the monitored infrastructure. Typically, unresolved traffic may arise due to:

The host or container client or server IPs are not tagged with the client or server tags used for traffic aggregation.
The endpoint is outside of your private network, and accordingly is not tagged by the Datadog Agent.
The endpoint is a firewall, service mesh or other entity where a Datadog Agent cannot be installed.
The destination has not been tagged with a service, or an IP has not been mapped to any service.

Monitoring unresolved traffic is essential for identifying blind spots in network visibility and ensuring all relevant traffic is accounted for in performance and security analysis.

Use the Show N/A (Unresolved Traffic) toggle in the upper right corner of the data table to filter out aggregate connections with unresolved (N/A) clients or servers.
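
The effect of the toggle can be pictured as a filter over table rows whose client or server resolved to N/A. The rows and the filter_rows helper below are hypothetical and only sketch the behavior described above.

```python
# Hypothetical table rows after grouping by service.
rows = [
    {"client": "orders-app", "server": "aws.s3", "bytes_sent": 900},
    {"client": "orders-app", "server": "N/A", "bytes_sent": 300},
    {"client": "N/A", "server": "auth-dotnet", "bytes_sent": 120},
]


def filter_rows(table, show_unresolved=True):
    """Drop rows with an unresolved (N/A) client or server when the toggle is off."""
    if show_unresolved:
        return list(table)
    return [row for row in table if "N/A" not in (row["client"], row["server"])]


resolved_only = filter_rows(rows, show_unresolved=False)
unresolved_bytes = sum(r["bytes_sent"] for r in rows if r not in resolved_only)
print(resolved_only, unresolved_bytes)
```

Comparing the unresolved byte total against the overall total gives a quick sense of how large the visibility gap is before hiding those rows.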

Select any row from the data table to see associated logs, traces, and processes for a given client <=> server aggregate connection:

Pivot to network path

Hover over a row in the analytics table to pivot to network path and see the paths between the source and destination specified in CNM.

Example of hovering over a row in the Analytics table to show the Network Path toggle

Sidepanel

The sidepanel provides contextual telemetry to help you debug network dependencies. Use the Flows, Logs, Traces, and Processes tabs to determine whether a high retransmit count or latency in traffic between two endpoints is due to:

A spike in traffic volume from a particular port or IP.
Heavy processes consuming the CPU or memory of the destination endpoint.
Application errors in the code of the client endpoint.

CNM sidepanel detailing traffic between the client service orders-app and the server service azure.sql_database

Common tags

The top of the sidepanel displays common client and server tags shared by the inspected dependency’s most recent connections. Use common tags to gain additional context into a faulty endpoint. For instance, when troubleshooting high-latency communication to a particular service, common destination tags surface the following:

Granular context such as the container, task, or host to which traffic is flowing.
Wider context such as the availability zone, cloud provider account, or deployment in which the service runs.

Security

The Security tab highlights potential network threats and findings detected by Workload Protection and Cloud Security Misconfigurations. These signals are generated when Datadog detects network activity that matches a detection or compliance rule, or if there are other threats and misconfigurations related to the selected network flow.

Default tags

The following is a list of default server and client tags available out-of-the-box for querying and analyzing network traffic.

server	client
server_team	client_team
server_role	client_role
server_env	client_env
server_environment	client_environment
server_app	client_app
server_domain	client_datacenter
server_dns_server	client_instance-id
server_datacenter	client_instance-type
server_instance-id	client_security-group-name
server_instance-type	client_security-group
server_security-group-name	client_name
server_security-group	client_image
server_name	client_account
server_image	client_kernel_version
server_account	client_autoscaling_group
server_kernel_version	client_region
server_autoscaling_group	client_terraform.module
server_region	client_site
server_terraform.module	client_image_name
server_site	client_pod_name
server_image_name	client_kube_deployment
server_pod_name	client_kube_replica_set
server_kube_deployment	client_kube_job
server_kube_replica_set	client_kube_cronjob
server_kube_job	client_kube_daemon_set
server_kube_cronjob	client_kube_stateful_set
server_kube_daemon_set	client_kube_cluster_name
server_kube_stateful_set	client_kube_service
server_kube_cluster_name	client_kube_namespace
server_kube_service	client_kubernetes_cluster
server_kube_namespace	client_cluster-name
server_kubernetes_cluster	client_kube_container_name
server_cluster-name	client_kube-labels
server_kube_container_name	client_task_name
server_kube-labels	client_task_version
server_task_name	client_task_family
server_task_version	client_ecs_cluster
server_task_family	client_loadbalancer
server_ecs_cluster	client_mesos_task
server_loadbalancer	client_marathon_app
server_cacheclusterid	client_chronos_job
server_mesos_task	client_chronos_job_owner
server_marathon_app	client_nomad_task
server_chronos_job	client_nomad_group
server_chronos_job_owner	client_nomad_job
server_nomad_task	client_rancher_container
server_nomad_group	client_rancher_service
server_nomad_job	client_rancher_stack
server_rancher_container	client_swarm_service
server_rancher_service	client_swarm_namespace
server_rancher_stack	client_container_id
server_swarm_service	client_container_name
server_swarm_namespace	client_image_tag
server_container_id	client_short_image
server_container_name	client_docker_image
server_image_tag	client_kubernetescluster
server_short_image	client_kube_cluster
server_cluster	client_protocol
server_docker_image
server_kubernetescluster
server_kube_cluster
server_s3_bucket
server_rds_instance_id
server_cloud_endpoint_detection
server_gateway_id
server_protocol