To refine your search to traffic between particular endpoints, aggregate and filter your aggregate network connections with tags. You can select tags for the client and server using the search bar at the top of the page. The client is where the connection originated, and the server is where the connection terminated.
The following screenshot shows the default view, which aggregates the client and server by the
service tag. Accordingly, each row in the table represents service-to-service aggregate connections, aggregated over a one-hour time period.
The next example shows all aggregate connections from IP addresses representing services in region
us-east-1 to availability zones:
You can set the timeframe over which traffic is aggregated using the time selector at the top right of the page:
Tags from Datadog integrations or Unified Service Tagging can be used for aggregating and filtering automatically. See custom facets, below, for other tags. You can also select “Auto-grouped traffic” to see traffic bucketed into several commonly used tags such as
You can filter to traffic where the client or server matches a CIDR using
CIDR(network.client.ip, 10.0.0.0/8) or
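As a rough illustration of what a CIDR match means, the following sketch uses Python's standard `ipaddress` module to test whether addresses fall inside `10.0.0.0/8`. The helper name `ip_in_cidr` and the sample IPs are hypothetical; this is not how Datadog evaluates the filter internally, only a way to reason about which endpoints a CIDR query would include.

```python
import ipaddress

def ip_in_cidr(ip: str, cidr: str) -> bool:
    """Return True if `ip` falls inside the network described by `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

# Hypothetical client IPs, used only for illustration.
client_ips = ["10.4.12.7", "172.16.0.9", "10.255.255.255", "192.168.1.20"]

# Keep only the addresses that a 10.0.0.0/8 filter would match.
matching = [ip for ip in client_ips if ip_in_cidr(ip, "10.0.0.0/8")]
print(matching)
```

A `/8` prefix fixes only the first octet, so every address from `10.0.0.0` through `10.255.255.255` matches.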
Recommended queries allow you to begin investigating your network, whether you’re troubleshooting a specific issue or gaining a better overall understanding of your network. The recommended queries help you quickly find relevant network information without needing to search for or group the traffic. For example, the recommended query
Find dependencies of service: web-store populates the search bar with the query
client_service: web-store and displays the top services that the service web-store is sending traffic to within the network, and therefore its downstream dependencies.
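Conceptually, a dependency query like this one filters flows by the client tag and then aggregates by the server tag. The sketch below shows that idea on a handful of made-up flow records; the record layout, the `bytes_sent` field, and the `downstream_dependencies` helper are assumptions for illustration and do not reflect Datadog's data model or API.

```python
from collections import defaultdict

# Hypothetical flow records; real aggregate connections carry many more tags.
flows = [
    {"client_service": "web-store", "server_service": "payments", "bytes_sent": 1200},
    {"client_service": "web-store", "server_service": "inventory", "bytes_sent": 800},
    {"client_service": "web-store", "server_service": "payments", "bytes_sent": 300},
    {"client_service": "ad-server", "server_service": "payments", "bytes_sent": 500},
]

def downstream_dependencies(flows, client_service):
    """Sum bytes sent by `client_service` to each server service, largest first."""
    totals = defaultdict(int)
    for flow in flows:
        if flow["client_service"] == client_service:
            totals[flow["server_service"]] += flow["bytes_sent"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

deps = downstream_dependencies(flows, "web-store")
print(deps)
```

Each entry in the result corresponds to one row of the table: a client service, a server service, and the traffic aggregated between them.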
Any available recommended queries are provided at the top of the Analytics page, and there are three recommended queries at the top of the DNS page. Use these queries to access commonly used data, and see any changes in that data in the last hour.
To run a recommended query, click on the tile. Hovering over the tile displays a description and summary of the data the query returns.
You can use the facet panels to browse through all of the tags available on your flows, or filter traffic when you don’t remember the exact tags you were looking for. Facet panels mirror the tags in your search bar query. Switch between the facet panels with the Client and Server tabs on top:
Aggregate and filter your traffic data by any tag on the Datadog network page. An include list of tags is provided by default, which you can find in the search bar dropdown menu:
Include listed tags are
port, among others. If you want to aggregate or filter traffic by a tag that is not already in the menu, add it as a custom Facet:
- Select the
+ button on the top right of the facet panels.
- Enter the tag for which you want to create a custom facet.
Once the custom facet is created, use this tag to filter and aggregate traffic in the network page and map. All custom facets can be viewed in the bottom
Custom section of the facet panels.
To perform a multi-character wildcard search, use the
* symbol as follows:
client_service:web* matches all client services that start with web
client_service:*web matches all client services that end with web
client_service:*web* matches all client services that contain the string web
Wildcard searches work within facets with this syntax. This query returns all the client services that end with the string “mongo”:
To learn more, see the search syntax documentation.
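The three wildcard forms above behave like prefix, suffix, and substring matches. As a minimal sketch, Python's standard `fnmatch.fnmatchcase` can emulate the `*` behavior on a list of made-up service names. This is an approximation for illustration only: `fnmatch` also gives special meaning to `?` and `[...]`, which the text above does not describe for Datadog queries.

```python
from fnmatch import fnmatchcase

# Hypothetical client service names.
services = ["web-store", "web-api", "internal-web", "checkout-web-proxy", "mongo"]

def match(pattern, names):
    """Return the names matching a `*` wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]

starts_with_web = match("web*", services)    # prefix match
ends_with_web = match("*web", services)      # suffix match
contains_web = match("*web*", services)      # substring match

print(starts_with_web)
print(ends_with_web)
print(contains_web)
```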
Groups allow you to group your data by a given tag’s value. For example, if you select a grouping such as host, results are grouped by individual hosts. You can also choose to view all your data in a single group using the Ungrouped traffic option. Additionally, you may have large chunks of data that are not tagged by the grouping you’re interested in. In these situations, you can use Auto-grouped traffic to group data by whichever tags are available.
The summary graphs are a condensed view of your network, which you can modify to display volume, throughput, connections, or latency as needed. Display up to three summary graphs at a time, and change the data and visualization type to suit your organization. To update a graph’s data source, click on the graph’s title and make a selection from the dropdown menu.
To change the visualization type, click on the pencil icon in the top right corner of the graph. Select from the options available, as shown in the screenshot below.
To hide a specific graph, click on the hide icon next to the pencil icon. You can display as few as one graph or as many as three graphs. To add graphs, click on the plus icon
+ on the right side of the summary graph and select the graph to add. You can also reset the graphs to the default graphs when adding a new graph.
Your network metrics are displayed through the graphs and the associated table. All sent and received metrics are displayed from the perspective of the source:
- Sent metrics: measure the value of something from the source to the destination from the source’s perspective.
- Received metrics: measure the value of something from the destination to the source from the source’s perspective.
Values displayed might be different for
sent_metric(source to destination) and
received_metric(destination to source) if there is a large number of packet drops. In this case, if the
destination sends a lot of bytes to the
source, the aggregate connections that originate at
destination include those bytes, but the aggregate connections that originate at
source do not see them as received.
Note: Data is collected every 30 seconds, aggregated in five-minute buckets, and retained for 14 days.
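The arithmetic behind these intervals can be worked out directly. The sketch below derives how many collection samples land in each bucket and how many buckets the retention window covers. The `bucket_start` helper assumes buckets are aligned to multiples of five minutes since the Unix epoch, which is an assumption made for illustration and is not stated in this document.

```python
COLLECTION_INTERVAL_S = 30   # data is collected every 30 seconds
BUCKET_S = 5 * 60            # aggregated in five-minute buckets
RETENTION_DAYS = 14          # retained for 14 days

samples_per_bucket = BUCKET_S // COLLECTION_INTERVAL_S
buckets_per_day = 24 * 60 * 60 // BUCKET_S
buckets_retained = buckets_per_day * RETENTION_DAYS

def bucket_start(epoch_seconds: int) -> int:
    """Start of the five-minute bucket containing a timestamp.

    Assumes epoch-aligned buckets (hypothetical alignment).
    """
    return epoch_seconds - epoch_seconds % BUCKET_S

print(samples_per_bucket, buckets_per_day, buckets_retained)
```

Under these figures, the finest granularity you can expect in a graph is one point per five minutes, regardless of the 30-second collection interval.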
The following network load metrics are available:
| Metric | Description |
|---|---|
| Volume | The number of bytes sent or received over a period. Measured in bytes (or orders of magnitude thereof), bidirectional. |
| Throughput | The rate of bytes sent or received over a period. Measured in bytes per second, bidirectional. |
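Volume and throughput describe the same traffic in two ways: throughput is volume divided by the length of the period. A minimal sketch of that relationship follows; the function name and the sample figures are hypothetical.

```python
def throughput_bytes_per_second(volume_bytes: float, period_seconds: float) -> float:
    """Average rate over a period, given the total bytes observed in it."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return volume_bytes / period_seconds

# 1.5 MB observed over a five-minute (300-second) window.
rate = throughput_bytes_per_second(1_500_000, 300)
print(rate)
```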
TCP is a connection-oriented protocol that guarantees in-order delivery of packets. The following TCP metrics are available. All metrics are instrumented from the perspective of the
client side of the connection when available, or the server if not.
| Metric | Description |
|---|---|
| TCP Retransmits | TCP Retransmits represent detected failures that are retransmitted to ensure delivery. Measured in count of retransmits from the client. |
| TCP Latency | Measured as TCP smoothed round-trip time, that is, the time between a TCP frame being sent and acknowledged. |
| TCP Jitter | Measured as TCP smoothed round-trip time variance. |
| Established Connections | The number of TCP connections in an established state. Measured in connections per second from the client. |
| Closed Connections | The number of TCP connections in a closed state. Measured in connections per second from the client. |
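To give a sense of what "smoothed" round-trip time and its variance mean, the sketch below implements the estimator from RFC 6298 (gains of 1/8 for the smoothed RTT and 1/4 for the variance). This is the textbook formulation, shown under the assumption that it approximates what a TCP stack tracks; operating system kernels use their own fixed-point variants, and this document does not specify exactly how the reported values are derived.

```python
ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298)
BETA = 1 / 4   # gain for the RTT variance (RFC 6298)

def smooth_rtt(samples):
    """Return (srtt, rttvar) after feeding RTT samples in order.

    Returns (None, None) if no samples are given.
    """
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            # First measurement initializes both estimators.
            srtt, rttvar = r, r / 2
        else:
            # Variance is updated first, using the previous smoothed RTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    return srtt, rttvar

# RTT samples in milliseconds (made-up values).
print(smooth_rtt([100.0, 180.0]))
```

A single slow sample moves the smoothed RTT only one eighth of the way toward the new value, which is why latency spikes appear dampened while jitter rises more quickly.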
Cloud service autodetection
If you’re relying on managed cloud services like S3 or Kinesis, you can monitor the performance of traffic to those services from your internal applications. Scope your view to a particular AWS or Google Cloud dependency to pinpoint latency, assess database performance, and visualize your network more completely.
For instance, you can:
- Visualize data flow from your internal Kubernetes cluster to
server_service:aws.s3 in the Network Map.
- Pivot to the Network Page to isolate which pods are establishing the most connections to that service.
- Validate that their requests are successful by analyzing S3 performance metrics, which are correlated with traffic performance directly in the sidepanel for a given dependency, under the Integration Metrics tab.
NPM automatically maps:
- Network calls to S3 (which can be broken down by
s3_bucket), RDS (which can be broken down by
rds_instance_type), Kinesis, ELB, Elasticache, and other AWS services.
- API calls to AppEngine, Google DNS, Gmail, and other Google Cloud services.
To monitor other endpoints where an Agent cannot be installed (such as public APIs), group the destination in the Network Overview by the
domain tag. Or, see the section below for cloud service resolution.
Cloud service enhanced resolution
If you have set up enhanced resolution for AWS or Azure, NPM can filter and group network traffic with several resources collected from these cloud providers. Depending on the cloud provider and resource, you have different sets of tags available to query with. Datadog applies the tags defined below in addition to the user-defined tags.
Amazon Web Services
- dns_name (format loadbalancer/dns:)
- custom (user-defined) tags applied to AWS Loadbalancers
- custom (user-defined) tags applied to AWS NAT Gateways
- custom (user-defined) tags applied to VPC Internet Gateways
- custom (user-defined) tags applied to VPC Internet Endpoints
Loadbalancers and Application Gateways
- custom (user-defined) tags applied to Azure Loadbalancers and Application Gateways
Starting with Agent 7.17, the Agent resolves IPs to human-readable domain names for external and internal traffic. The domain tag allows you to monitor cloud provider endpoints where a Datadog Agent cannot be installed, such as S3 buckets, application load balancers, and APIs. Unrecognizable domain names, such as DGA (domain generation algorithm) domains from C&C (command-and-control) servers, may point to network security threats.
domain is encoded as a tag in Datadog, so you can use it in search bar queries and the facet panel to aggregate and filter traffic.
Note: DNS resolution is supported only for hosts where the system probe runs in the root network namespace. A system probe outside the root network namespace is usually the result of running the system-probe in a container without using the host network.
Network Address Translation (NAT)
NAT is a tool used by Kubernetes and other systems to route traffic between containers. When investigating a specific dependency (for example, service to service), you can use the presence or absence of pre-NAT IPs to distinguish between Kubernetes-native services, which do their own routing, and services that rely on external clients for routing. This feature does not currently include resolution of NAT gateways.
To view pre-NAT and post-NAT IPs, use the Show pre-NAT IPs toggle in the table settings. When this setting is toggled off, IPs shown in the Client IP and Server IP columns are by default post-NAT IPs. In cases where you have multiple pre-NAT IPs for one post-NAT IP, the top 5 most common pre-NAT IPs are displayed.
pre_nat.ip is a tag like any other in the product, so you can use it to aggregate and filter traffic.
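The "top 5 most common pre-NAT IPs" behavior described above amounts to a frequency count per post-NAT IP. The sketch below shows that idea with the standard library's `collections.Counter`. The tuple layout, the helper name `top_pre_nat_ips`, and the sample addresses are all hypothetical, and the product's actual ranking logic may differ.

```python
from collections import Counter

def top_pre_nat_ips(connections, post_nat_ip, limit=5):
    """Most frequent pre-NAT IPs seen for one post-NAT IP.

    `connections` is an iterable of (pre_nat_ip, post_nat_ip) pairs.
    """
    counts = Counter(pre for pre, post in connections if post == post_nat_ip)
    return [ip for ip, _ in counts.most_common(limit)]

# Made-up observations: 10.0.0.1 appears 6 times, 10.0.0.2 5 times, and so on.
sample = []
for count, last_octet in zip(range(6, 0, -1), range(1, 7)):
    sample += [(f"10.0.0.{last_octet}", "203.0.113.10")] * count
# Traffic behind a different post-NAT IP is ignored by the lookup below.
sample += [("10.0.0.9", "203.0.113.20")] * 10

top_five = top_pre_nat_ips(sample, "203.0.113.10")
print(top_five)
```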
NPM users may configure their networks to have overlapping IP spaces. For instance, you may want to deploy in multiple VPCs (virtual private clouds) which have overlapping address ranges and communicate only through load balancers or cloud gateways.
To correctly classify traffic destinations, NPM uses the concept of a network ID, which is represented as a tag. A network ID is an alphanumeric identifier for a set of IP addresses that can communicate with one another. When an IP address mapping to several hosts with different network IDs is detected, this identifier is used to determine the particular host network traffic is going to or coming from.
In AWS and Google Cloud, the network ID is automatically set to the VPC ID. For other environments, the network ID may be set manually, either in
datadog.yaml or by adding the
DD_NETWORK_ID environment variable to the process and core Agent containers.
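To illustrate why a network ID disambiguates overlapping IP spaces (this is a conceptual sketch, not a configuration example), consider a lookup keyed on the pair of network ID and IP address rather than on the IP alone. The VPC IDs, host names, and the `resolve_host` helper below are hypothetical and do not represent NPM's internal implementation.

```python
# Two hosts share the same private IP but live in different networks.
# Keys are (network_id, ip); all values here are made up.
hosts = {
    ("vpc-aaa", "10.0.0.5"): "checkout-host-1",
    ("vpc-bbb", "10.0.0.5"): "billing-host-7",
}

def resolve_host(network_id, ip):
    """Return the host for an IP within a given network, or None if unknown."""
    return hosts.get((network_id, ip))

print(resolve_host("vpc-aaa", "10.0.0.5"))
print(resolve_host("vpc-bbb", "10.0.0.5"))
```

Without the network ID in the key, both entries would collide on `10.0.0.5` and traffic could be attributed to the wrong host.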
Organize and share views of traffic data. Saved Views make debugging faster and empower collaboration. For instance, you can create a view, save it for the future for common queries, and copy its link to share network data with your teammates.
- To save a view: click the + Save button and name the view to record your current query, table configuration, and graph metric selections.
- To load a view: click Views at the top left to see your Saved Views and select a view from the list.
- To rename a view: hover over a view in the Saved Views list and click the gear icon to Edit name.
- To share a view: hover over a view in the Saved Views list and click the link icon to Copy permalink.
To learn more, see the Saved Views documentation.
The network table breaks down the Volume, Throughput, TCP Retransmits, Round-trip Time (RTT), and RTT variance metrics between each source and destination defined by your query.
You can configure the columns in your table using the
Customize button at the top right of the table.
Configure the traffic shown with the
Filter Traffic button.
External traffic (to public IPs) and Datadog Agent traffic are shown by default. To narrow down your view, you can turn off the
Show Datadog Traffic and
Show External Traffic toggles.
Unresolved client and server tags are marked as
N/A. A traffic client or server endpoint may be unresolved because:
- The host or container client or server IPs are not tagged with the client or server tags used for traffic aggregation.
- The endpoint is outside of your private network, and accordingly is not tagged by the Datadog Agent.
- The endpoint is a firewall, service mesh or other entity where a Datadog Agent cannot be installed.
Use the Show N/A (Unresolved Traffic) toggle in the upper right corner of the data table to filter out aggregate connections with unresolved (
N/A) clients or servers.
Select any row from the data table to see associated logs, traces, and processes for a given client <=> server aggregate connection:
The sidepanel provides contextual telemetry to help you debug network dependencies. Use the Flows, Logs, Traces, and Processes tabs to determine whether a high retransmit count or latency in traffic between two endpoints is due to:
- A spike in traffic volume from a particular port or IP.
- Heavy processes consuming the CPU or memory of the destination endpoint.
- Application errors in the code of the client endpoint.
The top of the sidepanel displays common client and server tags shared by the inspected dependency’s most recent connections. Use common tags to gain additional context into a faulty endpoint. For instance, when troubleshooting high-latency communication to a particular service, common destination tags surface the following:
- Granular context such as the container, task, or host to which traffic is flowing.
- Wider context such as the availability zone, cloud provider account, or deployment in which the service runs.
The Security tab highlights potential network threats and findings detected by Cloud Security Management Threats and Cloud Security Management Misconfigurations. These signals are generated when Datadog detects network activity that matches a detection or compliance rule, or if there are other threats and misconfigurations related to the selected network flow.