Network Page

Queries

To refine your search to traffic between particular endpoints, aggregate and filter your network aggregate connections with tags. You can select tags for the source and destination by using the search bar at the top of the page.

The following screenshot shows the default view, which aggregates the source and destination by the service tag. Accordingly, each row in the table represents the service-to-service aggregate connections over a one-hour time period.

The next example shows all aggregate connections from IP addresses representing services in region us-east-1 to availability zones:

You can set the timeframe over which traffic is aggregated using the time selector at the top right of the page.
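The aggregation described in this section can be pictured with a minimal Python sketch. It is only an illustration of grouping connections by a source tag and a destination tag, not how Datadog implements it; the record layout, the field names (src, dst, bytes_sent), and the aggregate function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical connection records; the field names are illustrative only.
connections = [
    {"src": {"service": "web"}, "dst": {"service": "db"}, "bytes_sent": 1200},
    {"src": {"service": "web"}, "dst": {"service": "db"}, "bytes_sent": 800},
    {"src": {"service": "web"}, "dst": {"service": "cache"}, "bytes_sent": 500},
    {"src": {"service": "api"}, "dst": {"service": "db"}, "bytes_sent": 300},
]


def aggregate(records, src_tag, dst_tag):
    """Sum bytes sent for each (source tag value, destination tag value) pair."""
    totals = defaultdict(int)
    for record in records:
        # Endpoints that lack the tag are grouped under "N/A".
        key = (record["src"].get(src_tag, "N/A"), record["dst"].get(dst_tag, "N/A"))
        totals[key] += record["bytes_sent"]
    return dict(totals)


# One entry per service-to-service pair, like one row in the table.
by_service = aggregate(connections, "service", "service")
```

Each key of by_service corresponds to one row of a service-to-service view; choosing different tags for the source and destination changes how the same connections are grouped.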

Facet panels

Facet panels mirror the tags in your search bar query. Switch between the facet panels with the Source and Destination tabs on top:

Custom facets

Aggregate and filter your traffic data by any tag on the Datadog network page. A whitelist of tags is provided by default in the search bar dropdown menu:

Whitelisted tags include service, availability zone, env, environment, pod, host, ip, and port, among others. If you want to aggregate or filter traffic by a tag that is not already in the menu, add it as a custom facet:

  1. Select the + button on the top right of the facet panels.
  2. Enter the tag you want to create a custom facet for.
  3. Click Create.

Once the custom facet is created, use this tag to filter and aggregate traffic in the network page and map. All custom facets can be viewed in the bottom Custom section of the facet panels.

To perform a multi-character wildcard search, use the * symbol as follows:

  • service:web* matches all services that start with web
  • service:*web matches all services that end with web
  • service:*web* matches all services that contain the string web

Wildcard searches work within facets with this syntax. This query returns all the services that end with the string mongo:

service:*mongo

To learn more, see the search syntax documentation.
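To make the wildcard semantics above concrete, here is a small Python sketch that mimics the prefix, suffix, and substring behavior with the standard library's fnmatchcase. It is an analogy only: the service names are made up, match_services is a hypothetical helper, and fnmatch also gives special meaning to ? and [ ], which the search syntax described here may not.

```python
from fnmatch import fnmatchcase

# Made-up service names used only for illustration.
services = ["web-store", "store-web", "internal-web-proxy", "mongo", "orders-mongo"]


def match_services(pattern, names):
    """Return the names matching a pattern in which * stands for any run of characters."""
    return [name for name in names if fnmatchcase(name, pattern)]


starts_with_web = match_services("web*", services)
ends_with_web = match_services("*web", services)
contains_web = match_services("*web*", services)
ends_with_mongo = match_services("*mongo", services)
```

Because * can also match an empty run of characters, the pattern *mongo matches the plain name mongo as well as orders-mongo.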

Network data

Your network metrics are displayed through the graphs and the associated table. All sent and received metrics are displayed from the perspective of the source:

  • Sent metrics: measure the value of something from the source to the destination from the source’s perspective.
  • Received metrics: measure the value of something from the destination to the source from the source’s perspective.

Values displayed for sent_metric (source to destination) and received_metric (destination to source) might differ if there is a large number of packet drops. In this case, if the destination sends a lot of bytes to the source, the aggregate connections that originate at the destination include those bytes, but the aggregate connections that originate at the source do not see them as received.

Note: The default collection interval is five minutes and retention is seven days.

Metrics

Network load

The following network load metrics are available:

Metric | Description
Volume | The number of bytes sent or received over a period. Measured in bytes (or orders of magnitude thereof), bidirectional.
Throughput | The rate of bytes sent or received over a period. Measured in bytes per second, bidirectional.
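The relationship between the two load metrics can be shown with a short arithmetic sketch in Python. It assumes throughput is the average rate over the period, that is, volume divided by the period length; the function name and the sample numbers are hypothetical.

```python
def throughput_bytes_per_second(volume_bytes, period_seconds):
    """Average rate over a period: bytes transferred divided by seconds elapsed."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return volume_bytes / period_seconds


# Hypothetical example: 90 MB transferred during one five-minute interval.
volume = 90_000_000
period = 5 * 60
rate = throughput_bytes_per_second(volume, period)  # 300000.0 bytes per second
```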

TCP

TCP is a connection-oriented protocol that guarantees in-order delivery of packets. The following TCP metrics are available:

Metric | Description
TCP Retransmits | TCP Retransmits represent detected failures that are retransmitted to ensure delivery. Measured in count of retransmits from the source.
TCP Latency | Measured as TCP smoothed round-trip time, that is, the time between a TCP frame being sent and acknowledged.
TCP Jitter | Measured as TCP smoothed round-trip time variance.
Established Connections | The number of TCP connections in an established state. Measured in connections per second from the source.
Closed Connections | The number of TCP connections in a closed state. Measured in connections per second from the source.
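For readers unfamiliar with smoothed round-trip time and its variance, the following Python sketch applies the estimator from RFC 6298 to a list of RTT samples. Whether the latency and jitter values shown on the page are computed exactly this way is an assumption; operating systems use their own variants, and the function name smooth_rtt is hypothetical.

```python
ALPHA = 1 / 8  # weight of the newest sample in the smoothed RTT (RFC 6298)
BETA = 1 / 4   # weight of the newest deviation in the RTT variance (RFC 6298)


def smooth_rtt(samples):
    """Return (smoothed RTT, RTT variance) after processing the samples in order."""
    srtt = None
    rttvar = None
    for rtt in samples:
        if srtt is None:
            # First measurement: SRTT is the sample, variance is half of it.
            srtt, rttvar = rtt, rtt / 2
        else:
            # The variance is updated first, using the previous smoothed RTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt
    return srtt, rttvar


# Two samples in milliseconds: a steady 100 ms followed by a slower 180 ms.
latency, jitter = smooth_rtt([100.0, 180.0])
```

A single slow sample moves the smoothed RTT only slightly while raising the variance, which is why latency and jitter are useful to read together.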

Cloud service autodetection

Filtering by specific AWS cloud services can help pinpoint latency, assess database performance, and visualize your network more completely. For instance, you can filter a search query by service, view the service in the Network Map, and trace communication on that node to see affected services.

  • To filter a query: In a search bar, enter tags such as service:s3, service:kinesis, and service:elb. For some services, you can break down latency and retransmits further by using more out-of-the-box tags like s3_bucket and rds_instance_type.
  • To visualize specific managed services: In the Network Map, click the dropdown next to View and type or select desired tags. In the map, click a node to view troubleshooting options.
  • To view integration metrics for a service: In the Network Page, click a row in the dependency table. In the opened side panel, use the Integration Metrics tab to analyze the performance of cloud services and distinguish between a client-side and cloud provider issue.

NPM automatically detects S3, RDS, Kinesis, ELB, Elasticache, and others listed in the supported services. To monitor other endpoints where an Agent cannot be installed (such as public APIs), group the destination in the Network Overview by the domain tag.

DNS resolution

Starting with Agent 7.17, the Agent resolves IPs to human-readable domain names for external and internal traffic. The domain tag allows you to monitor cloud provider endpoints where a Datadog Agent cannot be installed, such as S3 buckets, application load balancers, and APIs. Unrecognizable domain names, such as DGA domains from C&C servers, may point to network security threats. Domain is encoded as a tag in Datadog, so you can use it in search bar queries and the facet panel to aggregate and filter traffic.

Note: DNS resolution is supported for hosts where the system probe is running on the root network namespace, which is usually caused by running the system-probe in a container without using the host network.

Pre-NAT IPs

Network Address Translation (NAT) is a tool used by Kubernetes and other systems to route traffic between containers. When investigating a specific dependency (for example, service to service), you can use the presence or absence of pre-NAT IPs to distinguish between Kubernetes-native services, which do their own routing, and services that rely on external clients for routing. This feature does not currently include resolution of NAT gateways.

To view pre-NAT and post-NAT IPs, use the Show pre-NAT IPs toggle in the table settings. When this setting is toggled off, the IPs shown in the Source IP and Dest IP columns are post-NAT IPs. If multiple pre-NAT IPs map to one post-NAT IP, the five most common pre-NAT IPs are displayed. pre_nat.ip is a tag like any other in the product, so you can use it to aggregate and filter traffic.
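The "most common pre-NAT IPs" rule can be sketched in Python with a counter per post-NAT IP. This is an illustration under assumed inputs, not the product's implementation; the observation format and the top_pre_nat_ips function are hypothetical.

```python
from collections import Counter, defaultdict


def top_pre_nat_ips(observations, limit=5):
    """Map each post-NAT IP to its most frequently seen pre-NAT IPs."""
    counts = defaultdict(Counter)
    for post_nat_ip, pre_nat_ip in observations:
        counts[post_nat_ip][pre_nat_ip] += 1
    return {
        post_nat_ip: [ip for ip, _ in counter.most_common(limit)]
        for post_nat_ip, counter in counts.items()
    }


# Hypothetical observations: pre-NAT IP 10.0.0.N is seen N times
# behind the same post-NAT IP.
observations = [
    ("172.16.0.1", f"10.0.0.{n}") for n in range(1, 7) for _ in range(n)
]
top_five = top_pre_nat_ips(observations)
```

With six distinct pre-NAT IPs behind one post-NAT IP, the least frequent one (10.0.0.1 here) is dropped from the result.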

Network ID

NPM users may configure their networks to have overlapping IP spaces. For instance, you may want to deploy in multiple VPCs (virtual private clouds) which have overlapping address ranges and communicate only through load balancers or cloud gateways.

To correctly classify traffic destinations, NPM uses the concept of a network ID, which is represented as a tag. A network ID is an alphanumeric identifier for a set of IP addresses that can communicate with one another. When an IP address mapping to several hosts with different network IDs is detected, this identifier is used to determine the particular host network traffic is going to or coming from.

In AWS and GCP, the network ID is automatically set to the VPC ID. For other environments, the network ID may be set manually, either in datadog.yaml as shown below, or by adding the DD_NETWORK_ID environment variable to the process and core Agent containers.

network:
   Id: <your-network-id>
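To see why a network ID is needed when address ranges overlap, consider this minimal Python sketch. It keys a host lookup on the pair (network ID, IP address) instead of the IP alone. The table contents, the host names, and the resolve_host function are hypothetical and only illustrate the idea.

```python
# Hypothetical inventory: the same IP address exists in two VPCs.
hosts_by_network_and_ip = {
    ("vpc-a", "10.0.0.5"): "host-a",
    ("vpc-b", "10.0.0.5"): "host-b",
}


def resolve_host(network_id, ip, table=hosts_by_network_and_ip):
    """Return the host for an IP within a given network, or None if unknown."""
    return table.get((network_id, ip))


# The IP alone is ambiguous; adding the network ID selects one host.
host_in_a = resolve_host("vpc-a", "10.0.0.5")
host_in_b = resolve_host("vpc-b", "10.0.0.5")
```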

Saved views

Organize and share views of traffic data. Saved Views make debugging faster and empower collaboration. For instance, you can create a view, save it for the future for common queries, and copy its link to share network data with your teammates.

  • To save a view: click the + Save button and name the view to record your current query, table configuration, and graph metric selections.
  • To load a view: click Views at the top left to see your Saved Views and select a view from the list.
  • To rename a view: hover over a view in the Saved Views list and click the gear icon to Edit name.
  • To share a view: hover over a view in the Saved Views list and click the link icon to Copy permalink.

To learn more, see the Saved Views documentation.

Table

The network table breaks down the Volume, Throughput, TCP Retransmits, Round-trip Time (RTT), and RTT variance metrics between each source and destination defined by your query.

You can configure the columns in your table using the Customize button at the top right of the table.

Configure the traffic shown with the Filter Traffic button.

External traffic (to public IPs) and Datadog Agent traffic are shown by default. To narrow down your view, turn off the Show Datadog Traffic and Show External Traffic toggles.

Unresolved traffic

Unresolved source and destination tags are marked as N/A. A traffic source or destination endpoint may be unresolved because:

  • The host or container source or destination IPs are not tagged with the source or destination tags used for traffic aggregation.
  • The endpoint is outside of your private network, and accordingly is not tagged by the Datadog Agent.
  • The endpoint is a firewall, service mesh or other entity where a Datadog Agent cannot be installed.

Use the Show N/A (Unresolved Traffic) toggle in the upper right corner of the data table to filter out aggregate connections with unresolved (N/A) sources or destinations.
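The effect of hiding unresolved traffic can be sketched in Python as a simple filter over table rows. The row format and the hide_unresolved function are hypothetical; the sketch only assumes that an unresolved endpoint is represented by the string N/A, as described above.

```python
UNRESOLVED = "N/A"

# Hypothetical table rows; one endpoint of the second row is unresolved.
rows = [
    {"source": "web", "destination": "db", "bytes_sent": 2000},
    {"source": "web", "destination": UNRESOLVED, "bytes_sent": 700},
    {"source": UNRESOLVED, "destination": "db", "bytes_sent": 150},
]


def hide_unresolved(table_rows):
    """Keep only rows where both the source and the destination are resolved."""
    return [
        row
        for row in table_rows
        if UNRESOLVED not in (row["source"], row["destination"])
    ]


resolved_rows = hide_unresolved(rows)
```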

Select any row from the data table to see associated logs, traces, and processes for a given source <=> destination aggregate connection:

Sidepanel

The sidepanel provides contextual telemetry to help you debug network dependencies. Use the Flows, Logs, Traces, and Processes tabs to determine whether a high retransmit count or latency in traffic between two endpoints is due to:

  • A spike in traffic volume from a particular port or IP.
  • Heavy processes consuming the CPU or memory of the destination endpoint.
  • Application errors in the code of the source endpoint.

Common tags

The top of the sidepanel displays common source and destination tags shared by the inspected dependency’s most recent connections. Use common tags to gain additional context into a faulty endpoint. For instance, when troubleshooting high-latency communication to a particular service, common destination tags surface:

  • Granular context such as the container, task, or host to which traffic is flowing.
  • Wider context such as the availability zone, cloud provider account, or deployment in which the service runs.

Further Reading