Network Health

Network Health is in Preview. Contact your Datadog representative to sign up.

Overview

Network Health provides a unified view of your network’s most critical issues, automatically detecting and prioritizing problems across DNS, TLS certificates, security groups, and network anomalies. It surfaces actionable insights with clear remediation paths, helping you resolve connectivity issues and reduce incident impact.

This page describes the sections of the Network Health page and the issues and insights surfaced in each.

The Network Health page with the collapsible menu open, highlighting Recommended Actions.

Prerequisites

Cloud Network Monitoring is enabled.
To view Security Groups, resource collection must be enabled in your AWS integration.

Recommended Actions

The Recommended Actions section highlights the most critical issues detected in your network. These are prioritized based on:

Severity: Whether the issue is actively blocking traffic
Impact: How critical the affected services are to your infrastructure
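
The two criteria above can be pictured as a two-level sort. The following is a minimal sketch, not Datadog's actual ranking logic: the `NetworkIssue` type, its field names, and the 1 to 5 criticality score are all hypothetical and exist only to illustrate "blocking issues first, then most critical services first".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkIssue:
    """Hypothetical record for a detected issue (not a Datadog type)."""
    description: str
    blocking_traffic: bool    # severity: is traffic actively blocked?
    service_criticality: int  # impact: assumed score, 1 (low) to 5 (high)


def prioritize(issues):
    """Order issues so blocking problems come first, then by criticality."""
    return sorted(
        issues,
        key=lambda issue: (issue.blocking_traffic, issue.service_criticality),
        reverse=True,
    )


issues = [
    NetworkIssue("TLS certificate expires in 5 days", False, 5),
    NetworkIssue("Security group blocks checkout traffic", True, 4),
    NetworkIssue("TLS certificate expired 2 days ago", True, 2),
]

for issue in prioritize(issues):
    print(issue.description)
```

In this sketch, an issue that blocks traffic always outranks one that does not, regardless of how critical the affected service is.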

Each recommended action displays:

The specific problem detected (for example, “TLS certificate expired N days ago”)
The impacted client service (the service making requests)
The impacted server service (the service receiving requests)

Hover over a service name to pivot to APM, or click Remediate to view remediation steps along with options to create a New Workflow, Create a Case, or Declare an Incident.

Recommended actions side panel of an affected service, showing remediation steps.

Watchdog Insights

The Watchdog Insights section displays anomalous network behavior detected by Watchdog, focusing on spikes in TCP retransmits. An increase in retransmits compared to your baseline (typically the previous week) often indicates an underlying network issue. See the Watchdog Insights documentation for more information.
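
To make the idea of "an increase compared to your baseline" concrete, here is a simplified sketch that compares the mean retransmit count in a current window with the mean of a baseline window. Watchdog's real anomaly detection is not described on this page and is certainly more sophisticated; the function names and the `threshold` of 2.0 are assumptions chosen for illustration.

```python
from statistics import mean


def retransmit_increase(current, baseline):
    """Return the ratio of the current mean retransmit count to the baseline mean."""
    baseline_mean = mean(baseline)
    current_mean = mean(current)
    if baseline_mean == 0:
        # No retransmits in the baseline: any retransmit now is an unbounded increase.
        return float("inf") if current_mean > 0 else 1.0
    return current_mean / baseline_mean


def is_spike(current, baseline, threshold=2.0):
    """Flag a spike when the current mean is at least `threshold` times the baseline.

    The threshold is a hypothetical value, not a Watchdog setting.
    """
    return retransmit_increase(current, baseline) >= threshold


last_week = [10, 12, 8, 10]   # retransmits per interval, previous week
this_hour = [40, 55, 45, 60]  # retransmits per interval, current window

print(is_spike(this_hour, last_week))
```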

Use Watchdog Insights to:

Detect potential problems early
Correlate anomalies with specific root causes
Investigate performance degradation before it impacts users

TLS certificates

Expired or expiring TLS certificates can block secure connections between services, resulting in dropped traffic. The TLS Certificates section lists:

Expired certificates: Certificates that are invalid and blocking traffic
Expiring certificates: Certificates about to expire
Impacted services: The client and server services affected by each certificate issue (note that the client “service” may be an AWS load balancer, such as an Application Load Balancer)

Click an expired certificate to view steps for renewing it in AWS, or to create a New Workflow, Create a Case, or Declare an Incident.
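
The distinction between expired and expiring certificates can be sketched with Python's standard library. This example is illustrative only: the 30-day `warning_days` window is an assumption (this page does not define how soon "about to expire" is), and the function names are hypothetical. `ssl.cert_time_to_seconds` parses the `notAfter` string format returned by `SSLSocket.getpeercert()`.

```python
import ssl
from datetime import datetime, timedelta, timezone


def parse_not_after(value):
    """Convert a certificate notAfter string (as in getpeercert()) to a UTC datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def classify_certificate(not_after, now, warning_days=30):
    """Return 'expired', 'expiring', or 'valid' for a certificate expiry time.

    warning_days is an assumed threshold, not a documented Network Health value.
    """
    if not_after <= now:
        return "expired"
    if not_after - now <= timedelta(days=warning_days):
        return "expiring"
    return "valid"


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = parse_not_after("Jan  5 09:34:43 2018 GMT")
print(classify_certificate(expiry, now))
```

Passing `now` explicitly keeps the classification deterministic and easy to test; in a live check you would pass `datetime.now(timezone.utc)`.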

DNS failures

DNS misconfigurations can route traffic to incorrect destinations, preventing services from communicating. These failures typically result from changes made to DNS routing configurations.

The DNS Failures section shows:

Failure reason: The cause of the DNS failure
Impacted DNS server: The DNS server experiencing elevated failure rates
Impacted services: The client and server services affected by the DNS failure

Failure reasons:

NXDOMAIN: The domain name does not exist, usually due to a misconfiguration or removed domain.
TIMEOUT: The DNS query timed out before receiving a response, which may indicate network issues or unresponsive DNS servers.
SERVFAIL: The DNS server failed to process a query, often due to a server-side problem.
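
The failure reasons above correspond to standard DNS outcomes: NXDOMAIN and SERVFAIL are response codes defined in RFC 1035 (3 and 2 respectively), while a timeout means no response arrived at all. The sketch below shows one way to map raw outcomes to these labels and tally them per DNS server. The function names and the use of `None` to represent a timeout are assumptions made for this example, not part of any Datadog API.

```python
from collections import Counter
from typing import Optional

RCODE_SERVFAIL = 2  # RFC 1035: server failure
RCODE_NXDOMAIN = 3  # RFC 1035: name error (domain does not exist)


def failure_reason(rcode: Optional[int]) -> Optional[str]:
    """Map a DNS response code to a failure label.

    rcode is None when no response arrived before the query timed out
    (an assumed convention for this sketch). Returns None for outcomes
    that are not one of the three failure reasons listed above.
    """
    if rcode is None:
        return "TIMEOUT"
    if rcode == RCODE_NXDOMAIN:
        return "NXDOMAIN"
    if rcode == RCODE_SERVFAIL:
        return "SERVFAIL"
    return None


def count_failures(responses):
    """Count failures per (dns_server, reason) from (dns_server, rcode) pairs."""
    counts = Counter()
    for server, rcode in responses:
        reason = failure_reason(rcode)
        if reason is not None:
            counts[(server, reason)] += 1
    return counts


responses = [("10.0.0.2", 0), ("10.0.0.2", 3), ("10.0.0.2", 3), ("10.0.0.53", None)]
print(count_failures(responses))
```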

Hover over a service name to pivot to APM, or click a recommended action to view remediation steps along with options to create a New Workflow, Create a Case, or Declare an Incident.

Security groups

Security groups control traffic flow in cloud environments through rules that specify which traffic is allowed. Because security groups deny any traffic that no rule allows, accidental rule deletions or modifications can immediately block legitimate traffic between services.

Note: Security group monitoring is available only for AWS and requires EC2 resource collection to be enabled in your AWS integration.

The Security Groups section identifies:

Security group misconfigurations blocking traffic
The specific services unable to communicate
Recent changes to security group rules

Resolution:

Click on a security group issue to open the side panel.
Select View in AWS to navigate to the AWS console.
Review and modify the inbound and outbound rules.
Use the Infrastructure Change Tracking data in the side panel to identify when the change occurred and revert it if necessary.
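
When reviewing inbound rules, it can help to reason about default-deny behavior explicitly: traffic passes only if some rule allows it. The sketch below evaluates rules shaped like the `IpPermissions` list returned by the AWS EC2 `DescribeSecurityGroups` API. It is a deliberate simplification: it handles only TCP/UDP with IPv4 CIDR ranges and ignores security group references, IPv6 ranges, and prefix lists. The function name `ingress_allowed` is hypothetical.

```python
import ipaddress


def ingress_allowed(ip_permissions, protocol, port, source_ip):
    """Return True if any inbound rule allows the given protocol, port, and source.

    Simplified sketch: IPv4 CIDR ranges only. With no matching rule the
    traffic is denied, mirroring the default-deny behavior of security groups.
    """
    source = ipaddress.ip_address(source_ip)
    for rule in ip_permissions:
        rule_protocol = rule.get("IpProtocol")
        if rule_protocol != "-1":  # "-1" means all protocols and all ports
            if rule_protocol != protocol:
                continue
            if not rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535):
                continue
        for ip_range in rule.get("IpRanges", []):
            if source in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False


rules = [
    {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
    }
]

print(ingress_allowed(rules, "tcp", 443, "10.0.1.5"))
print(ingress_allowed([], "tcp", 443, "10.0.1.5"))
```

The second call shows why a deleted rule blocks traffic immediately: with an empty rule list, nothing is allowed.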