If you’re only seeing this failure intermittently—every couple of weeks and not continually—it’s likely fine; the Agent is designed to store and forward metrics and events in the case of transient issues, so all of your data is still being routed to Datadog.
The first and easiest thing to check is your host’s time: verify that it is in sync with a valid NTP server. If time sync is not the cause, move on to the steps below.
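The hypothesis is that the server uses an MTU greater than 1500, while a link between the server and the ELB has a smaller MTU and drops the ICMP messages that would report it (an ICMP blackhole). In that case the server never learns that it has to send smaller packets, and large packets are silently lost.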
Datadog’s ELB supports jumbo frames, as shown by the MSS of 8961 in its SYN-ACK response:
17:26:24.040194 IP (tos 0x0, ttl 64, id 30550, offset 0, flags [DF], proto TCP (6), length 60) 10.42.30.229.36487 > 126.96.36.199.80: Flags [S], cksum 0x89c5 (incorrect -> 0x7617), seq 3174747918, win 29200, options [mss 1460,sackOK,TS val 46708824 ecr 0,nop,wscale 7], length 0
17:26:24.054944 IP (tos 0x0, ttl 248, id 0, offset 0, flags [DF], proto TCP (6), length 60) 188.8.131.52.80 > 10.42.30.229.36487: Flags [S.], cksum 0x086f (correct), seq 3620905346, ack 3174747919, win 17898, options [mss 8961,sackOK,TS val 1552328339 ecr 46708824,nop,wscale 8], length 0
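To make the MSS reading concrete, here is a minimal sketch that pulls the MSS out of each handshake line and adds back the 40 bytes of IPv4 and TCP headers (assuming no header options) to get the MTU each side is working with. The function names are illustrative and not part of any Datadog tooling.

```python
import re

# Bytes of IPv4 header (20) plus TCP header (20), both without options.
IP_TCP_HEADER_BYTES = 40


def mss_from_tcpdump_line(line):
    """Return the MSS advertised in a tcpdump SYN or SYN-ACK line, or None."""
    match = re.search(r"\bmss (\d+)", line)
    return int(match.group(1)) if match else None


def implied_mtu(mss):
    """MTU the sender is assuming, given the MSS it advertised over IPv4."""
    return mss + IP_TCP_HEADER_BYTES


# Option fields copied from the two handshake packets above.
syn = "Flags [S], options [mss 1460,sackOK,TS val 46708824 ecr 0,nop,wscale 7]"
syn_ack = "Flags [S.], options [mss 8961,sackOK,TS val 1552328339 ecr 46708824,nop,wscale 8]"

for name, line in (("server", syn), ("ELB", syn_ack)):
    mss = mss_from_tcpdump_line(line)
    print(f"{name}: MSS {mss} -> MTU {implied_mtu(mss)}")
# server: MSS 1460 -> MTU 1500
# ELB: MSS 8961 -> MTU 9001
```

An MTU of 9001 on the ELB side is what indicates jumbo frame support. In this particular capture the server itself advertises a standard 1500-byte MTU.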
On Linux, get the server’s MTU, for example with ip link or ifconfig.
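The MTU can also be read programmatically. The sketch below assumes a Linux host with sysfs mounted at /sys/class/net, where each interface exposes its MTU in a file named mtu. The helper names are made up for this example.

```python
import os


def read_mtu(interface, sysfs_root="/sys/class/net"):
    """Return the MTU of one interface as an int, read from sysfs."""
    with open(os.path.join(sysfs_root, interface, "mtu")) as handle:
        return int(handle.read().strip())


def list_mtus(sysfs_root="/sys/class/net"):
    """Map every interface found under sysfs_root to its MTU."""
    if not os.path.isdir(sysfs_root):
        return {}  # not Linux, or sysfs is not mounted
    mtus = {}
    for name in sorted(os.listdir(sysfs_root)):
        try:
            mtus[name] = read_mtu(name, sysfs_root)
        except (OSError, ValueError):
            continue  # skip entries that are not readable interfaces
    return mtus


if __name__ == "__main__":
    for name, mtu in list_mtus().items():
        flag = "  (above 1500)" if mtu > 1500 else ""
        print(f"{name}: {mtu}{flag}")
```

Any interface flagged as above 1500 is a candidate for the MTU mismatch described above.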
Then find the lowest MTU along the path to Datadog, for example with tracepath.
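One way to probe for the lowest MTU is to send pings with fragmentation prohibited and binary-search for the largest size that gets a reply. The sketch below assumes the iputils ping found on most Linux distributions (its -M do option sets the Don’t Fragment bit) and a target that answers ICMP echo. The host passed in is a placeholder to replace with the endpoint you are testing. With an ICMP blackhole, oversized probes show up as timeouts rather than explicit errors, which this treats as "does not fit".

```python
import subprocess

# IPv4 header (20 bytes) plus ICMP echo header (8 bytes).
ICMP_OVERHEAD = 28


def find_path_mtu(fits, low=576, high=9001):
    """Largest packet size in [low, high] for which fits(size) is True.

    Returns None if even the smallest size does not fit, for example
    because the target does not answer pings at all.
    """
    if not fits(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid       # mid got through, so the answer is mid or larger
        else:
            high = mid - 1  # mid was dropped, so the answer is smaller
    return low


def ping_fits(host):
    """Build a probe that pings host once with the Don't Fragment bit set."""
    def fits(packet_size):
        payload = packet_size - ICMP_OVERHEAD
        cmd = ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host]
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode == 0
    return fits


# Usage (placeholder host, not run here because it needs network access):
#   print(find_path_mtu(ping_fits("your-endpoint.example")))
```

The search takes about a dozen probes for the default range. A result below the server’s interface MTU points to a smaller link on the path.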
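If a link on the path has a lower MTU, try one of the following:
Lower the MTU of the server’s interface (sudo ip link set dev ... mtu 1500)
Enable TCP MTU probing (sudo sysctl net.ipv4.tcp_mtu_probing=1)
Lower the MTU for the route (sudo ip route add ... via ... mtu 1500); the first argument is the IP range and the second is the gateway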
Note, some customers report that this was resolved by correcting DNS or IPv6 issues on their side. For example:
When a DNS response is larger than 512 bytes, the query is retried over TCP. If TCP traffic on port 53 is blocked, name resolution fails for the Agent. Checking for similar communication restrictions helps when troubleshooting Agent connectivity. If DNS is the culprit, you’ll see the following error in your Agent logs:
gaierror: (-2, 'Name or service not known')
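To reproduce this check outside the Agent, the following sketch resolves a hostname with Python’s standard socket.getaddrinfo, reports a gaierror in the same (code, message) shape as the log line above, and lists which address families came back. The function name and the resolver parameter are illustrative. The hostname is whatever endpoint your Agent reports to and is left as a placeholder here.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def check_dns(hostname, port=443, resolver=socket.getaddrinfo):
    """Resolve hostname and summarize the outcome.

    resolver is injectable so the function can be exercised without
    network access; by default it is the system resolver.
    """
    try:
        results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # exc.args is (code, message), e.g. (-2, 'Name or service not known')
        return {"ok": False, "code": exc.args[0], "message": exc.args[1], "families": []}
    families = sorted({FAMILY_NAMES.get(r[0], str(r[0])) for r in results})
    return {"ok": True, "code": 0, "message": "", "families": families}


# Usage (placeholder hostname, not run here because it needs network access):
#   print(check_dns("your-endpoint.example"))
```

If the result lists IPv6 but the host has no working IPv6 route, that is the situation where disabling IPv6 may help.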
If your system or network doesn’t support DNS over TCP, disabling IPv6 may help to reduce DNS message sizes and allow the use of UDP.
To disable IPv6, see the following article:
Some customers experience these 599 Tornado errors only when their Datadog Agent uses the default “Simple HTTP” Tornado client. Switching to the cURL client can sometimes help. This can be done from the datadog.yaml on this line.
To troubleshoot the same MTU issues on Windows, see this blog:
If you’ve done everything above and continue to have issues, send email@example.com the following information: