Data Streams Monitoring



Data Streams Monitoring provides a standardized method for teams to understand and manage pipelines at scale by making it easy to:

  • Measure pipeline health with end-to-end latencies for events traversing your system.
  • Pinpoint faulty producers, consumers, or queues, then pivot to related logs or clusters to troubleshoot faster.
  • Prevent cascading delays by equipping service owners to stop backed-up events from overwhelming downstream services.

Setup

To get started, follow the installation instructions to configure services with Data Streams Monitoring:

  • Java
  • Go
  • .NET

Runtime   Supported technologies
Java      Kafka (self-hosted, Amazon MSK, Confluent Cloud / Platform), RabbitMQ, HTTP, gRPC
Go        All (with manual instrumentation)
.NET      Kafka (self-hosted, Amazon MSK, Confluent Cloud / Platform), RabbitMQ

Explore Data Streams Monitoring

Measure end-to-end pipeline health with new metrics

Once Data Streams Monitoring is configured, you can measure the time it usually takes for events to traverse between any two points in your asynchronous system:

Metric name                       Notable tags                            Description
data_streams.latency              start, end, env                         End-to-end latency of a pathway from a specified source to a destination service
data_streams.kafka.lag_seconds    consumer_group, partition, topic, env   Lag in seconds between producer and consumer. Requires Java Agent v1.9.0 or later.
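To make the two metrics concrete, here is a minimal Python sketch of how such values could be derived. It is not the Datadog tracer's implementation: the `Checkpoint` type and both functions are hypothetical, end-to-end latency is assumed to be the gap between an event's checkpoints at the `start` and `end` points, and lag in seconds is assumed to be the age of the oldest message a consumer group has not yet processed.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    # Hypothetical record of one event passing one point (service or queue) in a pipeline.
    service: str
    timestamp: float  # seconds since the epoch


def end_to_end_latency(checkpoints: list[Checkpoint], start: str, end: str) -> float:
    """Seconds from the event's first checkpoint at `start` to its next checkpoint at `end`."""
    ordered = sorted(checkpoints, key=lambda c: c.timestamp)
    start_ts = next((c.timestamp for c in ordered if c.service == start), None)
    if start_ts is None:
        raise ValueError(f"no checkpoint recorded for {start!r}")
    end_ts = next(
        (c.timestamp for c in ordered if c.service == end and c.timestamp >= start_ts),
        None,
    )
    if end_ts is None:
        raise ValueError(f"no checkpoint recorded for {end!r} after {start!r}")
    return end_ts - start_ts


def lag_seconds(produce_times: dict[int, float], committed_offset: int, now: float) -> float:
    """Age of the oldest message a consumer group has not processed yet.

    `produce_times` maps each partition offset to the time the message was produced.
    A Kafka committed offset is the next offset to read, so every offset at or above
    it is still pending. Returns 0.0 once the consumer has caught up.
    """
    pending = [ts for offset, ts in produce_times.items() if offset >= committed_offset]
    return now - min(pending) if pending else 0.0
```

For example, an event checkpointed at `orders-api` at t=100.0 s and at `ledger` at t=101.5 s has an end-to-end latency of 1.5 s for that pair, which is the kind of source/destination pair the `start` and `end` tags describe.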

You can also graph and visualize these metrics on any dashboard or notebook.

Monitor end-to-end latency of any pathway

Depending on how events traverse your system, different paths can lead to increased latency. With the Pathways tab, you can view latency between any two points throughout your pipelines, including queues, producers, and consumers, to identify bottlenecks and optimize performance. You can also create a monitor for a pathway or export it to a dashboard.
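The Pathways tab is a Datadog UI feature, but the idea behind it can be sketched in a few lines of Python: model the pipeline as a graph whose edges carry latencies, enumerate every path between two points, and rank them so the slow route stands out. The graph model, the function names, and `paths_over_threshold` (a stand-in for a pathway monitor) are hypothetical and only illustrate the concept.

```python
from collections import defaultdict


def path_latencies(edges, start, end):
    """Enumerate every path from `start` to `end` in an acyclic pipeline graph.

    `edges` is an iterable of (source, destination, latency_seconds) tuples, where a
    node can be a producer, a queue or topic, or a consumer. Returns a list of
    (total_latency, path) pairs, slowest path first.
    """
    graph = defaultdict(list)
    for src, dst, latency in edges:
        graph[src].append((dst, latency))

    results = []

    def walk(node, path, total):
        # Depth-first walk; assumes the graph has no cycles, otherwise it never ends.
        if node == end:
            results.append((total, path))
            return
        for nxt, latency in graph[node]:
            walk(nxt, path + [nxt], total + latency)

    walk(start, [start], 0.0)
    return sorted(results, key=lambda r: r[0], reverse=True)


def paths_over_threshold(paths, threshold_seconds):
    """Hypothetical stand-in for a pathway monitor: paths slower than the threshold."""
    return [path for total, path in paths if total > threshold_seconds]
```

With edges `checkout -> orders-topic` (0.05 s), `orders-topic -> billing` (0.30 s), `orders-topic -> fraud` (1.20 s), `billing -> ledger` (0.10 s), and `fraud -> ledger` (0.20 s), the route through `fraud` (about 1.45 s) ranks first, so a 1-second threshold flags only that path.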


Attribute incoming messages to any queue, service, or cluster

High lag on a consuming service, increased resource use on a Kafka broker, and increased RabbitMQ queue size are frequently explained by changes in the way adjacent services produce to or consume from these entities.

Click the Throughput tab on any service or queue in Data Streams Monitoring to detect changes in throughput and identify which upstream or downstream service those changes originate from. Once the Service Catalog is configured, you can immediately pivot to the corresponding team’s Slack channel or on-call engineer.
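Conceptually, attributing a throughput change means comparing message counts per upstream service across two equal-length time windows and ranking the differences. The following Python sketch assumes each received message is already labeled with the name of the service that produced it; the function name and input shape are hypothetical, not part of Datadog's API.

```python
from collections import Counter


def throughput_changes(baseline, current):
    """Compare messages per upstream service across two windows of equal length.

    `baseline` and `current` are iterables of upstream service names, one entry per
    message received by the service or queue being inspected. Returns
    (service, delta) pairs, largest absolute change first, ties broken by name.
    """
    before, after = Counter(baseline), Counter(current)
    services = before.keys() | after.keys()
    # Counter returns 0 for missing keys, so new or vanished producers are handled.
    deltas = [(service, after[service] - before[service]) for service in services]
    return sorted(deltas, key=lambda d: (-abs(d[1]), d[0]))
```

If `search` goes from 50 to 400 messages per window while `cart` drops from 100 to 95 and a new `promo` producer sends 20, the sketch ranks `search` (+350) first, pointing at the producer behind the spike.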

By filtering to a single Kafka or RabbitMQ cluster, you can detect changes in incoming or outgoing traffic for all detected topics or queues running on that cluster.
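A cluster filter amounts to restricting per-topic rates to one cluster and then comparing two windows. The Python sketch below assumes rates are keyed by hypothetical `(cluster, topic)` pairs and uses an arbitrary 50% relative-change cutoff; both choices are illustrative rather than how Data Streams Monitoring decides what counts as a change.

```python
def cluster_topic_changes(before, after, cluster, min_ratio=0.5):
    """Flag topics on one cluster whose message rate changed by more than `min_ratio`.

    `before` and `after` map (cluster, topic) pairs to messages per second for two
    windows. Returns {topic: relative_change}; a topic that only appears in `after`
    is reported with a change of float("inf").
    """
    flagged = {}
    # Keep only topics that belong to the selected cluster in either window.
    topics = {topic for (c, topic) in before.keys() | after.keys() if c == cluster}
    for topic in topics:
        old = before.get((cluster, topic), 0.0)
        new = after.get((cluster, topic), 0.0)
        if old == 0:
            change = float("inf") if new > 0 else 0.0
        else:
            change = (new - old) / old
        if abs(change) > min_ratio:
            flagged[topic] = change
    return flagged
```

Topics that stop receiving traffic show up as a change of -1.0, and newly appearing topics as infinity, so both disappearing and unexpected traffic surface in the same result.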

Quickly pivot to identify root causes in infrastructure, logs, or traces

Datadog automatically links the infrastructure powering your services and related logs through Unified Service Tagging, so you can easily localize bottlenecks. Click the Infra or Logs tabs to further troubleshoot why pathway latency or consumer lag has increased. To view traces within your pathways, click the Processing Latency tab.
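Unified Service Tagging relies on the reserved `env`, `service`, and `version` tags, which is what lets logs and hosts be matched to a service on a pathway. The Python sketch below shows that join in miniature and then counts errors per `version` to check whether one deploy stands out. The `Record` type, the function names, and the use of a `status` tag to mark errors are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical log line or host entry carrying Datadog-style key:value tags.
    message: str
    tags: dict[str, str] = field(default_factory=dict)


def related(records, service, env):
    """Keep records that share the `service` and `env` tags of a pathway node."""
    return [r for r in records
            if r.tags.get("service") == service and r.tags.get("env") == env]


def error_counts_by_version(records, service, env):
    """Count error records per `version` tag to see whether one deploy stands out."""
    counts = Counter()
    for r in related(records, service, env):
        if r.tags.get("status") == "error":  # `status` tag is an assumption of this sketch
            counts[r.tags.get("version", "unknown")] += 1
    return dict(counts)
```

Because the match is purely tag-based, records from the same service in another environment (for example `staging`) are excluded automatically.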

