Error Tracking for Backend Services

Overview

The details of an issue in the Error Tracking Explorer

Consistently monitoring the errors collected by Datadog is critical to your system’s health. When there are many individual error events, however, it becomes hard to decide which errors to troubleshoot first.

Error Tracking simplifies debugging by grouping thousands of similar errors into a single issue. An issue is an aggregation of error data that provides insights such as:

  • How many users have been impacted
  • When the error first occurred
  • Which commit probably caused the error

Error Tracking enables you to:

  • Track, triage, and debug fatal errors
  • Group similar errors into issues, so that you can more easily identify important errors and reduce noise
  • Set monitors on error tracking events, such as high error volume or new issues
  • Follow issues over time to know when they first started, whether they are still ongoing, and how often they occur

Setup

Error Tracking is available for all languages supported by APM. It requires no additional SDK and no configuration changes.

Optionally, to see code snippets in your stack traces, set up the GitHub integration.

An inline code snippet in a stack trace

To get started with configuring your repository, see the Source Code Integration documentation.

Use span attributes to track error spans

The Datadog tracers collect errors through integrations and the manual instrumentation of your backend services’ source code. An error span must contain the error.stack, error.message, and error.type span attributes and belong to a complete trace to be tracked. If an error is reported multiple times within a service, only the top-most error is kept.
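
As a rough illustration of the three required attributes, the sketch below derives them from a Python exception using only the standard library. The helper names `error_span_attributes` and `is_trackable` are hypothetical and are not part of any Datadog tracer. In practice, the tracer's integrations normally set these attributes for you. The check covers only the attributes, not the requirement that the span belong to a complete trace.

```python
import traceback

# The three span attributes named above.
REQUIRED_ATTRIBUTES = ("error.type", "error.message", "error.stack")


def error_span_attributes(exc: BaseException) -> dict:
    """Build the three error attributes from an exception (hypothetical helper)."""
    exc_type = type(exc)
    return {
        "error.type": f"{exc_type.__module__}.{exc_type.__qualname__}",
        "error.message": str(exc),
        "error.stack": "".join(
            traceback.format_exception(exc_type, exc, exc.__traceback__)
        ),
    }


def is_trackable(attributes: dict) -> bool:
    """Return True only if every required attribute is present and non-empty."""
    return all(attributes.get(name) for name in REQUIRED_ATTRIBUTES)


try:
    int("not a number")
except ValueError as exc:
    attrs = error_span_attributes(exc)
```

A span that carries only some of these attributes, for example a type and message but no stack, would fail the `is_trackable` check in this sketch.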

Flame graph with errors

Error Tracking computes a fingerprint for each error span it processes using the error type, the error message, and the frames that form the stack trace. Errors with the same fingerprint are grouped together and belong to the same issue. For more information, see the Trace Explorer documentation.
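
The grouping idea can be sketched as follows. This is an illustration only, not Datadog's actual fingerprinting algorithm, which is not specified here and may normalize messages or frames before hashing. The function names `fingerprint` and `group_into_issues`, the dictionary keys, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def fingerprint(error_type: str, message: str, frames: list[str]) -> str:
    """Hash the error type, message, and stack frames into one key (illustrative)."""
    # Join the parts with a separator that is unlikely to appear in the fields.
    payload = "\x1f".join([error_type, message, *frames])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def group_into_issues(errors: list[dict]) -> dict[str, list[dict]]:
    """Group errors that share a fingerprint under the same issue key."""
    issues: dict[str, list[dict]] = {}
    for error in errors:
        key = fingerprint(error["type"], error["message"], error["frames"])
        issues.setdefault(key, []).append(error)
    return issues


# Hypothetical sample data: two identical errors and one different error.
errors = [
    {"type": "ValueError", "message": "bad input",
     "frames": ["app.py:handle", "app.py:parse"]},
    {"type": "ValueError", "message": "bad input",
     "frames": ["app.py:handle", "app.py:parse"]},
    {"type": "KeyError", "message": "'user_id'",
     "frames": ["app.py:handle", "db.py:lookup"]},
]
issues = group_into_issues(errors)
```

In this sample, the two `ValueError` events share a fingerprint and fall into one issue, while the `KeyError` forms a second issue.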

Control which errors are tracked

Error Tracking automatically processes all error spans, but you can control which errors are ingested and how they are managed:

  • Filter errors with inclusion and exclusion rules: Define rules to include or exclude errors based on attributes such as service, environment, or error type. See Manage Data Collection.
  • Set rate limits: Control the volume of errors ingested per day to manage costs. See Manage Data Collection.
  • Exclude specific issues: Mark recurring non-actionable issues as EXCLUDED to stop collecting them. See Issue States.
  • Filter entire traces: Prevent traces from being sent to Datadog (rather than filtering errors). See Ignoring Unwanted Resources in APM.
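
To make the idea of inclusion and exclusion rules concrete, here is a minimal sketch of how such rules could be evaluated against an error's attributes. Everything in it is hypothetical: the `Rule` class, the `should_track` function, exact-match conditions, and the assumption that an exclusion always wins over an inclusion. The rules themselves are configured in Datadog, and the evaluation order there may differ, so see Manage Data Collection for the authoritative behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A hypothetical rule: every listed attribute must match exactly."""
    conditions: dict = field(default_factory=dict)

    def matches(self, error: dict) -> bool:
        return all(error.get(key) == value for key, value in self.conditions.items())


def should_track(error: dict, include: list, exclude: list) -> bool:
    """Assumed semantics: any matching exclusion drops the error;
    otherwise the error is kept only if some inclusion rule matches."""
    if any(rule.matches(error) for rule in exclude):
        return False
    return any(rule.matches(error) for rule in include)


# Hypothetical rules: keep production errors, but drop timeouts.
include_rules = [Rule({"env": "prod"})]
exclude_rules = [Rule({"error.type": "TimeoutError"})]

kept = should_track(
    {"service": "checkout", "env": "prod", "error.type": "ValueError"},
    include_rules, exclude_rules,
)
dropped = should_track(
    {"service": "checkout", "env": "prod", "error.type": "TimeoutError"},
    include_rules, exclude_rules,
)
```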

Examine issues to start troubleshooting or debugging

Error Tracking automatically groups the errors collected from your backend services into issues, which you can view in the Error Tracking Explorer. See the Error Tracking Explorer documentation for a tour of key features.

Issues created from APM include the distribution of impacted spans, the most recent relevant stack trace, span attributes, host tags, container tags, and metrics.
