Watchdog is an algorithmic feature for APM that automatically detects potential application and infrastructure issues. It observes trends and patterns in application metrics, such as error rate, request rate, and latency, and looks for unexpected behavior. Watchdog evaluates all services and resources, so you do not need to configure a monitor for each service.
Watchdog looks for irregularities in metrics, like a sudden spike in hit rate. For each irregularity, the Watchdog page displays a Watchdog story. Each story includes a graph of the detected metric irregularity and gives more information about the relevant timeframe and endpoint or endpoints. To avoid false alarms, Watchdog only reports issues after observing your data for a sufficient amount of time to establish a high degree of confidence.
Stories can be filtered by environment and availability zone, as well as by the type of service or resource. Typing in the “Filter stories” search box also lets you filter stories by service or resource name.
Clicking a story shows further details about requests, errors, and latency at the time of the detected irregularity.
Selecting “Show expected bounds” in the corner reveals the upper and lower thresholds of expected behavior on the graph.
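To make the idea of expected bounds concrete, here is a minimal sketch in Python. It is not Watchdog's actual algorithm, which this page does not describe. It assumes a simple rule: the bounds are the mean of past values plus or minus `k` standard deviations, and nothing is flagged until at least `min_points` observations exist. The function names, `k`, `min_points`, and the sample hit-rate values are all hypothetical.

```python
from statistics import mean, stdev


def expected_bounds(history, k=3.0, min_points=10):
    """Return (lower, upper) bounds from past values.

    Returns None when there is too little history to judge,
    mirroring the idea of waiting for enough data before reporting.
    Assumption: bounds are mean +/- k sample standard deviations.
    """
    if len(history) < min_points:
        return None
    center = mean(history)
    spread = stdev(history)
    return (center - k * spread, center + k * spread)


def is_irregular(value, history, k=3.0, min_points=10):
    """Flag a value only if it falls outside the expected bounds."""
    bounds = expected_bounds(history, k, min_points)
    if bounds is None:
        return False  # not enough data to be confident
    lower, upper = bounds
    return value < lower or value > upper


# Hypothetical hit-rate samples (requests per second).
hit_rate = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]

print(is_irregular(250, hit_rate))      # a sudden spike
print(is_irregular(101, hit_rate))      # a typical value
print(is_irregular(250, hit_rate[:5]))  # too little history
```

A real detector would also need to account for seasonality and trend, which this fixed-window sketch ignores.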
Facets are listed in the left panel. Use them to filter Watchdog stories by category (for example, availability zone) and to see the number of stories in each facet category.
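The facet behavior can be pictured with a small Python sketch: count the stories per facet value, then narrow the list by one or more facet values. The `Story` class, its fields, and the sample data are hypothetical and exist only for illustration. They are not part of any Datadog API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Story:
    # Hypothetical fields standing in for the facets in the left panel.
    service: str
    env: str
    availability_zone: str


def facet_counts(stories, facet):
    """Count how many stories fall under each value of one facet."""
    return Counter(getattr(story, facet) for story in stories)


def filter_stories(stories, **facets):
    """Keep only the stories that match every given facet value."""
    return [
        story
        for story in stories
        if all(getattr(story, name) == value for name, value in facets.items())
    ]


stories = [
    Story("web-store", "prod", "us-east-1a"),
    Story("web-store", "staging", "us-east-1b"),
    Story("checkout", "prod", "us-east-1a"),
]

print(facet_counts(stories, "env"))
print(filter_stories(stories, env="prod", service="checkout"))
```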
When an irregularity in a metric is detected, the yellow Watchdog binoculars icon appears next to the affected service in the APM Services List. The number next to the binoculars indicates the number of issues Watchdog has noticed within that service.
If Watchdog has discovered something out of the ordinary in a specific service, viewing the corresponding Service page reveals a dedicated Watchdog section in the middle of the page, between the application performance graphs and the latency distribution section. The Watchdog section displays any relevant Watchdog Stories.