Monitor clutter builds up over time, resulting in noise, duplicate alerts, and increased operational friction. This guide describes a clear approach to identifying and cleaning up cluttered monitors, with use cases to help you streamline your alerting workflows.
It also provides best practices for keeping your monitoring environment clean, making it easier to scale and govern your monitoring strategy as your systems grow.
This guide covers several key use cases for cleaning up monitor clutter:
Monitors serve as an early warning system for failures, security threats, and performance issues. Leaving monitors muted for a long period defeats that purpose; prolonged muting often indicates that a monitor is obsolete, irrelevant, or too noisy to be useful. These monitors should be reviewed and either re-enabled with proper tuning or retired, which reduces clutter and removes stale monitors from your alerting environment.
Clean up monitors that are not providing value and replace long-term mutes with time-bound schedules:
Audit monitors that have been muted for a long period to understand which ones are actually necessary or useful. Some monitors may be muted for a good reason, and you want to avoid deleting them.
After you get your list, you can take action on each monitor from the [Monitor Quality] page or bulk delete monitors with steps 2 and 3.
Get a list of your monitor IDs to automate changes. Start with monitors that have been muted for more than 60 days.
This gives you the details of your monitors in a CSV file for readability. You can refine the query for your specific use case.
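As a rough illustration of this step, the Python sketch below pages through the monitor search endpoint (`GET /api/v1/monitor/search`) and renders the results as CSV with the ID column first. The `API_BASE` value, the `DD_API_KEY` and `DD_APP_KEY` environment variable names, and the chosen CSV columns are assumptions made for this example; adapt them to your Datadog site and needs.

```python
import csv
import io
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.datadoghq.com"  # assumption: US1 site; change for your Datadog site


def search_monitors(query: str, per_page: int = 100) -> list[dict]:
    """Page through GET /api/v1/monitor/search and return all matching monitors."""
    headers = {
        "DD-API-KEY": os.environ["DD_API_KEY"],          # assumed variable name
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],  # assumed variable name
    }
    monitors, page = [], 0
    while True:
        params = urllib.parse.urlencode(
            {"query": query, "page": page, "per_page": per_page}
        )
        req = urllib.request.Request(
            f"{API_BASE}/api/v1/monitor/search?{params}", headers=headers
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp).get("monitors", [])
        monitors.extend(batch)
        if len(batch) < per_page:  # a short page means there are no more results
            return monitors
        page += 1


def monitors_to_csv(monitors: list[dict]) -> str:
    """Render monitors as CSV text, keeping the monitor ID in the first column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "status", "tags"])
    for m in monitors:
        writer.writerow(
            [m["id"], m.get("name", ""), m.get("status", ""), ";".join(m.get("tags", []))]
        )
    return out.getvalue()
```

You would call `search_monitors` with the search filter you settled on, for instance a filter such as `muted_elapsed:60d` (an assumption here; confirm the exact filter on the Monitor Quality page), and write the output of `monitors_to_csv` to a file.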
With your list of monitors that have been muted for more than 60 days (from Step 2), you can delete them with the following script. Before running the script, make the monitor ID column the first column in the table.
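One possible shape for such a script, sketched in Python: it reads the monitor IDs from the first CSV column and calls `DELETE /api/v1/monitor/{monitor_id}` for each one. The helper names, the `dry_run` safeguard, and the environment variable names are hypothetical choices for this sketch, not part of the original guide.

```python
import csv
import io
import os
import urllib.request

API_BASE = "https://api.datadoghq.com"  # assumption: US1 site; change for your Datadog site


def read_monitor_ids(csv_text: str) -> list[int]:
    """Return the numeric IDs found in the first column, skipping the header and blank rows."""
    ids = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip().isdigit():
            ids.append(int(row[0]))
    return ids


def delete_monitors(ids: list[int], dry_run: bool = True) -> None:
    """Delete each monitor by ID; with dry_run=True, only print what would be deleted."""
    for monitor_id in ids:
        if dry_run:
            print(f"would delete monitor {monitor_id}")
            continue
        req = urllib.request.Request(
            f"{API_BASE}/api/v1/monitor/{monitor_id}",
            headers={
                "DD-API-KEY": os.environ["DD_API_KEY"],          # assumed variable name
                "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],  # assumed variable name
            },
            method="DELETE",
        )
        with urllib.request.urlopen(req) as resp:
            print(f"deleted monitor {monitor_id}: HTTP {resp.status}")
```

Running with `dry_run=True` first lets you review the IDs against your audit before anything is removed, since deletion cannot be undone from the script.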
Persistent alerts suggest one of two problems: either the issue is not actionable, or the monitor threshold is misconfigured. Both cases erode trust in alerts and contribute to alert fatigue. These monitors should be reviewed and edited, or removed.
Here is how to get the list of monitors that have been in the ALERT state for more than 60 days:
Creating separate monitors that differ only by a tag can lead to unnecessary duplication. For example, monitoring CPU usage with one monitor for prod and another for staging increases your monitor count.
Redundant monitors create unnecessary noise and confusion. In many cases, they can be consolidated into a single [multi alert] monitor with proper scoping and tagging, reducing duplication and making alerts more manageable.
If you need to send different notifications depending on the value of the tag that triggered the alert, use monitor variables to dynamically customize the message based on the tag that crossed the threshold.
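To make the consolidation concrete, here is a minimal sketch of a single multi alert definition that replaces separate prod and staging CPU monitors. The query groups by `env`, and the message uses `is_match` conditional variables to route by tag value. The metric, threshold, monitor name, and the `@pagerduty-prod-oncall` and `@slack-staging-alerts` handles are hypothetical placeholders.

```python
def build_cpu_multi_alert() -> dict:
    """Build one monitor definition that alerts per `env` value instead of one monitor per env."""
    message = (
        "CPU usage is high on {{env.name}}.\n"
        # Route prod alerts to a paging handle (hypothetical handle name).
        '{{#is_match "env.name" "prod"}}@pagerduty-prod-oncall{{/is_match}}\n'
        # Route staging alerts to a chat channel (hypothetical handle name).
        '{{#is_match "env.name" "staging"}}@slack-staging-alerts{{/is_match}}'
    )
    return {
        "name": "High CPU usage by environment",
        "type": "query alert",
        # `by {env}` makes this a multi alert: each env value is evaluated separately.
        "query": "avg(last_5m):avg:system.cpu.user{*} by {env} > 80",
        "message": message,
        "tags": ["team:example"],  # hypothetical tag
        "options": {"thresholds": {"critical": 80}},
    }
```

A dictionary like this could be sent as the JSON body when creating a monitor through the API, or used as a reference when editing the monitor in the UI.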
Noisy monitors desensitize teams to real issues. Flapping (when a monitor frequently switches between alert and recovery states) often indicates unstable thresholds, missing evaluation delays, or underlying system volatility.
To reduce noise, review the monitor's evaluation aggregation and threshold settings. Adjust the configuration to stabilize alerting behavior, or delete the monitor if it no longer provides value.
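One threshold setting that can dampen flapping is a recovery threshold (the `critical_recovery` value in a monitor's `thresholds` option), which keeps the monitor in alert until the value falls clearly below the trigger point. The sketch below is a simplified model of that behavior, not Datadog's evaluation logic; the function name and sample values are made up for illustration.

```python
from typing import Optional


def count_alerts(values: list[float], critical: float, recovery: Optional[float] = None) -> int:
    """Count OK -> ALERT transitions for a "greater than" threshold.

    Without a recovery threshold, the monitor recovers as soon as the value is
    no longer above `critical`. With one, it recovers only when the value drops
    to `recovery` or below, so values hovering near `critical` do not re-alert.
    """
    if recovery is None:
        recovery = critical
    alerting = False
    alerts = 0
    for value in values:
        if not alerting and value > critical:
            alerting = True
            alerts += 1
        elif alerting and value <= recovery:
            alerting = False
    return alerts


# A series that hovers around a critical threshold of 90.
hovering = [91, 89, 92, 88, 93, 79, 95]
without_recovery = count_alerts(hovering, critical=90)
with_recovery = count_alerts(hovering, critical=90, recovery=80)
print(without_recovery, with_recovery)
```

Comparing the two counts on your own metric history gives a quick sense of whether a recovery threshold, a longer evaluation window, or a different aggregation is the better fix.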
Here is how to get a list of monitors that are generating a high volume of alerts:
Misconfigured monitors are active monitors that may serve a valid purpose but are ineffective because you do not receive their notifications. These misconfigurations undermine monitor reliability and make debugging or triage harder. Fixing them ensures your alerts are accurate, actionable, and integrated into your observability workflows.
Here is how to get the list of monitors that have misconfigured handles:
This issue primarily affects monitors based on AWS metrics. Because Datadog retrieves AWS metrics through the API, there is often a built-in delay before the data is available. If you do not account for this delay, monitors can trigger false positives on incomplete or late data.
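A common way to account for this is the `evaluation_delay` monitor option, which shifts evaluation back by a number of seconds so late data has time to arrive. The sketch below adds the option to a monitor definition that lacks it. The 900-second default is an assumption for this example, as are the helper name and the sample monitor; choose a delay that matches how late your AWS metrics actually arrive.

```python
import copy


def with_evaluation_delay(monitor: dict, delay_seconds: int = 900) -> dict:
    """Return a copy of the monitor with options.evaluation_delay set if it is missing.

    An existing evaluation_delay is left untouched.
    """
    updated = copy.deepcopy(monitor)
    options = updated.setdefault("options", {})
    options.setdefault("evaluation_delay", delay_seconds)
    return updated


# Hypothetical AWS-based monitor with no delay configured.
aws_monitor = {
    "name": "EC2 CPU utilization high",
    "type": "query alert",
    "query": "avg(last_15m):avg:aws.ec2.cpuutilization{*} > 90",
    "options": {"thresholds": {"critical": 90}},
}
patched = with_evaluation_delay(aws_monitor)
```

The function returns a copy rather than mutating its input, so you can diff the original and patched definitions before sending an update.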
Composite monitors evaluate their state based on the logical combination of two or more monitors (called constituents). If any of those constituent monitors is deleted or becomes unavailable, the composite monitor becomes invalid or unreliable.
A missing constituent usually means that at least one of the original input monitors was deleted after the composite monitor was created. This leaves the composite monitor incomplete and makes its alerting behavior potentially misleading.
To programmatically get the list of monitors that are missing constituents:
To help you get started, import the following JSON dashboard definition directly into your Datadog account.
{
"title": "Monitor Quality OOTB Dashboard",
"description": "",
"widgets": [
{
"id": 8853380235542346,
"definition": {
"type": "note",
"content": "This Monitor Quality dashboard provides a comprehensive view of monitor quality metrics, broken down by `team` and `service`. Its goal is to help you easily analyze and act on monitor quality data, enabling you to schedule reports, download insights as PDFs, and more.\n\n**Key Features:**\n- Team and Service Views: You can filter the dashboard either by team or by service, but not both simultaneously. If you filter by `team`, refer to the [Team Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107838741&to_ts=1732280638741&live=false&tile_focus=4548404374449802) for relevant insights. If you filter by `service`, explore the [Service Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107865224&to_ts=1732280665224&live=false&tile_focus=2841959907422822) for detailed information.\n- Monitor-Level Details: For a deeper dive into specific impacted monitors, navigate to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality).\n- Seamless Navigation: Use the context links provided in the dashboard to jump directly to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality), pre-filtered with the same criteria you've applied on the dashboard.\n\nThis dashboard is designed to give you both a high-level overview and actionable paths to improve your monitoring posture.",
"background_color": "white",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 12, "height": 3 }
},
{
"id": 4548404374449802,
"definition": {
"title": "General overview - by team",
"background_color": "blue",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 2449119265341574,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_team` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by team.\n\nUse the `team` filter to view insights specific to your team.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 3001209940385798,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 498569597362654,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 1376609088194674,
"definition": {
"title": "Top Teams Impacted",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 4 }
},
{
"id": 718136447073638,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Team",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 12, "width": 6, "height": 1 }
},
{
"id": 2393792996475864,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Team",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 12, "width": 6, "height": 1 }
},
{
"id": 4443082314028290,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 2 }
},
{
"id": 3954366540293996,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 2 }
},
{
"id": 2546970864549118,
"definition": {
"title": "Monitors with Missing Recipients per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:missing_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 15, "width": 6, "height": 5 }
},
{
"id": 3744392131942638,
"definition": {
"title": "Monitors with Broken Handles per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:broken_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 15, "width": 6, "height": 5 }
},
{
"id": 2751217590574740,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 20, "width": 6, "height": 1 }
},
{
"id": 5158165900159898,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 20, "width": 6, "height": 1 }
},
{
"id": 8032070484951580,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4153429942317530,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4158897740932848,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 23, "width": 6, "height": 5 }
},
{
"id": 5392245250417816,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:noisy_monitor,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": { "display": { "type": "stacked" }, "palette": "grey" }
},
"layout": { "x": 6, "y": 23, "width": 6, "height": 5 }
},
{
"id": 1271026446632020,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 28, "width": 6, "height": 1 }
},
{
"id": 6315895116466318,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 28, "width": 6, "height": 1 }
},
{
"id": 8251226565664096,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 2 }
},
{
"id": 1329067816249636,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 2 }
},
{
"id": 7052384595427880,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:alerted_too_long,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 31, "width": 6, "height": 5 }
},
{
"id": 2768363536962548,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:composite_has_deleted_constituents ,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 31, "width": 6, "height": 5 }
}
]
},
"layout": { "x": 0, "y": 3, "width": 12, "height": 37 }
},
{
"id": 2841959907422822,
"definition": {
"title": "General overview - by service",
"background_color": "pink",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 3801590205295194,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_service` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by service.\n\nUse the `service` filter to view insights specific to your team.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 8418200284207718,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 8281740697966220,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team, $service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 5048429332292860,
"definition": {
"title": "Top services impacted",
"title_size": "16",
"title_align": "left",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 5 }
},
{
"id": 2233801928907094,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Service",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7329031300309162,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Service",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7627510169738418,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 14, "width": 6, "height": 2 }
},
{
"id": 2826082028591748,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 14, "width": 6, "height": 2 }
},
{
"id": 5050954942402816,
"definition": {
"title": "Monitors with Missing Recipients per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:missing_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 16, "width": 6, "height": 5 }
},
{
"id": 7809748805807956,
"definition": {
"title": "Monitors with Broken Handles per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:broken_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 16, "width": 6, "height": 5 }
},
{
"id": 8416588682594596,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 1 }
},
{
"id": 4951606729784970,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 1 }
},
{
"id": 1778359756038190,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 22, "width": 6, "height": 2 }
},
{
"id": 8559060613933804,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 22, "width": 6, "height": 2 }
},
{
"id": 7041249940897320,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 24, "width": 6, "height": 5 }
},
{
"id": 7810615049061724,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:noisy_monitor,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "grey"
}
},
"layout": { "x": 6, "y": 24, "width": 6, "height": 5 }
},
{
"id": 5108940190121326,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 1 }
},
{
"id": 4931941666409286,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 1 }
},
{
"id": 6520923360190496,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 30, "width": 6, "height": 2 }
},
{
"id": 1364025765104008,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 30, "width": 6, "height": 2 }
},
{
"id": 3670188762233230,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:alerted_too_long,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 32, "width": 6, "height": 5 }
},
{
"id": 9006201303765196,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:alerted_too_long,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 32, "width": 6, "height": 5 }
}
]
},
"layout": {
"x": 0,
"y": 40,
"width": 12,
"height": 38,
"is_column_break": true
}
}
],
"template_variables": [
{
"name": "team",
"prefix": "team",
"available_values": [],
"default": "*"
},
{
"name": "service",
"prefix": "service",
"available_values": [],
"default": "*"
}
],
"layout_type": "ordered",
"notify_list": [],
"reflow_type": "fixed"
}
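Before importing a definition of this size, it can help to review the metric queries it contains. The Python sketch below walks the widget tree, including widgets nested inside groups, and yields each widget title with its query. The function name and the `dashboard.json` filename in the usage note are hypothetical.

```python
from typing import Iterator


def iter_queries(widgets: list[dict]) -> Iterator[tuple[str, str]]:
    """Yield (widget title, query string) pairs, descending into group widgets."""
    for widget in widgets:
        definition = widget.get("definition", {})
        for request in definition.get("requests", []):
            for query in request.get("queries", []):
                yield definition.get("title", ""), query["query"]
        # Group widgets hold their children under definition["widgets"].
        yield from iter_queries(definition.get("widgets", []))
```

For example, after saving the definition as `dashboard.json`, you could load it with `json.load` and print each pair from `iter_queries(dashboard["widgets"])` to check that every widget title matches the `suggestion_type` in its query.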