Monitor clutter accumulates over time, leading to noise, duplicate alerts, and increased operational friction. This guide describes a clear approach to identifying and cleaning up cluttered monitors, with use cases to help you streamline your alerting workflows.
It also provides best practices for keeping your monitoring environment clean, which makes it easier to scale and govern your monitoring strategy as your systems grow.
This guide covers several key use cases for cleaning up monitor clutter: monitors muted for a long time, monitors stuck in the ALERT state, redundant monitors, noisy and flapping monitors, misconfigured monitors, monitors missing an evaluation delay, and composite monitors with missing constituents.
Monitors serve as an early warning system for failures, security threats, and performance problems. However, leaving monitors muted for a long period defeats that purpose. Prolonged muting often indicates that a monitor is obsolete, irrelevant, or too noisy to be useful. Such monitors should be reviewed and then either re-enabled with proper tuning or retired, to reduce clutter and remove stale monitors from your alerting environment.
Remove monitors that provide no value, and replace prolonged mutes with temporary schedules:
Audit monitors that have been muted for a long period to determine which ones are actually necessary or useful. Some monitors may be muted for a good reason, and you may want to avoid deleting them.
Once you have your list, you can take action on each monitor from the Monitor Quality page, or bulk delete monitors using Steps 2 and 3.
Get a list of your monitor IDs to automate the changes programmatically. Start with monitors that have been muted for more than 60 days.
This gives you the details of your monitors in a CSV file for easier reading. You can refine the query for your specific use case.
With your list of monitors that have been muted for more than 60 days (from Step 2), you can delete them with the following script. Before running the script, make the monitor ID column the first column in the table.
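A minimal sketch of such a bulk deletion, assuming the CSV from Step 2 has the monitor ID in its first column and that your account is on the default `api.datadoghq.com` site. The function names and the `dry_run` flag are illustrative and not part of any Datadog tooling. The sketch relies only on the Monitors API delete endpoint (`DELETE /api/v1/monitor/{monitor_id}`) and the standard `DD-API-KEY` and `DD-APPLICATION-KEY` headers.

```python
import csv
import io
import urllib.request

# Assumption: default Datadog site; change this for other regions.
API_BASE = "https://api.datadoghq.com"


def read_monitor_ids(csv_text):
    """Return monitor IDs from the first column, skipping headers and blank rows."""
    ids = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip().isdigit():
            ids.append(int(row[0].strip()))
    return ids


def build_delete_request(monitor_id, api_key, app_key):
    """Build (but do not send) a DELETE request for a single monitor."""
    return urllib.request.Request(
        f"{API_BASE}/api/v1/monitor/{monitor_id}",
        method="DELETE",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
    )


def delete_monitors(csv_text, api_key, app_key, dry_run=True):
    """Delete every monitor listed in the CSV. With dry_run, only list the URLs."""
    planned = []
    for monitor_id in read_monitor_ids(csv_text):
        request = build_delete_request(monitor_id, api_key, app_key)
        planned.append(request.full_url)
        if not dry_run:
            with urllib.request.urlopen(request) as response:
                response.read()
    return planned
```

Running with `dry_run=True` first lets you review exactly which monitors would be removed before anything is deleted.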
Persistent alerts suggest one of two problems: either the issue is not actionable, or the monitor threshold is misconfigured. Both cases erode trust in alerts and contribute to alert fatigue. These monitors should be reviewed and then edited or deleted.
Here is how to get the list of monitors that have been in the ALERT state for more than 60 days:
Creating separate monitors that differ only by a tag can lead to unnecessary duplication. For example, monitoring CPU usage with one monitor for prod and another for staging increases your monitor count.
Redundant monitors create unnecessary noise and confusion. In many cases, they can be consolidated into a single multi alert monitor with proper scoping and tagging, which reduces duplication and makes alerts more manageable.
If you need to send different notifications depending on the value of the tag that triggered the alert, use monitor variables to dynamically customize the message based on the tag that breached the threshold.
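As a rough illustration, the prod and staging CPU monitors from the example could be merged into one multi alert monitor grouped by `env`, with conditional variables routing each environment to its own recipient. Everything specific here is an assumption made for the sketch: the metric, the threshold of 80, the boolean `OR` scope, the tag, and the two notification handles.

```python
import json

# Hypothetical consolidated monitor: a single multi alert monitor grouped by
# `env` stands in for separate prod and staging monitors.
CONSOLIDATED_MONITOR = {
    "name": "High CPU usage on {{env.name}}",
    "type": "query alert",
    # Assumed metric, scope, and threshold; adjust to your own monitors.
    "query": "avg(last_5m):avg:system.cpu.user{env:prod OR env:staging} by {env} > 80",
    "message": (
        "CPU usage is above the threshold on {{env.name}}.\n"
        # Hypothetical handles: each block renders only for the matching env.
        '{{#is_match "env.name" "prod"}}@pagerduty-prod-oncall{{/is_match}}\n'
        '{{#is_match "env.name" "staging"}}@slack-staging-alerts{{/is_match}}'
    ),
    "tags": ["team:platform"],
}


def monitor_payload():
    """Serialize the monitor definition as a JSON request body."""
    return json.dumps(CONSOLIDATED_MONITOR, indent=2)
```

Grouping by `env` means each environment is evaluated and notified separately, while the definition is maintained in one place.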
Noisy monitors desensitize teams to real problems. Flapping (when a monitor frequently switches between alert and recovery states) usually indicates unstable thresholds, missing evaluation delays, or underlying system volatility.
To reduce noise, review the monitor's evaluation aggregation and threshold settings. Adjust the parameters to stabilize alerting behavior, or delete the monitor if it no longer provides value.
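To make the idea of flapping concrete, the sketch below counts state changes in a monitor's recent history. It is only an illustration: the state labels and the `max_transitions` cutoff of 4 are assumptions, and this is not how Datadog itself classifies noisy monitors.

```python
def count_transitions(states):
    """Count how many times consecutive states differ."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def is_flapping(states, max_transitions=4):
    """Flag a history that changes state more than max_transitions times."""
    return count_transitions(states) > max_transitions
```

A history such as `["OK", "Alert", "OK", "Alert", "OK", "Alert"]` changes state on every evaluation, while `["OK", "OK", "Alert", "Alert"]` changes only once. A wider evaluation window or a recovery threshold would typically reduce the first pattern.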
Here is how to get a list of monitors that are generating a high volume of alerts:
Misconfigured monitors are active monitors that may serve a valid purpose but are ineffective because you are not notified when they trigger. These misconfigurations undermine monitor reliability and make debugging or triage harder. Cleaning them up ensures that alerts are accurate, actionable, and integrated into your observability workflows.
Here is how to get the list of monitors that have misconfigured handles:
This issue mainly affects monitors based on AWS metrics. Because Datadog retrieves AWS metrics through the API, there is often a delay before the data is available. If you do not account for this, monitors can trigger false positives because of incomplete or delayed data.
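The check can be sketched locally against monitor definitions you have already exported. The example assumes each definition is a dictionary with `query` and `options` keys, treats any query that references an `aws.` metric as affected, and uses a 900-second delay as an illustrative default. The helper names are hypothetical; `evaluation_delay` is the monitor option, expressed in seconds.

```python
# Assumption: a 900-second (15-minute) delay as an illustrative default.
DEFAULT_DELAY_SECONDS = 900


def needs_delay(monitor):
    """True when the query uses an AWS metric and no evaluation delay is set."""
    uses_aws_metric = ":aws." in monitor.get("query", "")
    delay = (monitor.get("options") or {}).get("evaluation_delay")
    return uses_aws_metric and not delay


def with_delay(monitor, delay_seconds=DEFAULT_DELAY_SECONDS):
    """Return a copy of the monitor definition with evaluation_delay set."""
    options = dict(monitor.get("options") or {}, evaluation_delay=delay_seconds)
    return dict(monitor, options=options)
```

`with_delay` returns a new dictionary rather than mutating its input, so the original definition stays available for comparison before you push any update.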
Here is how to get the list of monitors that are missing a delay:
Composite monitors evaluate their state based on the logical combination of two or more monitors (called constituents). If any of those constituent monitors is deleted or becomes unavailable, the composite monitor becomes invalid or unreliable.
A missing constituent usually means that at least one of the original input monitors was deleted after the composite monitor was created. This leaves the composite incomplete and can lead to misleading alerting behavior.
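A small sketch of how a missing constituent could be detected offline, assuming the composite query references its constituents by numeric monitor ID (for example `123 && 456`) and that you already have the set of monitor IDs that still exist. Both helper names are hypothetical.

```python
import re


def constituent_ids(composite_query):
    """Extract the monitor IDs referenced in a composite query."""
    return [int(token) for token in re.findall(r"\d+", composite_query)]


def missing_constituents(composite_query, existing_ids):
    """Return the referenced monitor IDs that no longer exist."""
    existing = set(existing_ids)
    return [mid for mid in constituent_ids(composite_query) if mid not in existing]
```

For the query `123 && (456 || !789)` and existing monitors 123 and 789, the missing constituent is 456.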
To programmatically get the list of monitors with missing constituents:
To help you get started, import the following JSON dashboard definition directly into your Datadog account.
{
"title": "Monitor Quality OOTB Dashboard",
"description": "",
"widgets": [
{
"id": 8853380235542346,
"definition": {
"type": "note",
"content": "This Monitor Quality dashboard provides a comprehensive view of monitor quality metrics, broken down by `team` and `service`. Its goal is to help you easily analyze and act on monitor quality data, enabling you to schedule reports, download insights as PDFs, and more.\n\n**Key Features:**\n- Team and Service Views: You can filter the dashboard either by team or by service, but not both simultaneously. If you filter by `team`, refer to the [Team Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107838741&to_ts=1732280638741&live=false&tile_focus=4548404374449802) for relevant insights. If you filter by `service`, explore the [Service Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107865224&to_ts=1732280665224&live=false&tile_focus=2841959907422822) for detailed information.\n- Monitor-Level Details: For a deeper dive into specific impacted monitors, navigate to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality).\n- Seamless Navigation: Use the context links provided in the dashboard to jump directly to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality), pre-filtered with the same criteria you've applied on the dashboard.\n\nThis dashboard is designed to give you both a high-level overview and actionable paths to improve your monitoring posture.",
"background_color": "white",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 12, "height": 3 }
},
{
"id": 4548404374449802,
"definition": {
"title": "General overview - by team",
"background_color": "blue",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 2449119265341574,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_team` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by team.\n\nUse the `team` filter to view insights specific to your team.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 3001209940385798,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 498569597362654,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 1376609088194674,
"definition": {
"title": "Top Teams Impacted",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 4 }
},
{
"id": 718136447073638,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Team",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 12, "width": 6, "height": 1 }
},
{
"id": 2393792996475864,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Team",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 12, "width": 6, "height": 1 }
},
{
"id": 4443082314028290,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 2 }
},
{
"id": 3954366540293996,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 2 }
},
{
"id": 2546970864549118,
"definition": {
"title": "Monitors with Missing Recipients per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:missing_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 15, "width": 6, "height": 5 }
},
{
"id": 3744392131942638,
"definition": {
"title": "Monitors with Broken Handles per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:broken_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 15, "width": 6, "height": 5 }
},
{
"id": 2751217590574740,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 20, "width": 6, "height": 1 }
},
{
"id": 5158165900159898,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 20, "width": 6, "height": 1 }
},
{
"id": 8032070484951580,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4153429942317530,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4158897740932848,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 23, "width": 6, "height": 5 }
},
{
"id": 5392245250417816,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:noisy_monitor,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": { "display": { "type": "stacked" }, "palette": "grey" }
},
"layout": { "x": 6, "y": 23, "width": 6, "height": 5 }
},
{
"id": 1271026446632020,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 28, "width": 6, "height": 1 }
},
{
"id": 6315895116466318,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 28, "width": 6, "height": 1 }
},
{
"id": 8251226565664096,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 2 }
},
{
"id": 1329067816249636,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 2 }
},
{
"id": 7052384595427880,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:alerted_too_long,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 31, "width": 6, "height": 5 }
},
{
"id": 2768363536962548,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:composite_has_deleted_constituents,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 31, "width": 6, "height": 5 }
}
]
},
"layout": { "x": 0, "y": 3, "width": 12, "height": 37 }
},
{
"id": 2841959907422822,
"definition": {
"title": "General overview - by service",
"background_color": "pink",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 3801590205295194,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_service` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by service.\n\nUse the `service` filter to view insights specific to your service.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 8418200284207718,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 8281740697966220,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team,$service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 5048429332292860,
"definition": {
"title": "Top services impacted",
"title_size": "16",
"title_align": "left",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 5 }
},
{
"id": 2233801928907094,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Service",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7329031300309162,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Service",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7627510169738418,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 14, "width": 6, "height": 2 }
},
{
"id": 2826082028591748,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 14, "width": 6, "height": 2 }
},
{
"id": 5050954942402816,
"definition": {
"title": "Monitors with Missing Recipients per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:missing_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 16, "width": 6, "height": 5 }
},
{
"id": 7809748805807956,
"definition": {
"title": "Monitors with Broken Handles per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:broken_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 16, "width": 6, "height": 5 }
},
{
"id": 8416588682594596,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 1 }
},
{
"id": 4951606729784970,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 1 }
},
{
"id": 1778359756038190,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 22, "width": 6, "height": 2 }
},
{
"id": 8559060613933804,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 22, "width": 6, "height": 2 }
},
{
"id": 7041249940897320,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 24, "width": 6, "height": 5 }
},
{
"id": 7810615049061724,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:noisy_monitor,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "grey"
}
},
"layout": { "x": 6, "y": 24, "width": 6, "height": 5 }
},
{
"id": 5108940190121326,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 1 }
},
{
"id": 4931941666409286,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 1 }
},
{
"id": 6520923360190496,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 30, "width": 6, "height": 2 }
},
{
"id": 1364025765104008,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 30, "width": 6, "height": 2 }
},
{
"id": 3670188762233230,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:alerted_too_long,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 32, "width": 6, "height": 5 }
},
{
"id": 9006201303765196,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:composite_has_deleted_constituents,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 32, "width": 6, "height": 5 }
}
]
},
"layout": {
"x": 0,
"y": 40,
"width": 12,
"height": 38,
"is_column_break": true
}
}
],
"template_variables": [
{
"name": "team",
"prefix": "team",
"available_values": [],
"default": "*"
},
{
"name": "service",
"prefix": "service",
"available_values": [],
"default": "*"
}
],
"layout_type": "ordered",
"notify_list": [],
"reflow_type": "fixed"
}
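If you prefer to create the dashboard through the API rather than the UI import, the definition above can be sent to the Dashboards API create endpoint (`POST /api/v1/dashboard`). The sketch below only builds the request. It assumes the default `api.datadoghq.com` site and that the exported definition is accepted as is, and the function name is hypothetical.

```python
import json
import urllib.request

# Assumption: default Datadog site; change this for other regions.
API_BASE = "https://api.datadoghq.com"


def build_create_dashboard_request(dashboard_json, api_key, app_key):
    """Validate the JSON text and build (but do not send) the create request."""
    definition = json.loads(dashboard_json)  # raises ValueError on invalid JSON
    for field in ("title", "widgets", "layout_type"):
        if field not in definition:
            raise ValueError(f"dashboard definition is missing {field!r}")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/dashboard",
        data=json.dumps(definition).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would submit it. Checking the required fields first catches a truncated copy of the definition before any call is made.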