L’encombrement des moniteurs s’accumule avec le temps, engendrant du bruit, des alertes dupliquées et une friction opérationnelle accrue. Ce guide présente une approche claire pour identifier et nettoyer les moniteurs encombrés, avec des cas d’utilisation pour vous aider à rationaliser vos flux de travail d’alerte.
Il fournit également des meilleures pratiques pour maintenir un environnement de surveillance propre, facilitant ainsi l’évolutivité et la gouvernance de votre stratégie de surveillance à mesure que vos systèmes se développent.
Ce guide présente plusieurs cas d’utilisation clés pour éliminer l’encombrement des moniteurs :
Les moniteurs servent de système d’alerte précoce pour les pannes, les menaces de sécurité et les problèmes de performance. Cependant, avoir des moniteurs muets pendant une longue période va à l’encontre de cet objectif, un silence prolongé signale souvent qu’un moniteur est obsolète, non pertinent ou trop bruyant pour être utile. Ces moniteurs doivent être examinés et soit réactivés avec un réglage approprié, soit retirés pour réduire le désordre et éliminer les moniteurs obsolètes de votre environnement d’alerte.
Nettoyez les moniteurs qui ne fournissent pas de valeur et remplacez les silences à long terme par des horaires limités dans le temps :
Auditez les moniteurs qui ont été muets pendant une longue période pour comprendre lesquels sont réellement nécessaires ou utiles. Certains moniteurs peuvent être muets pour une bonne raison et vous souhaitez éviter de les supprimer.
Après avoir obtenu votre liste, vous pouvez soit agir sur chaque moniteur depuis la page Qualité des Moniteurs, soit procéder à une suppression en masse des moniteurs avec les étapes 2 et 3.
Obtenez une liste de vos ID de moniteurs pour automatiser les changements de manière programmatique. Commencez par les moniteurs qui ont été muets pendant plus de 60 jours.
Cela vous donne les détails de vos moniteurs dans un fichier CSV pour une meilleure lisibilité. Vous pouvez affiner la requête à votre cas d’utilisation spécifique.
Avec votre liste de moniteurs qui ont été muets pendant plus de 60 jours (de l’étape 2), vous pouvez les supprimer avec le script suivant. Avant d’exécuter le script, placez la colonne d’ID du moniteur en premier dans le tableau.
Des alertes persistantes suggèrent l’un des deux problèmes : soit le problème n’est pas actionnable, soit le seuil du moniteur est mal configuré. Dans les deux cas, cela érode la confiance dans les alertes et contribue à la fatigue des alertes. Ces moniteurs doivent être examinés et modifiés, ou supprimés.
Voici comment obtenir la liste des moniteurs qui ont été en état d’ALERT pendant plus de 60 jours :
Créer des moniteurs séparés qui ne diffèrent que par une étiquette peut entraîner une duplication inutile. Par exemple, surveiller l’utilisation du CPU avec un moniteur pour prod et un autre pour staging augmente votre nombre de moniteurs.
Les moniteurs redondants créent du bruit et de la confusion inutiles. Dans de nombreux cas, ceux-ci peuvent être consolidés en un seul moniteur multi-alert avec un bon cadrage et un bon étiquetage, réduisant la duplication et rendant les alertes plus gérables.
Si vous devez envoyer différentes notifications en fonction de la valeur de l’étiquette qui a déclenché l’alerte, utilisez les variables de moniteur pour personnaliser dynamiquement le message en fonction de l’étiquette qui a franchi le seuil.
Les moniteurs bruyants désensibilisent les équipes aux véritables problèmes. Le flapping (lorsqu’un moniteur passe fréquemment entre les états d’alerte et de récupération) indique souvent des seuils instables, des délais d’évaluation manquants ou une volatilité sous-jacente du système.
Pour réduire le bruit, examinez l’agrégation d’évaluation du moniteur et la configuration du seuil. Ajustez les paramètres pour stabiliser le comportement des alertes, ou supprimez le moniteur s’il n’apporte plus de valeur.
Voici comment obtenir une liste de moniteurs qui génèrent un volume élevé d’alertes :
Les moniteurs mal configurés sont des moniteurs actifs qui peuvent avoir une utilisation appropriée, mais qui sont inefficaces car vous ne serez pas averti. Ces mauvaises configurations compromettent la fiabilité du moniteur et rendent le débogage ou le triage plus difficiles. Les nettoyer garantit que vos alertes sont précises, exploitables et intégrées dans vos flux de travail d’observabilité.
Voici comment obtenir la liste des moniteurs qui ont des poignées mal configurées :
Ce problème impacte principalement les moniteurs basés sur les métriques AWS. Parce que Datadog récupère les métriques AWS via l’API, il y a souvent un délai intégré avant que les données ne soient disponibles. Si vous ne tenez pas compte de cela, les moniteurs peuvent déclencher de faux positifs en raison de données incomplètes ou retardées.
Les moniteurs composites évaluent leur état en fonction de la combinaison logique de deux moniteurs ou plus (appelés constituants). Si l’un de ces moniteurs constituants est supprimé ou devient indisponible, le moniteur composite devient invalide ou peu fiable.
Un constituant manquant signifie généralement qu’au moins un des moniteurs d’entrée d’origine a été supprimé après la création du moniteur composite. Cela rend le composite incomplet et potentiellement trompeur dans son comportement d’alerte.
Pour obtenir de manière programmatique la liste des moniteurs qui manquent des constituants :
Pour vous aider à démarrer, importez la définition de tableau de bord JSON suivante directement dans votre compte Datadog.
{
"title": "Monitor Quality OOTB Dashboard",
"description": "",
"widgets": [
{
"id": 8853380235542346,
"definition": {
"type": "note",
"content": "This Monitor Quality dashboard provides a comprehensive view of monitor quality metrics, broken down by `team` and `service`. Its goal is to help you easily analyze and act on monitor quality data, enabling you to schedule reports, download insights as PDFs, and more.\n\n**Key Features:**\n- Team and Service Views: You can filter the dashboard either by team or by service, but not both simultaneously. If you filter by `team`, refer to the [Team Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107838741&to_ts=1732280638741&live=false&tile_focus=4548404374449802) for relevant insights. If you filter by `service`, explore the [Service Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107865224&to_ts=1732280665224&live=false&tile_focus=2841959907422822) for detailed information.\n- Monitor-Level Details: For a deeper dive into specific impacted monitors, navigate to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality).\n- Seamless Navigation: Use the context links provided in the dashboard to jump directly to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality), pre-filtered with the same criteria you've applied on the dashboard.\n\nThis dashboard is designed to give you both a high-level overview and actionable paths to improve your monitoring posture.",
"background_color": "white",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 12, "height": 3 }
},
{
"id": 4548404374449802,
"definition": {
"title": "General overview - by team",
"background_color": "blue",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 2449119265341574,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_team` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by team.\n\nUse the `team` filter to view insights specific to your team.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 3001209940385798,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 498569597362654,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 1376609088194674,
"definition": {
"title": "Top Teams Impacted",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 4 }
},
{
"id": 718136447073638,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Team",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 12, "width": 6, "height": 1 }
},
{
"id": 2393792996475864,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Team",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 12, "width": 6, "height": 1 }
},
{
"id": 4443082314028290,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 2 }
},
{
"id": 3954366540293996,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 2 }
},
{
"id": 2546970864549118,
"definition": {
"title": "Monitors with Missing Recipients per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:missing_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 15, "width": 6, "height": 5 }
},
{
"id": 3744392131942638,
"definition": {
"title": "Monitors with Broken Handles per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:broken_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 15, "width": 6, "height": 5 }
},
{
"id": 2751217590574740,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 20, "width": 6, "height": 1 }
},
{
"id": 5158165900159898,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 20, "width": 6, "height": 1 }
},
{
"id": 8032070484951580,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4153429942317530,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4158897740932848,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 23, "width": 6, "height": 5 }
},
{
"id": 5392245250417816,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:noisy_monitor,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": { "display": { "type": "stacked" }, "palette": "grey" }
},
"layout": { "x": 6, "y": 23, "width": 6, "height": 5 }
},
{
"id": 1271026446632020,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 28, "width": 6, "height": 1 }
},
{
"id": 6315895116466318,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 28, "width": 6, "height": 1 }
},
{
"id": 8251226565664096,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 2 }
},
{
"id": 1329067816249636,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 2 }
},
{
"id": 7052384595427880,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:alerted_too_long,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 31, "width": 6, "height": 5 }
},
{
"id": 2768363536962548,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:composite_has_deleted_constituents ,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 31, "width": 6, "height": 5 }
}
]
},
"layout": { "x": 0, "y": 3, "width": 12, "height": 37 }
},
{
"id": 2841959907422822,
"definition": {
"title": "General overview - by service",
"background_color": "pink",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 3801590205295194,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_service` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by service.\n\nUse the `service` filter to view insights specific to your team.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 8418200284207718,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 8281740697966220,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team, $service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 5048429332292860,
"definition": {
"title": "Top services impacted",
"title_size": "16",
"title_align": "left",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 5 }
},
{
"id": 2233801928907094,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Service",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7329031300309162,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Service",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7627510169738418,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 14, "width": 6, "height": 2 }
},
{
"id": 2826082028591748,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 14, "width": 6, "height": 2 }
},
{
"id": 5050954942402816,
"definition": {
"title": "Monitors with Missing Recipients per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:missing_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 16, "width": 6, "height": 5 }
},
{
"id": 7809748805807956,
"definition": {
"title": "Monitors with Broken Handles per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:broken_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 16, "width": 6, "height": 5 }
},
{
"id": 8416588682594596,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 1 }
},
{
"id": 4951606729784970,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 1 }
},
{
"id": 1778359756038190,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 22, "width": 6, "height": 2 }
},
{
"id": 8559060613933804,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 22, "width": 6, "height": 2 }
},
{
"id": 7041249940897320,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 24, "width": 6, "height": 5 }
},
{
"id": 7810615049061724,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:noisy_monitor,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "grey"
}
},
"layout": { "x": 6, "y": 24, "width": 6, "height": 5 }
},
{
"id": 5108940190121326,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 1 }
},
{
"id": 4931941666409286,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 1 }
},
{
"id": 6520923360190496,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 30, "width": 6, "height": 2 }
},
{
"id": 1364025765104008,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 30, "width": 6, "height": 2 }
},
{
"id": 3670188762233230,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:alerted_too_long,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 32, "width": 6, "height": 5 }
},
{
"id": 9006201303765196,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:alerted_too_long,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 32, "width": 6, "height": 5 }
}
]
},
"layout": {
"x": 0,
"y": 40,
"width": 12,
"height": 38,
"is_column_break": true
}
}
],
"template_variables": [
{
"name": "team",
"prefix": "team",
"available_values": [],
"default": "*"
},
{
"name": "service",
"prefix": "service",
"available_values": [],
"default": "*"
}
],
"layout_type": "ordered",
"notify_list": [],
"reflow_type": "fixed"
}