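Over time, monitor clutter builds up, producing unnecessary alerts, duplicate alerts, and growing operational friction. This guide lays out a clear approach to identifying and cleaning up cluttered monitors, with use cases that help you streamline your alerting workflows.
It also provides best practices for keeping your monitoring environment clean, so that your monitoring strategy stays easy to scale and manage as your systems grow.
This guide covers the main use cases for cleaning up monitor clutter.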
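Monitors act as an early warning system for outages, security threats, and performance problems. A monitor that stays muted for a long time defeats that purpose, and a long mute is often a sign that the monitor is outdated, irrelevant, or too noisy to be useful. Review these monitors, then either tune and re-enable them or retire them, to reduce clutter and remove stale monitors from your alerting environment.
Clean up monitors that provide no value, and replace long-term mutes with time-bound schedules.
Audit long-muted monitors to confirm whether they are still needed or useful. Some monitors are muted for legitimate reasons, so take care not to delete those.
Once you have the list, you can act on individual monitors from the Monitor Quality page, or bulk delete them by following steps 2 and 3.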
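To automate changes programmatically, retrieve the list of monitor IDs. Start with monitors that have been muted for 60 days or more.
The following curl command retrieves this information.
For readability, export the monitor details as a CSV file. You can adapt the query to your specific use case.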
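As a rough Python counterpart to the retrieval and CSV export described above, the sketch below builds a request for the monitor search endpoint and renders the results as CSV with the monitor ID in the first column. The search query string `muted_elapsed:>60d`, the US1 base URL, and the helper names are assumptions made for illustration; substitute the query and site that apply to your account.

```python
import csv
import io
import urllib.parse
import urllib.request

API_BASE = "https://api.datadoghq.com"  # assumption: US1 site; other sites use a different host


def build_search_request(query, api_key, app_key, page=0, per_page=100):
    """Build (but do not send) a GET request for the monitor search endpoint."""
    params = urllib.parse.urlencode({"query": query, "page": page, "per_page": per_page})
    return urllib.request.Request(
        f"{API_BASE}/api/v1/monitor/search?{params}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )


def monitors_to_csv(monitors):
    """Render monitor dicts as CSV text, monitor ID first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name", "status", "tags"])
    for m in monitors:
        writer.writerow(
            [m.get("id"), m.get("name", ""), m.get("status", ""), " ".join(m.get("tags", []))]
        )
    return buf.getvalue()


# Hypothetical query string for monitors muted for more than 60 days.
request = build_search_request("muted_elapsed:>60d", "<API_KEY>", "<APP_KEY>")
```

Sending the request with `urllib.request.urlopen(request)` and decoding the JSON body should yield the monitor list to pass to `monitors_to_csv`; check the response shape for your account before relying on the field names used here.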
Using the list from step 2 of monitors muted for 60 days or more, you can delete the monitors with the following script. Before running the script, make the monitor ID column the first column in the table.
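A minimal Python sketch of such a bulk delete is shown here. It assumes the CSV has the monitor ID in the first column, as described above, and it defaults to a dry run so that nothing is deleted until you have reviewed the list. The function names and the base URL are illustrative assumptions.

```python
import csv
import io
import urllib.request

API_BASE = "https://api.datadoghq.com"  # assumption: US1 site


def read_monitor_ids(csv_text):
    """Return the integer IDs in the first column, skipping headers and blank rows."""
    ids = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip().isdigit():
            ids.append(int(row[0].strip()))
    return ids


def build_delete_request(monitor_id, api_key, app_key):
    """Build a DELETE request for a single monitor."""
    return urllib.request.Request(
        f"{API_BASE}/api/v1/monitor/{monitor_id}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="DELETE",
    )


def delete_monitors(ids, api_key, app_key, dry_run=True):
    """Delete each monitor; with dry_run=True, only return the URLs that would be called."""
    planned = []
    for monitor_id in ids:
        req = build_delete_request(monitor_id, api_key, app_key)
        planned.append(req.full_url)
        if not dry_run:
            with urllib.request.urlopen(req) as resp:
                resp.read()
    return planned
```

Run it once with `dry_run=True`, compare the planned URLs against the monitors you intend to remove, and only then switch to `dry_run=False`. Deleted monitors are not restored by this sketch.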
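A persistent alert can point to one of two problems: the underlying issue is not actionable, or the monitor threshold is misconfigured. Either way, trust in alerts erodes and alert fatigue sets in. Review these monitors, then edit or remove them.
Here is how to get the list of monitors that have been in the ALERT state for 60 days or more.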
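If you already have monitor definitions exported locally, one way to approximate this list is to filter on the state and the time of the last state change. The sketch below assumes each monitor dict carries `overall_state` and an ISO 8601 `overall_state_modified` timestamp with an explicit UTC offset; verify both field names and the timestamp format against your own export.

```python
from datetime import datetime, timedelta, timezone


def stuck_in_alert(monitors, now, min_days=60):
    """Return IDs of monitors that entered Alert at least min_days ago and never left."""
    cutoff = now - timedelta(days=min_days)
    stuck = []
    for m in monitors:
        if m.get("overall_state") != "Alert":
            continue
        # Assumption: timestamp looks like 2024-10-01T00:00:00+00:00.
        changed = datetime.fromisoformat(m["overall_state_modified"])
        if changed <= cutoff:
            stuck.append(m["id"])
    return stuck


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"id": 1, "overall_state": "Alert", "overall_state_modified": "2024-10-01T00:00:00+00:00"},
    {"id": 2, "overall_state": "Alert", "overall_state_modified": "2024-12-15T00:00:00+00:00"},
    {"id": 3, "overall_state": "OK", "overall_state_modified": "2024-01-01T00:00:00+00:00"},
]
print(stuck_in_alert(sample, now))
```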
Creating separate monitors that differ only by a tag can cause unnecessary duplication. For example, creating one CPU usage monitor for prod and another for staging increases your monitor count.
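To spot candidates for consolidation, you can group monitor queries after masking the value of the tag that varies. This is only a sketch: the `env` tag key, the example queries, and the helper names are assumptions, and the regular expression handles simple `key:value` filters rather than every query form.

```python
import re
from collections import defaultdict


def normalize_query(query, tag_key="env"):
    """Replace the value of one tag key with a wildcard so variants compare equal."""
    return re.sub(rf"\b{re.escape(tag_key)}:[\w.\-]+", f"{tag_key}:*", query)


def find_duplicates(monitors, tag_key="env"):
    """Map each normalized query to the monitor IDs that share it (two or more only)."""
    groups = defaultdict(list)
    for m in monitors:
        groups[normalize_query(m["query"], tag_key)].append(m["id"])
    return {q: ids for q, ids in groups.items() if len(ids) > 1}


sample = [
    {"id": 1, "query": "avg(last_5m):avg:system.cpu.user{env:prod} > 80"},
    {"id": 2, "query": "avg(last_5m):avg:system.cpu.user{env:staging} > 80"},
    {"id": 3, "query": "avg(last_5m):avg:system.mem.used{env:prod} > 80"},
]
print(find_duplicates(sample))
```

Monitors that land in the same group are candidates for a single monitor grouped by that tag, for example a query ending in `by {env}`, provided the thresholds are meant to be the same in each environment.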
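Noisy monitors desensitize teams to real problems. Flapping (a monitor switching frequently between alert and recovery states) indicates an unstable threshold, a missing evaluation delay, or an unstable system.
To reduce noise, review the monitor's evaluation aggregation and threshold configuration. Adjust the settings to stabilize alerting behavior, and delete the monitor if it no longer provides value.
Here is how to get the list of monitors that generate a high volume of alerts.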
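When you have a history of states per monitor (for example, from exported events), flapping can be quantified by counting state changes. The sketch below ranks monitors by that count; the shape of the `histories` input is a hypothetical structure chosen for illustration, not a Datadog export format.

```python
def count_flaps(states):
    """Count how many times consecutive states differ."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def noisiest(histories, top_n=3):
    """Return (monitor_id, flap_count) pairs for the monitors that change state most often."""
    ranked = sorted(histories.items(), key=lambda kv: count_flaps(kv[1]), reverse=True)
    return [(monitor_id, count_flaps(states)) for monitor_id, states in ranked[:top_n]]


histories = {
    101: ["OK", "OK", "Alert", "OK"],
    102: ["OK", "Alert", "OK", "Alert", "OK", "Alert"],
    103: ["OK", "OK", "OK"],
}
print(noisiest(histories, top_n=2))
```

A high count over a short window suggests reviewing the threshold or evaluation window before deciding whether to delete the monitor.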
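A misconfigured monitor is active and may otherwise be set up correctly, but it is ineffective because no one receives its notifications. These misconfigurations make monitors less reliable and make debugging and triage harder. Cleaning them up keeps alerts accurate, actionable, and integrated into your observability workflows.
Here is how to get the list of monitors with misconfigured handles.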
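For a local check, you can extract `@` handles from each monitor's notification message and compare them with the handles you know to be valid. This is a sketch under assumptions: the regular expression covers common handle shapes such as `@slack-ops` and `@user@example.com`, and `known_handles` is a hypothetical set you maintain yourself.

```python
import re

# "@" not preceded by a word character or dot, then a name, then an optional "@domain".
HANDLE_RE = re.compile(r"(?<![\w.])@([\w.+\-]+(?:@[\w.\-]+)?)")


def extract_handles(message):
    """Return the notification handles mentioned in a monitor message."""
    return [h.rstrip(".-") for h in HANDLE_RE.findall(message)]


def audit_handles(monitor, known_handles):
    """Classify a monitor as 'missing', 'broken', or 'ok' based on its handles."""
    handles = extract_handles(monitor.get("message", ""))
    if not handles:
        return "missing"
    broken = [h for h in handles if h not in known_handles]
    return "broken" if broken else "ok"


known = {"slack-ops", "oncall@example.com"}
print(audit_handles({"message": "CPU high {{#is_alert}}@slack-ops{{/is_alert}}"}, known))
```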
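This issue mainly affects monitors based on AWS metrics. Because Datadog retrieves AWS metrics through the API, there is a built-in delay before the data becomes available. If the monitor does not account for this delay, incomplete or late data can trigger false positives.
Here is how to get the list of monitors that are missing a delay.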
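Given exported monitor definitions, a simple filter can flag monitors that query AWS metrics without an evaluation delay. The sketch assumes the delay is stored in seconds under `options.evaluation_delay` and that AWS metrics are recognizable by an `aws.` prefix in the query; the 900-second value in the sample data is only an illustrative setting.

```python
import re

AWS_METRIC_RE = re.compile(r"(?<![\w.])aws\.")


def missing_evaluation_delay(monitors):
    """Return IDs of monitors that use aws.* metrics but have no positive evaluation delay."""
    flagged = []
    for m in monitors:
        uses_aws_metric = bool(AWS_METRIC_RE.search(m.get("query", "")))
        delay = (m.get("options") or {}).get("evaluation_delay") or 0
        if uses_aws_metric and delay <= 0:
            flagged.append(m["id"])
    return flagged


sample = [
    {"id": 1, "query": "avg(last_15m):avg:aws.elb.latency{*} > 1", "options": {}},
    {"id": 2, "query": "avg(last_15m):avg:aws.ec2.cpuutilization{*} > 90",
     "options": {"evaluation_delay": 900}},
    {"id": 3, "query": "avg(last_5m):avg:system.cpu.user{*} > 90", "options": {}},
]
print(missing_evaluation_delay(sample))
```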
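A composite monitor evaluates its status from a logical combination of two or more monitors (called components). If any of these component monitors is deleted or becomes unavailable, the composite monitor becomes invalid or unreliable.
A missing component typically means that at least one of the original input monitors was removed after the composite monitor was created. The composite monitor is then incomplete, and its alerting behavior can be misleading.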
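A composite monitor's query references its components by monitor ID, combined with operators such as `&&`, `||`, and `!`. Under that assumption, the following sketch extracts the referenced IDs and reports those that no longer exist; `existing_ids` is a hypothetical set built from your current monitor list.

```python
import re


def composite_component_ids(query):
    """Return the monitor IDs referenced in a composite query such as '123 && !456'."""
    return [int(token) for token in re.findall(r"\d+", query)]


def missing_components(composite, existing_ids):
    """Return the component IDs of a composite monitor that are not in existing_ids."""
    return [i for i in composite_component_ids(composite["query"]) if i not in existing_ids]


existing_ids = {12345, 11111}
composite = {"id": 999, "query": "12345 && (67890 || !11111)"}
print(missing_components(composite, existing_ids))
```

A non-empty result means the composite monitor should be rebuilt with valid components or removed.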
To help you get started, you can import the following JSON dashboard definition directly into your Datadog account.
{
"title": "Monitor Quality OOTB Dashboard",
"description": "",
"widgets": [
{
"id": 8853380235542346,
"definition": {
"type": "note",
"content": "This Monitor Quality dashboard provides a comprehensive view of monitor quality metrics, broken down by `team` and `service`. Its goal is to help you easily analyze and act on monitor quality data, enabling you to schedule reports, download insights as PDFs, and more.\n\n**Key Features:**\n- Team and Service Views: You can filter the dashboard either by team or by service, but not both simultaneously. If you filter by `team`, refer to the [Team Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107838741&to_ts=1732280638741&live=false&tile_focus=4548404374449802) for relevant insights. If you filter by `service`, explore the [Service Section](https://app.datadoghq.com/dashboard/u7b-4n7-gn5/monitor-quality-ootb-dashboard?fromUser=false&refresh_mode=paused&from_ts=1732107865224&to_ts=1732280665224&live=false&tile_focus=2841959907422822) for detailed information.\n- Monitor-Level Details: For a deeper dive into specific impacted monitors, navigate to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality).\n- Seamless Navigation: Use the context links provided in the dashboard to jump directly to the [Monitor Quality page](https://app.datadoghq.com/monitors/quality), pre-filtered with the same criteria you've applied on the dashboard.\n\nThis dashboard is designed to give you both a high-level overview and actionable paths to improve your monitoring posture.",
"background_color": "white",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 12, "height": 3 }
},
{
"id": 4548404374449802,
"definition": {
"title": "General overview - by team",
"background_color": "blue",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 2449119265341574,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_team` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by team.\n\nUse the `team` filter to view insights specific to your team.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 3001209940385798,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 498569597362654,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{$team,$service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$team}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 1376609088194674,
"definition": {
"title": "Top Teams Impacted",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 4 }
},
{
"id": 718136447073638,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Team",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 12, "width": 6, "height": 1 }
},
{
"id": 2393792996475864,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Team",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 12, "width": 6, "height": 1 }
},
{
"id": 4443082314028290,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 2 }
},
{
"id": 3954366540293996,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 2 }
},
{
"id": 2546970864549118,
"definition": {
"title": "Monitors with Missing Recipients per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:missing_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 15, "width": 6, "height": 5 }
},
{
"id": 3744392131942638,
"definition": {
"title": "Monitors with Broken Handles per Team",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:broken_at_handle,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 15, "width": 6, "height": 5 }
},
{
"id": 2751217590574740,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 20, "width": 6, "height": 1 }
},
{
"id": 5158165900159898,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 20, "width": 6, "height": 1 }
},
{
"id": 8032070484951580,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4153429942317530,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 2 }
},
{
"id": 4158897740932848,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 23, "width": 6, "height": 5 }
},
{
"id": 5392245250417816,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:noisy_monitor,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": { "display": { "type": "stacked" }, "palette": "grey" }
},
"layout": { "x": 6, "y": 23, "width": 6, "height": 5 }
},
{
"id": 1271026446632020,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 28, "width": 6, "height": 1 }
},
{
"id": 6315895116466318,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 28, "width": 6, "height": 1 }
},
{
"id": 8251226565664096,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 2 }
},
{
"id": 1329067816249636,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 2 }
},
{
"id": 7052384595427880,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:alerted_too_long,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 31, "width": 6, "height": 5 }
},
{
"id": 2768363536962548,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_team{!team:none,suggestion_type:composite_has_deleted_constituents,$team,$service} by {team,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{team}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 31, "width": 6, "height": 5 }
}
]
},
"layout": { "x": 0, "y": 3, "width": 12, "height": 37 }
},
{
"id": 2841959907422822,
"definition": {
"title": "General overview - by service",
"background_color": "pink",
"show_title": true,
"type": "group",
"layout_type": "ordered",
"widgets": [
{
"id": 3801590205295194,
"definition": {
"type": "note",
"content": "This section is powered by the `datadog.monitor.suggested_monitor_health_by_service` metric, which is emitted daily.\n\nThe monitor counts reported in this metric exclude synthetic monitors.\n\nThese counts represent the total number of suggestions for monitor quality improvements, broken down by service.\n\nUse the `service` filter to view insights specific to your service.\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "white",
"font_size": "14",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 0, "width": 5, "height": 4 }
},
{
"id": 8418200284207718,
"definition": {
"title": "Distribution of Quality Improvements by Type",
"title_size": "16",
"title_align": "left",
"time": { "hide_incomplete_cost_data": true },
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team,$service} by {suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"style": { "palette": "datadog16" },
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 500,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"type": "sunburst",
"hide_total": false,
"legend": { "type": "automatic" },
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 5, "y": 0, "width": 7, "height": 4 }
},
{
"id": 8281740697966220,
"definition": {
"title": "Evolution of Quality Improvements by Type over Time",
"title_size": "16",
"title_align": "left",
"show_legend": false,
"legend_layout": "auto",
"legend_columns": ["avg", "min", "max", "value", "sum"],
"time": { "hide_incomplete_cost_data": true },
"type": "timeseries",
"requests": [
{
"formulas": [{ "formula": "query1" }],
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{$team,$service} by {suggestion_type}"
}
],
"response_format": "timeseries",
"style": {
"palette": "datadog16",
"order_by": "values",
"line_type": "solid",
"line_width": "normal"
},
"display_type": "line"
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{$service}}"
}
]
},
"layout": { "x": 0, "y": 4, "width": 12, "height": 4 }
},
{
"id": 5048429332292860,
"definition": {
"title": "Top services impacted",
"title_size": "16",
"title_align": "left",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" }
}
},
"layout": { "x": 0, "y": 8, "width": 12, "height": 5 }
},
{
"id": 2233801928907094,
"definition": {
"type": "note",
"content": "Monitors with Missing Recipients per Service",
"background_color": "vivid_blue",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7329031300309162,
"definition": {
"type": "note",
"content": "Monitors with Broken Handles per Service",
"background_color": "vivid_green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 13, "width": 6, "height": 1 }
},
{
"id": 7627510169738418,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- no notification handle found in monitor body\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 14, "width": 6, "height": 2 }
},
{
"id": 2826082028591748,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- notification handle is not valid\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 14, "width": 6, "height": 2 }
},
{
"id": 5050954942402816,
"definition": {
"title": "Monitors with Missing Recipients per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:missing_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "blue"
}
},
"layout": { "x": 0, "y": 16, "width": 6, "height": 5 }
},
{
"id": 7809748805807956,
"definition": {
"title": "Monitors with Broken Handles per Service",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:broken_at_handle,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "green"
}
},
"layout": { "x": 6, "y": 16, "width": 6, "height": 5 }
},
{
"id": 8416588682594596,
"definition": {
"type": "note",
"content": "Monitors Muted for Too Long",
"background_color": "purple",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 21, "width": 6, "height": 1 }
},
{
"id": 4951606729784970,
"definition": {
"type": "note",
"content": "Monitors Generating a High Volume of Alerts",
"background_color": "green",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 21, "width": 6, "height": 1 }
},
{
"id": 1778359756038190,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been muted for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 22, "width": 6, "height": 2 }
},
{
"id": 8559060613933804,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor generates the top 5% of alerts over the past 10 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 22, "width": 6, "height": 2 }
},
{
"id": 7041249940897320,
"definition": {
"title": "Monitors Muted for Too Long",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:muted_duration_over_sixty_days,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "semantic"
}
},
"layout": { "x": 0, "y": 24, "width": 6, "height": 5 }
},
{
"id": 7810615049061724,
"definition": {
"title": "Monitors Generating a High Volume of Alerts",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:noisy_monitor,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "grey"
}
},
"layout": { "x": 6, "y": 24, "width": 6, "height": 5 }
},
{
"id": 5108940190121326,
"definition": {
"type": "note",
"content": "Monitors Stuck in Alert State",
"background_color": "vivid_yellow",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 0, "y": 29, "width": 6, "height": 1 }
},
{
"id": 4931941666409286,
"definition": {
"type": "note",
"content": "Composite Monitors have Deleted Components",
"background_color": "gray",
"font_size": "18",
"text_align": "center",
"vertical_align": "center",
"show_tick": false,
"tick_pos": "50%",
"tick_edge": "left",
"has_padding": true
},
"layout": { "x": 6, "y": 29, "width": 6, "height": 1 }
},
{
"id": 6520923360190496,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor has been alerting for at least 60 days\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 0, "y": 30, "width": 6, "height": 2 }
},
{
"id": 1364025765104008,
"definition": {
"type": "note",
"content": "Monitor counts reported in this metric satisfy the following conditions:\n- the monitor is a composite one and has deleted components\n- monitor type is not `synthetics`\n\n_You can use the context links to jump to the list of affected monitors._",
"background_color": "yellow",
"font_size": "14",
"text_align": "left",
"vertical_align": "center",
"show_tick": true,
"tick_pos": "50%",
"tick_edge": "bottom",
"has_padding": true
},
"layout": { "x": 6, "y": 30, "width": 6, "height": 2 }
},
{
"id": 3670188762233230,
"definition": {
"title": "Monitors Stuck in Alert State",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:alerted_too_long,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "orange"
}
},
"layout": { "x": 0, "y": 32, "width": 6, "height": 5 }
},
{
"id": 9006201303765196,
"definition": {
"title": "Composite Monitors have Deleted Components",
"type": "toplist",
"requests": [
{
"queries": [
{
"name": "query1",
"data_source": "metrics",
"query": "sum:datadog.monitor.suggested_monitor_health_by_service{!service:none,suggestion_type:composite_has_deleted_constituents,$team,$service} by {service,suggestion_type}",
"aggregator": "last"
}
],
"response_format": "scalar",
"formulas": [{ "formula": "query1" }],
"sort": {
"count": 10,
"order_by": [
{ "type": "formula", "index": 0, "order": "desc" }
]
}
}
],
"custom_links": [
{
"label": "See list of monitors",
"link": "https://app.datadoghq.com/monitors/quality?q={{service}}"
}
],
"style": {
"display": { "type": "stacked", "legend": "automatic" },
"palette": "datadog16"
}
},
"layout": { "x": 6, "y": 32, "width": 6, "height": 5 }
}
]
},
"layout": {
"x": 0,
"y": 40,
"width": 12,
"height": 38,
"is_column_break": true
}
}
],
"template_variables": [
{
"name": "team",
"prefix": "team",
"available_values": [],
"default": "*"
},
{
"name": "service",
"prefix": "service",
"available_values": [],
"default": "*"
}
],
"layout_type": "ordered",
"notify_list": [],
"reflow_type": "fixed"
}
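As an alternative to importing the JSON through the UI, the definition can be posted to the dashboards API. The sketch below only builds the request; the base URL, the helper name, and the placeholder keys are assumptions, and the small `definition` stands in for the full JSON above.

```python
import json
import urllib.request


def build_dashboard_request(definition, api_key, app_key,
                            api_base="https://api.datadoghq.com"):
    """Build (but do not send) a POST request that creates a dashboard from a definition."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/api/v1/dashboard",
        data=body,
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


# Stand-in for the full definition; in practice, load it with json.load() from a file.
definition = {"title": "Monitor Quality OOTB Dashboard", "layout_type": "ordered", "widgets": []}
request = build_dashboard_request(definition, "<API_KEY>", "<APP_KEY>")
```

Sending the request with `urllib.request.urlopen(request)` creates the dashboard in the account that owns the keys, so try it against a test organization first.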