Overview
Monitors are essential for keeping businesses and systems running smoothly. When a monitor alerts, it signals that attention is needed. However, detecting an issue is only the tip of the iceberg; the notification is what greatly impacts the resolution time.
Notification messages bridge the gap between your monitoring system and the people who solve the problem. Unclear or poorly written messages can cause confusion, slow down response times, or leave issues unresolved, whereas a clear and actionable message helps your team quickly understand what’s wrong and what to do next.
Use this guide to improve your notification messages and learn about:
Key principles of effective communication
Common mistakes to avoid
Tips for crafting messages that get results
Whether you are a product manager or a developer, this guide helps you write notifications that improve system reliability and team efficiency.
Notification Configuration
The first step is to configure the notification with the required fields:
Monitor Name, which is also the Notification title.
Craft the Monitor Name to include the key information a responder needs to quickly understand the alert context. The monitor title should give a clear and concise description of the signal, including:
The failure mode(s) or the diverging metrics
What resource is affected (such as Datacenter, Kubernetes Cluster, host, or service)
| Needs Revision | Improved Title |
|----------------|----------------|
| Memory usage   | High memory usage on {{pod_name.name}} |
While both examples refer to a memory consumption monitor, the improved title provides the essential context for a focused investigation.
Message
On-call responders rely on the notification body to understand and act on alerts. Write messages that are concise, accurate, and easy to read.
Precisely mention what is failing and list major root causes
Add a solution runbook for quick resolution guidance
Include links to relevant pages for clear next steps
Ensure notifications are sent to the appropriate recipients, either as direct email notifications or through integration handles (such as Slack).
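For example, a minimal message body that follows these guidelines might look like the following sketch. The metric, runbook links, and Slack channel are illustrative placeholders, not values from your account:
Disk usage on {{host.name}} has exceeded 90%. Likely causes: runaway log growth or a failed cleanup job.
Follow the [disk cleanup runbook](<Link>) to free up space, then check the [host dashboard](<Link>) for the usage trend.
@slack-infra-oncall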
Read the following sections to explore advanced features that can further enhance your monitor messages.
Variables
Monitor message variables are dynamic placeholders that let you customize notification messages with real-time contextual information. Use variables to enhance message clarity and provide detailed context. There are two types: template variables, which enrich monitor notifications with contextual information, and conditional variables, which apply branching logic based on attributes of the alert.
Variables are especially important in a Multi-Alert monitor: when it triggers, you need to know which group is responsible. For example, when monitoring CPU usage by container, grouped by host, a valuable variable is {{host.name}}, which indicates the host that triggered the alert.
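In that case, a multi alert message could reference the triggering group directly (a minimal sketch; the metric and wording are illustrative):
High CPU usage detected on {{host.name}}. Investigate the containers running on this host.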
Conditional variables
These variables let you tailor the notification message with branching logic based on your needs and use case. Use conditional variables to notify different people or groups depending on which group triggered the alert.
{{#is_exact_match "role.name" "network"}}
# The content displays if the host that triggered the alert has the exact role name `network`, and only notifies @network-team@company.com.
@network-team@company.com
{{/is_exact_match}}
You can receive a notification if the group that triggered the alert contains a specific string.
{{#is_match "datacenter.name" "us"}}
# The content displays if the region triggering the alert contains `us` (such as us1 or us3)
@us.datacenter@company.com
{{/is_match}}
Add monitor template variables to access the metadata that caused your monitor to alert, such as {{value}}, as well as information related to the context of the alert.
For example, to display the hostname, IP, and value of the monitor query:
The CPU for {{host.name}} (IP:{{host.ip}}) reached a critical value of {{value}}.
For the list of available template variables, see the documentation.
You can also use template variables to create dynamic links and handles that automatically route your notifications. Example of handles:
@slack-{{service.name}} There is an ongoing issue with {{service.name}}.
Results in the following when the group service:ad-server triggers:
@slack-ad-server There is an ongoing issue with ad-server.
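The same technique works for dynamic links. For instance, a message could link to an internal runbook keyed by service name (the URL below is a hypothetical placeholder for your own documentation):
See the [runbook for {{service.name}}](https://runbooks.company.com/{{service.name}}) before escalating.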
Example of a notification message following best practices
## What’s happening?
The CPU usage on {{host.name}} has exceeded the defined threshold.

Current CPU Usage: {{value}}
Threshold: {{threshold}}
Time: {{last_triggered_at_epoch}}

## Impact
1. Customers are experiencing lag on the website.
2. Timeouts and errors.

## Why?
There can be several reasons why the CPU usage exceeded the threshold:
- Increase in traffic
- Hardware issues
- External attack

## How to troubleshoot/solve the issue?
1. Analyze the workload to identify CPU-intensive processes.
   a. For OOM: [increase pod limits if too low](<Link>)
2. Upscale {{host.name}} capacity by adding more replicas:
   a. Directly: <Code to do so>
   b. Change the configuration through the [add more replicas runbook](<Link>)
3. Check for any [Kafka issues](<Link>)
4. Check for any other outages/incidents (attempted connections)