Integration version 3.4.0
To find out if this integration is available in your organization, see your Datadog Integrations page or ask your organization administrator.
To initiate an exception request to enable this integration for your organization, email support@ddog-gov.com.
Overview
Send events from your Nagios-monitored infrastructure to Datadog for richer alerting and to help correlate Nagios events with metrics from your Datadog-monitored infrastructure.
This check watches your Nagios server’s logs and sends events to Datadog for the following:
- Service flaps
- Host state changes
- Passive service checks
- Host and service downtimes
This check can also send Nagios performance data as metrics to Datadog.
Minimum Agent version: 6.0.0
Setup
Installation
The Nagios check is included in the Datadog Agent package, so you don’t need to install anything else on your Nagios servers.
Configuration
Follow the instructions below to configure this check for an Agent running on a host. For containerized environments, see the Containerized section.
Containerized
For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.
| Parameter | Value |
|---|---|
| <INTEGRATION_NAME> | nagios |
| <INIT_CONFIG> | blank or {} |
| <INSTANCE_CONFIG> | {"nagios_conf": "/etc/nagios3/nagios.cfg"} |
Note: The containerized Agent must be able to read the /etc/nagios3/nagios.cfg file for the Datadog-Nagios integration to work.
Validation
Run the Agent’s status subcommand and look for nagios under the Checks section.
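As a rough sketch, this check can be scripted by filtering the status output. The helper name has_nagios_check is hypothetical, and the pattern assumes the check appears on its own line starting with nagios, which may vary between Agent versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin contains a line
# that starts (after optional indentation) with the word "nagios".
has_nagios_check() {
  grep -Eq '^[[:space:]]*nagios([[:space:]]|$)'
}

# Usage on a host where the Agent is installed:
#   sudo datadog-agent status | has_nagios_check && echo "nagios check is listed"
```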
Data Collected
Metrics
| Metric | Type | Description | Unit |
|---|---|---|---|
| nagios.current_users.users | gauge | Number of current users | user |
| nagios.disk_space | gauge | Disk space | mebibyte |
| nagios.host.pl | gauge | Packet loss | percent |
| nagios.host.rta | gauge | Round trip time between Nagios and client | millisecond |
| nagios.http.size | gauge | HTTP response size | byte |
| nagios.http.time | gauge | HTTP response time | second |
| nagios.ping.pl | gauge | Packet loss | percent |
| nagios.ping.rta | gauge | Round trip time between Nagios and client | millisecond |
| nagios.swap_usage.swap | gauge | Memory swap usage | mebibyte |
Log collection
Log collection is disabled by default in the Datadog Agent. Enable it by setting logs_enabled: true in your datadog.yaml file.
Add this configuration block to your nagios.d/conf.yaml file to start collecting your Nagios logs:
logs:
  - type: file
    path: /opt/nagios/var/log/nagios.log
    source: nagios
Change the path parameter value to match your environment. The correct value is the log_file setting in your Nagios configuration file. See the sample nagios.d/conf.yaml for all available configuration options.
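One way to look up that value is to read the log_file directive straight from the Nagios configuration file. The sketch below assumes the usual key=value syntax of nagios.cfg. The helper name nagios_log_file is hypothetical, and the configuration path in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first log_file directive
# found in the Nagios main configuration file passed as $1.
nagios_log_file() {
  sed -n 's/^[[:space:]]*log_file[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Usage (adjust the path to your installation):
#   nagios_log_file /etc/nagios3/nagios.cfg
```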
Restart the Agent.
Events
The check watches the Nagios events log for log lines containing these strings, emitting an event for each line:
- SERVICE FLAPPING ALERT
- ACKNOWLEDGE_SVC_PROBLEM
- SERVICE ALERT
- HOST ALERT
- ACKNOWLEDGE_HOST_PROBLEM
- SERVICE NOTIFICATION
- HOST DOWNTIME ALERT
- PROCESS_SERVICE_CHECK_RESULT
- SERVICE DOWNTIME ALERT
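To preview which existing log lines contain one of these strings, a plain substring search is a reasonable approximation. This is a sketch only: the helper name preview_event_lines is hypothetical, and the check's own parsing may differ from a simple grep.

```shell
#!/bin/sh
# The strings listed above, joined into one extended regular expression.
EVENT_PATTERN='SERVICE FLAPPING ALERT|ACKNOWLEDGE_SVC_PROBLEM|SERVICE ALERT|HOST ALERT|ACKNOWLEDGE_HOST_PROBLEM|SERVICE NOTIFICATION|HOST DOWNTIME ALERT|PROCESS_SERVICE_CHECK_RESULT|SERVICE DOWNTIME ALERT'

# Hypothetical helper: print the lines of the log file given as $1
# that contain any of the strings.
preview_event_lines() {
  grep -E "$EVENT_PATTERN" "$1"
}

# Usage (adjust the path to your log_file value):
#   preview_event_lines /opt/nagios/var/log/nagios.log | tail -n 20
```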
Service Checks
The Nagios check does not include any service checks.
Trigger on-call pages
Configure Nagios notification commands to forward notifications to the Datadog events intake, which routes them to Datadog On-Call using an oncall_team query parameter. Datadog deduplicates and auto-resolves events that share an aggregation_key, so a Nagios RECOVERY resolves the page created by its corresponding PROBLEM automatically.
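The notification script in this section builds the aggregation_key as nagios:<host>:<service>. The sketch below illustrates why a RECOVERY lines up with its PROBLEM: the state is not part of the key. The helper name aggregation_key is hypothetical and only mirrors that format.

```shell
#!/bin/sh
# Hypothetical helper mirroring the key format used by the notification script.
aggregation_key() {
  printf 'nagios:%s:%s' "$1" "$2"
}

problem_key=$(aggregation_key "webserver-01" "HTTP_Service")   # sent with state CRITICAL
recovery_key=$(aggregation_key "webserver-01" "HTTP_Service")  # sent with state OK

# The state is not part of the key, so both notifications share one key.
[ "$problem_key" = "$recovery_key" ] && echo "same key: $problem_key"
```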
Map Nagios events to on-call pages
| Nagios state | Event category | Alert status | On-Call effect |
|---|---|---|---|
| CRITICAL, DOWN | alert | error | Pages the configured On-Call team |
| WARNING | alert | warn | Pages the configured On-Call team |
| OK, UP | alert | ok | Resolves the page with the same aggregation_key |
| UNKNOWN | alert | warn | Pages the configured On-Call team |
Setup
The script depends on curl and python3. Both are commonly available on Nagios hosts.
Keep the Datadog API key out of commands.cfg. Set DD_API_KEY as an environment variable exported by the Nagios service, or load it from a Nagios resource file with restricted permissions.
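As an optional safeguard, the script could refuse to send anything when the key is missing or still set to the placeholder. The following is a minimal sketch under that assumption. The function name require_api_key is hypothetical and is not part of the script below.

```shell
#!/bin/sh
# Hypothetical guard: fail if DD_API_KEY is unset, empty, or still the placeholder.
require_api_key() {
  case "${DD_API_KEY:-}" in
    ''|'<YOUR_DATADOG_API_KEY>')
      echo "DD_API_KEY is not set" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage near the top of a notification script:
#   require_api_key || exit 1
```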
Create the notification script
Create /usr/local/nagios/libexec/notify_datadog_oncall.sh:
#!/bin/bash
set -u

DD_API_KEY="${DD_API_KEY:-<YOUR_DATADOG_API_KEY>}"
DD_SITE="${DD_SITE:-datadoghq.com}"   # for example, datadoghq.eu, us3.datadoghq.com

NAGIOS_HOST="${1}"
SERVICEDESC="${2}"
STATE="${3}"         # CRITICAL, WARNING, OK, UNKNOWN, UP, DOWN
ONCALL_TEAM="${4}"   # Datadog On-Call team handle, for example, "ops"
OUTPUT="${5}"

# Map the Nagios state to a Datadog alert status.
case "$STATE" in
  CRITICAL|DOWN) STATUS="error" ;;
  WARNING)       STATUS="warn" ;;
  OK|UP)         STATUS="ok" ;;
  *)             STATUS="warn" ;;
esac

# JSON-encode free-form text so quotes and newlines cannot break the payload.
TITLE_JSON=$(printf 'Nagios: %s / %s is %s' "$NAGIOS_HOST" "$SERVICEDESC" "$STATE" \
  | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')
MESSAGE_JSON=$(printf '%s' "$OUTPUT" \
  | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')

PAYLOAD=$(cat <<EOF
{
  "data": {
    "type": "event",
    "attributes": {
      "category": "alert",
      "title": $TITLE_JSON,
      "message": $MESSAGE_JSON,
      "integration_id": "nagios",
      "aggregation_key": "nagios:${NAGIOS_HOST}:${SERVICEDESC}",
      "tags": ["integration:nagios", "host:${NAGIOS_HOST}", "service:${SERVICEDESC}"],
      "attributes": {"status": "$STATUS"},
      "nagios": {"source": "nagios-core"}
    }
  }
}
EOF
)

URL="https://event-management-intake.${DD_SITE}/api/v2/events/webhook?dd-api-key=${DD_API_KEY}&integration_id=nagios&oncall_team=${ONCALL_TEAM}"

RESPONSE_FILE=$(mktemp)
trap 'rm -f "$RESPONSE_FILE"' EXIT

HTTP_CODE=$(curl -s -m 15 -o "$RESPONSE_FILE" -w '%{http_code}' \
  -X POST "$URL" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

# Treat any non-2xx response as a failure and surface the response body.
if [ "$HTTP_CODE" -lt 200 ] || [ "$HTTP_CODE" -ge 300 ]; then
  echo "Datadog On-Call notification failed: HTTP $HTTP_CODE" >&2
  cat "$RESPONSE_FILE" >&2
  exit 1
fi
Make the script executable:
sudo chmod 755 /usr/local/nagios/libexec/notify_datadog_oncall.sh
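The script pipes the title and the plugin output through python3 before placing them in the payload. The sketch below isolates that step to show its effect on text containing double quotes. The wrapper name json_string is hypothetical. The python3 one-liner is the same one the script uses.

```shell
#!/bin/sh
# Hypothetical wrapper around the encoding step used in the notification script:
# print $1 as a JSON string literal, including the surrounding quotes.
json_string() {
  printf '%s' "$1" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'
}

# Double quotes in plugin output are escaped, so the payload stays valid JSON.
json_string 'CRITICAL - "disk" at 95%'
# prints "CRITICAL - \"disk\" at 95%"
```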
Define the Nagios commands
Add the following to commands.cfg. Use separate commands for service and host notifications so that the correct Nagios macros are passed:
define command {
    command_name    notify-datadog-oncall-service
    command_line    /usr/local/nagios/libexec/notify_datadog_oncall.sh "$HOSTALIAS$" "$SERVICEDESC$" "$SERVICESTATE$" "$_CONTACTONCALL_TEAM$" "$SERVICEOUTPUT$"
}
define command {
    command_name    notify-datadog-oncall-host
    command_line    /usr/local/nagios/libexec/notify_datadog_oncall.sh "$HOSTALIAS$" "Host" "$HOSTSTATE$" "$_CONTACTONCALL_TEAM$" "$HOSTOUTPUT$"
}
The custom variable _oncall_team maps to the Datadog On-Call team handle @oncall-<handle>. Set it to exactly the team handle configured in Datadog On-Call, without the @oncall- prefix. Add contacts to contacts.cfg:
define contact {
    contact_name                     datadog-ops
    alias                            Ops Team On-Call
    service_notification_period      24x7
    host_notification_period         24x7
    service_notification_options     w,u,c,r
    host_notification_options        d,u,r
    service_notification_commands    notify-datadog-oncall-service
    host_notification_commands       notify-datadog-oncall-host
    _oncall_team                     ops
}
define service {
    use                      generic-service
    host_name                webserver-01
    service_description      HTTP_Service
    check_command            check_http
    contacts                 datadog-ops
    notification_options     w,u,c,r
}
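The $_CONTACTONCALL_TEAM$ macro used in the commands follows the Nagios custom-variable convention: the variable name is uppercased and, for a contact, prefixed with _CONTACT. The sketch below illustrates that naming rule. The helper name contact_macro is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the macro name Nagios exposes for a custom
# variable defined on a contact (for example, _oncall_team).
contact_macro() {
  name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  # Drop the leading underscore of the variable name, then add the _CONTACT prefix.
  printf '$_CONTACT%s$' "${name#_}"
}

contact_macro _oncall_team
# prints $_CONTACTONCALL_TEAM$
```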
Reload Nagios
sudo systemctl reload nagios
Verify pages appear under On-Call > Pages in Datadog.
Troubleshooting
Need help? Contact Datadog support.
Further Reading
Additional helpful documentation, links, and articles: