Prometheus (legacy)

Supported OS: Linux, Windows, macOS

Integration version: 6.3.0

Overview

Connect to Prometheus to:

  • Extract custom metrics from Prometheus endpoints
  • See Prometheus Alertmanager alerts in your Datadog event stream

Note: Datadog recommends using the OpenMetrics check because it is more efficient and fully supports the Prometheus text format. Use the Prometheus check only when the metrics endpoint does not support a text format.

All the metrics retrieved by this integration are considered custom metrics.

See the Prometheus metrics collection Getting Started to learn how to configure a Prometheus Check.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

The Prometheus check is packaged with the Datadog Agent starting with version 6.1.0.

Configuration

Edit the prometheus.d/conf.yaml file to retrieve metrics from applications that expose OpenMetrics / Prometheus endpoints.

Each instance requires at least the following settings:

Setting           Description
prometheus_url    A URL that points to the metric route (Note: must be unique)
namespace         This namespace is prepended to all metrics (to avoid metric name collisions)
metrics           A list of metrics to retrieve as custom metrics, in the form - <METRIC_NAME> or - <METRIC_NAME>: <RENAME_METRIC>

When listing metrics, you can use the wildcard *, as in - <METRIC_NAME>*, to retrieve all matching metrics. Note: Use wildcards with caution, as they can send a large number of custom metrics.
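
As a rough illustration of the settings above, a minimal prometheus.d/conf.yaml might look like the following sketch. The endpoint URL, the namespace myapp, and all metric names are hypothetical placeholders; substitute the ones your application exposes.

```yaml
init_config:

instances:
    # Hypothetical endpoint; the URL must be unique across instances.
  - prometheus_url: http://localhost:8080/metrics
    # Hypothetical namespace, prepended to every collected metric name.
    namespace: myapp
    metrics:
      # Retrieved under its original name.
      - http_requests_total
      # Retrieved and renamed to memory_used.
      - jvm_memory_bytes_used: memory_used
      # Wildcard: retrieves every metric whose name starts with process_.
      - process_*
```

With a configuration along these lines, http_requests_total would be expected to appear in Datadog with the myapp namespace prepended.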

More advanced settings (SSL, label joins, custom tags, and so on) are documented in the sample prometheus.d/conf.yaml.

Due to the nature of this integration, it is possible to submit a high number of custom metrics to Datadog. To limit the impact of configuration errors or input changes, you can control the maximum number of metrics sent. The check has a default limit of 2000 metrics. If needed, increase this limit by setting the max_returned_metrics option in the prometheus.d/conf.yaml file.
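
A minimal sketch of raising the limit, assuming max_returned_metrics is set at the instance level alongside the other settings. The endpoint, namespace, metric pattern, and the value 5000 are hypothetical.

```yaml
instances:
  - prometheus_url: http://localhost:8080/metrics   # hypothetical endpoint
    namespace: myapp                                # hypothetical namespace
    metrics:
      - http_*
    # Raise the cap from the default of 2000 metrics (illustrative value).
    max_returned_metrics: 5000
```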

If send_monotonic_counter: True, the Agent sends the deltas of the values in question, and the in-app type is set to count (this is the default behavior). If send_monotonic_counter: False, the Agent sends the raw, monotonically increasing value, and the in-app type is set to gauge.
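
The following sketch shows how the non-default behavior could be selected for one instance, assuming the option is set per instance. The endpoint, namespace, and metric name are hypothetical.

```yaml
instances:
  - prometheus_url: http://localhost:8080/metrics   # hypothetical endpoint
    namespace: myapp                                # hypothetical namespace
    metrics:
      - http_requests_total
    # false: submit the raw, monotonically increasing value (in-app type gauge).
    # true (default): submit the deltas between values (in-app type count).
    send_monotonic_counter: false
```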

Validation

Run the Agent’s status subcommand and look for prometheus under the Checks section.

Data Collected

Metrics

All metrics collected by the prometheus check are forwarded to Datadog as custom metrics.

Note: Bucket data for a given Prometheus histogram metric <HISTOGRAM_METRIC_NAME> is stored in the <HISTOGRAM_METRIC_NAME>.count metric within Datadog, with an upper_bound tag that carries the name of each bucket. To access the +Inf bucket, use upper_bound:none.
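
To make the bucket mapping concrete, the sketch below collects one hypothetical histogram named request_duration_seconds. It assumes the send_histograms_buckets option listed in the sample prometheus.d/conf.yaml is available in your Agent version; check the sample file before relying on it. The comments restate the mapping described in the note above and are not captured output.

```yaml
instances:
  - prometheus_url: http://localhost:8080/metrics   # hypothetical endpoint
    namespace: myapp                                # hypothetical namespace
    metrics:
      # Hypothetical Prometheus histogram metric.
      - request_duration_seconds
    # Assumed option: controls whether bucket data is submitted.
    send_histograms_buckets: true

# Expected mapping for bucket data, per the note above:
#   each bucket  -> request_duration_seconds.count, tagged upper_bound:<bucket name>
#   +Inf bucket  -> request_duration_seconds.count, tagged upper_bound:none
```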

Events

Prometheus Alertmanager alerts are automatically sent to your Datadog event stream following the webhook configuration. See the Prometheus Alertmanager section for setup instructions.

Service Checks

The Prometheus check does not include any service checks.

Prometheus Alertmanager

Send Prometheus Alertmanager alerts to the Datadog event stream. By default, Alertmanager sends all alerts simultaneously to the configured webhook. To see alerts in Datadog, you must configure your instance of Alertmanager to send alerts one at a time. You can add a group_by parameter under route to group alerts by the name of the alert rule.

Setup

  1. Edit the alertmanager.yml configuration file to include the following:

    receivers:
    - name: datadog
      webhook_configs: 
      - send_resolved: true
        url: https://event-management-intake.datadoghq.com/api/v2/events/webhook?dd-api-key=<DATADOG_API_KEY>&integration_id=prometheus
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 5m
      receiver: datadog
      repeat_interval: 3h
    
    • The group_by parameter determines how alerts are grouped together when sent to Datadog. Alerts with matching values for the specified labels are combined into a single notification. For details on routing configuration, see the Prometheus Alertmanager documentation.
    • This endpoint accepts only one event in the payload at a time.
  2. (Optional) Use matchers to redirect specific alerts to different receivers. Matchers allow routing based on any alert label. For syntax details, see the Alertmanager matcher documentation.

    The V2 webhook supports additional query parameters. For example, use the oncall_team parameter to integrate with Datadog On-Call and redirect pages to different teams:

    receivers:
    - name: datadog-ops
      webhook_configs: 
      - send_resolved: true
        url: https://event-management-intake.datadoghq.com/api/v2/events/webhook?dd-api-key=<DATADOG_API_KEY>&integration_id=prometheus&oncall_team=ops
    - name: datadog-db
      webhook_configs:
      - send_resolved: true
        url: https://event-management-intake.datadoghq.com/api/v2/events/webhook?dd-api-key=<DATADOG_API_KEY>&integration_id=prometheus&oncall_team=database
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 5m
      receiver: datadog-ops
      repeat_interval: 3h
      routes:
      - matchers:
        - team="database"
        receiver: datadog-db
    
    Setting send_resolved: true (the default value) enables Alertmanager to send notifications when alerts are resolved in Prometheus. This is particularly important when using the oncall_team parameter to ensure that pages are marked as resolved. Note that resolved notifications may be delayed until the next group_interval.
  3. Restart the Prometheus and Alertmanager services.

    sudo systemctl restart prometheus.service alertmanager.service
    
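Alternatively, to send alerts to the https://app.datadoghq.com/intake/webhook/prometheus endpoint instead of the V2 webhook:

  1. Edit the alertmanager.yml configuration file to include the following: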

    receivers:
    - name: datadog
      webhook_configs: 
      - send_resolved: true
        url: https://app.datadoghq.com/intake/webhook/prometheus?api_key=<DATADOG_API_KEY>
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 5m
      receiver: datadog
      repeat_interval: 3h
    
    This endpoint accepts only one event in the payload at a time.
  2. Restart the Prometheus and Alertmanager services.

    sudo systemctl restart prometheus.service alertmanager.service
    

Troubleshooting

Need help? Contact Datadog support.
