---
title: Velero
description: Monitor the performance and usage of your Velero deployments.
---

# Velero

**Integration version:** 3.4.1

{% callout %}
# Important note for users on the following Datadog sites: us2.ddog-gov.com

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

This check monitors [Velero](https://velero.io) through the Datadog Agent. It collects metrics on Velero's backup, restore, and snapshot operations, giving you insight into the health, performance, and reliability of your disaster recovery processes.

**Minimum Agent version:** 7.64.0

## Setup{% #setup %}

### Installation{% #installation %}

The Velero check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

### Configuration{% #configuration %}

#### Metrics{% #metrics %}

{% tab title="Host" %}
Follow the instructions below to install and configure this check for an Agent running on a host.

1. Edit the `velero.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your Velero performance data. See the [sample velero.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/velero/datadog_checks/velero/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
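
As a rough illustration of step 1, a minimal `velero.d/conf.yaml` might look like the sketch below. It assumes the check reads an `openmetrics_endpoint` setting (the usual option for OpenMetrics-based checks) and that Velero serves metrics on its default port, 8085, on the same host. Both are assumptions to verify against the sample configuration file linked above.

```yaml
# velero.d/conf.yaml (illustrative sketch, not a complete configuration)
init_config:

instances:
    # Assumed endpoint: Velero's metrics server on its default port (8085).
    # Replace the host and port with the values used by your deployment.
  - openmetrics_endpoint: http://localhost:8085/metrics
```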

{% /tab %}

{% tab title="Kubernetes" %}
See the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on configuring this integration in a containerized environment.

Two types of pods must be queried to collect all metrics: `velero` and `node-agent`. Make sure to update the annotations of both the `velero` deployment and the `node-agent` daemonset.
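
The annotations for the two workloads could look like the following sketch. It assumes that the containers are named `velero` and `node-agent`, that both expose metrics on port 8085, and that the check accepts an `openmetrics_endpoint` option. Confirm these against your manifests and the sample configuration before using them.

```yaml
# Pod template of the velero Deployment (fragment).
# The "velero" in the annotation key is the assumed container name.
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/velero.checks: |
          {
            "velero": {
              "init_config": {},
              "instances": [
                {"openmetrics_endpoint": "http://%%host%%:8085/metrics"}
              ]
            }
          }
---
# Pod template of the node-agent DaemonSet (fragment).
# The "node-agent" in the annotation key is the assumed container name.
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/node-agent.checks: |
          {
            "velero": {
              "init_config": {},
              "instances": [
                {"openmetrics_endpoint": "http://%%host%%:8085/metrics"}
              ]
            }
          }
```

The `%%host%%` template variable is resolved by Autodiscovery to the pod's IP address, so the same annotation works for every replica.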
{% /tab %}

#### Logs{% #logs %}

The Velero integration can collect logs from the Velero pods.

{% tab title="Host" %}
To collect logs from Velero containers on a host:

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `velero.d/conf.yaml` file. For example:

   ```yaml
   logs:
     - type: docker
       source: velero
       service: velero
   ```

{% /tab %}

{% tab title="Kubernetes" %}
To collect logs from a Velero Kubernetes deployment:

1. Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Kubernetes Log Collection](https://docs.datadoghq.com/agent/kubernetes/log.md#setup).

1. Set Log Integrations as pod annotations. This can also be configured with a file, a ConfigMap, or a key-value store. For more information, see the configuration section of [Kubernetes Log Collection](https://docs.datadoghq.com/agent/kubernetes/log.md#configuration).
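
For example, a log annotation on the Velero pod template might look like this sketch. The container name `velero` in the annotation key is an assumption. The `source` and `service` values mirror the host example.

```yaml
# Pod template annotations (fragment); "velero" is the assumed container name.
metadata:
  annotations:
    ad.datadoghq.com/velero.logs: '[{"source": "velero", "service": "velero"}]'
```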

{% /tab %}

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `velero` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics-1 %}

This integration collects various Velero metrics, including:

- **Backup**: Success/failure rates, durations, and data sizes.
- **Restore**: Success/failure counts and validation failures.
- **Snapshot**: CSI and volume snapshot attempts, successes, and failures.
- **Pod volume data**: Upload/download success and failure rates. These are exposed by the `node-agent` pods.

| Metric | Description |
| --- | --- |
| **velero.backup.amount** (gauge) | Current number of existing backups |
| **velero.backup.attempt.count** (count) | Total number of attempted backups |
| **velero.backup.deletion.attempt.count** (count) | Total number of attempted backup deletions |
| **velero.backup.deletion.failure.count** (count) | Total number of failed backup deletions |
| **velero.backup.deletion.success.count** (count) | Total number of successful backup deletions |
| **velero.backup.duration.seconds.bucket** (count) | Histogram bucket for time taken to complete backup, in seconds |
| **velero.backup.duration.seconds.count** (count) | Count aggregation for time taken to complete backup |
| **velero.backup.duration.seconds.sum** (count) | Cumulative sum of time taken to complete backup, in seconds. *Shown as second* |
| **velero.backup.failure.count** (count) | Total number of failed backups |
| **velero.backup.items** (gauge) | Total number of items backed up |
| **velero.backup.items.errors** (gauge) | Total number of errors encountered during backup. *Shown as error* |
| **velero.backup.last\_status** (gauge) | Last status of the backup. A value of 1 is success, 0 is failure |
| **velero.backup.last\_successful\_timestamp** (gauge) | Last time a backup ran successfully, as a Unix timestamp in seconds |
| **velero.backup.partial\_failure.count** (count) | Total number of partially failed backups |
| **velero.backup.success.count** (count) | Total number of successful backups |
| **velero.backup.tarball\_size\_bytes** (gauge) | Size of a backup, in bytes. *Shown as byte* |
| **velero.backup.validation\_failure.count** (count) | Total number of backups that failed validation |
| **velero.backup.warning.count** (count) | Total number of backups with warnings |
| **velero.csi\_snapshot.attempt.count** (count) | Total number of attempted CSI volume snapshots |
| **velero.csi\_snapshot.failure.count** (count) | Total number of failed CSI volume snapshots |
| **velero.csi\_snapshot.success.count** (count) | Total number of successful CSI volume snapshots |
| **velero.pod\_volume.backup.dequeue.count** (count) | Total number of `pod_volume_backup` objects dequeued |
| **velero.pod\_volume.backup.enqueue.count** (count) | Total number of `pod_volume_backup` objects enqueued |
| **velero.pod\_volume.data.download.cancel.count** (count) | Total number of canceled snapshot downloads |
| **velero.pod\_volume.data.download.failure.count** (count) | Total number of failed snapshot downloads |
| **velero.pod\_volume.data.download.success.count** (count) | Total number of successful snapshot downloads |
| **velero.pod\_volume.data.upload.cancel.count** (count) | Total number of canceled snapshot uploads |
| **velero.pod\_volume.data.upload.failure.count** (count) | Total number of failed snapshot uploads |
| **velero.pod\_volume.data.upload.success.count** (count) | Total number of successful snapshot uploads |
| **velero.pod\_volume.operation\_latency.seconds.bucket** (count) | Histogram bucket for time taken to complete pod volume operations, in seconds |
| **velero.pod\_volume.operation\_latency.seconds.count** (count) | Count aggregation for time taken to complete pod volume operations |
| **velero.pod\_volume.operation\_latency.seconds.gauge** (gauge) | Time taken to perform pod volume operations, in seconds. *Shown as second* |
| **velero.pod\_volume.operation\_latency.seconds.sum** (count) | Sum aggregation for time taken to complete pod volume operations, in seconds. *Shown as second* |
| **velero.restore.amount** (gauge) | Current number of existing restores |
| **velero.restore.attempt.count** (count) | Total number of attempted restores |
| **velero.restore.failed.count** (count) | Total number of failed restores |
| **velero.restore.partial\_failure.count** (count) | Total number of partially failed restores |
| **velero.restore.success.count** (count) | Total number of successful restores |
| **velero.restore.validation\_failed.count** (count) | Total number of restores that failed validation |
| **velero.volume\_snapshot.attempt.count** (count) | Total number of attempted volume snapshots |
| **velero.volume\_snapshot.failure.count** (count) | Total number of failed volume snapshots |
| **velero.volume\_snapshot.success.count** (count) | Total number of successful volume snapshots |

### Events{% #events %}

The Velero integration does not include any events.

### Service Checks{% #service-checks %}

**velero.openmetrics.health**

Returns `CRITICAL` if the Agent is unable to connect to the Velero OpenMetrics endpoint, otherwise returns `OK`.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Make sure that your Velero server is exposing metrics by checking that the feature is enabled in the deployment configuration:

```yaml
# Settings for Velero's Prometheus metrics. Enabled by default.
metrics:
  enabled: true
  scrapeInterval: 30s
  scrapeTimeout: 10s
```

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).