---
title: Prefect
description: A Python-first workflow orchestration platform for data, ML, and automation.
breadcrumbs: Docs > Integrations > Prefect
---

# Prefect
Integration version: 1.0.0
{% callout %}
**Important note for users on the following Datadog sites: us2.ddog-gov.com**

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}
*Figure: Prefect dashboard (light mode and dark mode).*
## Overview{% #overview %}

This check monitors [Prefect Server](https://www.prefect.io/) through the Datadog Agent.

Prefect is a Python-first workflow orchestration platform used to schedule and execute flows and tasks across work pools, work queues, and workers. This integration collects orchestration health and performance metrics and events directly from the [Prefect Server API](https://docs.prefect.io/v3/api-ref/rest-api/server/index) and supports [log collection](https://docs.datadoghq.com/logs/log_collection.md) for comprehensive monitoring.
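
The `prefect.server.health` and `prefect.server.ready` gauges described below amount to probing HTTP endpoints on the Prefect Server API and mapping the response to 1 or 0. The sketch below illustrates that idea only; the endpoint path and default address are assumptions, not the integration's actual implementation (consult the Prefect REST API reference):

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 1 for a 2xx response and 0 otherwise -- the shape of a
    health/readiness gauge such as prefect.server.health."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 1 if 200 <= resp.status < 300 else 0
    except OSError:
        # Connection errors and non-2xx responses both count as unhealthy.
        return 0


# Hypothetical default Prefect Server address; adjust to your deployment.
health = probe("http://localhost:4200/api/health", timeout=2.0)
```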

### What this integration monitors{% #what-this-integration-monitors %}

The integration collects metrics across multiple layers of the Prefect orchestration hierarchy:

- **Server health**: API readiness and health status to confirm the control plane is operational.
- **Work pool layer**: Pool readiness, paused or not-ready state, and aggregated worker availability to detect capacity or configuration issues.
- **Worker layer**: Online or offline status and heartbeat age to identify lost or unhealthy workers.
- **Work queue layer**: Backlog size, backlog age, last polled age, concurrency utilization, and queue state (ready, paused or not-ready) to detect congestion, starvation, and stalled consumers.
- **Deployment and flow layer**: Flow run counts by state (running, completed, failed, crashed, etc.), throughput, late starts, execution duration, queue wait time, and retry gaps to track reliability and latency percentiles.
- **Task layer**: Task run counts by state, throughput, execution duration, and dependency wait time to enable drilldowns from slow flows to individual task bottlenecks.
- **Events**: Prefect events for state transitions and lifecycle changes.
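
The per-state run counts above can be pictured as a simple aggregation over run objects returned by the server. The response shape used here is an assumption for illustration only, not the integration's actual implementation:

```python
from collections import Counter


def runs_by_state(flow_runs: list[dict]) -> Counter:
    """Tally flow runs by state name, mirroring metrics such as
    prefect.server.flow_runs.completed.count."""
    return Counter(run["state"]["name"].lower() for run in flow_runs)


# Illustrative payload; real flow run objects carry many more fields.
sample = [
    {"state": {"name": "Completed"}},
    {"state": {"name": "Failed"}},
    {"state": {"name": "Completed"}},
]
counts = runs_by_state(sample)  # Counter({'completed': 2, 'failed': 1})
```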

## Setup{% #setup %}

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/containers/kubernetes/integrations.md) for guidance on applying these instructions.

### Installation{% #installation %}

The Prefect check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

### Configuration{% #configuration %}

1. Edit the `prefect.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your Prefect performance data. See the [sample prefect.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/prefect/datadog_checks/prefect/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/configuration/agent-commands.md#start-stop-and-restart-the-agent).
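
A minimal instance might look like the sketch below. The `prefect_api_url` option name is a guess at a typical pattern, and `collect_events` comes from the Events section of this page; verify both against the sample `conf.yaml` linked above before use:

```yaml
instances:
    # Hypothetical option name; check conf.yaml.example for the real key.
  - prefect_api_url: http://localhost:4200/api
    # Event collection is off by default (see the Events section).
    collect_events: false
```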

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/configuration/agent-commands.md#agent-status-and-information) and look for `prefect` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **prefect.server.deployment.is\_ready** (gauge) | Indicates whether the deployment is ready. Submits 1 when ready and 0 when not. |
| **prefect.server.flow\_runs.cancelled.count** (count) | Number of flow runs that were cancelled in the collection interval. *Shown as run* |
| **prefect.server.flow\_runs.completed.count** (count) | Number of flow runs that completed successfully in the collection interval. *Shown as run* |
| **prefect.server.flow\_runs.crashed.count** (count) | Number of flow runs that crashed unexpectedly in the collection interval. *Shown as run* |
| **prefect.server.flow\_runs.execution\_duration.95percentile** (gauge) | 95th percentile execution time of flow runs. *Shown as second* |
| **prefect.server.flow\_runs.execution\_duration.avg** (gauge) | Average execution time of flow runs. *Shown as second* |
| **prefect.server.flow\_runs.execution\_duration.count** (gauge) | Count of flow run execution duration samples. *Shown as run* |
| **prefect.server.flow\_runs.execution\_duration.max** (gauge) | Maximum execution time of flow runs. *Shown as second* |
| **prefect.server.flow\_runs.execution\_duration.median** (gauge) | Median execution time of flow runs. *Shown as second* |
| **prefect.server.flow\_runs.failed.count** (count) | Number of flow runs that failed in the collection interval. *Shown as run* |
| **prefect.server.flow\_runs.late\_start.count** (count) | Number of flow runs that started later than their scheduled time. *Shown as run* |
| **prefect.server.flow\_runs.paused** (gauge) | Number of flow runs currently paused. *Shown as run* |
| **prefect.server.flow\_runs.pending** (gauge) | Number of flow runs currently in the pending state. *Shown as run* |
| **prefect.server.flow\_runs.queue\_wait\_duration.95percentile** (gauge) | 95th percentile time a flow run spent waiting in the queue before starting. *Shown as second* |
| **prefect.server.flow\_runs.queue\_wait\_duration.avg** (gauge) | Average time a flow run spent waiting in the queue before starting. *Shown as second* |
| **prefect.server.flow\_runs.queue\_wait\_duration.count** (gauge) | Count of flow run queue wait duration samples. *Shown as run* |
| **prefect.server.flow\_runs.queue\_wait\_duration.max** (gauge) | Maximum time a flow run spent waiting in the queue before starting. *Shown as second* |
| **prefect.server.flow\_runs.queue\_wait\_duration.median** (gauge) | Median time a flow run spent waiting in the queue before starting. *Shown as second* |
| **prefect.server.flow\_runs.retry\_gaps\_duration.95percentile** (gauge) | 95th percentile time gap between consecutive retries of the same flow run. *Shown as second* |
| **prefect.server.flow\_runs.retry\_gaps\_duration.avg** (gauge) | Average time gap between consecutive retries of the same flow run. *Shown as second* |
| **prefect.server.flow\_runs.retry\_gaps\_duration.count** (gauge) | Count of flow run retry gap samples. *Shown as occurrence* |
| **prefect.server.flow\_runs.retry\_gaps\_duration.max** (gauge) | Maximum time gap between consecutive retries of the same flow run. *Shown as second* |
| **prefect.server.flow\_runs.retry\_gaps\_duration.median** (gauge) | Median time gap between consecutive retries of the same flow run. *Shown as second* |
| **prefect.server.flow\_runs.running** (gauge) | Number of flow runs currently in the running state. *Shown as run* |
| **prefect.server.flow\_runs.scheduled** (gauge) | Number of flow runs currently in the scheduled state. *Shown as run* |
| **prefect.server.flow\_runs.throughput** (count) | Count of flow runs started per second. *Shown as run* |
| **prefect.server.health** (gauge) | Indicates that the Prefect API is responding to requests. Submits 1 when healthy and 0 when not. |
| **prefect.server.ready** (gauge) | Indicates that the Prefect API is able to accept and process work. Submits 1 when ready and 0 when not. |
| **prefect.server.task\_runs.cancelled.count** (count) | Number of task runs that were cancelled in the collection interval. *Shown as run* |
| **prefect.server.task\_runs.completed.count** (count) | Number of task runs that completed successfully in the collection interval. *Shown as run* |
| **prefect.server.task\_runs.crashed.count** (count) | Number of task runs that crashed unexpectedly in the collection interval. *Shown as run* |
| **prefect.server.task\_runs.dependency\_wait\_duration.95percentile** (gauge) | 95th percentile time a task run waited after its latest upstream dependency completed. *Shown as second* |
| **prefect.server.task\_runs.dependency\_wait\_duration.avg** (gauge) | Average time a task run waited after its latest upstream dependency completed. *Shown as second* |
| **prefect.server.task\_runs.dependency\_wait\_duration.count** (gauge) | Count of task run dependency wait duration samples. *Shown as occurrence* |
| **prefect.server.task\_runs.dependency\_wait\_duration.max** (gauge) | Maximum time a task run waited after its latest upstream dependency completed. *Shown as second* |
| **prefect.server.task\_runs.dependency\_wait\_duration.median** (gauge) | Median time a task run waited after its latest upstream dependency completed. *Shown as second* |
| **prefect.server.task\_runs.execution\_duration.95percentile** (gauge) | 95th percentile execution time of individual task runs. *Shown as second* |
| **prefect.server.task\_runs.execution\_duration.avg** (gauge) | Average execution time of individual task runs. *Shown as second* |
| **prefect.server.task\_runs.execution\_duration.count** (gauge) | Count of task run execution duration samples. *Shown as run* |
| **prefect.server.task\_runs.execution\_duration.max** (gauge) | Maximum execution time of individual task runs. *Shown as second* |
| **prefect.server.task\_runs.execution\_duration.median** (gauge) | Median execution time of individual task runs. *Shown as second* |
| **prefect.server.task\_runs.failed.count** (count) | Number of task runs that failed in the collection interval. *Shown as run* |
| **prefect.server.task\_runs.late\_start.count** (count) | Number of task runs that started later than their scheduled time. *Shown as run* |
| **prefect.server.task\_runs.paused** (gauge) | Number of task runs currently paused. *Shown as run* |
| **prefect.server.task\_runs.pending** (gauge) | Number of task runs currently in the pending state. *Shown as run* |
| **prefect.server.task\_runs.running** (gauge) | Number of task runs currently in the running state. *Shown as run* |
| **prefect.server.task\_runs.throughput** (count) | Count of task runs started per second. *Shown as run* |
| **prefect.server.work\_pool.is\_not\_ready** (gauge) | Whether the work pool is not ready to accept and dispatch flow runs. Submits 1 when true and 0 when false. |
| **prefect.server.work\_pool.is\_paused** (gauge) | Whether the work pool is paused. Submits 1 when true and 0 when false. |
| **prefect.server.work\_pool.is\_ready** (gauge) | Whether the work pool is ready to accept and dispatch flow runs. Submits 1 when true and 0 when false. |
| **prefect.server.work\_pool.worker.heartbeat\_age\_seconds** (gauge) | Time since the worker last sent a heartbeat. *Shown as second* |
| **prefect.server.work\_pool.worker.is\_online** (gauge) | Whether the worker is online. Submits 1 when true and 0 when false. |
| **prefect.server.work\_queue.backlog.age** (gauge) | Age of the oldest item in the queue backlog. *Shown as second* |
| **prefect.server.work\_queue.backlog.size** (gauge) | Number of flow runs waiting in the queue backlog. *Shown as run* |
| **prefect.server.work\_queue.concurrency.in\_use** (gauge) | Percentage of concurrency in use by the queue. *Shown as percent* |
| **prefect.server.work\_queue.is\_not\_ready** (gauge) | Whether the work queue is not ready to accept and dispatch flow runs. Submits 1 when true and 0 when false. |
| **prefect.server.work\_queue.is\_paused** (gauge) | Whether the work queue is paused. Submits 1 when true and 0 when false. |
| **prefect.server.work\_queue.is\_ready** (gauge) | Whether the work queue is ready to accept and dispatch flow runs. Submits 1 when true and 0 when false. |
| **prefect.server.work\_queue.last\_polled\_age\_seconds** (gauge) | Time elapsed since any worker last polled the queue. *Shown as second* |
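
As an example of putting these metrics to work, you can alert when a work queue's backlog stays large. The following is a sketch of a Datadog monitor query; the `work_queue` tag name is an assumption about how the integration tags queue-level metrics, so confirm the actual tag in your metrics explorer before building a monitor:

```
avg(last_15m):avg:prefect.server.work_queue.backlog.size{*} by {work_queue} > 100
```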

### Logs{% #logs %}

1. Enable log collection in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `prefect.d/conf.yaml` file. For example:

   ```yaml
   logs:
     - type: docker
       source: prefect
       service: <SERVICE>
   ```
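
If Prefect Server runs directly on the host rather than in Docker, a file-based configuration may fit better. The log path below is purely illustrative; point it at wherever your deployment writes Prefect logs:

```yaml
logs:
  - type: file
    # Hypothetical path; replace with your Prefect Server log location.
    path: /var/log/prefect/server.log
    source: prefect
    service: <SERVICE>
```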

### Events{% #events %}

The Prefect integration includes event support. Events are disabled by default; to enable them, set `collect_events` to `true` in the configuration.

After you enable event collection, the integration submits flow-run, task-run, and ready or not-ready events. You can customize which events are submitted by adding or removing entries in the configuration.
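
Concretely, enabling events is a one-line change in `prefect.d/conf.yaml`, followed by an Agent restart. The `collect_events` key comes from this page; its placement under `instances:` follows the usual per-instance layout and should be confirmed against the sample configuration file:

```yaml
instances:
  - collect_events: true
```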

### Service Checks{% #service-checks %}

The Prefect integration does not include any service checks.

## Uninstallation{% #uninstallation %}

To disable the integration, rename the configuration file from `conf.yaml` to `conf.yaml.example` in the `prefect.d/` folder, then [restart the Agent](https://docs.datadoghq.com/agent/configuration/agent-commands.md#start-stop-and-restart-the-agent). Alternatively, if you are running a containerized environment, remove the annotation used to enable the integration.

## Support{% #support %}

Need help? Contact [Datadog Support](https://app.datadoghq.com/help).
