---
title: vLLM
description: vLLM is a library for LLM inference and serving
breadcrumbs: Docs > Integrations > vLLM
---

# vLLM
**Integration version:** 3.4.1
{% callout %}
# Important note for users on the following Datadog sites: us2.ddog-gov.com

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

This check monitors [vLLM](https://docs.vllm.ai/en/stable/) through the Datadog Agent.

**Minimum Agent version:** 7.56.0

## Setup{% #setup %}

Follow the instructions below to install and configure this check for an Agent running on a host.

### Installation{% #installation %}

The vLLM check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

### Configuration{% #configuration %}

1. To start collecting vLLM performance data, edit the `vllm.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory. See the [sample vllm.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/vllm/datadog_checks/vllm/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
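
As a rough illustration of the configuration step, a minimal `vllm.d/conf.yaml` might look like the sketch below. It assumes the vLLM server exposes its Prometheus-format metrics at `http://localhost:8000/metrics`; the host and port are assumptions, so adjust them to match your deployment. The sample configuration file linked above remains the authoritative list of options.

```yaml
# conf.d/vllm.d/conf.yaml (minimal sketch, not a complete configuration)
init_config:

instances:
    # Assumed endpoint: replace the host and port with the address
    # where your vLLM server actually serves its metrics.
  - openmetrics_endpoint: http://localhost:8000/metrics
```

One instance entry is needed per vLLM server that the Agent should scrape.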

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `vllm` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric                                                   | Description                                                              |
| -------------------------------------------------------- | ------------------------------------------------------------------------ |
| **vllm.avg.generation\_throughput.toks\_per\_s** (gauge) | Average generation throughput in tokens/s                                |
| **vllm.avg.prompt.throughput.toks\_per\_s** (gauge)      | Average prefill throughput in tokens/s                                   |
| **vllm.cache\_config\_info** (gauge)                     | Information on cache config                                              |
| **vllm.cpu\_cache\_usage\_perc** (gauge)                 | CPU KV-cache usage. 1 means 100 percent usage. *Shown as percent*        |
| **vllm.e2e\_request\_latency.seconds.bucket** (count)    | The observations of end-to-end request latency bucketed by seconds.      |
| **vllm.e2e\_request\_latency.seconds.count** (count)     | The total number of observations of end-to-end request latency.          |
| **vllm.e2e\_request\_latency.seconds.sum** (count)       | The sum of end-to-end request latency in seconds. *Shown as second*      |
| **vllm.generation\_tokens.count** (count)                | Number of generation tokens processed.                                   |
| **vllm.gpu\_cache\_usage\_perc** (gauge)                 | GPU KV-cache usage. 1 means 100 percent usage. *Shown as percent*        |
| **vllm.num\_preemptions.count** (count)                  | Cumulative number of preemptions from the engine.                        |
| **vllm.num\_requests.running** (gauge)                   | Number of requests currently running on GPU.                             |
| **vllm.num\_requests.swapped** (gauge)                   | Number of requests swapped to CPU.                                       |
| **vllm.num\_requests.waiting** (gauge)                   | Number of requests waiting.                                              |
| **vllm.process.cpu\_seconds.count** (count)              | Total user and system CPU time spent in seconds. *Shown as second*       |
| **vllm.process.max\_fds** (gauge)                        | Maximum number of open file descriptors. *Shown as file*                 |
| **vllm.process.open\_fds** (gauge)                       | Number of open file descriptors. *Shown as file*                         |
| **vllm.process.resident\_memory\_bytes** (gauge)         | Resident memory size in bytes. *Shown as byte*                           |
| **vllm.process.start\_time\_seconds** (gauge)            | Start time of the process since Unix epoch in seconds. *Shown as second* |
| **vllm.process.virtual\_memory\_bytes** (gauge)          | Virtual memory size in bytes. *Shown as byte*                            |
| **vllm.prompt\_tokens.count** (count)                    | Number of prefill tokens processed.                                      |
| **vllm.python.gc.collections.count** (count)             | Number of times this generation was collected                            |
| **vllm.python.gc.objects.collected.count** (count)       | Objects collected during GC                                              |
| **vllm.python.gc.objects.uncollectable.count** (count)   | Uncollectable objects found during GC                                    |
| **vllm.python.info** (gauge)                             | Python platform information                                              |
| **vllm.request.generation\_tokens.bucket** (count)       | Number of generation tokens processed.                                   |
| **vllm.request.generation\_tokens.count** (count)        | Number of generation tokens processed.                                   |
| **vllm.request.generation\_tokens.sum** (count)          | Number of generation tokens processed.                                   |
| **vllm.request.params.best\_of.bucket** (count)          | Histogram of the `best_of` request parameter.                            |
| **vllm.request.params.best\_of.count** (count)           | Histogram of the `best_of` request parameter.                            |
| **vllm.request.params.best\_of.sum** (count)             | Histogram of the `best_of` request parameter.                            |
| **vllm.request.params.n.bucket** (count)                 | Histogram of the `n` request parameter.                                  |
| **vllm.request.params.n.count** (count)                  | Histogram of the `n` request parameter.                                  |
| **vllm.request.params.n.sum** (count)                    | Histogram of the `n` request parameter.                                  |
| **vllm.request.prompt\_tokens.bucket** (count)           | Number of prefill tokens processed.                                      |
| **vllm.request.prompt\_tokens.count** (count)            | Number of prefill tokens processed.                                      |
| **vllm.request.prompt\_tokens.sum** (count)              | Number of prefill tokens processed.                                      |
| **vllm.request.success.count** (count)                   | Count of successfully processed requests.                                |
| **vllm.time\_per\_output\_token.seconds.bucket** (count) | The observations of time per output token bucketed by seconds.           |
| **vllm.time\_per\_output\_token.seconds.count** (count)  | The total number of observations of time per output token.               |
| **vllm.time\_per\_output\_token.seconds.sum** (count)    | The sum of time per output token in seconds. *Shown as second*           |
| **vllm.time\_to\_first\_token.seconds.bucket** (count)   | The observations of time to first token bucketed by seconds.             |
| **vllm.time\_to\_first\_token.seconds.count** (count)    | The total number of observations of time to first token.                 |
| **vllm.time\_to\_first\_token.seconds.sum** (count)      | The sum of time to first token in seconds. *Shown as second*             |

### Events{% #events %}

The vLLM integration does not include any events.

### Service Checks{% #service-checks %}

The vLLM integration includes the following service check.

**vllm.openmetrics.health**

Returns `CRITICAL` if the Agent is unable to connect to the vLLM OpenMetrics endpoint, otherwise returns `OK`.

*Statuses: ok, critical*

### Logs{% #logs %}

Log collection is disabled by default in the Datadog Agent. If you are running your Agent as a container, see [container installation](https://docs.datadoghq.com/containers/docker/log.md?tab=containerinstallation#installation) to enable log collection. If you are running a host Agent, see [host Agent](https://docs.datadoghq.com/containers/docker/log.md?tab=hostagent#installation) instead. In either case, make sure that the `source` value for your logs is `vllm`. This setting ensures that the built-in processing pipeline finds your logs. To set your log configuration for a container, see [log integrations](https://docs.datadoghq.com/containers/docker/log.md?tab=dockerfile#log-integrations).
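
For a host Agent, a log configuration along the following lines could be added to `vllm.d/conf.yaml`. This is only a sketch: it assumes log collection has already been enabled with `logs_enabled: true` in `datadog.yaml` and that vLLM writes its logs to a file. The path and service name are hypothetical placeholders; only the `source: vllm` value comes from the requirement described above.

```yaml
# conf.d/vllm.d/conf.yaml (logs section only, illustrative sketch)
logs:
  - type: file
    # Hypothetical path: point this at wherever your vLLM process writes logs.
    path: /var/log/vllm/server.log
    # Must be "vllm" so the built-in processing pipeline finds the logs.
    source: vllm
    # Hypothetical service name: choose one that fits your tagging scheme.
    service: vllm-server
```

Restart the Agent after changing the file so the new log configuration is picked up.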

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Optimize LLM application performance with Datadog's vLLM integration](https://www.datadoghq.com/blog/vllm-integration/)