---
title: Nvidia NIM
description: >-
  NVIDIA NIM integration with Datadog enables real-time GPU observability by
  collecting Prometheus metrics for monitoring.
breadcrumbs: Docs > Integrations > Nvidia NIM
---

# NVIDIA NIM

**Integration version**: 2.4.1
## Overview{% #overview %}

This check monitors [NVIDIA NIM](https://docs.nvidia.com/nim/large-language-models/latest/observability.html) through the Datadog Agent.

**Minimum Agent version:** 7.61.0

## Setup{% #setup %}

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on applying these instructions.

**Requirements**:

- This check requires Agent v7.61.0+
- This check uses [OpenMetrics](https://docs.datadoghq.com/integrations/openmetrics.md) for metric collection, which requires Python 3.

### Installation{% #installation %}

The NVIDIA NIM check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

#### LLM Observability: Get end-to-end visibility into your LLM application's calls to NVIDIA NIM{% #llm-observability-get-end-to-end-visibility-into-your-llm-applications-calls-to-nvidia-nim %}

Applications typically call [NVIDIA NIM](https://www.nvidia.com/en-us/ai/) through its OpenAI-compatible API using the OpenAI client. To monitor an application that uses NVIDIA NIM and set up LLM Observability, follow the instructions in the [OpenAI integration](https://docs.datadoghq.com/integrations/openai.md) documentation.
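As a rough illustration of the OpenAI-style calls involved, the sketch below builds a chat completion request against a NIM endpoint using only the standard library. The base URL, path, and model name are assumptions for a default local deployment, not values from this integration:

```python
import json
from urllib import request

def nim_chat_request(base_url, model, prompt):
    """Build an OpenAI-style chat completion request for a NIM endpoint.
    base_url and model are illustrative assumptions; adjust for your deployment."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base_url}/v1/chat/completions",  # OpenAI-compatible path assumed
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Hypothetical local NIM deployment and model name:
req = nim_chat_request("http://localhost:8000", "meta/llama3-8b-instruct", "Hello")
```

In practice you would send `req` with `urllib.request.urlopen` (or use the OpenAI client directly, as the OpenAI integration docs describe); the point here is only the request shape.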

### Configuration{% #configuration %}

NVIDIA NIM provides Prometheus [metrics](https://docs.nvidia.com/nim/large-language-models/latest/observability.html) indicating request statistics. By default, these metrics are available at `http://localhost:8000/metrics`. The Datadog Agent can collect the exposed metrics using this integration. Follow the instructions below to configure the Agent to collect them.

To start collecting your NVIDIA NIM performance data:

1. Edit the `nvidia_nim.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory. See the [sample nvidia_nim.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/nvidia_nim/datadog_checks/nvidia_nim/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
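A minimal `nvidia_nim.d/conf.yaml` might look like the following sketch. The endpoint value assumes the default metrics address mentioned above; `openmetrics_endpoint` is the standard option for OpenMetrics-based checks, but confirm option names against the linked sample file:

```yaml
init_config:

instances:
    # Prometheus endpoint exposed by NIM (default shown; adjust host/port).
  - openmetrics_endpoint: http://localhost:8000/metrics
```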

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `nvidia_nim` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **nvidia\_nim.e2e\_request\_latency.seconds.bucket**(count)    | The observations of end-to-end request latency bucketed by seconds. |
| **nvidia\_nim.e2e\_request\_latency.seconds.count**(count)     | The total number of observations of end-to-end request latency.     |
| **nvidia\_nim.e2e\_request\_latency.seconds.sum**(count)       | The sum of end-to-end request latency in seconds.*Shown as second*  |
| **nvidia\_nim.generation\_tokens.count**(count)                | Number of generation tokens processed.*Shown as token*              |
| **nvidia\_nim.gpu\_cache\_usage\_percent**(gauge)              | GPU KV-cache usage. 1 means 100 percent usage.*Shown as fraction*   |
| **nvidia\_nim.num\_request.max**(gauge)                        | The max number of concurrently running requests.*Shown as request*  |
| **nvidia\_nim.num\_requests.running**(gauge)                   | Number of requests currently running on GPU.*Shown as request*      |
| **nvidia\_nim.num\_requests.waiting**(gauge)                   | Number of requests waiting.*Shown as request*                       |
| **nvidia\_nim.process.cpu\_seconds.count**(count)              | Total user and system CPU time spent in seconds.*Shown as second*   |
| **nvidia\_nim.process.max\_fds**(gauge)                        | Maximum number of open file descriptors.*Shown as file*             |
| **nvidia\_nim.process.open\_fds**(gauge)                       | Number of open file descriptors.*Shown as file*                     |
| **nvidia\_nim.process.resident\_memory\_bytes**(gauge)         | Resident memory size in bytes.*Shown as byte*                       |
| **nvidia\_nim.process.start\_time\_seconds**(gauge)            | Time in seconds since process started.*Shown as second*             |
| **nvidia\_nim.process.virtual\_memory\_bytes**(gauge)          | Virtual memory size in bytes.*Shown as byte*                        |
| **nvidia\_nim.prompt\_tokens.count**(count)                    | Number of prefill tokens processed.*Shown as token*                 |
| **nvidia\_nim.python.gc.collections.count**(count)             | Number of times this generation was collected.                      |
| **nvidia\_nim.python.gc.objects.collected.count**(count)       | Objects collected during GC.                                        |
| **nvidia\_nim.python.gc.objects.uncollectable.count**(count)   | Uncollectable objects found during GC.                              |
| **nvidia\_nim.python.info**(gauge)                             | Python platform information.                                        |
| **nvidia\_nim.request.failure.count**(count)                   | The count of failed requests.*Shown as request*                     |
| **nvidia\_nim.request.finish.count**(count)                    | The count of finished requests.*Shown as request*                   |
| **nvidia\_nim.request.generation\_tokens.bucket**(count)       | Number of generation tokens processed.                              |
| **nvidia\_nim.request.generation\_tokens.count**(count)        | Number of generation tokens processed.                              |
| **nvidia\_nim.request.generation\_tokens.sum**(count)          | Number of generation tokens processed.*Shown as token*              |
| **nvidia\_nim.request.prompt\_tokens.bucket**(count)           | Number of prefill tokens processed.                                 |
| **nvidia\_nim.request.prompt\_tokens.count**(count)            | Number of prefill tokens processed.                                 |
| **nvidia\_nim.request.prompt\_tokens.sum**(count)              | Number of prefill tokens processed.*Shown as token*                 |
| **nvidia\_nim.request.success.count**(count)                   | Count of successfully processed requests.                           |
| **nvidia\_nim.time\_per\_output\_token.seconds.bucket**(count) | The observations of time per output token bucketed by seconds.      |
| **nvidia\_nim.time\_per\_output\_token.seconds.count**(count)  | The total number of observations of time per output token.          |
| **nvidia\_nim.time\_per\_output\_token.seconds.sum**(count)    | The sum of time per output token in seconds.*Shown as second*       |
| **nvidia\_nim.time\_to\_first\_token.seconds.bucket**(count)   | The observations of time to first token bucketed by seconds.        |
| **nvidia\_nim.time\_to\_first\_token.seconds.count**(count)    | The total number of observations of time to first token.            |
| **nvidia\_nim.time\_to\_first\_token.seconds.sum**(count)      | The sum of time to first token in seconds.*Shown as second*         |
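The `.sum`/`.count` pairs above come from Prometheus histograms, so the mean value over a time window is the change in the sum divided by the change in the count. For example, for mean end-to-end request latency (a generic histogram calculation, sketched here for illustration):

```python
def mean_from_histogram(sum_start, sum_end, count_start, count_end):
    """Mean value over a window from two samples of a Prometheus histogram's
    sum/count pair (e.g. nvidia_nim.e2e_request_latency.seconds). Illustrative."""
    delta_count = count_end - count_start
    if delta_count == 0:
        return None  # no observations completed in the window
    return (sum_end - sum_start) / delta_count

# 40 requests finished in the window, adding 10 s of total latency:
avg = mean_from_histogram(100.0, 110.0, 400, 440)  # 0.25 s mean latency
```

The same arithmetic applies to `time_to_first_token.seconds` and `time_per_output_token.seconds`.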

### Events{% #events %}

The NVIDIA NIM integration does not include any events.

### Service Checks{% #service-checks %}

**nvidia\_nim.openmetrics.health**

Returns `CRITICAL` if the Agent is unable to connect to the NVIDIA NIM OpenMetrics endpoint, otherwise returns `OK`.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
