NVIDIA NIM

Supported OS: Linux, Windows, macOS

Integration version: 1.0.0

Overview

This check monitors NVIDIA NIM through the Datadog Agent.

Setup

This integration is currently in Preview, and its availability is subject to change.

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Requirements:

  • This check requires Agent v7.61.0+
  • This check uses OpenMetrics for metric collection, which requires Python 3.

Installation

The NVIDIA NIM check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

NVIDIA NIM exposes Prometheus metrics with request statistics. By default, these metrics are available at http://localhost:8000/metrics, and the Datadog Agent collects them through this integration. Follow the instructions below to configure data collection.
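To confirm which metrics the endpoint exposes before configuring the Agent, a minimal Python sketch using only the standard library can fetch the page and list the sample names. The URL is the default stated above; the helper names `fetch_metrics_text` and `sample_names` are hypothetical, and the parser only handles the simple `name{labels} value` and `name value` line shapes of the Prometheus text format, not every edge case of the specification.

```python
import urllib.request

# Default endpoint described above; adjust host and port for your deployment.
NIM_METRICS_URL = "http://localhost:8000/metrics"


def fetch_metrics_text(url: str = NIM_METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sample_names(text: str) -> set[str]:
    """Collect the metric names of all samples, skipping comments and blank lines."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a # HELP / # TYPE comment
        # A sample line is `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names
```

On the host running NIM, `print(sorted(sample_names(fetch_metrics_text())))` lists what the endpoint currently serves, which helps when a metric from the list below is missing.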

To start collecting your NVIDIA NIM performance data:

  1. Edit the nvidia_nim.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory. See the sample nvidia_nim.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
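The sample file lists every option; as a minimal sketch, the Python below renders a single-instance nvidia_nim.d/conf.yaml pointing at the default endpoint. It assumes the instance option is named `openmetrics_endpoint`, as in other Datadog OpenMetrics-based checks, and that the Agent's configuration directory is `/etc/datadog-agent` (the Linux default); verify both against the sample conf.yaml and your platform before writing the file. The helpers `render_conf` and `write_conf` are hypothetical.

```python
from pathlib import Path

# Assumption: Linux default Agent config root; Windows and macOS use other paths.
CONF_PATH = Path("/etc/datadog-agent/conf.d/nvidia_nim.d/conf.yaml")
DEFAULT_ENDPOINT = "http://localhost:8000/metrics"


def render_conf(endpoint: str = DEFAULT_ENDPOINT) -> str:
    """Return minimal YAML for one NVIDIA NIM instance.

    `openmetrics_endpoint` is an assumed option name; check the sample conf.yaml.
    """
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - openmetrics_endpoint: {endpoint}\n"
    )


def write_conf(path: Path = CONF_PATH, endpoint: str = DEFAULT_ENDPOINT) -> None:
    """Write the rendered config, creating nvidia_nim.d/ if needed (needs write access)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_conf(endpoint))
```

Generating the text rather than editing YAML by hand keeps the indentation of the `instances` list correct; after writing the file, restart the Agent as described in step 2.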

Validation

Run the Agent’s status subcommand and look for nvidia_nim under the Checks section.

Data Collected

Metrics

nvidia_nim.e2e_request_latency.seconds.bucket
(count)
The observations of end-to-end request latency bucketed by seconds.
nvidia_nim.e2e_request_latency.seconds.count
(count)
The total number of observations of end-to-end request latency.
nvidia_nim.e2e_request_latency.seconds.sum
(count)
The sum of end-to-end request latency in seconds.
Shown as second
nvidia_nim.generation_tokens.count
(count)
Number of generation tokens processed.
Shown as token
nvidia_nim.gpu_cache_usage_percent
(gauge)
GPU KV-cache usage as a fraction; a value of 1 means 100 percent usage.
Shown as fraction
nvidia_nim.num_request.max
(gauge)
The maximum number of concurrently running requests.
Shown as request
nvidia_nim.num_requests.running
(gauge)
Number of requests currently running on the GPU.
nvidia_nim.num_requests.waiting
(gauge)
Number of requests waiting to be processed.
Shown as request
nvidia_nim.process.cpu_seconds.count
(count)
Total user and system CPU time spent in seconds.
Shown as second
nvidia_nim.process.max_fds
(gauge)
Maximum number of open file descriptors.
Shown as file
nvidia_nim.process.open_fds
(gauge)
Number of open file descriptors.
Shown as file
nvidia_nim.process.resident_memory_bytes
(gauge)
Resident memory size in bytes.
Shown as byte
nvidia_nim.process.start_time_seconds
(gauge)
Start time of the process since the Unix epoch, in seconds.
Shown as second
nvidia_nim.process.virtual_memory_bytes
(gauge)
Virtual memory size in bytes.
Shown as byte
nvidia_nim.prompt_tokens.count
(count)
Number of prefill tokens processed.
Shown as token
nvidia_nim.python.gc.collections.count
(count)
Number of times this garbage collector generation was collected.
nvidia_nim.python.gc.objects.collected.count
(count)
Objects collected during GC.
nvidia_nim.python.gc.objects.uncollectable.count
(count)
Uncollectable objects found during GC.
nvidia_nim.python.info
(gauge)
Python platform information.
nvidia_nim.request.failure.count
(count)
The count of failed requests.
Shown as request
nvidia_nim.request.finish.count
(count)
The count of finished requests.
Shown as request
nvidia_nim.request.generation_tokens.bucket
(count)
The observations of generation tokens processed per request, bucketed by token count.
nvidia_nim.request.generation_tokens.count
(count)
The total number of observations of generation tokens processed per request.
nvidia_nim.request.generation_tokens.sum
(count)
The sum of generation tokens processed across all observed requests.
Shown as token
nvidia_nim.request.prompt_tokens.bucket
(count)
The observations of prefill tokens processed per request, bucketed by token count.
nvidia_nim.request.prompt_tokens.count
(count)
The total number of observations of prefill tokens processed per request.
nvidia_nim.request.prompt_tokens.sum
(count)
The sum of prefill tokens processed across all observed requests.
Shown as token
nvidia_nim.request.success.count
(count)
Count of successfully processed requests.
nvidia_nim.time_per_output_token.seconds.bucket
(count)
The observations of time per output token bucketed by seconds.
nvidia_nim.time_per_output_token.seconds.count
(count)
The total number of observations of time per output token.
nvidia_nim.time_per_output_token.seconds.sum
(count)
The sum of time per output token in seconds.
Shown as second
nvidia_nim.time_to_first_token.seconds.bucket
(count)
The observations of time to first token bucketed by seconds.
nvidia_nim.time_to_first_token.seconds.count
(count)
The total number of observations of time to first token.
nvidia_nim.time_to_first_token.seconds.sum
(count)
The sum of time to first token in seconds.
Shown as second
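The histogram and counter metrics above are most useful once combined. The Python sketch below, an illustration rather than part of the integration, shows three common derivations from raw values read from the NIM endpoint at two scrape times: the mean of a histogram over an interval (delta of `_sum` divided by delta of `_count`, for example mean end-to-end latency), a per-second rate of a counter such as generation tokens, and a quantile estimate from cumulative buckets using the linear interpolation that Prometheus's `histogram_quantile` applies. The function names are hypothetical, and the Agent may submit count metrics as per-interval deltas rather than raw cumulative values, so adapt the inputs to the form your data takes.

```python
import math
from typing import Optional, Sequence, Tuple


def mean_over_interval(sum_prev: float, sum_now: float,
                       count_prev: float, count_now: float) -> Optional[float]:
    """Mean observed value between two scrapes of a histogram's _sum and _count."""
    observations = count_now - count_prev
    if observations <= 0:
        return None  # no new observations (or a counter reset) in the interval
    return (sum_now - sum_prev) / observations


def rate_per_second(prev: float, now: float, elapsed_s: float) -> float:
    """Per-second increase of a monotonic counter; a drop is treated as a reset."""
    increase = now - prev if now >= prev else now
    return increase / elapsed_s


def estimate_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with math.inf. The value is
    linearly interpolated inside the bucket that contains the target rank,
    using 0 as the lower bound of the first bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations at all
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # rank falls in the +Inf bucket: highest finite bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]`, `estimate_quantile(0.5, buckets)` interpolates to about 0.42 seconds, and any quantile at or above 0.9 returns 1.0, the highest finite bound. The estimate is only as precise as the bucket boundaries NIM exposes.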

Events

The NVIDIA NIM integration does not include any events.

Service Checks

nvidia_nim.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the NVIDIA NIM OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical
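When the service check reports CRITICAL, it can help to test reachability from the Agent host directly. The minimal Python sketch below approximates the documented behavior by mapping the outcome of a request to the OpenMetrics endpoint onto the two statuses. It is a troubleshooting aid, not the Agent's implementation; the function name `nim_health` is hypothetical, and it treats any connection failure, timeout, or HTTP error status as CRITICAL.

```python
import urllib.request

DEFAULT_ENDPOINT = "http://localhost:8000/metrics"


def nim_health(endpoint: str = DEFAULT_ENDPOINT, timeout: float = 5.0) -> str:
    """Return "OK" if the metrics endpoint answers with a 2xx status, else "CRITICAL"."""
    try:
        with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
            return "OK" if 200 <= resp.status < 300 else "CRITICAL"
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections, and timeouts
        # are all OSError subclasses.
        return "CRITICAL"
```

Run it on the same host as the Agent, for example `print(nim_health())`, so the result reflects the Agent's view of the network rather than your workstation's.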

Troubleshooting

Need help? Contact Datadog support.