nvidia_nim.e2e_request_latency.seconds.bucket (count) | The observations of end-to-end request latency, bucketed by seconds.
nvidia_nim.e2e_request_latency.seconds.count (count) | The total number of observations of end-to-end request latency.
nvidia_nim.e2e_request_latency.seconds.sum (count) | The sum of end-to-end request latency in seconds. Shown as second
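Since `.sum` and `.count` are cumulative histogram components, the average end-to-end latency over an interval is the delta of the sum divided by the delta of the count (the equivalent of `rate(sum) / rate(count)` in PromQL). A minimal sketch, using hypothetical scrape values:

```python
# Sketch: average end-to-end latency from the histogram's .sum and .count.
# The sample values below are hypothetical, not real NIM output.

def average_latency(sum_prev, sum_curr, count_prev, count_curr):
    """Average latency in seconds over the interval between two scrapes."""
    requests = count_curr - count_prev
    if requests <= 0:
        return 0.0  # no requests finished in the interval
    return (sum_curr - sum_prev) / requests

# Hypothetical values of e2e_request_latency.seconds.{sum,count}:
avg = average_latency(sum_prev=120.0, sum_curr=150.0,
                      count_prev=40, count_curr=52)
print(f"{avg:.2f}s per request")  # 30.0s over 12 requests = 2.50s
```

The same sum-over-count pattern applies to the other histogram metrics in this table (time to first token, time per output token, request token counts).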
nvidia_nim.generation_tokens.count (count) | Number of generation tokens processed. Shown as token |
nvidia_nim.gpu_cache_usage_percent (gauge) | GPU KV-cache usage. A value of 1 means 100 percent usage. Shown as fraction
nvidia_nim.num_request.max (gauge) | The maximum number of concurrently running requests. Shown as request
nvidia_nim.num_requests.running (gauge) | Number of requests currently running on GPU. Shown as request |
nvidia_nim.num_requests.waiting (gauge) | Number of requests waiting. Shown as request |
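The three request gauges above can be combined into a simple saturation check: the server is under pressure when running requests approach the configured maximum and a queue is forming. A sketch with illustrative thresholds (not NVIDIA recommendations):

```python
# Sketch of a saturation check built from num_requests.running,
# num_requests.waiting, and num_request.max. Threshold is illustrative.

def saturation_ratio(running, max_running):
    """Fraction of the configured concurrency limit currently in use."""
    return running / max_running if max_running else 0.0

def is_saturated(running, waiting, max_running, ratio_threshold=0.9):
    """Flag when the server is near its limit and requests are queueing."""
    return waiting > 0 and saturation_ratio(running, max_running) >= ratio_threshold

print(is_saturated(running=9, waiting=3, max_running=10))  # True
print(is_saturated(running=4, waiting=0, max_running=10))  # False
```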
nvidia_nim.process.cpu_seconds.count (count) | Total user and system CPU time spent in seconds. Shown as second |
nvidia_nim.process.max_fds (gauge) | Maximum number of open file descriptors. Shown as file |
nvidia_nim.process.open_fds (gauge) | Number of open file descriptors. Shown as file |
nvidia_nim.process.resident_memory_bytes (gauge) | Resident memory size in bytes. Shown as byte |
nvidia_nim.process.start_time_seconds (gauge) | Time in seconds since process started. Shown as second |
nvidia_nim.process.virtual_memory_bytes (gauge) | Virtual memory size in bytes. Shown as byte |
nvidia_nim.prompt_tokens.count (count) | Number of prefill tokens processed. Shown as token |
nvidia_nim.python.gc.collections.count (count) | Number of times this GC generation was collected.
nvidia_nim.python.gc.objects.collected.count (count) | Objects collected during GC. |
nvidia_nim.python.gc.objects.uncollectable.count (count) | Uncollectable objects found during GC. |
nvidia_nim.python.info (gauge) | Python platform information. |
nvidia_nim.request.failure.count (count) | The count of failed requests. Shown as request |
nvidia_nim.request.finish.count (count) | The count of finished requests. Shown as request |
nvidia_nim.request.generation_tokens.bucket (count) | The observations of generation tokens per request, bucketed by token count.
nvidia_nim.request.generation_tokens.count (count) | The total number of observations of generation tokens per request.
nvidia_nim.request.generation_tokens.sum (count) | The sum of generation tokens processed. Shown as token
nvidia_nim.request.prompt_tokens.bucket (count) | The observations of prefill tokens per request, bucketed by token count.
nvidia_nim.request.prompt_tokens.count (count) | The total number of observations of prefill tokens per request.
nvidia_nim.request.prompt_tokens.sum (count) | The sum of prefill tokens processed. Shown as token
nvidia_nim.request.success.count (count) | Count of successfully processed requests. |
nvidia_nim.time_per_output_token.seconds.bucket (count) | The observations of time per output token bucketed by seconds. |
nvidia_nim.time_per_output_token.seconds.count (count) | The total number of observations of time per output token. |
nvidia_nim.time_per_output_token.seconds.sum (count) | The sum of time per output token in seconds. Shown as second |
nvidia_nim.time_to_first_token.seconds.bucket (count) | The observations of time to first token bucketed by seconds. |
nvidia_nim.time_to_first_token.seconds.count (count) | The total number of observations of time to first token. |
nvidia_nim.time_to_first_token.seconds.sum (count) | The sum of time to first token in seconds. Shown as second |
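The `.bucket` series are cumulative counts per upper bound, so latency quantiles such as p95 of time to first token can be approximated with the same linear interpolation that PromQL's `histogram_quantile()` uses. A sketch with hypothetical bucket bounds and counts:

```python
# Sketch: approximating a quantile from cumulative histogram buckets,
# mirroring PromQL's histogram_quantile() interpolation.
# Bucket bounds and counts below are hypothetical.

def histogram_quantile(q, buckets):
    """buckets: list of (upper_bound_seconds, cumulative_count), sorted
    by bound, with the last bound conventionally +inf."""
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # quantile falls in the open-ended bucket
            # linear interpolation within this bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical time_to_first_token.seconds.bucket cumulative counts:
buckets = [(0.1, 40), (0.5, 80), (1.0, 95), (float("inf"), 100)]
print(histogram_quantile(0.95, buckets))  # 1.0
```

As with PromQL, the estimate's accuracy depends on how finely the bucket boundaries divide the observed range.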