| vllm.avg.generation_throughput.toks_per_s (gauge)
 | Average generation throughput in tokens/s. | 
| vllm.avg.prompt.throughput.toks_per_s (gauge)
 | Average prefill throughput in tokens/s. | 
| vllm.cache_config_info (gauge)
 | Information on the cache config. | 
| vllm.cpu_cache_usage_perc (gauge)
 | CPU KV-cache usage. 1 means 100 percent usage. Shown as percent | 
| vllm.e2e_request_latency.seconds.bucket (count)
 | The observations of end-to-end request latency, bucketed by seconds. | 
| vllm.e2e_request_latency.seconds.count (count)
 | The total number of observations of end-to-end request latency. | 
| vllm.e2e_request_latency.seconds.sum (count)
 | The sum of end-to-end request latency in seconds. Shown as second | 
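The `.sum` and `.count` components of a histogram can be combined to derive the mean latency over a scrape interval: divide the delta of `.sum` by the delta of `.count`. A minimal sketch; the sample values below are hypothetical, for illustration only.

```python
# Mean end-to-end latency over a scrape interval, derived from the
# vllm.e2e_request_latency.seconds histogram's .sum and .count components.

def mean_latency(sum_prev, sum_curr, count_prev, count_curr):
    """Delta of the .sum series divided by the delta of the .count series."""
    requests = count_curr - count_prev
    if requests == 0:
        return None  # no requests completed in the interval
    return (sum_curr - sum_prev) / requests

# Hypothetical consecutive scrapes: 40 requests completed,
# adding 18 seconds of total latency.
avg = mean_latency(sum_prev=120.0, sum_curr=138.0, count_prev=500, count_curr=540)
print(f"mean e2e latency: {avg:.3f} s")  # 18.0 / 40 = 0.450 s
```

The same sum-over-count pattern applies to the other histograms in this list, such as `vllm.time_to_first_token.seconds` and `vllm.time_per_output_token.seconds`.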
| vllm.generation_tokens.count (count)
 | Number of generation tokens processed. | 
| vllm.gpu_cache_usage_perc (gauge)
 | GPU KV-cache usage. 1 means 100 percent usage. Shown as percent | 
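Both KV-cache usage gauges report a fraction in [0, 1], where 1 means 100 percent. A minimal sketch of converting them to percentages and flagging cache pressure; the 0.9 threshold is an arbitrary example, not a vLLM recommendation.

```python
# Convert the raw vllm.gpu_cache_usage_perc / vllm.cpu_cache_usage_perc gauge
# fractions to percentages and flag when either crosses a threshold.

def cache_pressure(gpu_usage, cpu_usage, threshold=0.9):
    """Return (gpu_pct, cpu_pct, alert) from the raw gauge fractions."""
    alert = gpu_usage >= threshold or cpu_usage >= threshold
    return gpu_usage * 100.0, cpu_usage * 100.0, alert

# Hypothetical gauge readings: GPU cache nearly full, CPU cache mostly idle.
gpu_pct, cpu_pct, alert = cache_pressure(gpu_usage=0.92, cpu_usage=0.15)
print(f"GPU KV-cache: {gpu_pct:.0f}%, CPU KV-cache: {cpu_pct:.0f}%, alert={alert}")
```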
| vllm.num_preemptions.count (count)
 | Cumulative number of preemptions from the engine. | 
| vllm.num_requests.running (gauge)
 | Number of requests currently running on GPU. | 
| vllm.num_requests.swapped (gauge)
 | Number of requests swapped to CPU. | 
| vllm.num_requests.waiting (gauge)
 | Number of requests waiting. | 
| vllm.process.cpu_seconds.count (count)
 | Total user and system CPU time spent in seconds. Shown as second | 
| vllm.process.max_fds (gauge)
 | Maximum number of open file descriptors. Shown as file | 
| vllm.process.open_fds (gauge)
 | Number of open file descriptors. Shown as file | 
| vllm.process.resident_memory_bytes (gauge)
 | Resident memory size in bytes. Shown as byte | 
| vllm.process.start_time_seconds (gauge)
 | Start time of the process, in seconds since the Unix epoch. Shown as second | 
| vllm.process.virtual_memory_bytes (gauge)
 | Virtual memory size in bytes. Shown as byte | 
| vllm.prompt_tokens.count (count)
 | Number of prefill tokens processed. | 
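Throughput can also be derived from the monotonically increasing token counters (`vllm.prompt_tokens.count`, `vllm.generation_tokens.count`) as a rate: delta of the counter divided by the elapsed time. This should roughly track the `vllm.avg.*.throughput` gauges. The scrape values below are hypothetical.

```python
# Derive tokens-per-second throughput from a cumulative token counter
# sampled at two scrape times.

def token_rate(count_prev, count_curr, seconds_elapsed):
    """Tokens per second over the scrape interval."""
    return (count_curr - count_prev) / seconds_elapsed

# Hypothetical scrapes 15 s apart: 3000 generation tokens produced.
gen_rate = token_rate(count_prev=10_000, count_curr=13_000, seconds_elapsed=15.0)
print(f"generation throughput: {gen_rate:.1f} tok/s")  # 3000 / 15 = 200.0
```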
| vllm.python.gc.collections.count (count)
 | Number of times this generation was collected. | 
| vllm.python.gc.objects.collected.count (count)
 | Objects collected during GC. | 
| vllm.python.gc.objects.uncollectable.count (count)
 | Uncollectable objects found during GC. | 
| vllm.python.info (gauge)
 | Python platform information | 
| vllm.request.generation_tokens.bucket (count)
 | The observations of generation tokens processed per request, bucketed by token count. | 
| vllm.request.generation_tokens.count (count)
 | The total number of observations of generation tokens processed per request. | 
| vllm.request.generation_tokens.sum (count)
 | The sum of generation tokens processed across requests. | 
| vllm.request.params.best_of.bucket (count)
 | The observations of the best_of request parameter, bucketed by value. | 
| vllm.request.params.best_of.count (count)
 | The total number of observations of the best_of request parameter. | 
| vllm.request.params.best_of.sum (count)
 | The sum of observed best_of request parameter values. | 
| vllm.request.params.n.bucket (count)
 | The observations of the n request parameter, bucketed by value. | 
| vllm.request.params.n.count (count)
 | The total number of observations of the n request parameter. | 
| vllm.request.params.n.sum (count)
 | The sum of observed n request parameter values. | 
| vllm.request.prompt_tokens.bucket (count)
 | The observations of prefill tokens processed per request, bucketed by token count. | 
| vllm.request.prompt_tokens.count (count)
 | The total number of observations of prefill tokens processed per request. | 
| vllm.request.prompt_tokens.sum (count)
 | The sum of prefill tokens processed across requests. | 
| vllm.request.success.count (count)
 | Count of successfully processed requests. | 
| vllm.time_per_output_token.seconds.bucket (count)
 | The observations of time per output token bucketed by seconds. | 
| vllm.time_per_output_token.seconds.count (count)
 | The total number of observations of time per output token. | 
| vllm.time_per_output_token.seconds.sum (count)
 | The sum of time per output token in seconds. Shown as second | 
| vllm.time_to_first_token.seconds.bucket (count)
 | The observations of time to first token bucketed by seconds. | 
| vllm.time_to_first_token.seconds.count (count)
 | The total number of observations of time to first token. | 
| vllm.time_to_first_token.seconds.sum (count)
 | The sum of time to first token in seconds. Shown as second | 
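A latency quantile can be estimated from a cumulative `.bucket` series by linear interpolation within the bucket containing the target rank, which is how Prometheus's `histogram_quantile` works. The bucket boundaries and counts below are hypothetical examples for the `vllm.time_to_first_token.seconds` histogram.

```python
# Estimate a quantile from cumulative histogram buckets via linear
# interpolation inside the target bucket.

def quantile_from_buckets(q, buckets):
    """buckets: list of (upper_bound_seconds, cumulative_count), sorted by bound."""
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cum_count in buckets:
        if cum_count >= rank:
            span = cum_count - lower_count
            if span == 0:
                return upper_bound
            # Interpolate linearly between the bucket's bounds.
            frac = (rank - lower_count) / span
            return lower_bound + (upper_bound - lower_bound) * frac
        lower_bound, lower_count = upper_bound, cum_count
    return buckets[-1][0]

# Hypothetical cumulative buckets: 100 requests total, 90 under 0.25 s.
ttft_buckets = [(0.05, 10), (0.1, 60), (0.25, 90), (0.5, 100)]
p95 = quantile_from_buckets(0.95, ttft_buckets)
print(f"estimated p95 TTFT: {p95:.3f} s")  # rank 95 falls halfway into (0.25, 0.5]
```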