litellm.api.key.budget.remaining_hours.metric (gauge) | Remaining hours until the API key budget resets
litellm.api.key.max_budget.metric (gauge) | Maximum budget set for the API key
litellm.auth.failed_requests.count (count) | Total failed requests for the auth service
litellm.auth.latency.bucket (count) | Latency for the auth service
litellm.auth.latency.count (count) | Latency for the auth service
litellm.auth.latency.sum (count) | Latency for the auth service
litellm.auth.total_requests.count (count) | Total requests for the auth service
litellm.batch_write_to_db.failed_requests.count (count) | Total failed requests for the batch_write_to_db service
litellm.batch_write_to_db.latency.bucket (count) | Latency for the batch_write_to_db service
litellm.batch_write_to_db.latency.count (count) | Latency for the batch_write_to_db service
litellm.batch_write_to_db.latency.sum (count) | Latency for the batch_write_to_db service
litellm.batch_write_to_db.total_requests.count (count) | Total requests for the batch_write_to_db service
litellm.deployment.cooled_down.count (count) | Number of times a deployment has been cooled down by LiteLLM load-balancing logic; exception_status is the status of the exception that triggered the cooldown
litellm.deployment.failed_fallbacks.count (count) | Number of failed fallback requests from the primary model to the fallback model
litellm.deployment.failure_by_tag_responses.count (count) | Total number of failed LLM API calls for a specific LLM deployment, broken down by custom metadata tags
litellm.deployment.failure_responses.count (count) | Total number of failed LLM API calls for a specific LLM deployment; exception_status is the status of the exception from the LLM API
litellm.deployment.latency_per_output_token.bucket (count) | Latency per output token |
litellm.deployment.latency_per_output_token.count (count) | Latency per output token |
litellm.deployment.latency_per_output_token.sum (count) | Latency per output token |
litellm.deployment.state (gauge) | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage |
litellm.deployment.success_responses.count (count) | Total number of successful LLM API calls via LiteLLM
litellm.deployment.successful_fallbacks.count (count) | Number of successful fallback requests from the primary model to the fallback model
litellm.deployment.total_requests.count (count) | Total number of LLM API calls via LiteLLM (success + failure)
litellm.endpoint.healthy_count (count) | Number of healthy endpoints |
litellm.endpoint.info (gauge) | LiteLLM Health Endpoint info metric that is tagged by endpoint_health, llm_model, custom_llm_provider, error_type |
litellm.endpoint.unhealthy_count (count) | Number of unhealthy endpoints |
litellm.in_memory.daily_spend_update_queue.size (gauge) | Queue size gauge for the in_memory_daily_spend_update_queue service
litellm.in_memory.spend_update_queue.size (gauge) | Queue size gauge for the in_memory_spend_update_queue service
litellm.input.tokens.count (count) | Total number of input tokens from LLM requests |
litellm.llm.api.failed_requests.metric.count (count) | Deprecated; use litellm.proxy.failed_requests.metric.count. Total number of failed responses from the proxy (the client did not get a success response from the LiteLLM proxy)
litellm.llm.api.latency.metric.bucket (count) | Total latency (seconds) for a model's LLM API call
litellm.llm.api.latency.metric.count (count) | Total latency (seconds) for a model's LLM API call
litellm.llm.api.latency.metric.sum (count) | Total latency (seconds) for a model's LLM API call
litellm.llm.api.time_to_first_token.metric.bucket (count) | Time to first token for a model's LLM API call
litellm.llm.api.time_to_first_token.metric.count (count) | Time to first token for a model's LLM API call
litellm.llm.api.time_to_first_token.metric.sum (count) | Time to first token for a model's LLM API call
litellm.output.tokens.count (count) | Total number of output tokens from LLM requests |
litellm.overhead_latency.metric.bucket (count) | Latency overhead (milliseconds) added by LiteLLM processing |
litellm.overhead_latency.metric.count (count) | Latency overhead (milliseconds) added by LiteLLM processing |
litellm.overhead_latency.metric.sum (count) | Latency overhead (milliseconds) added by LiteLLM processing |
litellm.pod_lock_manager.size (gauge) | Size gauge for the pod_lock_manager service
litellm.postgres.failed_requests.count (count) | Total failed requests for the postgres service
litellm.postgres.latency.bucket (count) | Latency for the postgres service
litellm.postgres.latency.count (count) | Latency for the postgres service
litellm.postgres.latency.sum (count) | Latency for the postgres service
litellm.postgres.total_requests.count (count) | Total requests for the postgres service
litellm.process.uptime.seconds (gauge) | Start time of the process, in seconds since the Unix epoch
litellm.provider.remaining_budget.metric (gauge) | Remaining budget for a provider; used when provider budget limits are set
litellm.proxy.failed_requests.metric.count (count) | Total number of failed responses from the proxy (the client did not get a success response from the LiteLLM proxy)
litellm.proxy.pre_call.failed_requests.count (count) | Total failed requests for the proxy_pre_call service
litellm.proxy.pre_call.latency.bucket (count) | Latency for the proxy_pre_call service
litellm.proxy.pre_call.latency.count (count) | Latency for the proxy_pre_call service
litellm.proxy.pre_call.latency.sum (count) | Latency for the proxy_pre_call service
litellm.proxy.pre_call.total_requests.count (count) | Total requests for the proxy_pre_call service
litellm.proxy.total_requests.metric.count (count) | Total number of requests made to the proxy server; tracks the number of client-side requests
litellm.redis.daily_spend_update_queue.size (gauge) | Queue size gauge for the redis_daily_spend_update_queue service
litellm.redis.daily_tag_spend_update_queue.failed_requests.count (count) | Total failed requests for the redis_daily_tag_spend_update_queue service
litellm.redis.daily_tag_spend_update_queue.latency.bucket (count) | Latency for the redis_daily_tag_spend_update_queue service
litellm.redis.daily_tag_spend_update_queue.latency.count (count) | Latency for the redis_daily_tag_spend_update_queue service
litellm.redis.daily_tag_spend_update_queue.latency.sum (count) | Latency for the redis_daily_tag_spend_update_queue service
litellm.redis.daily_tag_spend_update_queue.total_requests.count (count) | Total requests for the redis_daily_tag_spend_update_queue service
litellm.redis.daily_team_spend_update_queue.failed_requests.count (count) | Total failed requests for the redis_daily_team_spend_update_queue service
litellm.redis.daily_team_spend_update_queue.latency.bucket (count) | Latency for the redis_daily_team_spend_update_queue service
litellm.redis.daily_team_spend_update_queue.latency.count (count) | Latency for the redis_daily_team_spend_update_queue service
litellm.redis.daily_team_spend_update_queue.latency.sum (count) | Latency for the redis_daily_team_spend_update_queue service
litellm.redis.daily_team_spend_update_queue.total_requests.count (count) | Total requests for the redis_daily_team_spend_update_queue service
litellm.redis.failed_requests.count (count) | Total failed requests for the redis service
litellm.redis.latency.bucket (count) | Latency for the redis service
litellm.redis.spend_update_queue.size (gauge) | Queue size gauge for the redis_spend_update_queue service
litellm.redis.total_requests.count (count) | Total requests for the redis service
litellm.remaining.api_key.budget.metric (gauge) | Remaining budget for the API key
litellm.remaining.api_key.requests_for_model (gauge) | Remaining requests the API key can make for a model (model-based RPM limit on the key)
litellm.remaining.api_key.tokens_for_model (gauge) | Remaining tokens the API key can use for a model (model-based TPM limit on the key)
litellm.remaining.requests (gauge) | Remaining requests for a model, as returned by the LLM API provider
litellm.remaining.team_budget.metric (gauge) | Remaining budget for the team
litellm.remaining_tokens (gauge) | Remaining tokens for a model, as returned by the LLM API provider
litellm.request.total_latency.metric.bucket (count) | Total latency (seconds) for a request to LiteLLM |
litellm.request.total_latency.metric.count (count) | Total latency (seconds) for a request to LiteLLM |
litellm.request.total_latency.metric.sum (count) | Total latency (seconds) for a request to LiteLLM |
litellm.requests.metric.count (count) | Deprecated; use litellm.proxy.total_requests.metric.count. Total number of LLM calls to LiteLLM, tracked per API key, team, and user
litellm.reset_budget_job.failed_requests.count (count) | Total failed requests for the reset_budget_job service
litellm.reset_budget_job.latency.bucket (count) | Latency for the reset_budget_job service
litellm.reset_budget_job.total_requests.count (count) | Total requests for the reset_budget_job service
litellm.router.failed_requests.count (count) | Total failed requests for the router service
litellm.router.latency.bucket (count) | Latency for the router service
litellm.router.latency.count (count) | Latency for the router service
litellm.router.latency.sum (count) | Latency for the router service
litellm.router.total_requests.count (count) | Total requests for the router service
litellm.self.failed_requests.count (count) | Total failed requests for the self service
litellm.self.latency.bucket (count) | Latency for the self service
litellm.self.latency.count (count) | Latency for the self service
litellm.self.latency.sum (count) | Latency for the self service
litellm.self.total_requests.count (count) | Total requests for the self service
litellm.spend.metric.count (count) | Total spend on LLM requests |
litellm.team.budget.remaining_hours.metric (gauge) | Remaining hours until the team budget resets
litellm.team.max_budget.metric (gauge) | Maximum budget set for the team
litellm.total.tokens.count (count) | Total number of input + output tokens from LLM requests |
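The latency metrics above follow standard histogram conventions: each family exposes a .sum series (total observed seconds), a .count series (number of observations), and .bucket series (cumulative counts per latency bound). A common derived value is the mean latency over a window, computed as the change in .sum divided by the change in .count. A minimal sketch, assuming you have already scraped the two deltas (the sample numbers below are hypothetical, not real metric values):

```python
# Mean latency from a LiteLLM histogram metric family.
# Over any time window, average latency = delta(.sum) / delta(.count),
# since .sum accumulates observed seconds and .count accumulates
# the number of observations.

def average_latency(sum_delta: float, count_delta: float) -> float:
    """Mean request latency (seconds) over a window, from histogram deltas."""
    if count_delta == 0:
        # No requests observed in the window; avoid division by zero.
        return 0.0
    return sum_delta / count_delta

# Hypothetical example: litellm.request.total_latency.metric.sum grew by
# 42.0 seconds while litellm.request.total_latency.metric.count grew by
# 120 requests over the same window.
avg = average_latency(42.0, 120)
print(round(avg, 3))  # 0.35
```

The same ratio works for any of the .sum/.count pairs listed above, e.g. litellm.llm.api.latency.metric or litellm.deployment.latency_per_output_token; monitoring backends typically compute it for you from rate() of the two series.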