litellm.api.key.budget.remaining_hours.metric (gauge) | Remaining hours until the API key budget resets Shown as hour |
litellm.api.key.max_budget.metric (gauge) | Maximum budget set for the API key |
litellm.auth.failed_requests.count (count) | Number of failed requests for auth service in the time period Shown as error |
litellm.auth.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for auth service |
litellm.auth.latency.count (count) | Number of latency observations for auth service in the time period |
litellm.auth.latency.sum (count) | Latency for auth service Shown as millisecond |
litellm.auth.total_requests.count (count) | Number of requests for auth service in the time period Shown as request |
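Each service in this list (auth, and likewise batch_write_to_db, postgres, redis, router, self, and proxy_pre_call below) exposes the same latency histogram (.bucket/.count/.sum) plus failed/total request counters, so average latency over a window is sum/count and error rate is failed/total. A minimal sketch in Python, using hypothetical query results for the auth service:

```python
# Minimal sketch: deriving average latency and failure rate for the auth
# service. All values are hypothetical query results over one time window;
# in practice they come from your metrics backend.

latency_sum_ms = 1250.0   # litellm.auth.latency.sum
latency_count = 500       # litellm.auth.latency.count
failed = 12               # litellm.auth.failed_requests.count
total = 500               # litellm.auth.total_requests.count

avg_latency_ms = latency_sum_ms / latency_count if latency_count else 0.0
failure_rate = failed / total if total else 0.0

print(f"avg auth latency: {avg_latency_ms:.2f} ms")  # 2.50 ms
print(f"auth failure rate: {failure_rate:.1%}")      # 2.4%
```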
litellm.batch_write_to_db.failed_requests.count (count) | Number of failed requests for batch_write_to_db service in the time period Shown as error |
litellm.batch_write_to_db.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for batch_write_to_db service |
litellm.batch_write_to_db.latency.count (count) | Number of latency observations for batch_write_to_db service in the time period |
litellm.batch_write_to_db.latency.sum (count) | Latency for batch_write_to_db service Shown as millisecond |
litellm.batch_write_to_db.total_requests.count (count) | Number of requests for batch_write_to_db service in the time period Shown as request |
litellm.deployment.cooled_down.count (count) | Number of times a deployment has been cooled down by LiteLLM load balancing logic in the time period. exception_status is the status of the exception that caused the deployment to be cooled down Shown as event |
litellm.deployment.failed_fallbacks.count (count) | Number of failed fallback requests from primary model -> fallback model in the time period Shown as error |
litellm.deployment.failure_by_tag_responses.count (count) | Number of failed LLM API calls for a specific LLM deployment by custom metadata tags in the time period Shown as error |
litellm.deployment.failure_responses.count (count) | Number of failed LLM API calls for a specific LLM deployment in the time period. exception_status is the status of the exception from the LLM API Shown as error |
litellm.deployment.latency_per_output_token.bucket (count) | Number of observations that fall into each upper_bound latency per output token bucket for deployment |
litellm.deployment.latency_per_output_token.count (count) | Number of latency per output token observations for deployment in the time period |
litellm.deployment.latency_per_output_token.sum (count) | Latency per output token Shown as millisecond |
litellm.deployment.state (gauge) | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage Shown as unit |
litellm.deployment.success_responses.count (count) | Number of successful LLM API calls via litellm in the time period Shown as response |
litellm.deployment.successful_fallbacks.count (count) | Number of successful fallback requests from primary model -> fallback model in the time period Shown as response |
litellm.deployment.total_requests.count (count) | Number of LLM API calls via litellm in the time period - success + failure Shown as request |
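The deployment metrics combine a state gauge with success, failure, and fallback counters. A minimal sketch of how a consumer might decode litellm.deployment.state and derive a fallback success rate; the readings are hypothetical:

```python
# Minimal sketch: interpreting the deployment state gauge and the fallback
# counters above. Values are hypothetical query results.

DEPLOYMENT_STATE = {0: "healthy", 1: "partial outage", 2: "complete outage"}

state = 1                   # litellm.deployment.state
successful_fallbacks = 45   # litellm.deployment.successful_fallbacks.count
failed_fallbacks = 5        # litellm.deployment.failed_fallbacks.count

fallbacks = successful_fallbacks + failed_fallbacks
fallback_success_rate = successful_fallbacks / fallbacks if fallbacks else 0.0

print(DEPLOYMENT_STATE.get(state, "unknown"))                 # partial outage
print(f"fallback success rate: {fallback_success_rate:.0%}")  # 90%
```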
litellm.in_memory.daily_spend_update_queue.size (gauge) | Current size of the in-memory daily spend update queue Shown as item |
litellm.in_memory.spend_update_queue.size (gauge) | Current size of the in-memory spend update queue Shown as item |
litellm.input.tokens.count (count) | Number of input tokens from LLM requests in the time period Shown as token |
litellm.llm.api.failed_requests.metric.count (count) | Deprecated - use litellm.proxy.failed_requests.metric.count. Number of failed responses from the proxy in the time period - the client did not get a success response from the litellm proxy Shown as error |
litellm.llm.api.latency.metric.bucket (count) | Number of observations that fall into each upper_bound latency bucket (seconds) for a model’s LLM API call |
litellm.llm.api.latency.metric.count (count) | Number of latency observations (seconds) for a model’s LLM API call in the time period |
litellm.llm.api.latency.metric.sum (count) | Total latency (seconds) for a model’s LLM API call Shown as second |
litellm.llm.api.time_to_first_token.metric.bucket (count) | Number of observations that fall into each upper_bound time to first token bucket for a model’s LLM API call |
litellm.llm.api.time_to_first_token.metric.count (count) | Number of time to first token observations for a model’s LLM API call in the time period |
litellm.llm.api.time_to_first_token.metric.sum (count) | Time to first token for a model’s LLM API call Shown as second |
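The .bucket series are cumulative histogram buckets keyed by upper_bound, so percentiles can be estimated with Prometheus-style linear interpolation. A minimal sketch with hypothetical (upper_bound, cumulative count) pairs standing in for litellm.llm.api.latency.metric.bucket:

```python
# Minimal sketch: estimating a p95 from cumulative histogram buckets using
# Prometheus-style linear interpolation. The pairs below are hypothetical.

buckets = [(0.5, 120), (1.0, 340), (2.5, 480), (5.0, 498), (float("inf"), 500)]

def estimate_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    total = buckets[-1][1]
    if total == 0:
        return 0.0
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if upper == float("inf"):
                return prev_bound  # cannot interpolate into the +Inf bucket
            # linear interpolation within the bucket containing the rank
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = upper, count
    return prev_bound

print(f"estimated p95 latency: {estimate_quantile(0.95, buckets):.2f} s")  # 2.45 s
```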
litellm.output.tokens.count (count) | Number of output tokens from LLM requests in the time period Shown as token |
litellm.overhead_latency.metric.bucket (count) | Number of observations that fall into each upper_bound overhead latency bucket (milliseconds) added by LiteLLM processing |
litellm.overhead_latency.metric.count (count) | Number of overhead latency observations (milliseconds) added by LiteLLM processing in the time period |
litellm.overhead_latency.metric.sum (count) | Latency overhead (milliseconds) added by LiteLLM processing Shown as millisecond |
litellm.pod_lock_manager.size (gauge) | Size gauge reported by the pod_lock_manager service Shown as item |
litellm.postgres.failed_requests.count (count) | Number of failed requests for Postgres service in the time period Shown as error |
litellm.postgres.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for Postgres service |
litellm.postgres.latency.count (count) | Number of latency observations for Postgres service in the time period |
litellm.postgres.latency.sum (count) | Latency for Postgres service Shown as millisecond |
litellm.postgres.total_requests.count (count) | Number of requests for Postgres service in the time period Shown as request |
litellm.process.uptime.seconds (gauge) | Start time of the process, in seconds since the Unix epoch Shown as second |
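Assuming the gauge reports the process start time as described, uptime is simply the current time minus the gauge value; a minimal sketch:

```python
# Minimal sketch: deriving uptime from a start-time-since-epoch gauge.
# The gauge reading below is hypothetical.

import time

start_time_s = 1_700_000_000.0   # litellm.process.uptime.seconds reading
uptime_s = time.time() - start_time_s
print(f"process uptime: {uptime_s / 3600:.1f} hours")
```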
litellm.provider.remaining_budget.metric (gauge) | Remaining budget for provider - used when you set provider budget limits |
litellm.proxy.failed_requests.metric.count (count) | Number of failed responses from the proxy in the time period - the client did not get a success response from the litellm proxy Shown as error |
litellm.proxy.pre_call.failed_requests.count (count) | Number of failed requests for proxy_pre_call service in the time period Shown as error |
litellm.proxy.pre_call.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for proxy_pre_call service |
litellm.proxy.pre_call.latency.count (count) | Number of latency observations for proxy_pre_call service in the time period |
litellm.proxy.pre_call.latency.sum (count) | Latency for proxy_pre_call service Shown as millisecond |
litellm.proxy.pre_call.total_requests.count (count) | Number of requests for proxy_pre_call service in the time period Shown as request |
litellm.proxy.total_requests.metric.count (count) | Number of requests made to the proxy server in the time period - tracks the number of client-side requests Shown as request |
litellm.redis.daily_spend_update_queue.size (gauge) | Current size of the Redis daily spend update queue Shown as item |
litellm.redis.daily_tag_spend_update_queue.failed_requests.count (count) | Number of failed requests for redis_daily_tag_spend_update_queue service in the time period Shown as error |
litellm.redis.daily_tag_spend_update_queue.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for redis_daily_tag_spend_update_queue service |
litellm.redis.daily_tag_spend_update_queue.latency.count (count) | Number of latency observations for redis_daily_tag_spend_update_queue service in the time period |
litellm.redis.daily_tag_spend_update_queue.latency.sum (count) | Latency for redis_daily_tag_spend_update_queue service Shown as millisecond |
litellm.redis.daily_tag_spend_update_queue.total_requests.count (count) | Number of requests for redis_daily_tag_spend_update_queue service in the time period Shown as request |
litellm.redis.daily_team_spend_update_queue.failed_requests.count (count) | Number of failed requests for redis_daily_team_spend_update_queue service in the time period Shown as error |
litellm.redis.daily_team_spend_update_queue.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for redis_daily_team_spend_update_queue service |
litellm.redis.daily_team_spend_update_queue.latency.count (count) | Number of latency observations for redis_daily_team_spend_update_queue service in the time period |
litellm.redis.daily_team_spend_update_queue.latency.sum (count) | Latency for redis_daily_team_spend_update_queue service Shown as millisecond |
litellm.redis.daily_team_spend_update_queue.total_requests.count (count) | Number of requests for redis_daily_team_spend_update_queue service in the time period Shown as request |
litellm.redis.failed_requests.count (count) | Number of failed requests for Redis service in the time period Shown as error |
litellm.redis.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for Redis service |
litellm.redis.latency.count (count) | Number of latency observations for Redis service in the time period |
litellm.redis.latency.sum (count) | Total latency (milliseconds) for Redis service Shown as millisecond |
litellm.redis.spend_update_queue.size (gauge) | Current size of the Redis spend update queue Shown as item |
litellm.redis.total_requests.count (count) | Number of requests for Redis service in the time period Shown as request |
litellm.remaining.api_key.budget.metric (gauge) | Remaining budget for the API key |
litellm.remaining.api_key.requests_for_model (gauge) | Remaining requests the API key can make for the model (model-based RPM limit on the key) Shown as request |
litellm.remaining.api_key.tokens_for_model (gauge) | Remaining tokens the API key can use for the model (model-based TPM limit on the key) Shown as token |
litellm.remaining.requests (gauge) | Remaining requests for the model, as returned by the LLM API provider Shown as request |
litellm.remaining.team_budget.metric (gauge) | Remaining budget for team |
litellm.remaining_requests.metric (gauge) | Tracks the x-ratelimit-remaining-requests header returned by the LLM API deployment Shown as request |
litellm.remaining_tokens (gauge) | Remaining tokens for the model, as returned by the LLM API provider Shown as token |
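The remaining-request and remaining-token gauges report provider-side headroom, which makes them natural inputs for a low-headroom check. A minimal sketch with hypothetical readings and thresholds:

```python
# Minimal sketch: flagging low rate-limit headroom from the gauges above.
# Readings and thresholds are hypothetical.

remaining_requests = 42     # litellm.remaining.requests (per model)
remaining_tokens = 15_000   # litellm.remaining_tokens (per model)

REQUEST_FLOOR = 100
TOKEN_FLOOR = 20_000

if remaining_requests < REQUEST_FLOOR:
    print(f"low request headroom: {remaining_requests} < {REQUEST_FLOOR}")
if remaining_tokens < TOKEN_FLOOR:
    print(f"low token headroom: {remaining_tokens} < {TOKEN_FLOOR}")
```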
litellm.request.total_latency.metric.bucket (count) | Number of observations that fall into each upper_bound total latency bucket (seconds) for a request to LiteLLM |
litellm.request.total_latency.metric.count (count) | Number of total latency observations (seconds) for a request to LiteLLM in the time period |
litellm.request.total_latency.metric.sum (count) | Total latency (seconds) for a request to LiteLLM Shown as second |
litellm.requests.metric.count (count) | Deprecated - use litellm.proxy.total_requests.metric.count. Number of LLM calls to litellm in the time period - tracked per API key, team, and user Shown as request |
litellm.reset_budget_job.failed_requests.count (count) | Number of failed requests for reset_budget_job service in the time period Shown as error |
litellm.reset_budget_job.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for reset_budget_job service |
litellm.reset_budget_job.total_requests.count (count) | Number of requests for reset_budget_job service in the time period Shown as request |
litellm.router.failed_requests.count (count) | Number of failed requests for router service in the time period Shown as error |
litellm.router.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for router service |
litellm.router.latency.count (count) | Number of latency observations for router service in the time period |
litellm.router.latency.sum (count) | Latency for router service Shown as millisecond |
litellm.router.total_requests.count (count) | Number of requests for router service in the time period Shown as request |
litellm.self.failed_requests.count (count) | Number of failed requests for self service in the time period Shown as error |
litellm.self.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for self service |
litellm.self.latency.count (count) | Number of latency observations for self service in the time period |
litellm.self.latency.sum (count) | Latency for self service Shown as millisecond |
litellm.self.total_requests.count (count) | Number of requests for self service in the time period Shown as request |
litellm.spend.metric.count (count) | Spend on LLM requests in the time period |
litellm.team.budget.remaining_hours.metric (gauge) | Remaining hours until the team budget resets Shown as hour |
litellm.team.max_budget.metric (gauge) | Maximum budget set for the team |
litellm.total.tokens.count (count) | Number of input + output tokens from LLM requests in the time period Shown as token |
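Since litellm.total.tokens.count is the sum of the input and output token counters, it can be cross-checked against them and combined with litellm.spend.metric.count to derive spend per 1K tokens. A minimal sketch with hypothetical values over one window:

```python
# Minimal sketch: sanity-checking the token counters and deriving cost per
# 1K tokens from the spend counter. All values are hypothetical query
# results over the same time window.

input_tokens = 80_000    # litellm.input.tokens.count
output_tokens = 20_000   # litellm.output.tokens.count
total_tokens = 100_000   # litellm.total.tokens.count
spend = 1.75             # litellm.spend.metric.count (currency units)

assert total_tokens == input_tokens + output_tokens  # total = input + output

cost_per_1k = spend / (total_tokens / 1000) if total_tokens else 0.0
print(f"spend per 1K tokens: {cost_per_1k:.4f}")  # 0.0175
```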