LiteLLM

Supported OS: Linux, Windows, macOS

Integration version 2.0.1

Overview

Monitor, troubleshoot, and evaluate your LLM-powered applications built using LiteLLM: a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.

Use LLM Observability to investigate the root cause of issues, monitor operational performance, and evaluate the quality, privacy, and safety of your LLM applications.

See the LLM Observability tracing view video for an example of how you can investigate a trace.

Get cost estimation, prompt and completion sampling, error tracking, performance metrics, and more out of LiteLLM Python library requests using Datadog metrics and APM.

The integration monitors key metrics such as request and response counts, latency, error rates, token usage, and spend per provider or deployment, collected through LiteLLM's health check and Prometheus endpoints. This data lets you track usage patterns, detect anomalies, control costs, and troubleshoot issues quickly for efficient and reliable LLM operations.

Setup

LLM Observability: Get end-to-end visibility into your LLM application using LiteLLM

See the LiteLLM integration docs for details on how to get started with LLM Observability for LiteLLM.
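
One way to send traces to LLM Observability from the LiteLLM Proxy is through LiteLLM's callback system. The snippet below is a minimal sketch of a proxy config.yaml, assuming your LiteLLM version supports the datadog_llm_observability callback and that DD_API_KEY and DD_SITE are set in the proxy's environment; refer to the LiteLLM integration docs for the exact options supported by your version.

litellm_settings:
  # Send LLM traces to Datadog LLM Observability.
  # Assumes DD_API_KEY and DD_SITE are set in the proxy's environment.
  callbacks: ["datadog_llm_observability"]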

Agent Check: LiteLLM

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

Starting from Agent 7.68.0, the LiteLLM check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

This integration collects metrics through the Prometheus endpoint exposed by the LiteLLM Proxy. This feature is only available for enterprise users of LiteLLM. By default, the metrics are exposed on the /metrics endpoint. If connecting locally, the default port is 4000. For more information, see the LiteLLM Prometheus documentation.
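
If the Prometheus endpoint is not already enabled on your proxy, the sketch below shows the relevant litellm_settings section of the proxy's config.yaml, assuming the prometheus callback described in the LiteLLM Prometheus documentation:

litellm_settings:
  # Expose metrics on the proxy's /metrics endpoint (LiteLLM enterprise feature).
  callbacks: ["prometheus"]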

Note: The listed metrics can only be collected if they are available. Some metrics are generated only after certain actions occur. For example, the litellm.auth.failed_requests.count metric might only be exposed after a failed authentication request has occurred.

Host-based
  1. Edit the litellm.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your LiteLLM performance data. See the sample litellm.d/conf.yaml for all available configuration options. Example config:
init_config:

instances:
  - openmetrics_endpoint: http://localhost:4000/metrics
    # If authorization is required to access the endpoint, use the settings below.
    # headers:
    #  Authorization: Bearer sk-1234
  2. Restart the Agent.
Kubernetes-based

For LiteLLM Proxy running on Kubernetes, you can configure the check using pod annotations, as shown in the example below:

apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: | # <CONTAINER_NAME> must match the container name specified in the containers section below.
      {
        "litellm": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:4000/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: <CONTAINER_NAME>
# (...)

For more information and alternative ways to configure the check in Kubernetes-based environments, see the Kubernetes Integration Setup documentation.

Logs

LiteLLM can send logs to Datadog through its callback system. You can configure various logging settings in LiteLLM to customize log formatting and delivery to Datadog for ingestion. For detailed configuration options and setup instructions, refer to the LiteLLM Logging Documentation.
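
As a minimal sketch, the config.yaml snippet below enables LiteLLM's datadog callback for log shipping; it assumes your LiteLLM version supports this callback and that DD_API_KEY and DD_SITE are set in the proxy's environment:

litellm_settings:
  # Forward LiteLLM logs to Datadog.
  # Assumes DD_API_KEY and DD_SITE are set in the proxy's environment.
  callbacks: ["datadog"]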

Validation

Run the Agent’s status subcommand (see documentation) and look for litellm under the Checks section.

Data Collected

Metrics

litellm.api.key.budget.remaining_hours.metric
(gauge)
Remaining hours for api key budget to be reset
Shown as hour
litellm.api.key.max_budget.metric
(gauge)
Maximum budget set for api key
litellm.auth.failed_requests.count
(count)
Number of failed requests for auth service in the time period
Shown as error
litellm.auth.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for auth service
litellm.auth.latency.count
(count)
Number of latency observations for auth service in the time period
litellm.auth.latency.sum
(count)
Latency for auth service
Shown as millisecond
litellm.auth.total_requests.count
(count)
Number of requests for auth service in the time period
Shown as request
litellm.batch_write_to_db.failed_requests.count
(count)
Number of failed requests for batch_write_to_db service in the time period
Shown as error
litellm.batch_write_to_db.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for batch_write_to_db service
litellm.batch_write_to_db.latency.count
(count)
Number of latency observations for batch_write_to_db service in the time period
litellm.batch_write_to_db.latency.sum
(count)
Latency for batch_write_to_db service
Shown as millisecond
litellm.batch_write_to_db.total_requests.count
(count)
Number of requests for batch_write_to_db service in the time period
Shown as request
litellm.deployment.cooled_down.count
(count)
Number of times a deployment has been cooled down by LiteLLM load balancing logic in the time period. exception_status is the status of the exception that caused the deployment to be cooled down
Shown as event
litellm.deployment.failed_fallbacks.count
(count)
Number of failed fallback requests from primary model -> fallback model in the time period
Shown as error
litellm.deployment.failure_by_tag_responses.count
(count)
Number of failed LLM API calls for a specific LLM deployment by custom metadata tags in the time period
Shown as error
litellm.deployment.failure_responses.count
(count)
Number of failed LLM API calls for a specific LLM deployment in the time period. exception_status is the status of the exception from the LLM API
Shown as error
litellm.deployment.latency_per_output_token.bucket
(count)
Number of observations that fall into each upper_bound latency per output token bucket for deployment
litellm.deployment.latency_per_output_token.count
(count)
Number of latency per output token observations for deployment in the time period
litellm.deployment.latency_per_output_token.sum
(count)
Latency per output token
Shown as millisecond
litellm.deployment.state
(gauge)
The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage
Shown as unit
litellm.deployment.success_responses.count
(count)
Number of successful LLM API calls via litellm in the time period
Shown as response
litellm.deployment.successful_fallbacks.count
(count)
Number of successful fallback requests from primary model -> fallback model in the time period
Shown as response
litellm.deployment.total_requests.count
(count)
Number of LLM API calls via litellm in the time period - success + failure
Shown as request
litellm.in_memory.daily_spend_update_queue.size
(gauge)
Gauge for in_memory_daily_spend_update_queue service
Shown as item
litellm.in_memory.spend_update_queue.size
(gauge)
Gauge for in_memory_spend_update_queue service
Shown as item
litellm.input.tokens.count
(count)
Number of input tokens from LLM requests in the time period
Shown as token
litellm.llm.api.failed_requests.metric.count
(count)
Deprecated - use litellm.proxy.failed_requests.metric. Number of failed responses from proxy in the time period - the client did not get a success response from litellm proxy
Shown as error
litellm.llm.api.latency.metric.bucket
(count)
Number of observations that fall into each upper_bound latency bucket (seconds) for a model’s LLM API call
litellm.llm.api.latency.metric.count
(count)
Number of latency observations (seconds) for a model’s LLM API call in the time period
litellm.llm.api.latency.metric.sum
(count)
Total latency (seconds) for a model’s LLM API call
Shown as second
litellm.llm.api.time_to_first_token.metric.bucket
(count)
Number of observations that fall into each upper_bound time to first token bucket for a model’s LLM API call
litellm.llm.api.time_to_first_token.metric.count
(count)
Number of time to first token observations for a model’s LLM API call in the time period
litellm.llm.api.time_to_first_token.metric.sum
(count)
Time to first token for a model’s LLM API call
Shown as second
litellm.output.tokens.count
(count)
Number of output tokens from LLM requests in the time period
Shown as token
litellm.overhead_latency.metric.bucket
(count)
Number of observations that fall into each upper_bound overhead latency bucket (milliseconds) added by LiteLLM processing
litellm.overhead_latency.metric.count
(count)
Number of overhead latency observations (milliseconds) added by LiteLLM processing in the time period
litellm.overhead_latency.metric.sum
(count)
Latency overhead (milliseconds) added by LiteLLM processing
Shown as millisecond
litellm.pod_lock_manager.size
(gauge)
Gauge for pod_lock_manager service
Shown as item
litellm.postgres.failed_requests.count
(count)
Number of failed requests for Postgres service in the time period
Shown as error
litellm.postgres.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for Postgres service
litellm.postgres.latency.count
(count)
Number of latency observations for Postgres service in the time period
litellm.postgres.latency.sum
(count)
Latency for Postgres service
Shown as millisecond
litellm.postgres.total_requests.count
(count)
Number of requests for Postgres service in the time period
Shown as request
litellm.process.uptime.seconds
(gauge)
Start time of the process since the Unix epoch, in seconds.
Shown as second
litellm.provider.remaining_budget.metric
(gauge)
Remaining budget for provider - used when you set provider budget limits
litellm.proxy.failed_requests.metric.count
(count)
Number of failed responses from proxy in the time period - the client did not get a success response from litellm proxy
Shown as error
litellm.proxy.pre_call.failed_requests.count
(count)
Number of failed requests for proxy_pre_call service in the time period
Shown as error
litellm.proxy.pre_call.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for proxy_pre_call service
litellm.proxy.pre_call.latency.count
(count)
Number of latency observations for proxy_pre_call service in the time period
litellm.proxy.pre_call.latency.sum
(count)
Latency for proxy_pre_call service
Shown as millisecond
litellm.proxy.pre_call.total_requests.count
(count)
Number of requests for proxy_pre_call service in the time period
Shown as request
litellm.proxy.total_requests.metric.count
(count)
Number of requests made to the proxy server in the time period - track number of client side requests
Shown as request
litellm.redis.daily_spend_update_queue.size
(gauge)
Gauge for redis_daily_spend_update_queue service
Shown as item
litellm.redis.daily_tag_spend_update_queue.failed_requests.count
(count)
Number of failed requests for redis_daily_tag_spend_update_queue service in the time period
Shown as error
litellm.redis.daily_tag_spend_update_queue.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for redis_daily_tag_spend_update_queue service
litellm.redis.daily_tag_spend_update_queue.latency.count
(count)
Number of latency observations for redis_daily_tag_spend_update_queue service in the time period
litellm.redis.daily_tag_spend_update_queue.latency.sum
(count)
Latency for redis_daily_tag_spend_update_queue service
Shown as millisecond
litellm.redis.daily_tag_spend_update_queue.total_requests.count
(count)
Number of requests for redis_daily_tag_spend_update_queue service in the time period
Shown as request
litellm.redis.daily_team_spend_update_queue.failed_requests.count
(count)
Number of failed requests for redis_daily_team_spend_update_queue service in the time period
Shown as error
litellm.redis.daily_team_spend_update_queue.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for redis_daily_team_spend_update_queue service
litellm.redis.daily_team_spend_update_queue.latency.count
(count)
Number of latency observations for redis_daily_team_spend_update_queue service in the time period
litellm.redis.daily_team_spend_update_queue.latency.sum
(count)
Latency for redis_daily_team_spend_update_queue service
Shown as millisecond
litellm.redis.daily_team_spend_update_queue.total_requests.count
(count)
Number of requests for redis_daily_team_spend_update_queue service in the time period
Shown as request
litellm.redis.failed_requests.count
(count)
Number of failed requests for Redis service in the time period
Shown as error
litellm.redis.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for Redis service
litellm.redis.latency.count
(count)
Number of latency observations for Redis service in the time period
litellm.redis.latency.sum
(count)
Total latency (milliseconds) for Redis service
Shown as millisecond
litellm.redis.spend_update_queue.size
(gauge)
Gauge for redis_spend_update_queue service
Shown as item
litellm.redis.total_requests.count
(count)
Number of requests for Redis service in the time period
Shown as request
litellm.remaining.api_key.budget.metric
(gauge)
Remaining budget for api key
litellm.remaining.api_key.requests_for_model
(gauge)
Remaining requests the API key can make for the model (model-based RPM limit on the key)
Shown as request
litellm.remaining.api_key.tokens_for_model
(gauge)
Remaining tokens the API key can use for the model (model-based TPM limit on the key)
Shown as token
litellm.remaining.requests
(gauge)
Remaining requests for model, returned from LLM API Provider
Shown as request
litellm.remaining.team_budget.metric
(gauge)
Remaining budget for team
litellm.remaining_requests.metric
(gauge)
Track x-ratelimit-remaining-requests returned from LLM API Deployment
Shown as request
litellm.remaining_tokens
(gauge)
Remaining tokens for model, returned from LLM API Provider
Shown as token
litellm.request.total_latency.metric.bucket
(count)
Number of observations that fall into each upper_bound total latency bucket (seconds) for a request to LiteLLM
litellm.request.total_latency.metric.count
(count)
Number of total latency observations (seconds) for a request to LiteLLM in the time period
litellm.request.total_latency.metric.sum
(count)
Total latency (seconds) for a request to LiteLLM
Shown as second
litellm.requests.metric.count
(count)
Deprecated - use litellm.proxy.total_requests.metric.count. Number of LLM calls to litellm in the time period - track total per API Key, team, user
Shown as request
litellm.reset_budget_job.failed_requests.count
(count)
Number of failed requests for reset_budget_job service in the time period
Shown as error
litellm.reset_budget_job.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for reset_budget_job service
litellm.reset_budget_job.total_requests.count
(count)
Number of requests for reset_budget_job service in the time period
Shown as request
litellm.router.failed_requests.count
(count)
Number of failed requests for router service in the time period
Shown as error
litellm.router.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for router service
litellm.router.latency.count
(count)
Number of latency observations for router service in the time period
litellm.router.latency.sum
(count)
Latency for router service
Shown as millisecond
litellm.router.total_requests.count
(count)
Number of requests for router service in the time period
Shown as request
litellm.self.failed_requests.count
(count)
Number of failed requests for self service in the time period
Shown as error
litellm.self.latency.bucket
(count)
Number of observations that fall into each upper_bound latency bucket for self service
litellm.self.latency.count
(count)
Number of latency observations for self service in the time period
litellm.self.latency.sum
(count)
Latency for self service
Shown as millisecond
litellm.self.total_requests.count
(count)
Number of requests for self service in the time period
Shown as request
litellm.spend.metric.count
(count)
Spend on LLM requests in the time period
litellm.team.budget.remaining_hours.metric
(gauge)
Remaining hours for team budget to be reset
Shown as hour
litellm.team.max_budget.metric
(gauge)
Maximum budget set for team
litellm.total.tokens.count
(count)
Number of input + output tokens from LLM requests in the time period
Shown as token

Events

The LiteLLM integration does not include any events.

Service Checks

litellm.openmetrics.health

Returns CRITICAL if the Agent is unable to connect to the LiteLLM OpenMetrics endpoint, otherwise returns OK.

Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.