---
title: LiteLLM
description: >-
  This integration allows for real-time collection of LiteLLM metrics for
  enhanced observability and monitoring.
breadcrumbs: Docs > Integrations > LiteLLM
---

# LiteLLM
**Integration version:** 2.4.1
## Overview{% #overview %}

Monitor, troubleshoot, and evaluate your LLM-powered applications built using [LiteLLM](https://www.litellm.ai/): a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.

Use LLM Observability to investigate the root cause of issues, monitor operational performance, and evaluate the quality, privacy, and safety of your LLM applications.

See the [LLM Observability tracing view video](https://imgix.datadoghq.com/video/products/llm-observability/expedite-troubleshooting.mp4?fm=webm&fit=max) for an example of how you can investigate a trace.

Get cost estimation, prompt and completion sampling, error tracking, performance metrics, and more from [LiteLLM](https://www.litellm.ai/) Python library requests using Datadog metrics and APM.

The integration monitors key metrics such as request and response counts, latency, error rates, token usage, and spend per provider or deployment, collected through LiteLLM's health check and Prometheus endpoints. This data helps you track usage patterns, detect anomalies, control costs, and troubleshoot issues quickly, ensuring efficient and reliable LLM operations.

**Minimum Agent version:** 7.68.0

## Setup{% #setup %}

You can configure this integration either as a standalone integration with LLM Observability or as an Agent check.

Use it as an integration with LLM Observability to:

- Monitor LLM agents and applications that use LiteLLM
- Debug and troubleshoot every single operation, including calls to LiteLLM
- Get contextual visibility into prompts, model and provider selection, latency, errors, token usage, cost, and more

Use it as an Agent check to:

- Monitor the LiteLLM proxy service itself directly at the infrastructure level
- Capture metrics like request volume, latency, error and fallback rates, token usage, and costs for the LiteLLM service
- Help platform teams and SREs ensure the overall reliability of your LiteLLM deployment

{% tab title="LLM Observability" %}
Get end-to-end visibility into your LLM application using LiteLLM.

When you [run your LLM application with the LLM Observability SDK](https://docs.datadoghq.com/llm_observability/quickstart.md), LiteLLM and other integrations are enabled by default, so no further configuration is required. See [Automatic Instrumentation for LLM Observability](https://docs.datadoghq.com/llm_observability/instrumentation/auto_instrumentation.md?tab=python#litellm) for details on automatic tracing provided by the LiteLLM integration.
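As a minimal sketch (assuming `ddtrace` and `litellm` are installed and `DD_API_KEY` is set in the environment; the app name and model string are placeholders), enabling LLM Observability in code and making a traced LiteLLM call looks roughly like this:

```python
from ddtrace.llmobs import LLMObs
import litellm

# Enable LLM Observability in code; "my-llm-app" is a placeholder name.
LLMObs.enable(ml_app="my-llm-app")

# Calls made through LiteLLM are traced automatically by the integration,
# so no extra wrapping is required.
response = litellm.completion(
    model="gpt-4o-mini",  # any provider/model string LiteLLM supports
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
```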
{% /tab %}

{% tab title="Agent Check: LiteLLM" %}
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations.md) for guidance on applying these instructions.

### Installation{% #installation %}

Starting from Agent 7.68.0, the LiteLLM check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

### Configuration{% #configuration %}

This integration collects metrics through the Prometheus endpoint exposed by the LiteLLM Proxy. This feature is only available for enterprise users of LiteLLM. By default, the metrics are exposed on the `/metrics` endpoint. If connecting locally, the default port is 4000. For more information, see the [LiteLLM Prometheus documentation](https://docs.litellm.ai/docs/proxy/prometheus).
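Before configuring the Agent, you can verify that the endpoint is reachable and exposing LiteLLM metrics. A quick sanity-check sketch (assuming the proxy runs locally on port 4000; the bearer token is a placeholder and only needed if the endpoint requires authentication):

```python
import requests

resp = requests.get(
    "http://localhost:4000/metrics",
    headers={"Authorization": "Bearer sk-1234"},  # placeholder; omit if auth is not required
    timeout=5,
)
resp.raise_for_status()

# LiteLLM's Prometheus metric names start with "litellm_".
litellm_lines = [line for line in resp.text.splitlines() if line.startswith("litellm_")]
print("\n".join(litellm_lines[:10]))
```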

Note: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed. For example, the `litellm.auth.failed_requests.count` metric might only be exposed after a failed authentication request has occurred.

#### Host-based{% #host-based %}

1. Edit the `litellm.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your LiteLLM performance data. See the [sample litellm.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/litellm/datadog_checks/litellm/data/conf.yaml.example) for all available configuration options. Example config:

   ```yaml
   init_config:
   
   instances:
     - openmetrics_endpoint: http://localhost:4000/metrics
       # If authorization is required to access the endpoint, use the settings below.
       # headers:
       #  Authorization: Bearer sk-1234
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).

#### Kubernetes-based{% #kubernetes-based %}

For a LiteLLM Proxy running on Kubernetes, you can configure the check with pod annotations. See the example below:

```yaml
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: | # <CONTAINER_NAME> must match the container name specified in the containers section below.
      {
        "litellm": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:4000/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: <CONTAINER_NAME>
# (...)
```

For more information and alternative ways to configure the check in Kubernetes-based environments, see the [Kubernetes Integration Setup documentation](https://docs.datadoghq.com/agent/kubernetes/integrations.md).

#### Logs{% #logs %}

LiteLLM can send logs to Datadog through its callback system. You can configure various logging settings in LiteLLM to customize log formatting and delivery to Datadog for ingestion. For detailed configuration options and setup instructions, refer to the [LiteLLM Logging Documentation](https://docs.litellm.ai/docs/proxy/logging).
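As a sketch, enabling the Datadog callback from the LiteLLM Python library looks roughly like this (assumes `DD_API_KEY` and `DD_SITE` are set in the environment; the model string is a placeholder, and the LiteLLM logging documentation is the authoritative reference):

```python
import litellm

# Route LiteLLM request logs to Datadog through the built-in callback;
# the callback reads DD_API_KEY and DD_SITE from the environment.
litellm.success_callback = ["datadog"]
litellm.failure_callback = ["datadog"]

response = litellm.completion(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Log this request to Datadog"}],
)
```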

### Validation{% #validation %}

Run the Agent's status subcommand ([see documentation](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information)) and look for `litellm` under the Checks section.
{% /tab %}

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **litellm.api.key.budget.remaining\_hours.metric**(gauge)                         | Remaining hours for api key budget to be reset*Shown as hour*                                                                                                                                                      |
| **litellm.api.key.max\_budget.metric**(gauge)                                     | Maximum budget set for api key                                                                                                                                                                                     |
| **litellm.auth.failed\_requests.count**(count)                                    | Number of failed requests for auth service in the time period*Shown as error*                                                                                                                                      |
| **litellm.auth.latency.bucket**(count)                                            | Number of observations that fall into each upper_bound latency bucket for auth service                                                                                                                             |
| **litellm.auth.latency.count**(count)                                             | Number of latency observations for auth service in the time period                                                                                                                                                 |
| **litellm.auth.latency.sum**(count)                                               | Latency for auth service*Shown as millisecond*                                                                                                                                                                     |
| **litellm.auth.total\_requests.count**(count)                                     | Number of requests for auth service in the time period*Shown as request*                                                                                                                                           |
| **litellm.batch\_write\_to\_db.failed\_requests.count**(count)                    | Number of failed requests for batch_write_to_db service in the time period*Shown as error*                                                                                                                         |
| **litellm.batch\_write\_to\_db.latency.bucket**(count)                            | Number of observations that fall into each upper_bound latency bucket for batch_write_to_db service                                                                                                                |
| **litellm.batch\_write\_to\_db.latency.count**(count)                             | Number of latency observations for batch_write_to_db service in the time period                                                                                                                                    |
| **litellm.batch\_write\_to\_db.latency.sum**(count)                               | Latency for batch_write_to_db service*Shown as millisecond*                                                                                                                                                        |
| **litellm.batch\_write\_to\_db.total\_requests.count**(count)                     | Number of requests for batch_write_to_db service in the time period*Shown as request*                                                                                                                              |
| **litellm.deployment.cooled\_down.count**(count)                                  | Number of times a deployment has been cooled down by LiteLLM load balancing logic in the time period. exception_status is the status of the exception that caused the deployment to be cooled down*Shown as event* |
| **litellm.deployment.failed\_fallbacks.count**(count)                             | Number of failed fallback requests from primary model -> fallback model in the time period*Shown as error*                                                                                                         |
| **litellm.deployment.failure\_by\_tag\_responses.count**(count)                   | Number of failed LLM API calls for a specific LLM deployment by custom metadata tags in the time period*Shown as error*                                                                                            |
| **litellm.deployment.failure\_responses.count**(count)                            | Number of failed LLM API calls for a specific LLM deployment in the time period. exception_status is the status of the exception from the LLM API*Shown as error*                                                  |
| **litellm.deployment.latency\_per\_output\_token.bucket**(count)                  | Number of observations that fall into each upper_bound latency per output token bucket for deployment                                                                                                              |
| **litellm.deployment.latency\_per\_output\_token.count**(count)                   | Number of latency per output token observations for deployment in the time period                                                                                                                                  |
| **litellm.deployment.latency\_per\_output\_token.sum**(count)                     | Latency per output token*Shown as millisecond*                                                                                                                                                                     |
| **litellm.deployment.state**(gauge)                                               | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage*Shown as unit*                                                                                                                   |
| **litellm.deployment.success\_responses.count**(count)                            | Number of successful LLM API calls via litellm in the time period*Shown as response*                                                                                                                               |
| **litellm.deployment.successful\_fallbacks.count**(count)                         | Number of successful fallback requests from primary model -> fallback model in the time period*Shown as response*                                                                                                  |
| **litellm.deployment.total\_requests.count**(count)                               | Number of LLM API calls via litellm in the time period - success + failure*Shown as request*                                                                                                                       |
| **litellm.in\_memory.daily\_spend\_update\_queue.size**(gauge)                    | Gauge for in_memory_daily_spend_update_queue service*Shown as item*                                                                                                                                                |
| **litellm.in\_memory.spend\_update\_queue.size**(gauge)                           | Gauge for in_memory_spend_update_queue service*Shown as item*                                                                                                                                                      |
| **litellm.input.tokens.count**(count)                                             | Number of input tokens from LLM requests in the time period*Shown as token*                                                                                                                                        |
| **litellm.llm.api.failed\_requests.metric.count**(count)                          | Deprecated - use litellm.proxy.failed_requests.metric. Number of failed responses from proxy in the time period - the client did not get a success response from litellm proxy*Shown as error*                     |
| **litellm.llm.api.latency.metric.bucket**(count)                                  | Number of observations that fall into each upper_bound latency bucket (seconds) for a model's LLM API call                                                                                                         |
| **litellm.llm.api.latency.metric.count**(count)                                   | Number of latency observations (seconds) for a model's LLM API call in the time period                                                                                                                             |
| **litellm.llm.api.latency.metric.sum**(count)                                     | Total latency (seconds) for a model's LLM API call*Shown as second*                                                                                                                                                |
| **litellm.llm.api.time\_to\_first\_token.metric.bucket**(count)                   | Number of observations that fall into each upper_bound time to first token bucket for a model's LLM API call                                                                                                       |
| **litellm.llm.api.time\_to\_first\_token.metric.count**(count)                    | Number of time to first token observations for a model's LLM API call in the time period                                                                                                                           |
| **litellm.llm.api.time\_to\_first\_token.metric.sum**(count)                      | Time to first token for a model's LLM API call*Shown as second*                                                                                                                                                    |
| **litellm.output.tokens.count**(count)                                            | Number of output tokens from LLM requests in the time period*Shown as token*                                                                                                                                       |
| **litellm.overhead\_latency.metric.bucket**(count)                                | Number of observations that fall into each upper_bound overhead latency bucket (milliseconds) added by LiteLLM processing                                                                                          |
| **litellm.overhead\_latency.metric.count**(count)                                 | Number of overhead latency observations (milliseconds) added by LiteLLM processing in the time period                                                                                                              |
| **litellm.overhead\_latency.metric.sum**(count)                                   | Latency overhead (milliseconds) added by LiteLLM processing*Shown as millisecond*                                                                                                                                  |
| **litellm.pod\_lock\_manager.size**(gauge)                                        | Gauge for pod_lock_manager service*Shown as item*                                                                                                                                                                  |
| **litellm.postgres.failed\_requests.count**(count)                                | Number of failed requests for Postgres service in the time period*Shown as error*                                                                                                                                  |
| **litellm.postgres.latency.bucket**(count)                                        | Number of observations that fall into each upper_bound latency bucket for Postgres service                                                                                                                         |
| **litellm.postgres.latency.count**(count)                                         | Number of latency observations for Postgres service in the time period                                                                                                                                             |
| **litellm.postgres.latency.sum**(count)                                           | Latency for Postgres service*Shown as millisecond*                                                                                                                                                                 |
| **litellm.postgres.total\_requests.count**(count)                                 | Number of requests for Postgres service in the time period*Shown as request*                                                                                                                                       |
| **litellm.process.uptime.seconds**(gauge)                                         | Start time of the process since unix epoch in seconds.*Shown as second*                                                                                                                                            |
| **litellm.provider.remaining\_budget.metric**(gauge)                              | Remaining budget for provider - used when you set provider budget limits                                                                                                                                           |
| **litellm.proxy.failed\_requests.metric.count**(count)                            | Number of failed responses from proxy in the time period - the client did not get a success response from litellm proxy*Shown as error*                                                                            |
| **litellm.proxy.pre\_call.failed\_requests.count**(count)                         | Number of failed requests for proxy_pre_call service in the time period*Shown as error*                                                                                                                            |
| **litellm.proxy.pre\_call.latency.bucket**(count)                                 | Number of observations that fall into each upper_bound latency bucket for proxy_pre_call service                                                                                                                   |
| **litellm.proxy.pre\_call.latency.count**(count)                                  | Number of latency observations for proxy_pre_call service in the time period                                                                                                                                       |
| **litellm.proxy.pre\_call.latency.sum**(count)                                    | Latency for proxy_pre_call service*Shown as millisecond*                                                                                                                                                           |
| **litellm.proxy.pre\_call.total\_requests.count**(count)                          | Number of requests for proxy_pre_call service in the time period*Shown as request*                                                                                                                                 |
| **litellm.proxy.total\_requests.metric.count**(count)                             | Number of requests made to the proxy server in the time period - track number of client side requests*Shown as request*                                                                                            |
| **litellm.redis.daily\_spend\_update\_queue.size**(gauge)                         | Gauge for redis_daily_spend_update_queue service*Shown as item*                                                                                                                                                    |
| **litellm.redis.daily\_tag\_spend\_update\_queue.failed\_requests.count**(count)  | Number of failed requests for redis_daily_tag_spend_update_queue service in the time period*Shown as error*                                                                                                        |
| **litellm.redis.daily\_tag\_spend\_update\_queue.latency.bucket**(count)          | Number of observations that fall into each upper_bound latency bucket for redis_daily_tag_spend_update_queue service                                                                                               |
| **litellm.redis.daily\_tag\_spend\_update\_queue.latency.count**(count)           | Number of latency observations for redis_daily_tag_spend_update_queue service in the time period                                                                                                                   |
| **litellm.redis.daily\_tag\_spend\_update\_queue.latency.sum**(count)             | Latency for redis_daily_tag_spend_update_queue service*Shown as millisecond*                                                                                                                                       |
| **litellm.redis.daily\_tag\_spend\_update\_queue.total\_requests.count**(count)   | Number of requests for redis_daily_tag_spend_update_queue service in the time period*Shown as request*                                                                                                             |
| **litellm.redis.daily\_team\_spend\_update\_queue.failed\_requests.count**(count) | Number of failed requests for redis_daily_team_spend_update_queue service in the time period*Shown as error*                                                                                                       |
| **litellm.redis.daily\_team\_spend\_update\_queue.latency.bucket**(count)         | Number of observations that fall into each upper_bound latency bucket for redis_daily_team_spend_update_queue service                                                                                              |
| **litellm.redis.daily\_team\_spend\_update\_queue.latency.count**(count)          | Number of latency observations for redis_daily_team_spend_update_queue service in the time period                                                                                                                  |
| **litellm.redis.daily\_team\_spend\_update\_queue.latency.sum**(count)            | Latency for redis_daily_team_spend_update_queue service*Shown as millisecond*                                                                                                                                      |
| **litellm.redis.daily\_team\_spend\_update\_queue.total\_requests.count**(count)  | Number of requests for redis_daily_team_spend_update_queue service in the time period*Shown as request*                                                                                                            |
| **litellm.redis.failed\_requests.count**(count)                                   | Number of failed requests for Redis service in the time period*Shown as error*                                                                                                                                     |
| **litellm.redis.latency.bucket**(count)                                           | Number of observations that fall into each upper_bound latency bucket for Redis service                                                                                                                            |
| **litellm.redis.latency.count**(count)                                            | Number of latency observations for Redis service in the time period                                                                                                                                                |
| **litellm.redis.latency.sum**(count)                                              | Total latency (milliseconds) for Redis service*Shown as millisecond*                                                                                                                                               |
| **litellm.redis.spend\_update\_queue.size**(gauge)                                | Gauge for redis_spend_update_queue service*Shown as item*                                                                                                                                                          |
| **litellm.redis.total\_requests.count**(count)                                    | Number of requests for Redis service in the time period*Shown as request*                                                                                                                                          |
| **litellm.remaining.api\_key.budget.metric**(gauge)                               | Remaining budget for api key                                                                                                                                                                                       |
| **litellm.remaining.api\_key.requests\_for\_model**(gauge)                        | Remaining requests API Key can make for model (model based rpm limit on key)*Shown as request*                                                                                                                     |
| **litellm.remaining.api\_key.tokens\_for\_model**(gauge)                          | Remaining tokens API Key can make for model (model based tpm limit on key)*Shown as token*                                                                                                                         |
| **litellm.remaining.requests**(gauge)                                             | Remaining requests for model, returned from LLM API Provider*Shown as request*                                                                                                                                     |
| **litellm.remaining.team\_budget.metric**(gauge)                                  | Remaining budget for team                                                                                                                                                                                          |
| **litellm.remaining\_requests.metric**(gauge)                                     | Track x-ratelimit-remaining-requests returned from LLM API Deployment*Shown as request*                                                                                                                            |
| **litellm.remaining\_tokens**(gauge)                                              | Remaining tokens for model, returned from LLM API Provider*Shown as token*                                                                                                                                         |
| **litellm.request.total\_latency.metric.bucket**(count)                           | Number of observations that fall into each upper_bound total latency bucket (seconds) for a request to LiteLLM                                                                                                     |
| **litellm.request.total\_latency.metric.count**(count)                            | Number of total latency observations (seconds) for a request to LiteLLM in the time period                                                                                                                         |
| **litellm.request.total\_latency.metric.sum**(count)                              | Total latency (seconds) for a request to LiteLLM*Shown as second*                                                                                                                                                  |
| **litellm.requests.metric.count**(count)                                          | Deprecated - use litellm.proxy.total_requests.metric.count. Number of LLM calls to litellm in the time period - track total per API Key, team, user*Shown as request*                                              |
| **litellm.reset\_budget\_job.failed\_requests.count**(count)                      | Number of failed requests for reset_budget_job service in the time period*Shown as error*                                                                                                                          |
| **litellm.reset\_budget\_job.latency.bucket**(count)                              | Number of observations that fall into each upper_bound latency bucket for reset_budget_job service                                                                                                                 |
| **litellm.reset\_budget\_job.total\_requests.count**(count)                       | Number of requests for reset_budget_job service in the time period*Shown as request*                                                                                                                               |
| **litellm.router.failed\_requests.count**(count)                                  | Number of failed requests for router service in the time period*Shown as error*                                                                                                                                    |
| **litellm.router.latency.bucket**(count)                                          | Number of observations that fall into each upper_bound latency bucket for router service                                                                                                                           |
| **litellm.router.latency.count**(count)                                           | Number of latency observations for router service in the time period                                                                                                                                               |
| **litellm.router.latency.sum**(count)                                             | Latency for router service*Shown as millisecond*                                                                                                                                                                   |
| **litellm.router.total\_requests.count**(count)                                   | Number of requests for router service in the time period*Shown as request*                                                                                                                                         |
| **litellm.self.failed\_requests.count**(count)                                    | Number of failed requests for self service in the time period*Shown as error*                                                                                                                                      |
| **litellm.self.latency.bucket**(count)                                            | Number of observations that fall into each upper_bound latency bucket for self service                                                                                                                             |
| **litellm.self.latency.count**(count)                                             | Number of latency observations for self service in the time period                                                                                                                                                 |
| **litellm.self.latency.sum**(count)                                               | Latency for self service*Shown as millisecond*                                                                                                                                                                     |
| **litellm.self.total\_requests.count**(count)                                     | Number of requests for self service in the time period*Shown as request*                                                                                                                                           |
| **litellm.spend.metric.count**(count)                                             | Spend on LLM requests in the time period                                                                                                                                                                           |
| **litellm.team.budget.remaining\_hours.metric**(gauge)                            | Remaining hours for team budget to be reset*Shown as hour*                                                                                                                                                         |
| **litellm.team.max\_budget.metric**(gauge)                                        | Maximum budget set for team                                                                                                                                                                                        |
| **litellm.total.tokens.count**(count)                                             | Number of input + output tokens from LLM requests in the time period*Shown as token*                                                                                                                               |

### Events{% #events %}

The LiteLLM integration does not include any events.

### Service Checks{% #service-checks %}

**litellm.openmetrics.health**

Returns `CRITICAL` if the Agent is unable to connect to the LiteLLM OpenMetrics endpoint, otherwise returns `OK`.

*Statuses: ok, critical*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
