Monitor, troubleshoot, and evaluate your LLM-powered applications built using LiteLLM: a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.
Use LLM Observability to investigate the root cause of issues, monitor operational performance, and evaluate the quality, privacy, and safety of your LLM applications.
See the LLM Observability tracing view video for an example of how you can investigate a trace.
Get cost estimation, prompt and completion sampling, error tracking, performance metrics, and more out of LiteLLM Python library requests using Datadog metrics and APM.
The integration monitors key metrics such as request/response counts, latency, error rates, token usage, and spend per provider or deployment. This data lets you track usage patterns, detect anomalies, control costs, and troubleshoot issues quickly, ensuring efficient and reliable LLM operations through LiteLLM's health check and Prometheus endpoints.
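For instance, the per-provider request, error, and spend counters can be rolled up into the kind of summary described above. This is a toy sketch with made-up counter values, not data the integration emits:

```python
# Toy rollup of per-provider error rate and spend from counter samples
# (provider names and values are illustrative only).
samples = [
    {"provider": "openai", "requests": 900, "errors": 9, "spend_usd": 12.40},
    {"provider": "anthropic", "requests": 300, "errors": 9, "spend_usd": 7.10},
]

summary = {
    s["provider"]: {
        "error_rate": s["errors"] / s["requests"],
        "spend_usd": s["spend_usd"],
    }
    for s in samples
}
```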
See the LiteLLM integration docs for details on how to get started with LLM Observability for LiteLLM.
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
Starting from Agent 7.68.0, the LiteLLM check is included in the Datadog Agent package. No additional installation is needed on your server.
This integration collects metrics through the Prometheus endpoint exposed by the LiteLLM Proxy. This feature is available only to enterprise users of LiteLLM. By default, metrics are exposed on the /metrics endpoint; when connecting locally, the default port is 4000. For more information, see the LiteLLM Prometheus documentation.
Note: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed. For example, the litellm.auth.failed_requests.count metric might be exposed only after a failed authentication request has occurred.
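To see what the Agent consumes from this endpoint, the following sketch parses lines in the Prometheus text exposition format. The sample metric names and values are illustrative, not a guaranteed subset of what the LiteLLM Proxy emits:

```python
import re

def parse_prom_line(line: str):
    """Parse one Prometheus exposition-format sample into (name, labels, value)."""
    m = re.match(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$', line)
    if not m:
        return None
    name, raw_labels, value = m.groups()
    labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels)) if raw_labels else {}
    return name, labels, float(value)

# Sample lines in the shape a Prometheus endpoint serves
# (names and values here are made up for illustration).
sample = [
    'litellm_auth_failed_requests_total{status="401"} 3',
    'litellm_input_tokens_total 15230',
]
parsed = [parse_prom_line(line) for line in sample]
```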
Edit the litellm.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory to start collecting your LiteLLM performance data. See the sample litellm.d/conf.yaml for all available configuration options. Example config:

init_config:

instances:
  - openmetrics_endpoint: http://localhost:4000/metrics
    # If authorization is required to access the endpoint, use the settings below.
    # headers:
    #   Authorization: Bearer sk-1234
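Before restarting the Agent, it can help to confirm the endpoint is reachable with the credentials you plan to configure. A minimal sketch, assuming the proxy runs locally on port 4000 and reusing the placeholder sk-1234 key from the config above (not a real credential):

```python
import urllib.request

# Build the same request the Agent's OpenMetrics check would make.
endpoint = "http://localhost:4000/metrics"
req = urllib.request.Request(endpoint, headers={"Authorization": "Bearer sk-1234"})

def fetch_metrics(request: urllib.request.Request) -> str:
    """Return the raw exposition-format payload, or raise on connection/HTTP errors."""
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_metrics(req)` against a running proxy should return the plain-text metrics payload; a 401 response suggests the Authorization header needs adjusting.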
For LiteLLM Proxy running on Kubernetes, you can configure the check with pod annotations. See the example below:
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: | # <CONTAINER_NAME> must match the container name specified in the containers section below.
      {
        "litellm": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:4000/metrics"
            }
          ]
        }
      }
# (...)
spec:
  containers:
    - name: <CONTAINER_NAME>
# (...)
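The annotation value is plain JSON containing Datadog autodiscovery template variables. As a quick sanity check before applying the manifest, the sketch below substitutes the `%%host%%` variable with a stand-in IP (the Agent resolves the real value at runtime) and validates the structure:

```python
import json

# The annotation body from the pod manifest above.
annotation = '''
{
  "litellm": {
    "init_config": {},
    "instances": [
      {"openmetrics_endpoint": "http://%%host%%:4000/metrics"}
    ]
  }
}
'''

# %%host%% is resolved by the Agent at runtime; 10.0.0.1 is a
# placeholder used here purely to confirm the JSON parses.
resolved = annotation.replace("%%host%%", "10.0.0.1")
config = json.loads(resolved)
endpoint = config["litellm"]["instances"][0]["openmetrics_endpoint"]
```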
For more information and alternative ways to configure the check in Kubernetes-based environments, see the Kubernetes Integration Setup documentation.
LiteLLM can send logs to Datadog through its callback system. You can configure various logging settings in LiteLLM to customize log formatting and delivery to Datadog for ingestion. For detailed configuration options and setup instructions, refer to the LiteLLM Logging Documentation.
Run the Agent's status subcommand (see the documentation) and look for litellm under the Checks section.
litellm.api.key.budget.remaining_hours.metric (gauge) | Remaining hours for api key budget to be reset Shown as hour |
litellm.api.key.max_budget.metric (gauge) | Maximum budget set for api key |
litellm.auth.failed_requests.count (count) | Number of failed requests for auth service in the time period Shown as error |
litellm.auth.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for auth service |
litellm.auth.latency.count (count) | Number of latency observations for auth service in the time period |
litellm.auth.latency.sum (count) | Latency for auth service Shown as millisecond |
litellm.auth.total_requests.count (count) | Number of requests for auth service in the time period Shown as request |
litellm.batch_write_to_db.failed_requests.count (count) | Number of failed requests for batch_write_to_db service in the time period Shown as error |
litellm.batch_write_to_db.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for batch_write_to_db service |
litellm.batch_write_to_db.latency.count (count) | Number of latency observations for batch_write_to_db service in the time period |
litellm.batch_write_to_db.latency.sum (count) | Latency for batch_write_to_db service Shown as millisecond |
litellm.batch_write_to_db.total_requests.count (count) | Number of requests for batch_write_to_db service in the time period Shown as request |
litellm.deployment.cooled_down.count (count) | Number of times a deployment has been cooled down by LiteLLM load balancing logic in the time period. exception_status is the status of the exception that caused the deployment to be cooled down Shown as event |
litellm.deployment.failed_fallbacks.count (count) | Number of failed fallback requests from primary model -> fallback model in the time period Shown as error |
litellm.deployment.failure_by_tag_responses.count (count) | Number of failed LLM API calls for a specific LLM deployment by custom metadata tags in the time period Shown as error |
litellm.deployment.failure_responses.count (count) | Number of failed LLM API calls for a specific LLM deployment in the time period. exception_status is the status of the exception from the LLM API Shown as error |
litellm.deployment.latency_per_output_token.bucket (count) | Number of observations that fall into each upper_bound latency per output token bucket for deployment |
litellm.deployment.latency_per_output_token.count (count) | Number of latency per output token observations for deployment in the time period |
litellm.deployment.latency_per_output_token.sum (count) | Latency per output token Shown as millisecond |
litellm.deployment.state (gauge) | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage Shown as unit |
litellm.deployment.success_responses.count (count) | Number of successful LLM API calls via litellm in the time period Shown as response |
litellm.deployment.successful_fallbacks.count (count) | Number of successful fallback requests from primary model -> fallback model in the time period Shown as response |
litellm.deployment.total_requests.count (count) | Number of LLM API calls via litellm in the time period - success + failure Shown as request |
litellm.in_memory.daily_spend_update_queue.size (gauge) | Gauge for in_memory_daily_spend_update_queue service Shown as item |
litellm.in_memory.spend_update_queue.size (gauge) | Gauge for in_memory_spend_update_queue service Shown as item |
litellm.input.tokens.count (count) | Number of input tokens from LLM requests in the time period Shown as token |
litellm.llm.api.failed_requests.metric.count (count) | Deprecated - use litellm.proxy.failed_requests.metric. Number of failed responses from proxy in the time period - the client did not get a success response from litellm proxy Shown as error |
litellm.llm.api.latency.metric.bucket (count) | Number of observations that fall into each upper_bound latency bucket (seconds) for a model’s LLM API call |
litellm.llm.api.latency.metric.count (count) | Number of latency observations (seconds) for a model’s LLM API call in the time period |
litellm.llm.api.latency.metric.sum (count) | Total latency (seconds) for a model’s LLM API call Shown as second |
litellm.llm.api.time_to_first_token.metric.bucket (count) | Number of observations that fall into each upper_bound time to first token bucket for a model’s LLM API call |
litellm.llm.api.time_to_first_token.metric.count (count) | Number of time to first token observations for a model’s LLM API call in the time period |
litellm.llm.api.time_to_first_token.metric.sum (count) | Time to first token for a model’s LLM API call Shown as second |
litellm.output.tokens.count (count) | Number of output tokens from LLM requests in the time period Shown as token |
litellm.overhead_latency.metric.bucket (count) | Number of observations that fall into each upper_bound overhead latency bucket (milliseconds) added by LiteLLM processing |
litellm.overhead_latency.metric.count (count) | Number of overhead latency observations (milliseconds) added by LiteLLM processing in the time period |
litellm.overhead_latency.metric.sum (count) | Latency overhead (milliseconds) added by LiteLLM processing Shown as millisecond |
litellm.pod_lock_manager.size (gauge) | Gauge for pod_lock_manager service Shown as item |
litellm.postgres.failed_requests.count (count) | Number of failed requests for Postgres service in the time period Shown as error |
litellm.postgres.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for Postgres service |
litellm.postgres.latency.count (count) | Number of latency observations for Postgres service in the time period |
litellm.postgres.latency.sum (count) | Latency for Postgres service Shown as millisecond |
litellm.postgres.total_requests.count (count) | Number of requests for Postgres service in the time period Shown as request |
litellm.process.uptime.seconds (gauge) | Start time of the process since unix epoch in seconds. Shown as second |
litellm.provider.remaining_budget.metric (gauge) | Remaining budget for provider - used when you set provider budget limits |
litellm.proxy.failed_requests.metric.count (count) | Number of failed responses from proxy in the time period - the client did not get a success response from litellm proxy Shown as error |
litellm.proxy.pre_call.failed_requests.count (count) | Number of failed requests for proxy_pre_call service in the time period Shown as error |
litellm.proxy.pre_call.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for proxy_pre_call service |
litellm.proxy.pre_call.latency.count (count) | Number of latency observations for proxy_pre_call service in the time period |
litellm.proxy.pre_call.latency.sum (count) | Latency for proxy_pre_call service Shown as millisecond |
litellm.proxy.pre_call.total_requests.count (count) | Number of requests for proxy_pre_call service in the time period Shown as request |
litellm.proxy.total_requests.metric.count (count) | Number of requests made to the proxy server in the time period - track number of client side requests Shown as request |
litellm.redis.daily_spend_update_queue.size (gauge) | Gauge for redis_daily_spend_update_queue service Shown as item |
litellm.redis.daily_tag_spend_update_queue.failed_requests.count (count) | Number of failed requests for redis_daily_tag_spend_update_queue service in the time period Shown as error |
litellm.redis.daily_tag_spend_update_queue.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for redis_daily_tag_spend_update_queue service |
litellm.redis.daily_tag_spend_update_queue.latency.count (count) | Number of latency observations for redis_daily_tag_spend_update_queue service in the time period |
litellm.redis.daily_tag_spend_update_queue.latency.sum (count) | Latency for redis_daily_tag_spend_update_queue service Shown as millisecond |
litellm.redis.daily_tag_spend_update_queue.total_requests.count (count) | Number of requests for redis_daily_tag_spend_update_queue service in the time period Shown as request |
litellm.redis.daily_team_spend_update_queue.failed_requests.count (count) | Number of failed requests for redis_daily_team_spend_update_queue service in the time period Shown as error |
litellm.redis.daily_team_spend_update_queue.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for redis_daily_team_spend_update_queue service |
litellm.redis.daily_team_spend_update_queue.latency.count (count) | Number of latency observations for redis_daily_team_spend_update_queue service in the time period |
litellm.redis.daily_team_spend_update_queue.latency.sum (count) | Latency for redis_daily_team_spend_update_queue service Shown as millisecond |
litellm.redis.daily_team_spend_update_queue.total_requests.count (count) | Number of requests for redis_daily_team_spend_update_queue service in the time period Shown as request |
litellm.redis.failed_requests.count (count) | Number of failed requests for Redis service in the time period Shown as error |
litellm.redis.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for Redis service |
litellm.redis.latency.count (count) | Number of latency observations for Redis service in the time period |
litellm.redis.latency.sum (count) | Total latency (milliseconds) for Redis service Shown as millisecond |
litellm.redis.spend_update_queue.size (gauge) | Gauge for redis_spend_update_queue service Shown as item |
litellm.redis.total_requests.count (count) | Number of requests for Redis service in the time period Shown as request |
litellm.remaining.api_key.budget.metric (gauge) | Remaining budget for api key |
litellm.remaining.api_key.requests_for_model (gauge) | Remaining requests API Key can make for model (model based rpm limit on key) Shown as request |
litellm.remaining.api_key.tokens_for_model (gauge) | Remaining tokens API Key can make for model (model based tpm limit on key) Shown as token |
litellm.remaining.requests (gauge) | Remaining requests for model, returned from LLM API Provider Shown as request |
litellm.remaining.team_budget.metric (gauge) | Remaining budget for team |
litellm.remaining_requests.metric (gauge) | Track x-ratelimit-remaining-requests returned from LLM API Deployment Shown as request |
litellm.remaining_tokens (gauge) | Remaining tokens for model, returned from LLM API Provider Shown as token |
litellm.request.total_latency.metric.bucket (count) | Number of observations that fall into each upper_bound total latency bucket (seconds) for a request to LiteLLM |
litellm.request.total_latency.metric.count (count) | Number of total latency observations (seconds) for a request to LiteLLM in the time period |
litellm.request.total_latency.metric.sum (count) | Total latency (seconds) for a request to LiteLLM Shown as second |
litellm.requests.metric.count (count) | Deprecated - use litellm.proxy.total_requests.metric.count. Number of LLM calls to litellm in the time period - track total per API Key, team, user Shown as request |
litellm.reset_budget_job.failed_requests.count (count) | Number of failed requests for reset_budget_job service in the time period Shown as error |
litellm.reset_budget_job.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for reset_budget_job service |
litellm.reset_budget_job.total_requests.count (count) | Number of requests for reset_budget_job service in the time period Shown as request |
litellm.router.failed_requests.count (count) | Number of failed requests for router service in the time period Shown as error |
litellm.router.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for router service |
litellm.router.latency.count (count) | Number of latency observations for router service in the time period |
litellm.router.latency.sum (count) | Latency for router service Shown as millisecond |
litellm.router.total_requests.count (count) | Number of requests for router service in the time period Shown as request |
litellm.self.failed_requests.count (count) | Number of failed requests for self service in the time period Shown as error |
litellm.self.latency.bucket (count) | Number of observations that fall into each upper_bound latency bucket for self service |
litellm.self.latency.count (count) | Number of latency observations for self service in the time period |
litellm.self.latency.sum (count) | Latency for self service Shown as millisecond |
litellm.self.total_requests.count (count) | Number of requests for self service in the time period Shown as request |
litellm.spend.metric.count (count) | Spend on LLM requests in the time period |
litellm.team.budget.remaining_hours.metric (gauge) | Remaining hours for team budget to be reset Shown as hour |
litellm.team.max_budget.metric (gauge) | Maximum budget set for team |
litellm.total.tokens.count (count) | Number of input + output tokens from LLM requests in the time period Shown as token |
The LiteLLM integration does not include any events.
litellm.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the LiteLLM OpenMetrics endpoint; otherwise, returns OK.
Statuses: ok, critical
Need help? Contact Datadog support.