---
title: Hugging Face TGI
description: >-
  Monitor the model serving performance and system health of your Hugging Face
  TGI servers.
breadcrumbs: Docs > Integrations > Hugging Face TGI
---

# Hugging Face TGI
**Integration version**: 1.5.0
## Overview{% #overview %}

This check monitors [Hugging Face Text Generation Inference (TGI)](https://huggingface.co/docs/text-generation-inference/index) through the Datadog Agent. TGI is a library for deploying and serving large language models (LLMs) optimized for text generation. It provides features such as continuous batching, tensor parallelism, token streaming, and optimizations for production use.

The integration provides comprehensive monitoring of your TGI servers by collecting:

- Request performance metrics, including latency, throughput, and token generation rates
- Batch processing metrics for inference optimization insights
- Queue depth and request flow monitoring
- Model serving health and operational metrics

This enables teams to optimize LLM inference performance, track resource utilization, troubleshoot bottlenecks, and ensure reliable model serving at scale.

**Minimum Agent version:** 7.70.1

## Setup{% #setup %}

Follow these instructions to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/agent/kubernetes/integrations/) for guidance on applying these instructions.

### Installation{% #installation %}

The Hugging Face TGI check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

### Configuration{% #configuration %}

#### Metrics{% #metrics %}

1. Ensure that your TGI server exposes Prometheus metrics on the default `/metrics` endpoint. For more information, see the [TGI monitoring documentation](https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/monitoring).

1. Edit `hugging_face_tgi.d/conf.yaml`, located in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/configuration/agent-configuration-files/#agent-configuration-directory), to start collecting Hugging Face TGI performance data. See the [sample configuration file](https://github.com/DataDog/integrations-core/blob/master/hugging_face_tgi/datadog_checks/hugging_face_tgi/data/conf.yaml.example) for all available options.

   ```yaml
   instances:
     - openmetrics_endpoint: http://localhost:80/metrics
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent).
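The minimal instance above can be extended with options from the Agent's shared OpenMetrics configuration. As a sketch (`tags`, `timeout`, and `min_collection_interval` are generic Agent options, and the values shown are illustrative):

```yaml
instances:
  - openmetrics_endpoint: http://localhost:80/metrics
    # Optional: attach custom tags to every metric from this instance.
    tags:
      - env:staging
    # Optional: HTTP request timeout in seconds.
    timeout: 10
    # Optional: how often the check runs, in seconds.
    min_collection_interval: 15
```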

#### Logs{% #logs %}

The Hugging Face TGI integration can collect logs from the server container and forward them to Datadog. For Datadog to parse the log output correctly, the TGI server container must be started with the environment variable `NO_COLOR=1` and the `--json-output` option. If the server is already running, restart it with these settings to enable log ingestion.
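For example, a TGI container launched with Docker might enable these settings as follows (the image tag, port mapping, and `<model-id>` are placeholders to adapt to your deployment):

```shell
docker run -e NO_COLOR=1 \
  -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id <model-id> \
  --json-output
```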

{% tab title="Host" %}

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `hugging_face_tgi.d/conf.yaml` file. Here's an example:

   ```yaml
   logs:
     - type: docker
       source: hugging_face_tgi
       service: text-generation-inference
       auto_multi_line_detection: true
   ```

{% /tab %}

{% tab title="Kubernetes" %}
Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Kubernetes Log Collection](https://docs.datadoghq.com/agent/kubernetes/log/#setup).

Then, set Log Integrations as pod annotations. This can also be configured with a file, a configmap, or a key-value store. For more information, see the configuration section of [Kubernetes Log Collection](https://docs.datadoghq.com/agent/kubernetes/log/#configuration).
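As a sketch, assuming the TGI container in the pod is named `tgi`, the log integration annotation might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tgi
  annotations:
    # The annotation key must reference the container name ("tgi" here).
    ad.datadoghq.com/tgi.logs: '[{"source": "hugging_face_tgi", "service": "text-generation-inference"}]'
spec:
  containers:
    - name: tgi
      image: ghcr.io/huggingface/text-generation-inference:latest
```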
{% /tab %}

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information) and look for `hugging_face_tgi` under the Checks section.
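On a host, you can narrow the status output to this check (a sketch; the exact section formatting may vary by Agent version):

```shell
sudo datadog-agent status | grep -A 10 hugging_face_tgi
```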

## Data Collected{% #data-collected %}

### Metrics{% #metrics-1 %}

| Metric | Description |
| --- | --- |
| **hugging\_face\_tgi.batch.concat.count** (count) | Number of batch concatenations |
| **hugging\_face\_tgi.batch.concat.duration.bucket** (count) | Batch concatenation duration distribution |
| **hugging\_face\_tgi.batch.concat.duration.count** (count) | Number of batch concatenation duration measurements |
| **hugging\_face\_tgi.batch.concat.duration.sum** (count) | Total batch concatenation duration. *Shown as second* |
| **hugging\_face\_tgi.batch.current.max\_tokens** (gauge) | Maximum tokens the current batch will grow to. *Shown as token* |
| **hugging\_face\_tgi.batch.current.size** (gauge) | Current batch size. *Shown as request* |
| **hugging\_face\_tgi.batch.decode.duration.bucket** (count) | Batch decode duration distribution |
| **hugging\_face\_tgi.batch.decode.duration.count** (count) | Number of batch decode duration measurements |
| **hugging\_face\_tgi.batch.decode.duration.sum** (count) | Total batch decode duration. *Shown as second* |
| **hugging\_face\_tgi.batch.filter.duration.bucket** (count) | Batch filtering duration distribution |
| **hugging\_face\_tgi.batch.filter.duration.count** (count) | Number of batch filter duration measurements |
| **hugging\_face\_tgi.batch.filter.duration.sum** (count) | Total batch filter duration. *Shown as second* |
| **hugging\_face\_tgi.batch.forward.duration.bucket** (count) | Batch forward duration distribution |
| **hugging\_face\_tgi.batch.forward.duration.count** (count) | Number of batch forward duration measurements |
| **hugging\_face\_tgi.batch.forward.duration.sum** (count) | Total batch forward duration. *Shown as second* |
| **hugging\_face\_tgi.batch.inference.count** (count) | Total number of batch inferences |
| **hugging\_face\_tgi.batch.inference.duration.bucket** (count) | Batch inference duration distribution |
| **hugging\_face\_tgi.batch.inference.duration.count** (count) | Number of batch inference duration measurements |
| **hugging\_face\_tgi.batch.inference.duration.sum** (count) | Total batch inference duration. *Shown as second* |
| **hugging\_face\_tgi.batch.inference.success.count** (count) | Number of successful batch inferences |
| **hugging\_face\_tgi.batch.next.size.bucket** (count) | Next batch size distribution |
| **hugging\_face\_tgi.batch.next.size.count** (count) | Number of next batch size measurements |
| **hugging\_face\_tgi.batch.next.size.sum** (count) | Total next batch size. *Shown as request* |
| **hugging\_face\_tgi.queue.size** (gauge) | Number of requests waiting in the internal queue. *Shown as request* |
| **hugging\_face\_tgi.request.count** (count) | Total number of requests received. *Shown as request* |
| **hugging\_face\_tgi.request.duration.bucket** (count) | Request duration distribution |
| **hugging\_face\_tgi.request.duration.count** (count) | Number of request duration measurements |
| **hugging\_face\_tgi.request.duration.sum** (count) | Total request duration. *Shown as second* |
| **hugging\_face\_tgi.request.failure.count** (count) | Number of failed requests. *Shown as request* |
| **hugging\_face\_tgi.request.generated\_tokens.bucket** (count) | Generated tokens per request distribution |
| **hugging\_face\_tgi.request.generated\_tokens.count** (count) | Number of generated token measurements |
| **hugging\_face\_tgi.request.generated\_tokens.sum** (count) | Total generated tokens. *Shown as token* |
| **hugging\_face\_tgi.request.inference.duration.bucket** (count) | Request inference duration distribution |
| **hugging\_face\_tgi.request.inference.duration.count** (count) | Number of request inference duration measurements |
| **hugging\_face\_tgi.request.inference.duration.sum** (count) | Total request inference duration. *Shown as second* |
| **hugging\_face\_tgi.request.input\_length.bucket** (count) | Input token length per request distribution |
| **hugging\_face\_tgi.request.input\_length.count** (count) | Number of input length measurements |
| **hugging\_face\_tgi.request.input\_length.sum** (count) | Total input length. *Shown as token* |
| **hugging\_face\_tgi.request.max\_new\_tokens.bucket** (count) | Maximum new tokens per request distribution |
| **hugging\_face\_tgi.request.max\_new\_tokens.count** (count) | Number of max new tokens measurements |
| **hugging\_face\_tgi.request.max\_new\_tokens.sum** (count) | Total max new tokens. *Shown as token* |
| **hugging\_face\_tgi.request.mean\_time\_per\_token.duration.bucket** (count) | Mean time per token duration distribution |
| **hugging\_face\_tgi.request.mean\_time\_per\_token.duration.count** (count) | Number of mean time per token measurements |
| **hugging\_face\_tgi.request.mean\_time\_per\_token.duration.sum** (count) | Total mean time per token duration. *Shown as second* |
| **hugging\_face\_tgi.request.queue.duration.bucket** (count) | Request queue duration distribution |
| **hugging\_face\_tgi.request.queue.duration.count** (count) | Number of request queue duration measurements |
| **hugging\_face\_tgi.request.queue.duration.sum** (count) | Total request queue duration. *Shown as second* |
| **hugging\_face\_tgi.request.skipped\_tokens.count** (count) | Number of skipped token measurements |
| **hugging\_face\_tgi.request.skipped\_tokens.quantile** (gauge) | Skipped tokens per request quantile. *Shown as token* |
| **hugging\_face\_tgi.request.skipped\_tokens.sum** (count) | Total skipped tokens. *Shown as token* |
| **hugging\_face\_tgi.request.success.count** (count) | Number of successful requests. *Shown as request* |
| **hugging\_face\_tgi.request.validation.duration.bucket** (count) | Request validation duration distribution |
| **hugging\_face\_tgi.request.validation.duration.count** (count) | Number of request validation duration measurements |
| **hugging\_face\_tgi.request.validation.duration.sum** (count) | Total request validation duration. *Shown as second* |

Key metrics include:

- **Request metrics**: Total requests, successful requests, failed requests, and request duration
- **Queue metrics**: Queue size and queue duration for monitoring throughput bottlenecks
- **Token metrics**: Generated tokens, input length, and mean time per token for performance analysis
- **Batch metrics**: Batch size, batch concatenation, and batch processing durations for optimization insights
- **Inference metrics**: Forward pass duration, decode duration, and filter duration for model performance monitoring
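As a sketch, dashboard or monitor queries over these metrics might look like the following (the tag scope on the second query is illustrative):

```
# Average queue depth across all TGI instances
avg:hugging_face_tgi.queue.size{*}

# Failed requests, summed as a count
sum:hugging_face_tgi.request.failure.count{service:text-generation-inference}.as_count()
```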

### Events{% #events %}

The Hugging Face TGI integration does not include any events.

### Service Checks{% #service-checks %}

See [service_checks.json](https://github.com/DataDog/integrations-core/blob/master/hugging_face_tgi/assets/service_checks.json) for a list of service checks provided by this integration.

## Troubleshooting{% #troubleshooting %}

In containerized environments, ensure that the Agent has network access to the TGI metrics endpoint specified in `hugging_face_tgi.d/conf.yaml`.
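To check connectivity from the Agent host, you can query the endpoint directly (assuming the default endpoint from the configuration example above):

```shell
curl -s http://localhost:80/metrics | head
```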

To ingest non-JSON TGI logs, use the following logs configuration:

```yaml
logs:
  - type: docker
    source: hugging_face_tgi
    service: text-generation-inference
    auto_multi_line_detection: true
    log_processing_rules:
      - type: mask_sequences
        name: strip_ansi
        pattern: "\\x1B\\[[0-9;]*m"
        replace_placeholder: ""
```

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
