TorchServe

Supported OS: Linux, Windows, Mac OS

Integration version: 2.2.0

Overview

This check monitors TorchServe through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

Starting from Agent release 7.47.0, the TorchServe check is included in the Datadog Agent package. No additional installation is needed on your server.

This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint TorchServe can expose, which requires Python 3.

Prerequisites

The TorchServe check collects TorchServe’s metrics and performance data using three different endpoints:

  • The Inference API
  • The Management API
  • The OpenMetrics endpoint

You can configure these endpoints using the config.properties file, as described in the TorchServe documentation. For example:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
metrics_mode=prometheus
number_of_netty_threads=32
default_workers_per_model=10
job_queue_size=1000
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store
load_models=all

This configuration file exposes the three different endpoints that can be used by the integration to monitor your instance.
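
To double-check which addresses a given config.properties would expose, the file can be read as plain key=value pairs. The following is a minimal sketch and not part of the integration; the function name parse_properties and the SAMPLE text are illustrative assumptions.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


# Excerpt of the example configuration shown above.
SAMPLE = """\
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
metrics_mode=prometheus
"""

props = parse_properties(SAMPLE)
for key in ("inference_address", "management_address", "metrics_address"):
    print(key, "->", props.get(key, "<not set>"))

if props.get("metrics_mode") != "prometheus":
    print("metrics_mode is not prometheus: metrics cannot be collected from the metrics endpoint")
```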

OpenMetrics endpoint

To enable the Prometheus endpoint, you need to configure two options:

  • metrics_address: Metrics API binding address. Defaults to http://127.0.0.1:8082
  • metrics_mode: Two metric modes are supported by TorchServe: log and prometheus. Defaults to log. You have to set it to prometheus to collect metrics from this endpoint.

For instance:

metrics_address=http://0.0.0.0:8082
metrics_mode=prometheus

In this case, the OpenMetrics endpoint is exposed at this URL: http://<TORCHSERVE_ADDRESS>:8082/metrics.
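
Before configuring the Agent, it can help to confirm that the endpoint returns metrics in the Prometheus text format. The sketch below lists the metric names found in a response body; it is a simplified parser written for illustration (it does not handle a `}` inside a label value), and the sample body and its metric name are assumed examples rather than captured output.

```python
import re

# One sample line: metric name, optional {labels}, then a value.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)


def metric_names(exposition: str) -> set[str]:
    """Return the distinct metric names found in a text exposition body."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            names.add(match.group("name"))
    return names


# Illustrative body; fetch the real one from http://<TORCHSERVE_ADDRESS>:8082/metrics.
body = """\
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="my_model",model_version="default"} 3.0
"""
print(sorted(metric_names(body)))
```

An empty result usually means that metrics_mode is still set to log, or that no request has been served yet.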

Configuration

These three different endpoints can be monitored independently and must be configured separately in the configuration file, one API per instance. See the sample torchserve.d/conf.yaml for all available configuration options.

Configure the OpenMetrics endpoint

Configuration options for the OpenMetrics endpoint can be found in the configuration file under the TorchServe OpenMetrics endpoint configuration section. The minimal configuration only requires the openmetrics_endpoint option:

init_config:
  ...
instances:
  - openmetrics_endpoint: http://<TORCHSERVE_ADDRESS>:8082/metrics

For more options, see the sample torchserve.d/conf.yaml file.

TorchServe allows the custom service code to emit metrics that will be available based on the configured metrics_mode. You can configure this integration to collect these metrics using the extra_metrics option. These metrics will have the torchserve.openmetrics prefix, just like any other metrics coming from this endpoint.

These custom TorchServe metrics are considered standard metrics in Datadog.

Configure the Inference API

This integration relies on the Inference API to get the overall status of your TorchServe instance. Configuration options for the Inference API can be found in the configuration file under the TorchServe Inference API endpoint configuration section. The minimal configuration only requires the inference_api_url option:

init_config:
  ...
instances:
  - inference_api_url: http://<TORCHSERVE_ADDRESS>:8080

This integration leverages the Ping endpoint to collect the overall health status of your TorchServe server.
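
As a rough illustration of what a health probe against the Ping endpoint looks like, the sketch below calls <base_url>/ping and maps the result to ok or critical. It assumes the endpoint returns a JSON body with a status field that reads Healthy when the server is up; the function names and the exact mapping are assumptions, not the integration's actual code.

```python
import json
import urllib.request


def interpret_ping(body: str) -> str:
    """Map a Ping response body to a service check status."""
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return "critical"
    return "ok" if status == "Healthy" else "critical"


def ping(base_url: str, timeout: float = 5.0) -> str:
    """Call <base_url>/ping; connection failures count as critical."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return interpret_ping(resp.read().decode("utf-8"))
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return "critical"


print(interpret_ping('{"status": "Healthy"}'))
```

For a live check, call ping("http://<TORCHSERVE_ADDRESS>:8080") with the address of your own instance.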

Configure the Management API

You can collect metrics related to the models that are currently running in your TorchServe server using the Management API. Configuration options for the Management API can be found in the configuration file under the TorchServe Management API endpoint configuration section. The minimal configuration only requires the management_api_url option:

init_config:
  ...
instances:
  - management_api_url: http://<TORCHSERVE_ADDRESS>:8081

By default, the integration collects data from every model, up to 100 models. This can be modified using the limit, include, and exclude options. For example:

init_config:
  ...
instances:
  - management_api_url: http://<TORCHSERVE_ADDRESS>:8081
    limit: 25
    include: 
      - my_model.* 

This configuration only collects metrics for model names that match the my_model.* regular expression, up to 25 models.

You can also exclude some models:

init_config:
  ...
instances:
  - management_api_url: http://<TORCHSERVE_ADDRESS>:8081
    exclude: 
      - test.* 

This configuration collects metrics for every model name that does not match the test.* regular expression, up to 100 models.

You can use the `include` and `exclude` options in the same configuration. The `exclude` filters are applied after the `include` ones.
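
The interaction between the three options can be pictured with a short sketch. It assumes that patterns are matched with re.search and that limit is applied after filtering; both are assumptions made for illustration, and select_models is a hypothetical helper, not the integration's code.

```python
import re


def select_models(names, include=(".*",), exclude=(), limit=100):
    """Keep names matching any include pattern, drop those matching any
    exclude pattern, then cap the result at `limit` entries."""
    included = [n for n in names if any(re.search(p, n) for p in include)]
    kept = [n for n in included if not any(re.search(p, n) for p in exclude)]
    return kept[:limit]


models = ["my_models_a", "my_models_b-test", "other", "my_models_c"]
print(select_models(models, include=["my_models.*"], exclude=[".*-test"]))
# → ['my_models_a', 'my_models_c']
```

Here my_models_b-test passes the include filter but is then removed by the exclude filter, which is the ordering described above.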

By default, the integration retrieves the full list of models every time the check runs. To improve the performance of this check, you can cache this list with the interval option, which sets how often the list is refreshed.

Using the `interval` option can also delay some metrics and events.
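
The trade-off can be seen in a small time-based cache. This is a sketch of the general caching idea under the assumption that interval is a duration in seconds; CachedModelList and its parameters are hypothetical names, and the integration's real implementation may differ.

```python
import time


class CachedModelList:
    """Cache the result of `fetch` and refresh it at most every `interval` seconds."""

    def __init__(self, fetch, interval, clock=time.monotonic):
        self._fetch = fetch
        self._interval = interval
        self._clock = clock
        self._models = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._models = self._fetch()
            self._fetched_at = now
        return self._models


# Usage with a fake clock so the refresh behaviour is visible without waiting.
now = [0.0]
responses = iter([["model_a"], ["model_a", "model_b"]])
cache = CachedModelList(lambda: next(responses), interval=3600, clock=lambda: now[0])

print(cache.get())  # first call always fetches
now[0] = 1800.0
print(cache.get())  # still cached: model_b is not visible yet
now[0] = 3600.0
print(cache.get())  # interval elapsed, the list is refreshed
```

A model registered halfway through the interval only shows up at the next refresh, which is why metrics and events for it can be delayed.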

Complete configuration

This example demonstrates the complete configuration leveraging the three different APIs described in the previous sections:

init_config:
  ...
instances:
  - openmetrics_endpoint: http://<TORCHSERVE_ADDRESS>:8082/metrics
    # Also collect your own TorchServe metrics
    extra_metrics:
      - my_custom_torchserve_metric
  - inference_api_url: http://<TORCHSERVE_ADDRESS>:8080
  - management_api_url: http://<TORCHSERVE_ADDRESS>:8081
    # Include all the model names that match this regex   
    include:
      - my_models.*
    # But exclude all the ones that end with `-test`
    exclude: 
      - .*-test 
    # Refresh the list of models only every hour
    interval: 3600

Restart the Agent after modifying the configuration.

This example demonstrates the complete configuration leveraging the three different APIs described in the previous sections as a Docker label inside docker-compose.yml:

labels:
  com.datadoghq.ad.checks: '{"torchserve":{"instances":[{"openmetrics_endpoint":"http://%%host%%:8082/metrics","extra_metrics":["my_custom_torchserve_metric"]},{"inference_api_url":"http://%%host%%:8080"},{"management_api_url":"http://%%host%%:8081","include":["my_models.*"],"exclude":[".*-test"],"interval":3600}]}}'
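
Writing the single-line JSON value by hand is error-prone. As a sketch, the same value can be generated from a Python structure with the standard json module; the variable names are illustrative only.

```python
import json

# Same three instances as in the YAML example above.
instances = [
    {
        "openmetrics_endpoint": "http://%%host%%:8082/metrics",
        "extra_metrics": ["my_custom_torchserve_metric"],
    },
    {"inference_api_url": "http://%%host%%:8080"},
    {
        "management_api_url": "http://%%host%%:8081",
        "include": ["my_models.*"],
        "exclude": [".*-test"],
        "interval": 3600,
    },
]

# Compact separators keep the value on one line without extra spaces.
label_value = json.dumps({"torchserve": {"instances": instances}}, separators=(",", ":"))
print(f"com.datadoghq.ad.checks: '{label_value}'")
```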

This example demonstrates the complete configuration leveraging the three different APIs described in the previous sections as Kubernetes annotations on your TorchServe pods:

apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/torchserve.checks: |-
      {
        "torchserve": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8082/metrics",
              "extra_metrics": [
                "my_custom_torchserve_metric"
              ]
            },
            {
              "inference_api_url": "http://%%host%%:8080"
            },
            {
              "management_api_url": "http://%%host%%:8081",
              "include": [
                ".*"
              ],
              "exclude": [
                ".*-test"
              ],
              "interval": 3600
            }
          ]
        }
      }      
    # (...)
spec:
  containers:
    - name: 'torchserve'
# (...)

Validation

Run the Agent’s status subcommand and look for torchserve under the Checks section.

Data Collected

Metrics

torchserve.management_api.model.batch_size
(gauge)
Maximum batch size that a model is expected to handle.
torchserve.management_api.model.is_loaded_at_startup
(gauge)
Whether or not the model was loaded when TorchServe started. 1 if true, 0 otherwise.
torchserve.management_api.model.max_batch_delay
(gauge)
The maximum batch delay time in ms TorchServe waits to receive batch_size number of requests.
Shown as millisecond
torchserve.management_api.model.version.is_default
(gauge)
Whether or not this version of the model is the default one. 1 if true, 0 otherwise.
torchserve.management_api.model.versions
(gauge)
Total number of versions for a given model.
torchserve.management_api.model.worker.is_gpu
(gauge)
Whether or not this worker is using a GPU. 1 if true, 0 otherwise.
torchserve.management_api.model.worker.memory_usage
(gauge)
Memory used by the worker in bytes.
Shown as byte
torchserve.management_api.model.worker.status
(gauge)
The status of a given worker. 1 if ready, 2 if loading, 3 if unloading, 0 otherwise.
torchserve.management_api.model.workers.current
(gauge)
Current number of workers of a given model.
torchserve.management_api.model.workers.max
(gauge)
Maximum number of workers defined for a given model.
torchserve.management_api.model.workers.min
(gauge)
Minimum number of workers defined for a given model.
torchserve.management_api.models
(gauge)
Total number of models.
torchserve.openmetrics.cpu.utilization
(gauge)
CPU utilization on host.
Shown as percent
torchserve.openmetrics.disk.available
(gauge)
Disk available on host.
Shown as gigabyte
torchserve.openmetrics.disk.used
(gauge)
Disk space used on host.
Shown as gigabyte
torchserve.openmetrics.disk.utilization
(gauge)
Disk utilization on host.
Shown as percent
torchserve.openmetrics.gpu.memory.used
(gauge)
GPU memory used on host.
Shown as megabyte
torchserve.openmetrics.gpu.memory.utilization
(gauge)
GPU memory utilization on host.
Shown as percent
torchserve.openmetrics.gpu.utilization
(gauge)
GPU utilization on host.
Shown as percent
torchserve.openmetrics.handler_time
(gauge)
Time spent in backend handler.
Shown as millisecond
torchserve.openmetrics.inference.count
(count)
Total number of inference requests received.
Shown as request
torchserve.openmetrics.inference.latency.count
(count)
Total inference latency in microseconds.
Shown as microsecond
torchserve.openmetrics.memory.available
(gauge)
Memory available on host.
Shown as megabyte
torchserve.openmetrics.memory.used
(gauge)
Memory used on host.
Shown as megabyte
torchserve.openmetrics.memory.utilization
(gauge)
Memory utilization on host.
Shown as percent
torchserve.openmetrics.prediction_time
(gauge)
Backend prediction time.
Shown as millisecond
torchserve.openmetrics.queue.latency.count
(count)
Total queue latency in microseconds.
Shown as microsecond
torchserve.openmetrics.queue.time
(gauge)
Time spent by a job in the request queue in milliseconds.
Shown as millisecond
torchserve.openmetrics.requests.2xx.count
(count)
Total number of requests with response in 200-300 status code range.
Shown as request
torchserve.openmetrics.requests.4xx.count
(count)
Total number of requests with response in 400-500 status code range.
Shown as request
torchserve.openmetrics.requests.5xx.count
(count)
Total number of requests with response status code above 500.
Shown as request
torchserve.openmetrics.worker.load_time
(gauge)
Time taken by a worker to load a model in milliseconds.
Shown as millisecond
torchserve.openmetrics.worker.thread_time
(gauge)
Time spent in the worker thread, excluding backend response time, in milliseconds.
Shown as millisecond
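
The numeric encoding of torchserve.management_api.model.worker.status listed above can be read as a simple lookup. This sketch assumes the Management API reports the worker status as the strings READY, LOADING, and UNLOADING; those strings and the function name are assumptions, and only the numeric values come from the metric description.

```python
# Numeric values as documented for the worker.status gauge.
WORKER_STATUS = {"READY": 1, "LOADING": 2, "UNLOADING": 3}


def worker_status_value(status: str) -> int:
    """Return the gauge value for a status string, 0 for anything else."""
    return WORKER_STATUS.get(status.upper(), 0)


for status in ("READY", "LOADING", "UNLOADING", "UNKNOWN"):
    print(status, "->", worker_status_value(status))
```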

Metrics are prefixed using the API they are coming from:

  • torchserve.openmetrics.* for metrics coming from the OpenMetrics endpoint.
  • torchserve.inference_api.* for metrics coming from the Inference API.
  • torchserve.management_api.* for metrics coming from the Management API.

Events

The TorchServe integration includes three events using the Management API:

  • torchserve.management_api.model_added: This event fires when a new model has been added.
  • torchserve.management_api.model_removed: This event fires when a model has been removed.
  • torchserve.management_api.default_version_changed: This event fires when a default version has been set for a given model.

You can disable the events by setting the `submit_events` option to `false` in your configuration file.
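
Conceptually, the model_added and model_removed events come from comparing the list of models between two check runs. The sketch below shows that comparison with set differences; diff_models is a hypothetical helper, the default_version_changed event is left out for brevity, and the integration's actual logic may differ.

```python
def diff_models(previous: set[str], current: set[str]) -> list[tuple[str, str]]:
    """Return (event_type, model_name) pairs describing what changed."""
    events = []
    for name in sorted(current - previous):
        events.append(("torchserve.management_api.model_added", name))
    for name in sorted(previous - current):
        events.append(("torchserve.management_api.model_removed", name))
    return events


before = {"model_a", "model_b"}
after = {"model_b", "model_c"}
for event_type, model in diff_models(before, after):
    print(event_type, model)
```

Because the comparison needs a fresh list of models, a cached list only produces these events when it is refreshed.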

Service Checks

torchserve.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

torchserve.inference_api.health
Returns CRITICAL if the Agent is unable to connect to the Inference API endpoint or if it is unhealthy, otherwise returns OK.
Statuses: ok, critical

torchserve.management_api.health
Returns CRITICAL if the Agent is unable to connect to the Management API endpoint, otherwise returns OK.
Statuses: ok, critical

Logs

The TorchServe integration can collect logs from the TorchServe service and forward them to Datadog.

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Uncomment and edit the logs configuration block in your torchserve.d/conf.yaml file. Here’s an example:

    logs:
      - type: file
        path: /var/log/torchserve/model_log.log
        source: torchserve
        service: torchserve
      - type: file
        path: /var/log/torchserve/ts_log.log
        source: torchserve
        service: torchserve
    

See the example configuration file for how to collect all logs.

For more information about the logging configuration with TorchServe, see the official TorchServe documentation.

You can also collect logs from the `access_log.log` file. However, these logs are already included in the `ts_log.log` file, so configuring both files results in duplicate logs in Datadog.

Troubleshooting

Need help? Contact Datadog support.