Nvidia Triton

Supported OS: Linux

Integration version: 2.2.0

Overview

This check monitors Nvidia Triton through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
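
As an illustrative sketch for containerized setups, an Autodiscovery pod annotation could look like the following (the container name nvidia-triton and the %%host%%:8002 endpoint are assumptions based on Triton's default metrics port, not a prescribed configuration):

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-triton
  annotations:
    # AD v2 check configuration; %%host%% resolves to the container's IP at runtime.
    ad.datadoghq.com/nvidia-triton.checks: |
      {
        "nvidia_triton": {
          "instances": [
            {"openmetrics_endpoint": "http://%%host%%:8002/metrics"}
          ]
        }
      }
spec:
  containers:
    - name: nvidia-triton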

Installation

The Nvidia Triton check is included in the Datadog Agent package. No additional installation is needed on your server.

OpenMetrics endpoint

By default, the Nvidia Triton server exposes all metrics through the Prometheus endpoint. To enable all metric reporting:

tritonserver --allow-metrics=true

To change the metrics endpoint, use the --metrics-address option.

Example:

tritonserver --metrics-address=http://0.0.0.0:8002

In this case, the OpenMetrics endpoint is exposed at this URL: http://<NVIDIA_TRITON_ADDRESS>:8002/metrics.
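
To verify that the endpoint is reachable, you can query it directly; for example, with curl from the Triton host (assuming the default port shown above):

curl http://localhost:8002/metrics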

Latency summary metrics are disabled by default. To enable them, use the following command:

tritonserver --metrics-config summary_latencies=true

Response cache metrics are not reported by default. You must enable a server-side cache implementation by specifying a <cache_implementation> and its corresponding configuration.

For example:

tritonserver --cache-config local,size=1048576

Nvidia Triton can also expose custom metrics through its OpenMetrics endpoint. Datadog can collect these custom metrics as well, using the extra_metrics option.

These custom Nvidia Triton metrics are considered standard metrics in Datadog.
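
As a minimal sketch, assuming the check's standard openmetrics_endpoint option and hypothetical custom metric names, an instance in nvidia_triton.d/conf.yaml could look like:

instances:
  - openmetrics_endpoint: http://localhost:8002/metrics
    ## extra_metrics lists additional metrics to collect from the endpoint.
    ## The names below are placeholders for metrics your deployment exposes.
    extra_metrics:
      - my_custom_metric
      - another_custom_metric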

Configuration

  1. Edit the nvidia_triton.d/conf.yaml file, located in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your Nvidia Triton performance data. See the sample nvidia_triton.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
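
On a systemd-based Linux host, for example, restarting the Agent looks like:

sudo systemctl restart datadog-agent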

Validation

Run the Agent's status subcommand and look for nvidia_triton in the Checks section.
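
For example, on a host installation:

sudo datadog-agent status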

Data Collected

Metrics

nvidia_triton.cache.insertion.duration
(gauge)
Total cache insertion duration, in microseconds
Shown as microsecond
nvidia_triton.cache.lookup.duration
(gauge)
Total cache lookup duration (hit and miss), in microseconds
Shown as microsecond
nvidia_triton.cache.num.entries
(gauge)
Number of responses stored in response cache
nvidia_triton.cache.num.evictions
(gauge)
Number of cache evictions in response cache
nvidia_triton.cache.num.hits
(gauge)
Number of cache hits in response cache
nvidia_triton.cache.num.lookups
(gauge)
Number of cache lookups in response cache
nvidia_triton.cache.num.misses
(gauge)
Number of cache misses in response cache
nvidia_triton.cache.util
(gauge)
Cache utilization [0.0 - 1.0]
nvidia_triton.cpu.memory.total_bytes
(gauge)
CPU total memory (RAM), in bytes
Shown as byte
nvidia_triton.cpu.memory.used_bytes
(gauge)
CPU used memory (RAM), in bytes
Shown as byte
nvidia_triton.cpu.utilization
(gauge)
CPU utilization rate [0.0 - 1.0]
nvidia_triton.energy.consumption.count
(count)
GPU energy consumption in joules since the Triton Server started
nvidia_triton.gpu.memory.total_bytes
(gauge)
GPU total memory, in bytes
Shown as byte
nvidia_triton.gpu.memory.used_bytes
(gauge)
GPU used memory, in bytes
Shown as byte
nvidia_triton.gpu.power.limit
(gauge)
GPU power management limit in watts
Shown as watt
nvidia_triton.gpu.power.usage
(gauge)
GPU power usage in watts
Shown as watt
nvidia_triton.gpu.utilization
(gauge)
GPU utilization rate [0.0 - 1.0]
nvidia_triton.inference.compute.infer.duration_us.count
(count)
Cumulative compute inference duration in microseconds (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.infer.summary_us.count
(count)
Cumulative compute inference duration in microseconds (count) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.infer.summary_us.quantile
(gauge)
Cumulative compute inference duration in microseconds (quantile) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.infer.summary_us.sum
(count)
Cumulative compute inference duration in microseconds (sum) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.input.duration_us.count
(count)
Cumulative compute input duration in microseconds (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.input.summary_us.count
(count)
Cumulative compute input duration in microseconds (count) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.input.summary_us.quantile
(gauge)
Cumulative compute input duration in microseconds (quantile) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.input.summary_us.sum
(count)
Cumulative compute input duration in microseconds (sum) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.output.duration_us.count
(count)
Cumulative inference compute output duration in microseconds (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.output.summary_us.count
(count)
Cumulative inference compute output duration in microseconds (count) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.output.summary_us.quantile
(gauge)
Cumulative inference compute output duration in microseconds (quantile) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.compute.output.summary_us.sum
(count)
Cumulative inference compute output duration in microseconds (sum) (does not include cached requests)
Shown as microsecond
nvidia_triton.inference.count.count
(count)
Number of inferences performed (does not include cached requests)
nvidia_triton.inference.exec.count.count
(count)
Number of model executions performed (does not include cached requests)
nvidia_triton.inference.pending.request.count
(gauge)
Instantaneous number of pending requests awaiting execution, per model.
nvidia_triton.inference.queue.duration_us.count
(count)
Cumulative inference queuing duration in microseconds (includes cached requests)
Shown as microsecond
nvidia_triton.inference.queue.summary_us.count
(count)
Summary of inference queuing duration in microseconds (count) (includes cached requests)
Shown as microsecond
nvidia_triton.inference.queue.summary_us.quantile
(gauge)
Summary of inference queuing duration in microseconds (quantile) (includes cached requests)
Shown as microsecond
nvidia_triton.inference.queue.summary_us.sum
(count)
Summary of inference queuing duration in microseconds (sum) (includes cached requests)
Shown as microsecond
nvidia_triton.inference.request.duration_us.count
(count)
Cumulative inference request duration in microseconds (includes cached requests)
Shown as microsecond
nvidia_triton.inference.request.summary_us.count
(count)
Summary of inference request duration in microseconds (count) (includes cached requests)
Shown as microsecond
nvidia_triton.inference.request.summary_us.quantile
(gauge)
Summary of inference request duration in microseconds (quantile) (includes cached requests)
Shown as microsecond
nvidia_triton.inference.request.summary_us.sum
(count)
Summary of inference request duration in microseconds (sum) (includes cached requests)
Shown as microsecond
nvidia_triton.inference.request_failure.count
(count)
Number of failed inference requests, all batch sizes
nvidia_triton.inference.request_success.count
(count)
Number of successful inference requests, all batch sizes

Events

The Nvidia Triton integration does not include any events.

Service Checks

nvidia_triton.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Nvidia Triton OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

nvidia_triton.health.status
Returns CRITICAL if the server returns a 4xx or 5xx response, OK if the response is 200, and UNKNOWN for anything else.
Statuses: ok, warning, critical

Logs

The Nvidia Triton integration can collect logs from the Nvidia Triton server and forward them to Datadog.

  1. Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Uncomment and edit the logs configuration block in your nvidia_triton.d/conf.yaml file. For example:

    logs:
      - type: docker
        source: nvidia_triton
        service: nvidia_triton
    

In containerized environments, log collection is likewise disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

Then, set Log Integrations as pod annotations. This can also be configured with a file, a ConfigMap, or a key-value store. For more information, see Kubernetes Log Collection.

Annotations v1/v2

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-triton
  annotations:
    ad.datadoghq.com/nvidia-triton.logs: '[{"source":"nvidia_triton","service":"nvidia_triton"}]'
spec:
  containers:
    - name: nvidia-triton

Troubleshooting

Need help? Contact Datadog support.