Google Machine Learning

Overview

Google Cloud Machine Learning is a managed service for building machine learning models that work on any type of data, of any size.

Get metrics from Google Machine Learning to:

  • Visualize the performance of your ML services.
  • Correlate the performance of your ML services with your applications.

Setup

Installation

If you haven't already, set up the Google Cloud Platform integration first. There are no other installation steps.

Log collection

Google Cloud Machine Learning logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already, set up logging with the Datadog Dataflow template.

Once this is done, export your Google Cloud Machine Learning logs from Google Cloud Logging to the Pub/Sub topic:

  1. Go to the Google Cloud Logging page and filter the Google Cloud Machine Learning logs.
  2. Click Create Export and name the sink.
  3. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub topic that was created for that purpose. Note: The Pub/Sub topic can be located in a different project.
  4. Click Create and wait for the confirmation message to appear.
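The export steps above amount to defining a log sink with a name, a filter, and a Pub/Sub destination. The following is a minimal sketch of that definition, not the official setup procedure. The helper `build_sink_config`, the sink name, the project ID, and the topic name are all hypothetical placeholders, and the resource types `ml_job` and `cloudml_model_version` are an assumption about how Cloud ML logs are labeled; check them against the logs in your own project. The code only builds a dictionary and does not call any Google Cloud API.

```python
import json


def build_sink_config(sink_name, topic_project, topic_name, resource_types):
    """Build a log sink definition that routes matching logs to a Pub/Sub topic."""
    if not resource_types:
        raise ValueError("at least one resource type is required")
    # Match logs from any of the given monitored resource types.
    log_filter = " OR ".join(f'resource.type="{r}"' for r in resource_types)
    # The topic may live in a different project than the one that owns the sink.
    destination = (
        f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic_name}"
    )
    return {"name": sink_name, "destination": destination, "filter": log_filter}


sink = build_sink_config(
    sink_name="datadog-ml-logs",            # hypothetical sink name
    topic_project="my-logging-project",     # hypothetical project ID
    topic_name="export-logs-to-datadog",    # hypothetical topic name
    resource_types=["ml_job", "cloudml_model_version"],  # assumed resource types
)
print(json.dumps(sink, indent=2))
```

Keeping the topic's project as a separate argument reflects the note in step 3: the sink and the topic do not have to share a project.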

Data Collected

Metrics

gcp.ml.prediction.error_count
(count)
Cumulative count of prediction errors.
gcp.ml.prediction.latencies.avg
(count)
The average latency of a certain type.
Shown as microsecond
gcp.ml.prediction.latencies.samplecount
(count)
The sample count for latency of a certain type.
Shown as microsecond
gcp.ml.prediction.latencies.sumsqdev
(count)
The sum of squared deviation for latency of a certain type.
Shown as microsecond
gcp.ml.prediction.online.accelerator.duty_cycle
(gauge)
Average fraction of time over the past sample period during which the accelerator(s) were actively processing.
gcp.ml.prediction.online.accelerator.memory.bytes_used
(gauge)
Amount of accelerator memory allocated by the model replica.
Shown as byte
gcp.ml.prediction.online.cpu.utilization
(gauge)
Fraction of CPU allocated by the model replica and currently in use. May exceed 100% if the machine type has multiple CPUs.
gcp.ml.prediction.online.memory.bytes_used
(gauge)
Amount of memory allocated by the model replica and currently in use.
Shown as byte
gcp.ml.prediction.online.network.bytes_received
(count)
Number of bytes received over the network by the model replica.
Shown as byte
gcp.ml.prediction.online.network.bytes_sent
(count)
Number of bytes sent over the network by the model replica.
Shown as byte
gcp.ml.prediction.online.replicas
(gauge)
Number of active model replicas.
gcp.ml.prediction.online.target_replicas
(gauge)
Desired number of active model replicas.
gcp.ml.prediction.prediction_count
(count)
Cumulative count of predictions.
gcp.ml.prediction.response_count
(count)
Cumulative count of different response codes.
gcp.ml.training.accelerator.memory.utilization
(gauge)
Fraction of allocated accelerator memory that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.
gcp.ml.training.accelerator.utilization
(gauge)
Fraction of allocated accelerator that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.
gcp.ml.training.cpu.utilization
(gauge)
Fraction of allocated CPU that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.
gcp.ml.training.memory.utilization
(gauge)
Fraction of allocated memory that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.
gcp.ml.training.network.received_bytes_count
(count)
Number of bytes received by the training job over the network.
Shown as byte
gcp.ml.training.network.sent_bytes_count
(count)
Number of bytes sent by the training job over the network.
Shown as byte
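
Some of the metrics above are easier to read after a small conversion. The sketch below shows two such conversions under stated assumptions: that `samplecount` and `sumsqdev` for a latency series describe the same set of samples (so their ratio is the population variance), and that the training utilization metrics arrive as fractions between 0.0 and 1.0 as described above. The function names `latency_stddev` and `utilization_percent` are hypothetical helpers, not part of any Datadog or Google Cloud library.

```python
import math


def latency_stddev(sample_count, sum_sq_dev):
    """Population standard deviation from a sample count and a sum of squared deviations.

    Returns None when there are no samples, since the value is undefined.
    """
    if sample_count <= 0:
        return None
    return math.sqrt(sum_sq_dev / sample_count)


def utilization_percent(fraction):
    """Convert a utilization fraction (0.0 to 1.0) into a percentage."""
    return fraction * 100.0


# Hypothetical values for illustration only.
print(latency_stddev(4, 400.0))
print(utilization_percent(0.25))
```

The result of `latency_stddev` is in the same unit as the latency itself, so it can be compared directly with the value of `gcp.ml.prediction.latencies.avg`.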

Events

The Google Cloud Machine Learning integration does not include any events.

Service Checks

The Google Cloud Machine Learning integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.

Further Reading