Google Cloud Vertex AI

Overview

Google Cloud Vertex AI enables developers, data scientists, and machine learning data engineers to take their projects from ideation to deployment quickly and cost-effectively. Train high-quality, custom machine learning models with minimal machine learning expertise and effort.

Setup

Installation

Metric collection

Google Cloud Vertex AI is included in the Google Cloud Platform integration package. If you haven't already done so, set up the Google Cloud Platform integration first to start collecting out-of-the-box metrics.

Configuration

To collect Vertex AI labels as tags, enable the Cloud Asset Viewer role.

You can use service account impersonation and automatic project discovery to integrate Datadog with Google Cloud.

This method lets you monitor all projects visible to a service account by assigning IAM roles in the relevant projects. You can assign these roles to projects individually, or you can configure Datadog to monitor groups of projects by assigning the roles at the organization or folder level. Assigning roles this way allows Datadog to automatically discover and monitor all projects in the given scope, including any new projects added to the group in the future.
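As an illustrative sketch of the role assignment described above (not the official setup path), the following Python snippet binds a set of viewer roles to a Datadog service account at the project level with the google-cloud-resource-manager client. The project ID and service account email are placeholders, and the roles shown are those commonly required by the Datadog GCP integration (including the Cloud Asset Viewer role mentioned earlier); check the integration documentation for the current list.

```python
# Hypothetical sketch: grant the roles used by the Datadog GCP integration
# to a Datadog service account on a single project.
# Requires `pip install google-cloud-resource-manager`.
from google.cloud import resourcemanager_v3

PROJECT = "projects/my-project-id"  # placeholder project
MEMBER = "serviceAccount:datadog-sa@my-project-id.iam.gserviceaccount.com"  # placeholder
ROLES = (
    "roles/monitoring.viewer",
    "roles/compute.viewer",
    "roles/cloudasset.viewer",
    "roles/browser",
)

client = resourcemanager_v3.ProjectsClient()

# Read-modify-write of the project's IAM policy. Production code should
# also handle etag conflicts if the policy changes concurrently.
policy = client.get_iam_policy(request={"resource": PROJECT})
for role in ROLES:
    binding = policy.bindings.add()
    binding.role = role
    binding.members.append(MEMBER)
client.set_iam_policy(request={"resource": PROJECT, "policy": policy})
```

To monitor an entire folder or organization instead, the same bindings can be applied with resourcemanager_v3.FoldersClient or resourcemanager_v3.OrganizationsClient, which expose the same get_iam_policy and set_iam_policy methods.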

Log collection

Google Cloud Vertex AI logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already done so, set up logging with the Datadog Dataflow template.

Once this is done, export your Google Cloud Vertex AI logs from Google Cloud Logging to the Pub/Sub topic (a scripted alternative is sketched after the steps below):

  1. Go to the Google Cloud Logging page and filter the Google Cloud Vertex AI logs.
  2. Click Create Sink and give the sink an appropriate name.
  3. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub topic created for this purpose. Note: The Pub/Sub topic can be located in a different project.
  4. Click Create and wait for the confirmation message to appear.
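For reference, here is a minimal sketch of the same sink creation using the google-cloud-logging Python client. The project, sink name, topic, and log filter are placeholder assumptions; adjust the filter to match the Vertex AI logs you selected in step 1.

```python
# Hypothetical sketch: create a log sink that exports Vertex AI logs to a
# Pub/Sub topic. Requires `pip install google-cloud-logging`.
from google.cloud import logging

client = logging.Client(project="my-project-id")  # placeholder project

# Placeholder filter: match log entries from Vertex AI endpoints.
log_filter = 'resource.type="aiplatform.googleapis.com/Endpoint"'

# The topic may live in a different project, as noted in step 3.
destination = "pubsub.googleapis.com/projects/my-project-id/topics/export-logs-to-datadog"

sink = client.sink("vertex-ai-logs", filter_=log_filter, destination=destination)
if not sink.exists():
    sink.create()

# The sink writes as a Google-managed identity; grant that identity the
# Pub/Sub Publisher role on the destination topic.
print("Grant roles/pubsub.publisher on the topic to:", sink.writer_identity)
```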

Data Collected

Metrics

gcp.aiplatform.executing_vertexai_pipeline_jobs
(gauge)
Number of pipeline jobs being executed.
gcp.aiplatform.executing_vertexai_pipeline_tasks
(gauge)
Number of pipeline tasks being executed.
gcp.aiplatform.featureonlinestore.online_serving.request_count
(count)
Number of requests received.
gcp.aiplatform.featureonlinestore.online_serving.serving_bytes_count
(count)
Serving response bytes count.
Shown as byte
gcp.aiplatform.featureonlinestore.online_serving.serving_latencies.avg
(count)
The average server side request latency.
Shown as millisecond
gcp.aiplatform.featureonlinestore.online_serving.serving_latencies.samplecount
(count)
The sample count for server side request latency.
Shown as millisecond
gcp.aiplatform.featureonlinestore.online_serving.serving_latencies.sumsqdev
(count)
The sum of squared deviation for server side request latency.
Shown as millisecond
gcp.aiplatform.featureonlinestore.running_sync
(gauge)
Number of running syncs at given point of time.
gcp.aiplatform.featureonlinestore.serving_data_ages.avg
(count)
The average measure of the serving data age in seconds. Current time minus synced time.
Shown as second
gcp.aiplatform.featureonlinestore.serving_data_ages.samplecount
(count)
The sample count for the measure of the serving data age in seconds. Current time minus synced time.
Shown as second
gcp.aiplatform.featureonlinestore.serving_data_ages.sumsqdev
(count)
The sum of squared deviation for the measure of the serving data age in seconds. Current time minus synced time.
Shown as second
gcp.aiplatform.featureonlinestore.serving_data_by_sync_time
(gauge)
Breakdown of data in Feature Online Store by synced timestamp.
gcp.aiplatform.featureonlinestore.storage.bigtable_cpu_load
(gauge)
The average CPU load of nodes in the Feature Online Store.
gcp.aiplatform.featureonlinestore.storage.bigtable_cpu_load_hottest_node
(gauge)
The CPU load of the hottest node in the Feature Online Store.
gcp.aiplatform.featureonlinestore.storage.bigtable_nodes
(gauge)
The number of nodes for the Feature Online Store (Bigtable).
gcp.aiplatform.featureonlinestore.storage.multi_region_bigtable_cpu_load
(gauge)
The average CPU load of nodes in the Feature Online Store with multi-regional replicas.
gcp.aiplatform.featureonlinestore.storage.multi_region_bigtable_nodes
(gauge)
The number of nodes for the Feature Online Store (Bigtable) with multi-regional replicas.
gcp.aiplatform.featureonlinestore.storage.optimized_nodes
(gauge)
The number of nodes for the Feature Online Store (Optimized).
gcp.aiplatform.featureonlinestore.storage.stored_bytes
(gauge)
Bytes stored in the Feature Online Store.
Shown as byte
gcp.aiplatform.featurestore.cpu_load
(gauge)
The average CPU load for a node in the Featurestore online storage.
gcp.aiplatform.featurestore.cpu_load_hottest_node
(gauge)
The CPU load for the hottest node in the Featurestore online storage.
gcp.aiplatform.featurestore.node_count
(gauge)
The number of nodes for the Featurestore online storage.
gcp.aiplatform.featurestore.online_entities_updated
(count)
Number of entities updated on the Featurestore online storage.
Shown as byte
gcp.aiplatform.featurestore.online_serving.latencies.avg
(count)
The average online serving latencies by EntityType.
Shown as millisecond
gcp.aiplatform.featurestore.online_serving.latencies.samplecount
(count)
The sample count for online serving latencies by EntityType.
Shown as millisecond
gcp.aiplatform.featurestore.online_serving.latencies.sumsqdev
(count)
The sum of squared deviation for online serving latencies by EntityType.
Shown as millisecond
gcp.aiplatform.featurestore.online_serving.request_bytes_count
(count)
Request size by EntityType.
Shown as byte
gcp.aiplatform.featurestore.online_serving.request_count
(count)
Featurestore online serving count by EntityType.
gcp.aiplatform.featurestore.online_serving.response_size
(count)
Response size by EntityType.
Shown as byte
gcp.aiplatform.featurestore.storage.billable_processed_bytes
(gauge)
Number of bytes billed for offline data processed.
Shown as byte
gcp.aiplatform.featurestore.storage.stored_bytes
(gauge)
Bytes stored in Featurestore.
Shown as byte
gcp.aiplatform.featurestore.streaming_write.offline_processed_count
(count)
Number of streaming write requests processed for offline storage.
gcp.aiplatform.featurestore.streaming_write.offline_write_delays.avg
(count)
The average time (in seconds) from when the write API is called until the data is written to offline storage.
Shown as second
gcp.aiplatform.featurestore.streaming_write.offline_write_delays.samplecount
(count)
The sample count for the time (in seconds) from when the write API is called until the data is written to offline storage.
Shown as second
gcp.aiplatform.featurestore.streaming_write.offline_write_delays.sumsqdev
(count)
The sum of squared deviation for the time (in seconds) from when the write API is called until the data is written to offline storage.
Shown as second
gcp.aiplatform.generate_content_input_tokens_per_minute_per_base_model
(count)
Generate content input tokens per minute per project per base model.
gcp.aiplatform.generate_content_requests_per_minute_per_project_per_base_model
(count)
Generate content requests per minute per project per base model.
gcp.aiplatform.matching_engine.cpu.request_utilization
(gauge)
The fraction of the requested CPU that is currently in use on a match server container.
gcp.aiplatform.matching_engine.current_replicas
(gauge)
Number of active replicas used by the DeployedIndex.
gcp.aiplatform.matching_engine.current_shards
(gauge)
Number of shards of the DeployedIndex.
gcp.aiplatform.matching_engine.memory.used_bytes
(gauge)
The memory used in bytes for a match server container.
Shown as byte
gcp.aiplatform.matching_engine.query.latencies.avg
(count)
The average server side request latency.
Shown as millisecond
gcp.aiplatform.matching_engine.query.latencies.samplecount
(count)
The sample count for server side request latency.
Shown as millisecond
gcp.aiplatform.matching_engine.query.latencies.sumsqdev
(count)
The sum of squared deviation for server side request latency.
Shown as millisecond
gcp.aiplatform.matching_engine.query.request_count
(count)
Number of requests received.
gcp.aiplatform.matching_engine.stream_update.datapoint_count
(count)
Number of successfully upserted or removed datapoints.
gcp.aiplatform.matching_engine.stream_update.latencies.avg
(count)
The average latency between when the user receives an UpsertDatapointsResponse or RemoveDatapointsResponse and when that update takes effect.
Shown as millisecond
gcp.aiplatform.matching_engine.stream_update.latencies.samplecount
(count)
The sample count for the latency between when the user receives an UpsertDatapointsResponse or RemoveDatapointsResponse and when that update takes effect.
Shown as millisecond
gcp.aiplatform.matching_engine.stream_update.latencies.sumsqdev
(count)
The sum of squared deviation for the latency between when the user receives an UpsertDatapointsResponse or RemoveDatapointsResponse and when that update takes effect.
Shown as millisecond
gcp.aiplatform.matching_engine.stream_update.request_count
(count)
Number of stream update requests.
gcp.aiplatform.online_prediction_dedicated_requests_per_base_model_version
(count)
Online prediction dedicated requests per minute per project per base model version.
gcp.aiplatform.online_prediction_dedicated_tokens_per_base_model_version
(count)
Online prediction dedicated tokens per minute per project per base model version.
gcp.aiplatform.online_prediction_requests_per_base_model
(count)
Online prediction requests per minute per project per base model.
Shown as request
gcp.aiplatform.online_prediction_tokens_per_minute_per_base_model
(count)
Online prediction tokens per minute per project per base model.
gcp.aiplatform.pipelinejob.duration
(gauge)
Runtime seconds of the pipeline job being executed (from creation to end).
Shown as second
gcp.aiplatform.pipelinejob.task_completed_count
(count)
Cumulative number of completed PipelineTasks.
gcp.aiplatform.prediction.online.accelerator.duty_cycle
(gauge)
Average fraction of time over the past sample period during which the accelerator(s) were actively processing. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as fraction
gcp.aiplatform.prediction.online.accelerator.memory.bytes_used
(gauge)
Amount of accelerator memory allocated by the deployed model replica.
Shown as byte
gcp.aiplatform.prediction.online.cpu.utilization
(gauge)
Fraction of CPU allocated by the deployed model replica and currently in use. May exceed 100% if the machine type has multiple CPUs. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as fraction
gcp.aiplatform.prediction.online.deployment_resource_pool.accelerator.duty_cycle
(gauge)
Average fraction of time over the past sample period during which the accelerator(s) were actively processing.
gcp.aiplatform.prediction.online.deployment_resource_pool.accelerator.memory.bytes_used
(gauge)
Amount of accelerator memory allocated by the deployment resource pool replica.
Shown as byte
gcp.aiplatform.prediction.online.deployment_resource_pool.cpu.utilization
(gauge)
Fraction of CPU allocated by the deployment resource pool replica and currently in use. May exceed 100% if the machine type has multiple CPUs.
Shown as percent
gcp.aiplatform.prediction.online.deployment_resource_pool.memory.bytes_used
(gauge)
Amount of memory allocated by the deployment resource pool replica and currently in use.
Shown as byte
gcp.aiplatform.prediction.online.deployment_resource_pool.network.received_bytes_count
(count)
Number of bytes received over the network by the deployment resource pool replica.
Shown as byte
gcp.aiplatform.prediction.online.deployment_resource_pool.network.sent_bytes_count
(count)
Number of bytes sent over the network by the deployment resource pool replica.
Shown as byte
gcp.aiplatform.prediction.online.deployment_resource_pool.replicas
(gauge)
Number of active replicas used by the deployment resource pool.
gcp.aiplatform.prediction.online.deployment_resource_pool.target_replicas
(gauge)
Target number of active replicas needed for the deployment resource pool.
gcp.aiplatform.prediction.online.error_count
(count)
Number of online prediction errors.
Shown as error
gcp.aiplatform.prediction.online.memory.bytes_used
(gauge)
Amount of memory allocated by the deployed model replica and currently in use. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as byte
gcp.aiplatform.prediction.online.network.received_bytes_count
(count)
Number of bytes received over the network by the deployed model replica. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as byte
gcp.aiplatform.prediction.online.network.sent_bytes_count
(count)
Number of bytes sent over the network by the deployed model replica. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as byte
gcp.aiplatform.prediction.online.prediction_count
(count)
Number of online predictions.
Shown as prediction
gcp.aiplatform.prediction.online.prediction_latencies.avg
(gauge)
Average online prediction latency of the deployed model.
Shown as microsecond
gcp.aiplatform.prediction.online.prediction_latencies.samplecount
(count)
Online prediction latency of the public deployed model. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as microsecond
gcp.aiplatform.prediction.online.private.prediction_latencies.avg
(gauge)
Average online prediction latency of the private deployed model.
Shown as microsecond
gcp.aiplatform.prediction.online.private.prediction_latencies.samplecount
(count)
Online prediction latency of the private deployed model. Sampled every 60 seconds. After sampling, data is not visible for up to 360 seconds.
Shown as microsecond
gcp.aiplatform.prediction.online.private.response_count
(count)
Online prediction response count of the private deployed model.
Shown as response
gcp.aiplatform.prediction.online.replicas
(count)
Number of active replicas used by the deployed model. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
Shown as worker
gcp.aiplatform.prediction.online.response_count
(count)
Number of different online prediction response codes.
Shown as response
gcp.aiplatform.prediction.online.target_replicas
(count)
Target number of active replicas needed for the deployed model. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
Shown as worker
gcp.aiplatform.publisher.online_serving.character_count
(count)
Accumulated input/output character count.
gcp.aiplatform.publisher.online_serving.characters.avg
(count)
The average input/output character count distribution.
gcp.aiplatform.publisher.online_serving.characters.samplecount
(count)
The sample count for input/output character count distribution.
gcp.aiplatform.publisher.online_serving.characters.sumsqdev
(count)
The sum of squared deviation for input/output character count distribution.
gcp.aiplatform.publisher.online_serving.consumed_throughput
(count)
Overall throughput used (accounting for burndown rate) in terms of characters.
gcp.aiplatform.publisher.online_serving.first_token_latencies.avg
(count)
The average duration from request received to first token sent back to the client.
Shown as millisecond
gcp.aiplatform.publisher.online_serving.first_token_latencies.samplecount
(count)
The sample count for duration from request received to first token sent back to the client.
Shown as millisecond
gcp.aiplatform.publisher.online_serving.first_token_latencies.sumsqdev
(count)
The sum of squared deviation for duration from request received to first token sent back to the client.
Shown as millisecond
gcp.aiplatform.publisher.online_serving.model_invocation_count
(count)
Number of model invocations (prediction requests).
gcp.aiplatform.publisher.online_serving.model_invocation_latencies.avg
(count)
The average model invocation latencies (prediction latencies).
Shown as millisecond
gcp.aiplatform.publisher.online_serving.model_invocation_latencies.samplecount
(count)
The sample count for model invocation latencies (prediction latencies).
Shown as millisecond
gcp.aiplatform.publisher.online_serving.model_invocation_latencies.sumsqdev
(count)
The sum of squared deviation for model invocation latencies (prediction latencies).
Shown as millisecond
gcp.aiplatform.publisher.online_serving.token_count
(count)
Accumulated input/output token count.
gcp.aiplatform.publisher.online_serving.tokens.avg
(count)
The average input/output token count distribution.
gcp.aiplatform.publisher.online_serving.tokens.samplecount
(count)
The sample count for input/output token count distribution.
gcp.aiplatform.publisher.online_serving.tokens.sumsqdev
(count)
The sum of squared deviation for input/output token count distribution.
gcp.aiplatform.quota.generate_content_input_tokens_per_minute_per_base_model.exceeded
(count)
Number of attempts to exceed the limit on quota metric aiplatform.googleapis.com/generate_content_input_tokens_per_minute_per_base_model.
gcp.aiplatform.quota.generate_content_input_tokens_per_minute_per_base_model.limit
(gauge)
Current limit on quota metric aiplatform.googleapis.com/generate_content_input_tokens_per_minute_per_base_model.
gcp.aiplatform.quota.generate_content_input_tokens_per_minute_per_base_model.usage
(count)
Current usage on quota metric aiplatform.googleapis.com/generate_content_input_tokens_per_minute_per_base_model.
gcp.aiplatform.quota.generate_content_requests_per_minute_per_project_per_base_model.exceeded
(count)
Number of attempts to exceed the limit on quota metric aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model.
gcp.aiplatform.quota.generate_content_requests_per_minute_per_project_per_base_model.limit
(gauge)
Current limit on quota metric aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model.
gcp.aiplatform.quota.generate_content_requests_per_minute_per_project_per_base_model.usage
(count)
Current usage on quota metric aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model.
gcp.aiplatform.quota.online_prediction_dedicated_requests_per_base_model_version.exceeded
(count)
Number of attempts to exceed the limit on quota metric aiplatform.googleapis.com/online_prediction_dedicated_requests_per_base_model_version.
gcp.aiplatform.quota.online_prediction_dedicated_requests_per_base_model_version.limit
(gauge)
Current limit on quota metric aiplatform.googleapis.com/online_prediction_dedicated_requests_per_base_model_version.
gcp.aiplatform.quota.online_prediction_dedicated_requests_per_base_model_version.usage
(count)
Current usage on quota metric aiplatform.googleapis.com/online_prediction_dedicated_requests_per_base_model_version.
gcp.aiplatform.quota.online_prediction_dedicated_tokens_per_base_model_version.exceeded
(count)
Number of attempts to exceed the limit on quota metric aiplatform.googleapis.com/online_prediction_dedicated_tokens_per_base_model_version.
gcp.aiplatform.quota.online_prediction_dedicated_tokens_per_base_model_version.limit
(gauge)
Current limit on quota metric aiplatform.googleapis.com/online_prediction_dedicated_tokens_per_base_model_version.
gcp.aiplatform.quota.online_prediction_dedicated_tokens_per_base_model_version.usage
(count)
Current usage on quota metric aiplatform.googleapis.com/online_prediction_dedicated_tokens_per_base_model_version.
gcp.aiplatform.quota.online_prediction_requests_per_base_model.exceeded
(count)
Number of attempts to exceed the limit on quota metric aiplatform.googleapis.com/online_prediction_requests_per_base_model.
Shown as error
gcp.aiplatform.quota.online_prediction_requests_per_base_model.limit
(gauge)
Current limit on quota metric aiplatform.googleapis.com/online_prediction_requests_per_base_model.
Shown as request
gcp.aiplatform.quota.online_prediction_requests_per_base_model.usage
(count)
Current usage on quota metric aiplatform.googleapis.com/online_prediction_requests_per_base_model.
Shown as request
gcp.aiplatform.quota.online_prediction_tokens_per_minute_per_base_model.exceeded
(count)
Number of attempts to exceed the limit on quota metric aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model.
gcp.aiplatform.quota.online_prediction_tokens_per_minute_per_base_model.limit
(gauge)
Current limit on quota metric aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model.
gcp.aiplatform.quota.online_prediction_tokens_per_minute_per_base_model.usage
(count)
Current usage on quota metric aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model.
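Once metrics are flowing, you can spot-check any of the metrics above through the Datadog API. Below is a minimal sketch using the datadog-api-client Python package; it assumes DD_API_KEY and DD_APP_KEY are set in the environment, and the metric and query shown are illustrative.

```python
# Hypothetical sketch: query one of the Vertex AI metrics above from the
# Datadog API. Requires `pip install datadog-api-client`.
import time

from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.metrics_api import MetricsApi

configuration = Configuration()  # reads DD_API_KEY / DD_APP_KEY from the environment
with ApiClient(configuration) as api_client:
    api = MetricsApi(api_client)
    now = int(time.time())
    response = api.query_metrics(
        _from=now - 3600,  # last hour
        to=now,
        query="sum:gcp.aiplatform.prediction.online.error_count{*}.as_count()",
    )
    # Print each returned series and its most recent point, if any.
    for series in response.series:
        print(series.metric, series.pointlist[-1:])
```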

Service Checks

Google Cloud Vertex AI does not include any service checks.

Events

Google Cloud Vertex AI does not include any events.

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Additional helpful links, articles, and documentation: