---
title: Google Cloud ML
description: >-
  A managed service for easily building machine learning models for data of any
  type or size.
---

# Google Cloud ML

{% callout %}
# Important note for users on the following Datadog sites: us2.ddog-gov.com

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

Google Cloud Machine Learning is a managed service for building machine learning models that work on data of any type and size.

Get metrics from Google Cloud Machine Learning to:

- Visualize the performance of your ML services.
- Correlate the performance of your ML services with your applications.

## Setup{% #setup %}

### Installation{% #installation %}

If you haven't already, set up the [Google Cloud Platform integration](https://docs.datadoghq.com/integrations/google-cloud-platform.md) first. No other installation steps are required.

### Log collection{% #log-collection %}

Google Cloud Machine Learning logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already, [set up logging with the Datadog Dataflow template](https://docs.datadoghq.com/integrations/google-cloud-platform.md#log-collection).

Once this is done, export your Google Cloud Machine Learning logs from Google Cloud Logging to the Pub/Sub topic:

1. Go to the [Google Cloud Logging page](https://console.cloud.google.com/logs/viewer) and filter for the Google Cloud Machine Learning logs.
1. Click **Create Export** and name the sink.
1. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub topic that was created for that purpose. **Note**: The Pub/Sub topic can be located in a different project.
1. Click **Create** and wait for the confirmation message to show up.
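
The console steps above can also be sketched as a `gcloud logging sinks create` command. The following is a minimal illustration, not an official procedure: it only assembles the command line and does not run it. The sink name, project ID, topic name, and log filter are all hypothetical placeholders. In particular, the filter `resource.type="ml_job"` is an assumption, so copy the filter you actually built on the Logging page.

```python
import shlex


def build_sink_command(sink_name, topic_project, topic, log_filter):
    """Return the argv for `gcloud logging sinks create` with a Pub/Sub destination."""
    # Pub/Sub sink destinations use this URI form; the topic's project
    # may differ from the project that owns the logs.
    destination = f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic}"
    return [
        "gcloud", "logging", "sinks", "create", sink_name, destination,
        f"--log-filter={log_filter}",
    ]


# Hypothetical values: replace them with your own sink name, project, and topic.
command = build_sink_command(
    sink_name="datadog-ml-sink",
    topic_project="my-logging-project",
    topic="export-logs-to-datadog",
    log_filter='resource.type="ml_job"',  # assumed filter, verify before use
)

# Print a shell-quoted command line that can be reviewed before running it.
print(shlex.join(command))
```

Building the argument list first and quoting it with `shlex.join` keeps the embedded double quotes in the filter intact when the command is pasted into a shell.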

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **gcp.ml.prediction.error\_count**(count)                          | Cumulative count of prediction errors.                                                                                                                                    |
| **gcp.ml.prediction.latencies.avg**(count)                         | The average latency of a certain type. *Shown as microsecond*                                                                                                             |
| **gcp.ml.prediction.latencies.samplecount**(count)                 | The sample count for latency of a certain type. *Shown as microsecond*                                                                                                    |
| **gcp.ml.prediction.latencies.sumsqdev**(count)                    | The sum of squared deviation for latency of a certain type. *Shown as microsecond*                                                                                        |
| **gcp.ml.prediction.online.accelerator.duty\_cycle**(gauge)        | Average fraction of time over the past sample period during which the accelerator(s) were actively processing.                                                            |
| **gcp.ml.prediction.online.accelerator.memory.bytes\_used**(gauge) | Amount of accelerator memory allocated by the model replica. *Shown as byte*                                                                                              |
| **gcp.ml.prediction.online.cpu.utilization**(gauge)                | Fraction of CPU allocated by the model replica and currently in use. May exceed 100% if the machine type has multiple CPUs.                                               |
| **gcp.ml.prediction.online.memory.bytes\_used**(gauge)             | Amount of memory allocated by the model replica and currently in use. *Shown as byte*                                                                                     |
| **gcp.ml.prediction.online.network.bytes\_received**(count)        | Number of bytes received over the network by the model replica. *Shown as byte*                                                                                           |
| **gcp.ml.prediction.online.network.bytes\_sent**(count)            | Number of bytes sent over the network by the model replica. *Shown as byte*                                                                                               |
| **gcp.ml.prediction.online.replicas**(gauge)                       | Number of active model replicas.                                                                                                                                          |
| **gcp.ml.prediction.online.target\_replicas**(gauge)               | Desired number of active model replicas.                                                                                                                                  |
| **gcp.ml.prediction.prediction\_count**(count)                     | Cumulative count of predictions.                                                                                                                                          |
| **gcp.ml.prediction.response\_count**(count)                       | Cumulative count of different response codes.                                                                                                                             |
| **gcp.ml.training.accelerator.memory.utilization**(gauge)          | Fraction of allocated accelerator memory that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%. |
| **gcp.ml.training.accelerator.utilization**(gauge)                 | Fraction of allocated accelerator that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.        |
| **gcp.ml.training.cpu.utilization**(gauge)                         | Fraction of allocated CPU that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.                |
| **gcp.ml.training.memory.utilization**(gauge)                      | Fraction of allocated memory that is currently in use. Values are numbers between 0.0 and 1.0; charts display the values as a percentage between 0% and 100%.             |
| **gcp.ml.training.network.received\_bytes\_count**(count)          | Number of bytes received by the training job over the network. *Shown as byte*                                                                                            |
| **gcp.ml.training.network.sent\_bytes\_count**(count)              | Number of bytes sent by the training job over the network. *Shown as byte*                                                                                                |
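
As a rough illustration of how some of these values can be combined, the sketch below derives a latency standard deviation from the `samplecount` and `sumsqdev` metrics and converts a utilization fraction to a percentage. It assumes that `sumsqdev` is the sum of squared deviations from the mean for the same samples counted by `samplecount` over the same interval; the function names and input numbers are hypothetical.

```python
import math


def latency_stddev(sample_count, sum_sq_dev):
    """Population standard deviation from a sample count and a sum of squared deviations."""
    if sample_count <= 0:
        # No samples in the interval, so there is no spread to report.
        return 0.0
    return math.sqrt(sum_sq_dev / sample_count)


def fraction_to_percent(fraction):
    """Convert a 0.0-1.0 utilization fraction to a percentage."""
    return fraction * 100.0


# Hypothetical readings for one interval (latencies in microseconds).
stddev_us = latency_stddev(sample_count=4, sum_sq_dev=400.0)
cpu_percent = fraction_to_percent(0.25)

print(f"latency stddev: {stddev_us} us, CPU utilization: {cpu_percent}%")
```

Dividing by the sample count gives the population variance; use `sample_count - 1` instead if you want the sample variance.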

### Events{% #events %}

The Google Cloud Machine Learning integration does not include any events.

### Service Checks{% #service-checks %}

The Google Cloud Machine Learning integration does not include any service checks.

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Machine learning model monitoring: Best practices](https://www.datadoghq.com/blog/ml-model-monitoring-in-production-best-practices/)