---
title: Google Cloud TPU
description: >-
  The benefits of Tensor Processing Units via scalable, user-friendly cloud
  resources for ML model development.
breadcrumbs: Docs > Integrations > Google Cloud TPU
---

# Google Cloud TPU
Integration version: 1.0.0
## Overview{% #overview %}

Google Cloud TPU products make the benefits of Tensor Processing Units (TPUs) available through scalable, easy-to-use cloud computing resources for ML researchers, ML engineers, developers, and data scientists running cutting-edge ML models.

Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud TPU.

## Setup{% #setup %}

### Installation{% #installation %}

To collect Google Cloud TPU metrics, you only need to set up the [Google Cloud Platform integration](https://docs.datadoghq.com/integrations/google-cloud-platform.md); no additional installation steps are required.

### Log collection{% #log-collection %}

Google Cloud TPU logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already, [set up logging with the Datadog Dataflow template](https://docs.datadoghq.com/integrations/google-cloud-platform.md#log-collection).

Once this is done, export your Google Cloud TPU logs from Google Cloud Logging to the Pub/Sub topic:

1. Go to the [Google Cloud Logging page](https://console.cloud.google.com/logs/viewer) and filter the Google Cloud TPU logs.
1. Click **Create Export** and name the sink.
1. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub topic that was created for that purpose. **Note**: The Pub/Sub topic can be located in a different project.
1. Click **Create** and wait for the confirmation message to show up.
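As an alternative to the console steps above, the same sink can be created with the `gcloud` CLI. This is a minimal sketch: the sink name, project, and topic name are placeholders, and the `resource.type` filter is an assumption about how TPU worker logs are labeled in your project.

```shell
# Sketch: route Cloud TPU logs to an existing Pub/Sub topic.
# "datadog-tpu-sink", "my-project", and "export-logs-to-datadog" are
# placeholder names; adjust the log filter to match your TPU logs.
gcloud logging sinks create datadog-tpu-sink \
  pubsub.googleapis.com/projects/my-project/topics/export-logs-to-datadog \
  --log-filter='resource.type="tpu_worker"'

# The sink's writer identity must be allowed to publish to the topic:
gcloud pubsub topics add-iam-policy-binding export-logs-to-datadog \
  --member="$(gcloud logging sinks describe datadog-tpu-sink \
    --format='value(writerIdentity)')" \
  --role='roles/pubsub.publisher'
```

As in the console flow, the Pub/Sub topic can live in a different project from the sink; use the topic's full resource path in that case.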

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **gcp.tpu.cpu.utilization** (gauge) | Utilization of CPUs on the TPU Worker as a percent. *Shown as percent* |
| **gcp.tpu.memory.usage** (gauge) | Memory usage in bytes. *Shown as byte* |
| **gcp.tpu.network.received\_bytes\_count** (count) | Cumulative bytes of data this server has received over the network. *Shown as byte* |
| **gcp.tpu.network.sent\_bytes\_count** (count) | Cumulative bytes of data this server has sent over the network. *Shown as byte* |
| **gcp.tpu.accelerator.duty\_cycle** (count) | Percentage of time over the sample period during which the accelerator was actively processing. *Shown as percent* |
| **gcp.tpu.instance.uptime\_total** (count) | Elapsed time since the VM was started, in seconds. *Shown as second* |
| **gcp.gke.node.accelerator.tensorcore\_utilization** (count) | Current percentage of the Tensorcore that is utilized. *Shown as percent* |
| **gcp.gke.node.accelerator.duty\_cycle** (count) | Percent of time over the past sample period (10s) during which the accelerator was actively processing. *Shown as percent* |
| **gcp.gke.node.accelerator.memory\_used** (count) | Total accelerator memory allocated in bytes. *Shown as byte* |
| **gcp.gke.node.accelerator.memory\_total** (count) | Total accelerator memory in bytes. *Shown as byte* |
| **gcp.gke.node.accelerator.memory\_bandwidth\_utilization** (count) | Current percentage of the accelerator memory bandwidth that is being used. *Shown as percent* |
| **gcp.gke.container.accelerator.tensorcore\_utilization** (count) | Current percentage of the Tensorcore that is utilized. *Shown as percent* |
| **gcp.gke.container.accelerator.duty\_cycle** (count) | Percent of time over the past sample period (10s) during which the accelerator was actively processing. *Shown as percent* |
| **gcp.gke.container.accelerator.memory\_used** (count) | Total accelerator memory allocated in bytes. *Shown as byte* |
| **gcp.gke.container.accelerator.memory\_total** (count) | Total accelerator memory in bytes. *Shown as byte* |
| **gcp.gke.container.accelerator.memory\_bandwidth\_utilization** (count) | Current percentage of the accelerator memory bandwidth that is being used. *Shown as percent* |
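The GKE accelerator metrics above report used and total memory as separate byte counts; a memory utilization percentage can be derived from the pair, for example in a monitor formula or a dashboard query. A minimal sketch of the arithmetic (the helper name and sample values below are illustrative, not part of the integration):

```python
def memory_utilization_percent(memory_used_bytes: float,
                               memory_total_bytes: float) -> float:
    """Derive accelerator memory utilization (%) from the
    memory_used / memory_total byte metrics."""
    if memory_total_bytes <= 0:
        raise ValueError("memory_total_bytes must be positive")
    return 100.0 * memory_used_bytes / memory_total_bytes

# Example: 12 GiB allocated out of 16 GiB of accelerator memory -> 75.0
print(memory_utilization_percent(12 * 2**30, 16 * 2**30))
```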

### Events{% #events %}

The Google Cloud TPU integration does not include any events.

### Service Checks{% #service-checks %}

The Google Cloud TPU integration does not include any service checks.

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
