Google Cloud TPU products make the benefits of Tensor Processing Units (TPUs) available through scalable, easy-to-use cloud computing resources for ML researchers, ML engineers, developers, and data scientists running cutting-edge ML models.
Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud TPU.
No separate installation is required. To collect Google Cloud TPU metrics, you only need to set up the Google Cloud Platform integration.
Google Cloud TPU logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven’t already, set up logging with the Datadog Dataflow template.
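Once this is done, export your Google Cloud TPU logs from Google Cloud Logging to the Pub/Sub topic. In Google Cloud Logging, filter for your Google Cloud TPU logs, create a log sink (export) with Cloud Pub/Sub as its destination, and select the Pub/Sub topic that the Dataflow job reads from.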
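You can also create the export with the `google-cloud-logging` Python client instead of the console. The following is a minimal sketch, not Datadog's documented procedure. It assumes that Cloud TPU log entries use the `tpu_worker` resource type, so confirm this in Logs Explorer and adjust the filter if needed. The function name `create_tpu_log_sink` and its parameters are hypothetical. The sink is created with a unique writer identity, which is a service account that must be allowed to publish to the topic.

```python
from typing import Optional

# ASSUMPTION: Cloud TPU log entries use the "tpu_worker" resource type.
# Confirm the resource type in Logs Explorer and adjust this filter if needed.
TPU_LOG_FILTER = 'resource.type="tpu_worker"'


def pubsub_sink_destination(project_id: str, topic_id: str) -> str:
    """Return the Cloud Logging sink destination string for a Pub/Sub topic."""
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"


def create_tpu_log_sink(sink_name: str, project_id: str, topic_id: str,
                        topic_project_id: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: export TPU logs to the Pub/Sub topic read by Dataflow.

    Returns the sink's writer identity (a service account). Grant it permission
    to publish to the topic, or no logs reach the Dataflow job.
    """
    # Imported here so the helpers above can be used without the client library.
    from google.cloud import logging as gcp_logging

    client = gcp_logging.Client(project=project_id)
    sink = client.sink(
        sink_name,
        filter_=TPU_LOG_FILTER,
        # The topic may live in a different project than the logs.
        destination=pubsub_sink_destination(topic_project_id or project_id, topic_id),
    )
    sink.create(unique_writer_identity=True)
    return sink.writer_identity
```

Keeping the destination and filter in small helpers lets you check the strings before calling `create_tpu_log_sink`, which needs Google Cloud credentials. Project, topic, and sink names that you pass in are your own placeholders.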
The Google Cloud TPU integration collects the following metrics:

| Metric | Type | Unit | Description |
|---|---|---|---|
| gcp.tpu.cpu.utilization | gauge | percent | Utilization of CPUs on the TPU worker, as a percentage. |
| gcp.tpu.memory.usage | gauge | byte | Memory usage in bytes. |
| gcp.tpu.network.received_bytes_count | count | byte | Cumulative bytes of data this server has received over the network. |
| gcp.tpu.network.sent_bytes_count | count | byte | Cumulative bytes of data this server has sent over the network. |
| gcp.tpu.accelerator.duty_cycle | count | percent | Percentage of time over the sample period during which the accelerator was actively processing. |
| gcp.tpu.instance.uptime_total | count | second | Elapsed time since the VM was started, in seconds. |
| gcp.gke.node.accelerator.tensorcore_utilization | count | percent | Current percentage of the TensorCore that is utilized. |
| gcp.gke.node.accelerator.duty_cycle | count | percent | Percentage of time over the past sample period (10 s) during which the accelerator was actively processing. |
| gcp.gke.node.accelerator.memory_used | count | byte | Total accelerator memory allocated, in bytes. |
| gcp.gke.node.accelerator.memory_total | count | byte | Total accelerator memory, in bytes. |
| gcp.gke.node.accelerator.memory_bandwidth_utilization | count | percent | Current percentage of the accelerator memory bandwidth that is being used. |
| gcp.gke.container.accelerator.tensorcore_utilization | count | percent | Current percentage of the TensorCore that is utilized. |
| gcp.gke.container.accelerator.duty_cycle | count | percent | Percentage of time over the past sample period (10 s) during which the accelerator was actively processing. |
| gcp.gke.container.accelerator.memory_used | count | byte | Total accelerator memory allocated, in bytes. |
| gcp.gke.container.accelerator.memory_total | count | byte | Total accelerator memory, in bytes. |
| gcp.gke.container.accelerator.memory_bandwidth_utilization | count | percent | Current percentage of the accelerator memory bandwidth that is being used. |
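The `memory_used` and `memory_total` metrics can be combined into a memory utilization percentage for a node or container. The following is a minimal sketch that assumes both values come from the same node or container at the same timestamp. It also builds a Datadog-style query string in the `aggregator:metric{scope}` form. The helper names and the `project_id:my-project` tag are hypothetical, used only for illustration.

```python
from typing import Dict, Optional


def memory_utilization_percent(memory_used_bytes: float, memory_total_bytes: float) -> float:
    """Return accelerator memory in use as a percentage of total accelerator memory."""
    if memory_total_bytes <= 0:
        raise ValueError("memory_total_bytes must be positive")
    return 100.0 * memory_used_bytes / memory_total_bytes


def metric_query(metric: str, tags: Optional[Dict[str, str]] = None,
                 aggregator: str = "avg") -> str:
    """Build a query string such as 'avg:metric{key:value}' ('*' means no tag filter)."""
    # Tags are sorted so the same filter always yields the same string.
    scope = ",".join(f"{key}:{value}" for key, value in sorted(tags.items())) if tags else "*"
    return f"{aggregator}:{metric}{{{scope}}}"


if __name__ == "__main__":
    # Hypothetical sample values: 8 GiB allocated out of 32 GiB of accelerator memory.
    used, total = 8 * 2**30, 32 * 2**30
    print(memory_utilization_percent(used, total))
    print(metric_query("gcp.gke.node.accelerator.memory_used", {"project_id": "my-project"}))
```

With the sample values above, the utilization is 25.0 percent. The query helper only formats a string and does not call any Datadog API.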
The Google Cloud TPU integration does not include any events.
The Google Cloud TPU integration does not include any service checks.
Need help? Contact Datadog support.