---
title: OCI GPU
description: >-
  OCI GPUs deliver on-demand, high-performance computing for AI, ML, and HPC
  workloads.
---

# OCI GPU
## Overview{% #overview %}

Monitoring Oracle Cloud Infrastructure (OCI) GPU instances helps you maintain the performance and reliability of your high-performance computing workloads. This integration provides GPU metrics through the `gpu_infrastructure_health` namespace so that you can track GPU health and utilization.

This integration lets you monitor and alert on the health, capacity, throughput, status, and performance of your [GPU Instances](https://www.oracle.com/cloud/compute/#gpu).

It collects metrics and tags from the [gpu_infrastructure_health](https://docs.oracle.com/en-us/iaas/Content/Compute/References/computemetrics.htm#computemetrics_topic-Available_Metrics_oci_high_performance_compute) namespace.

## Setup{% #setup %}

After setting up the [Oracle Cloud Infrastructure](https://docs.datadoghq.com/integrations/oracle_cloud_infrastructure.md) integration, ensure that the `gpu_infrastructure_health` namespace is included in your [Connector Hub](https://cloud.oracle.com/connector-hub/service-connectors).

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **oci.gpu\_infrastructure\_health.gpu\_ecc\_double\_bit\_errors** (count) | The number of GPU double-bit ECC errors reported. *Shown as error* |
| **oci.gpu\_infrastructure\_health.gpu\_ecc\_single\_bit\_errors** (count) | The number of GPU single-bit ECC errors reported. *Shown as error* |
| **oci.gpu\_infrastructure\_health.gpu\_memory\_utilization** (gauge) | The percentage of the GPU memory resource in use. *Shown as percent* |
| **oci.gpu\_infrastructure\_health.gpu\_power\_draw** (gauge) | The amount of GPU power used. |
| **oci.gpu\_infrastructure\_health.gpu\_temperature** (gauge) | The GPU temperature reported. |
| **oci.gpu\_infrastructure\_health.gpu\_utilization** (gauge) | The GPU activity level, expressed as a percentage of total time. For instance pools, the value is averaged across all instances in the pool. *Shown as percent* |
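
As a rough illustration of how these metric names might be used in an alert, the following Python sketch assembles a Datadog metric monitor query string. The `monitor_query` helper, the `last_5m` window, the `avg` aggregation, and the threshold of 90 are assumptions chosen for illustration and are not part of this integration. Only the metric names come from the table above.

```python
# Metric names come from the metrics table; everything else is illustrative.
NAMESPACE = "oci.gpu_infrastructure_health"

METRICS = (
    "gpu_ecc_double_bit_errors",
    "gpu_ecc_single_bit_errors",
    "gpu_memory_utilization",
    "gpu_power_draw",
    "gpu_temperature",
    "gpu_utilization",
)


def monitor_query(metric, threshold, window="last_5m", scope="*", aggregator="avg"):
    """Build a metric monitor query string (hypothetical helper).

    The result follows the general shape
    time_aggr(window):space_aggr:metric{scope} > threshold.
    """
    if metric not in METRICS:
        raise ValueError(f"unknown OCI GPU metric: {metric}")
    return (
        f"{aggregator}({window}):{aggregator}:"
        f"{NAMESPACE}.{metric}{{{scope}}} > {threshold}"
    )


# Example: alert when average GPU utilization stays above an assumed 90 percent.
query = monitor_query("gpu_utilization", 90)
print(query)
```

The resulting string could be pasted into a metric monitor definition or passed to whichever API client you already use. Pick thresholds that match your own hardware and workloads.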

### Service Checks{% #service-checks %}

OCI GPU does not include any service checks.

### Events{% #events %}

OCI GPU does not include any events.

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
