---
title: Amazon SageMaker
description: Amazon SageMaker is a fully managed machine learning service.
---

# Amazon SageMaker

## Overview{% #overview %}

Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can build and train machine learning models, and then directly deploy them into a production-ready hosted environment.

Enable this integration to see all your SageMaker metrics in Datadog.

## Setup{% #setup %}

### Installation{% #installation %}

If you haven't already, set up the [Amazon Web Services integration](https://docs.datadoghq.com/integrations/amazon_web_services/) first.

### Metric collection{% #metric-collection %}

1. In the [AWS integration page](https://app.datadoghq.com/integrations/amazon-web-services), ensure that `SageMaker` is enabled under the `Metric Collection` tab.
1. Install the [Datadog - Amazon SageMaker integration](https://app.datadoghq.com/integrations/amazon-sagemaker).
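
Once metrics are flowing, they can be queried by name. The snippet below is a minimal sketch of composing a Datadog metric query string for one of the SageMaker metrics listed on this page. The helper `build_metric_query`, the `endpointname` tag key, and the endpoint name `my-endpoint` are illustrative assumptions, not part of the integration; check the tags that actually appear on your metrics before relying on them.

```python
def build_metric_query(metric, aggregator="avg", tags=None):
    """Compose a query string of the form '<aggregator>:<metric>{<scope>}'.

    `tags` is an optional dict of tag key/value pairs; with no tags the
    scope is '*', which matches every source reporting the metric.
    """
    scope = ",".join(f"{key}:{value}" for key, value in sorted((tags or {}).items()))
    return f"{aggregator}:{metric}{{{scope or '*'}}}"


# Hypothetical tag key and endpoint name, used only for illustration.
query = build_metric_query(
    "aws.sagemaker.invocations",
    aggregator="sum",
    tags={"endpointname": "my-endpoint"},
)
print(query)
```

The resulting string can be pasted into a dashboard widget or monitor query editor.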

### Log collection{% #log-collection %}

#### Enable logging{% #enable-logging %}

Configure Amazon SageMaker to send logs either to an S3 bucket or to CloudWatch.

**Note**: If you log to an S3 bucket, make sure that `amazon_sagemaker` is set as *Target prefix*.

#### Send logs to Datadog{% #send-logs-to-datadog %}

1. If you haven't already, set up the [Datadog log collection AWS Lambda function](https://docs.datadoghq.com/integrations/amazon_web_services/?tab=automaticcloudformation#log-collection).

1. Once the Lambda function is installed, manually add a trigger on the S3 bucket or CloudWatch log group that contains your Amazon SageMaker logs in the AWS console:

   - [Add a manual trigger on the S3 bucket](https://docs.datadoghq.com/logs/guide/send-aws-services-logs-with-the-datadog-lambda-function/#collecting-logs-from-s3-buckets)
   - [Add a manual trigger on the CloudWatch Log Group](https://docs.datadoghq.com/logs/guide/send-aws-services-logs-with-the-datadog-lambda-function/#collecting-logs-from-cloudwatch-log-group)
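
For the CloudWatch option, the console trigger corresponds to a CloudWatch Logs subscription filter that points at the forwarder function. The sketch below only builds the parameters for the CloudWatch Logs `put_subscription_filter` call and does not contact AWS. The helper name, the filter name, the log group, and the Lambda ARN are placeholder assumptions; substitute your own values. It also assumes CloudWatch Logs already has permission to invoke the function.

```python
def subscription_filter_params(log_group, forwarder_arn, filter_name="datadog-forwarder"):
    """Build keyword arguments for the CloudWatch Logs put_subscription_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        # An empty pattern forwards every log event in the group.
        "filterPattern": "",
        "destinationArn": forwarder_arn,
    }


# Placeholder log group and ARN, used only for illustration.
params = subscription_filter_params(
    "/aws/sagemaker/TrainingJobs",
    "arn:aws:lambda:us-east-1:123456789012:function:datadog-forwarder",
)
print(params)

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("logs").put_subscription_filter(**params)
```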

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **aws.sagemaker.consumed\_read\_requests\_units**(count)                       | The average number of consumed read units over the specified time period.                                                                                                                       |
| **aws.sagemaker.consumed\_read\_requests\_units.maximum**(count)               | The maximum number of consumed read units over the specified time period.                                                                                                                       |
| **aws.sagemaker.consumed\_read\_requests\_units.minimum**(count)               | The minimum number of consumed read units over the specified time period.                                                                                                                       |
| **aws.sagemaker.consumed\_read\_requests\_units.p90**(count)                   | The 90th percentile of consumed read units over the specified time period.                                                                                                                      |
| **aws.sagemaker.consumed\_read\_requests\_units.p95**(count)                   | The 95th percentile of consumed read units over the specified time period.                                                                                                                      |
| **aws.sagemaker.consumed\_read\_requests\_units.p99**(count)                   | The 99th percentile of consumed read units over the specified time period.                                                                                                                      |
| **aws.sagemaker.consumed\_read\_requests\_units.samplecount**(count)           | The sample count of consumed read units over the specified time period.                                                                                                                         |
| **aws.sagemaker.consumed\_read\_requests\_units.sum**(count)                   | The sum of consumed read units over the specified time period.                                                                                                                                  |
| **aws.sagemaker.consumed\_write\_requests\_units**(count)                      | The average number of consumed write units over the specified time period.                                                                                                                      |
| **aws.sagemaker.consumed\_write\_requests\_units.maximum**(count)              | The maximum number of consumed write units over the specified time period.                                                                                                                      |
| **aws.sagemaker.consumed\_write\_requests\_units.minimum**(count)              | The minimum number of consumed write units over the specified time period.                                                                                                                      |
| **aws.sagemaker.consumed\_write\_requests\_units.p90**(count)                  | The 90th percentile of consumed write units over the specified time period.                                                                                                                     |
| **aws.sagemaker.consumed\_write\_requests\_units.p95**(count)                  | The 95th percentile of consumed write units over the specified time period.                                                                                                                     |
| **aws.sagemaker.consumed\_write\_requests\_units.p99**(count)                  | The 99th percentile of consumed write units over the specified time period.                                                                                                                     |
| **aws.sagemaker.consumed\_write\_requests\_units.samplecount**(count)          | The sample count of consumed write units over the specified time period.                                                                                                                        |
| **aws.sagemaker.consumed\_write\_requests\_units.sum**(count)                  | The sum of consumed write units over the specified time period.                                                                                                                                 |
| **aws.sagemaker.invocation\_4xxerrors**(count)                                 | The average number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.*Shown as request*                                                                              |
| **aws.sagemaker.invocation\_5xxerrors**(count)                                 | The average number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.*Shown as request*                                                                              |
| **aws.sagemaker.invocation\_model\_errors**(count)                             | The number of model invocation requests that did not result in a 2XX HTTP response. This includes 4XX/5XX status codes, low-level socket errors, malformed HTTP responses, and request timeouts. |
| **aws.sagemaker.invocations**(count)                                           | The number of InvokeEndpoint requests sent to a model endpoint.*Shown as request*                                                                                                               |
| **aws.sagemaker.invocations\_per\_instance**(count)                            | The number of invocations sent to a model normalized by InstanceCount in each ProductionVariant.                                                                                                |
| **aws.sagemaker.model\_cache\_hit**(count)                                     | The number of InvokeEndpoint requests sent to the multi-model endpoint for which the model was already loaded.*Shown as request*                                                                |
| **aws.sagemaker.model\_cache\_hit.sum**(count)                                 | The sum of InvokeEndpoint requests sent to the multi-model endpoint for which the model was already loaded.*Shown as request*                                                                   |
| **aws.sagemaker.model\_downloading\_time**(gauge)                              | The interval of time that it takes to download the model from Amazon Simple Storage Service (Amazon S3).*Shown as microsecond*                                                                  |
| **aws.sagemaker.model\_downloading\_time.maximum**(gauge)                      | The maximum interval of time that it takes to download the model from Amazon Simple Storage Service (Amazon S3).*Shown as microsecond*                                                          |
| **aws.sagemaker.model\_downloading\_time.minimum**(gauge)                      | The minimum interval of time that it takes to download the model from Amazon Simple Storage Service (Amazon S3).*Shown as microsecond*                                                          |
| **aws.sagemaker.model\_downloading\_time.samplecount**(count)                  | The sample count of the interval of time that it takes to download the model from Amazon Simple Storage Service (Amazon S3).*Shown as microsecond*                                              |
| **aws.sagemaker.model\_downloading\_time.sum**(gauge)                          | The sum of the interval of time that it takes to download the model from Amazon Simple Storage Service (Amazon S3).*Shown as microsecond*                                                       |
| **aws.sagemaker.model\_latency**(gauge)                                        | The average interval of time taken by a model to respond as viewed from Amazon SageMaker.*Shown as microsecond*                                                                                 |
| **aws.sagemaker.model\_latency.maximum**(gauge)                                | The maximum interval of time taken by a model to respond as viewed from Amazon SageMaker.*Shown as microsecond*                                                                                 |
| **aws.sagemaker.model\_latency.minimum**(gauge)                                | The minimum interval of time taken by a model to respond as viewed from Amazon SageMaker.*Shown as microsecond*                                                                                 |
| **aws.sagemaker.model\_latency.samplecount**(count)                            | The sample count of the interval of time taken by a model to respond as viewed from Amazon SageMaker.*Shown as microsecond*                                                                     |
| **aws.sagemaker.model\_latency.sum**(gauge)                                    | The sum of the interval of time taken by a model to respond as viewed from Amazon SageMaker.*Shown as microsecond*                                                                              |
| **aws.sagemaker.model\_loading\_time**(gauge)                                  | The interval of time that it takes to load the model through the container's LoadModel API call.*Shown as microsecond*                                                                          |
| **aws.sagemaker.model\_loading\_time.maximum**(gauge)                          | The maximum interval of time that it takes to load the model through the container's LoadModel API call.*Shown as microsecond*                                                                  |
| **aws.sagemaker.model\_loading\_time.minimum**(gauge)                          | The minimum interval of time that it takes to load the model through the container's LoadModel API call.*Shown as microsecond*                                                                  |
| **aws.sagemaker.model\_loading\_time.samplecount**(count)                      | The sample count of the interval of time that it takes to load the model through the container's LoadModel API call.*Shown as microsecond*                                                      |
| **aws.sagemaker.model\_loading\_time.sum**(gauge)                              | The sum of the interval of time that it takes to load the model through the container's LoadModel API call.*Shown as microsecond*                                                               |
| **aws.sagemaker.model\_loading\_wait\_time**(gauge)                            | The interval of time that an invocation request has waited for the target model to be downloaded, or loaded, or both in order to perform inference.*Shown as microsecond*                       |
| **aws.sagemaker.model\_loading\_wait\_time.maximum**(gauge)                    | The maximum interval of time that an invocation request has waited for the target model to be downloaded, or loaded, or both in order to perform inference.*Shown as microsecond*               |
| **aws.sagemaker.model\_loading\_wait\_time.minimum**(gauge)                    | The minimum interval of time that an invocation request has waited for the target model to be downloaded, or loaded, or both in order to perform inference.*Shown as microsecond*               |
| **aws.sagemaker.model\_loading\_wait\_time.samplecount**(count)                | The sample count of the interval of time that an invocation request has waited for the target model to be downloaded, or loaded, or both in order to perform inference.*Shown as microsecond*   |
| **aws.sagemaker.model\_loading\_wait\_time.sum**(gauge)                        | The sum of the interval of time that an invocation request has waited for the target model to be downloaded, or loaded, or both in order to perform inference.*Shown as microsecond*            |
| **aws.sagemaker.model\_setup\_time**(gauge)                                    | The average time it takes to launch new compute resources for a serverless endpoint.*Shown as microsecond*                                                                                      |
| **aws.sagemaker.model\_setup\_time.maximum**(gauge)                            | The maximum interval of time it takes to launch new compute resources for a serverless endpoint.*Shown as microsecond*                                                                          |
| **aws.sagemaker.model\_setup\_time.minimum**(gauge)                            | The minimum interval of time it takes to launch new compute resources for a serverless endpoint.*Shown as microsecond*                                                                          |
| **aws.sagemaker.model\_setup\_time.samplecount**(count)                        | The sample count of the interval of time it takes to launch new compute resources for a serverless endpoint.*Shown as microsecond*                                                              |
| **aws.sagemaker.model\_setup\_time.sum**(gauge)                                | The total amount of time it takes to launch new compute resources for a serverless endpoint.*Shown as microsecond*                                                                              |
| **aws.sagemaker.model\_unloading\_time**(gauge)                                | The interval of time that it takes to unload the model through the container's UnloadModel API call.*Shown as microsecond*                                                                      |
| **aws.sagemaker.model\_unloading\_time.maximum**(gauge)                        | The maximum interval of time that it takes to unload the model through the container's UnloadModel API call.*Shown as microsecond*                                                              |
| **aws.sagemaker.model\_unloading\_time.minimum**(gauge)                        | The minimum interval of time that it takes to unload the model through the container's UnloadModel API call.*Shown as microsecond*                                                              |
| **aws.sagemaker.model\_unloading\_time.samplecount**(count)                    | The sample count of the interval of time that it takes to unload the model through the container's UnloadModel API call.*Shown as microsecond*                                                  |
| **aws.sagemaker.model\_unloading\_time.sum**(gauge)                            | The sum of the interval of time that it takes to unload the model through the container's UnloadModel API call.*Shown as microsecond*                                                           |
| **aws.sagemaker.overhead\_latency**(gauge)                                     | The average interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.*Shown as microsecond*                                                        |
| **aws.sagemaker.overhead\_latency.maximum**(gauge)                             | The maximum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.*Shown as microsecond*                                                        |
| **aws.sagemaker.overhead\_latency.minimum**(gauge)                             | The minimum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.*Shown as microsecond*                                                        |
| **aws.sagemaker.overhead\_latency.samplecount**(count)                         | The sample count of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.*Shown as microsecond*                                            |
| **aws.sagemaker.overhead\_latency.sum**(gauge)                                 | The sum of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.*Shown as microsecond*                                                     |
| **aws.sagemaker.endpoints.cpuutilization**(gauge)                              | The average percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.endpoints.cpuutilization.maximum**(gauge)                      | The maximum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.endpoints.cpuutilization.minimum**(gauge)                      | The minimum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.endpoints.disk\_utilization**(gauge)                           | The average percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.endpoints.disk\_utilization.maximum**(gauge)                   | The maximum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.endpoints.disk\_utilization.minimum**(gauge)                   | The minimum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.endpoints.gpumemory\_utilization**(gauge)                      | The average percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.endpoints.gpumemory\_utilization.maximum**(gauge)              | The maximum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.endpoints.gpumemory\_utilization.minimum**(gauge)              | The minimum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.endpoints.gpuutilization**(gauge)                              | The average percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.endpoints.gpuutilization.maximum**(gauge)                      | The maximum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.endpoints.gpuutilization.minimum**(gauge)                      | The minimum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.endpoints.loaded\_model\_count**(count)                        | The number of models loaded in the containers of the multi-model endpoint. This metric is emitted per instance.                                                                                 |
| **aws.sagemaker.endpoints.loaded\_model\_count.sum**(count)                    | The sum of models loaded in the containers of the multi-model endpoint. This metric is emitted per instance.                                                                                    |
| **aws.sagemaker.endpoints.memory\_utilization**(gauge)                         | The average percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.endpoints.memory\_utilization.maximum**(gauge)                 | The maximum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.endpoints.memory\_utilization.minimum**(gauge)                 | The minimum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.processingjobs.cpuutilization**(gauge)                         | The average percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.processingjobs.cpuutilization.maximum**(gauge)                 | The maximum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.processingjobs.cpuutilization.minimum**(gauge)                 | The minimum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.processingjobs.disk\_utilization**(gauge)                      | The average percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.processingjobs.disk\_utilization.maximum**(gauge)              | The maximum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.processingjobs.disk\_utilization.minimum**(gauge)              | The minimum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.processingjobs.gpumemory\_utilization**(gauge)                 | The average percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.processingjobs.gpumemory\_utilization.maximum**(gauge)         | The maximum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.processingjobs.gpumemory\_utilization.minimum**(gauge)         | The minimum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.processingjobs.gpuutilization**(gauge)                         | The average percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.processingjobs.gpuutilization.maximum**(gauge)                 | The maximum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.processingjobs.gpuutilization.minimum**(gauge)                 | The minimum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.processingjobs.memory\_utilization**(gauge)                    | The average percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.processingjobs.memory\_utilization.maximum**(gauge)            | The maximum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.processingjobs.memory\_utilization.minimum**(gauge)            | The minimum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.trainingjobs.cpuutilization**(gauge)                           | The average percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.trainingjobs.cpuutilization.maximum**(gauge)                   | The maximum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.trainingjobs.cpuutilization.minimum**(gauge)                   | The minimum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.trainingjobs.disk\_utilization**(gauge)                        | The average percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.trainingjobs.disk\_utilization.maximum**(gauge)                | The maximum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.trainingjobs.disk\_utilization.minimum**(gauge)                | The minimum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.trainingjobs.gpumemory\_utilization**(gauge)                   | The average percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.trainingjobs.gpumemory\_utilization.maximum**(gauge)           | The maximum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.trainingjobs.gpumemory\_utilization.minimum**(gauge)           | The minimum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.trainingjobs.gpuutilization**(gauge)                           | The average percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.trainingjobs.gpuutilization.maximum**(gauge)                   | The maximum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.trainingjobs.gpuutilization.minimum**(gauge)                   | The minimum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.trainingjobs.memory\_utilization**(gauge)                      | The average percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.trainingjobs.memory\_utilization.maximum**(gauge)              | The maximum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.trainingjobs.memory\_utilization.minimum**(gauge)              | The minimum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.transformjobs.cpuutilization**(gauge)                          | The average percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.transformjobs.cpuutilization.maximum**(gauge)                  | The maximum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.transformjobs.cpuutilization.minimum**(gauge)                  | The minimum percentage of CPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.transformjobs.disk\_utilization**(gauge)                       | The average percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.transformjobs.disk\_utilization.maximum**(gauge)               | The maximum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.transformjobs.disk\_utilization.minimum**(gauge)               | The minimum percentage of disk space used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.transformjobs.gpumemory\_utilization**(gauge)                  | The average percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.transformjobs.gpumemory\_utilization.maximum**(gauge)          | The maximum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.transformjobs.gpumemory\_utilization.minimum**(gauge)          | The minimum percentage of GPU memory used by the containers on an instance.*Shown as percent*                                                                                                   |
| **aws.sagemaker.transformjobs.gpuutilization**(gauge)                          | The average percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.transformjobs.gpuutilization.maximum**(gauge)                  | The maximum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.transformjobs.gpuutilization.minimum**(gauge)                  | The minimum percentage of GPU units that are used by the containers on an instance.*Shown as percent*                                                                                           |
| **aws.sagemaker.transformjobs.memory\_utilization**(gauge)                     | The average percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.transformjobs.memory\_utilization.maximum**(gauge)             | The maximum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.transformjobs.memory\_utilization.minimum**(gauge)             | The minimum percentage of memory that is used by the containers on an instance.*Shown as percent*                                                                                               |
| **aws.sagemaker.labelingjobs.dataset\_objects\_auto\_annotated**(count)        | The average number of dataset objects auto-annotated in a labeling job.                                                                                                                         |
| **aws.sagemaker.labelingjobs.dataset\_objects\_human\_annotated**(count)       | The average number of dataset objects annotated by a human in a labeling job.                                                                                                                   |
| **aws.sagemaker.labelingjobs.dataset\_objects\_labeling\_failed**(count)       | The number of dataset objects that failed labeling in a labeling job.                                                                                                                           |
| **aws.sagemaker.labelingjobs.jobs\_succeeded.samplecount**(count)              | The sample count of occurrences a single labeling job succeeded.*Shown as job*                                                                                                                  |
| **aws.sagemaker.labelingjobs.jobs\_succeeded.sum**(count)                      | The sum of occurrences a single labeling job succeeded.*Shown as job*                                                                                                                           |
| **aws.sagemaker.labelingjobs.total\_dataset\_objects\_labeled**(count)         | The average number of dataset objects labeled successfully in a labeling job.                                                                                                                   |
| **aws.sagemaker.workteam.active\_workers**(count)                              | The average number of single active workers on a private work team that submitted, released, or declined a task.                                                                                |
| **aws.sagemaker.workteam.active\_workers.samplecount**(count)                  | The sample count of single active workers on a private work team that submitted, released, or declined a task.                                                                                  |
| **aws.sagemaker.workteam.active\_workers.sum**(count)                          | The sum of single active workers on a private work team that submitted, released, or declined a task.                                                                                           |
| **aws.sagemaker.workteam.tasks\_accepted.samplecount**(count)                  | The sample count of occurrences a single task was accepted by a worker.                                                                                                                         |
| **aws.sagemaker.workteam.tasks\_accepted.sum**(count)                          | The sum of occurrences a single task was accepted by a worker.                                                                                                                                  |
| **aws.sagemaker.workteam.tasks\_declined.samplecount**(count)                  | The sample count of occurrences a single task was declined by a worker.                                                                                                                         |
| **aws.sagemaker.workteam.tasks\_declined.sum**(count)                          | The sum of occurrences a single task was declined by a worker.                                                                                                                                  |
| **aws.sagemaker.workteam.tasks\_submitted.samplecount**(count)                 | The sample count of occurrences a single task was submitted/completed by a private worker.                                                                                                      |
| **aws.sagemaker.workteam.tasks\_submitted.sum**(count)                         | The sum of occurrences a single task was submitted/completed by a private worker.                                                                                                               |
| **aws.sagemaker.modelbuildingpipeline.execution\_duration**(gauge)             | The average duration in milliseconds that the pipeline execution ran.*Shown as millisecond*                                                                                                     |
| **aws.sagemaker.modelbuildingpipeline.execution\_duration.maximum**(gauge)     | The maximum duration in milliseconds that the pipeline execution ran.*Shown as millisecond*                                                                                                     |
| **aws.sagemaker.modelbuildingpipeline.execution\_duration.minimum**(gauge)     | The minimum duration in milliseconds that the pipeline execution ran.*Shown as millisecond*                                                                                                     |
| **aws.sagemaker.modelbuildingpipeline.execution\_duration.samplecount**(count) | The sample count of the duration in milliseconds that the pipeline execution ran.*Shown as millisecond*                                                                                         |
| **aws.sagemaker.modelbuildingpipeline.execution\_duration.sum**(gauge)         | The sum of the duration in milliseconds that the pipeline execution ran.*Shown as millisecond*                                                                                                  |
| **aws.sagemaker.modelbuildingpipeline.execution\_failed**(count)               | The average number of pipeline executions that failed.                                                                                                                                          |
| **aws.sagemaker.modelbuildingpipeline.execution\_failed.sum**(count)           | The sum of pipeline executions that failed.                                                                                                                                                     |
| **aws.sagemaker.modelbuildingpipeline.execution\_started**(count)              | The average number of pipeline executions that started.                                                                                                                                         |
| **aws.sagemaker.modelbuildingpipeline.execution\_started.sum**(count)          | The sum of pipeline executions that started.                                                                                                                                                    |
| **aws.sagemaker.modelbuildingpipeline.execution\_stopped**(count)              | The average number of pipeline executions that stopped.                                                                                                                                         |
| **aws.sagemaker.modelbuildingpipeline.execution\_stopped.sum**(count)          | The sum of pipeline executions that stopped.                                                                                                                                                    |
| **aws.sagemaker.modelbuildingpipeline.execution\_succeeded**(count)            | The average number of pipeline executions that succeeded.                                                                                                                                       |
| **aws.sagemaker.modelbuildingpipeline.execution\_succeeded.sum**(count)        | The sum of pipeline executions that succeeded.                                                                                                                                                  |
| **aws.sagemaker.modelbuildingpipeline.step\_duration**(gauge)                  | The average duration in milliseconds that the step ran.*Shown as millisecond*                                                                                                                   |
| **aws.sagemaker.modelbuildingpipeline.step\_duration.maximum**(gauge)          | The maximum duration in milliseconds that the step ran.*Shown as millisecond*                                                                                                                   |
| **aws.sagemaker.modelbuildingpipeline.step\_duration.minimum**(gauge)          | The minimum duration in milliseconds that the step ran.*Shown as millisecond*                                                                                                                   |
| **aws.sagemaker.modelbuildingpipeline.step\_duration.samplecount**(count)      | The sample count of the duration in milliseconds that the step ran.*Shown as millisecond*                                                                                                       |
| **aws.sagemaker.modelbuildingpipeline.step\_duration.sum**(gauge)              | The sum of the duration in milliseconds that the step ran.*Shown as millisecond*                                                                                                                |
| **aws.sagemaker.modelbuildingpipeline.step\_failed**(count)                    | The average number of steps that failed.                                                                                                                                                        |
| **aws.sagemaker.modelbuildingpipeline.step\_failed.sum**(count)                | The sum of steps that failed.                                                                                                                                                                   |
| **aws.sagemaker.modelbuildingpipeline.step\_started**(count)                   | The average number of steps that started.                                                                                                                                                       |
| **aws.sagemaker.modelbuildingpipeline.step\_started.sum**(count)               | The sum of steps that started.                                                                                                                                                                  |
| **aws.sagemaker.modelbuildingpipeline.step\_stopped**(count)                   | The average number of steps that stopped.                                                                                                                                                       |
| **aws.sagemaker.modelbuildingpipeline.step\_stopped.sum**(count)               | The sum of steps that stopped.                                                                                                                                                                  |
| **aws.sagemaker.modelbuildingpipeline.step\_succeeded**(count)                 | The average number of steps that succeeded.                                                                                                                                                     |
| **aws.sagemaker.modelbuildingpipeline.step\_succeeded.sum**(count)             | The sum of steps that succeeded.                                                                                                                                                                |
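
The metric names in the table above can be used in Datadog metric queries of the form `aggregator:metric{scope}`. The following is a minimal Python sketch that assembles such query strings. The `build_query` helper is hypothetical (it is not part of any Datadog library), and the `*` scope is only a default; any tag you pass as a scope or grouping is an assumption about your own environment. The table escapes underscores for Markdown, so the actual metric names contain plain underscores.

```python
from typing import Sequence

# Space aggregators accepted by this sketch.
ALLOWED_AGGREGATORS = {"avg", "max", "min", "sum"}


def build_query(
    metric: str,
    aggregator: str = "avg",
    scope: str = "*",
    group_by: Sequence[str] = (),
) -> str:
    """Assemble a metric query string such as 'avg:some.metric{*}'.

    This is an illustrative helper, not an official API.
    """
    if aggregator not in ALLOWED_AGGREGATORS:
        raise ValueError(f"unsupported aggregator: {aggregator!r}")
    # Doubled braces emit literal '{' and '}' around the scope.
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query


# Metric names copied from the table, with Markdown escapes removed.
cpu_peak = build_query(
    "aws.sagemaker.transformjobs.cpuutilization.maximum", aggregator="max"
)
failed_steps = build_query(
    "aws.sagemaker.modelbuildingpipeline.step_failed.sum", aggregator="sum"
)
print(cpu_peak)
print(failed_steps)
```

The resulting strings can be pasted into a dashboard widget or monitor definition, or passed to whichever API client you already use.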

### Events{% #events %}

The Amazon SageMaker integration does not include any events.

### Service Checks{% #service-checks %}

The Amazon SageMaker integration does not include any service checks.

## Out-of-the-box monitoring{% #out-of-the-box-monitoring %}

Datadog provides out-of-the-box dashboards for your SageMaker endpoints and jobs.

### SageMaker endpoints{% #sagemaker-endpoints %}

Use the [SageMaker endpoints dashboard](https://app.datadoghq.com/dash/integration/31076/amazon-sagemaker-endpoints) to help you immediately start monitoring the health and performance of your SageMaker endpoints with no additional configuration. Determine which endpoints have errors, higher-than-expected latency, or traffic spikes. Review and correct your instance type and scaling policy selections using CPU, GPU, memory, and disk utilization metrics.


### SageMaker jobs{% #sagemaker-jobs %}

Use the [SageMaker jobs dashboard](https://app.datadoghq.com/dash/integration/31077/amazon-sagemaker-jobs) to gain insight into the resource utilization of your training, processing, or transform jobs, for example to find CPU, GPU, and storage bottlenecks. Use this information to optimize your compute instances.
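
As a rough illustration of the kind of bottleneck check the dashboard supports, the Python sketch below flags resources whose peak utilization meets or exceeds a threshold. The `find_bottlenecks` helper, the 90 percent threshold, and the sample values are all hypothetical, and the sketch assumes each value has already been retrieved and expressed on a 0 to 100 scale.

```python
from typing import Dict, List

# Hypothetical threshold (percent); tune it for your own workloads.
DEFAULT_THRESHOLD = 90.0


def find_bottlenecks(
    peaks: Dict[str, float], threshold: float = DEFAULT_THRESHOLD
) -> List[str]:
    """Return the metric names whose peak value is at or above the threshold."""
    return sorted(name for name, value in peaks.items() if value >= threshold)


# Made-up peak values for one transform job, keyed by metric name.
peaks = {
    "aws.sagemaker.transformjobs.cpuutilization.maximum": 97.5,
    "aws.sagemaker.transformjobs.gpuutilization.maximum": 41.0,
    "aws.sagemaker.transformjobs.memory_utilization.maximum": 62.3,
    "aws.sagemaker.transformjobs.disk_utilization.maximum": 93.1,
}

for metric in find_bottlenecks(peaks):
    print(f"possible bottleneck: {metric}")
```

With these sample values, the CPU and disk metrics are flagged, which would suggest reviewing the instance type or attached storage for that job.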


## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [CloudHealth + Datadog: Effectively manage your cloud assets](https://www.datadoghq.com/blog/monitor-cloudhealth-assets-datadog/)