
Amazon SageMaker


Overview

Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can build and train machine learning models, and then directly deploy them into a production-ready hosted environment.

Enable this integration to see all your SageMaker metrics in Datadog.

Setup

Installation

If you haven’t already, set up the Amazon Web Services integration first.

Metric collection

  1. In the AWS integration tile, ensure that SageMaker is checked under metric collection.

  2. Install the Datadog - Amazon SageMaker integration.

Data Collected

Metrics

aws.sagemaker.invocation_4xx_errors
(count)
The average number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
aws.sagemaker.invocation_4xx_errors.sum
(count)
The sum of the number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
aws.sagemaker.invocation_5xx_errors
(count)
The average number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
aws.sagemaker.invocation_5xx_errors.sum
(count)
The sum of the number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
aws.sagemaker.invocations
(count)
The sum of the number of InvokeEndpoint requests sent to a model endpoint.
aws.sagemaker.invocations.sample_count
(count)
The sample count of the number of InvokeEndpoint requests sent to a model endpoint.
aws.sagemaker.invocations_per_instance
(count)
The number of invocations sent to a model normalized by InstanceCount in each ProductionVariant.
aws.sagemaker.model_latency
(count)
The average interval of time taken by a model to respond as viewed from Amazon SageMaker.
shown as microsecond
aws.sagemaker.model_latency.sum
(count)
The sum of the interval of time taken by a model to respond as viewed from Amazon SageMaker.
shown as microsecond
aws.sagemaker.model_latency.minimum
(count)
The minimum interval of time taken by a model to respond as viewed from Amazon SageMaker.
shown as microsecond
aws.sagemaker.model_latency.maximum
(count)
The maximum interval of time taken by a model to respond as viewed from Amazon SageMaker.
shown as microsecond
aws.sagemaker.model_latency.sample_count
(count)
The sample count of the interval of time taken by a model to respond as viewed from Amazon SageMaker.
shown as microsecond
aws.sagemaker.overhead_latency
(count)
The average interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
shown as microsecond
aws.sagemaker.overhead_latency.sum
(count)
The sum of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
shown as microsecond
aws.sagemaker.overhead_latency.minimum
(count)
The minimum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
shown as microsecond
aws.sagemaker.overhead_latency.maximum
(count)
The maximum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
shown as microsecond
aws.sagemaker.overhead_latency.sample_count
(count)
The sample count of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
shown as microsecond
aws.sagemaker.cpu_utilization
(count)
The percentage of CPU units that are used by the containers on an instance.
shown as percent
aws.sagemaker.memory_utilization
(count)
The percentage of memory that is used by the containers on an instance.
shown as percent
aws.sagemaker.gpu_utilization
(count)
The percentage of GPU units that are used by the containers on an instance.
shown as percent
aws.sagemaker.gpu_memory_utilization
(count)
The percentage of GPU memory used by the containers on an instance.
shown as percent
aws.sagemaker.disk_utilization
(count)
The percentage of disk space used by the containers on an instance.
shown as percent
aws.sagemaker.dataset_objects_auto_annotated
(count)
The number of dataset objects auto-annotated in a labeling job.
aws.sagemaker.dataset_objects_human_annotated
(count)
The number of dataset objects annotated by a human in a labeling job.
aws.sagemaker.dataset_objects_labeling_failed
(count)
The number of dataset objects that failed labeling in a labeling job.
aws.sagemaker.jobs_failed
(count)
The sum of the number of labeling jobs that failed.
aws.sagemaker.jobs_failed.sample_count
(count)
The sample count of the number of labeling jobs that failed.
aws.sagemaker.jobs_succeeded
(count)
The sum of the number of labeling jobs that succeeded.
aws.sagemaker.jobs_succeeded.sample_count
(count)
The sample count of the number of labeling jobs that succeeded.
aws.sagemaker.jobs_stopped
(count)
The sum of the number of labeling jobs that were stopped.
aws.sagemaker.jobs_stopped.sample_count
(count)
The sample count of the number of labeling jobs that were stopped.
aws.sagemaker.total_dataset_objects_labeled
(count)
The maximum number of dataset objects labeled successfully in a labeling job.
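The endpoint metrics above are often combined into derived values. The following is a minimal sketch, assuming you have already retrieved one time window of aggregated values for a single endpoint. The `EndpointWindow` type, its field names, and the `instance_count` field are hypothetical and only mirror the metric names listed above. It shows three calculations: an average computed as sum divided by sample count (how CloudWatch defines its Average statistic), an error rate from the 4xx and 5xx sums relative to `aws.sagemaker.invocations`, and invocations per instance, assuming a single ProductionVariant with a fixed instance count. Latency values are reported in microseconds, so they are converted to milliseconds.

```python
from dataclasses import dataclass


@dataclass
class EndpointWindow:
    """Hypothetical aggregated values for one endpoint over one time window.

    Field names mirror the Datadog metric names; the values are illustrative
    inputs, not data fetched from Datadog or CloudWatch.
    """
    invocations: float                  # aws.sagemaker.invocations (sum)
    invocation_4xx_errors_sum: float    # aws.sagemaker.invocation_4xx_errors.sum
    invocation_5xx_errors_sum: float    # aws.sagemaker.invocation_5xx_errors.sum
    model_latency_sum_us: float         # aws.sagemaker.model_latency.sum (microseconds)
    model_latency_sample_count: float   # aws.sagemaker.model_latency.sample_count
    overhead_latency_sum_us: float      # aws.sagemaker.overhead_latency.sum (microseconds)
    overhead_latency_sample_count: float  # aws.sagemaker.overhead_latency.sample_count
    instance_count: int                 # hypothetical: instances in the ProductionVariant


def average(total: float, sample_count: float) -> float:
    """Average as Sum / SampleCount; returns 0.0 when there are no samples."""
    return total / sample_count if sample_count else 0.0


def error_rate(w: EndpointWindow) -> float:
    """Fraction of InvokeEndpoint requests that returned a 4xx or 5xx code."""
    errors = w.invocation_4xx_errors_sum + w.invocation_5xx_errors_sum
    return errors / w.invocations if w.invocations else 0.0


def avg_latency_ms(total_us: float, sample_count: float) -> float:
    """Average latency converted from microseconds to milliseconds."""
    return average(total_us, sample_count) / 1000.0


def invocations_per_instance(w: EndpointWindow) -> float:
    """Invocations normalized by instance count (single-variant assumption)."""
    return w.invocations / w.instance_count if w.instance_count else 0.0


if __name__ == "__main__":
    # Illustrative numbers only.
    w = EndpointWindow(
        invocations=2000,
        invocation_4xx_errors_sum=30,
        invocation_5xx_errors_sum=10,
        model_latency_sum_us=50_000_000,
        model_latency_sample_count=2000,
        overhead_latency_sum_us=4_000_000,
        overhead_latency_sample_count=2000,
        instance_count=4,
    )
    print(error_rate(w))  # 0.02
    print(avg_latency_ms(w.model_latency_sum_us, w.model_latency_sample_count))  # 25.0
    print(avg_latency_ms(w.overhead_latency_sum_us, w.overhead_latency_sample_count))  # 2.0
    print(invocations_per_instance(w))  # 500.0
```

Guarding each division against a zero denominator keeps idle windows (no requests, no samples) from raising errors; whether 0.0 is the right value to report for an idle window is a judgment call for your dashboards.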

Events

The Amazon SageMaker integration does not include any events.

Service Checks

The Amazon SageMaker integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.