Amazon SageMaker

Overview

Amazon SageMaker is a fully managed service that lets developers and data scientists build and train machine learning models, then deploy them directly into a production-ready hosted environment.

Enable this integration to see all of your SageMaker metrics in Datadog.

Setup

Installation

If you haven't already, set up the Amazon Web Services integration first.

Metric collection

  1. In the AWS integration tile, make sure that SageMaker is checked in the metric collection section.
  2. Install the Datadog/Amazon SageMaker integration.
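
Once metrics are flowing, one way to confirm it is to query a SageMaker metric through the Datadog metrics query API. The sketch below only builds the request and uses the Python standard library. It assumes the US1 site (`api.datadoghq.com`), so adjust the base URL for other Datadog sites. The helper names `build_query_url` and `build_request` are hypothetical and are not part of any Datadog client.

```python
import time
import urllib.parse
import urllib.request

# Assumption: US1 Datadog site. Other sites use a different host name.
DATADOG_API_BASE = "https://api.datadoghq.com"


def build_query_url(metric, start, end, scope="*", aggregator="sum"):
    """Build a v1 timeseries query URL for one metric over [start, end] (epoch seconds)."""
    # Datadog query syntax: <aggregator>:<metric>{<scope>}
    query = f"{aggregator}:{metric}{{{scope}}}"
    params = urllib.parse.urlencode(
        {"from": int(start), "to": int(end), "query": query}
    )
    return f"{DATADOG_API_BASE}/api/v1/query?{params}"


def build_request(url, api_key, app_key):
    """Attach the API key and application key headers expected by the query endpoint."""
    return urllib.request.Request(
        url,
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
    )


if __name__ == "__main__":
    now = time.time()
    # Last hour of endpoint invocations across all tags.
    url = build_query_url("aws.sagemaker.invocations", now - 3600, now)
    print(url)
    # To actually send it, supply real keys:
    # request = build_request(url, "<API_KEY>", "<APP_KEY>")
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
```

An empty series in the response usually means that no endpoint received traffic in the window or that metric collection is not yet enabled for SageMaker.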

Log collection

Enable logging

Configure Amazon SageMaker to send its logs either to an S3 bucket or to CloudWatch.

Note: If you log to an S3 bucket, make sure that amazon_sagemaker is set as the Target prefix.

Send logs to Datadog

  1. If you haven't already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger in the AWS console on the S3 bucket or CloudWatch log group that contains your Amazon SageMaker logs.
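
The steps above use the AWS console. For reference, the S3 variant of the trigger can also be described in code. The sketch below builds the notification configuration that `put_bucket_notification_configuration` accepts, filtered on the `amazon_sagemaker` prefix mentioned earlier. The helper names, the bucket name, and the Lambda ARN are hypothetical placeholders, and `boto3` is a third-party dependency that is only imported when the configuration is applied.

```python
def build_s3_trigger(lambda_arn, prefix="amazon_sagemaker"):
    """Return an S3 notification configuration that invokes the Lambda
    function for every object created under the given key prefix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "prefix", "Value": prefix}]
                    }
                },
            }
        ]
    }


def apply_s3_trigger(bucket, lambda_arn):
    """Apply the trigger to a bucket. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder stays dependency-free

    s3 = boto3.client("s3")
    # Caution: this call replaces the bucket's existing notification configuration.
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_s3_trigger(lambda_arn),
    )


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:datadog-forwarder"
    print(build_s3_trigger(arn))
```

The Lambda function also needs a resource policy that allows S3 to invoke it. The console adds this permission automatically when you create the trigger there, which is one reason the manual route is simpler.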

Données collectées

Métriques

aws.sagemaker.invocation_4xx_errors
(count)
The average number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
aws.sagemaker.invocation_4xx_errors.sum
(count)
The sum of the number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
aws.sagemaker.invocation_5xx_errors
(count)
The average number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
aws.sagemaker.invocation_5xx_errors.sum
(count)
The sum of the number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
aws.sagemaker.invocations
(count)
The sum of the number of InvokeEndpoint requests sent to a model endpoint.
aws.sagemaker.invocations.sample_count
(count)
The sample count of the number of InvokeEndpoint requests sent to a model endpoint.
aws.sagemaker.invocations_per_instance
(count)
The number of invocations sent to a model normalized by InstanceCount in each ProductionVariant.
aws.sagemaker.model_latency
(count)
The average interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.sum
(count)
The sum of the interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.minimum
(count)
The minimum interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.maximum
(count)
The maximum interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.sample_count
(count)
The sample count of the interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.overhead_latency
(count)
The average interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.sum
(count)
The sum of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.minimum
(count)
The minimum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.maximum
(count)
The maximum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.sample_count
(count)
The sample count of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.cpu_utilization
(count)
The percentage of CPU units that are used by the containers on an instance.
Shown as percent
aws.sagemaker.memory_utilization
(count)
The percentage of memory that is used by the containers on an instance.
Shown as percent
aws.sagemaker.gpu_utilization
(count)
The percentage of GPU units that are used by the containers on an instance.
Shown as percent
aws.sagemaker.gpu_memory_utilization
(count)
The percentage of GPU memory used by the containers on an instance.
Shown as percent
aws.sagemaker.disk_utilization
(count)
The percentage of disk space used by the containers on an instance.
Shown as percent
aws.sagemaker.dataset_objects_auto_annotated
(count)
The number of dataset objects auto-annotated in a labeling job.
aws.sagemaker.dataset_objects_human_annotated
(count)
The number of dataset objects annotated by a human in a labeling job.
aws.sagemaker.dataset_objects_labeling_failed
(count)
The number of dataset objects that failed labeling in a labeling job.
aws.sagemaker.jobs_failed
(count)
The sum of the number of labeling jobs that failed.
aws.sagemaker.jobs_failed.sample_count
(count)
The sample count of the number of labeling jobs that failed.
aws.sagemaker.jobs_succeeded
(count)
The sum of the number of labeling jobs that succeeded.
aws.sagemaker.jobs_succeeded.sample_count
(count)
The sample count of the number of labeling jobs that succeeded.
aws.sagemaker.jobs_stopped
(count)
The sum of the number of labeling jobs that were stopped.
aws.sagemaker.jobs_stopped.sample_count
(count)
The sample count of the number of labeling jobs that were stopped.
aws.sagemaker.total_dataset_objects_labeled
(count)
The maximum number of dataset objects labeled successfully in a labeling job.
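
Several of the metrics above are most useful in combination. As a minimal sketch, the functions below derive an endpoint error rate from the invocation and error counts, and a mean model latency in milliseconds from the `.sum` and `.sample_count` latency metrics (reported in microseconds). The function names and the sample numbers are illustrative assumptions, not values from this integration.

```python
def error_rate(invocations, errors_4xx, errors_5xx):
    """Fraction of InvokeEndpoint requests that returned a 4xx or 5xx code.

    Inputs correspond to aws.sagemaker.invocations,
    aws.sagemaker.invocation_4xx_errors.sum and
    aws.sagemaker.invocation_5xx_errors.sum over the same time window.
    """
    if invocations <= 0:
        return 0.0  # no traffic in the window, so no errors to report
    return (errors_4xx + errors_5xx) / invocations


def mean_latency_ms(latency_sum_us, sample_count):
    """Mean model latency in milliseconds.

    Inputs correspond to aws.sagemaker.model_latency.sum (microseconds)
    and aws.sagemaker.model_latency.sample_count.
    """
    if sample_count <= 0:
        return None  # undefined without samples
    return latency_sum_us / sample_count / 1000.0


if __name__ == "__main__":
    # Illustrative numbers only.
    print(error_rate(invocations=200, errors_4xx=3, errors_5xx=1))
    print(mean_latency_ms(latency_sum_us=1_500_000, sample_count=3))
```

Dividing the sum by the sample count over a single window avoids averaging per-interval averages, which would weight quiet intervals as heavily as busy ones.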

Events

The Amazon SageMaker integration does not include any events.

Service Checks

The Amazon SageMaker integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.