Amazon SageMaker
Security Monitoring available

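Overview

Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can build and train machine learning models, and then deploy them directly into a production-ready hosted environment.

Enable this integration to see all of your SageMaker metrics in Datadog.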

Setup

Installation

If you haven't already, set up the Amazon Web Services integration first.

Metric collection

  1. In the AWS integration tile, enable SageMaker under Metric Collection.
  2. Install the Datadog - Amazon SageMaker integration.

Log collection

Enable logging

Configure Amazon SageMaker to send logs either to an S3 bucket or to CloudWatch.

Note: If you send logs to an S3 bucket, make sure that the Target prefix is set to amazon_sagemaker.

Send logs to Datadog

  1. If you haven't already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger in the AWS console on the S3 bucket or CloudWatch log group that contains your Amazon SageMaker logs.
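For the CloudWatch case, adding the trigger amounts to two AWS API calls: granting CloudWatch Logs permission to invoke the forwarder Lambda, then creating a subscription filter on the log group. The sketch below builds both requests and applies them through clients you pass in (for example `boto3.client("logs")` and `boto3.client("lambda")`). It is a minimal illustration under stated assumptions: `build_trigger_requests` and `add_cloudwatch_trigger` are hypothetical helpers, not Datadog tooling, and the forwarder ARN, region, account ID, filter name, and log group name are placeholders you supply.

```python
import re
from typing import Any, Dict, Tuple


def build_trigger_requests(
    log_group: str,
    forwarder_arn: str,
    region: str,
    account_id: str,
    filter_name: str = "datadog-forwarder",  # hypothetical filter name
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (lambda add_permission kwargs, logs put_subscription_filter kwargs).

    All ARNs and names are caller-supplied placeholders (assumptions).
    """
    # ARN of the log group that CloudWatch Logs will invoke the forwarder on behalf of.
    source_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"
    # StatementId allows only letters, digits, '-' and '_', so replace '/' and other characters.
    statement_id = ("datadog-" + re.sub(r"[^A-Za-z0-9_-]", "-", log_group))[:100]
    permission = {
        "FunctionName": forwarder_arn,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": "logs.amazonaws.com",
        "SourceArn": source_arn,
    }
    subscription = {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": "",  # an empty pattern forwards every log event
        "destinationArn": forwarder_arn,
    }
    return permission, subscription


def add_cloudwatch_trigger(logs_client, lambda_client, log_group, forwarder_arn, region, account_id):
    """Grant invoke permission, then subscribe the log group to the forwarder."""
    permission, subscription = build_trigger_requests(log_group, forwarder_arn, region, account_id)
    lambda_client.add_permission(**permission)
    logs_client.put_subscription_filter(**subscription)
    return permission, subscription
```

The permission is granted first because CloudWatch Logs rejects a subscription filter whose Lambda destination it cannot invoke. SageMaker endpoint logs typically live in log groups named like `/aws/sagemaker/Endpoints/<endpoint-name>`; adding an S3 trigger instead would use the bucket notification APIs, which this sketch does not cover.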

Data Collected

Metrics

aws.sagemaker.invocation_4xx_errors
(count)
The average number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
aws.sagemaker.invocation_4xx_errors.sum
(count)
The sum of the number of InvokeEndpoint requests where the model returned a 4xx HTTP response code.
aws.sagemaker.invocation_5xx_errors
(count)
The average number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
aws.sagemaker.invocation_5xx_errors.sum
(count)
The sum of the number of InvokeEndpoint requests where the model returned a 5xx HTTP response code.
aws.sagemaker.invocations
(count)
The sum of the number of InvokeEndpoint requests sent to a model endpoint.
aws.sagemaker.invocations.sample_count
(count)
The sample count of the number of InvokeEndpoint requests sent to a model endpoint.
aws.sagemaker.invocations_per_instance
(count)
The number of invocations sent to a model normalized by InstanceCount in each ProductionVariant.
aws.sagemaker.model_latency
(count)
The average interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.sum
(count)
The sum of the interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.mininmum
(count)
The minimum interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.maximum
(count)
The maximum interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.model_latency.sample_count
(count)
The sample count of the interval of time taken by a model to respond as viewed from Amazon SageMaker.
Shown as microsecond
aws.sagemaker.overhead_latency
(count)
The average interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.sum
(count)
The sum of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.minimum
(count)
The minimum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.maximum
(count)
The maximum interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.overhead_latency.sample_count
(count)
The sample count of the interval of time added to the time taken to respond to a client request by Amazon SageMaker overheads.
Shown as microsecond
aws.sagemaker.cpu_utilization
(count)
The percentage of CPU units that are used by the containers on an instance.
Shown as percent
aws.sagemaker.memory_utilization
(count)
The percentage of memory that is used by the containers on an instance.
Shown as percent
aws.sagemaker.gpu_utilization
(count)
The percentage of GPU units that are used by the containers on an instance.
Shown as percent
aws.sagemaker.gpu_memory_utilization
(count)
The percentage of GPU memory used by the containers on an instance.
Shown as percent
aws.sagemaker.disk_utilization
(count)
The percentage of disk space used by the containers on an instance.
Shown as percent
aws.sagemaker.dataset_objects_auto_annotated
(count)
The number of dataset objects auto-annotated in a labeling job.
aws.sagemaker.dataset_objects_human_annotated
(count)
The number of dataset objects annotated by a human in a labeling job.
aws.sagemaker.dataset_objects_labeling_failed
(count)
The number of dataset objects that failed labeling in a labeling job.
aws.sagemaker.jobs_failed
(count)
The sum of the number of labeling jobs that failed.
aws.sagemaker.jobs_failed.sample_count
(count)
The sample count of the number of labeling jobs that failed.
aws.sagemaker.jobs_succeeded
(count)
The sum of the number of labeling jobs that succeeded.
aws.sagemaker.jobs_succeeded.sample_count
(count)
The sample count of the number of labeling jobs that succeeded.
aws.sagemaker.jobs_stopped
(count)
The sum of the number of labeling jobs that were stopped.
aws.sagemaker.jobs_stopped.sample_count
(count)
The sample count of the number of labeling jobs that were stopped.
aws.sagemaker.total_dataset_objects_labeled
(count)
The maximum number of dataset objects labeled successfully in a labeling job.
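Several metrics above come in statistic variants (average, `.sum`, `.sample_count`, minimum, maximum) that mirror CloudWatch statistics. Within a single CloudWatch period, Average equals Sum divided by SampleCount, and `invocations_per_instance` divides invocations by the instance count of a production variant. The sketch below expresses those relationships with made-up numbers; `error_ratio` is a hypothetical derived value for illustration, not a metric this integration reports.

```python
def average(total: float, sample_count: float) -> float:
    """CloudWatch-style Average for one period: Sum / SampleCount (0.0 when there are no samples)."""
    if sample_count == 0:
        return 0.0
    return total / sample_count


def invocations_per_instance(invocations_sum: float, instance_count: int) -> float:
    """Invocations normalized by the number of instances in a production variant."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return invocations_sum / instance_count


def error_ratio(errors_sum: float, invocations_sum: float) -> float:
    """Hypothetical derived ratio: share of InvokeEndpoint requests that returned an error."""
    if invocations_sum == 0:
        return 0.0
    return errors_sum / invocations_sum
```

For example, a `model_latency.sum` of 1,200,000 microseconds with a `sample_count` of 400 gives `average(1_200_000, 400)`, or 3,000 microseconds (3 ms). Datadog rollups over longer time windows may aggregate differently, so treat this as a per-period relationship.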

Events

The Amazon SageMaker integration does not include any events.

Service Checks

The Amazon SageMaker integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.