Microsoft Azure Machine Learning

Overview

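The Azure Machine Learning service gives developers and data scientists a wide range of productive features for building, training, and deploying machine learning models faster. Use Datadog to monitor Azure Machine Learning performance and utilization alongside the rest of your applications and infrastructure.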

Get metrics from Azure Machine Learning to:

  • Track the number and status of runs and model deployments.
  • Monitor the utilization of your machine learning nodes.
  • Optimize performance against cost.
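
As a rough illustration of the tracking and utilization goals above, the following Python sketch derives two ratios from a single sample of workspace metric values. The function names, the dictionary layout, and both ratios are hypothetical and chosen for illustration only; they are not part of the integration. The keys are the suffixes of the metric names listed under Metrics in this document.

```python
from typing import Mapping, Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def summarize_workspace(values: Mapping[str, float]) -> dict:
    """Derive two illustrative ratios from one sample of workspace metrics.

    `values` is keyed by metric suffix, i.e. the part after
    `azure.machinelearningservices_workspaces.` (a hypothetical layout).
    Missing keys are treated as 0.
    """
    completed = values.get("completed_runs", 0)
    failed = values.get("failed_runs", 0)
    idle = values.get("idle_nodes", 0)
    total = values.get("total_nodes", 0)
    return {
        # Share of finished runs that failed (assumes completed_runs
        # counts only successful runs, as its description states).
        "run_failure_rate": safe_ratio(failed, completed + failed),
        # Share of nodes that are idle and could accept a job.
        "idle_node_fraction": safe_ratio(idle, total),
    }
```

A consistently high idle node fraction may suggest that a cluster is larger than its workload needs, which is one possible starting point for the cost discussion above.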

Setup

Installation

If you have not already done so, set up the Microsoft Azure integration first. No other installation steps are required.

Data Collected

Metrics

azure.machinelearningservices_workspaces.completed_runs
(gauge)
The number of runs completed successfully for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.started_runs
(gauge)
The number of runs started for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.failed_runs
(gauge)
The number of runs that failed for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.model_register_succeeded
(gauge)
The number of model registrations that succeeded in this workspace.
azure.machinelearningservices_workspaces.model_register_failed
(gauge)
The number of model registrations that failed in this workspace.
azure.machinelearningservices_workspaces.model_deploy_started
(gauge)
The number of model deployments started in this workspace.
azure.machinelearningservices_workspaces.model_deploy_succeeded
(gauge)
The number of model deployments that succeeded in this workspace.
azure.machinelearningservices_workspaces.model_deploy_failed
(gauge)
The number of model deployments that failed in this workspace.
azure.machinelearningservices_workspaces.total_nodes
(gauge)
The total number of nodes. This total includes Active, Idle, Unusable, Preempted, and Leaving nodes.
Shown as node
azure.machinelearningservices_workspaces.active_nodes
(gauge)
The number of active nodes. Active nodes are actively running a job.
Shown as node
azure.machinelearningservices_workspaces.idle_nodes
(gauge)
The number of idle nodes. Idle nodes are not running any jobs but can accept a new job if one is available.
Shown as node
azure.machinelearningservices_workspaces.unusable_nodes
(gauge)
The number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes.
Shown as node
azure.machinelearningservices_workspaces.preempted_nodes
(gauge)
The number of preempted nodes. Preempted nodes are low-priority nodes that have been taken away from the available node pool.
Shown as node
azure.machinelearningservices_workspaces.leaving_nodes
(gauge)
The number of leaving nodes. Leaving nodes have just finished processing a job and will return to the idle state.
Shown as node
azure.machinelearningservices_workspaces.total_cores
(gauge)
The total number of cores.
Shown as core
azure.machinelearningservices_workspaces.active_cores
(gauge)
The number of active cores.
Shown as core
azure.machinelearningservices_workspaces.idle_cores
(gauge)
The number of idle cores.
Shown as core
azure.machinelearningservices_workspaces.unusable_cores
(gauge)
The number of unusable cores.
Shown as core
azure.machinelearningservices_workspaces.preempted_cores
(gauge)
The number of preempted cores.
Shown as core
azure.machinelearningservices_workspaces.leaving_cores
(gauge)
The number of leaving cores.
Shown as core
azure.machinelearningservices_workspaces.quota_utilization_percentage
(gauge)
The percent of quota utilized.
Shown as percent
azure.machinelearningservices_workspaces.cpuutilization
(gauge)
CPU utilization
Shown as percent
azure.machinelearningservices_workspaces.gpuutilization
(gauge)
GPU utilization
Shown as percent
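
Because every metric above shares the same prefix, query strings for dashboards or monitors can be assembled programmatically. The sketch below builds strings in the general `aggregator:metric{scope}` form used for Datadog metric queries. The helper names, the default `avg` aggregator, and the `*` scope are assumptions made for illustration; adjust them to your own tags and needs.

```python
# Shared prefix of the metrics listed in this section.
METRIC_PREFIX = "azure.machinelearningservices_workspaces"

# Node states that have a corresponding `<state>_nodes` metric above.
NODE_STATES = ("active", "idle", "unusable", "preempted", "leaving")


def metric_name(suffix: str) -> str:
    """Return the full metric name for a suffix such as 'idle_nodes'."""
    return f"{METRIC_PREFIX}.{suffix}"


def build_query(suffix: str, aggregator: str = "avg", scope: str = "*") -> str:
    """Build a query string of the form 'aggregator:metric{scope}'.

    The default aggregator and scope are illustrative assumptions.
    """
    return f"{aggregator}:{metric_name(suffix)}{{{scope}}}"


def node_state_queries() -> dict:
    """Return one query string per node state, keyed by state name."""
    return {state: build_query(f"{state}_nodes") for state in NODE_STATES}
```

For example, `build_query("idle_nodes")` yields `avg:azure.machinelearningservices_workspaces.idle_nodes{*}`.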

Events

The Azure Machine Learning integration does not include any events.

Service Checks

The Azure Machine Learning integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.

Further Reading