Microsoft Azure Machine Learning

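Overview

Azure Machine Learning is a service that gives developers and data scientists a range of productive experiences for building, training, and deploying machine learning models faster. Use Datadog to monitor Azure Machine Learning performance and utilization in the context of the rest of your applications and infrastructure.

Collecting metrics from Azure Machine Learning lets you:

  • Track the number and status of runs and model deployments
  • Monitor the utilization of your machine learning nodes
  • Optimize performance versus cost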
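
As a rough illustration of what these goals can look like in practice, the sketch below derives two ratios from values of the node and run metrics this integration reports. The function names and the choice of ratios are assumptions made for illustration; they are not part of the integration, and the inputs are plain numbers you would have already read from the corresponding metrics.

```python
def node_utilization(active_nodes: float, total_nodes: float) -> float:
    """Fraction of nodes that are actively running a job (0.0 to 1.0).

    Inputs are assumed to be values of the active_nodes and total_nodes
    metrics sampled at the same moment.
    """
    if total_nodes <= 0:
        # No nodes provisioned: treat utilization as zero rather than divide by zero.
        return 0.0
    return active_nodes / total_nodes


def run_failure_rate(failed_runs: float, completed_runs: float) -> float:
    """Share of finished runs that failed.

    Hypothetical helper: "finished" here means failed plus successfully
    completed runs over the same time window.
    """
    finished = failed_runs + completed_runs
    if finished <= 0:
        return 0.0
    return failed_runs / finished


if __name__ == "__main__":
    print(node_utilization(active_nodes=3, total_nodes=12))
    print(run_failure_rate(failed_runs=1, completed_runs=3))
```

A persistently low node utilization alongside a high idle node count may suggest the compute cluster is over-provisioned, which is one way to approach the performance-versus-cost question.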

Setup

Installation

If you haven't already, set up the Microsoft Azure integration first. There are no other installation steps.

Data Collected

Metrics

azure.machinelearningservices_workspaces.completed_runs
(gauge)
The number of runs completed successfully for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.started_runs
(gauge)
The number of runs started for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.failed_runs
(gauge)
The number of runs that failed for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.model_register_succeeded
(gauge)
The number of model registrations that succeeded in this workspace.
azure.machinelearningservices_workspaces.model_register_failed
(gauge)
The number of model registrations that failed in this workspace.
azure.machinelearningservices_workspaces.model_deploy_started
(gauge)
The number of model deployments started in this workspace.
azure.machinelearningservices_workspaces.model_deploy_succeeded
(gauge)
The number of model deployments that succeeded in this workspace.
azure.machinelearningservices_workspaces.model_deploy_failed
(gauge)
The number of model deployments that failed in this workspace.
azure.machinelearningservices_workspaces.total_nodes
(gauge)
The total number of nodes. This total includes active, idle, unusable, preempted, and leaving nodes.
Shown as node
azure.machinelearningservices_workspaces.active_nodes
(gauge)
The number of active nodes. These are the nodes that are actively running a job.
Shown as node
azure.machinelearningservices_workspaces.idle_nodes
(gauge)
The number of idle nodes. Idle nodes are not running any jobs but can accept a new job if one is available.
Shown as node
azure.machinelearningservices_workspaces.unusable_nodes
(gauge)
The number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes.
Shown as node
azure.machinelearningservices_workspaces.preempted_nodes
(gauge)
The number of preempted nodes. These are low-priority nodes that have been taken away from the available node pool.
Shown as node
azure.machinelearningservices_workspaces.leaving_nodes
(gauge)
The number of leaving nodes. Leaving nodes have just finished processing a job and will go to the idle state.
Shown as node
azure.machinelearningservices_workspaces.total_cores
(gauge)
The number of total cores.
Shown as core
azure.machinelearningservices_workspaces.active_cores
(gauge)
The number of active cores.
Shown as core
azure.machinelearningservices_workspaces.idle_cores
(gauge)
The number of idle cores.
Shown as core
azure.machinelearningservices_workspaces.unusable_cores
(gauge)
The number of unusable cores.
Shown as core
azure.machinelearningservices_workspaces.preempted_cores
(gauge)
The number of preempted cores.
Shown as core
azure.machinelearningservices_workspaces.leaving_cores
(gauge)
The number of leaving cores.
Shown as core
azure.machinelearningservices_workspaces.quota_utilization_percentage
(gauge)
The percent of quota utilized.
Shown as percent
azure.machinelearningservices_workspaces.cpuutilization
(gauge)
CPU utilization
Shown as percent
azure.machinelearningservices_workspaces.gpuutilization
(gauge)
GPU utilization
Shown as percent
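
The metric names above share a common prefix, so query strings for dashboards or monitors can be generated rather than typed by hand. The following is a minimal sketch, assuming the standard Datadog metric query form `aggregator:metric.name{scope}`; the helper name `metric_query` and the example tag `resource_group:my-rg` are hypothetical and only illustrate the shape of a query.

```python
# Common prefix of every metric listed for this integration.
PREFIX = "azure.machinelearningservices_workspaces"


def metric_query(metric: str, aggregator: str = "avg", scope: str = "*") -> str:
    """Build a query string of the form 'aggregator:prefix.metric{scope}'.

    `scope` is a tag filter; "*" means no filtering. The tag used in the
    example below is a placeholder, not a tag guaranteed to exist.
    """
    return f"{aggregator}:{PREFIX}.{metric}{{{scope}}}"


if __name__ == "__main__":
    # Node-state metrics that together describe cluster capacity.
    for name in ("active_nodes", "idle_nodes", "unusable_nodes"):
        print(metric_query(name))
    # Scoped to a hypothetical resource group tag.
    print(metric_query("failed_runs", aggregator="sum", scope="resource_group:my-rg"))
```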

Events

The Azure Machine Learning integration does not include any events.

Service Checks

The Azure Machine Learning integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.

Further Reading