Microsoft Azure Machine Learning

Overview

The Azure Machine Learning service gives developers and data scientists a broad range of productive experiences for building, training, and deploying machine learning models faster. Use Datadog to monitor the performance and usage of Azure Machine Learning alongside the rest of your applications and infrastructure.

Collect Azure Machine Learning metrics to:

  • Track the number and status of your runs and model deployments
  • Monitor the utilization of your machine learning nodes
  • Optimize performance against cost

Setup

Installation

If you haven't already, set up the Microsoft Azure integration first. There are no other installation steps.

Data Collected

Metrics

azure.machinelearningservices_workspaces.completed_runs
(gauge)
The number of runs completed successfully for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.started_runs
(gauge)
The number of runs started for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.failed_runs
(gauge)
The number of runs that failed for this workspace.
Shown as operation
azure.machinelearningservices_workspaces.model_register_succeeded
(gauge)
The number of model registrations that succeeded in this workspace.
azure.machinelearningservices_workspaces.model_register_failed
(gauge)
The number of model registrations that failed in this workspace.
azure.machinelearningservices_workspaces.model_deploy_started
(gauge)
The number of model deployments started in this workspace.
azure.machinelearningservices_workspaces.model_deploy_succeeded
(gauge)
The number of model deployments that succeeded in this workspace.
azure.machinelearningservices_workspaces.model_deploy_failed
(gauge)
The number of model deployments that failed in this workspace.
azure.machinelearningservices_workspaces.total_nodes
(gauge)
The total number of nodes. This total includes active, idle, unusable, preempted, and leaving nodes.
Shown as node
azure.machinelearningservices_workspaces.active_nodes
(gauge)
The number of active nodes. Active nodes are actively running a job.
Shown as node
azure.machinelearningservices_workspaces.idle_nodes
(gauge)
The number of idle nodes. Idle nodes are not running any job but can accept a new one if it becomes available.
Shown as node
azure.machinelearningservices_workspaces.unusable_nodes
(gauge)
The number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes.
Shown as node
azure.machinelearningservices_workspaces.preempted_nodes
(gauge)
The number of preempted nodes. These are low-priority nodes that have been taken away from the available node pool.
Shown as node
azure.machinelearningservices_workspaces.leaving_nodes
(gauge)
The number of leaving nodes. Leaving nodes have just finished processing a job and will go to the idle state.
Shown as node
azure.machinelearningservices_workspaces.total_cores
(gauge)
The number of total cores.
Shown as core
azure.machinelearningservices_workspaces.active_cores
(gauge)
The number of active cores.
Shown as core
azure.machinelearningservices_workspaces.idle_cores
(gauge)
The number of idle cores.
Shown as core
azure.machinelearningservices_workspaces.unusable_cores
(gauge)
The number of unusable cores.
Shown as core
azure.machinelearningservices_workspaces.preempted_cores
(gauge)
The number of preempted cores.
Shown as core
azure.machinelearningservices_workspaces.leaving_cores
(gauge)
The number of leaving cores.
Shown as core
azure.machinelearningservices_workspaces.quota_utilization_percentage
(gauge)
The percent of quota utilized.
Shown as percent
azure.machinelearningservices_workspaces.cpuutilization
(gauge)
CPU utilization
Shown as percent
azure.machinelearningservices_workspaces.gpuutilization
(gauge)
GPU utilization
Shown as percent
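
The node gauges above can be combined to see how a workspace's nodes are distributed across states. The following is a minimal sketch, not part of the integration: it assumes you have already fetched the latest value of each gauge into a plain dictionary keyed by metric name. The function name node_state_shares and the sample values are hypothetical.

```python
# Metric name prefix taken from the list above.
PREFIX = "azure.machinelearningservices_workspaces."

# Node states that have a dedicated "<state>_nodes" gauge in the list above.
NODE_STATES = ("active", "idle", "unusable", "preempted", "leaving")


def node_state_shares(latest):
    """Return each node state's share of total_nodes.

    `latest` is a hypothetical dict mapping metric names to their most
    recent gauge value. Returns None when total_nodes is missing or zero,
    since no share can be computed in that case.
    """
    total = latest.get(PREFIX + "total_nodes", 0)
    if not total:
        return None
    return {
        state: latest.get(PREFIX + state + "_nodes", 0) / total
        for state in NODE_STATES
    }


# Hypothetical sample values, for illustration only.
sample = {
    PREFIX + "total_nodes": 10,
    PREFIX + "active_nodes": 6,
    PREFIX + "idle_nodes": 3,
    PREFIX + "unusable_nodes": 1,
}

shares = node_state_shares(sample)
print(shares["active"])  # 0.6
```

A persistently high idle share may suggest that a cluster is over-provisioned, which ties back to the cost optimization goal in the overview. Missing gauges are treated as zero here, which is a simplifying assumption.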

Events

The Azure Machine Learning integration does not include any events.

Service Checks

The Azure Machine Learning integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.