Azure IoT Edge

Supported OS

Integration version4.2.1

Información general

Azure IoT Edge es un servicio totalmente gestionado servicio que permite implementar cargas de trabajo en la nube para que se ejecuten en dispositivos Edge de Internet de las Cosas (IoT) utilizando contenedores estándar.

Utiliza la integración de Datadog y Azure IoT Edge para recopilar métricas y el estado de los dispositivos IoT Edge.

Nota: Este integración requiere un tiempo de ejecución de IoT Edge versión 1.0.10 o superior.

Configuración

Sigue las instrucciones siguientes para instalar y configurar este check para un dispositivo IoT Edge que se ejecuta en un host de dispositivo.

Instalación

El check de Azure IoT Edge se incluye en el paquete del Datadog Agent.

No es necesaria ninguna instalación adicional en tu dispositivo.

Configuración

Configura el dispositivo IoT Edge de modo que el Agent se ejecute como un módulo personalizado. Sigue la documentación de Microsoft sobre implementación de módulos de Azure IoT Edge para obtener información sobre cómo instalar y trabajar con módulos personalizados para Azure IoT Edge.

Sigue los pasos que se indican a continuación para configurar el dispositivo de IoT Edge, los módulos del tiempo de ejecución y el Datadog Agent para empezar a recopilar métricas de IoT Edge.

  1. Configura el módulo del tiempo de ejecución de Edge Agent de la siguiente manera:

    • La versión de la imagen debe ser 1.0.10 o superior.

    • En “Crear opciones”, añade las siguientes Labels. Edita la etiqueta com.datadoghq.ad.instances según proceda. Consulta el ejemplo de azure_iot_edge.d/conf.yaml para ver todas las opciones disponibles de configuración. Consulta la documentación sobre Autodiscovery de integraciones de Docker para obtener más información sobre la configuración de integración basada en etiquetas.

      "Labels": {
          "com.datadoghq.ad.check_names": "[\"azure_iot_edge\"]",
          "com.datadoghq.ad.init_configs": "[{}]",
          "com.datadoghq.ad.instances": "[{\"edge_hub_prometheus_url\": \"http://edgeHub:9600/metrics\", \"edge_agent_prometheus_url\": \"http://edgeAgent:9600/metrics\"}]"
      }
      
  2. Configura el módulo del tiempo de ejecución de Edge Hub de la siguiente manera:

    • La versión de la imagen debe ser 1.0.10 o superior.
  3. Instala y configura el Datadog Agent como módulo personalizado:

    • Configura el nombre del módulo. Por ejemplo: datadog-agent.

    • Configura la URL de la imagen del Agent. Por ejemplo: datadog/agent:7.

    • En “Variables de entorno”, configura tu DD_API_KEY. También puedes configurar aquí una configuración adicional del Agent (consulta Variables de entorno del Agent).

    • En “Opciones para crear el contenedor”, introduce la siguiente configuración en función del sistema operativo de tu dispositivo. Nota: NetworkId debe corresponder al nombre de la red configurado en el archivo config.yaml del dispositivo.

      • Linux:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=azure-iot-edge"],
                "Binds": ["/var/run/docker.sock:/var/run/docker.sock"]
            }
        }
        
      • Windows:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=nat"],
                "Binds": ["//./pipe/iotedge_moby_engine:/./pipe/docker_engine"]
            }
        }
        
    • Guarda el módulo personalizado del Datadog Agent.

  4. Guarda e implementa los cambios en la configuración de tu dispositivo.

Recopilación de logs

  1. La recopilación de logs está desactivada de forma predeterminada en el Datadog Agent, actívala configurando tu módulo personalizado del Datadog Agent:

    • En “Variables de entorno”, configura la variable de entorno DD_LOGS_ENABLED:

      DD_LOGS_ENABLED: true
      
  2. Configura los módulos de Edge Agent y Edge Hub: en “Crear opciones”, añade la siguiente etiqueta:

    "Labels": {
        "com.datadoghq.ad.logs": "[{\"source\": \"azure.iot_edge\", \"service\": \"<SERVICE>\"}]",
        "...": "..."
    }
    

    Cambia el service en función de tu entorno.

    Repite esta operación para cualquier módulo personalizado para el que desees recopilar logs.

  3. Guarda e implementa los cambios en la configuración de tu dispositivo.

Validación

Una vez que Agent se haya implementado en el dispositivo, ejecuta el subcomando de estado del Agent y busca azure_iot_edge en la sección Checks.

Datos recopilados

Métricas

azure.iot_edge.edge_agent.available_disk_space_bytes
(gauge)
Amount of space left on the disk `disk_name.
Shown as byte
azure.iot_edge.edge_agent.command_latency_seconds.count
(gauge)
Count of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
azure.iot_edge.edge_agent.command_latency_seconds.quantile
(gauge)
Quantile of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.command_latency_seconds.sum
(gauge)
Sum of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.created_pids_total
(gauge)
Total number of processes the module module_name has created.
azure.iot_edge.edge_agent.deployment_time_seconds.count
(gauge)
Count of amount of time it took to complete a new deployment after receiving a change.
azure.iot_edge.edge_agent.deployment_time_seconds.quantile
(gauge)
Quantile of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.deployment_time_seconds.sum
(gauge)
Sum of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.direct_method_invocations_count
(count)
Total number of times a built-in Edge Agent direct method is called, such as Ping or Restart.
azure.iot_edge.edge_agent.host_uptime_seconds
(gauge)
How long the host has been running.
Shown as second
azure.iot_edge.edge_agent.iotedged_uptime_seconds
(gauge)
How long iotedged has been running.
Shown as second
azure.iot_edge.edge_agent.iothub_syncs_total
(count)
Total number of times the Edge Agent attempted to sync its twin with IoT Hub, both successful and unsuccessful. Includes both Edge Agent requesting a twin, and IoT Hub notifying of a twin update.
azure.iot_edge.edge_agent.module_start_total
(count)
Number of times the Edge Agent asked Docker to start the module module_name.
azure.iot_edge.edge_agent.module_stop_total
(count)
Number of times the Edge Agent asked Docker to stop the module module_name.
azure.iot_edge.edge_agent.total_disk_read_bytes
(count)
Total amount of bytes read from the disk by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_space_bytes
(gauge)
Size of the disk `disk_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_write_bytes
(count)
Total amount of bytes written to the disk by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_memory_bytes
(gauge)
Total amount of RAM available to module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_network_in_bytes
(count)
Total amount of bytes received from the network by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_network_out_bytes
(count)
Total amount of bytes sent to the network by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_time_expected_running_seconds
(gauge)
The amount of time the module module_name was specified in the deployment.
azure.iot_edge.edge_agent.total_time_running_correctly_seconds
(gauge)
The amount of time the module module_name was specified in the deployment and was in the running state.
azure.iot_edge.edge_agent.unsuccessful_iothub_syncs_total
(count)
Total number of times the Edge Agent failed to sync its twin with IoT Hub.
azure.iot_edge.edge_agent.used_cpu_percent.count
(gauge)
Count of percent of CPU used by all processes in module module_name.
azure.iot_edge.edge_agent.used_cpu_percent.quantile
(gauge)
Quantile of percent of CPU used by all processes in module module_name.
Shown as percent
azure.iot_edge.edge_agent.used_cpu_percent.sum
(gauge)
Sum of percent of CPU used by all processes in module module_name.
Shown as percent
azure.iot_edge.edge_agent.used_memory_bytes
(gauge)
Amount of RAM used by all processes in module module_name.
Shown as byte
azure.iot_edge.edge_hub.client_connect_failed_total
(count)
Total number of times clients failed to connect to Edge Hub.
azure.iot_edge.edge_hub.direct_method_duration_seconds.count
(gauge)
Count of time taken to resolve a direct message.
azure.iot_edge.edge_hub.direct_method_duration_seconds.quantile
(gauge)
Quantile of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_method_duration_seconds.sum
(gauge)
Sum of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_methods_total
(count)
Total number of direct messages sent.
azure.iot_edge.edge_hub.gettwin_duration_seconds.count
(gauge)
Count of time taken for get twin operations.
azure.iot_edge.edge_hub.gettwin_duration_seconds.quantile
(gauge)
Quantile of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_duration_seconds.sum
(gauge)
Sum of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_total
(count)
Total number of GetTwin calls.
azure.iot_edge.edge_hub.message_process_duration_seconds.count
(gauge)
Count of time taken to process a message from the queue.
azure.iot_edge.edge_hub.message_process_duration_seconds.quantile
(gauge)
Quantile of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_process_duration_seconds.sum
(gauge)
Sum of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.count
(gauge)
Count of time taken to send a message.
azure.iot_edge.edge_hub.message_send_duration_seconds.quantile
(gauge)
Quantile of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.sum
(gauge)
Sum of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_size_bytes.count
(gauge)
Count of message size from clients.
azure.iot_edge.edge_hub.message_size_bytes.quantile
(gauge)
Quantile of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.message_size_bytes.sum
(gauge)
Sum of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.messages_dropped_total
(count)
Total number of messages removed because of reason.
azure.iot_edge.edge_hub.messages_received_total
(count)
Total number of messages received from clients.
azure.iot_edge.edge_hub.messages_sent_total
(count)
Total number of messages sent to clients of upstream.
azure.iot_edge.edge_hub.messages_unack_total
(count)
Total number of messages unack because of storage failure.
azure.iot_edge.edge_hub.offline_count_total
(count)
Total number of times Edge Hub went offline.
azure.iot_edge.edge_hub.offline_duration_seconds.count
(gauge)
Count of time Edge Hub was offline.
azure.iot_edge.edge_hub.offline_duration_seconds.quantile
(gauge)
Quantile of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.offline_duration_seconds.sum
(gauge)
Sum of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.operation_retry_total
(count)
Total number of times Edge operations were retried.
azure.iot_edge.edge_hub.queue_length
(gauge)
Current length of Edge Hub's queue for a given priority.
azure.iot_edge.edge_hub.reported_properties_total
(count)
Total reported property updates calls.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.count
(gauge)
Count of time taken to update reported properties.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.quantile
(gauge)
Quantile of time taken to update reported properties.
Shown as second
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.sum
(gauge)
Sum of time taken to update reported properties.
Shown as second

Eventos

Azure IoT Edge no incluye ningún evento.

Checks de servicio

azure.iot_edge.edge_agent.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

azure.iot_edge.edge_hub.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

Solucionar problemas

¿Necesitas ayuda? Ponte en contacto con el equipo de asistencia de Datadog.

Referencias adicionales