Azure IoT Edge

Azure IoT Edge

Agent Check Agent Check

Linux Mac OS Windows OS Supported

Overview

Azure IoT Edge is a fully managed service to deploy Cloud workloads to run on Internet of Things (IoT) Edge devices via standard containers.

Use the Datadog-Azure IoT Edge integration to collect metrics and health status from IoT Edge devices.

Note: This integration requires IoT Edge runtime version 1.0.10 or above.

Setup

Follow the instructions below to install and configure this check for an IoT Edge device running on a device host.

Installation

The Azure IoT Edge check is included in the Datadog Agent package.

No additional installation is needed on your device.

Configuration

Configure the IoT Edge device so that the Agent runs as a custom module. Follow the Microsoft documentation on deploying Azure IoT Edge modules for information on installing and working with custom modules for Azure IoT Edge.

Follow the steps below to configure the IoT Edge device, runtime modules, and the Datadog Agent to start collecting IoT Edge metrics.

  1. Configure the Edge Agent runtime module as follows:

    • Image version must be 1.0.10 or above.

    • Under “Create Options”, add the following Labels. Edit the com.datadoghq.ad.instances label as appropriate. See the sample azure_iot_edge.d/conf.yaml for all available configuration options. See the documentation on Docker Integrations Autodiscovery for more information on labels-based integration configuration.

      "Labels": {
          "com.datadoghq.ad.check_names": "[\"azure_iot_edge\"]",
          "com.datadoghq.ad.init_configs": "[{}]",
          "com.datadoghq.ad.instances": "[{\"edge_hub_prometheus_url\": \"http://edgeHub:9600/metrics\", \"edge_agent_prometheus_url\": \"http://edgeAgent:9600/metrics\"}]"
      }
      
    • Under “Environment Variables”, enable experimental metrics by adding these environment variables (note the double underscores):

      • ExperimentalFeatures__Enabled: true
      • ExperimentalFeatures__EnableMetrics: true
  2. Configure the Edge Hub runtime module as follows:

    • Image version must be 1.0.10 or above.
    • Under “Environment Variables”, enable experimental metrics by adding these environment variables (note the double underscores):
      • ExperimentalFeatures__Enabled: true
      • ExperimentalFeatures__EnableMetrics: true
  3. Install and configure the Datadog Agent as a custom module:

    • Set the module name. For example: datadog-agent.

    • Set the Agent image URI. For example: datadog/agent:7.

    • Under “Environment Variables”, configure your DD_API_KEY. You may also set extra Agent configuration here (see Agent Environment Variables).

    • Under “Container Create Options”, enter the following configuration based on your device OS. Note: NetworkId must correspond to the network name set in the device config.yaml file.

      • Linux:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=azure-iot-edge"],
                "Binds": ["/var/run/docker.sock:/var/run/docker.sock"]
            }
        }
        
      • Windows:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=nat"],
                "Binds": ["//./pipe/iotedge_moby_engine:/./pipe/docker_engine"]
            }
        }
        
    • Save the Datadog Agent custom module.

  4. Save and deploy changes to your device configuration.

Log collection

  1. Collecting logs is disabled by default in the Datadog Agent, enable it by configuring your Datadog Agent custom module:

    • Under “Environment Variables”, set the DD_LOGS_ENABLED environment variable:

      DD_LOGS_ENABLED: true
      
  2. Configure the Edge Agent and Edge Hub modules: under “Create Options”, add the following label:

    "Labels": {
        "com.datadoghq.ad.logs": "[{\"source\": \"azure.iot_edge\", \"service\": \"<SERVICE>\"}]",
        "...": "..."
    }
    

    Change the service based on your environment.

    Repeat this operation for any custom modules you’d like to collect logs for.

  3. Save and deploy changes to your device configuration.

Validation

Once the Agent has been deployed to the device, run the Agent’s status subcommand and look for azure_iot_edge under the Checks section.

Data Collected

Metrics

azure.iot_edge.edge_hub.gettwin_total
(count)
Total number of GetTwin calls.
azure.iot_edge.edge_hub.messages_received_total
(count)
Total number of messages received from clients.
azure.iot_edge.edge_hub.messages_sent_total
(count)
Total number of messages sent to clients of upstream.
azure.iot_edge.edge_hub.reported_properties_total
(count)
Total reported property updates calls.
azure.iot_edge.edge_hub.message_size_bytes.count
(gauge)
Count of message size from clients.
azure.iot_edge.edge_hub.message_size_bytes.sum
(gauge)
Sum of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.message_size_bytes.quantile
(gauge)
Quantile of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.gettwin_duration_seconds.count
(gauge)
Count of time taken for get twin operations.
azure.iot_edge.edge_hub.gettwin_duration_seconds.sum
(gauge)
Sum of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_duration_seconds.quantile
(gauge)
Quantile of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.count
(gauge)
Count of time taken to send a message.
azure.iot_edge.edge_hub.message_send_duration_seconds.sum
(gauge)
Sum of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.quantile
(gauge)
Quantile of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_process_duration_seconds.count
(gauge)
Count of time taken to process a message from the queue.
azure.iot_edge.edge_hub.message_process_duration_seconds.sum
(gauge)
Sum of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_process_duration_seconds.quantile
(gauge)
Quantile of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.count
(gauge)
Count of time taken to update reported properties.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.sum
(gauge)
Sum of time taken to update reported properties.
Shown as second
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.quantile
(gauge)
Quantile of time taken to update reported properties.
Shown as second
azure.iot_edge.edge_hub.direct_method_duration_seconds.count
(gauge)
Count of time taken to resolve a direct message.
azure.iot_edge.edge_hub.direct_method_duration_seconds.sum
(gauge)
Sum of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_method_duration_seconds.quantile
(gauge)
Quantile of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_methods_total
(count)
Total number of direct messages sent.
azure.iot_edge.edge_hub.queue_length
(gauge)
Current length of Edge Hub's queue for a given `priority`.
azure.iot_edge.edge_hub.messages_dropped_total
(count)
Total number of messages removed because of `reason`.
azure.iot_edge.edge_hub.messages_unack_total
(count)
Total number of messages unack because of storage failure.
azure.iot_edge.edge_hub.offline_count_total
(count)
Total number of times Edge Hub went offline.
azure.iot_edge.edge_hub.offline_duration_seconds.count
(gauge)
Count of time Edge Hub was offline.
azure.iot_edge.edge_hub.offline_duration_seconds.sum
(gauge)
Sum of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.offline_duration_seconds.quantile
(gauge)
Quantile of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.operation_retry_total
(count)
Total number of times Edge operations were retried.
azure.iot_edge.edge_hub.client_connect_failed_total
(count)
Total number of times clients failed to connect to Edge Hub.
azure.iot_edge.edge_agent.total_time_running_correctly_seconds
(gauge)
The amount of time the module `module_name` was specified in the deployment and was in the running state.
azure.iot_edge.edge_agent.total_time_expected_running_seconds
(gauge)
The amount of time the module `module_name` was specified in the deployment.
azure.iot_edge.edge_agent.module_start_total
(count)
Number of times the Edge Agent asked Docker to start the module `module_name`.
azure.iot_edge.edge_agent.module_stop_total
(count)
Number of times the Edge Agent asked Docker to stop the module `module_name`.
azure.iot_edge.edge_agent.command_latency_seconds.count
(gauge)
Count of how long it took for Docker to execute the given `command`. Possible commands are: create, update, remove, start, stop, restart.
azure.iot_edge.edge_agent.command_latency_seconds.sum
(gauge)
Sum of how long it took for Docker to execute the given `command`. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.command_latency_seconds.quantile
(gauge)
Quantile of how long it took for Docker to execute the given `command`. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.iothub_syncs_total
(count)
Total number of times the Edge Agent attempted to sync its twin with IoT Hub, both successful and unsuccessful. Includes both Edge Agent requesting a twin, and IoT Hub notifying of a twin update.
azure.iot_edge.edge_agent.unsuccessful_iothub_syncs_total
(count)
Total number of times the Edge Agent failed to sync its twin with IoT Hub.
azure.iot_edge.edge_agent.deployment_time_seconds.count
(gauge)
Count of amount of time it took to complete a new deployment after receiving a change.
azure.iot_edge.edge_agent.deployment_time_seconds.sum
(gauge)
Sum of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.deployment_time_seconds.quantile
(gauge)
Quantile of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.direct_method_invocations_count
(count)
Total number of times a built-in Edge Agent direct method is called, such as Ping or Restart.
azure.iot_edge.edge_agent.host_uptime_seconds
(gauge)
How long the host has been running.
Shown as second
azure.iot_edge.edge_agent.iotedged_uptime_seconds
(gauge)
How long `iotedged` has been running.
Shown as second
azure.iot_edge.edge_agent.available_disk_space_bytes
(gauge)
Amount of space left on the disk `disk_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_space_bytes
(gauge)
Size of the disk `disk_name.
Shown as byte
azure.iot_edge.edge_agent.used_memory_bytes
(gauge)
Amount of RAM used by all processes in module `module_name`.
Shown as byte
azure.iot_edge.edge_agent.total_memory_bytes
(gauge)
Total amount of RAM available to module `module_name`.
Shown as byte
azure.iot_edge.edge_agent.used_cpu_percent.count
(gauge)
Count of percent of CPU used by all processes in module `module_name`.
azure.iot_edge.edge_agent.used_cpu_percent.sum
(gauge)
Sum of percent of CPU used by all processes in module `module_name`.
Shown as percent
azure.iot_edge.edge_agent.used_cpu_percent.quantile
(gauge)
Quantile of percent of CPU used by all processes in module `module_name`.
Shown as percent
azure.iot_edge.edge_agent.created_pids_total
(gauge)
Total number of processes the module `module_name` has created.
azure.iot_edge.edge_agent.total_network_in_bytes
(count)
Total amount of bytes received from the network by module `module_name`.
Shown as byte
azure.iot_edge.edge_agent.total_network_out_bytes
(count)
Total amount of bytes sent to the network by module `module_name`.
Shown as byte
azure.iot_edge.edge_agent.total_disk_read_bytes
(count)
Total amount of bytes read from the disk by module `module_name`.
Shown as byte
azure.iot_edge.edge_agent.total_disk_write_bytes
(count)
Total amount of bytes written to the disk by module `module_name`.
Shown as byte

Events

Azure IoT Edge does not include any events.

Service Checks

azure.iot_edge.edge_agent.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

azure.iot_edge.edge_hub.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading