Azure IoT Edge is a fully managed service that deploys cloud workloads to run on Internet of Things (IoT) Edge devices using standard containers.
Use the Datadog-Azure IoT Edge integration to collect metrics and health status from IoT Edge devices.
Note: This integration requires IoT Edge runtime version 1.0.10 or above.
Follow the instructions below to install and configure this check for an IoT Edge device running on a device host.
The Azure IoT Edge check is included in the Datadog Agent package.
No additional installation is needed on your device.
Configure the IoT Edge device so that the Agent runs as a custom module. Follow the Microsoft documentation on deploying Azure IoT Edge modules for information on installing and working with custom modules for Azure IoT Edge.
Follow the steps below to configure the IoT Edge device, runtime modules, and the Datadog Agent to start collecting IoT Edge metrics.
Configure the Edge Agent runtime module as follows:
Image version must be 1.0.10 or above.
Under “Create Options”, add the following Labels. Edit the com.datadoghq.ad.instances label as appropriate. See the sample azure_iot_edge.d/conf.yaml for all available configuration options. See the documentation on Docker Integrations Autodiscovery for more information on labels-based integration configuration.
"Labels": {
    "com.datadoghq.ad.check_names": "[\"azure_iot_edge\"]",
    "com.datadoghq.ad.init_configs": "[{}]",
    "com.datadoghq.ad.instances": "[{\"edge_hub_prometheus_url\": \"http://edgeHub:9600/metrics\", \"edge_agent_prometheus_url\": \"http://edgeAgent:9600/metrics\"}]"
}
Under “Environment Variables”, enable experimental metrics by adding these environment variables (note the double underscores):
ExperimentalFeatures__Enabled: true
ExperimentalFeatures__EnableMetrics: true
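The label values above are JSON documents stored as strings, which is why their inner quotes are escaped. Writing that escaping by hand is error-prone, so here is a minimal Python sketch that generates the labels instead. It assumes you assemble the module's create options in a script; the helper name `build_autodiscovery_labels` is hypothetical and not part of any Datadog or Azure tooling.

```python
import json


def build_autodiscovery_labels(edge_hub_url, edge_agent_url):
    """Return Docker labels whose values are JSON-encoded strings.

    Hypothetical helper for illustration only.
    """
    instance = {
        "edge_hub_prometheus_url": edge_hub_url,
        "edge_agent_prometheus_url": edge_agent_url,
    }
    return {
        "com.datadoghq.ad.check_names": json.dumps(["azure_iot_edge"]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }


labels = build_autodiscovery_labels(
    "http://edgeHub:9600/metrics",
    "http://edgeAgent:9600/metrics",
)

# Serializing the outer object escapes the inner quotes automatically.
print(json.dumps({"Labels": labels}, indent=2))
```

Generating the values with `json.dumps` keeps each label valid JSON when you add further instance options from the sample azure_iot_edge.d/conf.yaml.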
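Configure the Edge Hub runtime module as follows:
Image version must be 1.0.10 or above.
Under “Environment Variables”, enable experimental metrics by adding these environment variables (note the double underscores):
ExperimentalFeatures__Enabled: true
ExperimentalFeatures__EnableMetrics: true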
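If you manage the runtime modules through a deployment manifest rather than a UI form, the same two variables can be generated for both Edge Agent and Edge Hub. The sketch below assumes the manifest stores module environment variables as `{name: {"value": ...}}` under an `env` key; check this against the manifest schema your IoT Edge version uses. The helper name `manifest_env` is hypothetical.

```python
import json

# The two variables named above; note the double underscores in each name.
EXPERIMENTAL_METRICS_ENV = {
    "ExperimentalFeatures__Enabled": "true",
    "ExperimentalFeatures__EnableMetrics": "true",
}


def manifest_env(variables):
    """Wrap plain name/value pairs in the assumed manifest layout."""
    return {name: {"value": value} for name, value in variables.items()}


env_fragment = manifest_env(EXPERIMENTAL_METRICS_ENV)

# The same fragment would apply to both the Edge Agent and Edge Hub modules.
print(json.dumps({"env": env_fragment}, indent=2))
```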
Install and configure the Datadog Agent as a custom module:
Set the module name. For example: datadog-agent.
Set the Agent image URI. For example: datadog/agent:7.
Under “Environment Variables”, configure your DD_API_KEY. You may also set extra Agent configuration here (see Agent Environment Variables).
Under “Container Create Options”, enter the following configuration based on your device OS. Note: NetworkId must correspond to the network name set in the device config.yaml file.
Linux:
{
    "HostConfig": {
        "NetworkMode": "default",
        "Env": ["NetworkId=azure-iot-edge"],
        "Binds": ["/var/run/docker.sock:/var/run/docker.sock"]
    }
}
Windows:
{
    "HostConfig": {
        "NetworkMode": "default",
        "Env": ["NetworkId=nat"],
        "Binds": ["//./pipe/iotedge_moby_engine:/./pipe/docker_engine"]
    }
}
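The two configurations differ only in the network name and the Docker socket binding. If you provision both Linux and Windows devices from one script, that difference can be captured in a small lookup. This is a sketch under that assumption; the function name `datadog_agent_create_options` and its `network_id` override are hypothetical, and the defaults are copied from the configurations above.

```python
import json

# Per-OS values taken from the two configurations above.
_DEFAULTS = {
    "linux": {
        "network_id": "azure-iot-edge",
        "bind": "/var/run/docker.sock:/var/run/docker.sock",
    },
    "windows": {
        "network_id": "nat",
        "bind": "//./pipe/iotedge_moby_engine:/./pipe/docker_engine",
    },
}


def datadog_agent_create_options(device_os, network_id=None):
    """Build create options for the Agent module on the given OS.

    Pass network_id when the network name in the device config.yaml
    differs from the default for that OS.
    """
    try:
        defaults = _DEFAULTS[device_os.lower()]
    except KeyError:
        raise ValueError(f"unsupported device OS: {device_os!r}") from None
    return {
        "HostConfig": {
            "NetworkMode": "default",
            "Env": [f"NetworkId={network_id or defaults['network_id']}"],
            "Binds": [defaults["bind"]],
        }
    }


print(json.dumps(datadog_agent_create_options("linux"), indent=2))
```

Rejecting unknown OS names early avoids deploying a module with a socket binding that does not exist on the host.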
Save the Datadog Agent custom module.
Save and deploy changes to your device configuration.
Log collection is disabled by default in the Datadog Agent. To enable it, configure your Datadog Agent custom module:
Under “Environment Variables”, set the DD_LOGS_ENABLED environment variable:
DD_LOGS_ENABLED: true
Configure the Edge Agent and Edge Hub modules: under “Create Options”, add the following label:
"Labels": {
    "com.datadoghq.ad.logs": "[{\"source\": \"azure.iot_edge\", \"service\": \"<SERVICE>\"}]",
    "...": "..."
}
Change the service value based on your environment.
Repeat this operation for any custom modules you’d like to collect logs for.
Save and deploy changes to your device configuration.
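Because the same logs label is repeated on every module with only the service name changing, it can be convenient to generate it. A minimal Python sketch, assuming you build module labels in a script; the helper name `build_logs_label` and the service names used here are illustrative only.

```python
import json


def build_logs_label(service, source="azure.iot_edge"):
    """Return the logs Autodiscovery label for one module.

    Hypothetical helper; the label value is a JSON-encoded string.
    """
    config = [{"source": source, "service": service}]
    return {"com.datadoghq.ad.logs": json.dumps(config)}


# Example service names are placeholders; choose ones that fit your environment.
for service in ("edge-agent", "edge-hub"):
    print(json.dumps({"Labels": build_logs_label(service)}, indent=2))
```

Merge the returned entry into any labels the module already carries rather than replacing them.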
Once the Agent has been deployed to the device, run the Agent’s status subcommand and look for azure_iot_edge under the Checks section.
azure.iot_edge.edge_hub.gettwin_total (count) | Total number of GetTwin calls. |
azure.iot_edge.edge_hub.messages_received_total (count) | Total number of messages received from clients. |
azure.iot_edge.edge_hub.messages_sent_total (count) | Total number of messages sent to clients or upstream. |
azure.iot_edge.edge_hub.reported_properties_total (count) | Total reported property updates calls. |
azure.iot_edge.edge_hub.message_size_bytes.count (gauge) | Count of message size from clients. |
azure.iot_edge.edge_hub.message_size_bytes.sum (gauge) | Sum of message size from clients. Shown as byte |
azure.iot_edge.edge_hub.message_size_bytes.quantile (gauge) | Quantile of message size from clients. Shown as byte |
azure.iot_edge.edge_hub.gettwin_duration_seconds.count (gauge) | Count of time taken for get twin operations. |
azure.iot_edge.edge_hub.gettwin_duration_seconds.sum (gauge) | Sum of time taken for get twin operations. Shown as second |
azure.iot_edge.edge_hub.gettwin_duration_seconds.quantile (gauge) | Quantile of time taken for get twin operations. Shown as second |
azure.iot_edge.edge_hub.message_send_duration_seconds.count (gauge) | Count of time taken to send a message. |
azure.iot_edge.edge_hub.message_send_duration_seconds.sum (gauge) | Sum of time taken to send a message. Shown as second |
azure.iot_edge.edge_hub.message_send_duration_seconds.quantile (gauge) | Quantile of time taken to send a message. Shown as second |
azure.iot_edge.edge_hub.message_process_duration_seconds.count (gauge) | Count of time taken to process a message from the queue. |
azure.iot_edge.edge_hub.message_process_duration_seconds.sum (gauge) | Sum of time taken to process a message from the queue. Shown as second |
azure.iot_edge.edge_hub.message_process_duration_seconds.quantile (gauge) | Quantile of time taken to process a message from the queue. Shown as second |
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.count (gauge) | Count of time taken to update reported properties. |
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.sum (gauge) | Sum of time taken to update reported properties. Shown as second |
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.quantile (gauge) | Quantile of time taken to update reported properties. Shown as second |
azure.iot_edge.edge_hub.direct_method_duration_seconds.count (gauge) | Count of time taken to resolve a direct message. |
azure.iot_edge.edge_hub.direct_method_duration_seconds.sum (gauge) | Sum of time taken to resolve a direct message. Shown as second |
azure.iot_edge.edge_hub.direct_method_duration_seconds.quantile (gauge) | Quantile of time taken to resolve a direct message. Shown as second |
azure.iot_edge.edge_hub.direct_methods_total (count) | Total number of direct messages sent. |
azure.iot_edge.edge_hub.queue_length (gauge) | Current length of Edge Hub's queue for a given `priority`. |
azure.iot_edge.edge_hub.messages_dropped_total (count) | Total number of messages removed because of `reason`. |
azure.iot_edge.edge_hub.messages_unack_total (count) | Total number of messages left unacknowledged because of storage failure. |
azure.iot_edge.edge_hub.offline_count_total (count) | Total number of times Edge Hub went offline. |
azure.iot_edge.edge_hub.offline_duration_seconds.count (gauge) | Count of time Edge Hub was offline. |
azure.iot_edge.edge_hub.offline_duration_seconds.sum (gauge) | Sum of time Edge Hub was offline. Shown as second |
azure.iot_edge.edge_hub.offline_duration_seconds.quantile (gauge) | Quantile of time Edge Hub was offline. Shown as second |
azure.iot_edge.edge_hub.operation_retry_total (count) | Total number of times Edge operations were retried. |
azure.iot_edge.edge_hub.client_connect_failed_total (count) | Total number of times clients failed to connect to Edge Hub. |
azure.iot_edge.edge_agent.total_time_running_correctly_seconds (gauge) | The amount of time the module `module_name` was specified in the deployment and was in the running state. |
azure.iot_edge.edge_agent.total_time_expected_running_seconds (gauge) | The amount of time the module `module_name` was specified in the deployment. |
azure.iot_edge.edge_agent.module_start_total (count) | Number of times the Edge Agent asked Docker to start the module `module_name`. |
azure.iot_edge.edge_agent.module_stop_total (count) | Number of times the Edge Agent asked Docker to stop the module `module_name`. |
azure.iot_edge.edge_agent.command_latency_seconds.count (gauge) | Count of how long it took for Docker to execute the given `command`. Possible commands are: create, update, remove, start, stop, restart. |
azure.iot_edge.edge_agent.command_latency_seconds.sum (gauge) | Sum of how long it took for Docker to execute the given `command`. Possible commands are: create, update, remove, start, stop, restart. Shown as second |
azure.iot_edge.edge_agent.command_latency_seconds.quantile (gauge) | Quantile of how long it took for Docker to execute the given `command`. Possible commands are: create, update, remove, start, stop, restart. Shown as second |
azure.iot_edge.edge_agent.iothub_syncs_total (count) | Total number of times the Edge Agent attempted to sync its twin with IoT Hub, both successful and unsuccessful. Includes both Edge Agent requesting a twin, and IoT Hub notifying of a twin update. |
azure.iot_edge.edge_agent.unsuccessful_iothub_syncs_total (count) | Total number of times the Edge Agent failed to sync its twin with IoT Hub. |
azure.iot_edge.edge_agent.deployment_time_seconds.count (gauge) | Count of amount of time it took to complete a new deployment after receiving a change. |
azure.iot_edge.edge_agent.deployment_time_seconds.sum (gauge) | Sum of amount of time it took to complete a new deployment after receiving a change. Shown as second |
azure.iot_edge.edge_agent.deployment_time_seconds.quantile (gauge) | Quantile of amount of time it took to complete a new deployment after receiving a change. Shown as second |
azure.iot_edge.edge_agent.direct_method_invocations_count (count) | Total number of times a built-in Edge Agent direct method is called, such as Ping or Restart. |
azure.iot_edge.edge_agent.host_uptime_seconds (gauge) | How long the host has been running. Shown as second |
azure.iot_edge.edge_agent.iotedged_uptime_seconds (gauge) | How long `iotedged` has been running. Shown as second |
azure.iot_edge.edge_agent.available_disk_space_bytes (gauge) | Amount of space left on the disk `disk_name`. Shown as byte |
azure.iot_edge.edge_agent.total_disk_space_bytes (gauge) | Size of the disk `disk_name`. Shown as byte |
azure.iot_edge.edge_agent.used_memory_bytes (gauge) | Amount of RAM used by all processes in module `module_name`. Shown as byte |
azure.iot_edge.edge_agent.total_memory_bytes (gauge) | Total amount of RAM available to module `module_name`. Shown as byte |
azure.iot_edge.edge_agent.used_cpu_percent.count (gauge) | Count of percent of CPU used by all processes in module `module_name`. |
azure.iot_edge.edge_agent.used_cpu_percent.sum (gauge) | Sum of percent of CPU used by all processes in module `module_name`. Shown as percent |
azure.iot_edge.edge_agent.used_cpu_percent.quantile (gauge) | Quantile of percent of CPU used by all processes in module `module_name`. Shown as percent |
azure.iot_edge.edge_agent.created_pids_total (gauge) | Total number of processes the module `module_name` has created. |
azure.iot_edge.edge_agent.total_network_in_bytes (count) | Total amount of bytes received from the network by module `module_name`. Shown as byte |
azure.iot_edge.edge_agent.total_network_out_bytes (count) | Total amount of bytes sent to the network by module `module_name`. Shown as byte |
azure.iot_edge.edge_agent.total_disk_read_bytes (count) | Total amount of bytes read from the disk by module `module_name`. Shown as byte |
azure.iot_edge.edge_agent.total_disk_write_bytes (count) | Total amount of bytes written to the disk by module `module_name`. Shown as byte |
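Many of the metrics above come in `.count` and `.sum` pairs. Assuming both values are cumulative, as they are in Prometheus summaries, dividing the change in `.sum` by the change in `.count` between two samples gives the mean over that interval. The sketch below shows the arithmetic for a pair such as azure.iot_edge.edge_hub.message_send_duration_seconds; the function name and the sample values are made up for illustration.

```python
def interval_mean(sum_start, count_start, sum_end, count_end):
    """Mean observation between two samples of a .sum/.count pair.

    Returns None when no new observations arrived, or when the count
    went backwards (for example after a module restart).
    """
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None
    return (sum_end - sum_start) / delta_count


# Made-up samples: 60 more messages sent, taking 3.0 more seconds in total.
mean_send_seconds = interval_mean(
    sum_start=12.0, count_start=100,
    sum_end=15.0, count_end=160,
)
print(mean_send_seconds)
```

The `.quantile` series cannot be averaged this way; each one already reports a percentile of the distribution.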
azure.iot_edge.edge_agent.prometheus.health:
Returns CRITICAL if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns OK otherwise.
azure.iot_edge.edge_hub.prometheus.health:
Returns CRITICAL if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns OK otherwise.
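The OK/CRITICAL semantics of these service checks can be illustrated with a short reachability probe. This is a sketch of the idea only, not the Agent's actual implementation; the function name `prometheus_health` and the injectable `opener` parameter are assumptions made so that the logic can be exercised without a live device.

```python
import urllib.error
import urllib.request

OK = "OK"
CRITICAL = "CRITICAL"


def prometheus_health(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return OK if the metrics endpoint answers, CRITICAL otherwise.

    Illustrative only: connection failures, timeouts, HTTP error
    statuses and malformed URLs are all treated as unreachable.
    """
    try:
        with opener(url, timeout=timeout) as response:
            response.read(1)  # confirm the endpoint actually sends data
    except (urllib.error.URLError, OSError, ValueError):
        return CRITICAL
    return OK
```

For example, calling `prometheus_health("http://edgeHub:9600/metrics")` from a container on the same network as the Edge Hub module would probe the endpoint configured in the com.datadoghq.ad.instances label.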
Azure IoT Edge does not include any events.
Need help? Contact Datadog support.