KubeVirt Controller

Supported OS Linux Windows Mac OS

Integration version 1.1.0
This integration is in public beta and should be enabled on production workloads with caution.

Overview

This check monitors KubeVirt Controller through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

The KubeVirt Controller check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

The main use case for the kubevirt_controller check is to run it as a cluster-level check.

To do that, update the RBAC permissions to give the datadog-agent service account read-only access to the KubeVirt resources by following the steps below:

  1. Bind the kubevirt.io:view ClusterRole to the datadog-agent service account:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent-kubevirt
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubevirt.io:view
subjects:
  - kind: ServiceAccount
    name: datadog-agent
    namespace: default
  2. Annotate the pod template of your virt-controller deployment by patching the KubeVirt resource as follows:
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  certificateRotateStrategy: {}
  configuration: {}
  customizeComponents:
    patches:
      - resourceType: Deployment
        resourceName: virt-controller
        patch: '{"spec": {"template":{"metadata":{"annotations":{ "ad.datadoghq.com/virt-controller.check_names": "[\"kubevirt_controller\"]", "ad.datadoghq.com/virt-controller.init_configs": "[{}]", "ad.datadoghq.com/virt-controller.instances": "[{ \"kubevirt_controller_metrics_endpoint\": \"https://%%host%%:%%port%%/metrics\",\"kubevirt_controller_healthz_endpoint\": \"https://%%host%%:%%port%%/healthz\", \"kube_namespace\":\"%%kube_namespace%%\", \"kube_pod_name\":\"%%kube_pod_name%%\", \"tls_verify\": \"false\"}]"}}}}}'
        type: strategic

Replace <DD_CLUSTER_NAME> with the name you want for your cluster.
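
The instances annotation in the patch above is a JSON document escaped inside another JSON string, which makes the individual settings hard to read. As a reading aid, the same check configuration can be sketched as plain YAML. This is an illustration of what the annotations encode, not a replacement for the patch; the file layout is an assumption, and the %%...%% Autodiscovery template variables are only meaningful when the configuration is resolved through Autodiscovery.

```yaml
# Reading aid only: the check configuration carried by the
# ad.datadoghq.com/virt-controller.* annotations, unescaped.
# Corresponds to check_names: ["kubevirt_controller"].
init_config: {}

instances:
    # Metrics and health endpoints exposed by the virt-controller pod.
  - kubevirt_controller_metrics_endpoint: "https://%%host%%:%%port%%/metrics"
    kubevirt_controller_healthz_endpoint: "https://%%host%%:%%port%%/healthz"
    # Pod identity, filled in from Autodiscovery template variables.
    kube_namespace: "%%kube_namespace%%"
    kube_pod_name: "%%kube_pod_name%%"
    # The annotation passes this as the string "false"; kept as-is here.
    tls_verify: "false"
```

Comparing this form against the escaped annotation is a quick way to confirm that every key and quote survived editing before you apply the patch.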

Validation

Run the Cluster Agent’s clusterchecks subcommand inside your Cluster Agent container and look for the kubevirt_controller check under the Checks section.

Data Collected

Metrics

kubevirt_controller.can_connect
(gauge)
Value of 1 if the agent can connect to the KubeVirt Controller, and 0 otherwise.
kubevirt_controller.virt_controller.leading_status
(gauge)
Indication for an operating virt-controller.
kubevirt_controller.virt_controller.ready_status
(gauge)
Indication for a virt-controller that is ready to take the lead.
kubevirt_controller.vm.error_status_last_transition_timestamp_seconds.count
(count)
Virtual Machine last transition timestamp to error status.
Shown as second
kubevirt_controller.vm.migrating_status_last_transition_timestamp_seconds.count
(count)
Virtual Machine last transition timestamp to migrating status.
Shown as second
kubevirt_controller.vm.non_running_status_last_transition_timestamp_seconds.count
(count)
Virtual Machine last transition timestamp to paused/stopped status.
Shown as second
kubevirt_controller.vm.running_status_last_transition_timestamp_seconds.count
(count)
Virtual Machine last transition timestamp to running status.
Shown as second
kubevirt_controller.vm.starting_status_last_transition_timestamp_seconds.count
(count)
Virtual Machine last transition timestamp to starting status.
Shown as second
kubevirt_controller.vmi.migrations_in_pending_phase
(gauge)
Number of current pending migrations.
kubevirt_controller.vmi.migrations_in_running_phase
(gauge)
Number of current running migrations.
kubevirt_controller.vmi.migrations_in_scheduling_phase
(gauge)
Number of current scheduling migrations.
kubevirt_controller.vmi.non_evictable
(gauge)
Indicates a VirtualMachine whose eviction strategy is set to Live Migration but that is not migratable.
kubevirt_controller.vmi.number_of_outdated
(gauge)
Total number of VirtualMachineInstance workloads that are not running within the most up-to-date version of the virt-launcher environment.
kubevirt_controller.vmi.phase_count
(gauge)
Sum of VMIs per phase and node. phase can be one of the following: [Pending, Scheduling, Scheduled, Running, Succeeded, Failed, Unknown].
kubevirt_controller.vmi.phase_transition_time_from_creation_seconds.bucket
(count)
Histogram of VM phase transitions duration from creation time in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_from_creation_seconds.count
(count)
Histogram of VM phase transitions duration from creation time in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_from_creation_seconds.sum
(count)
Histogram of VM phase transitions duration from creation time in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_from_deletion_seconds.bucket
(count)
Histogram of VM phase transitions duration from deletion time in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_from_deletion_seconds.count
(count)
Histogram of VM phase transitions duration from deletion time in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_from_deletion_seconds.sum
(count)
Histogram of VM phase transitions duration from deletion time in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_seconds.bucket
(count)
Histogram of VM phase transitions duration between different phases in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_seconds.count
(count)
Histogram of VM phase transitions duration between different phases in seconds.
Shown as second
kubevirt_controller.vmi.phase_transition_time_seconds.sum
(count)
Histogram of VM phase transitions duration between different phases in seconds.
Shown as second
kubevirt_controller.workqueue.adds.count
(count)
Total number of adds handled by workqueue.
Shown as item
kubevirt_controller.workqueue.depth
(gauge)
Current depth of workqueue.
Shown as item
kubevirt_controller.workqueue.longest_running_processor_seconds
(gauge)
How many seconds has the longest running processor for workqueue been running.
Shown as second
kubevirt_controller.workqueue.queue_duration_seconds.bucket
(count)
How long an item stays in workqueue before being requested.
Shown as second
kubevirt_controller.workqueue.queue_duration_seconds.count
(count)
How long an item stays in workqueue before being requested.
Shown as second
kubevirt_controller.workqueue.queue_duration_seconds.sum
(count)
How long an item stays in workqueue before being requested.
Shown as second
kubevirt_controller.workqueue.retries.count
(count)
Total number of retries handled by workqueue.
kubevirt_controller.workqueue.unfinished_work_seconds
(gauge)
How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
Shown as second
kubevirt_controller.workqueue.work_duration_seconds.bucket
(count)
How long in seconds processing an item from workqueue takes.
Shown as second
kubevirt_controller.workqueue.work_duration_seconds.count
(count)
How long in seconds processing an item from workqueue takes.
Shown as second
kubevirt_controller.workqueue.work_duration_seconds.sum
(count)
How long in seconds processing an item from workqueue takes.
Shown as second

Events

The KubeVirt Controller integration does not include any events.

Service Checks

The KubeVirt Controller integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.