Argo Rollouts

Supported OS: Linux, Windows, macOS

Integration version: 1.0.0

Overview

This check monitors Argo Rollouts through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running in your Kubernetes environment. For more information about configuration in containerized environments, see the Autodiscovery Integration Templates.

Installation

Starting from Agent release 7.53.0, the Argo Rollouts check is included in the Datadog Agent package. No additional installation is needed in your environment.

This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint that Argo Rollouts exposes. This requires Python 3.

Configuration

The Argo Rollouts controller exposes Prometheus-formatted metrics at /metrics on port 8090. For the Agent to start collecting these metrics, annotate the Argo Rollouts pods. For more information about annotations, see the Autodiscovery Integration Templates. For additional configuration options, see the sample argo_rollouts.d/conf.yaml.

Note: The listed metrics can only be collected if they are available. Some metrics are generated only after certain actions are performed. For example, the argo_rollouts.rollout.info.replicas.updated metric is exposed only after a replica is updated.

The only parameter required for configuring the Argo Rollouts check is:

  • openmetrics_endpoint: The URL where the Prometheus-formatted metrics are exposed. The default port is 8090. In containerized environments, use %%host%% for host autodetection. For example:
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/argo-rollouts.checks: |
      {
        "argo_rollouts": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8090/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'argo-rollouts'
# (...)
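If you configure the check with a file instead of pod annotations, the same instance can be expressed in argo_rollouts.d/conf.yaml. The following is a minimal sketch, not the shipped sample file: <ARGO_ROLLOUTS_HOST> and <ENV> are hypothetical placeholders, and because %%host%% is an Autodiscovery template variable, a static file needs an address the Agent can actually reach. tags and min_collection_interval are standard instance options shared by Agent checks and are optional.

```yaml
init_config:

instances:
  # Required: the Argo Rollouts controller's Prometheus-formatted endpoint.
  # <ARGO_ROLLOUTS_HOST> is a placeholder, not a real hostname.
  - openmetrics_endpoint: "http://<ARGO_ROLLOUTS_HOST>:8090/metrics"
    # Optional: extra tags attached to every metric from this instance (hypothetical value).
    tags:
      - "env:<ENV>"
    # Optional: collection interval in seconds.
    min_collection_interval: 15
```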

Log collection

Available for Agent versions >6.0

Argo Rollouts logs can be collected from the different Argo Rollouts pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
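As a sketch, assuming the Agent is deployed with the Datadog Helm chart, log collection can be turned on with values like the following. Verify the keys against the chart version you use; other deployment methods, such as the Datadog Operator, use different settings.

```yaml
# values.yaml for the Datadog Helm chart (assumed deployment method)
datadog:
  logs:
    # Turn on log collection in the Agent.
    enabled: true
    # true: collect logs from all containers.
    # false: collect only from containers with Autodiscovery log configurations.
    containerCollectAll: true
```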

See the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter       Value
<LOG_CONFIG>    {"source": "argo_rollouts", "service": "<SERVICE_NAME>"}
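Applied as a pod annotation, the log parameter above might look like the following sketch. It assumes the container is named argo-rollouts, as in the metrics example; the container name in the annotation key must match the name in spec.containers. <POD_NAME> and <SERVICE_NAME> remain placeholders.

```yaml
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    # The log configuration is a JSON array of <LOG_CONFIG> objects.
    ad.datadoghq.com/argo-rollouts.logs: '[{"source": "argo_rollouts", "service": "<SERVICE_NAME>"}]'
    # (...)
spec:
  containers:
    - name: 'argo-rollouts'
# (...)
```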

Validation

Run the Agent’s status subcommand and look for argo_rollouts under the Checks section.

Data Collected

Metrics

argo_rollouts.analysis.run.info
(gauge)
Information about analysis run
argo_rollouts.analysis.run.metric.phase
(gauge)
Information on the phase of a specific metric in the Analysis Run
argo_rollouts.analysis.run.metric.type
(gauge)
Information on the type of a specific metric in the Analysis Run
argo_rollouts.analysis.run.phase
(gauge)
Information on the state of the Analysis Run
argo_rollouts.analysis.run.reconcile.bucket
(count)
The number of observations in the Analysis Run reconciliation performance histogram by upper_bound buckets
argo_rollouts.analysis.run.reconcile.count
(count)
The number of observations in the Analysis Run reconciliation performance histogram
argo_rollouts.analysis.run.reconcile.error.count
(count)
The number of errors that occurred during Analysis Run reconciliation
argo_rollouts.analysis.run.reconcile.sum
(count)
The duration sum of all observations in the Analysis Run reconciliation performance histogram
argo_rollouts.controller.clientset.k8s.request.count
(count)
The total number of Kubernetes requests executed during application reconciliation
argo_rollouts.experiment.info
(gauge)
Information about the experiment
argo_rollouts.experiment.phase
(gauge)
Information on the state of the experiment
argo_rollouts.experiment.reconcile.bucket
(count)
The number of observations in the Experiment reconciliation performance histogram by upper_bound buckets
argo_rollouts.experiment.reconcile.count
(count)
The number of observations in the Experiment reconciliation performance histogram
argo_rollouts.experiment.reconcile.error.count
(count)
The number of errors that occurred during Experiment reconciliation
argo_rollouts.experiment.reconcile.sum
(count)
The duration sum of all observations in the Experiment reconciliation performance histogram
argo_rollouts.go.gc.duration.seconds.count
(count)
The summary count of garbage collection cycles in the Argo Rollouts instance
Shown as second
argo_rollouts.go.gc.duration.seconds.quantile
(gauge)
A summary of the pause duration of garbage collection cycles in the Argo Rollouts instance
Shown as second
argo_rollouts.go.gc.duration.seconds.sum
(count)
The sum of the pause duration of garbage collection cycles in the Argo Rollouts instance
Shown as second
argo_rollouts.go.goroutines
(gauge)
The number of goroutines that currently exist in the Argo Rollouts instance
argo_rollouts.go.info
(gauge)
Metric containing the Go version as a tag
argo_rollouts.go.memstats.alloc_bytes
(gauge)
The number of bytes allocated and still in use in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.alloc_bytes.count
(count)
The monotonic count of bytes allocated, even if since freed, in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.buck_hash.sys_bytes
(gauge)
The number of bytes used by the profiling bucket hash table in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.frees.count
(count)
The total number of frees in the Argo Rollouts instance
argo_rollouts.go.memstats.gc.cpu_fraction
(gauge)
The fraction of this program's available CPU time used by the GC since the program started in the Argo Rollouts instance
Shown as fraction
argo_rollouts.go.memstats.gc.sys_bytes
(gauge)
The number of bytes used for garbage collection system metadata in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.alloc_bytes
(gauge)
The number of heap bytes allocated and still in use in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.idle_bytes
(gauge)
The number of heap bytes waiting to be used in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.inuse_bytes
(gauge)
The number of heap bytes that are in use in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.objects
(gauge)
The number of allocated objects in the Argo Rollouts instance
Shown as object
argo_rollouts.go.memstats.heap.released_bytes
(gauge)
The number of heap bytes released to the OS in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.sys_bytes
(gauge)
The number of heap bytes obtained from the system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.lookups.count
(count)
The number of pointer lookups
argo_rollouts.go.memstats.mallocs.count
(count)
The number of mallocs
argo_rollouts.go.memstats.mcache.inuse_bytes
(gauge)
The number of bytes in use by mcache structures in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.mcache.sys_bytes
(gauge)
The number of bytes used for mcache structures obtained from the system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.mspan.inuse_bytes
(gauge)
The number of bytes in use by mspan structures in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.mspan.sys_bytes
(gauge)
The number of bytes used for mspan structures obtained from the system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.next.gc_bytes
(gauge)
The heap size in bytes at which the next garbage collection takes place in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.other.sys_bytes
(gauge)
The number of bytes used for other system allocations in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.stack.inuse_bytes
(gauge)
The number of bytes in use by the stack allocator in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.stack.sys_bytes
(gauge)
The number of bytes obtained from the system for the stack allocator in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.sys_bytes
(gauge)
The number of bytes obtained from the system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.threads
(gauge)
The number of OS threads created in the Argo Rollouts instance
Shown as thread
argo_rollouts.notification.send.bucket
(count)
The number of observations in the Notification send performance histogram by upper_bound buckets
argo_rollouts.notification.send.count
(count)
The number of observations in the Notification send performance histogram
argo_rollouts.notification.send.sum
(count)
The duration sum of all observations in the Notification send performance histogram
argo_rollouts.process.cpu.seconds.count
(count)
The total user and system CPU time spent in seconds in the Argo Rollouts instance
Shown as second
argo_rollouts.process.max_fds
(gauge)
The maximum number of open file descriptors in the Argo Rollouts instance
argo_rollouts.process.open_fds
(gauge)
The number of open file descriptors in the Argo Rollouts instance
argo_rollouts.process.resident_memory.bytes
(gauge)
The resident memory size in bytes in the Argo Rollouts instance
Shown as byte
argo_rollouts.process.start_time.seconds
(gauge)
The start time of the process since the Unix epoch, in seconds, in the Argo Rollouts instance
Shown as second
argo_rollouts.process.virtual_memory.bytes
(gauge)
The virtual memory size in bytes in the Argo Rollouts instance
Shown as byte
argo_rollouts.process.virtual_memory.max_bytes
(gauge)
The maximum amount of virtual memory available in bytes in the Argo Rollouts instance
Shown as byte
argo_rollouts.rollout.events.count
(count)
The count of rollout events
argo_rollouts.rollout.info
(gauge)
Information about the rollout
argo_rollouts.rollout.info.replicas.available
(gauge)
The number of available replicas per rollout
argo_rollouts.rollout.info.replicas.desired
(gauge)
The number of desired replicas per rollout
argo_rollouts.rollout.info.replicas.unavailable
(gauge)
The number of unavailable replicas per rollout
argo_rollouts.rollout.info.replicas.updated
(gauge)
The number of updated replicas per rollout
argo_rollouts.rollout.phase
(gauge)
Information on the state of the rollout. Argo Rollouts plans to deprecate this metric; use argo_rollouts.rollout.info instead
argo_rollouts.rollout.reconcile.bucket
(count)
The number of observations in the Rollout reconciliation performance histogram by upper_bound buckets
argo_rollouts.rollout.reconcile.count
(count)
The number of observations in the Rollout reconciliation performance histogram
argo_rollouts.rollout.reconcile.error.count
(count)
The number of errors that occurred during Rollout reconciliation
argo_rollouts.rollout.reconcile.sum
(count)
The duration sum of all observations in the Rollout reconciliation performance histogram
argo_rollouts.workqueue.adds.count
(count)
The total number of adds handled by the workqueue
argo_rollouts.workqueue.depth
(gauge)
The current depth of the workqueue
argo_rollouts.workqueue.longest.running_processor.seconds
(gauge)
The number of seconds the longest-running workqueue processor has been running
Shown as second
argo_rollouts.workqueue.queue.duration.seconds.bucket
(count)
The histogram bucket of how long in seconds an item stays in the workqueue before being requested
Shown as second
argo_rollouts.workqueue.queue.duration.seconds.count
(count)
The total number of events in the workqueue duration histogram
argo_rollouts.workqueue.queue.duration.seconds.sum
(count)
The sum of the durations of events counted in the workqueue duration histogram
argo_rollouts.workqueue.retries.count
(count)
The total number of retries handled by the workqueue
argo_rollouts.workqueue.unfinished_work.seconds
(gauge)
The number of seconds of work that has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases
Shown as second
argo_rollouts.workqueue.work.duration.seconds.bucket
(count)
The histogram bucket of the time in seconds it takes to process an item in the workqueue
Shown as second
argo_rollouts.workqueue.work.duration.seconds.count
(count)
The total number of events in the workqueue item processing duration histogram
argo_rollouts.workqueue.work.duration.seconds.sum
(count)
The sum of the durations of events in the workqueue item processing duration histogram

Events

The Argo Rollouts integration does not include any events.

Service Checks

argo_rollouts.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Argo Rollouts OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.