Argo Workflows

Supported OS: Linux, Windows, macOS

Integration version: 2.1.0

Overview

This check monitors Argo Workflows through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running in your Kubernetes environment. For more information about configuration in containerized environments, see the Autodiscovery Integration Templates.

Installation

Starting from Agent release 7.53.0, the Argo Workflows check is included in the Datadog Agent package. No additional installation is needed in your environment.

This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint that the Argo Workflows Workflow Controller exposes.

Configuration

The Argo Workflows Workflow Controller exposes Prometheus-formatted metrics at /metrics on port 9090. For the Agent to start collecting metrics, the Workflow Controller pod must be annotated. For more information about annotations, see the Autodiscovery Integration Templates. Additional configuration options are listed in the sample argo_workflows.d/conf.yaml.

The only parameter required for configuring the Argo Workflows check is:

  • openmetrics_endpoint: The URL where the Prometheus-formatted metrics are exposed. The default port is 9090. In containerized environments, use %%host%% for host autodetection.

apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/argo-workflows.checks: |
      {
        "argo_workflows": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:9090/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'argo-workflows'
# (...)
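
For comparison, the same required parameter can be written in file form, in the style of the sample argo_workflows.d/conf.yaml mentioned above. This is a minimal sketch rather than the shipped sample file: <WORKFLOW_CONTROLLER_HOST> is a hypothetical placeholder for wherever the Workflow Controller is reachable, and it assumes the default port 9090 and the /metrics path described earlier.

```yaml
# argo_workflows.d/conf.yaml (minimal sketch, not the shipped sample)
init_config:

instances:
    # <WORKFLOW_CONTROLLER_HOST> is a placeholder (assumption); replace it
    # with the address where the Workflow Controller serves /metrics.
  - openmetrics_endpoint: "http://<WORKFLOW_CONTROLLER_HOST>:9090/metrics"
```

In a Kubernetes environment the annotation form shown above is usually the better fit, because %%host%% is resolved for each discovered pod instead of being fixed in a file.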

Log collection

Available for Agent versions >6.0

Argo Workflows logs can be collected from the different Argo Workflows pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

See the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter       Value
<LOG_CONFIG>    {"source": "argo_workflows", "service": "<SERVICE_NAME>"}
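
Applied as a pod annotation, the log parameters above could look like the following sketch. It assumes the container is named argo-workflows, as in the metrics example, and that the log configuration is supplied as a JSON list in the logs annotation; <POD_NAME> and <SERVICE_NAME> are placeholders to fill in.

```yaml
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    # The identifier before ".logs" (argo-workflows here, an assumption)
    # must match the container name under spec.containers.
    ad.datadoghq.com/argo-workflows.logs: '[{"source": "argo_workflows", "service": "<SERVICE_NAME>"}]'
    # (...)
spec:
  containers:
    - name: 'argo-workflows'
# (...)
```

The annotation has an effect only once log collection is enabled in the Agent, as described above.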

Validation

Run the Agent’s status subcommand and look for argo_workflows under the Checks section.

Data Collected

Metrics

argo_workflows.current_workflows
(gauge)
Number of Workflows currently accessible by the controller by status (refreshed every 15s)
argo_workflows.error.count
(count)
Number of errors encountered by the controller by cause
Shown as error
argo_workflows.go.gc.duration.seconds.count
(count)
The summary count of garbage collection cycles in the Argo Workflows instance
argo_workflows.go.gc.duration.seconds.quantile
(gauge)
The pause duration of garbage collection cycles in the Argo Workflows instance by quantile
argo_workflows.go.gc.duration.seconds.sum
(count)
The sum of the pause duration of garbage collection cycles in the Argo Workflows instance
Shown as second
argo_workflows.go.goroutines
(gauge)
Number of goroutines that currently exist.
argo_workflows.go.info
(gauge)
Information about the Go environment.
argo_workflows.go.memstats.alloc_bytes
(gauge)
Number of bytes allocated and still in use.
Shown as byte
argo_workflows.go.memstats.alloc_bytes.count
(count)
Total number of bytes allocated, even if freed.
Shown as byte
argo_workflows.go.memstats.buck_hash.sys_bytes
(gauge)
Number of bytes used by the profiling bucket hash table.
Shown as byte
argo_workflows.go.memstats.frees.count
(count)
Total number of frees.
argo_workflows.go.memstats.gc.sys_bytes
(gauge)
Number of bytes used for garbage collection system metadata.
Shown as byte
argo_workflows.go.memstats.heap.alloc_bytes
(gauge)
Number of heap bytes allocated and still in use.
Shown as byte
argo_workflows.go.memstats.heap.idle_bytes
(gauge)
Number of heap bytes waiting to be used.
Shown as byte
argo_workflows.go.memstats.heap.inuse_bytes
(gauge)
Number of heap bytes that are in use.
Shown as byte
argo_workflows.go.memstats.heap.objects
(gauge)
Number of allocated objects.
argo_workflows.go.memstats.heap.released_bytes
(gauge)
Number of heap bytes released to OS.
Shown as byte
argo_workflows.go.memstats.heap.sys_bytes
(gauge)
Number of heap bytes obtained from system.
Shown as byte
argo_workflows.go.memstats.last_gc_time_seconds
(gauge)
Number of seconds since 1970 of last garbage collection.
Shown as second
argo_workflows.go.memstats.lookups.count
(count)
Total number of pointer lookups.
argo_workflows.go.memstats.mallocs.count
(count)
Total number of mallocs.
argo_workflows.go.memstats.mcache.inuse_bytes
(gauge)
Number of bytes in use by mcache structures.
Shown as byte
argo_workflows.go.memstats.mcache.sys_bytes
(gauge)
Number of bytes used for mcache structures obtained from system.
Shown as byte
argo_workflows.go.memstats.mspan.inuse_bytes
(gauge)
Number of bytes in use by mspan structures.
Shown as byte
argo_workflows.go.memstats.mspan.sys_bytes
(gauge)
Number of bytes used for mspan structures obtained from system.
Shown as byte
argo_workflows.go.memstats.next.gc_bytes
(gauge)
Number of heap bytes when next garbage collection will take place.
Shown as byte
argo_workflows.go.memstats.other.sys_bytes
(gauge)
Number of bytes used for other system allocations.
Shown as byte
argo_workflows.go.memstats.stack.inuse_bytes
(gauge)
Number of bytes in use by the stack allocator.
Shown as byte
argo_workflows.go.memstats.stack.sys_bytes
(gauge)
Number of bytes obtained from system for stack allocator.
Shown as byte
argo_workflows.go.memstats.sys_bytes
(gauge)
Number of bytes obtained from system.
Shown as byte
argo_workflows.go.threads
(gauge)
Number of OS threads created.
argo_workflows.k8s_request.count
(count)
Number of Kubernetes requests executed. https://argo-workflows.readthedocs.io/en/release-3.5/metrics/#argoworkflowsk8srequesttotal
Shown as request
argo_workflows.log_messages.count
(count)
Total number of log messages.
Shown as message
argo_workflows.operation_duration_seconds.bucket
(count)
The count of observations in the histogram of durations of operations split into buckets by upper bound.
argo_workflows.operation_duration_seconds.count
(count)
The total count of observations in the histogram of durations of operations
argo_workflows.operation_duration_seconds.sum
(count)
Total time in seconds spent on operations
Shown as second
argo_workflows.pods
(gauge)
Number of Pods from Workflows currently accessible by the controller by status (refreshed every 15s)
argo_workflows.queue_adds.count
(count)
Adds to the queue
argo_workflows.queue_depth
(gauge)
Depth of the queue
argo_workflows.queue_latency.bucket
(count)
The count of observations for the time that objects spend waiting in the queue, split into buckets by upper bound.
argo_workflows.queue_latency.count
(count)
The total count of observations for the time that objects spend waiting in the queue.
argo_workflows.queue_latency.sum
(count)
The total time that objects spend waiting in the queue.
Shown as second
argo_workflows.workers_busy
(gauge)
Number of workers currently busy
Shown as worker
argo_workflows.workflow_condition
(gauge)
Workflow condition. https://argo-workflows.readthedocs.io/en/release-3.5/metrics/#argoworkflowsworkflow_condition
argo_workflows.workflows_processed.count
(count)
Number of workflow updates processed

Events

The Argo Workflows integration does not include any events.

Service Checks

argo_workflows.openmetrics.health
Returns CRITICAL if the check cannot access the OpenMetrics metrics endpoint of Argo Workflows.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.
