fluxcd

Supported OS: Linux, Windows, Mac OS

Integration version: 2.0.0

Overview

This check monitors Flux through the Datadog Agent. Flux is a set of continuous and progressive delivery solutions for Kubernetes that is open and extensible.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

Starting from Agent release 7.51.0, the Fluxcd check is included in the Datadog Agent package. No additional installation is needed on your server.

For older versions of the Agent, use these steps to install the integration.

Configuration

This integration supports collecting metrics and logs from the following Flux services:

  • helm-controller
  • kustomize-controller
  • notification-controller
  • source-controller

You can monitor any subset of these services, depending on your needs.

Metric collection

This is an example configuration with Kubernetes annotations on your Flux pods. See the sample configuration file for all available configuration options.

apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/manager.checks: |-
      {
        "fluxcd": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'manager'
# (...)
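
Flux controllers usually run as Deployments rather than standalone Pods, so in practice the annotation typically goes on the Deployment's pod template. The fragment below is a minimal sketch of monitoring a single service, source-controller. It assumes the controller is deployed as a Deployment named source-controller in the flux-system namespace and that its container is named manager, as in the Pod example above. Adjust these names and the metrics port to match your cluster. It is a partial manifest meant to be merged into the existing Deployment (for example as a patch), not applied on its own.

```yaml
# Partial Deployment manifest (patch fragment), not a complete resource.
# Assumed names: Deployment "source-controller", namespace "flux-system",
# container "manager". Verify these against your own Flux installation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: flux-system
spec:
  template:
    metadata:
      annotations:
        # The "manager" segment of the key must match the container name.
        ad.datadoghq.com/manager.checks: |-
          {
            "fluxcd": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics"
                }
              ]
            }
          }
```

Repeating the same annotation on the other controllers you care about (helm-controller, kustomize-controller, notification-controller) extends monitoring to those services.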

Log collection

Available for Agent versions >6.0

Flux logs can be collected from the different Flux pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
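
As one illustration of enabling log collection, the sketch below assumes the Agent was deployed with the Datadog Helm chart. This is an assumption. Other installation methods use different settings, which the Kubernetes Log Collection documentation covers.

```yaml
# Helm values override for the Datadog chart (assumed deployment method).
datadog:
  logs:
    # Turns on the Agent's log collection, which is disabled by default.
    enabled: true
    # Optional: set containerCollectAll to true to collect logs from every
    # container instead of only those with a log configuration.
    containerCollectAll: false
```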

See the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter       Value
<LOG_CONFIG>    {"source": "fluxcd", "service": "<SERVICE_NAME>"}
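
With Kubernetes annotations, the log configuration above is typically supplied as a JSON list in a logs annotation on the Flux pod. The following is a minimal sketch that assumes the container is named manager, as in the metric collection example. <POD_NAME> and <SERVICE_NAME> are placeholders to fill in.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # The "manager" segment of the key must match the container name below.
    ad.datadoghq.com/manager.logs: '[{"source": "fluxcd", "service": "<SERVICE_NAME>"}]'
spec:
  containers:
    - name: 'manager'
```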

Validation

Run the Agent’s status subcommand and look for fluxcd under the Checks section.

Data Collected

Metrics

fluxcd.controller.runtime.active.workers
(gauge)
Number of currently used workers per controller.
Shown as worker
fluxcd.controller.runtime.max.concurrent.reconciles
(gauge)
Maximum number of concurrent reconciles per controller.
fluxcd.controller.runtime.reconcile.count
(count)
Total number of reconciliations per controller.
fluxcd.controller.runtime.reconcile.errors.count
(count)
Total number of reconciliation errors per controller.
Shown as error
fluxcd.controller.runtime.reconcile.time.seconds.bucket
(count)
Bucket of length of time per reconciliation per controller.
fluxcd.controller.runtime.reconcile.time.seconds.count
(count)
Count of length of time per reconciliation per controller.
fluxcd.controller.runtime.reconcile.time.seconds.sum
(count)
Sum of length of time per reconciliation per controller.
Shown as second
fluxcd.gotk.reconcile.condition
(gauge)
The current condition status of a GitOps Toolkit resource reconciliation.
fluxcd.gotk.reconcile.duration.seconds.bucket
(count)
Bucket of the duration in seconds of a GitOps Toolkit resource reconciliation.
fluxcd.gotk.reconcile.duration.seconds.count
(count)
Count of the duration in seconds of a GitOps Toolkit resource reconciliation.
fluxcd.gotk.reconcile.duration.seconds.sum
(count)
Sum of the duration in seconds of a GitOps Toolkit resource reconciliation.
Shown as second
fluxcd.gotk.suspend.status
(gauge)
The current suspend status of a GitOps Toolkit resource.
fluxcd.leader_election_master_status
(gauge)
Whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Make sure to group by name.
fluxcd.process.cpu_seconds.count
(count)
Total user and system CPU time spent in seconds.
Shown as second
fluxcd.process.max_fds
(gauge)
Maximum number of open file descriptors.
fluxcd.process.open_fds
(gauge)
Number of open file descriptors.
fluxcd.process.resident_memory
(gauge)
Resident memory size in bytes.
Shown as byte
fluxcd.process.start_time
(gauge)
Start time of the process since unix epoch in seconds.
Shown as second
fluxcd.process.virtual_memory
(gauge)
Virtual memory size in bytes.
Shown as byte
fluxcd.process.virtual_memory.max
(gauge)
Maximum amount of virtual memory available in bytes.
Shown as byte
fluxcd.rest_client_requests.count
(count)
Number of HTTP requests, partitioned by status code, method, and host.
Shown as request
fluxcd.workqueue.adds.count
(count)
Total number of adds handled by a workqueue.
fluxcd.workqueue.depth
(gauge)
Current depth of a workqueue.
fluxcd.workqueue.longest_running_processor
(gauge)
How many seconds the longest-running processor for a workqueue has been running.
Shown as second
fluxcd.workqueue.retries.count
(count)
Total number of retries handled by a workqueue.
fluxcd.workqueue.unfinished_work
(gauge)
How many seconds of work is currently in progress and has not yet been observed by work_duration. Large values indicate stuck threads. The number of stuck threads can be deduced from the rate at which this value increases.
Shown as second

Events

The fluxcd integration does not include any events.

Service Checks

fluxcd.openmetrics.health
Returns CRITICAL if the check cannot access the OpenMetrics endpoint of Flux.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.
