Overview
This check monitors Flux through the Datadog Agent. Flux is a set of continuous and progressive delivery solutions for Kubernetes that is open and extensible.
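Setup
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates guide for how to apply these instructions.
Installation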
Starting from Agent release 7.51.0, the Fluxcd check is included in the Datadog Agent package. No additional installation is needed on your server.
For older versions of the Agent, use these steps to install the integration.
Configuration
This integration supports collecting metrics and logs from the following Flux services:
- helm-controller
- kustomize-controller
- notification-controller
- source-controller
You can pick and choose which services you monitor depending on your needs.
Metric collection
This is an example configuration with Kubernetes annotations on your Flux pods. See the sample configuration file for all available configuration options.
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/manager.checks: |-
      {
        "fluxcd": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'manager'
# (...)
Log collection
Available for Agent versions 6.0 or later
Flux logs can be collected from the different Flux pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
See the Autodiscovery Integration Templates guide and apply the following parameters.
| Parameter | Value |
|---|---|
| <LOG_CONFIG> | {"source": "fluxcd", "service": "<SERVICE_NAME>"} |
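As a rough sketch, the log configuration above could be applied as a Kubernetes annotation on a Flux pod, mirroring the metrics example. It assumes the container is named manager, as in that example, and uses the standard Autodiscovery annotation form ad.datadoghq.com/<CONTAINER_NAME>.logs. <POD_NAME> and <SERVICE_NAME> are placeholders to replace with your own values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # Assumption: the container is named "manager", matching the metrics example.
    # The value is a JSON list containing one <LOG_CONFIG> object.
    ad.datadoghq.com/manager.logs: '[{"source": "fluxcd", "service": "<SERVICE_NAME>"}]'
spec:
  containers:
    - name: 'manager'
```

Using a distinct service value per controller (for example, one for source-controller and another for helm-controller) is one way to keep their logs separable, though the naming is up to you.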
Validation
Run the Agent's status subcommand and look for fluxcd under the Checks section.
Data Collected
Metrics
| Metric (type) | Description |
|---|---|
| fluxcd.controller.runtime.active.workers (gauge) | Number of currently used workers per controller. Shown as worker. |
| fluxcd.controller.runtime.max.concurrent.reconciles (gauge) | Maximum number of concurrent reconciles per controller. |
| fluxcd.controller.runtime.reconcile.count (count) | Total number of reconciliations per controller. |
| fluxcd.controller.runtime.reconcile.errors.count (count) | Total number of reconciliation errors per controller. Shown as error. |
| fluxcd.controller.runtime.reconcile.time.seconds.bucket (count) | Bucket of length of time per reconciliation per controller. |
| fluxcd.controller.runtime.reconcile.time.seconds.count (count) | Count of length of time per reconciliation per controller. |
| fluxcd.controller.runtime.reconcile.time.seconds.sum (count) | Sum of length of time per reconciliation per controller. Shown as second. |
| fluxcd.gotk.reconcile.condition (gauge) | The current condition status of a GitOps Toolkit resource reconciliation. |
| fluxcd.gotk.reconcile.duration.seconds.bucket (count) | Bucket of the duration in seconds of a GitOps Toolkit resource reconciliation. |
| fluxcd.gotk.reconcile.duration.seconds.count (count) | Count of the duration in seconds of a GitOps Toolkit resource reconciliation. |
| fluxcd.gotk.reconcile.duration.seconds.sum (count) | Sum of the duration in seconds of a GitOps Toolkit resource reconciliation. Shown as second. |
| fluxcd.gotk.suspend.status (gauge) | The current suspend status of a GitOps Toolkit resource. |
| fluxcd.leader_election_master_status (gauge) | Whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Make sure to group by name. |
| fluxcd.process.cpu_seconds.count (count) | Total user and system CPU time spent in seconds. Shown as second. |
| fluxcd.process.max_fds (gauge) | Maximum number of open file descriptors. |
| fluxcd.process.open_fds (gauge) | Number of open file descriptors. |
| fluxcd.process.resident_memory (gauge) | Resident memory size in bytes. Shown as byte. |
| fluxcd.process.start_time (gauge) | Start time of the process since unix epoch in seconds. Shown as second. |
| fluxcd.process.virtual_memory (gauge) | Virtual memory size in bytes. Shown as byte. |
| fluxcd.process.virtual_memory.max (gauge) | Maximum amount of virtual memory available in bytes. Shown as byte. |
| fluxcd.rest_client_requests.count (count) | Number of HTTP requests, partitioned by status code, method, and host. Shown as request. |
| fluxcd.workqueue.adds.count (count) | Total number of adds handled by a workqueue. |
| fluxcd.workqueue.depth (gauge) | Current depth of a workqueue. |
| fluxcd.workqueue.longest_running_processor (gauge) | How long, in seconds, the longest-running processor for a workqueue has been running. Shown as second. |
| fluxcd.workqueue.retries.count (count) | Total number of retries handled by a workqueue. |
| fluxcd.workqueue.unfinished_work (gauge) | The number of seconds of work that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. The number of stuck threads can be deduced from the rate at which this value increases. Shown as second. |
Events
The fluxcd integration does not include any events.
Service Checks
fluxcd.openmetrics.health
Returns CRITICAL if the check cannot access the OpenMetrics metrics endpoint of Fluxcd.
Statuses: ok, critical
Troubleshooting
Need help? Contact Datadog support.
Further Reading
Additional helpful documentation, links, and articles: