Supported OS: Linux, Windows, macOS

Integration version: 1.0.2

Overview

This check monitors Argo Rollouts through the Datadog Agent.

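Setup

Follow the instructions below to install and configure this check for an Agent running in a Kubernetes environment. For more information about configuration in containerized environments, see the Autodiscovery Integration Templates guide.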

Installation

Starting with Agent release 7.53.0, the Argo Rollouts check is included in the Datadog Agent package. No additional installation is needed in your environment.

This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint that Argo Rollouts exposes, which requires Python 3.

Configuration

The Argo Rollouts controller exposes Prometheus-formatted metrics at /metrics on port 8090. For the Agent to start collecting metrics, the Argo Rollouts pods must be annotated. For more information about annotations, see the Autodiscovery Integration Templates guide. For additional configuration options, see the sample argo_rollouts.d/conf.yaml.

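Note: The listed metrics can be collected only when they are available. Some metrics are generated only after certain actions occur. For example, the argo_rollouts.rollout.info.replicas.updated metric is exposed only after a replica has been updated.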

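The only parameter required to configure the Argo Rollouts check is:

  • openmetrics_endpoint: Set this parameter to the location where the Prometheus-formatted metrics are exposed. The default port is 8090. In containerized environments, use %%host%% for host autodetection.
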
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/argo-rollouts.checks: |
      {
        "argo_rollouts": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8090/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'argo-rollouts'
# (...)
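
Where pod annotations are not practical, the same required parameter could be set in a static configuration file instead. The following is a minimal sketch of argo_rollouts.d/conf.yaml, assuming the controller is reachable at the placeholder host <ARGO_ROLLOUTS_HOST> (a hypothetical value, not taken from this page) on the default port 8090:

```yaml
# argo_rollouts.d/conf.yaml (illustrative sketch; the host is a placeholder)
init_config:

instances:
    # The only required parameter: the location where the
    # Prometheus-formatted metrics are exposed.
  - openmetrics_endpoint: "http://<ARGO_ROLLOUTS_HOST>:8090/metrics"
```

Any other options should be taken from the sample argo_rollouts.d/conf.yaml rather than from this sketch.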

Log collection

Available for Agent version 6.0 and later

Argo Rollouts logs can be collected from the different Argo Rollouts pods through Kubernetes. Log collection is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
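
As one possible way to turn log collection on, the sketch below assumes the Agent is deployed with the Datadog Helm chart, which this page does not state. Treat the Kubernetes Log Collection guide as the authoritative source for your deployment method.

```yaml
# values.yaml for the Datadog Helm chart (assumed deployment method)
datadog:
  logs:
    # Enable the Agent's log collection, which is off by default.
    enabled: true
    # Collect logs from all discovered containers; set to false to
    # rely only on per-pod log annotations.
    containerCollectAll: true
```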

See the Autodiscovery Integration Templates guide to apply the following parameter.

Parameter       Value
<LOG_CONFIG>    {"source": "argo_rollouts", "service": "<SERVICE_NAME>"}
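
The log parameter above can be applied as a pod annotation, in the same way as the metrics annotation. This is a minimal sketch, assuming the container is named argo-rollouts as in the earlier example. <POD_NAME> and <SERVICE_NAME> are placeholders to replace with your own values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # The annotation key uses the container name (assumed: argo-rollouts).
    # The value is a JSON list that wraps the <LOG_CONFIG> object.
    ad.datadoghq.com/argo-rollouts.logs: '[{"source": "argo_rollouts", "service": "<SERVICE_NAME>"}]'
spec:
  containers:
    - name: 'argo-rollouts'
```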

Validation

Run the Agent's status subcommand and look for argo_rollouts under the Checks section.

Data Collected

Metrics

argo_rollouts.analysis.run.info
(gauge)
Information about analysis run
argo_rollouts.analysis.run.metric.phase
(gauge)
Information on the duration of a specific metric in the Analysis Run
argo_rollouts.analysis.run.metric.type
(gauge)
Information on the type of a specific metric in the Analysis Runs
argo_rollouts.analysis.run.phase
(gauge)
Information on the state of the Analysis Run
argo_rollouts.analysis.run.reconcile.bucket
(count)
The number of observations in the Analysis Run reconciliation performance histogram by upper_bound buckets
argo_rollouts.analysis.run.reconcile.count
(count)
The number of observations in the Analysis Run reconciliation performance histogram
argo_rollouts.analysis.run.reconcile.error.count
(count)
Error occurring during the analysis run
argo_rollouts.analysis.run.reconcile.sum
(count)
The duration sum of all observations in the Analysis Run reconciliation performance histogram
argo_rollouts.controller.clientset.k8s.request.count
(count)
The total number of Kubernetes requests executed during application reconciliation
argo_rollouts.experiment.info
(gauge)
Information about Experiment
argo_rollouts.experiment.phase
(gauge)
Information on the state of the experiment
argo_rollouts.experiment.reconcile.bucket
(count)
The number of observations in the Experiments reconciliation performance histogram by upper_bound buckets
argo_rollouts.experiment.reconcile.count
(count)
The number of observations in the Experiments reconciliation performance histogram
argo_rollouts.experiment.reconcile.error.count
(count)
Error occurring during the experiment
argo_rollouts.experiment.reconcile.sum
(count)
The duration sum of all observations in the Experiments reconciliation performance histogram
argo_rollouts.go.gc.duration.seconds.count
(count)
The summary count of garbage collection cycles in the Argo Rollouts instance
Shown as second
argo_rollouts.go.gc.duration.seconds.quantile
(gauge)
A summary of the pause duration of garbage collection cycles in the Argo Rollouts instance
Shown as second
argo_rollouts.go.gc.duration.seconds.sum
(count)
The sum of the pause duration of garbage collection cycles in the Argo Rollouts instance
Shown as second
argo_rollouts.go.goroutines
(gauge)
The number of goroutines that currently exist in the Argo Rollouts instance
argo_rollouts.go.info
(gauge)
Metric containing the Go version as a tag
argo_rollouts.go.memstats.alloc_bytes
(gauge)
The number of bytes allocated and still in use in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.alloc_bytes.count
(count)
The monotonic count of bytes allocated, even if freed, in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.buck_hash.sys_bytes
(gauge)
The number of bytes used by the profiling bucket hash table in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.frees.count
(count)
The total number of frees in the Argo Rollouts instance
argo_rollouts.go.memstats.gc.cpu_fraction
(gauge)
The fraction of this program's available CPU time used by the GC since the program started in the Argo Rollouts instance
Shown as fraction
argo_rollouts.go.memstats.gc.sys_bytes
(gauge)
The number of bytes used for garbage collection system metadata in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.alloc_bytes
(gauge)
The number of heap bytes allocated and still in use in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.idle_bytes
(gauge)
The number of heap bytes waiting to be used in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.inuse_bytes
(gauge)
The number of heap bytes that are in use in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.objects
(gauge)
The number of allocated objects in the Argo Rollouts instance
Shown as object
argo_rollouts.go.memstats.heap.released_bytes
(gauge)
The number of heap bytes released to the OS in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.heap.sys_bytes
(gauge)
The number of heap bytes obtained from system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.lookups.count
(count)
The number of pointer lookups
argo_rollouts.go.memstats.mallocs.count
(count)
The number of mallocs
argo_rollouts.go.memstats.mcache.inuse_bytes
(gauge)
The number of bytes in use by mcache structures in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.mcache.sys_bytes
(gauge)
The number of bytes used for mcache structures obtained from system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.mspan.inuse_bytes
(gauge)
The number of bytes in use by mspan structures in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.mspan.sys_bytes
(gauge)
The number of bytes used for mspan structures obtained from system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.next.gc_bytes
(gauge)
The number of heap bytes when next garbage collection takes place in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.other.sys_bytes
(gauge)
The number of bytes used for other system allocations in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.stack.inuse_bytes
(gauge)
The number of bytes in use by the stack allocator in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.stack.sys_bytes
(gauge)
The number of bytes obtained from system for stack allocator in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.memstats.sys_bytes
(gauge)
The number of bytes obtained from system in the Argo Rollouts instance
Shown as byte
argo_rollouts.go.threads
(gauge)
The number of OS threads created in the Argo Rollouts instance
Shown as thread
argo_rollouts.notification.send.bucket
(count)
The number of observations in the Notification send performance histogram by upper_bound buckets
argo_rollouts.notification.send.count
(count)
The number of observations in the Notification send performance histogram
argo_rollouts.notification.send.sum
(count)
The duration sum of all observations in the Notification send performance histogram
argo_rollouts.process.cpu.seconds.count
(count)
The total user and system CPU time spent in seconds in the Argo Rollouts instance
Shown as second
argo_rollouts.process.max_fds
(gauge)
The maximum number of open file descriptors in the Argo Rollouts instance
argo_rollouts.process.open_fds
(gauge)
The number of open file descriptors in the Argo Rollouts instance
argo_rollouts.process.resident_memory.bytes
(gauge)
The resident memory size in bytes in the Argo Rollouts instance
Shown as byte
argo_rollouts.process.start_time.seconds
(gauge)
The start time of the process since unix epoch in seconds in the Argo Rollouts instance
Shown as second
argo_rollouts.process.virtual_memory.bytes
(gauge)
The virtual memory size in bytes in the Argo Rollouts instance
Shown as byte
argo_rollouts.process.virtual_memory.max_bytes
(gauge)
The maximum amount of virtual memory available in bytes in the Argo Rollouts instance
Shown as byte
argo_rollouts.rollout.events.count
(count)
The count of rollout events
argo_rollouts.rollout.info
(gauge)
Information about rollout
argo_rollouts.rollout.info.replicas.available
(gauge)
The number of available replicas per rollout
argo_rollouts.rollout.info.replicas.desired
(gauge)
The number of desired replicas per rollout
argo_rollouts.rollout.info.replicas.unavailable
(gauge)
The number of unavailable replicas per rollout
argo_rollouts.rollout.info.replicas.updated
(gauge)
The number of updated replicas per rollout
argo_rollouts.rollout.phase
(gauge)
Information on the state of the rollout. This metric will soon be deprecated by Argo Rollouts; use argo_rollouts.rollout.info instead
argo_rollouts.rollout.reconcile.bucket
(count)
The number of observations in the Rollout reconciliation performance histogram by upper_bound buckets
argo_rollouts.rollout.reconcile.count
(count)
The number of observations in the Rollout reconciliation performance histogram
argo_rollouts.rollout.reconcile.error.count
(count)
Error occurring during the rollout
argo_rollouts.rollout.reconcile.sum
(count)
The duration sum of all observations in the Rollout reconciliation performance histogram
argo_rollouts.workqueue.adds.count
(count)
The total number of adds handled by workqueue
argo_rollouts.workqueue.depth
(gauge)
The current depth of the workqueue
argo_rollouts.workqueue.longest.running_processor.seconds
(gauge)
The number of seconds the longest running workqueue processor has been running
Shown as second
argo_rollouts.workqueue.queue.duration.seconds.bucket
(count)
The histogram bucket of how long in seconds an item stays in the workqueue before being requested
Shown as second
argo_rollouts.workqueue.queue.duration.seconds.count
(count)
The total number of events in the workqueue duration histogram
argo_rollouts.workqueue.queue.duration.seconds.sum
(count)
The sum of the events counted in the workqueue duration histogram
argo_rollouts.workqueue.retries.count
(count)
The total number of retries handled by workqueue
argo_rollouts.workqueue.unfinished_work.seconds
(gauge)
The number of seconds of work that has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases
Shown as second
argo_rollouts.workqueue.work.duration.seconds.bucket
(count)
The histogram bucket for time in seconds it takes for processing of an item in the workqueue
Shown as second
argo_rollouts.workqueue.work.duration.seconds.count
(count)
The total number of events in the workqueue item processing duration histogram
argo_rollouts.workqueue.work.duration.seconds.sum
(count)
The sum of events in the workqueue item processing duration histogram

Events

The Argo Rollouts integration does not include any events.

Service Checks

argo_rollouts.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Argo Rollouts OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.
