Datadog Cluster Agent

Supported OS: Linux, Windows

Integration version: 2.4.0

Overview

This check monitors the Datadog Cluster Agent through the Datadog Agent.

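Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.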

Installation

The Datadog-Cluster-Agent check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

  1. To start collecting datadog_cluster_agent performance data, edit the datadog_cluster_agent.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample datadog_cluster_agent.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
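The two steps above can be sketched in Python as follows. This is a minimal sketch, assuming a Linux host whose Agent configuration directory is /etc/datadog-agent and a Cluster Agent that exposes Prometheus-format metrics at http://localhost:5000/metrics; confirm both against the sample datadog_cluster_agent.d/conf.yaml for your Agent version. The helpers conf_path and render_conf are hypothetical.

```python
from pathlib import Path

# Assumption: default Agent configuration directory on Linux hosts.
DEFAULT_CONFIG_DIR = Path("/etc/datadog-agent")

# Assumption: the Cluster Agent serves Prometheus-format metrics on port 5000.
DEFAULT_PROMETHEUS_URL = "http://localhost:5000/metrics"


def conf_path(config_dir: Path = DEFAULT_CONFIG_DIR) -> Path:
    """Return the location of the check's conf.yaml under conf.d/ (hypothetical helper)."""
    return config_dir / "conf.d" / "datadog_cluster_agent.d" / "conf.yaml"


def render_conf(prometheus_url: str = DEFAULT_PROMETHEUS_URL) -> str:
    """Render a minimal single-instance conf.yaml as plain text (hypothetical helper)."""
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - prometheus_url: {prometheus_url}\n"
    )


if __name__ == "__main__":
    # Print the target path and the file contents instead of writing them,
    # since writing to the configuration directory usually needs root.
    print(conf_path())
    print(render_conf(), end="")
```

Writing the output of render_conf() to conf_path() requires write access to the configuration directory; the Agent only picks up the new instance after the restart in step 2.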

Validation

Run the Agent's status subcommand and look for datadog_cluster_agent under the Checks section.
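The validation step can be scripted. This minimal Python sketch assumes the datadog-agent binary is on the PATH and that each check in the status output appears on its own line beginning with the check name (for example "datadog_cluster_agent (2.4.0)"). The status layout can change between Agent versions, so treat the parser as illustrative; check_listed and agent_status are hypothetical names.

```python
import subprocess


def check_listed(status_output: str, check_name: str = "datadog_cluster_agent") -> bool:
    """Return True if any line's first whitespace-separated token is the check name."""
    for line in status_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == check_name:
            return True
    return False


def agent_status() -> str:
    """Run `datadog-agent status` and return its stdout (may need elevated privileges)."""
    result = subprocess.run(
        ["datadog-agent", "status"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


if __name__ == "__main__":
    print(check_listed(agent_status()))
```

Separating the parser from the subprocess call keeps the matching logic testable without a running Agent.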

Data Collected

Metrics

datadog.cluster_agent.admission_webhooks.certificate_expiry
(gauge)
Time left before the certificate expires
Shown as hour
datadog.cluster_agent.admission_webhooks.mutation_attempts
(gauge)
Number of pod mutation attempts by mutation type
datadog.cluster_agent.admission_webhooks.mutation_errors
(gauge)
Number of mutation failures by mutation type
datadog.cluster_agent.admission_webhooks.reconcile_errors
(gauge)
Number of reconcile errors per controller
datadog.cluster_agent.admission_webhooks.reconcile_success
(gauge)
Number of reconcile successes per controller
Shown as success
datadog.cluster_agent.admission_webhooks.webhooks_received
(gauge)
Number of mutation webhook requests received
datadog.cluster_agent.aggregator.flush
(count)
Number of metrics/service checks/events flushed by (data_type, state)
datadog.cluster_agent.aggregator.processed
(count)
Number of metrics, service checks, and events processed by the aggregator, by data type
datadog.cluster_agent.api_requests
(count)
Requests made to the Cluster Agent API, by (handler, status)
Shown as request
datadog.cluster_agent.cluster_checks.busyness
(gauge)
Busyness of a node per the number of metrics submitted and average duration of all checks run
datadog.cluster_agent.cluster_checks.configs_dangling
(gauge)
Number of check configurations not dispatched
datadog.cluster_agent.cluster_checks.configs_dispatched
(gauge)
Number of check configurations dispatched by node
datadog.cluster_agent.endpoint_checks.configs_dispatched
(gauge)
Number of endpoint-check configurations dispatched by node
datadog.cluster_agent.cluster_checks.configs_info
(gauge)
Information about check configurations dispatched (node and check ID)
datadog.cluster_agent.cluster_checks.failed_stats_collection
(count)
Total number of unsuccessful stats collection attempts
datadog.cluster_agent.cluster_checks.nodes_reporting
(gauge)
Number of node agents reporting
datadog.cluster_agent.cluster_checks.rebalancing_decisions
(count)
Total number of check rebalancing decisions
datadog.cluster_agent.cluster_checks.rebalancing_duration_seconds
(gauge)
Duration of the last execution of the check rebalancing algorithm
Shown as second
datadog.cluster_agent.cluster_checks.successful_rebalancing_moves
(count)
Total number of successful check rebalancing decisions
Shown as check
datadog.cluster_agent.cluster_checks.updating_stats_duration_seconds
(gauge)
Duration of collecting stats from check runners and updating cache
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.limit
(gauge)
Maximum number of queries to the Datadog API allowed in the period by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.period
(gauge)
Period of rate limiting for the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.remaining
(gauge)
Number of queries to the Datadog API remaining before next reset by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.reset
(gauge)
Number of seconds before next reset applied to the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.requests
(count)
Requests made to Datadog by status
Shown as request
datadog.cluster_agent.external_metrics
(gauge)
Number of external metrics tagged
datadog.cluster_agent.external_metrics.datadog_metrics
(gauge)
The "valid" label is true if the DatadogMetric custom resource (CR) is valid, and false otherwise
datadog.cluster_agent.external_metrics.delay_seconds
(gauge)
Freshness of the metric evaluated from querying Datadog
Shown as second
datadog.cluster_agent.external_metrics.processed_value
(gauge)
Value processed from querying Datadog by metric
datadog.cluster_agent.secret_backend.elapsed
(gauge)
The elapsed time of secret backend invocation
Shown as millisecond
datadog.cluster_agent.go.goroutines
(gauge)
Number of goroutines that currently exist
datadog.cluster_agent.go.memstats.alloc_bytes
(gauge)
Number of bytes allocated and still in use
Shown as byte
datadog.cluster_agent.go.threads
(gauge)
Number of OS threads created
Shown as thread
datadog.cluster_agent.autodiscovery.errors
(gauge)
Number of Autodiscovery errors
datadog.cluster_agent.autodiscovery.watched_resources
(gauge)
Number of watched resources (Services and Endpoints)
datadog.cluster_agent.autodiscovery.poll_duration.count
(count)
Autodiscovery poll duration count
datadog.cluster_agent.autodiscovery.poll_duration.sum
(count)
Autodiscovery poll duration sum
datadog.cluster_agent.admission_webhooks.library_injection_attempts
(count)
Number of library injection attempts by language
datadog.cluster_agent.admission_webhooks.library_injection_errors
(count)
Number of library injection failures by language
datadog.cluster_agent.kubernetes_apiserver.kube_events
(count)
Kubernetes events processed by the kubernetes_apiserver check
datadog.cluster_agent.kubernetes_apiserver.emitted_events
(count)
Datadog events emitted by the kubernetes_apiserver check
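Several of the metrics above are most useful in combination. The Python sketch below computes three derived values from raw metric values you have already retrieved (for example, through a Datadog query over one time window): the mean Autodiscovery poll duration is poll_duration.sum divided by poll_duration.count, the fraction of the Datadog API rate limit already consumed is 1 - remaining / limit, and certificate_expiry, reported in hours, converts to days by dividing by 24. The function names and sample values are hypothetical.

```python
def mean_poll_duration(duration_sum: float, duration_count: float) -> float:
    """Average Autodiscovery poll duration, in the same unit as poll_duration.sum.

    Both inputs should cover the same time window.
    """
    if duration_count <= 0:
        raise ValueError("poll_duration.count must be positive")
    return duration_sum / duration_count


def rate_limit_used_fraction(limit: float, remaining: float) -> float:
    """Fraction of the per-period Datadog API query limit already consumed."""
    if limit <= 0:
        raise ValueError("rate_limit_queries.limit must be positive")
    return 1.0 - remaining / limit


def cert_days_left(certificate_expiry_hours: float) -> float:
    """Convert admission_webhooks.certificate_expiry (hours) to days."""
    return certificate_expiry_hours / 24.0


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    print(mean_poll_duration(12.0, 48))        # 0.25
    print(rate_limit_used_fraction(300, 225))  # 0.25
    print(cert_days_left(720))                 # 30.0
```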

Events

The Datadog Cluster Agent integration does not include any events.

Service Checks

datadog.cluster_agent.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical
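The rule behind this service check can be illustrated in Python. This is a sketch of the documented behavior, not the integration's own code: it assumes the metrics endpoint is http://localhost:5000/metrics and treats any connection failure, timeout, or HTTP error status as "cannot access". The names http_fetch and health_status are hypothetical.

```python
import urllib.request
from typing import Callable

OK, CRITICAL = "ok", "critical"

# Assumption: the Cluster Agent's Prometheus metrics endpoint.
METRICS_URL = "http://localhost:5000/metrics"


def http_fetch(url: str = METRICS_URL, timeout: float = 5.0) -> bytes:
    """Read the endpoint body; urlopen raises on network failures and HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def health_status(fetch: Callable[[], bytes]) -> str:
    """Return CRITICAL if the metrics endpoint cannot be read, OK otherwise."""
    try:
        fetch()
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return CRITICAL
    return OK


if __name__ == "__main__":
    print(health_status(http_fetch))
```

Passing the fetch function in as a parameter lets the status rule be exercised without a live Cluster Agent.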

Troubleshooting

Need help? Contact Datadog support.