Datadog Cluster Agent

Supported OS: Mac OS, Windows

Integration version: 3.1.1

Overview

This check monitors the Datadog Cluster Agent through the Datadog Agent.

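Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.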

Installation

The Datadog-Cluster-Agent check is included in the Datadog Agent package. No additional installation is needed on your server.

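Configuration

  1. Edit the datadog_cluster_agent.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your datadog_cluster_agent performance data. See the sample datadog_cluster_agent.d/conf.yaml for all available configuration options.

  2. Restart the Agent.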
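
As a rough illustration of step 1, the sketch below renders a minimal conf.yaml and writes it under a conf.d/ directory. The `prometheus_url` key and the example URL `http://localhost:5000/metrics` are assumptions that do not come from this page, and `render_conf` and `write_conf` are hypothetical helper names; check the sample datadog_cluster_agent.d/conf.yaml for the options your Agent version actually accepts.

```python
from pathlib import Path


def render_conf(prometheus_url: str) -> str:
    """Return the text of a minimal conf.yaml for the check.

    Assumption: the instance option is named `prometheus_url`.
    Confirm the key against the sample conf.yaml before using it.
    """
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - prometheus_url: {prometheus_url}\n"
    )


def write_conf(conf_d: Path, prometheus_url: str) -> Path:
    """Write conf.yaml into <conf_d>/datadog_cluster_agent.d/ and return its path."""
    target = conf_d / "datadog_cluster_agent.d" / "conf.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_conf(prometheus_url), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with your Cluster Agent metrics URL.
    print(render_conf("http://localhost:5000/metrics"))
```

The Agent only picks up the file after the restart in step 2.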

Validation

Run the Agent's status subcommand and look for datadog_cluster_agent under the Checks section.
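
The validation step can be scripted along these lines. This is a minimal sketch that assumes the status subcommand is invoked as `datadog-agent status` (the exact invocation, and whether elevated privileges are needed, depends on the platform). It does a plain substring search rather than parsing the Checks section, because the layout of the status output is not described here. `run_agent_status` and `check_is_listed` are hypothetical helper names.

```python
import subprocess


def run_agent_status(command=("datadog-agent", "status")) -> str:
    """Run the Agent status subcommand and return its standard output.

    Assumption: the Agent binary is on PATH under this name.
    """
    completed = subprocess.run(
        list(command), capture_output=True, text=True, check=True
    )
    return completed.stdout


def check_is_listed(status_output: str, check_name: str = "datadog_cluster_agent") -> bool:
    """Return True if any line of the status output mentions the check name."""
    return any(check_name in line for line in status_output.splitlines())


if __name__ == "__main__":
    output = run_agent_status()
    print("check found" if check_is_listed(output) else "check not found")
```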

Data Collected

Metrics

datadog.cluster_agent.admission_webhooks.certificate_expiry
(gauge)
Time left before the certificate expires
Shown as hour
datadog.cluster_agent.admission_webhooks.cws_exec_instrumentation_attempts.count
(count)
CWS exec Instrumentation attempts count
datadog.cluster_agent.admission_webhooks.cws_exec_instrumentation_attempts.sum
(count)
CWS exec Instrumentation attempts sum
datadog.cluster_agent.admission_webhooks.cws_pod_instrumentation_attempts.count
(count)
CWS pod Instrumentation attempts count
datadog.cluster_agent.admission_webhooks.cws_pod_instrumentation_attempts.sum
(count)
CWS pod Instrumentation attempts sum
datadog.cluster_agent.admission_webhooks.library_injection_attempts
(count)
Number of library injection attempts by language
datadog.cluster_agent.admission_webhooks.library_injection_errors
(count)
Number of library injection failures by language
datadog.cluster_agent.admission_webhooks.mutation_attempts
(gauge)
Number of pod mutation attempts by mutation type
datadog.cluster_agent.admission_webhooks.mutation_errors
(gauge)
Number of mutation failures by mutation type
datadog.cluster_agent.admission_webhooks.patcher.attempts
(count)
Number of patch attempts
datadog.cluster_agent.admission_webhooks.patcher.completed
(count)
Number of completed patch attempts
datadog.cluster_agent.admission_webhooks.patcher.errors
(count)
Number of patch errors
datadog.cluster_agent.admission_webhooks.rc_provider.configs
(gauge)
Number of valid remote configurations
datadog.cluster_agent.admission_webhooks.rc_provider.invalid_configs
(gauge)
Number of invalid remote configurations
datadog.cluster_agent.admission_webhooks.reconcile_errors
(gauge)
Number of reconcile errors per controller
datadog.cluster_agent.admission_webhooks.reconcile_success
(gauge)
Number of reconcile successes per controller
Shown as success
datadog.cluster_agent.admission_webhooks.response_duration.count
(count)
Webhook response duration count
datadog.cluster_agent.admission_webhooks.response_duration.sum
(count)
Webhook response duration sum
Shown as second
datadog.cluster_agent.admission_webhooks.webhooks_received
(gauge)
Number of mutation webhook requests received
datadog.cluster_agent.aggregator.flush
(count)
Number of metrics/service checks/events flushed by (data_type, state)
datadog.cluster_agent.aggregator.processed
(count)
Number of metrics/service checks/events processed by the aggregator by data type
datadog.cluster_agent.api_requests
(count)
Requests made to the cluster agent API by (handler, status)
Shown as request
datadog.cluster_agent.autodiscovery.errors
(gauge)
Number of Autodiscovery errors
datadog.cluster_agent.autodiscovery.poll_duration.count
(count)
Autodiscovery poll duration count
datadog.cluster_agent.autodiscovery.poll_duration.sum
(count)
Autodiscovery poll duration sum
Shown as second
datadog.cluster_agent.autodiscovery.watched_resources
(gauge)
Number of watched resources (Services and Endpoints)
datadog.cluster_agent.cluster_checks.busyness
(gauge)
Busyness of a node per the number of metrics submitted and average duration of all checks run
datadog.cluster_agent.cluster_checks.configs_dangling
(gauge)
Number of check configurations not dispatched
datadog.cluster_agent.cluster_checks.configs_dispatched
(gauge)
Number of check configurations dispatched by node
datadog.cluster_agent.cluster_checks.configs_info
(gauge)
Information about check configurations dispatched (node and check ID)
datadog.cluster_agent.cluster_checks.failed_stats_collection
(count)
Total number of unsuccessful stats collection attempts
datadog.cluster_agent.cluster_checks.nodes_reporting
(gauge)
Number of node agents reporting
datadog.cluster_agent.cluster_checks.rebalancing_decisions
(count)
Total number of check rebalancing decisions
datadog.cluster_agent.cluster_checks.rebalancing_duration_seconds
(gauge)
Duration of the last execution of the check rebalancing algorithm
Shown as second
datadog.cluster_agent.cluster_checks.successful_rebalancing_moves
(count)
Total number of successful check rebalancing decisions
Shown as check
datadog.cluster_agent.cluster_checks.updating_stats_duration_seconds
(gauge)
Duration of collecting stats from check runners and updating cache
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.limit
(gauge)
Maximum number of queries to the Datadog API allowed in the period by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.period
(gauge)
Period of rate limiting for the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.remaining
(gauge)
Number of queries to the Datadog API remaining before next reset by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.remaining_min
(gauge)
Minimum number of queries remaining before next reset observed during an expiration interval of 2*refresh period
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.reset
(gauge)
Number of seconds before next reset applied to the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.requests
(count)
Requests made to Datadog by status
Shown as request
datadog.cluster_agent.endpoint_checks.configs_dispatched
(gauge)
Number of endpoint-check configurations dispatched by node
datadog.cluster_agent.external_metrics
(gauge)
Number of external metrics tagged
datadog.cluster_agent.external_metrics.api_elapsed.count
(count)
Count of API Requests received
datadog.cluster_agent.external_metrics.api_elapsed.sum
(count)
Count of API Requests received
datadog.cluster_agent.external_metrics.api_requests
(gauge)
Count of API Requests received
datadog.cluster_agent.external_metrics.datadog_metrics
(gauge)
The label valid is true if the DatadogMetric CR is valid, false otherwise
datadog.cluster_agent.external_metrics.delay_seconds
(gauge)
Freshness of the metric evaluated from querying Datadog
Shown as second
datadog.cluster_agent.external_metrics.processed_value
(gauge)
Value processed from querying Datadog by metric
datadog.cluster_agent.go.goroutines
(gauge)
Number of goroutines that currently exist
datadog.cluster_agent.go.memstats.alloc_bytes
(gauge)
Number of bytes allocated and still in use
Shown as byte
datadog.cluster_agent.go.threads
(gauge)
Number of OS threads created
Shown as thread
datadog.cluster_agent.kubernetes_apiserver.emitted_events
(count)
Datadog events emitted by the kubernetes_apiserver check
datadog.cluster_agent.kubernetes_apiserver.kube_events
(count)
Kubernetes events processed by the kubernetes_apiserver check
datadog.cluster_agent.language_detection_dca_handler.processed_requests
(count)
The number of process language detection requests processed by the handler
datadog.cluster_agent.language_detection_patcher.patches
(count)
The number of patch requests sent by the patcher to the Kubernetes API server
datadog.cluster_agent.secret_backend.elapsed
(gauge)
The elapsed time of secret backend invocation
Shown as millisecond
datadog.cluster_agent.tagger.stored_entities
(gauge)
Number of entities stored in the tagger
datadog.cluster_agent.tagger.updated_entities
(count)
Number of updates made to entities in the tagger
datadog.cluster_agent.workloadmeta.events_received
(count)
Number of events received by workloadmeta
datadog.cluster_agent.workloadmeta.notifications_sent
(count)
Number of notifications sent by workloadmeta to its subscribers
datadog.cluster_agent.workloadmeta.stored_entities
(gauge)
Number of entities stored in workloadmeta
datadog.cluster_agent.workloadmeta.subscribers
(gauge)
Number of workloadmeta subscribers
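
Several metrics above are reported as `.sum` and `.count` pairs, for example admission_webhooks.response_duration and autodiscovery.poll_duration. One common way to read such a pair is to divide the sum by the count over the same interval to get a mean duration. The sketch below shows that arithmetic only. The sample values are invented for illustration, and `mean_duration` is a hypothetical helper, not part of the integration.

```python
from typing import Optional


def mean_duration(sum_seconds: float, count: float) -> Optional[float]:
    """Return the mean duration in seconds for one interval.

    Both arguments must cover the same interval. Returns None when no
    observations were recorded, which avoids dividing by zero.
    """
    if count <= 0:
        return None
    return sum_seconds / count


if __name__ == "__main__":
    # Invented values: 30 webhook responses that took 1.5 seconds in total.
    print(mean_duration(sum_seconds=1.5, count=30))
```

Treating an empty interval as "no data" rather than zero keeps a quiet period from being reported as a fast one.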

Events

The Datadog-Cluster-Agent integration does not include any events.

Service Checks

datadog.cluster_agent.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical
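
The behavior described above can be approximated outside the Agent with a small probe. This is only a sketch of the idea, not the check's actual implementation: it requests a metrics URL and maps the outcome to `ok` or `critical`. Treating any non-200 response as `critical` is an assumption, and `prometheus_health` is a hypothetical name. The `opener` parameter exists so the function can be exercised without a live endpoint.

```python
import urllib.error
import urllib.request

OK = "ok"
CRITICAL = "critical"


def prometheus_health(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> str:
    """Return "ok" if the metrics endpoint answers with HTTP 200, else "critical"."""
    try:
        with opener(url, timeout=timeout) as response:
            return OK if response.status == 200 else CRITICAL
    except (urllib.error.URLError, OSError, ValueError):
        # Connection failures, HTTP error responses, and malformed URLs
        # all count as "cannot access the metrics endpoint".
        return CRITICAL


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with your Cluster Agent metrics URL.
    print(prometheus_health("http://localhost:5000/metrics"))
```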

Troubleshooting

Need help? Contact Datadog support.