Datadog Cluster Agent

Supported OS: Linux, macOS, Windows

Integration version: 2.2.0

Overview

This check monitors the Datadog Cluster Agent through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates documentation for guidance on applying these instructions.

Installation

The Datadog-Cluster-Agent check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

  1. Edit the datadog_cluster_agent.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your datadog_cluster_agent performance data. See the sample datadog_cluster_agent.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
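
As a rough illustration of step 1, the sketch below writes a minimal conf.yaml into the check's folder. The instance option `prometheus_url` and the `http://localhost:5000/metrics` endpoint are assumptions here, not values confirmed by this page; verify both against the sample datadog_cluster_agent.d/conf.yaml before use. The helper name `write_check_config` is hypothetical.

```python
from pathlib import Path

# ASSUMPTION: option name and endpoint must be checked against the
# sample conf.yaml shipped with the Agent; they are not stated on this page.
MINIMAL_CONF = """\
init_config:

instances:
  - prometheus_url: http://localhost:5000/metrics
"""


def write_check_config(conf_d: Path) -> Path:
    """Write a minimal conf.yaml under <conf_d>/datadog_cluster_agent.d/."""
    check_dir = conf_d / "datadog_cluster_agent.d"
    check_dir.mkdir(parents=True, exist_ok=True)
    conf_path = check_dir / "conf.yaml"
    conf_path.write_text(MINIMAL_CONF, encoding="utf-8")
    return conf_path
```

On a Linux host this might be called as `write_check_config(Path("/etc/datadog-agent/conf.d"))`, assuming the default configuration directory and sufficient permissions. The Agent still needs a restart afterwards, as in step 2.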

Validation

Run the Agent's status subcommand and look for datadog_cluster_agent under the Checks section.
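
The validation step can be scripted. The sketch below assumes the Agent binary is available on the PATH as `datadog-agent` and that the current user is allowed to run it; the function names are hypothetical. It searches the whole status output for the check name instead of parsing the Checks section, since the exact layout of that output is not described here.

```python
import subprocess


def agent_status_output() -> str:
    """Run the Agent's status subcommand and return its text output.

    ASSUMPTION: `datadog-agent` is on the PATH; elevated privileges
    may be needed depending on the installation.
    """
    result = subprocess.run(
        ["datadog-agent", "status"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def check_is_listed(status_output: str,
                    check_name: str = "datadog_cluster_agent") -> bool:
    """Return True if check_name appears on any line of the status output."""
    return any(check_name in line for line in status_output.splitlines())
```

Usage: `check_is_listed(agent_status_output())`. A `False` result suggests the check is not loaded, which usually points back to the configuration file or a missing restart.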

Data Collected

Metrics

datadog.cluster_agent.admission_webhooks.certificate_expiry
(gauge)
Time left before the certificate expires
Shown as hour
datadog.cluster_agent.admission_webhooks.cws_exec_instrumentation_attempts.count
(count)
CWS exec Instrumentation attempts count
datadog.cluster_agent.admission_webhooks.cws_exec_instrumentation_attempts.sum
(count)
CWS exec Instrumentation attempts sum
datadog.cluster_agent.admission_webhooks.cws_pod_instrumentation_attempts.count
(count)
CWS pod Instrumentation attempts count
datadog.cluster_agent.admission_webhooks.cws_pod_instrumentation_attempts.sum
(count)
CWS pod Instrumentation attempts sum
datadog.cluster_agent.admission_webhooks.library_injection_attempts
(count)
Number of library injection attempts by language
datadog.cluster_agent.admission_webhooks.library_injection_errors
(count)
Number of library injection failures by language
datadog.cluster_agent.admission_webhooks.mutation_attempts
(gauge)
Number of pod mutation attempts by mutation type
datadog.cluster_agent.admission_webhooks.mutation_errors
(gauge)
Number of mutation failures by mutation type
datadog.cluster_agent.admission_webhooks.patcher.attempts
(count)
Number of patch attempts
datadog.cluster_agent.admission_webhooks.patcher.completed
(count)
Number of completed patch attempts
datadog.cluster_agent.admission_webhooks.patcher.errors
(count)
Number of patch errors
datadog.cluster_agent.admission_webhooks.rc_provider.configs
(gauge)
Number of valid remote configurations
datadog.cluster_agent.admission_webhooks.rc_provider.invalid_configs
(gauge)
Number of invalid remote configurations
datadog.cluster_agent.admission_webhooks.reconcile_errors
(gauge)
Number of reconcile errors per controller
datadog.cluster_agent.admission_webhooks.reconcile_success
(gauge)
Number of reconcile successes per controller
Shown as success
datadog.cluster_agent.admission_webhooks.response_duration.count
(count)
Webhook response duration count
datadog.cluster_agent.admission_webhooks.response_duration.sum
(count)
Webhook response duration sum
Shown as second
datadog.cluster_agent.admission_webhooks.validation_attempts
(gauge)
Number of pod validation attempts by validation type
datadog.cluster_agent.admission_webhooks.webhooks_received
(gauge)
Number of webhook requests received
datadog.cluster_agent.aggregator.flush
(count)
Number of metrics/service checks/events flushed by (data_type, state)
datadog.cluster_agent.aggregator.processed
(count)
Number of metrics/service checks/events processed by the aggregator by data type
datadog.cluster_agent.api_requests
(count)
Requests made to the cluster agent API by (handler, status)
Shown as request
datadog.cluster_agent.autodiscovery.errors
(gauge)
Number of Autodiscovery errors
datadog.cluster_agent.autodiscovery.poll_duration.count
(count)
Autodiscovery poll duration count
datadog.cluster_agent.autodiscovery.poll_duration.sum
(count)
Autodiscovery poll duration sum
Shown as second
datadog.cluster_agent.autodiscovery.watched_resources
(gauge)
Number of watched resources (Services and Endpoints)
datadog.cluster_agent.cluster_checks.busyness
(gauge)
Busyness of a node per the number of metrics submitted and average duration of all checks run
datadog.cluster_agent.cluster_checks.configs_dangling
(gauge)
Number of check configurations not dispatched
datadog.cluster_agent.cluster_checks.configs_dispatched
(gauge)
Number of check configurations dispatched by node
datadog.cluster_agent.cluster_checks.configs_info
(gauge)
Information about check configurations dispatched (node and check ID)
datadog.cluster_agent.cluster_checks.failed_stats_collection
(count)
Total number of unsuccessful stats collection attempts
datadog.cluster_agent.cluster_checks.nodes_reporting
(gauge)
Number of node agents reporting
datadog.cluster_agent.cluster_checks.rebalancing_decisions
(count)
Total number of check rebalancing decisions
datadog.cluster_agent.cluster_checks.rebalancing_duration_seconds
(gauge)
Duration of the last execution of the check rebalancing algorithm
Shown as second
datadog.cluster_agent.cluster_checks.successful_rebalancing_moves
(count)
Total number of successful check rebalancing decisions
Shown as check
datadog.cluster_agent.cluster_checks.updating_stats_duration_seconds
(gauge)
Duration of collecting stats from check runners and updating cache
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.limit
(gauge)
Maximum number of queries to the Datadog API allowed in the period by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.period
(gauge)
Period of rate limiting for the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.remaining
(gauge)
Number of queries to the Datadog API remaining before next reset by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.remaining_min
(gauge)
Minimum number of queries remaining before next reset observed during an expiration interval of 2*refresh period
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.reset
(gauge)
Number of seconds before next reset applied to the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.requests
(count)
Requests made to Datadog by status
Shown as request
datadog.cluster_agent.endpoint_checks.configs_dispatched
(gauge)
Number of endpoint-check configurations dispatched by node
datadog.cluster_agent.external_metrics
(gauge)
Number of external metrics tagged
datadog.cluster_agent.external_metrics.api_elapsed.count
(count)
Count of API Requests received
datadog.cluster_agent.external_metrics.api_elapsed.sum
(count)
Count of API Requests received
datadog.cluster_agent.external_metrics.api_requests
(gauge)
Count of API Requests received
datadog.cluster_agent.external_metrics.datadog_metrics
(gauge)
The label valid is true if the DatadogMetric CR is valid, false otherwise
datadog.cluster_agent.external_metrics.delay_seconds
(gauge)
Freshness of the metric evaluated from querying Datadog
Shown as second
datadog.cluster_agent.external_metrics.processed_value
(gauge)
Value processed from querying Datadog by metric
datadog.cluster_agent.go.goroutines
(gauge)
Number of goroutines that currently exist
datadog.cluster_agent.go.memstats.alloc_bytes
(gauge)
Number of bytes allocated and still in use
Shown as byte
datadog.cluster_agent.go.threads
(gauge)
Number of OS threads created
Shown as thread
datadog.cluster_agent.kubernetes_apiserver.emitted_events
(count)
Datadog events emitted by the kubernetes_apiserver check
datadog.cluster_agent.kubernetes_apiserver.kube_events
(count)
Kubernetes events processed by the kubernetes_apiserver check
datadog.cluster_agent.language_detection_dca_handler.processed_requests
(count)
The number of process language detection requests processed by the handler
datadog.cluster_agent.language_detection_patcher.patches
(count)
The number of patch requests sent by the patcher to the kube api server
datadog.cluster_agent.secret_backend.elapsed
(gauge)
The elapsed time of secret backend invocation
Shown as millisecond
datadog.cluster_agent.tagger.stored_entities
(gauge)
Number of entities stored in the tagger
datadog.cluster_agent.tagger.updated_entities
(count)
Number of updates made to entities in the tagger
datadog.cluster_agent.workloadmeta.events_received
(count)
Number of events received by workloadmeta
datadog.cluster_agent.workloadmeta.notifications_sent
(count)
Number of notifications sent by workloadmeta to its subscribers
datadog.cluster_agent.workloadmeta.stored_entities
(gauge)
Number of entities stored in workloadmeta
datadog.cluster_agent.workloadmeta.subscribers
(gauge)
Number of workloadmeta subscribers
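
Several metrics in the list above come as `.sum` and `.count` pairs, for example datadog.cluster_agent.admission_webhooks.response_duration and datadog.cluster_agent.autodiscovery.poll_duration. Assuming both values cover the same time interval, dividing the sum by the count gives the mean duration per observation. The helper below is a minimal sketch of that arithmetic; its name is hypothetical and it is not part of the integration.

```python
from typing import Optional


def mean_duration(sum_seconds: float, count: float) -> Optional[float]:
    """Mean duration per observation over one interval.

    Returns None when there were no observations, to avoid dividing by zero.
    """
    if count <= 0:
        return None
    return sum_seconds / count
```

For example, if the webhook handled 3 requests totalling 1.5 seconds in an interval, `mean_duration(1.5, 3)` gives 0.5 seconds per request.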

Events

The Datadog-Cluster-Agent integration does not include any events.

Service Checks

datadog.cluster_agent.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.