Consul

Supported OS: Linux, Mac OS, Windows

Overview

The Datadog Agent collects many metrics from Consul nodes, including those for:

  • Total Consul peers
  • Service health: for a given service, how many of its nodes are Up, Passing, Warning, or Critical
  • Node health: for a given node, how many of its services are Up, Passing, Warning, or Critical
  • Network coordinates: inter- and intra-datacenter latencies

The Consul agent can provide further metrics through DogStatsD. These metrics relate to the internal health of Consul itself rather than to the services that depend on Consul. They cover:

  • Serf events and member flaps
  • The Raft protocol
  • DNS performance

And many more.

Finally, in addition to metrics, the Datadog Agent also sends a service check for each of Consul's health checks, and an event after each new leader election.

Setup

Installation

The Datadog Agent's Consul check is included in the Datadog Agent package, so you don't need to install anything else on your Consul nodes.

Configuration

Host

To configure this check for an Agent running on a host:

Metric collection
  1. Edit the consul.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your Consul performance metrics. See the sample consul.d/conf.yaml for all available configuration options.

    init_config:
    
    instances:
      ## @param url - string - required
      ## Where your Consul HTTP Server Lives
      ## Point the URL at the leader to get metrics about your Consul Cluster.
      ## Remember to use https instead of http if your Consul setup is configured to do so.
      #
      - url: http://localhost:8500
    
  2. Restart the Agent.

Reload the Consul agent to start sending more Consul metrics to DogStatsD.

Log collection

Available for Agent versions >6.0

  1. Log collection is disabled by default in the Datadog Agent. Enable it in datadog.yaml:

    logs_enabled: true
    
  2. Add this configuration block to your consul.d/conf.yaml file to start collecting your Consul logs:

    logs:
      - type: file
        path: /var/log/consul_server.log
        source: consul
        service: myservice
    

    Change the path and service parameter values and configure them for your environment. See the sample consul.d/conf.yaml for all available configuration options.

  3. Restart the Agent.

Containerized

For containerized environments, see the Autodiscovery Integration Templates documentation for guidance on applying the parameters below.

Metric collection

Parameter              Value
<INTEGRATION_NAME>     consul
<INIT_CONFIG>          blank or {}
<INSTANCE_CONFIG>      {"url": "https://%%host%%:8500"}
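
As a rough illustration of how the three metric-collection parameters could be attached to a container, the sketch below serializes them into Docker labels. The label names follow Datadog's Autodiscovery label convention (com.datadoghq.ad.*). The helper name consul_autodiscovery_labels is hypothetical and is not part of any Datadog tooling.

```python
import json


def consul_autodiscovery_labels(instance_url="https://%%host%%:8500"):
    """Build Docker labels carrying the Consul check template (illustrative helper)."""
    return {
        # Integration name parameter: which check to run.
        "com.datadoghq.ad.check_names": json.dumps(["consul"]),
        # Init config parameter: blank or {}.
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        # Instance config parameter; %%host%% is a template variable
        # that the Agent resolves to the container's address.
        "com.datadoghq.ad.instances": json.dumps([{"url": instance_url}]),
    }


if __name__ == "__main__":
    for name, value in consul_autodiscovery_labels().items():
        print(f"{name}={value}")
```

Each label value is a JSON list, so that several checks can share one container; here every list has a single entry.
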

Log collection

Available for Agent versions >6.0

Log collection is disabled by default in the Datadog Agent. To enable it, see Kubernetes log collection.

Parameter         Value
<LOG_CONFIG>      {"source": "consul", "service": "<SERVICE_NAME>"}

DogStatsD

Optionally, you can configure Consul to send data to the Agent through DogStatsD instead of relying on the Agent to pull the data from Consul.

  1. To have Consul send DogStatsD metrics, add your dogstatsd_addr nested under the top-level telemetry key in Consul's main configuration file:

    {
      ...
      "telemetry": {
        "dogstatsd_addr": "127.0.0.1:8125"
      },
      ...
    }
    
  2. To make sure the metrics are tagged correctly, update the Datadog Agent's main configuration file, datadog.yaml, by adding the following settings:

    # dogstatsd_mapper_cache_size: 1000  # default to 1000
    dogstatsd_mapper_profiles:
      - name: consul
        prefix: "consul."
        mappings:
          - match: 'consul\.http\.([a-zA-Z]+)\.(.*)'
            match_type: "regex"
            name: "consul.http.request"
            tags:
              http_method: "$1"
              path: "$2"
          - match: 'consul\.raft\.replication\.appendEntries\.logs\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.appendEntries.logs"
            tags:
              consul_node_id: "$1"
          - match: 'consul\.raft\.replication\.appendEntries\.rpc\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.appendEntries.rpc"
            tags:
              consul_node_id: "$1"
          - match: 'consul\.raft\.replication\.heartbeat\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.heartbeat"
            tags:
              consul_node_id: "$1"
    
  3. Restart the Agent.
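
To see what the mapper profile in step 2 is meant to achieve, the sketch below applies the same patterns to a metric name in plain Python. It is an approximation, not the Agent's implementation: it assumes a pattern must match the whole metric name, it takes the first rule that matches, and the function name map_metric is hypothetical.

```python
import re

# Rules copied from the dogstatsd_mapper_profiles block in step 2.
MAPPINGS = [
    {
        "match": r"consul\.http\.([a-zA-Z]+)\.(.*)",
        "name": "consul.http.request",
        "tags": {"http_method": "$1", "path": "$2"},
    },
    {
        "match": r"consul\.raft\.replication\.appendEntries\.logs\.([0-9a-f-]+)",
        "name": "consul.raft.replication.appendEntries.logs",
        "tags": {"consul_node_id": "$1"},
    },
    {
        "match": r"consul\.raft\.replication\.appendEntries\.rpc\.([0-9a-f-]+)",
        "name": "consul.raft.replication.appendEntries.rpc",
        "tags": {"consul_node_id": "$1"},
    },
    {
        "match": r"consul\.raft\.replication\.heartbeat\.([0-9a-f-]+)",
        "name": "consul.raft.replication.heartbeat",
        "tags": {"consul_node_id": "$1"},
    },
]


def map_metric(metric):
    """Return (mapped_name, tags) for a raw metric name; unmatched names pass through."""
    for rule in MAPPINGS:
        m = re.fullmatch(rule["match"], metric)  # assumption: whole-name match
        if m is None:
            continue
        # Expand "$1", "$2", ... with the corresponding capture groups.
        tags = {
            key: re.sub(r"\$(\d+)", lambda ref: m.group(int(ref.group(1))), template)
            for key, template in rule["tags"].items()
        }
        return rule["name"], tags
    return metric, {}


if __name__ == "__main__":
    print(map_metric("consul.http.GET.v1.kv._"))
```

The effect is that the HTTP verb, the path, and the node ID become tags, so each mapped metric keeps a single stable name.
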

OpenMetrics

Instead of using DogStatsD, you can enable the use_prometheus_endpoint configuration option to get the same metrics from the Prometheus endpoint.

Note: use either the DogStatsD method or the Prometheus method. Do not enable both for the same instance.

  1. Configure Consul to expose metrics on the Prometheus endpoint. Set prometheus_retention_time nested under the top-level telemetry key in Consul's main configuration file:

    {
      ...
      "telemetry": {
        "prometheus_retention_time": "360h"
      },
      ...
    }
    
  2. To start using the Prometheus endpoint, edit the consul.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory.

    instances:
        - url: <EXAMPLE>
          use_prometheus_endpoint: true
    
  3. Restart the Agent.
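
Before relying on the Agent, it can help to check by hand that Consul exposes Prometheus-format metrics. The sketch below builds the URL of Consul's agent metrics endpoint (/v1/agent/metrics with format=prometheus) and parses the returned text into a dictionary. The function names are hypothetical, the parser assumes simple "series value" lines with no timestamps, and the sample text in the demo is illustrative input, not real output.

```python
def prometheus_metrics_url(base_url):
    """URL of Consul's agent metrics endpoint in Prometheus text format."""
    return f"{base_url.rstrip('/')}/v1/agent/metrics?format=prometheus"


def parse_samples(text):
    """Parse 'series value' lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last space-separated field.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    print(prometheus_metrics_url("http://localhost:8500"))
    # Illustrative payload only; fetch the real one from the URL printed above.
    sample = "# TYPE consul_raft_apply counter\nconsul_raft_apply 42\n"
    print(parse_samples(sample))
```
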

Validation

Run the Agent's status subcommand and look for consul under the Checks section.

Note: if debug logging is enabled on your Consul nodes, the Datadog Agent's regular polling appears in the Consul log:

2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/leader (59.344us) from=127.0.0.1:53768
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/peers (62.678us) from=127.0.0.1:53770
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/health/state/any (106.725us) from=127.0.0.1:53772
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/catalog/services (79.657us) from=127.0.0.1:53774
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/health/service/consul (153.917us) from=127.0.0.1:53776
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/coordinate/datacenters (71.778us) from=127.0.0.1:53778
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/coordinate/nodes (84.95us) from=127.0.0.1:53780
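
To confirm from such a log which endpoints the Agent polled, the request lines can be parsed with a short script. The sketch below assumes the line layout shown in the excerpt above; the regular expression and the function name polled_endpoints are illustrative, and other Consul versions may format these lines differently.

```python
import re

# Matches lines such as:
# 2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/leader (59.344us) from=127.0.0.1:53768
REQUEST_RE = re.compile(
    r"\[DEBUG\] http: Request (?P<method>\w+) (?P<path>\S+) "
    r"\((?P<duration>[\d.]+)(?P<unit>\w+)\) from=(?P<client>\S+)"
)


def polled_endpoints(log_text):
    """Return the HTTP paths found in Consul debug request lines, in order."""
    return [m.group("path") for m in REQUEST_RE.finditer(log_text)]


if __name__ == "__main__":
    log = (
        "2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/leader (59.344us) from=127.0.0.1:53768\n"
        "2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/peers (62.678us) from=127.0.0.1:53770\n"
    )
    print(polled_endpoints(log))
```
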

Consul agent to DogStatsD

Use netstat to verify that Consul is sending its metrics as well:

$ sudo netstat -nup | grep "127.0.0.1:8125.*ESTABLISHED"
udp        0      0 127.0.0.1:53874         127.0.0.1:8125          ESTABLISHED 23176/consul
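
The same check can be scripted, for example in a provisioning test. The sketch below inspects one row of netstat -nup output and reports whether it shows a consul process with an established UDP socket to the DogStatsD address. It assumes the seven-column layout shown above; the function name is hypothetical.

```python
def is_consul_dogstatsd_socket(line, dogstatsd_addr="127.0.0.1:8125"):
    """Return True if a `netstat -nup` row shows consul connected to DogStatsD."""
    fields = line.split()
    if len(fields) < 7:
        return False
    # Columns: proto, recv-q, send-q, local address, foreign address, state, PID/program
    proto, _recvq, _sendq, _local, remote, state, program = fields[:7]
    return (
        proto.startswith("udp")
        and remote == dogstatsd_addr
        and state == "ESTABLISHED"
        and program.endswith("/consul")
    )


if __name__ == "__main__":
    row = "udp        0      0 127.0.0.1:53874         127.0.0.1:8125          ESTABLISHED 23176/consul"
    print(is_consul_dogstatsd_socket(row))
```
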

Data Collected

Metrics

consul.catalog.nodes_critical
(gauge)
[Integration] The number of nodes with service status `critical` from those registered
Shown as node
consul.catalog.nodes_passing
(gauge)
[Integration] The number of nodes with service status `passing` from those registered
Shown as node
consul.catalog.nodes_up
(gauge)
[Integration] The number of nodes
Shown as node
consul.catalog.nodes_warning
(gauge)
[Integration] The number of nodes with service status `warning` from those registered
Shown as node
consul.catalog.total_nodes
(gauge)
[Integration] The number of nodes registered in the consul cluster
Shown as node
consul.catalog.services_critical
(gauge)
[Integration] Total critical services on nodes
Shown as service
consul.catalog.services_passing
(gauge)
[Integration] Total passing services on nodes
Shown as service
consul.catalog.services_up
(gauge)
[Integration] Total services registered on nodes
Shown as service
consul.catalog.services_warning
(gauge)
[Integration] Total warning services on nodes
Shown as service
consul.catalog.services_count
(gauge)
[Integration] Metrics to count the number of services matching criteria like the service tag, node name, or status. To be queried using the `sum by` aggregator.
Shown as service
consul.net.node.latency.min
(gauge)
[Integration] Minimum latency from this node to all others
Shown as millisecond
consul.net.node.latency.p25
(gauge)
[Integration] P25 latency from this node to all others
Shown as millisecond
consul.net.node.latency.median
(gauge)
[Integration] Median latency from this node to all others
Shown as millisecond
consul.net.node.latency.p75
(gauge)
[Integration] P75 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p90
(gauge)
[Integration] P90 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p95
(gauge)
[Integration] P95 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p99
(gauge)
[Integration] P99 latency from this node to all others
Shown as millisecond
consul.net.node.latency.max
(gauge)
[Integration] Maximum latency from this node to all others
Shown as millisecond
consul.peers
(gauge)
[Integration] The number of peers in the peer set
consul.client.rpc
(count)
[DogStatsD] [Prometheus] This increments whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.
Shown as request
consul.client.rpc.failed
(count)
[DogStatsD] [Prometheus] Increments whenever a Consul agent in client mode makes an RPC request to a Consul server and fails
Shown as request
consul.http.request
(gauge)
[DogStatsD] [Prometheus] Tracks how long it takes to service the given HTTP request for the given verb and path. Using a DogStatsD mapper as described in the README, the paths are mapped to tags and do not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: `http_method:GET, path:v1.kv._`
Shown as millisecond
consul.http.request.count
(count)
[DogStatsD] [Prometheus] A count of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: `path=v1.kv._`
Shown as millisecond
consul.http.request.quantile
(gauge)
[DogStatsD] [Prometheus] A quantile of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: `path=v1.kv._`
Shown as millisecond
consul.http.request.sum
(count)
[DogStatsD] [Prometheus] The sum of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: `path=v1.kv._`
Shown as millisecond
consul.memberlist.degraded.probe
(gauge)
[DogStatsD] [Prometheus] This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.
consul.memberlist.gossip.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.sum
(count)
[DogStatsD] [Prometheus] The sum of the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.avg
(gauge)
[DogStatsD] [Prometheus] The avg for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.gossip
consul.memberlist.gossip.max
(gauge)
[DogStatsD] [Prometheus] The max for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.median
(gauge)
[DogStatsD] [Prometheus] The median for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.health.score
(gauge)
[DogStatsD] [Prometheus] This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf
consul.memberlist.msg.alive
(count)
[DogStatsD] [Prometheus] This metric counts the number of alive Consul agents that the agent has mapped out so far, based on the message information given by the network layer.
consul.memberlist.msg.dead
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has marked another agent to be a dead node.
Shown as message
consul.memberlist.msg.suspect
(count)
[DogStatsD] [Prometheus] The number of times a Consul agent suspects another as failed while probing during gossip protocol
consul.memberlist.probenode.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.sum
(count)
[DogStatsD] [Prometheus] The sum for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.avg
(gauge)
[DogStatsD] [Prometheus] The avg for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.probenode
consul.memberlist.probenode.max
(gauge)
[DogStatsD] [Prometheus] The max for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.median
(gauge)
[DogStatsD] [Prometheus] The median for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.pushpullnode.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile for the number of Consul agents that have exchanged state with this agent.
consul.memberlist.pushpullnode.sum
(count)
[DogStatsD] [Prometheus] The sum for the number of Consul agents that have exchanged state with this agent.
consul.memberlist.pushpullnode.avg
(gauge)
[DogStatsD] [Prometheus] The avg for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.pushpullnode
consul.memberlist.pushpullnode.max
(gauge)
[DogStatsD] [Prometheus] The max for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.median
(gauge)
[DogStatsD] [Prometheus] The median for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.tcp.accept
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection.
Shown as connection
consul.memberlist.tcp.connect
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent.
Shown as connection
consul.memberlist.tcp.sent
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent by a Consul agent through the TCP protocol
Shown as byte
consul.memberlist.udp.received
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes received by a Consul agent through the UDP protocol.
Shown as byte
consul.memberlist.udp.sent
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent by a Consul agent through the UDP protocol.
Shown as byte
consul.raft.state.leader
(count)
[DogStatsD] [Prometheus] The number of completed leader elections
Shown as event
consul.raft.state.candidate
(count)
[DogStatsD] [Prometheus] The number of initiated leader elections
Shown as event
consul.raft.apply
(count)
[DogStatsD] [Prometheus] The number of raft transactions occurring
Shown as transaction
consul.raft.commitTime.avg
(gauge)
[DogStatsD] [Prometheus] The average time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.commitTime
consul.raft.commitTime.max
(gauge)
[DogStatsD] [Prometheus] The max time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.median
(gauge)
[DogStatsD] [Prometheus] The median time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.sum
(count)
[DogStatsD] [Prometheus] The sum of the time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.leader.dispatchLog.avg
(gauge)
[DogStatsD] [Prometheus] The average time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.leader.dispatchLog
consul.raft.leader.dispatchLog.max
(gauge)
[DogStatsD] [Prometheus] The max time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.median
(gauge)
[DogStatsD] [Prometheus] The median time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.sum
(count)
[DogStatsD] [Prometheus] The sum of the time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.lastContact.avg
(gauge)
[DogStatsD] [Prometheus] The average time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.leader.lastContact
consul.raft.leader.lastContact.max
(gauge)
[DogStatsD] [Prometheus] The max time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.median
(gauge)
[DogStatsD] [Prometheus] The median time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.sum
(count)
[DogStatsD] [Prometheus] The sum of the time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.replication.appendEntries.logs
(count)
[DogStatsD] [Prometheus] Measures the number of logs replicated to an agent, to bring it up to speed with the leader's logs.
Shown as entry
consul.raft.replication.appendEntries.rpc.count
(count)
[DogStatsD] [Prometheus] The count of the time taken by the append entries RPC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.appendEntries.rpc.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile of the time taken by the append entries RPC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.appendEntries.rpc.sum
(count)
[DogStatsD] [Prometheus] The sum of the time taken by the append entries RPC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.heartbeat.count
(count)
[DogStatsD] [Prometheus] The count of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.replication.heartbeat.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.replication.heartbeat.sum
(count)
[DogStatsD] [Prometheus] The sum of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.runtime.gc_pause_ns.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.avg
(gauge)
[DogStatsD] [Prometheus] The avg for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.runtime.gc_pause_ns
consul.runtime.gc_pause_ns.max
(gauge)
[DogStatsD] [Prometheus] The max for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.sum
(count)
[DogStatsD] [Prometheus] The sum of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.median
(gauge)
[DogStatsD] [Prometheus] The median for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.serf.events
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent processes a serf event
Shown as event
consul.serf.coordinate.adjustment_ms.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.sum
(count)
[DogStatsD] [Prometheus] The sum in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.avg
(gauge)
[DogStatsD] [Prometheus] The avg in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.coordinate.adjustment_ms
consul.serf.coordinate.adjustment_ms.max
(gauge)
[DogStatsD] [Prometheus] The max in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.median
(gauge)
[DogStatsD] [Prometheus] The median in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.member.flap
(count)
[DogStatsD] [Prometheus] The number of times a Consul agent is marked dead and then quickly recovers
consul.serf.member.join
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent processes a join event
Shown as event
consul.serf.member.update
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent updates.
consul.serf.member.failed
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.
consul.serf.member.left
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent leaves the cluster.
consul.serf.msgs.received.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the number of serf messages received
Shown as message
consul.serf.msgs.received.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile for the number of serf messages received
Shown as message
consul.serf.msgs.received.sum
(count)
[DogStatsD] [Prometheus] The sum for the number of serf messages received
Shown as message
consul.serf.msgs.received.avg
(gauge)
[DogStatsD] [Prometheus] The avg for the number of serf messages received
Shown as message
consul.serf.msgs.received.count
(count)
[DogStatsD] [Prometheus] The count of serf messages received
consul.serf.msgs.received.max
(gauge)
[DogStatsD] [Prometheus] The max for the number of serf messages received
Shown as message
consul.serf.msgs.received.median
(gauge)
[DogStatsD] [Prometheus] The median for the number of serf messages received
Shown as message
consul.serf.msgs.sent.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.quantile
(gauge)
[DogStatsD] [Prometheus] The quantile for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.sum
(count)
[DogStatsD] [Prometheus] The sum of the number of serf messages sent
Shown as message
consul.serf.msgs.sent.avg
(gauge)
[DogStatsD] [Prometheus] The avg for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.count
(count)
[DogStatsD] [Prometheus] The count of serf messages sent
consul.serf.msgs.sent.max
(gauge)
[DogStatsD] [Prometheus] The max for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.median
(gauge)
[DogStatsD] [Prometheus] The median for the number of serf messages sent
Shown as message
consul.serf.queue.event.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the size of the serf event queue
consul.serf.queue.event.avg
(gauge)
[DogStatsD] [Prometheus] The avg size of the serf event queue
consul.serf.queue.event.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf event queue
consul.serf.queue.event.max
(gauge)
[DogStatsD] [Prometheus] The max size of the serf event queue
consul.serf.queue.event.median
(gauge)
[DogStatsD] [Prometheus] The median size of the serf event queue
consul.serf.queue.intent.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the size of the serf intent queue
consul.serf.queue.intent.avg
(gauge)
[DogStatsD] [Prometheus] The avg size of the serf intent queue
consul.serf.queue.intent.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf intent queue
consul.serf.queue.intent.max
(gauge)
[DogStatsD] [Prometheus] The max size of the serf intent queue
consul.serf.queue.intent.median
(gauge)
[DogStatsD] [Prometheus] The median size of the serf intent queue
consul.serf.queue.query.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 for the size of the serf query queue
consul.serf.queue.query.avg
(gauge)
[DogStatsD] [Prometheus] The avg size of the serf query queue
consul.serf.queue.query.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf query queue
consul.serf.queue.query.max
(gauge)
[DogStatsD] [Prometheus] The max size of the serf query queue
consul.serf.queue.query.median
(gauge)
[DogStatsD] [Prometheus] The median size of the serf query queue
consul.serf.snapshot.appendline.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.avg
(gauge)
[DogStatsD] [Prometheus] The avg of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.snapshot.appendline
consul.serf.snapshot.appendline.max
(gauge)
[DogStatsD] [Prometheus] The max of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.median
(gauge)
[DogStatsD] [Prometheus] The median of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.compact.95percentile
(gauge)
[DogStatsD] [Prometheus] The p95 of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.avg
(gauge)
[DogStatsD] [Prometheus] The avg of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.snapshot.compact
consul.serf.snapshot.compact.max
(gauge)
[DogStatsD] [Prometheus] The max of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.median
(gauge)
[DogStatsD] [Prometheus] The median of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond

See Consul's Telemetry documentation for a description of the metrics the Consul agent sends to DogStatsD.

See Consul's Network Coordinates documentation to learn how the network latency metrics are calculated.

Events

consul.new_leader:
The Datadog Agent emits an event when the Consul cluster elects a new leader, tagging it with prev_consul_leader, curr_consul_leader, and consul_datacenter.

Service Checks

consul.check:
The Datadog Agent submits a service check for each of Consul's health checks, tagging each with:

  • service:<name>, if Consul reports a ServiceName
  • consul_service_id:<id>, if Consul reports a ServiceID
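
The tagging rule above can be pictured as a small function over one Consul health check entry. This is a sketch of the rule as described here, not the Agent's code: it covers only the two tags listed, assumes the entry is a dictionary with the ServiceName and ServiceID fields returned by Consul's health API, and uses a hypothetical function name.

```python
def service_check_tags(check):
    """Build the tags for one Consul health check entry (a dict)."""
    tags = []
    # Node-level checks carry empty ServiceName/ServiceID and get no tags here.
    if check.get("ServiceName"):
        tags.append(f"service:{check['ServiceName']}")
    if check.get("ServiceID"):
        tags.append(f"consul_service_id:{check['ServiceID']}")
    return tags


if __name__ == "__main__":
    # Hypothetical entry for illustration.
    print(service_check_tags({"ServiceName": "web", "ServiceID": "web-1"}))
```
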

Troubleshooting

Need help? Contact Datadog support.

Further Reading