Cilium

Supported OS Linux Windows

Integration v2.2.1

Overview

This check monitors Cilium through the Datadog Agent. The integration can collect metrics from either the cilium-agent or the cilium-operator.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

The Cilium check is included in the Datadog Agent package, but it requires additional setup steps to expose Prometheus metrics.

  1. To enable Prometheus metrics in both the cilium-agent and cilium-operator, deploy Cilium with the following Helm values set according to your version of Cilium:
    • Cilium < v1.8.x: global.prometheus.enabled=true
    • Cilium >= v1.8.x and < v1.9.x: global.prometheus.enabled=true and global.operatorPrometheus.enabled=true
    • Cilium >= v1.9.x: prometheus.enabled=true and operator.prometheus.enabled=true
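
For Cilium 1.9 or later, the same two settings can be kept in a Helm values file instead of being passed as individual flags. The following is a minimal sketch; the file name cilium-values.yaml is a hypothetical choice, and the nested keys are simply the dotted Helm values listed above written out as YAML.

```yaml
# cilium-values.yaml (hypothetical file name), for Cilium >= 1.9
prometheus:
  enabled: true      # corresponds to prometheus.enabled=true (cilium-agent)
operator:
  prometheus:
    enabled: true    # corresponds to operator.prometheus.enabled=true (cilium-operator)
```

A file like this is passed to Helm with the -f (--values) flag when installing or upgrading the Cilium release. Older Cilium versions use the global.* keys listed above instead.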

Alternatively, enable Prometheus metrics separately in the Kubernetes manifests:

  • For the cilium-agent, add --prometheus-serve-addr=:9090 to the args section of the Cilium DaemonSet config:

    # [...]
    spec:
      containers:
        - args:
            - --prometheus-serve-addr=:9090
    
  • For the cilium-operator, add --enable-metrics to the args section of the Cilium Deployment config:

    # [...]
    spec:
      containers:
        - args:
            - --enable-metrics
    

Configuration

Host

To configure this check for an Agent running on a host:

  1. Edit the cilium.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Cilium performance data. See the sample cilium.d/conf.yaml for all available configuration options.

    • To collect cilium-agent metrics, enable the agent_endpoint option.
    • To collect cilium-operator metrics, enable the operator_endpoint option.
        instances:
    
            ## @param use_openmetrics - boolean - optional - default: false
            ## Use the latest OpenMetrics V2 implementation for more features and better performance.
            ##
            ## Note: For the configuration options of the legacy OpenMetrics implementation (Agent 7.33 or older), see:
            ## https://github.com/DataDog/integrations-core/blob/7.33.x/cilium/datadog_checks/cilium/data/conf.yaml.example
            #
          - use_openmetrics: true # Enables OpenMetrics V2
    
            ## @param agent_endpoint - string - optional
            ## The URL where your application metrics are exposed by Prometheus.
            ## By default, the Cilium integration collects `cilium-agent` metrics.
            ## One of agent_endpoint or operator_endpoint must be provided.
            #
            agent_endpoint: http://localhost:9090/metrics
    
            ## @param operator_endpoint - string - optional
            ## Provide instead of `agent_endpoint` to collect `cilium-operator` metrics.
            ## Cilium operator metrics are exposed on port 6942.
            #
            operator_endpoint: http://localhost:6942/metrics
    

    NOTE: By default, the use_openmetrics option is enabled in the conf.yaml.example. Set the use_openmetrics configuration option to false to use the OpenMetrics V1 implementation. To view the configuration parameters for OpenMetrics V1, see the conf.yaml.example file.

    You can read more about OpenMetrics V2.

  2. Restart the Agent.
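
The parameter comments in the sample describe operator_endpoint as something to provide instead of agent_endpoint, which suggests one endpoint per instance. A minimal sketch of a cilium.d/conf.yaml that collects both sets of metrics on one host, assuming both components listen on localhost at the ports shown above, would therefore use two instances:

```yaml
instances:
  # Instance 1: cilium-agent metrics (port 9090, as set by --prometheus-serve-addr)
  - use_openmetrics: true
    agent_endpoint: http://localhost:9090/metrics

  # Instance 2: cilium-operator metrics (port 6942, per the sample comments)
  - use_openmetrics: true
    operator_endpoint: http://localhost:6942/metrics
```

If only one component runs on the host, keep only the matching instance.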

Log collection

Cilium produces two types of logs: cilium-agent and cilium-operator.

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your DaemonSet configuration:

      # (...)
        env:
        #  (...)
          - name: DD_LOGS_ENABLED
            value: "true"
          - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
            value: "true"
      # (...)
    
  2. Mount the Docker socket to the Datadog Agent through the manifest, or mount the /var/log/pods directory if you are not using Docker. For example manifests, see the Kubernetes Installation instructions for DaemonSet.

  3. Restart the Agent.
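
For the non-Docker case in step 2, mounting /var/log/pods into the Agent might look like the sketch below. This is an illustrative fragment, not the official manifest: the container name agent and the volume name pod-logs are assumptions, and the rest of the DaemonSet is omitted.

```yaml
# Fragment of the Datadog Agent DaemonSet; container and volume names are illustrative.
spec:
  template:
    spec:
      containers:
        - name: agent
          volumeMounts:
            - name: pod-logs
              mountPath: /var/log/pods   # where the Agent reads pod log files
              readOnly: true
      volumes:
        - name: pod-logs
          hostPath:
            path: /var/log/pods          # pod log directory on the node
```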

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

To collect cilium-agent metrics and logs:
  • Metric collection

    Parameter             Value
    <INTEGRATION_NAME>    "cilium"
    <INIT_CONFIG>         blank or {}
    <INSTANCE_CONFIG>     {"agent_endpoint": "http://%%host%%:9090/metrics", "use_openmetrics": "true"}

  • Log collection

    Parameter             Value
    <LOG_CONFIG>          {"source": "cilium-agent", "service": "cilium-agent"}

To collect cilium-operator metrics and logs:
  • Metric collection

    Parameter             Value
    <INTEGRATION_NAME>    "cilium"
    <INIT_CONFIG>         blank or {}
    <INSTANCE_CONFIG>     {"operator_endpoint": "http://%%host%%:6942/metrics", "use_openmetrics": "true"}

  • Log collection

    Parameter             Value
    <LOG_CONFIG>          {"source": "cilium-operator", "service": "cilium-operator"}
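
On Kubernetes, one way to apply these parameters is through Autodiscovery pod annotations. The sketch below fills in the cilium-agent values from the parameters above. It assumes the container is named cilium-agent; the segment of each annotation key after the slash must match the container name in your pod spec.

```yaml
# Pod template metadata for the cilium-agent DaemonSet.
# "cilium-agent" in each annotation key is an assumed container name.
metadata:
  annotations:
    ad.datadoghq.com/cilium-agent.check_names: '["cilium"]'
    ad.datadoghq.com/cilium-agent.init_configs: '[{}]'
    ad.datadoghq.com/cilium-agent.instances: |
      [{"agent_endpoint": "http://%%host%%:9090/metrics", "use_openmetrics": "true"}]
    ad.datadoghq.com/cilium-agent.logs: |
      [{"source": "cilium-agent", "service": "cilium-agent"}]
```

For the cilium-operator, the same pattern would apply with the operator's container name in the keys, operator_endpoint on port 6942 in the instance, and cilium-operator as the log source and service.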

Validation

Run the Agent’s status subcommand and look for cilium under the Checks section.

Data Collected

Metrics

cilium.agent.api_process_time.seconds.count
(count)
[OpenMetrics V1 and V2] Count of processing time for all API calls
Shown as request
cilium.agent.api_process_time.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of processing time for all API calls
Shown as second
cilium.agent.api_process_time.seconds.bucket
(count)
[OpenMetrics V2] Amount of processing time for all API calls
Shown as second
cilium.agent.bootstrap.seconds.count
(count)
[OpenMetrics V1 and V2] Count of bootstrap durations
cilium.agent.bootstrap.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of bootstrap durations
Shown as second
cilium.agent.bootstrap.seconds.bucket
(count)
[OpenMetrics V2] Sample of bootstrap durations
Shown as second
cilium.bpf.map_ops.total
(count)
[OpenMetrics V1] Total BPF map operations performed
Shown as operation
cilium.bpf.map_ops.count
(count)
[OpenMetrics V2] Total BPF map operations performed
Shown as operation
cilium.controllers.failing.count
(gauge)
[OpenMetrics V1 and V2] Number of failing controllers
Shown as error
cilium.controllers.runs_duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of controller processes duration
Shown as operation
cilium.controllers.runs_duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of controller processes duration
Shown as second
cilium.controllers.runs_duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of controller processes duration
Shown as second
cilium.controllers.runs.total
(count)
[OpenMetrics V1] Total number of controller runs
Shown as event
cilium.controllers.runs.count
(count)
[OpenMetrics V2] Total number of controller runs
Shown as event
cilium.datapath.conntrack_gc.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of garbage collector process duration
Shown as operation
cilium.datapath.conntrack_gc.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of garbage collector process duration
Shown as second
cilium.datapath.conntrack_gc.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of garbage collector process duration
Shown as second
cilium.datapath.conntrack_gc.entries
(gauge)
[OpenMetrics V1 and V2] The number of alive and deleted conntrack entries
Shown as garbage collection
cilium.datapath.conntrack_gc.key_fallbacks.total
(count)
[OpenMetrics V1] The total number of conntrack entries
Shown as garbage collection
cilium.datapath.conntrack_gc.key_fallbacks.count
(count)
[OpenMetrics V2] The total number of conntrack entries.
Shown as garbage collection
cilium.datapath.conntrack_gc.runs.total
(count)
[OpenMetrics V1] Total number of the conntrack garbage collector process runs
Shown as garbage collection
cilium.datapath.conntrack_gc.runs.count
(count)
[OpenMetrics V2] Total number of the conntrack garbage collector process runs
Shown as garbage collection
cilium.drop_bytes.total
(count)
[OpenMetrics V1] Total dropped bytes
Shown as byte
cilium.drop_bytes.count
(count)
[OpenMetrics V2] Total dropped bytes
Shown as byte
cilium.drop_count.total
(count)
[OpenMetrics V1] Total dropped packets
Shown as packet
cilium.drop_count.count
(count)
[OpenMetrics V2] Total dropped packets
Shown as packet
cilium.endpoint.regeneration_time_stats.seconds.count
(count)
[OpenMetrics V1 and V2] Count of endpoint regeneration time stats
Shown as operation
cilium.endpoint.regeneration_time_stats.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of endpoint regeneration time stats
Shown as second
cilium.endpoint.regeneration_time_stats.seconds.bucket
(count)
[OpenMetrics V2] Sample of endpoint regeneration time stats
Shown as second
cilium.endpoint.regenerations.count
(count)
[OpenMetrics V1 and V2] Count of completed endpoint regenerations
Shown as unit
cilium.endpoint.state
(gauge)
[OpenMetrics V1 and V2] Count of all endpoints
Shown as unit
cilium.errors_warning.total
(count)
[OpenMetrics V1] Total error warnings
Shown as error
cilium.errors_warning.count
(count)
[OpenMetrics V2] Total error warnings
Shown as error
cilium.event_timestamp
(gauge)
[OpenMetrics V1 and V2] Last timestamp of event received
Shown as time
cilium.forward_bytes.total
(count)
[OpenMetrics V1] Total forwarded bytes
Shown as byte
cilium.forward_bytes.count
(count)
[OpenMetrics V2] Total forwarded bytes
Shown as byte
cilium.forward_count.total
(count)
[OpenMetrics V1] Total forwarded packets
Shown as packet
cilium.forward_count.count
(count)
[OpenMetrics V2] Total forwarded packets
Shown as packet
cilium.fqdn.gc_deletions.total
(count)
[OpenMetrics V1] Total number of FQDNs cleaned in FQDN garbage collector job.
Shown as event
cilium.fqdn.gc_deletions.count
(count)
[OpenMetrics V2] Total number of FQDNs cleaned in FQDN garbage collector job.
Shown as event
cilium.ip_addresses.count
(gauge)
[OpenMetrics V1 and V2] Number of allocated ip_addresses
Shown as unit
cilium.ipam.events.total
(count)
[OpenMetrics V1] Number of IPAM events received by action and datapath family type
Shown as event
cilium.ipam.events.count
(count)
[OpenMetrics V2] Number of IPAM events received by action and datapath family type
Shown as event
cilium.k8s_client.api_latency_time.seconds.count
(count)
[OpenMetrics V1 and V2] Count of processed API call duration
Shown as request
cilium.k8s_client.api_latency_time.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of processed API call duration
Shown as second
cilium.k8s_client.api_latency_time.seconds.bucket
(count)
[OpenMetrics V2] Sample of processed API call duration
Shown as second
cilium.kubernetes.events_received.total
(count)
[OpenMetrics V1] Number of Kubernetes received events processed
Shown as event
cilium.kubernetes.events_received.count
(count)
[OpenMetrics V2] Number of Kubernetes received events processed
Shown as event
cilium.kubernetes.events.total
(count)
[OpenMetrics V1] Number of Kubernetes events processed
Shown as event
cilium.kubernetes.events.count
(count)
[OpenMetrics V2] Number of Kubernetes events processed
Shown as event
cilium.nodes.all_datapath_validations.total
(count)
[OpenMetrics V1] Number of validation calls to implement the datapath implementation of a node
Shown as unit
cilium.nodes.all_datapath_validations.count
(count)
[OpenMetrics V2] Number of validation calls to implement the datapath implementation of a node
Shown as unit
cilium.nodes.all_events_received.total
(count)
[OpenMetrics V1] Number of node events received.
Shown as event
cilium.nodes.all_events_received.count
(count)
[OpenMetrics V2] Number of node events received
Shown as event
cilium.nodes.managed.total
(gauge)
[OpenMetrics V1 and V2] Number of nodes managed
Shown as node
cilium.endpoint.count
(gauge)
[OpenMetrics V1 and V2] Total ready endpoints managed by agent
Shown as unit
cilium.identity.count
(gauge)
[OpenMetrics V1 and V2] Number of identities allocated
Shown as unit
cilium.policy.count
(gauge)
[OpenMetrics V1 and V2] Number of policies currently loaded
Shown as unit
cilium.policy.import_errors.count
(count)
[OpenMetrics V1 and V2] Number of failed policy imports
Shown as error
cilium.policy.endpoint_enforcement_status
(gauge)
[OpenMetrics V1 and V2] Number of endpoints labeled by policy enforcement status
Shown as unit
cilium.policy.max_revision
(gauge)
[OpenMetrics V1 and V2] Highest policy revision number in the agent
Shown as unit
cilium.policy.regeneration_time_stats.seconds.count
(count)
[OpenMetrics V1 and V2] Policy regeneration time stats count
Shown as operation
cilium.policy.regeneration_time_stats.seconds.sum
(count)
[OpenMetrics V1 and V2] Policy regeneration time stats sum
Shown as second
cilium.policy.regeneration_time_stats.seconds.bucket
(count)
[OpenMetrics V2] Policy regeneration time stats sample
Shown as second
cilium.policy.regeneration.total
(count)
[OpenMetrics V1] Total number of successful policy regenerations
Shown as unit
cilium.policy.regeneration.count
(count)
[OpenMetrics V2] Total number of successful policy regenerations
Shown as unit
cilium.process.cpu.seconds.total
(gauge)
[OpenMetrics V1] Process CPU time in seconds
Shown as second
cilium.process.cpu.seconds.count
(count)
[OpenMetrics V2] Process CPU time in seconds
Shown as second
cilium.process.max_fds
(gauge)
[OpenMetrics V1 and V2] Process file descriptor maximum
Shown as file
cilium.process.open_fds
(gauge)
[OpenMetrics V1 and V2] Number of open file descriptors
Shown as file
cilium.process.resident_memory.bytes
(gauge)
[OpenMetrics V1 and V2] Total resident memory bytes
Shown as byte
cilium.process.start_time.seconds
(gauge)
[OpenMetrics V1 and V2] Processes start time
Shown as second
cilium.process.virtual_memory.bytes
(gauge)
[OpenMetrics V1 and V2] Virtual memory bytes
Shown as byte
cilium.process.virtual_memory.max.bytes
(gauge)
[OpenMetrics V1 and V2] Maximum virtual memory bytes
Shown as byte
cilium.subprocess.start.total
(count)
[OpenMetrics V1] Number of times that Cilium has started a subprocess
Shown as unit
cilium.subprocess.start.count
(count)
[OpenMetrics V2] Number of times that Cilium has started a subprocess
Shown as unit
cilium.triggers_policy.update_call_duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of policy update trigger duration
Shown as operation
cilium.triggers_policy.update_call_duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of policy update trigger duration
Shown as second
cilium.triggers_policy.update_call_duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of policy update trigger duration
Shown as second
cilium.triggers_policy.update_folds
(gauge)
[OpenMetrics V1 and V2] Number of folds
Shown as unit
cilium.triggers_policy.update.total
(count)
[OpenMetrics V1] Total number of policy update trigger invocations
Shown as unit
cilium.triggers_policy.update.count
(count)
[OpenMetrics V2] Total number of policy update trigger invocations
Shown as unit
cilium.unreachable.health_endpoints
(gauge)
[OpenMetrics V1 and V2] Number of health endpoints that cannot be reached
Shown as unit
cilium.unreachable.nodes
(gauge)
[OpenMetrics V1 and V2] Number of nodes that cannot be reached
Shown as node
cilium.kvstore.operations_duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of kvstore operation durations
Shown as operation
cilium.kvstore.operations_duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of kvstore operation durations
Shown as second
cilium.kvstore.operations_duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of kvstore operation durations
Shown as second
cilium.kvstore.events_queue.seconds.count
(count)
[OpenMetrics V1 and V2] Count of the durations, in seconds, that a received event was blocked before it could be queued
cilium.kvstore.events_queue.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of the durations, in seconds, that a received event was blocked before it could be queued
Shown as second
cilium.kvstore.events_queue.seconds.bucket
(count)
[OpenMetrics V2] Sample of the durations, in seconds, that a received event was blocked before it could be queued
Shown as second
cilium.policy.l7_denied.total
(count)
[OpenMetrics V1] Number of total L7 denied requests/responses due to policy. Available in Cilium <= v1.7
Shown as unit
cilium.policy.l7_denied.count
(count)
[OpenMetrics V2] Number of total L7 denied requests/responses due to policy. Available in Cilium <= v1.7
Shown as unit
cilium.policy.l7_forwarded.total
(count)
[OpenMetrics V1] Number of total L7 forwarded requests/responses. Available in Cilium <= v1.7
Shown as unit
cilium.policy.l7_forwarded.count
(count)
[OpenMetrics V2] Number of total L7 forwarded requests/responses. Available in Cilium <= v1.7
Shown as unit
cilium.policy.l7_parse_errors.total
(count)
[OpenMetrics V1] Number of total L7 parse errors. Available in Cilium <= v1.7
Shown as error
cilium.policy.l7_parse_errors.count
(count)
[OpenMetrics V2] Number of total L7 parse errors. Available in Cilium <= v1.7
Shown as error
cilium.policy.l7_received.total
(count)
[OpenMetrics V1] Number of total L7 received requests/responses. Available in Cilium <= v1.7
Shown as unit
cilium.policy.l7_received.count
(count)
[OpenMetrics V2] Number of total L7 received requests/responses. Available in Cilium <= v1.7
Shown as unit
cilium.datapath.errors.total
(count)
[OpenMetrics V1] Total number of errors in datapath management. Available in Cilium <= v1.9
Shown as error
cilium.datapath.errors.count
(count)
[OpenMetrics V2] Total number of errors in datapath management. Available in Cilium <= v1.9
Shown as error
cilium.k8s_client.api_calls.count
(count)
[OpenMetrics V1 and V2] Number of API calls made to kube-apiserver. Available in Cilium v1.10+
Shown as request
cilium.operator.process.cpu.seconds
(count)
[OpenMetrics V1] Total user and system CPU time spent in seconds
Shown as second
cilium.operator.process.cpu.seconds.count
(count)
[OpenMetrics V2] Total user and system CPU time spent in seconds
Shown as second
cilium.operator.process.max_fds
(gauge)
[OpenMetrics V1 and V2] Maximum number of open file descriptors
Shown as file
cilium.operator.process.open_fds
(gauge)
[OpenMetrics V1 and V2] Number of open file descriptors
Shown as file
cilium.operator.process.resident_memory.bytes
(gauge)
[OpenMetrics V1 and V2] Resident memory size in bytes
Shown as byte
cilium.operator.process.start_time.seconds
(gauge)
[OpenMetrics V1 and V2] Start time of the process since unix epoch in seconds
Shown as second
cilium.operator.process.virtual_memory.bytes
(gauge)
[OpenMetrics V1 and V2] Virtual memory size in bytes
Shown as byte
cilium.operator.process.virtual_memory_max.bytes
(gauge)
[OpenMetrics V1 and V2] Maximum amount of virtual memory available in bytes
Shown as byte
cilium.operator.eni.deficit_resolver.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of deficit resolver trigger runs
Shown as operation
cilium.operator.eni.deficit_resolver.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of deficit resolver trigger runs
Shown as second
cilium.operator.eni.deficit_resolver.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of deficit resolver trigger runs
Shown as second
cilium.operator.eni.deficit_resolver.folds
(gauge)
[OpenMetrics V1 and V2] Current level of deficit resolver folding
Shown as unit
cilium.operator.eni.deficit_resolver.latency.seconds.count
(count)
[OpenMetrics V1 and V2] Count of latency between deficit resolver queue and trigger run
Shown as operation
cilium.operator.eni.deficit_resolver.latency.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of latency between deficit resolver queue and trigger run
Shown as second
cilium.operator.eni.deficit_resolver.latency.seconds.bucket
(count)
[OpenMetrics V2] Sample of latency between deficit resolver queue and trigger run
Shown as second
cilium.operator.eni.deficit_resolver.queued.total
(gauge)
[OpenMetrics V1] Number of queued deficit resolver triggers
Shown as event
cilium.operator.eni.deficit_resolver.queued.count
(count)
[OpenMetrics V2] Number of queued deficit resolver triggers
Shown as event
cilium.operator.eni.available
(gauge)
[OpenMetrics V2] Number of available IPs per subnet ID. Available in Cilium <= v1.8
Shown as unit
cilium.operator.eni.available.ips_per_subnet
(gauge)
[OpenMetrics V1 and V2] Number of available IPs per subnet ID. Available in Cilium <= v1.8
Shown as unit
cilium.operator.eni.aws_api_duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of interactions with AWS API. Available in Cilium <= v1.8
Shown as request
cilium.operator.eni.aws_api_duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of interactions with AWS API. Available in Cilium <= v1.8
Shown as second
cilium.operator.eni.aws_api_duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of interactions with AWS API. Available in Cilium <= v1.8
Shown as second
cilium.operator.eni_ec2.rate_limit.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of client-side rate limiter blocking. Available in Cilium <= v1.9
Shown as request
cilium.operator.eni_ec2.rate_limit.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of client-side rate limiter blocking. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni_ec2.rate_limit.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of client-side rate limiter blocking. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.ec2_resync.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of ec2 resync trigger runs. Available in Cilium <= v1.9
Shown as operation
cilium.operator.eni.ec2_resync.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of ec2 resync trigger runs. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.ec2_resync.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of ec2 resync trigger runs. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.ec2_resync.folds
(gauge)
[OpenMetrics V1 and V2] Current level of ec2 resync folding. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.ec2_resync.latency.seconds.count
(count)
[OpenMetrics V1 and V2] Count of latency between ec2 resync queue and trigger run. Available in Cilium <= v1.9
Shown as operation
cilium.operator.eni.ec2_resync.latency.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of latency between ec2 resync queue and trigger run. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.ec2_resync.latency.seconds.bucket
(count)
[OpenMetrics V2] Sample of latency between ec2 resync queue and trigger run. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.ec2_resync.queued.total
(gauge)
[OpenMetrics V1] Number of queued ec2 resync triggers. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.ec2_resync.queued.count
(count)
[OpenMetrics V2] Number of queued ec2 resync triggers. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.interface_creation_ops
(count)
[OpenMetrics V1] Number of ENIs allocated. Available in Cilium <= v1.9
Shown as operation
cilium.operator.eni.interface_creation_ops.count
(count)
[OpenMetrics V2] Number of ENIs allocated. Available in Cilium <= v1.9
Shown as operation
cilium.operator.eni.ips.total
(gauge)
[OpenMetrics V1 and V2] Number of IPs allocated. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.k8s_sync.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of k8s sync trigger run. Available in Cilium <= v1.9
Shown as operation
cilium.operator.eni.k8s_sync.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of k8s sync trigger run. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.k8s_sync.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of k8s sync trigger run. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.k8s_sync.folds
(gauge)
[OpenMetrics V1 and V2] Current level of k8s sync folding. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.k8s_sync.latency.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of k8s sync latency between queue and trigger run. Available in Cilium <= v1.9
Shown as operation
cilium.operator.eni.k8s_sync.latency.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of k8s sync latency between queue and trigger run. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.k8s_sync.latency.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of k8s sync latency between queue and trigger run. Available in Cilium <= v1.9
Shown as second
cilium.operator.eni.k8s_sync.queued.total
(gauge)
[OpenMetrics V1] Number of queued k8s sync triggers. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.k8s_sync.queued.count
(count)
[OpenMetrics V2] Number of queued k8s sync triggers. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.nodes.total
(gauge)
[OpenMetrics V1] Number of nodes by category. Available in Cilium <= v1.9
Shown as node
cilium.operator.eni.resync.total
(count)
[OpenMetrics V1] Number of resync operations to synchronize AWS EC2 metadata. Available in Cilium <= v1.9
Shown as unit
cilium.operator.eni.resync.count
(count)
[OpenMetrics V2] Number of resync operations to synchronize AWS EC2 metadata. Available in Cilium <= v1.9
Shown as unit
cilium.operator.ipam.ips
(gauge)
[OpenMetrics V1 and V2] Number of IPs allocated. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.allocation_ops
(count)
[OpenMetrics V1] Count of IP allocation operations. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.allocation_ops.count
(count)
[OpenMetrics V2] Count of IP allocation operations. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.release_ops
(count)
[OpenMetrics V1] Count of IP release operations. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.release_ops.count
(count)
[OpenMetrics V2] Count of IP release operations. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.interface_creation_ops
(count)
[OpenMetrics V1] Count of interfaces allocated. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.interface_creation_ops.count
(count)
[OpenMetrics V2] Count of interfaces allocated. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.available
(gauge)
[OpenMetrics V1 and V2] Number of interfaces with addresses available. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.available.ips_per_subnet
(gauge)
[OpenMetrics V1 and V2] Number of available IPs per subnet ID. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.deficit_resolver.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of deficit resolver trigger runs. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.deficit_resolver.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of deficit resolver trigger runs. Available in Cilium v1.8+
Shown as request
cilium.operator.ipam.deficit_resolver.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of deficit resolver trigger runs. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.deficit_resolver.queued.count
(count)
[OpenMetrics V2] Number of queued triggers. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.deficit_resolver.queued.total
(count)
[OpenMetrics V1] Number of queued triggers. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.deficit_resolver.folds
(gauge)
[OpenMetrics V1 and V2] Current level of deficit resolver folding. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.deficit_resolver.latency.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of deficit resolver latency between queue and trigger run. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.deficit_resolver.latency.seconds.count
(count)
[OpenMetrics V1 and V2] Count of deficit resolver latency between queue and trigger run. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.deficit_resolver.latency.seconds.bucket
(count)
[OpenMetrics V2] Sample of deficit resolver latency between queue and trigger run. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.k8s_sync.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of K8s sync trigger runs. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.k8s_sync.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of K8s sync trigger runs. Available in Cilium v1.8+
Shown as request
cilium.operator.ipam.k8s_sync.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of K8s sync trigger runs. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.k8s_sync.queued.total
(count)
[OpenMetrics V1] Number of queued k8s sync triggers. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.k8s_sync.queued.count
(count)
[OpenMetrics V2] Number of queued k8s sync triggers. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.k8s_sync.folds
(gauge)
[OpenMetrics V1 and V2] Current level of K8s sync folding. Available in Cilium v1.8+
Shown as unit
cilium.operator.ipam.k8s_sync.latency.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of K8s sync latency between queue and trigger run. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.k8s_sync.latency.seconds.count
(count)
[OpenMetrics V1 and V2] Count of K8s sync latency between queue and trigger run. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.k8s_sync.latency.seconds.bucket
(count)
[OpenMetrics V2] Sample of K8s sync latency between queue and trigger run. Available in Cilium v1.8+
Shown as second
cilium.operator.ipam.nodes
(gauge)
[OpenMetrics V1 and V2] Number of nodes by category. Available in Cilium v1.8+
Shown as node
cilium.operator.ipam.api.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of interactions with external IPAM API. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.api.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of interactions with external IPAM API. Available in Cilium v1.9+
Shown as request
cilium.operator.ipam.api.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of interactions with external IPAM API. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.api.rate_limit.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of rate limiting while accessing external IPAM API. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.api.rate_limit.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of rate limiting while accessing external IPAM API. Available in Cilium v1.9+
Shown as request
cilium.operator.ipam.api.rate_limit.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of rate limiting while accessing external IPAM API. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.resync.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of resync trigger runs. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.resync.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of resync trigger runs. Available in Cilium v1.9+
Shown as request
cilium.operator.ipam.resync.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of resync trigger runs. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.resync.total
(count)
[OpenMetrics V1] Number of resync operations to synchronize and resolve IP deficit of nodes. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.resync.count
(count)
[OpenMetrics V2] Number of resync operations to synchronize and resolve IP deficit of nodes. Available in Cilium v1.8+
Shown as operation
cilium.operator.ipam.resync.queued.total
(count)
[OpenMetrics V1] Number of IPAM queued triggers. Available in Cilium v1.9+
Shown as unit
cilium.operator.ipam.resync.queued.count
(count)
[OpenMetrics V2] Number of IPAM queued triggers. Available in Cilium v1.9+
Shown as unit
cilium.operator.ipam.resync.folds
(gauge)
[OpenMetrics V1 and V2] Current level of resync folding. Available in Cilium v1.9+
Shown as unit
cilium.operator.ipam.resync.latency.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of resync latency between queue and trigger run. Available in Cilium v1.9+
Shown as second
cilium.operator.ipam.resync.latency.seconds.count
(count)
[OpenMetrics V1 and V2] Count of resync latency between queue and trigger run. Available in Cilium v1.9+
Shown as operation
cilium.operator.ipam.resync.latency.seconds.bucket
(count)
[OpenMetrics V2] Sample of resync latency between queue and trigger run. Available in Cilium v1.9+
Shown as second
cilium.operator.ec2.api.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of interactions with API. Available in Cilium v1.9+
Shown as second
cilium.operator.ec2.api.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of interactions with API. Available in Cilium v1.9+
Shown as request
cilium.operator.ec2.api.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of interactions with API. Available in Cilium v1.9+
Shown as second
cilium.operator.ec2.api.rate_limit.duration.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of duration of client-side rate limiter blocking. Available in Cilium v1.9+
Shown as second
cilium.operator.ec2.api.rate_limit.duration.seconds.count
(count)
[OpenMetrics V1 and V2] Count of duration of client-side rate limiter blocking. Available in Cilium v1.9+
Shown as request
cilium.operator.ec2.api.rate_limit.duration.seconds.bucket
(count)
[OpenMetrics V2] Sample of duration of client-side rate limiter blocking. Available in Cilium v1.9+
Shown as second
cilium.operator.ces.queueing_delay.seconds.sum
(count)
[OpenMetrics V1 and V2] Sum of CiliumEndpointSlice queueing delay in seconds. Available in Cilium v1.11+
Shown as second
cilium.operator.ces.queueing_delay.seconds.count
(count)
[OpenMetrics V1 and V2] Count of CiliumEndpointSlice queueing delay in seconds. Available in Cilium v1.11+
Shown as unit
cilium.operator.ces.queueing_delay.seconds.bucket
(count)
[OpenMetrics V2] Sample of CiliumEndpointSlice queueing delay in seconds. Available in Cilium v1.11+
Shown as second
cilium.operator.ces.sync_errors.total
(count)
[OpenMetrics V1] Number of CES sync errors. Available in Cilium v1.11+
Shown as error
cilium.operator.ces.sync_errors.count
(count)
[OpenMetrics V2] Number of CES sync errors. Available in Cilium v1.11+
Shown as error
cilium.operator.identity_gc.entries
(gauge)
[OpenMetrics V1 and V2] The number of alive and deleted identities at the end of a garbage collector run. Available in Cilium v1.11+
Shown as garbage collection
cilium.operator.identity_gc.runs
(gauge)
[OpenMetrics V1 and V2] The number of times identity garbage collector has run. Available in Cilium v1.11+
Shown as garbage collection
cilium.operator.num_ceps_per_ces.sum
(count)
[OpenMetrics V1 and V2] Sum of CEPs batched in a CES. Available in Cilium v1.11+
Shown as unit
cilium.operator.num_ceps_per_ces.count
(count)
[OpenMetrics V1 and V2] Count of CEPs batched in a CES. Available in Cilium v1.11+
Shown as unit
cilium.operator.num_ceps_per_ces.bucket
(count)
[OpenMetrics V2] Sample of CEPs batched in a CES. Available in Cilium v1.11+
Shown as unit

Events

The Cilium integration does not include any events.

Service Checks

cilium.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical

cilium.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.