Cilium
Security Monitoring is available

Agent Check

Supported OS: Linux Mac OS Windows

Overview

This check monitors Cilium through the Datadog Agent. The integration can collect metrics from either cilium-agent or cilium-operator.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates guide for how to apply these instructions.

Installation

The Cilium check is included in the Datadog Agent package, but additional setup is required to expose Prometheus metrics.

  1. To enable Prometheus metrics in both cilium-agent and cilium-operator, deploy Cilium with the Helm value global.prometheus.enabled=true, or follow the next step.

  2. Enable Prometheus metrics separately:

    • In cilium-agent, add --prometheus-serve-addr=:9090 to the args section of the Cilium DaemonSet configuration.

      # [...]
      spec:
        containers:
          - args:
              - --prometheus-serve-addr=:9090
    • Or, in cilium-operator, add --enable-metrics to the args section of the Cilium Deployment configuration.

      # [...]
      spec:
        containers:
          - args:
              - --enable-metrics
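
The Helm value from step 1 can also be kept in a values file rather than passed on the command line. The following is a minimal sketch, assuming a values file of your own (the filename values.yaml is only an illustration). The nested keys are the dotted path global.prometheus.enabled written out as YAML.

```yaml
# values.yaml (hypothetical filename)
# Equivalent to setting global.prometheus.enabled=true on the Helm command line.
global:
  prometheus:
    enabled: true
```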

Configuration

Host

  1. To start collecting Cilium performance data, edit the cilium.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample cilium.d/conf.yaml for all available configuration options.

    • To collect cilium-agent metrics, enable the agent_endpoint option.
    • To collect cilium-operator metrics, enable the operator_endpoint option.
  2. Restart the Agent.
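
A cilium.d/conf.yaml for the steps above might look like the sketch below. The port in agent_endpoint matches the --prometheus-serve-addr=:9090 flag from the installation step. The host localhost, the use of a separate instance for the operator, and the operator URL with its port placeholder are all assumptions; check the sample cilium.d/conf.yaml for the values that apply to your Agent version.

```yaml
init_config:

instances:
  # Collect cilium-agent metrics. Port 9090 matches --prometheus-serve-addr=:9090;
  # "localhost" assumes the Agent runs on the same host as cilium-agent.
  - agent_endpoint: http://localhost:9090/metrics

  # To collect cilium-operator metrics, enable operator_endpoint.
  # <OPERATOR_METRICS_PORT> is a placeholder, not a documented default.
  # - operator_endpoint: http://localhost:<OPERATOR_METRICS_PORT>/metrics
```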

Log collection

Cilium produces two types of logs: cilium-agent and cilium-operator.

  1. Log collection is disabled by default in the Datadog Agent. Enable it in your DaemonSet configuration as follows:

     # (...)
       env:
       #  (...)
         - name: DD_LOGS_ENABLED
           value: "true"
         - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
           value: "true"
     # (...)
  2. Mount the Docker socket to the Datadog Agent, as in this manifest. If you are not using Docker, mount the /var/log/pods directory instead.

  3. Restart the Agent.
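
The mount described in step 2 could be expressed roughly as follows. This is a sketch, not the referenced manifest: the container name, the volume names, and the conventional Docker socket path /var/run/docker.sock are assumptions. Keep only the option that matches your container runtime.

```yaml
# Fragment of the Datadog Agent DaemonSet (container and volume names are hypothetical)
spec:
  template:
    spec:
      containers:
        - name: agent
          volumeMounts:
            # Option A: Docker runtime, mount the Docker socket
            - name: dockersocket
              mountPath: /var/run/docker.sock
            # Option B: non-Docker runtime, mount the pod log directory
            - name: pod-logs
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        - name: dockersocket
          hostPath:
            path: /var/run/docker.sock
        - name: pod-logs
          hostPath:
            path: /var/log/pods
```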

Containerized

For containerized environments, see the Autodiscovery Integration Templates guide and apply the parameters below.

Metric collection

Parameter            Value
<INTEGRATION_NAME>   cilium
<INIT_CONFIG>        blank or {}
<INSTANCE_CONFIG>    {"agent_endpoint": "http://%%host%%:9090/metrics"}

Log collection

Log collection is disabled by default in the Datadog Agent. To enable it, see the Kubernetes log collection documentation.

Parameter            Value
<LOG_CONFIG>         {"source": "cilium-agent", "service": "cilium-agent"}
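
On Kubernetes, one common way to supply these parameters is through Autodiscovery pod annotations. The sketch below assumes the Cilium container is named cilium-agent (the container identifier in each annotation key must match the real container name) and reuses the parameter values from the tables above unchanged.

```yaml
# Pod template fragment for the Cilium DaemonSet.
# "cilium-agent" in the annotation keys is an assumed container name.
metadata:
  annotations:
    ad.datadoghq.com/cilium-agent.check_names: '["cilium"]'
    ad.datadoghq.com/cilium-agent.init_configs: '[{}]'
    ad.datadoghq.com/cilium-agent.instances: '[{"agent_endpoint": "http://%%host%%:9090/metrics"}]'
    ad.datadoghq.com/cilium-agent.logs: '[{"source": "cilium-agent", "service": "cilium-agent"}]'
```

The %%host%% template variable is resolved by the Agent at runtime, so it stays as written.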

Validation

Run the Agent's status subcommand and look for cilium under the Checks section.

Data Collected

Metrics

cilium.agent.api_process_time.seconds.count
(count)
Count of processing time for all API calls
Shown as request
cilium.agent.api_process_time.seconds.sum
(gauge)
Sum of processing time for all API calls
Shown as second
cilium.agent.bootstrap.seconds.count
(count)
Count of bootstrap durations
cilium.agent.bootstrap.seconds.sum
(gauge)
Sum of bootstrap durations
Shown as second
cilium.bpf.map_ops.total
(count)
Total BPF map operations performed
Shown as operation
cilium.controllers.failing.count
(count)
Number of failing controllers
Shown as error
cilium.controllers.runs_duration.seconds.count
(count)
Count of controller processes duration
Shown as operation
cilium.controllers.runs_duration.seconds.sum
(gauge)
Sum of controller processes duration
Shown as second
cilium.controllers.runs.total
(count)
Total number of controller runs
Shown as event
cilium.datapath.conntrack_gc.duration.seconds.count
(count)
Count of garbage collector process duration
Shown as operation
cilium.datapath.conntrack_gc.duration.seconds.sum
(gauge)
Sum of garbage collector process duration
Shown as second
cilium.datapath.conntrack_gc.entries
(gauge)
The number of alive and deleted conntrack entries
Shown as garbage collection
cilium.datapath.conntrack_gc.key_fallbacks.total
(count)
The total number of conntrack entries
Shown as garbage collection
cilium.datapath.conntrack_gc.runs.total
(count)
Total number of the conntrack garbage collector process runs
Shown as garbage collection
cilium.datapath.errors.total
(count)
Total number of errors in datapath management
Shown as error
cilium.drop_bytes.total
(count)
Total dropped bytes
Shown as byte
cilium.drop_count.total
(count)
Total dropped packets
Shown as packet
cilium.endpoint.count
(count)
Total ready endpoints managed by agent
Shown as unit
cilium.endpoint.regeneration_time_stats.seconds.count
(count)
Count of endpoint regeneration time stats
Shown as operation
cilium.endpoint.regeneration_time_stats.seconds.sum
(gauge)
Sum of endpoint regeneration time stats
Shown as second
cilium.endpoint.regenerations.count
(count)
Count of completed endpoint regenerations
Shown as unit
cilium.endpoint.state
(gauge)
Count of all endpoints
Shown as unit
cilium.errors_warning.total
(count)
Total error warnings
Shown as error
cilium.event_timestamp
(gauge)
Last timestamp of event received
Shown as time
cilium.forward_bytes.total
(count)
Total forwarded bytes
Shown as byte
cilium.forward_count.total
(count)
Total forwarded packets
Shown as packet
cilium.fqdn.gc_deletions.total
(count)
Total number of FQDNs cleaned in FQDN garbage collector job
Shown as event
cilium.identity.count
(gauge)
Number of identities allocated
Shown as unit
cilium.ip_addresses.count
(gauge)
Number of allocated ip_addresses
Shown as unit
cilium.ipam.events.total
(count)
Number of IPAM events received by action and datapath family type
Shown as event
cilium.k8s_client.api_calls.count
(count)
Number of API calls made to kube-apiserver
Shown as request
cilium.k8s_client.api_latency_time.seconds.count
(count)
Count of processed API call duration
Shown as request
cilium.k8s_client.api_latency_time.seconds.sum
(gauge)
Sum of processed API call duration
Shown as second
cilium.kubernetes.events_received.total
(count)
Number of Kubernetes received events processed
Shown as event
cilium.kubernetes.events.total
(count)
Number of Kubernetes events processed
Shown as event
cilium.nodes.all_datapath_validations.total
(count)
Number of validation calls to implement the datapath implementation of a node
Shown as unit
cilium.nodes.all_events_received.total
(count)
Number of node events received
Shown as event
cilium.nodes.managed.total
(gauge)
Number of nodes managed
Shown as node
cilium.policy.count
(gauge)
Number of policies currently loaded
Shown as unit
cilium.policy.endpoint_enforcement_status
(gauge)
Number of endpoints labeled by policy enforcement status
Shown as unit
cilium.policy.import_errors.count
(count)
Number of failed policy imports
Shown as error
cilium.policy.l7_denied.total
(count)
Number of total L7 denied requests/responses due to policy
Shown as unit
cilium.policy.l7_forwarded.total
(count)
Number of total L7 forwarded requests/responses
Shown as unit
cilium.policy.l7_parse_errors.total
(count)
Number of total L7 parse errors
Shown as error
cilium.policy.l7_received.total
(count)
Number of total L7 received requests/responses
Shown as unit
cilium.policy.max_revision
(gauge)
Highest policy revision number in the agent
Shown as unit
cilium.policy.regeneration_time_stats.seconds.count
(count)
Policy regeneration time stats count
Shown as operation
cilium.policy.regeneration_time_stats.seconds.sum
(gauge)
Policy regeneration time stats sum
Shown as second
cilium.policy.regeneration.total
(count)
Total number of successful policy regenerations
Shown as unit
cilium.process.cpu.seconds.total
(gauge)
Process CPU time in seconds
Shown as second
cilium.process.max_fds
(gauge)
Process file descriptor maximum
Shown as file
cilium.process.open_fds
(gauge)
Number of open file descriptors
Shown as file
cilium.process.resident_memory.bytes
(gauge)
Total resident memory bytes
Shown as byte
cilium.process.start_time.seconds
(gauge)
Processes start time
Shown as second
cilium.process.virtual_memory.bytes
(gauge)
Virtual memory bytes
Shown as byte
cilium.process.virtual_memory.max.bytes
(gauge)
Maximum virtual memory bytes
Shown as byte
cilium.subprocess.start.total
(count)
Number of times that Cilium has started a subprocess
Shown as unit
cilium.triggers_policy.update_call_duration.seconds.count
(count)
Count of policy update trigger duration
Shown as operation
cilium.triggers_policy.update_call_duration.seconds.sum
(gauge)
Sum of policy update trigger duration
Shown as second
cilium.triggers_policy.update_folds
(gauge)
Number of folds
Shown as unit
cilium.triggers_policy.update.total
(count)
Total number of policy update trigger invocations
Shown as unit
cilium.unreachable.health_endpoints
(gauge)
Number of health endpoints that cannot be reached
Shown as unit
cilium.unreachable.nodes
(gauge)
Number of nodes that cannot be reached
Shown as node
cilium.operator.process.cpu.seconds
(count)
Total user and system CPU time spent in seconds
Shown as second
cilium.operator.process.max_fds
(gauge)
Maximum number of open file descriptors
Shown as file
cilium.operator.process.open_fds
(gauge)
Number of open file descriptors
Shown as file
cilium.operator.process.resident_memory.bytes
(gauge)
Resident memory size in bytes
Shown as byte
cilium.operator.process.start_time.second
(gauge)
Start time of the process since unix epoch in seconds
Shown as second
cilium.operator.process.virtual_memory.bytes
(gauge)
Virtual memory size in bytes
Shown as byte
cilium.operator.process.virtual_memory_max.bytes
(gauge)
Maximum amount of virtual memory available in bytes
Shown as byte
cilium.kvstore.operations_duration.seconds.count
(count)
Duration of kvstore operation count
Shown as operation
cilium.kvstore.operations_duration.seconds.sum
(gauge)
Duration of kvstore operation sum
Shown as second
cilium.kvstore.events_queue.seconds.count
(count)
Count of durations in seconds that a received event was blocked before it could be queued
cilium.kvstore.events_queue.seconds.sum
(gauge)
Sum of durations in seconds that a received event was blocked before it could be queued
Shown as second
cilium.operator.eni.available
(gauge)
Number of ENI with addresses available
Shown as unit
cilium.operator.eni.available.ips_per_subnet
(gauge)
Number of available IPs per subnet ID
Shown as unit
cilium.operator.eni.aws_api_duration.seconds.count
(count)
Count of duration of interactions with AWS API
Shown as request
cilium.operator.eni.aws_api_duration.seconds.sum
(gauge)
Sum of duration of interactions with AWS API
Shown as second
cilium.operator.eni.deficit_resolver.duration.seconds.count
(count)
Count of duration of deficit resolver trigger runs
Shown as operation
cilium.operator.eni.deficit_resolver.duration.seconds.sum
(gauge)
Sum of duration of deficit resolver trigger runs
Shown as second
cilium.operator.eni.deficit_resolver.folds
(gauge)
Current level of deficit resolver folding
Shown as unit
cilium.operator.eni.deficit_resolver.latency.seconds.count
(count)
Count of latency between deficit resolver queue and trigger run
Shown as operation
cilium.operator.eni.deficit_resolver.latency.seconds.sum
(gauge)
Sum of latency between deficit resolver queue and trigger run
Shown as second
cilium.operator.eni.deficit_resolver.queued.total
(gauge)
Number of queued deficit resolver triggers
Shown as event
cilium.operator.eni.ec2_resync.duration.seconds.count
(count)
Count of duration of ec2 resync trigger runs
Shown as operation
cilium.operator.eni.ec2_resync.duration.seconds.sum
(gauge)
Sum of duration of ec2 resync trigger runs
Shown as second
cilium.operator.eni.ec2_resync.folds
(gauge)
Current level of ec2 resync folding
Shown as unit
cilium.operator.eni.ec2_resync.latency.seconds.count
(count)
Count of latency between ec2 resync queue and trigger run
Shown as operation
cilium.operator.eni.ec2_resync.latency.seconds.sum
(gauge)
Sum of latency between ec2 resync queue and trigger run
Shown as second
cilium.operator.eni.ec2_resync.queued.total
(gauge)
Number of queued ec2 resync triggers
Shown as unit
cilium.operator.eni.interface_creation_ops
(count)
Number of ENIs allocated
Shown as operation
cilium.operator.eni.ips.total
(gauge)
Number of IPs allocated
Shown as unit
cilium.operator.eni.k8s_sync.duration.seconds.count
(count)
Count of duration of k8s sync trigger run
Shown as operation
cilium.operator.eni.k8s_sync.duration.seconds.sum
(gauge)
Sum of duration of k8s sync trigger run
Shown as second
cilium.operator.eni.k8s_sync.folds
(gauge)
Current level of k8s sync folding
Shown as second
cilium.operator.eni.k8s_sync.latency.seconds.count
(count)
Count of duration of k8s sync latency between queue and trigger run
Shown as operation
cilium.operator.eni.k8s_sync.latency.seconds.sum
(gauge)
Sum of duration of k8s sync latency between queue and trigger run
Shown as second
cilium.operator.eni.k8s_sync.queued.total
(gauge)
Number of queued k8s sync triggers
Shown as unit
cilium.operator.eni.nodes.total
(gauge)
Number of nodes by category
Shown as node
cilium.operator.eni.resync.total
(count)
Number of resync operations to synchronize AWS EC2 metadata
Shown as unit

Service Checks

cilium.prometheus.health: Returns CRITICAL if the Agent cannot reach the metrics endpoint. Otherwise, returns OK.

Events

Cilium does not include any events.

Troubleshooting

Need help? Contact the Datadog support team.