Traefik Mesh

Supported OS: Linux, Windows, Mac OS

Integration version: 1.0.1

Overview

Traefik Mesh is a lightweight and easy-to-deploy service mesh that offers advanced traffic management, security, and observability features for microservices applications, leveraging the capabilities of Traefik Proxy. With Datadog’s Traefik integration, you can:

  • Obtain insights into the traffic entering your service mesh.
  • Gain insight into the performance, reliability, and security of individual services within your mesh, so you can confirm that services are operating efficiently and identify and resolve issues quickly.
  • Gain detailed insight into the internal traffic flows within your service mesh, which helps you monitor performance and ensure reliability.

This check monitors Traefik Mesh through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

Starting from Agent release v7.55.0, the Traefik Mesh check is included in the Datadog Agent package. No additional installation is needed on your server.

Note: This check requires Agent v7.55.0 or later.

Configuration

Traefik Mesh can be configured to expose Prometheus-formatted metrics. The Datadog Agent can collect these metrics using the integration described below. Follow the instructions to configure data collection for your Traefik Mesh instances. For the required configurations to expose the Prometheus metrics, see the Observability page in the official Traefik Mesh documentation.

In addition, a small subset of metrics can be collected by communicating with different API endpoints. Specifically:

  • /api/version: Version information on the Traefik proxy.
  • /api/status/nodes: Ready status of nodes visible by the Traefik controller.
  • /api/status/readiness: Ready status of the Traefik controller.

Note: This check uses OpenMetrics for metric collection, which requires Python 3.

Containerized

Metric collection

Make sure that the Prometheus-formatted metrics are exposed in your Traefik Mesh cluster. You can configure and customize this by following the instructions on the Observability page in the official Traefik Mesh documentation. In order for the Agent to start collecting metrics, the Traefik Mesh pods need to be annotated. For more information about annotations, refer to the Autodiscovery Integration Templates for guidance. You can find additional configuration options by reviewing the traefik_mesh.d/conf.yaml sample.

Note: The following metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed.

When configuring the Traefik Mesh check, you can use the following parameters:

  • openmetrics_endpoint: Set this parameter to the location where the Prometheus-formatted metrics are exposed. The default port is 8082, but it can be changed with the --entryPoints.metrics.address option. In containerized environments, %%host%% can be used for host autodetection.
  • traefik_proxy_api_endpoint: This parameter is optional. The default port is 8080, and it can be changed with the --entryPoints.traefik.address option. In containerized environments, %%host%% can be used for host autodetection.
  • traefik_controller_api_endpoint: This parameter is optional. The default port is set to 9000.

Traefik Proxy

# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: |
      {
        "traefik_mesh": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8082/metrics",
              "traefik_proxy_api_endpoint": "http://%%host%%:8080"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: <CONTAINER_NAME>
# (...)

Traefik Controller

# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: |
      {
        "traefik_mesh": {
          "init_config": {},
          "instances": [
            {
              "traefik_controller_api_endpoint": "http://%%host%%:9000"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: <CONTAINER_NAME>
# (...)
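
If pod annotations are generated by deployment tooling rather than written by hand, the two annotation values above could be assembled as in the following sketch. The helper names (build_checks_annotation, proxy_instance, controller_instance) and the container name traefik-mesh-proxy are hypothetical and used only for illustration. The ports are the defaults described in the parameter list.

```python
import json


def build_checks_annotation(container_name, instance):
    """Return (annotation_key, annotation_value) for one traefik_mesh instance.

    Hypothetical helper: it only reproduces the JSON layout shown in the
    annotation examples above.
    """
    key = f"ad.datadoghq.com/{container_name}.checks"
    value = json.dumps(
        {"traefik_mesh": {"init_config": {}, "instances": [instance]}},
        indent=2,
    )
    return key, value


def proxy_instance(metrics_port=8082, api_port=8080):
    """Instance settings for a Traefik Proxy pod (default ports assumed)."""
    return {
        "openmetrics_endpoint": f"http://%%host%%:{metrics_port}/metrics",
        "traefik_proxy_api_endpoint": f"http://%%host%%:{api_port}",
    }


def controller_instance(api_port=9000):
    """Instance settings for a Traefik Controller pod (default port assumed)."""
    return {"traefik_controller_api_endpoint": f"http://%%host%%:{api_port}"}


if __name__ == "__main__":
    # "traefik-mesh-proxy" is a placeholder container name.
    key, value = build_checks_annotation("traefik-mesh-proxy", proxy_instance())
    print(key)
    print(value)
```

The %%host%% template variable is left unresolved on purpose so that the Agent can substitute the pod address at runtime.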

See the sample traefik_mesh.d/conf.yaml for all available configuration options.

Log collection

Available for Agent versions >6.0

Traefik Mesh logs can be collected from the different Traefik Mesh pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

See the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter       Value
<LOG_CONFIG>    {"source": "traefik_mesh", "service": "<SERVICE_NAME>"}
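
As a minimal sketch, the log configuration above could be rendered into a pod annotation as follows. It assumes that the Autodiscovery logs annotation takes a JSON list of log configurations under an ad.datadoghq.com/<CONTAINER_NAME>.logs key. The helper name build_logs_annotation is hypothetical; confirm the exact format against the Autodiscovery Integration Templates.

```python
import json


def build_logs_annotation(container_name, service_name):
    """Return (annotation_key, annotation_value) for Traefik Mesh log collection.

    Assumption: the annotation value is a JSON list holding one log
    configuration with the source and service fields shown above.
    """
    key = f"ad.datadoghq.com/{container_name}.logs"
    value = json.dumps([{"source": "traefik_mesh", "service": service_name}])
    return key, value
```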

Validation

Run the Agent’s status subcommand and look for traefik_mesh under the Checks section.

Data Collected

Metrics

traefik_mesh.config.last_reload.failure
(gauge)
The last config reload failure
traefik_mesh.config.last_reload.success
(gauge)
The last config reload success
traefik_mesh.config.reloads.count
(count)
The total count of configuration reloads
traefik_mesh.config.reloads.failure.count
(count)
The total count of configuration reload failures
traefik_mesh.entrypoint.open_connections
(gauge)
The current count of open connections on an entrypoint
traefik_mesh.entrypoint.request.duration.seconds.bucket
(count)
Request processing duration histogram on an entrypoint
Shown as second
traefik_mesh.entrypoint.request.duration.seconds.count
(count)
Request processing duration histogram on an entrypoint
Shown as second
traefik_mesh.entrypoint.request.duration.seconds.sum
(count)
Request processing duration histogram on an entrypoint
Shown as second
traefik_mesh.entrypoint.requests.bytes.count
(count)
The total size of HTTP requests in bytes handled by an entrypoint
traefik_mesh.entrypoint.requests.count
(count)
The total count of HTTP requests received by an entrypoint
traefik_mesh.entrypoint.requests.tls.count
(count)
The total count of HTTPS requests received by an entrypoint
traefik_mesh.entrypoint.responses.bytes.count
(count)
The total size of HTTP responses in bytes handled by an entrypoint
traefik_mesh.go.gc.duration.seconds.count
(count)
The summary count of garbage collection cycles in the Traefik Mesh instance
Shown as second
traefik_mesh.go.gc.duration.seconds.quantile
(gauge)
The summary of the pause duration of garbage collection cycles in the Traefik Mesh instance
Shown as second
traefik_mesh.go.gc.duration.seconds.sum
(count)
The sum of the pause duration of garbage collection cycles in the Traefik Mesh instance
Shown as second
traefik_mesh.go.goroutines
(gauge)
The number of goroutines that currently exist in the Traefik Mesh instance
traefik_mesh.go.info
(gauge)
A metric containing the Go version as a tag
traefik_mesh.go.memstats.alloc_bytes
(gauge)
The number of bytes allocated and still in use by the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.alloc_bytes.count
(count)
The total number of bytes allocated - even if freed - for the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.buck_hash.sys_bytes
(gauge)
The number of bytes used by the profiling bucket hash table in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.frees.count
(count)
The total number of frees in the Traefik Mesh instance
traefik_mesh.go.memstats.gc.cpu_fraction
(gauge)
The fraction of this program's available CPU time used by the GC since the program started in the Traefik Mesh instance
Shown as fraction
traefik_mesh.go.memstats.gc.sys_bytes
(gauge)
The number of bytes used for garbage collection system metadata in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.heap.alloc_bytes
(gauge)
The number of heap bytes allocated and still in use in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.heap.idle_bytes
(gauge)
The number of heap bytes waiting to be used in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.heap.inuse_bytes
(gauge)
The number of heap bytes that are in use in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.heap.objects
(gauge)
The number of allocated objects in the Traefik Mesh instance
Shown as object
traefik_mesh.go.memstats.heap.released_bytes
(gauge)
The number of heap bytes released to the OS in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.heap.sys_bytes
(gauge)
The number of heap bytes obtained from system in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.last_gc_time.seconds
(gauge)
The number of seconds since 1970 of last garbage collection in the Traefik Mesh instance
traefik_mesh.go.memstats.lookups.count
(count)
The number of pointer lookups
traefik_mesh.go.memstats.mallocs.count
(count)
The number of mallocs
traefik_mesh.go.memstats.mcache.inuse_bytes
(gauge)
The number of bytes in use by mcache structures in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.mcache.sys_bytes
(gauge)
The number of bytes used for mcache structures obtained from system in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.mspan.inuse_bytes
(gauge)
The number of bytes in use by mspan structures in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.mspan.sys_bytes
(gauge)
The number of bytes used for mspan structures obtained from system in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.next.gc_bytes
(gauge)
The number of heap bytes when next garbage collection takes place in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.other.sys_bytes
(gauge)
The number of bytes used for other system allocations in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.stack.inuse_bytes
(gauge)
The number of bytes in use by the stack allocator in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.stack.sys_bytes
(gauge)
The number of bytes obtained from system for stack allocator in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.memstats.sys_bytes
(gauge)
The number of bytes obtained from system in the Traefik Mesh instance
Shown as byte
traefik_mesh.go.threads
(gauge)
The number of OS threads created in the Traefik Mesh instance
Shown as thread
traefik_mesh.node.ready
(gauge)
The current count of ready nodes in the Traefik Mesh instance
traefik_mesh.process.cpu.seconds.count
(count)
The total user and system CPU time spent in seconds in the Traefik Mesh instance
Shown as second
traefik_mesh.process.max_fds
(gauge)
The maximum number of open file descriptors in the Traefik Mesh instance
Shown as file
traefik_mesh.process.open_fds
(gauge)
The number of open file descriptors in the Traefik Mesh instance
Shown as file
traefik_mesh.process.resident_memory.bytes
(gauge)
The resident memory size in bytes in the Traefik Mesh instance
Shown as byte
traefik_mesh.process.start_time.seconds
(gauge)
The start time of the process since unix epoch in seconds in the Traefik Mesh instance
Shown as second
traefik_mesh.process.virtual_memory.bytes
(gauge)
The virtual memory size in bytes in the Traefik Mesh instance
Shown as byte
traefik_mesh.process.virtual_memory.max_bytes
(gauge)
The maximum amount of virtual memory available in bytes in the Traefik Mesh instance
Shown as byte
traefik_mesh.router.open_connections
(gauge)
The current count of open connections for a router
traefik_mesh.router.request.duration.seconds.bucket
(count)
Request processing duration histogram for a router
Shown as second
traefik_mesh.router.request.duration.seconds.count
(count)
Request processing duration histogram for a router
Shown as second
traefik_mesh.router.request.duration.seconds.sum
(count)
Request processing duration histogram for a router
Shown as second
traefik_mesh.router.requests.bytes.count
(count)
The total size of HTTP requests in bytes handled by a router
Shown as byte
traefik_mesh.router.requests.count
(count)
The total count of HTTP requests handled by a router
traefik_mesh.router.requests.tls.count
(count)
The total count of HTTPS requests handled by a router
traefik_mesh.router.responses.bytes.count
(count)
The total size of HTTP responses in bytes handled by a router
Shown as byte
traefik_mesh.service.open_connections
(gauge)
The current count of open connections for a service
traefik_mesh.service.request.duration.seconds.bucket
(count)
Request processing duration histogram for a service
Shown as second
traefik_mesh.service.request.duration.seconds.count
(count)
Request processing duration histogram for a service
Shown as second
traefik_mesh.service.request.duration.seconds.sum
(count)
Request processing duration histogram for a service
Shown as second
traefik_mesh.service.requests.bytes.count
(count)
The total size of HTTP requests in bytes handled by a service
Shown as byte
traefik_mesh.service.requests.count
(count)
The total count of HTTP requests received by a service
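
The request.duration.seconds.sum and request.duration.seconds.count metrics listed above can be combined to estimate an average request duration over a time window. The following is a minimal sketch of that arithmetic, not something the integration provides. The function name is hypothetical, and the inputs are assumed to be the per-window values of the two metrics for the same entrypoint, router, or service.

```python
def mean_request_duration(duration_sum_seconds, request_count):
    """Average seconds per request in a window, or None if there were no requests.

    duration_sum_seconds: value of a *.request.duration.seconds.sum metric
    request_count: value of the matching *.request.duration.seconds.count metric
    """
    if request_count <= 0:
        return None
    return duration_sum_seconds / request_count
```

For example, a window with a duration sum of 12.5 seconds across 50 requests gives an average of 0.25 seconds per request.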

Events

The Traefik Mesh integration does not include any events.

Service Checks

traefik_mesh.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Traefik Mesh OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

traefik_mesh.controller.ready
Returns OK if the /api/status/readiness for the Mesh Controller returns 200, otherwise returns CRITICAL.
Statuses: ok, critical
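
The traefik_mesh.controller.ready rule above can be reproduced manually when debugging, as in the sketch below. This is not the Agent's implementation. The function names are hypothetical, and treating an unreachable endpoint as critical is an assumption based on the "otherwise returns CRITICAL" wording.

```python
import urllib.error
import urllib.request

READINESS_PATH = "/api/status/readiness"


def status_from_http_code(code):
    """Map an HTTP status code to the service check status: 200 is ok."""
    return "ok" if code == 200 else "critical"


def probe_controller(base_url, opener=urllib.request.urlopen, timeout=5):
    """Query the controller readiness endpoint and return "ok" or "critical".

    base_url is the controller API address, for example http://<HOST>:9000.
    opener is injectable so the function can be exercised without a network.
    """
    url = base_url.rstrip("/") + READINESS_PATH
    try:
        with opener(url, timeout=timeout) as response:
            return status_from_http_code(response.status)
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError and carry the status code.
        return status_from_http_code(err.code)
    except OSError:
        # Assumption: connection failures count as critical.
        return "critical"
```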

Troubleshooting

Need help? Contact Datadog support.